Wednesday, 26 April 2023

OpenTelemetry - what is it & why should you care?

Per the project's about page (https://opentelemetry.io/about/):


OpenTelemetry provides the libraries, agents, and other components that you need to capture telemetry from your services so that you can better observe, manage, and debug them

But what does that actually mean?


Let's first take a look at the domain - telemetry. It's composed of logs, metrics and traces. The first two have been present in (enterprise) Java for a long time. The third component, traces, has started to gain popularity in recent years with the rise of distributed architectures.

In the case of monolithic architectures, logs usually contain all the necessary data, i.e. the complete business operation. In the microservices world it's not that simple. Engineers need a way to correlate execution happening in multiple places, often in an asynchronous manner (messaging systems anyone?). This is where distributed tracing comes into play. Each instrumented (more on this later) component involved in processing a business operation generates data points (aka spans). Instrumentation also makes sure that all spans of a business operation (called a trace) are correlated (with a traceId) and can be arranged in chronological order.
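To make the correlation idea a bit more concrete, here is a minimal sketch in plain Java. It is not the OpenTelemetry API: the SpanSketch class, its fields and the way IDs are generated are all assumptions made for illustration. It only shows the rule that a child span inherits the traceId of its parent and remembers the parent's spanId.

```java
import java.util.UUID;

// Simplified, hypothetical span model; real OpenTelemetry spans carry far more data.
final class SpanSketch {
    final String name;
    final String traceId;
    final String spanId;
    final String parentSpanId; // null for the root span of a trace
    final long startedAtNanos; // lets spans be ordered chronologically

    private SpanSketch(String name, String traceId, String parentSpanId) {
        this.name = name;
        this.traceId = traceId;
        this.spanId = randomHex(16);
        this.parentSpanId = parentSpanId;
        this.startedAtNanos = System.nanoTime();
    }

    // The first component of a business operation starts a new trace.
    static SpanSketch root(String name) {
        return new SpanSketch(name, randomHex(32), null);
    }

    // Every downstream component reuses the traceId and points back at its caller.
    SpanSketch child(String name) {
        return new SpanSketch(name, this.traceId, this.spanId);
    }

    // Illustrative ID generation only: takes hex characters from a random UUID.
    private static String randomHex(int length) {
        return UUID.randomUUID().toString().replace("-", "").substring(0, length);
    }
}
```

For example, SpanSketch.root("frontend").child("backend") yields a second span with the same traceId as the first one. In a real system the instrumentation does this for you and carries the identifiers between services.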

Why is OpenTelemetry so important?


Multiple companies have been using some kind of telemetry (sometimes even distributed traces!) for a long time now. Why should they switch to OpenTelemetry now? The answer is simple - standardisation across industry leaders. I've seen multiple in-house attempts at telemetry, usually as a way to help developers hunt down and fix issues - a process that is crucial for revenue streams ;-) Such solutions are often very limited due to custom protocols, lack of resources to keep them in good shape (hackathon / side project) or limited scope / functionality. Moreover, switching the telemetry backend, used to visualise and explore telemetry, is almost impossible. That's why industry leaders (Splunk, New Relic, Microsoft, Amazon - and others) have decided to join forces and create a standard. Everything is open source, under the Cloud Native Computing Foundation (CNCF, https://www.cncf.io/).

OK, I get the idea, but WHY should I use it?


Here are some compelling reasons:

Debugging made easier

When an issue arises in a distributed system, it can be challenging to pinpoint the root cause. With distributed tracing, you can get a holistic view of the entire request flow across different services and identify the problematic component quickly. You can trace the request path and see the exact timing, order and success/error state of each operation across different services. This can significantly reduce the time spent on debugging and resolving issues.

Performance optimization

Telemetry allows you to capture performance metrics and analyze them to optimize your system's performance. Using traces, you can identify the slowest components in the request path and optimize them to reduce latency.

Understanding system behavior

Tracing provides insights into how your distributed system behaves in production. You can observe the actual request flow, identify any bottlenecks or anomalies, and gain a better understanding of how different components interact with each other. This can help you make informed decisions about system design, resource allocation, and capacity planning. Not an easy thing in a complex distributed architecture!

System monitoring

Telemetry is a valuable tool for monitoring the health and performance of your Java applications in production. You can set up alerts and notifications based on trace data to proactively detect and resolve any issues before they impact your users. Tracing data can also be used for generating reports, dashboards, and visualizations to gain real-time insights into the state of your system.

Vendor-agnostic instrumentation

OpenTelemetry is supported by many backend vendors, including Splunk, Datadog, New Relic, and others. This means that you can easily switch between different backends for visualizing and analyzing your telemetry data without having to change your instrumentation code. This flexibility allows you to choose the backend that best fits your requirements and budget, without being locked into a specific vendor.

OK, fine, but HOW can I use it?


The OpenTelemetry project provides a bunch of different components, but since this blog is mainly about Java, let's focus on... Java ;-) In the first paragraph "instrumentation" was mentioned. It is, in short, a way to add tracing capabilities to your Java applications without much hassle. OpenTelemetry provides libraries and a JVM agent that can be easily integrated into any codebase to automatically capture and propagate trace information.
Adding OTEL to an existing deployment is as easy as adding the JVM agent and a few configuration properties. The latest quick start guide is always available here: https://github.com/open-telemetry/opentelemetry-java-instrumentation#getting-started

Perfect! Now WHAT can I expect as telemetry data?


In a microservices architecture, a single business operation often spans multiple services. A typical trace would contain multiple spans, each representing a different operation in the overall business process. Each span would include a unique identifier (the spanId), a reference to its parent span (the parentSpanId) and a trace identifier (the traceId). This allows all the spans to be correlated and reconstructed into the full trace.

Here's an example trace with three spans:

spans:
  - Span:
      name: "Frontend service"
      spanId: 0123456789abcdef
      traceId: 098k0k0
      parentSpanId: null
  - Span:
      name: "Backend service"
      spanId: 23456789abcdef01
      traceId: 098k0k0
      parentSpanId: 0123456789abcdef
  - Span:
      name: "Database service"
      spanId: 3456789abcdef012
      traceId: 098k0k0
      parentSpanId: 23456789abcdef01
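How a backend might turn such a flat list back into a call tree can be sketched in a few lines of plain Java. This is only an illustration under assumptions: the TraceTreeSketch class and its Span type are hypothetical, all spans are assumed to share one traceId, and real telemetry backends do considerably more (ordering by timestamps, handling missing parents and so on).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that rebuilds the parent/child structure of one trace.
final class TraceTreeSketch {

    static final class Span {
        final String name;
        final String spanId;
        final String parentSpanId; // null marks the root span

        Span(String name, String spanId, String parentSpanId) {
            this.name = name;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
        }
    }

    // Index spans by the id of their parent; root spans go under the empty key.
    static Map<String, List<Span>> childrenByParent(List<Span> spans) {
        Map<String, List<Span>> index = new LinkedHashMap<>();
        for (Span span : spans) {
            String key = span.parentSpanId == null ? "" : span.parentSpanId;
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(span);
        }
        return index;
    }

    // Render the trace as an indented tree, one span name per line.
    static String render(List<Span> spans) {
        StringBuilder out = new StringBuilder();
        append(childrenByParent(spans), "", 0, out);
        return out.toString();
    }

    private static void append(Map<String, List<Span>> index, String parentId,
                               int depth, StringBuilder out) {
        for (Span span : index.getOrDefault(parentId, List.of())) {
            out.append("  ".repeat(depth)).append(span.name).append('\n');
            append(index, span.spanId, depth + 1, out);
        }
    }
}
```

Fed with the three spans from the example (in any order), render returns the frontend span on the first line, the backend span indented below it and the database span indented one level further.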


Postface

I hope that all of the above has convinced you, my dear reader, to go and try OpenTelemetry. Please bear in mind that the project delivers much, much more than I have described here - ranging from a great number of automatically instrumented libraries to seamlessly passing context between various services (like AWS SQS for example ;-) ).

Friday, 24 March 2023

Eliminate dreadful NPEs with NullAway

What is NullAway

Static code analysis has always been a little controversial in the Java development community. As with almost each and every tool, it can be used for good (to improve code quality) or bad (infuriate engineers with picky rules). 

NullAway (https://github.com/uber/NullAway) is different, however. There are no rules to configure, no code styles to discuss. Just one plugin that will help eliminate NPEs in your code by reviewing all code execution paths.
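To give a feeling for what gets caught, here is a small sketch. The GreetingSketch class is made up for illustration, and the @Nullable annotation is declared locally only so that the snippet compiles without extra dependencies; in a real project you would use an existing annotation instead. With NullAway enabled, the first method would be expected to fail the build because a @Nullable value is dereferenced, while the second one passes thanks to the explicit null check.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in annotation (an assumption for this sketch only).
@Retention(RetentionPolicy.CLASS)
@Target({ElementType.FIELD, ElementType.METHOD, ElementType.PARAMETER})
@interface Nullable {}

final class GreetingSketch {

    // Expected to be reported: 'name' may be null when trim() is called.
    static String unsafeGreeting(@Nullable String name) {
        return "Hello, " + name.trim();
    }

    // Accepted: the null case is handled before the value is dereferenced.
    static String safeGreeting(@Nullable String name) {
        if (name == null) {
            return "Hello, stranger";
        }
        return "Hello, " + name.trim();
    }
}
```

Without the plugin both methods compile, and the first one simply throws a NullPointerException at runtime when called with null.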


Configuration


For Gradle

https://github.com/uber/NullAway#gradle


For Maven: 



<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>${maven.compiler.source}</source>
    <target>${maven.compiler.target}</target>
    <fork>true</fork>
    <compilerArgs>
      <arg>-XDcompilePolicy=simple</arg>
      <arg>-Xplugin:ErrorProne -XepAllErrorsAsWarnings -XepExcludedPaths:.*/src/test/java/.* -Xep:NullAway:ERROR -XepOpt:NullAway:TreatGeneratedAsUnannotated=true -XepOpt:NullAway:ExcludedClasses=sf.cloudmetricsyncer.CloudProvider -XepOpt:NullAway:AnnotatedPackages=sf.cloudmetricsyncer,sf.externalmonitor</arg>
      <arg>-J--add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED</arg>
      <arg>-J--add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED</arg>
      <arg>-J--add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED</arg>
      <arg>-J--add-exports=jdk.compiler/com.sun.tools.javac.model=ALL-UNNAMED</arg>
      <arg>-J--add-exports=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED</arg>
      <arg>-J--add-exports=jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED</arg>
      <arg>-J--add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED</arg>
      <arg>-J--add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED</arg>
      <arg>-J--add-opens=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED</arg>
      <arg>-J--add-opens=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED</arg>
    </compilerArgs>
    <annotationProcessorPaths>
      <path>
        <groupId>com.google.errorprone</groupId>
        <artifactId>error_prone_core</artifactId>
        <version>2.16</version>
      </path>
      <path>
        <groupId>com.uber.nullaway</groupId>
        <artifactId>nullaway</artifactId>
        <version>0.10.5</version>
      </path>
    </annotationProcessorPaths>
  </configuration>
</plugin>


 

How to


For a new project it’s easy - just enable the plugin and fix the build each time it is required. For old code, especially with a large codebase, it might be trickier. The NullAway plugin does not report all of the violations on the first run - only a subset, which makes the whole iterative process longer and more painful.


Here’s a proposed approach I’ve successfully applied to a number of projects:


  1. Start with service boundaries - API, DB data model. Review which fields definitely need to be nullable. If possible implement appropriate validation to reject illegal NULLs or apply defaults.

  2. If you have a good modularization of the code - split work by sub-package, using configuration options “-XepOpt:NullAway:AnnotatedPackages=XYZ” and “-XepOpt:NullAway:UnannotatedSubPackages=ABC”

  3. Run the build and review “returning @Nullable expression from method with @NonNull return type” messages. Consider whether you really need to return NULLs in such cases. Perhaps a default value would be better.

  4. Review “initializer method does not guarantee @NonNull field XXX is initialized along all control-flow paths (remember to check for exceptions or early returns).” messages. Perhaps an instance of the class is initialized in a different way (guaranteed to be called after construction but before the first use)? If so, add a pre-configured (plugin configuration) initialization annotation to the method. This is also a great moment to think about making the data model (or parts of the model) immutable without any NULLs.

  5. Review messages pertaining to the class hierarchy (super / subclass):

    1. “parameter X is @NonNull, but parameter in superclass method ABC#abc is @Nullable”

    2. “method returns @Nullable, but superclass method ABC#abc returns @NonNull”

  6. Review “passing @Nullable parameter 'null' where @NonNull is required” - do you need to pass an explicit null? Perhaps a default (empty) value would be better?

  7. Review “passing @Nullable parameter XXX where @NonNull is required”. This often means that properties of a data model class are nullable. Perhaps the model entity can be turned into an immutable one, getting rid of NULLs there entirely?
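Several of the steps above (validation at the boundary, defaults instead of NULLs, an immutable model) can be sketched together in one small class. Everything here is hypothetical: the Customer class, its fields and the fromRawInput factory are made-up names, and the choice of an empty string as a default is just one possible convention.

```java
import java.util.Objects;

// Hypothetical immutable model class populated at a service boundary.
final class Customer {
    private final String name;
    private final String nickname;

    private Customer(String name, String nickname) {
        this.name = name;
        this.nickname = nickname;
    }

    // In a NullAway-checked code base both parameters would be marked @Nullable,
    // because raw API or DB values cannot be trusted.
    static Customer fromRawInput(String rawName, String rawNickname) {
        // Step 1: reject an illegal null right at the boundary.
        String name = Objects.requireNonNull(rawName, "name is required");
        // Steps 3 and 6: prefer an empty default over passing null around.
        String nickname = rawNickname != null ? rawNickname : "";
        return new Customer(name, nickname);
    }

    // Step 7: no setters, so the fields can never become null later on.
    String name() {
        return name;
    }

    String nickname() {
        return nickname;
    }
}
```

Once the boundary guarantees non-null fields, the rest of the code can use name() and nickname() without any null checks, which is where most of the warnings tend to disappear.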




Tips


Initialisation annotation

In some cases an instance is initialized before its first use in some way other than via a constructor. This may happen in a typical framework implementing some kind of instance lifecycle. To tell NullAway that field initialization happens in such a method, one can use the initialization annotation: https://github.com/uber/NullAway/wiki/Supported-Annotations#initialization
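A rough sketch of how this might look follows. The ReportService class and its onStart lifecycle method are invented for the example, and the @Initializer annotation is declared locally only to keep the snippet self-contained. Whether NullAway picks up a given annotation depends on its name and on the plugin configuration, so please check the wiki page linked above for the exact rules.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in annotation (an assumption for this sketch only).
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.METHOD)
@interface Initializer {}

// Hypothetical component managed by a framework with a start-up lifecycle hook.
final class ReportService {
    // Not assigned in a constructor; the lifecycle method below sets it.
    private String reportDirectory;

    // Assumed to be called by the framework after construction, before first use.
    @Initializer
    void onStart(String configuredDirectory) {
        this.reportDirectory = configuredDirectory;
    }

    String pathFor(String reportName) {
        return reportDirectory + "/" + reportName;
    }
}
```

The annotation is a promise made by the programmer: if the framework ever skips onStart, the field is still null at runtime and no static check will save you.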


NullSafeMap

Map get() is treated as inherently unsafe (nullable). In many cases, however, a programmer knows that the map will contain a mapping for a particular key. In order to use NullAway efficiently in such cases it’s a good idea to have a utility (e.g. NullSafeMap) with a static nullSafeGet(Map, key) method annotated with @SuppressWarnings("NullAway")
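A minimal version of such a utility could look like the sketch below. The class and method names follow the description above; the generic signature is my assumption.

```java
import java.util.Map;

// Utility for lookups where the caller knows the key is present.
final class NullSafeMap {

    private NullSafeMap() {
        // static helper, not meant to be instantiated
    }

    // The suppression tells NullAway to trust the caller: the result of
    // Map.get() is returned as if it were never null.
    @SuppressWarnings("NullAway")
    static <K, V> V nullSafeGet(Map<K, V> map, K key) {
        return map.get(key);
    }
}
```

Usage is a one-liner, for example String host = NullSafeMap.nullSafeGet(settings, "host"); where settings is some Map<String, String>. A stricter variant could throw an exception when the key is missing; with such an explicit null check the suppression would most likely not be needed at all.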


Use standards 

Out of possible annotations, I recommend: javax.annotation.Nullable. Keep it… standardized ;-)


Local variables

It’s not possible to suppress a check for a local variable - in such cases it’s best to extract a separate method (a one-liner) out of the expression where the variable is used and annotate that method.


Generated code

One can easily ignore generated code (since there is not much one can do about it) using: -XepOpt:NullAway:TreatGeneratedAsUnannotated=true


Last resort - ignoring particular classes

Exclude some classes (only if really needed) using:  -XepOpt:NullAway:ExcludedClasses=XYZ



Now go out there, have fun coding and hunt down some nasty NullPointerExceptions! ;-)