Configuring Zipkin Tracing with Spring Boot

11 minute read Published:

Demonstrates how to trace a Spring Boot application that includes multiple hop services.
Table of Contents

Tracing Motivation

The more distributed a system is, the harder it becomes to debug errors, find latency, and understand the potentially cascading impact of a bug. External monitoring only tells you the overall response time and number of invocations; it doesn't give us a way to understand internal invocations. Structured log entries can eventually be correlated to paint a picture of a request's movement through a system, but structured logs are not easy to query.

Trace Primer

Distributed tracing platforms like OpenZipkin record trace data. Trace data is composed of a parent:child tree structure called a Directed Acyclic Graph (DAG for short). A root node represents the trace, or overall journey, and each span represents an individual hop along the service route. To illustrate, I have included an ASCII diagram from the OpenZipkin GitHub.

```
   Client Tracer                                                  Server Tracer
┌───────────────────────┐                                       ┌───────────────────────┐
│                       │                                       │                       │
│   TraceContext        │           Http Request Headers        │   TraceContext        │
│ ┌───────────────────┐ │          ┌───────────────────┐        │ ┌───────────────────┐ │
│ │ TraceId           │ │          │ X─B3─TraceId      │        │ │ TraceId           │ │
│ │                   │ │          │                   │        │ │                   │ │
│ │ ParentSpanId      │ │ Inject   │ X─B3─ParentSpanId │Extract │ │ ParentSpanId      │ │
│ │                   ├─┼─────────>│                   ├────────┼>│                   │ │
│ │ SpanId            │ │          │ X─B3─SpanId       │        │ │ SpanId            │ │
│ │                   │ │          │                   │        │ │                   │ │
│ │ Sampling decision │ │          │ X─B3─Sampled      │        │ │ Sampling decision │ │
│ └───────────────────┘ │          └───────────────────┘        │ └───────────────────┘ │
│                       │                                       │                       │
└───────────────────────┘                                       └───────────────────────┘
```

An upstream HTTP call with B3 propagation. At the time of writing, B3 propagation is supported for Finagle, HTTP, and gRPC. We will utilize HTTP and gRPC in this example.
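The inject/extract flow in the diagram can be sketched in plain Java against an ordinary header map. The class and helper names below are this example's own, not Brave's API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Minimal sketch of B3 inject/extract over a plain header map.
public class B3Sketch {

    record TraceContext(String traceId, String parentSpanId, String spanId, boolean sampled) {}

    static String newId() {
        // B3 ids are lower-hex; a 64-bit id is 16 characters wide
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // Client side: inject the context into the outgoing request headers
    static void inject(TraceContext ctx, Map<String, String> headers) {
        headers.put("X-B3-TraceId", ctx.traceId());
        headers.put("X-B3-ParentSpanId", ctx.parentSpanId());
        headers.put("X-B3-SpanId", ctx.spanId());
        headers.put("X-B3-Sampled", ctx.sampled() ? "1" : "0");
    }

    // Server side: extract the context from the incoming request headers
    static TraceContext extract(Map<String, String> headers) {
        return new TraceContext(
                headers.get("X-B3-TraceId"),
                headers.get("X-B3-ParentSpanId"),
                headers.get("X-B3-SpanId"),
                "1".equals(headers.get("X-B3-Sampled")));
    }

    public static void main(String[] args) {
        TraceContext client = new TraceContext(newId(), newId(), newId(), true);
        Map<String, String> headers = new HashMap<>();
        inject(client, headers);

        TraceContext server = extract(headers);
        // The trace id survives the hop unchanged, joining both spans into one trace
        System.out.println(client.traceId().equals(server.traceId()));
    }
}
```

The point of the exercise: the trace id crosses the process boundary unchanged, which is what lets Zipkin stitch client and server spans into one trace.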

For further visualization, here is what a span and trace look like in a system wired with Zipkin:

zipkin-tracing

Zipkin itself has a design based on the Google Dapper paper (http://pub.google.com/papers/dapper.pdf). As the homepage says, a key goal was a non-invasive library. The minimal JRE requirement is JDK 6 for the core modules; additional requirements come from upstream choices (e.g. storage). Check out the Zipkin Git repository to gather additional requirements for your project.

Setup Zipkin Server

Configuring Zipkin is easy using environment variables. In this section we will take a look at getting the Zipkin server up and running with ingress from HTTP, Kafka and RabbitMQ.

In order to match the transport coverage given in later application sections, we will briefly explore setting up the Zipkin server in its most basic form for three different ingress configurations.

HTTP Collector

There are a few ways to start Zipkin. To start a basic configuration of Zipkin with HTTP ingress, simply execute the following in a console.

startzipkin.sh.

$ java -jar /path/zipkin-server.jar

Enable Kafka Collector

To enable the Kafka Collector, you will need to have a Kafka service running. For information on standing up a Kafka server, see the Kafka quickstart. Then run the Zipkin server with the ZooKeeper address exported:

startZipkinKafka.sh.

$ export KAFKA_ZOOKEEPER=127.0.0.1:2181
$ java -jar /path/zipkin-server.jar

Enable RabbitMQ Collector

To start up Zipkin server with RabbitMQ Collectors active:

startZipkinRabbit.sh.

$ export RABBIT_URI=amqp://localhost:5672/
$ java -jar /path/zipkin-server.jar

Standing up the traced Application

In this example, we will declare profiles for the back-ends (kafka, rabbit, zipkin/http) used as the ingress for our trace reporting. We will then build an application which exposes multiple services over HTTP and gRPC.

Furthermore, we will organize the application into four modules (trace-configuration, http-server, grpc-server, grpc-client) in order to keep concerns separate.

Generate a new Spring application with the dependencies for Web and Lombok. Use this link to generate a new project from the Spring Initializr (http://start.spring.io). We'll stand up an HTTP/REST API and its endpoints. One endpoint will call the other in the order [frontend→backend→delay]. We'll use the Spring Framework RestTemplate to make HTTP calls.

RestController.java.

@Profile("http-web")
@RestController
public class TracingRestController {
    private final RestTemplate restTemplate;
    private final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(TracingRestController.class);

    public TracingRestController(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @GetMapping("/backend")
    public String backend(@RequestHeader("client-id") String clientId) {
        restTemplate.getForObject("http://localhost:8080/delay", String.class);
        return "Greetings " + clientId;
    }

    @GetMapping("/delay")
    public void addDelay() throws InterruptedException {
        long delay = ThreadLocalRandom.current().nextInt(1, 5);
        TimeUnit.SECONDS.sleep(delay);
    }

    @GetMapping("/frontend")
    public String frontend() {
        return restTemplate
                .getForObject("http://localhost:8080/backend", String.class);
    }
}

The application bootstrap with the @SpringBootApplication annotation makes launching the server super simple.

TracingApplication.java.

@SpringBootApplication(scanBasePackages = {"mcp"})
public class TracingApplication {
    public static void main(String[] args) {
        SpringApplication.run(TracingApplication.class, args);
    }
}

Configure the logger and give this node a name.

application.properties.

logging.pattern.level=%d{ABSOLUTE} [%X{traceId}/%X{spanId}] %-5p [%t] %C{2} - %m%n
logging.level.root=info
logging.level.mcp.cloudtrace=info

spring.zipkin.service.name=http-service
spring.application.name=spring-tracing-http
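Once tracing is wired up in the following sections, a log line produced under this pattern might render roughly like the line below; the trace and span IDs are hypothetical:

```
15:04:31,123 [5e9d7a9ec96b4a0e/ad45cf8e34c2170b] INFO  [http-nio-8080-exec-1] mcp.TracingRestController - handling request
```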

Configure the Tracing Bean

To start tracing, we need to configure a brave.Tracing bean in our application context. It ties this application into the Zipkin trace instrumentation API and serves as the tracing configuration for our running node.

TracingConfiguration.java.

@Configuration
public class TracingConfiguration {
    @Bean
    Tracing tracing(@Value("${spring.application.name:spring-tracing}") String serviceName,
                    Reporter<Span> spanReporter) {
        return Tracing
                .newBuilder()
                .sampler(Sampler.ALWAYS_SAMPLE)
                .localServiceName(serviceName)
                .propagationFactory(ExtraFieldPropagation
                        .newFactory(B3Propagation.FACTORY, "client-id"))
                .currentTraceContext(MDCCurrentTraceContext.create())
                .spanReporter(spanReporter)
                .build();
    }

    @Bean
    HttpTracing httpTracing(Tracing tracing) {
        return HttpTracing.create(tracing);
    }
}

We are using SLF4J, which implements its own Mapped Diagnostic Context (MDC). brave.context.slf4j.MDCCurrentTraceContext is a ready-made trace context that exposes the current trace and span IDs to SLF4J as logging properties with the names traceId, spanId, and parentId. If you are using log4j2 instead, the provided class brave.context.log4j2.ThreadContextCurrentTraceContext does the same for log4j2's ThreadContext.
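Conceptually, an MDC is just a thread-local map of logging properties that the logging layer reads when it formats a line. A minimal stdlib sketch of the idea (illustrative names, not Brave's or SLF4J's implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an MDC-backed trace context: identifiers live in a thread-local
// map for the duration of a unit of work, then get cleaned up.
public class MdcSketch {

    private static final ThreadLocal<Map<String, String>> MDC =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { MDC.get().put(key, value); }
    static String get(String key) { return MDC.get().get(key); }

    // Roughly what a scoped trace context does around each unit of work
    static void withSpan(String traceId, String spanId, Runnable work) {
        put("traceId", traceId);
        put("spanId", spanId);
        try {
            work.run();
        } finally {
            MDC.get().remove("traceId");
            MDC.get().remove("spanId");
        }
    }

    public static void main(String[] args) {
        withSpan("abc123", "def456", () ->
                // A pattern like [%X{traceId}/%X{spanId}] reads these values
                System.out.println("[" + get("traceId") + "/" + get("spanId") + "] handling request"));
    }
}
```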

Configure Trace Reporting (sending)

Spans are created by instrumentation, transported out-of-band, and eventually persisted. Zipkin uses reporters (zipkin2.reporter.Reporter) to send spans (or encoded spans) recorded by instrumentation out of process. There are a couple of default reporters that do not send anything but can help with testing: Reporter.NOOP and Reporter.CONSOLE.

Via HTTP

ReportToZipkinConfiguration.java.

@Profile("report-to-zipkin")
@Configuration
class TracingReportToZipkinConfiguration {

    @Bean
    Sender sender(@Value("${mcp.zipkin.url}") String zipkinSenderUrl) {
        return OkHttpSender.create(zipkinSenderUrl);
    }

    @Bean
    Reporter<Span> spanReporter(Sender sender) {
        return AsyncReporter.create(sender);
    }
}
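The mcp.zipkin.url placeholder is this example's own property name. Pointed at a local Zipkin server, it would be set in application.properties to the standard v2 span endpoint:

```
mcp.zipkin.url=http://localhost:9411/api/v2/spans
```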

In this case, we have configured a thread-safe AsyncReporter that protects us from latency or exceptions when reporting spans out of process. To abstract transport specifics, the zipkin2.reporter.Sender component encodes and transmits spans out of process, here using HTTP.
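The async-reporting idea can be sketched in a few lines: report() only enqueues, so the traced request thread never blocks on the transport, and a separate flush step drains the queue to the sender. Interface and class names below are illustrative, not zipkin2's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of a buffering span reporter decoupled from its sender.
public class AsyncReporterSketch {

    interface Sender { void send(List<String> encodedSpans); }

    static class BufferingReporter {
        private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
        private final Sender sender;

        BufferingReporter(Sender sender) { this.sender = sender; }

        // Called on the request thread: cheap and non-blocking
        void report(String span) { pending.offer(span); }

        // Called out-of-band (the real thing uses a background flush thread)
        void flush() {
            List<String> batch = new ArrayList<>();
            pending.drainTo(batch);
            if (!batch.isEmpty()) sender.send(batch);
        }
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        BufferingReporter reporter = new BufferingReporter(delivered::addAll);

        reporter.report("span-frontend");
        reporter.report("span-backend");
        System.out.println(delivered.size()); // nothing sent yet

        reporter.flush();
        System.out.println(delivered.size()); // batch delivered
    }
}
```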

Indirect reporting is possible using Kafka and RabbitMQ, among other modules. The next two sections take a closer look at setting up our application to report via Kafka topics and via RabbitMQ queues.

Via Kafka Sender

Support for Kafka topics comes from the zipkin2.reporter.kafka11.KafkaSender. Create a new configuration class and guard it with the report-to-kafka profile.

KafkaReportingConfiguration.java.

@Profile("report-to-kafka")
@Configuration
public class TracingReportToKafkaConfiguration {

    @Bean
    Sender sender(@Value("${mcp.kafka.url}") String kafkaUrl) throws IOException {
        return KafkaSender.create(kafkaUrl);
    }

    @Bean
    Reporter<Span> spanReporter(Sender sender) {

        return AsyncReporter.create(sender);
    }

}

Via RabbitMQ Sender

Another common sender is zipkin2.reporter.amqp.RabbitMQSender, which ships JSON-encoded spans to a queue.

Setting up the RabbitMQSender requires a host URL and the name of the queue the Zipkin server is expected to consume.

RabbitMQReportingConfiguration.java.

@Profile("report-to-rabbit")
@Configuration
public class TracingReportToRabbitConfiguration {
    @Bean
    Sender sender(@Value("${mcp.rabbit.url}") String rabbitmqHostUrl,
                  @Value("${mcp.rabbit.queue}") String zipkinQueue) throws IOException {
        return RabbitMQSender.newBuilder()
                .queue(zipkinQueue)
                .addresses(rabbitmqHostUrl).build();
    }

    @Bean
    Reporter<Span> spanReporter(Sender sender) {
        return AsyncReporter.create(sender);
    }
}

Instrumenting the Web stack

This section discusses specifically how to enable tracing on your WebMVC and RestTemplate components.

WebMVC

To instrument Spring MVC endpoints, we will configure an instance of the brave.spring.webmvc.TracingHandlerInterceptor class. To register the interceptor, we need an org.springframework.web.servlet.config.annotation.WebMvcConfigurerAdapter that gives us hooks into Spring MVC's InterceptorRegistry (or, alternatively, implement WebMvcConfigurer directly on Spring 5.0 or later).

WebMvcConfiguration.

@Configuration
public class WebMVCTracingConfiguration extends WebMvcConfigurerAdapter {
    private final HttpTracing httpTracing;

    public WebMVCTracingConfiguration(HttpTracing httpTracing) {
        this.httpTracing = httpTracing;
    }

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(TracingHandlerInterceptor.create(httpTracing));
    }
}

This interceptor receives the HttpTracing bean, which extends our Tracing bean with HTTP-specific instrumentation for clients and servers.

RestTemplate

In order to apply trace context propagation to our RestTemplate, we must provide (as with the server setup) an org.springframework.http.client.ClientHttpRequestInterceptor to do the client-side tracing work. We'll use the RestTemplateBuilder component to construct a Zipkin-instrumented RestTemplate bean.

TraceClientConfiguration.

@Configuration
class WebClientTracingConfiguration {
    @Bean
    RestTemplate restTemplate(HttpTracing tracing) {
        return new RestTemplateBuilder()
                .additionalInterceptors(TracingClientHttpRequestInterceptor.create(tracing))
                .build();
    }
}

Observing (propagated) trace details

Let's observe some tracing activity. With the current setup, we should get a fairly modest span graph (just the 3 hops) upon accessing the endpoint.

Start the server with http-web and report-to-zipkin profiles active so that we can access and then visit the service span graph in the Zipkin console.

start_the_server.

$ mvn spring-boot:run -Dspring.profiles.active=http-web,report-to-zipkin

Now, when we call our endpoint, we should see a traceId, spanId, and our client-id propagated across the entire request chain.

GET frontend.

$ curl -H "client-id: tracing" http://localhost:8080/frontend
Greetings tracing%

Let's take a look at the span graph. You'll need to access the Zipkin web UI at http://localhost:9411/zipkin/:

http-frontend-span

This is the single request we made to /frontend; it calls /backend, which in turn calls /delay in the same server. However simple this is, the concept stands: instrumentation across all components in a trace gives us a variety of details (e.g. timing) about the service call's trajectory back to the origin.

Instrumenting the gRPC stack

Two new modules get created, grpc-client and grpc-server, which have the standard spring-boot and zipkin dependencies, plus several dependencies related to the gRPC project.

The details of dependency management are outside the scope of this article. For the actual dependencies, check out [trace-grpc-server/pom.xml] and copy all the lognet and grpc dependencies.

Lognet’s GRPC-starter

To expose gRPC effortlessly, use LogNet's grpc-spring-boot-starter. This module generates gRPC service stubs during the build, in the generate-sources phase. It also has the spring-boot hooks to make configuring a gRPC service seamless.

To begin, we’ll configure a protobuf .proto service definition so that we can code the server.

greeting.proto.

syntax = "proto3";

option java_multiple_files = true;
package mcp;

message Greeting {
    string hello = 1;
}

message Greet {
    string name = 1;
}

message Empty {

}

service GreetingService {
    rpc greeting(Greet) returns (Greeting);
}

You can generate stubs by simply invoking

stub_maker.sh.

$ mvn generate-sources

GrpcService.java.

@GRpcService
public class GrpcService extends GreetingServiceGrpc.GreetingServiceImplBase {
    private final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger(GrpcService.class);
    private final RestTemplate restTemplate;

    public GrpcService(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }
    @Override
    public void greeting(Greet request, StreamObserver<Greeting> responseObserver) {
        log.info("Greetings, " + request.getName());
        responseObserver.onNext(
                Greeting
                        .newBuilder()
                        .setHello("hello " + request.getName())
                        .build());
        restTemplate.getForObject("http://localhost:8080/delay", String.class);
        responseObserver.onCompleted();
    }
}

Apply the org.lognet.springboot.grpc.GRpcService annotation to mark this bean for service registration at startup.

Instrumenting gRPC Server

To intercept service calls and instrument tracing, wire in a brave.grpc.GrpcTracing bean to obtain an instance of io.grpc.ServerInterceptor. Use the org.lognet.springboot.grpc.GRpcGlobalInterceptor annotation to mark the interceptor bean as global. This exposes tracing to all gRPC endpoints (visible on the ApplicationContext) in this service.

GrpcServerConfiguration.java.

@Configuration
public class TracingGrpcServerConfiguration {
    @Bean
    public GrpcTracing grpcTracing(Tracing tracing) {
        return GrpcTracing.create(tracing);
    }

    @Bean
    @GRpcGlobalInterceptor
    public ServerInterceptor grpcServerInterceptor(GrpcTracing grpcTracing) {
        return grpcTracing.newServerInterceptor();
    }
}

Next, run the new service. We will run this application standalone since it shouldn't live in the same JVM as the HTTP server. In a new terminal:

grpcstart.sh.

~/projects/spring-tracing/trace-grpc-service $ mvn spring-boot:run -Dspring.profiles.active=grpc-web,report-to-zipkin
... logging ...
...

Should this succeed, there will be a gRPC server listening on the default port 6565.

Instrumenting gRPC Client

There is already a pretty succinct document for tracing gRPC services on the openzipkin grpc sender.

Our gRPC client will be used by another project (the web server) to access the gRPC greeting service. The client of course gets its own module to keep concerns separate.

GrpcClient.java.

@Component
public class GreetingClient {
    private final ManagedChannel managedChannel;

    public GreetingClient(ManagedChannel managedChannel) {
        this.managedChannel = managedChannel;
    }

    private GreetingServiceGrpc.GreetingServiceBlockingStub greetingServiceBlockingStub;

    @PostConstruct
    private void initializeClient() {
        greetingServiceBlockingStub = GreetingServiceGrpc.newBlockingStub(managedChannel);
    }

    public Greeting greeting(String name) {

        Greet greeting = Greet
                .newBuilder()
                .setName(name)
                .build();

        return greetingServiceBlockingStub.greeting(greeting);
    }
}

We need to change an endpoint in order to make use of the gRPC client. Rather than create a new controller, we will modify the backend endpoint in our HTTP/REST controller:

TracingGrpcRestController.java.

    @GetMapping("/backend")
    public String backend(HttpServletRequest req) {
        String clientId = Optional
                .ofNullable(req.getHeader("client-id"))
                .orElse("none");

        return greetingClient.greeting(clientId).getHello();
    }

On the client side, we must wire an io.grpc.ManagedChannel with an interceptor from our GrpcTracing bean (as with the server).

GrpcClientTraceConfiguration.java.

    @Bean
    public ManagedChannel managedChannel(ManagedChannelBuilder channelBuilder) {
        return channelBuilder
                .build();
    }
    @Bean
    public ManagedChannelBuilder managedChannelBuilder(GrpcTracing grpcTracing) {
        return ManagedChannelBuilder.forAddress("localhost", 6565)
                .intercept(grpcTracing.newClientInterceptor())
                .usePlaintext(true);
    }

This completes the configuration for our gRPC service and client.

Note: Don't forget to restart the web service so that the new gRPC-backed backend is exposed (port 8080).

Now, when we test /frontend, as a result of instrumenting all the endpoints, we can observe the full span graph in the Zipkin UI:

grpc-spans

The final span graph shows us the route detail to our updated HTTP Rest service.

Afterword

We can apply this capability to any of our (Java) microservices, but fear not! Zipkin is polyglot by nature and works with many other platforms. If you have polyglot concerns in your service paths, check out any of the links in the next section. Should we require traceability across services that are not bound to the JVM, it's good to know that a multi-platform solution exists.

Reading List