Distributed tracing in microservices
Distributed tracing is a method for tracking all the operations within a distributed system that have been triggered by a specific request. These include which components were touched, how the data flowed between the components, the dependencies that exist, and any changes that occurred to the systems and services. The information provided by distributed tracing enables end-to-end visibility into the microservices architecture and insights for troubleshooting errors.
How does tracing work?
The core concept that enables distributed tracing is context propagation.
A Context is an object that contains the information for the sending and receiving service to correlate one span with another and associate it with the trace overall.
Propagation is the mechanism that moves Context between services and processes. By doing so, it assembles a Distributed Trace.
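The following is a minimal sketch of context propagation over HTTP headers, assuming the W3C Trace Context `traceparent` header layout (version, trace ID, span ID, flags). The `SpanContext` class and the `inject`, `extract` and `child_of` helpers are hypothetical names chosen for illustration; they are not the API of any particular tracing library.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# traceparent layout: version "00", 32-hex trace id, 16-hex span id, 2-hex flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # shared by every span in the trace
    span_id: str   # identifies the current span
    sampled: bool = True


def new_root_context() -> SpanContext:
    """Start a new trace with random identifiers."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8))


def inject(ctx: SpanContext, headers: dict) -> None:
    """Sending side: write the context into the outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: dict) -> Optional[SpanContext]:
    """Receiving side: rebuild the caller's context, or None if absent or invalid."""
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 1))


def child_of(parent: SpanContext) -> SpanContext:
    """Create the context for a new span that belongs to the same trace."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


# Service A sends a request; service B continues the same trace.
outgoing_headers = {}
caller_ctx = new_root_context()
inject(caller_ctx, outgoing_headers)

received_ctx = extract(outgoing_headers)
callee_ctx = child_of(received_ctx)
```

The trace ID stays constant across both services, while each span gets its own span ID. That shared trace ID is what lets a backend correlate the two spans later.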
Distributed tracing follows a request across services, monitoring its flow through all the devices, databases, serverless functions and third-party APIs it touches. Each tracked operation is called a ‘span’. Tracing backends aggregate these spans and build a directed graph from them, which is called a trace. Some developer tools, like Jaeger and Helios, visualize the trace and show developers how the data flows through the app, including complex sync and async flows (HTTP requests, gRPC calls, serverless invocations, messaging queues, event streams and more). Distributed tracing with OpenTelemetry is discussed below.
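To illustrate how spans are assembled into a trace, here is a small sketch that links spans through their parent IDs and renders the resulting tree. The `Span` fields, the service names and the helper functions are assumptions made for this example and do not reflect how any specific backend stores its data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    operation: str


def build_trace(spans):
    """Group spans by parent so the trace can be walked from the root."""
    children = defaultdict(list)
    root = None
    for span in spans:
        if span.parent_id is None:
            root = span
        else:
            children[span.parent_id].append(span)
    return root, children


def render(root, children, depth=0):
    """Return an indented text view of the trace, one line per span."""
    lines = ["  " * depth + f"{root.service}: {root.operation}"]
    for child in children[root.span_id]:
        lines.extend(render(child, children, depth + 1))
    return lines


# Spans may arrive in any order from different services.
spans = [
    Span("c", "b", "orders-db", "INSERT order"),
    Span("a", None, "api-gateway", "POST /checkout"),
    Span("d", "a", "email-queue", "publish receipt"),
    Span("b", "a", "order-service", "create_order"),
]
root, children = build_trace(spans)
trace_view = render(root, children)
print("\n".join(trace_view))
```

Because each span only records its own parent, the services never need to know the shape of the whole request. The graph emerges when the spans are collected in one place.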
Distributed tracing vs. logging
Distributed tracing and logging are two different approaches that can be used by developers to understand the behavior of a system.
Distributed tracing provides context and information about how a request is processed across multiple microservices, giving a complete picture of the request flow. Visualizing this information can help pinpoint where a problem occurred in the system. These insights can then be used for troubleshooting and debugging and for improving development velocity.
Logging, on the other hand, is the process of recording system events, typically for debugging and auditing purposes. Logs provide a historical record of what has happened in the system, but do not give a complete picture of a request flow.
Logs and traces complement each other. Logs can help identify that an issue occurred, while traces give more context about where and why it occurred.
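One common way to make logs and traces work together is to stamp each log record with the ID of the active trace, so a log line can be looked up in the tracing tool. The sketch below uses only Python's standard `logging` and `contextvars` modules. The `trace_id_var` variable is a hypothetical stand-in for whatever a tracing library exposes as its current context, and the trace ID value is made up.

```python
import contextvars
import io
import logging

# Hypothetical holder for the active trace ID; "-" means no trace is active.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Attach the active trace ID to every record passing through the logger."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True  # never drop the record, only annotate it


stream = io.StringIO()  # captured in memory so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
)

logger = logging.getLogger("checkout")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)
logger.addFilter(TraceIdFilter())

# While a request is being handled, its trace ID is set in the context.
trace_id_var.set("4bf92f3577b34da6a3ce929d0e0e4736")
logger.error("payment declined")
log_output = stream.getvalue()
```

With the ID in the log line, the log tells you that the payment failed, and the trace with the same ID shows which services the request passed through before it failed.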
Distributed tracing: languages and components
Tracing with Node.js
Node.js allows function implementations to be replaced at runtime (i.e., monkey-patching), which makes distributed tracing rather simple to implement. To use tracing with a Node.js application, developers can use open-source solutions like OpenTracing or OpenTelemetry Node.js, or adopt a tool that builds on these open-source projects and adds visualization and other advanced capabilities, such as Helios.
Tracing with Golang
Go's strong typing and compilation to machine code make it difficult to add instrumentation code dynamically. In addition, making runtime changes to compiled machine code is risky and may be considered a security problem. Helios took the legwork out of OpenTelemetry instrumentation in Go by taking a new approach that is both easy to implement and non-intrusive.
Tracing with Grafana
Grafana has developed Grafana Tempo, an open source distributed tracing backend. Integrated with Grafana and Prometheus, Grafana Tempo can be used for ingesting traces. With Helios, traces can be displayed on Grafana dashboards.
Tracing with Kafka
Kafka is an open source event streaming platform that captures real-time data. It is a popular tool, but it impedes developer observability because it decouples producers from consumers and processes messages asynchronously. As a result, there are no direct transactions to trace and no explicit dependencies, which makes tracing for Kafka all the more important.
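A usual way to trace across a message broker is to carry the trace context in message headers, so the consumer can link its span to the producer's span. The sketch below shows the idea with an in-memory `queue.Queue` standing in for a topic. It does not use a real Kafka client, and the `produce` and `consume` functions and the message layout are hypothetical.

```python
import queue
import secrets

# In-memory stand-in for a topic; a real Kafka client would be used instead.
topic = queue.Queue()


def produce(value, trace_id, span_id):
    """Attach the producer's trace context to the message as a header."""
    headers = [("traceparent", f"00-{trace_id}-{span_id}-01".encode("ascii"))]
    topic.put({"value": value, "headers": headers})


def consume():
    """Read one message and describe a consumer span linked to the producer span."""
    message = topic.get()
    headers = dict(message["headers"])
    _, trace_id, parent_span_id, _ = headers["traceparent"].decode("ascii").split("-")
    return {
        "trace_id": trace_id,         # same trace as the producer
        "parent_id": parent_span_id,  # the producer span becomes the parent
        "span_id": secrets.token_hex(8),
        "operation": f"process {message['value']}",
    }


producer_trace_id = secrets.token_hex(16)
producer_span_id = secrets.token_hex(8)
produce("order-created", producer_trace_id, producer_span_id)
consumer_span = consume()
```

Even though the producer and consumer never call each other directly, the header restores the dependency between them in the trace.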
Tracing with Java
Java does not allow function implementations to be replaced at runtime. However, it supports a mechanism called the Java agent, which enables dynamic bytecode modification and provides capabilities similar to those available in Node.js. The Java agent is a separate JAR that is passed to the JVM as an argument when the application JAR is launched, and it performs the instrumentation.
Tracing with Python
Python supports object-oriented and procedural programming and, as a dynamically typed language, does not require variable declarations. Like other instrumentation libraries, OpenTelemetry-based instrumentation for Python works by wrapping existing function implementations and extracting the necessary data.
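The wrapping approach can be sketched in a few lines of plain Python. This is an illustration of the general technique and not OpenTelemetry's actual instrumentation code. The `instrument` helper, the `recorded_spans` list and the `PaymentClient` class are all hypothetical.

```python
import functools
import time

recorded_spans = []  # hypothetical in-memory exporter


def instrument(target, func_name):
    """Replace `func_name` on `target` with a wrapper that records a span."""
    original = getattr(target, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return original(*args, **kwargs)
        except Exception as exc:
            error = repr(exc)
            raise  # the caller still sees the original exception
        finally:
            recorded_spans.append({
                "name": func_name,
                "duration_s": time.perf_counter() - start,
                "error": error,
            })

    setattr(target, func_name, wrapper)


class PaymentClient:
    """Hypothetical library class standing in for an HTTP or database client."""

    def charge(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        return f"charged {amount}"


# Patch the class once at startup; application code stays unchanged.
instrument(PaymentClient, "charge")

result = PaymentClient().charge(25)
```

The application keeps calling `charge` as before, while each call now leaves behind a record with its name, duration and any error.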
Distributed tracing use cases
Visualized tracing data can help developers gain a broad understanding of their microservices architecture, as well as granular insights into specific requests and dependencies. This capability makes tracing a good solution for a number of developer use cases:
Troubleshooting and debugging
It is time-consuming and challenging for developers to identify issues, reproduce scenarios and fix bugs in microservices. This is mainly due to the lack of visibility into the architecture and missing information that is not available through logs, like HTTP request body, Kafka messages and Lambda events.
To help troubleshoot and debug issues, some distributed tracing systems gather payloads and error data for identifying bottlenecks, identifying broken flows, and reproducing them.
Developer-first observability
Lack of visibility into microservices means developers lack the confidence to make changes, since they don’t know what might break. This is due to how microservices are designed: they require properly configured APIs, taking into account response handling, error handling, requests, security, and a number of other factors. Otherwise, they won’t be able to communicate.
A tracing system provides visibility into the architecture and data insights across all environments – from local to testing to staging to production. By seeing data flows, payloads, dependencies, and errors, developers can then acquire the information they need to develop and deploy production-ready code, while also ensuring services interact with each other as part of a software development lifecycle.
Operations
Microservices give teams the flexibility to choose which technology stacks and frameworks to use. However, this poses operational challenges in terms of communication, monitoring, scalability, and consistency among services. With distributed tracing, requests, queries, and payloads can be shared and reused, which helps developers operate and collaborate more effectively.
Testing
Testing microservices is slow and often unreliable. The loosely coupled nature of microservices, together with the boundaries and connection points that create dependencies between them, makes testing very complex. In most cases, developers can only reliably test their own services, since broader tests require relying on potentially outdated testing or staging environments, or on mocking, which is also complex. Even when testing does take place, results can be flaky. This means developers cannot be confident that testing will ensure code quality and application functionality and performance.
Using trace-based testing, comprehensive tests can be automatically generated. Trace-based tests can even be generated directly from production, ensuring consistency and reliability.
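As a rough sketch of the idea behind trace-based testing, the example below derives expectations from a previously recorded trace and checks a new run against them. The span dictionaries, service names and helper functions are hypothetical, and real tools typically compare far more than the three fields used here.

```python
def expectations_from_trace(spans):
    """Derive expected (service, operation, status) triples from a recorded trace."""
    return [(s["service"], s["operation"], s["status"]) for s in spans]


def check_trace(spans, expected):
    """Return a list of mismatches; an empty list means the test passed."""
    actual = expectations_from_trace(spans)
    return [f"missing span: {step}" for step in expected if step not in actual]


# A trace recorded from a known-good request.
recorded = [
    {"service": "api-gateway", "operation": "POST /checkout", "status": "OK"},
    {"service": "inventory", "operation": "reserve", "status": "OK"},
    {"service": "payments", "operation": "charge", "status": "OK"},
]
expected = expectations_from_trace(recorded)

# A later run of the same flow in which the inventory call fails.
new_run = [
    {"service": "api-gateway", "operation": "POST /checkout", "status": "OK"},
    {"service": "inventory", "operation": "reserve", "status": "ERROR"},
    {"service": "payments", "operation": "charge", "status": "OK"},
]
failures = check_trace(new_run, expected)
```

Because the expectations come from observed behavior, the test covers the whole request path without the developer scripting each service interaction by hand.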
Who can use distributed tracing
Distributed tracing can be used by developers and DevOps to improve their understanding of their architectures and environments. It helps them improve their velocity and efficiency. Tracing holds especially high promise for developers, who gain visibility to track and monitor activity across the entire development process before production. As a result, they are able to develop, test and troubleshoot with confidence.
A tracing solution for developers should enable them to:
- Access data across all environments
- View data at any stage: immediately after an API call, during log browsing, when viewing an error report, etc.
- See all information in full context
- Filter data and search inside it
- Collaborate with their team
Distributed tracing with OpenTelemetry (OTel)
OpenTelemetry (OTel) is an open-source collection of tools, APIs, and SDKs for creating and gathering telemetry data, including traces from microservices. Then, through solutions like the open-source Jaeger and Zipkin, or Helios, developers can visualize the traces, see their microservices architecture and troubleshoot errors.
OpenTelemetry enables easy integration with existing tools, is vendor-agnostic and supports multiple technologies. The project also boasts a vibrant open source community that contributes to it constantly. To date, OpenTelemetry is the industry standard for collecting distributed tracing data.
Tracing visualization
Visualization leverages OpenTelemetry tracing to provide granular visibility with unique and immediate insights. Developers can see and understand how their services interact with each other and where any errors and performance issues actually occurred. Troubleshooting becomes effortless.
- Deep visualization: Understand your architecture and identify workflows and dependencies.
- Immediate insights: Get granular visibility into issues and errors.
- Simple troubleshooting: Use the provided information to quickly realize the problem and proceed to fixing it.
- Built-in collaboration: Easily share traces, tests and triggers with your team.
Deep visibility into your services
Distributed tracing aggregates the operations that occur in microservices based apps with a certain context. But without purpose-built visualization methods, views are cumbersome and lack data needed to understand and troubleshoot complex requests.
Helios provides deep visualization that enables seeing into complex sync and async workflows, understanding the dependencies between different components and detecting changes across versions. This visibility is key to troubleshooting your applications.
How does tracing with Helios work?
Distributed tracing with Jaeger
Jaeger is a popular open source distributed tracing solution that was developed and released by Uber. With Jaeger, developers can monitor and troubleshoot microservices. They can use it for distributed context propagation, transaction monitoring, root cause analysis, service dependency analysis and identifying bottlenecks for optimization. Jaeger's design was inspired by Dapper and OpenZipkin, and it was built with support for the OpenTracing API. It includes built-in Cassandra, Elasticsearch and in-memory storage backends, provides adaptive sampling, and more.
Augment Jaeger with Helios
Jaeger enables a basic level of visualization of distributed tracing. By enhancing Jaeger with Helios, you can get:
- Enhanced visibility - more context, depth and insights that pop out.
- An API catalog - including HTTP, gRPC, GraphQL, Kafka, RabbitMQ, and serverless APIs.
- Testing capabilities - Automatically create microservices tests instead of having to script them.
- E2E Flow Replaying - after fixing a problem in a feature, ensure the updated version will work, error-free.
FAQs
Visualized tracing data can help developers troubleshoot and debug their microservices, thanks to the granular visibility and insights that distributed tracing solutions provide. Testing is another common use case.
A tracing solution for developers should enable them to access and view data, get contextual information, filter data and collaborate with their team and with DevOps.
The most popular open source solutions that can leverage tracing are Jaeger and Zipkin. They rely on the telemetry data gathered by OpenTelemetry.
More recent solutions can provide additional capabilities, such as more tracing data from payloads and richer visualization. This enables troubleshooting before production.
Leveraging a range of instrumentation layers, distributed tracing can be adopted in a way that will help engineering teams boost productivity without compromising data privacy.