OpenTelemetry (OTel) is emerging as the industry standard for system observability and distributed tracing across cloud-native and distributed architectures. But where do developers fit in? With OTel’s main use case focusing on production monitoring and observability, I find that many developers are still not fully familiar with OTel. Others believe it is more of a tool for DevOps/SRE.
That’s too bad – distributed tracing should be used by developers daily, for a wide variety of use cases including local development, troubleshooting, testing, debugging, and documentation.
OpenTelemetry (OTel): A Brief Reminder
OpenTelemetry (OTel) is an open-source solution that provides a collection of SDKs, APIs, and tools for collecting and correlating telemetry data (i.e., logs, traces, and metrics) from different interactions (API calls, messaging frameworks, DB queries, and more) between components in cloud-native, distributed systems. After exporting the data to different backends like Jaeger or Zipkin, R&D organizations gain observability into their systems.
OTel addresses a widespread pain point among R&D organizations – it gives them tools to identify errors, issues, and bottlenecks in microservices and distributed architectures.
OTel for Developers?
OTel is slowly but surely becoming the industry standard for collecting telemetry data. Leading technology companies like Google, Microsoft, Amazon, Splunk, and Datadog are investing heavily in OpenTelemetry.
Datadog, for example, donated OTel’s Java SDK to the community, and Google is including OTel as a built-in configuration in many of its GCP SDKs. According to Gartner, by 2025, 70% of cloud application monitoring will be based on OpenTelemetry instrumentation.
However, many developers may still regard OTel as a technology that’s mostly relevant for DevOps and SRE. This comes as no surprise, as even leading companies like Splunk describe it as “critical for helping DevOps and IT groups”, without mentioning its relevance to developers.
Additionally, many developers are still unfamiliar with the concept of distributed tracing. Even though many of them create “request ids” for transactions across services and monitor results in systems like Datadog and New Relic, they are not aware of the vast potential this technology has for gaining data-driven insights and reducing their development and troubleshooting overhead.
It’s no wonder many developers think it’s DevOps’s turf.
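To make the link between hand-rolled “request ids” and distributed tracing concrete, here is a minimal sketch of the W3C Trace Context `traceparent` header, the format OTel SDKs propagate between services by default. The function names are hypothetical and exist only for illustration; in a real service the OTel SDK generates and forwards this header for you.

```python
import re
import secrets

# W3C Trace Context "traceparent" header layout: version-traceid-parentid-flags
# e.g. 00-<32 hex chars>-<16 hex chars>-<2 hex chars>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace: random trace id, random span id, 'sampled' flag set."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(incoming: str) -> str:
    """Build the header for an outgoing call: same trace id, new span id."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        # Missing or malformed context: start a fresh trace rather than fail.
        return new_traceparent()
    trace_id, _caller_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(header: str) -> str:
    """Extract the trace id, which plays the role of the shared 'request id'."""
    match = TRACEPARENT_RE.match(header)
    if match is None:
        raise ValueError(f"not a valid traceparent header: {header!r}")
    return match.group(1)


# Service A starts a trace; service B forwards the context to service C.
header_a = new_traceparent()
header_b = child_traceparent(header_a)
print(trace_id_of(header_a) == trace_id_of(header_b))
```

The trace id stays constant across every hop, exactly like a request id, while each hop gets its own span id. That second id is what lets a tracing backend rebuild the full call tree instead of a flat list of log lines.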
But actually, OTel has the potential to help developers directly. We’ve all experienced the pain of working in a distributed, cloud-native environment. I remember one of my first experiences: half a decade ago, I was working on a mobile security product that protected iOS and Android devices from several threat vectors, including malware. One of our most complex flows – creating a static analysis report on an Android APK or iOS IPA – was constantly breaking, and we never seemed to be able to stabilize it. The flow involved several services, communicating synchronously (HTTP) and asynchronously (SQS, Celery).
On each on-call shift I had, something else caused a failure – once it was an unexpected format in the Celery job payload; another time, a null pointer error; occasionally it was a DevOps issue, like an SQS misconfiguration or low container resources. Each and every time, we had to check all the potential failure points, searching through logs and SSHing into machines, looking for errors and warnings, trying to correlate everything. I would succeed, eventually, but it took me and my teammates a lot of time and effort.
Developers today don’t have to go through such a tiring process anymore. OTel is a solution that makes distributed tracing data accessible, so they can gain visibility into the entire flow, end-to-end – and quickly!
When Should Developers Use OTel?
I’m a true believer in OpenTelemetry – we’re only scratching the surface of its potential. Making its data accessible to developers will reveal capabilities way beyond the classic observability and monitoring use cases. I have come across the following use cases, whether first-hand or through the experience of others:
- Production readiness – the way to production starts in every developer’s own environment and continues through the integration and staging environments. Leveraging OTel capabilities should start there, long before production.
- Testing – distributed tracing data can be used to validate behaviors deep within the system. Unlike traditional UI or API testing, which is essentially “black box”, data from OTel can be used to make complex assertions that would otherwise be very difficult to implement.
- Collaboration – the complexity of building and maintaining distributed applications often manifests in the need to pinpoint and reuse specific requests, queries, and payloads and share them between developers. OTel makes this possible, and easy.
- Documentation – deducing application APIs and expected behavior from traffic, and converting them to standards like Swagger/OpenAPI, is something that can fairly easily be done by inspecting OTel data.
- Security – back in 2012 I was working on a web application security product that used instrumentation capabilities to identify unsanitized payloads that propagated from browser requests all the way to the DB. We had to build everything from scratch – and now OTel makes it much easier.
- Onboarding – combining collaboration and documentation, OTel data can be used to create dynamic onboarding experiences, significantly reducing the time-to-value for new developers. Seeing the system visually and examining specific instances of its important flows makes much more sense to me than going over stale architecture slides and theoretical explanations.
- Troubleshooting – using existing traffic (HTTP requests / DB queries / messaging payloads) to reproduce application states with ease, and speed up debugging and development in general.
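As an illustration of the testing use case above, here is a minimal sketch of a trace-based assertion. It assumes the spans from a test run have already been collected into plain dictionaries. The field names, span names, and services below are hypothetical and much simpler than real OTel span data, which in practice would come from an exporter or a tracing backend.

```python
# Hypothetical, simplified span records collected from one test run.
SPANS = [
    {"span_id": "a1", "parent_id": None, "name": "POST /reports", "service": "api", "status": "OK"},
    {"span_id": "b2", "parent_id": "a1", "name": "enqueue analysis job", "service": "api", "status": "OK"},
    {"span_id": "c3", "parent_id": "b2", "name": "analyze_apk", "service": "worker", "status": "OK"},
    {"span_id": "d4", "parent_id": "c3", "name": "INSERT reports", "service": "worker", "status": "OK"},
]


def find_span(spans, name):
    """Return the first span with the given name, or None."""
    return next((s for s in spans if s["name"] == name), None)


def is_descendant(spans, span, ancestor):
    """Walk parent links upward from `span`, looking for `ancestor`."""
    by_id = {s["span_id"]: s for s in spans}
    current = span
    while current is not None and current["parent_id"] is not None:
        current = by_id.get(current["parent_id"])
        if current is not None and current["span_id"] == ancestor["span_id"]:
            return True
    return False


def check_report_flow(spans):
    """Return a list of failure messages; an empty list means the flow passed."""
    root = find_span(spans, "POST /reports")
    db_write = find_span(spans, "INSERT reports")
    if root is None or db_write is None:
        return ["expected spans are missing from the trace"]
    failures = []
    # Something a black-box API test cannot see: did the HTTP request
    # actually lead to the DB write in the worker, several hops away?
    if not is_descendant(spans, db_write, root):
        failures.append("DB write was not triggered by the HTTP request")
    for s in spans:
        if s["status"] != "OK":
            failures.append(f"span {s['name']} ended with status {s['status']}")
    return failures


print(check_report_flow(SPANS))
```

A check like this would have covered the flow from my on-call story: one assertion spans the HTTP call, the queue hop, and the worker, and a failure message points at the hop that broke.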
I’m sure OTel will provide lots of new possibilities for developers that I haven’t listed above. Once developers can see their systems like never before and gain access to data that was previously hidden, the opportunities are endless.
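To give one of the use cases above some shape, here is a rough sketch of the documentation idea: folding observed HTTP server spans into an OpenAPI-style skeleton. It assumes each span is a plain dictionary whose attributes use the `http.method`, `http.route`, and `http.status_code` keys. Those names follow older OTel semantic conventions and may differ in your SDK version, and both the function name and the sample data are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical sample of HTTP server spans, reduced to their attributes.
OBSERVED = [
    {"attributes": {"http.method": "POST", "http.route": "/reports", "http.status_code": 202}},
    {"attributes": {"http.method": "GET", "http.route": "/reports/{report_id}", "http.status_code": 200}},
    {"attributes": {"http.method": "GET", "http.route": "/reports/{report_id}", "http.status_code": 404}},
    {"attributes": {"db.system": "postgresql"}},  # not an HTTP span, ignored
]


def spans_to_openapi(spans, title="Observed API"):
    """Group spans by route and method, recording every status code seen."""
    paths = defaultdict(dict)
    for span in spans:
        attrs = span.get("attributes", {})
        route = attrs.get("http.route")
        method = attrs.get("http.method")
        if not route or not method:
            continue  # skip spans that do not describe an HTTP endpoint
        operation = paths[route].setdefault(method.lower(), {"responses": {}})
        status = attrs.get("http.status_code")
        if status is not None:
            operation["responses"][str(status)] = {"description": "Observed in traces"}
    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": "0.0.0"},
        "paths": dict(paths),
    }


print(json.dumps(spans_to_openapi(OBSERVED), indent=2))
```

The result is only a starting point: it lists the endpoints and status codes that actually occurred in traffic, and request and response schemas would still need to be filled in from payload data or by hand.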
The Commoditization of OTel
Just as Kubernetes has made deployment of containerized applications easy, I believe OTel will become a commodity, making system observability easy. Soon, OTel will be widely used across organizations. Visibility into system architecture will become the norm. Gathering the data and acting on the insights it provides will become easier and more common.
We see this as a great opportunity for R&D organizations. As OTel becomes more widely adopted, distributed tracing data will soon become a gold mine of opportunities for the use cases described above, and many more we have yet to imagine and discover. Don’t wait.
If you haven’t checked out OTel yet, I highly recommend you do. And if you need help implementing it or extracting the data to gain advanced insights, reach out.