The age-old dilemma of privacy and security vs. productivity pops up for developers every time they consider introducing a new technology to their stack. The dilemma is often viewed as a trade-off: on one hand, privacy and security measures can slow down how quickly new features are rolled out; on the other hand, prioritizing productivity and business enablement over privacy and security can increase an organization’s risk of a breach. The bottom line is that privacy and security live on a spectrum, and every organization must decide for itself what its risk tolerance is.
Engineering decision-makers and CISOs have asked us about similar considerations when it comes to distributed tracing, especially for the purpose of microservices observability. When they consider introducing a new tool into their systems, they weigh how far they can trust that tool and who is accountable for the risks it can introduce. For example, installing an SDK in your application means introducing code that is not your own into your system, and that code could crash your app or contain vulnerabilities. This opens your application up to stability and security risks, not to mention possible risks associated with the exposure of sensitive customer data. Data collected for instrumentation may contain sensitive information such as user credentials, PII, or financial information. In light of these risks, engineers are asking questions about distributed tracing such as, “What is the risk that we’re introducing?” and “How much risk are we willing to tolerate?”
The answer doesn’t have to be all or nothing.
At Helios, we offer multiple implementation layers that enable organizations to drive dev velocity and engineering productivity, while complying with the organization’s risk profile. Helios is a dev-first observability platform that helps Dev and Ops teams shorten the time to find and fix issues in distributed applications. Built on OpenTelemetry, Helios provides traces and correlates them with logs and metrics, enabling end-to-end app visibility and faster troubleshooting.
In this post, we look specifically at the different layers Helios provides to help teams adopt distributed tracing and accelerate productivity while ensuring data privacy and compliance. The notion of layers is important because it gives each team the flexibility it needs to comply with its own regulations. We take a deep dive into how each layer can be used to support different organizational risk profiles and business policies.
What are the privacy challenges of distributed tracing?
Instrumenting running application data to provide distributed tracing-driven observability, as is the case with any telemetry or monitoring system, could introduce risk around sensitive information being inadvertently captured and transmitted. Instrumentation involves the collection and transmission of large amounts of data across different components of a system, including sensitive data such as user IDs and authentication tokens. This data should be protected to prevent unauthorized access or breaches. Additionally, distributed tracing data may be subject to compliance regulations, such as GDPR or HIPAA, which require strict data protection and privacy measures.
So why would you want to implement distributed tracing?
Distributed tracing offers many benefits for improving the developer experience, especially when it comes to microservices observability. It provides valuable insights into system performance and behavior, helping developers reduce mean time to resolution (MTTR) of issues in their distributed apps. Leveraging OpenTelemetry, tools like Helios provide granular visibility into traces, so you can understand the dependencies between different components in your application and how requests and data flow through your entire system. Building on a widely adopted, industry-backed open source project like OpenTelemetry reduces the risk of installing 3rd-party code, because the instrumentation is built and reviewed by a large community focused on making it robust, secure, and practical. With Helios, you get end-to-end visibility into your application across microservices, serverless functions, databases, and 3rd-party APIs, and you can see distributed tracing information in full context, including payloads, headers, and all span attributes. You can get insights as early as your local and integration environments, all the way to production.
Adapting distributed tracing in line with your risk profile
Helios’ mission is to help engineering teams adopt distributed tracing and OpenTelemetry so they benefit from enhanced developer productivity and best practices when it comes to microservices observability. Helios seeks to enable teams to achieve more with the resources they have, where they are in their growth journeys. To meet organizations where they are specifically in their risk profiles, Helios exposes various layers of control to set and select what data is instrumented in a system and how. The layers range from instrumenting all data, to collecting only metadata, and different options in between.
The Helios SDK offers both allowlist and blocklist capabilities to ensure that any sensitive data is handled in line with your company’s policies and needs. To comply with stricter security guidelines, the SDK also supports a mode in which only metadata is collected. Users can also choose to block specific paths from being collected altogether.
The table below outlines the layers Helios provides to engineering teams that wish to leverage distributed tracing while meeting privacy requirements:
| Level | What it does | Supported languages / SDK feature |
|---|---|---|
| 🟢 Flexible | Instrument all data for optimal E2E visibility and complete troubleshooting and debugging coverage | C++, Erlang, Go, Java, .NET, Web JS, Node.js, Python, Ruby |
| ⚪️ Diligent | Obfuscate specific data, based on what’s configured specifically in the blocklist | Data obfuscation blocklists |
| 🟡 Moderate | Obfuscate all data, except for what’s configured specifically in the allowlist | Data obfuscation allowlists |
| 🟠 Conservative | Metadata-only mode ensures headers and payloads are not collected | Metadata-only mode |
| 🔴 Stringent | Filter specific URLs and paths from being collected by the SDK altogether | Data filtering |
Customers with flexible privacy requirements can opt to have all their data instrumented for maximum E2E visibility and complete troubleshooting and debugging coverage. They can benefit from distributed tracing and observability over microservices in C++, Go, Java, .NET, Python, and other languages. They can drill down to services, view specific requests, and gain additional context with enriched OpenTelemetry instrumentation. Helios delivers all the context needed for troubleshooting issues, including all payloads (e.g. HTTP request/response bodies, message queue content, DB queries and results, etc.), headers, and all span attributes.
Customers who are asked to apply privacy practices more diligently can obfuscate specific data according to a configured blocklist. They set which data must be obfuscated by the Helios SDK, and only data that matches one of the configured JSONPath expressions is obfuscated before it is collected; everything else is collected as is. For example, a user can mark sensitive information like first name, last name, and credit card number to be obfuscated.
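To make the blocklist idea concrete, here is a minimal sketch in Python. It is not the Helios SDK or its configuration format: the function name `apply_blocklist`, the `MASK` value, and the simplified dotted paths (standing in for full JSONPath expressions) are all assumptions made for illustration.

```python
import copy

# Hypothetical placeholder written in place of sensitive values.
MASK = "****"

def apply_blocklist(payload, blocklist):
    """Return a copy of payload with every blocklisted dotted path masked.

    Paths such as "user.first_name" are a simplified stand-in for JSONPath.
    """
    result = copy.deepcopy(payload)  # never mutate the caller's data
    for path in blocklist:
        keys = path.split(".")
        node = result
        # Walk down to the parent of the final key, stopping if the path is absent.
        for key in keys[:-1]:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and keys[-1] in node:
            node[keys[-1]] = MASK
    return result

payload = {
    "user": {"first_name": "Ada", "last_name": "Lovelace", "plan": "pro"},
    "card": {"number": "4111111111111111"},
}
blocklist = ["user.first_name", "user.last_name", "card.number"]
masked = apply_blocklist(payload, blocklist)
```

In this sketch only the three listed fields are masked, while `user.plan` passes through untouched, which mirrors the "collect everything except what is listed" behavior described above.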
Customers whose privacy requirements are one level more cautious can choose to obfuscate all data by default, except for what’s configured in an allowlist. The allowlist names the specific data that does not need to be obfuscated by the Helios SDK. Only data that matches one of the configured JSONPath expressions is collected and sent as is; everything else is obfuscated.
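The allowlist approach inverts the default: everything is masked unless explicitly permitted. The sketch below illustrates that inversion under the same assumptions as before; `apply_allowlist`, `MASK`, and the dotted-path notation are hypothetical and are not part of the Helios SDK.

```python
import copy

# Hypothetical placeholder written in place of values that are not allowlisted.
MASK = "****"

def apply_allowlist(payload, allowlist, prefix=""):
    """Return a copy of payload in which only allowlisted dotted paths keep their values."""
    allowed = set(allowlist)
    result = {}
    for key, value in payload.items():
        path = f"{prefix}.{key}" if prefix else key
        if path in allowed:
            # Allowlisted: keep the value (and any nested content) as is.
            result[key] = copy.deepcopy(value)
        elif isinstance(value, dict):
            # Not allowlisted itself, but a nested field might be.
            result[key] = apply_allowlist(value, allowed, path)
        else:
            result[key] = MASK
    return result

payload = {
    "order": {"id": 42, "status": "paid", "email": "ada@example.com"},
    "token": "abc123",
}
visible = apply_allowlist(payload, ["order.id", "order.status"])
```

Here `order.id` and `order.status` survive, while `order.email` and `token` are masked because nothing allowlisted them. A new field added to the payload later would be masked by default, which is the main reason to prefer an allowlist under stricter policies.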
Customers with more conservative privacy requirements can choose to receive metadata only, which means that the Helios SDK strips out all payload data, leaving only the metadata that OpenTelemetry provides. When the metadata-only mode is configured, no content of headers or payloads (including HTTP request/response bodies, message queue content, DB queries and results, etc.) is collected.
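Conceptually, metadata-only mode amounts to dropping every content-bearing attribute from a span before it leaves the process. The following sketch shows that idea on a plain dictionary of span attributes. The attribute names in `CONTENT_ATTRIBUTES` and the function `to_metadata_only` are illustrative assumptions, not the names Helios or OpenTelemetry necessarily use.

```python
# Hypothetical set of attribute keys that carry header or payload content.
CONTENT_ATTRIBUTES = {
    "http.request.body",
    "http.response.body",
    "http.request.headers",
    "http.response.headers",
    "db.statement",
    "db.result",
    "messaging.payload",
}

def to_metadata_only(attributes):
    """Return a new dict of span attributes with all content-bearing keys removed."""
    return {
        key: value
        for key, value in attributes.items()
        if key not in CONTENT_ATTRIBUTES
    }

span_attributes = {
    "http.method": "POST",
    "http.status_code": 200,
    "http.request.body": '{"card": "4111111111111111"}',
    "http.request.headers": '{"authorization": "Bearer abc"}',
    "db.statement": "SELECT * FROM users WHERE id = 7",
}
metadata = to_metadata_only(span_attributes)
```

After filtering, only the method and status code remain in this example, so the trace still shows that a request happened and how it ended, but not what it contained.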
In the face of the most extreme privacy requirements, customers can choose to drop certain traffic and data altogether. They can filter specific URLs and paths from being collected by the Helios SDK. Users control what data is traced and reported to Helios; if a user marks a URL or path as sensitive, no part of it is collected by the SDK.
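Filtering of this kind can be pictured as a check that runs before any tracing happens for a request. The sketch below uses glob-style patterns matched against the URL path; the function `should_trace` and the pattern syntax are assumptions for illustration and do not reflect the actual Helios configuration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

def should_trace(url, blocked_paths):
    """Return False when the URL's path matches any blocked glob pattern."""
    path = urlsplit(url).path  # ignore scheme, host, and query string
    return not any(fnmatchcase(path, pattern) for pattern in blocked_paths)

# Hypothetical patterns marking sensitive endpoints.
blocked = ["/payments/*", "/login"]

trace_payment = should_trace("https://api.example.com/payments/123?x=1", blocked)
trace_login = should_trace("https://api.example.com/login", blocked)
trace_catalog = should_trace("https://api.example.com/catalog/items", blocked)
```

With these patterns, requests to the payments and login endpoints are skipped entirely, while catalog traffic is still traced. Because the decision is made on the path alone, nothing about a blocked request needs to be inspected or stored.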
Distributed tracing improves engineering productivity and developer experience by accelerating dev velocity and driving engineering excellence, especially when it comes to microservices observability. With distributed tracing, engineering teams get the observability insights they need, when they need them most, to troubleshoot and improve the performance of their apps. To benefit from distributed tracing, teams don’t need to forgo privacy requirements. Balancing developer productivity and security is ultimately a tradeoff, and Helios provides layers you can use to take advantage of distributed tracing insights without sacrificing the business’s data privacy policies. Naturally, instrumenting all data and providing the complete context lead to faster results and ultimately reduce MTTR. Helios allows developers to be in control of what data is instrumented in their applications and how. This is one more way we hope to help more and more teams embrace distributed tracing and do more with the resources they have.