
Identifying root cause in minutes



Challenge

Replicating the precise scenario you need in a distributed cloud-native environment can be a puzzle. Because application flows pass through multiple services, understanding the relationships between your microservices is a complex and tedious task. Components are often siloed, which makes it nearly impossible to pinpoint and troubleshoot issues fast. Developers have to manually sift through logs or keep trying to recreate issues in their local environments in order to find and resolve them.

How Helios can help your team

Helios is a developer platform that adapts OpenTelemetry instrumentation to provide distributed tracing data as early as your local environment. It integrates with your existing logging, error monitoring, and CI, so you get a full picture of every issue. With Helios you get visibility into payloads, logs, metrics, and errors through end-to-end traces, so you can find the root cause of issues and fix them faster.

With Helios you can:

  • Pinpoint bottlenecks and broken flows in your application
  • Reproduce the exact flow in a couple of clicks, including HTTP requests, Kafka and RabbitMQ messages, and Lambda invocations
  • Filter errors by service, API calls, message queues, and streams with extensive search capabilities
  • Visually understand the full context and see what has changed, where, when and how
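
To make the first bullet concrete, here is a minimal Python sketch of how trace data can point to a bottleneck and a broken flow. It is not Helios or OpenTelemetry code: the `Span` class, its fields, the helper functions, and the sample service names are all hypothetical, chosen only to mirror the general shape of distributed tracing data (spans that belong to one trace and point to a parent span). It also assumes child spans run one after another, so a span's own time is its duration minus its children's.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    # Hypothetical, simplified span record; real tracing spans carry more fields.
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str
    duration_ms: float
    error: bool = False


def self_time(span: Span, spans: List[Span]) -> float:
    """Time spent in the span itself, assuming its children ran sequentially."""
    children_ms = sum(s.duration_ms for s in spans if s.parent_id == span.span_id)
    return span.duration_ms - children_ms


def find_bottleneck(spans: List[Span]) -> Span:
    """Return the span with the largest self time."""
    return max(spans, key=lambda s: self_time(s, spans))


def find_root_cause(spans: List[Span]) -> Optional[Span]:
    """Return an errored span with no errored child, i.e. where the failure began."""
    errored = [s for s in spans if s.error]
    for span in errored:
        if not any(child.parent_id == span.span_id for child in errored):
            return span
    return None


# Illustrative trace of one request passing through several (made-up) services.
trace = [
    Span("a", None, "api-gateway", "GET /checkout", 480, error=True),
    Span("b", "a", "orders-service", "create_order", 450, error=True),
    Span("c", "b", "postgres", "INSERT orders", 30),
    Span("d", "b", "kafka", "publish order.created", 20),
    Span("e", "b", "payments-service", "charge", 380, error=True),
]

bottleneck = find_bottleneck(trace)
root_cause = find_root_cause(trace)
print("Bottleneck:", bottleneck.service, bottleneck.operation)
print("Root cause:", root_cause.service, root_cause.operation)
```

In this sample the error surfaces at the gateway, but following the parent links shows it began in the payments call, which is also where most of the time went.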

Example scenario

You’re working on a feature and discover that your code fails in production, even though it passed your local tests and integration tests and worked in staging. With Helios you can quickly identify the flow, look up both the successful execution and the failed one, compare their trace visualizations, and identify what happened in seconds. From there, make the appropriate fix in your code and generate a test for the scenario to ensure the issue doesn’t happen again. Turnaround time from encountering the issue to having a new regression test and a working feature in production: 15 minutes.
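
The comparison step in this scenario can be illustrated with a small Python sketch. It assumes, purely for illustration, that each execution is available as an ordered list of `(service, operation, status)` steps. The `first_divergence` helper, the data layout, and the service names are hypothetical and are not part of Helios or any tracing library.

```python
from itertools import zip_longest


def first_divergence(ok_run, failed_run):
    """Return (index, ok_step, failed_step) for the first step that differs, or None."""
    for index, (ok_step, failed_step) in enumerate(zip_longest(ok_run, failed_run)):
        if ok_step != failed_step:
            return index, ok_step, failed_step
    return None


# Made-up executions of the same flow: one that succeeded, one that failed.
staging_run = [
    ("api-gateway", "POST /orders", "ok"),
    ("orders-service", "create_order", "ok"),
    ("kafka", "publish order.created", "ok"),
    ("email-service", "send_confirmation", "ok"),
]
production_run = [
    ("api-gateway", "POST /orders", "ok"),
    ("orders-service", "create_order", "ok"),
    ("kafka", "publish order.created", "error"),
]

result = first_divergence(staging_run, production_run)
if result is not None:
    index, ok_step, failed_step = result
    print(f"Flows diverge at step {index}: expected {ok_step}, got {failed_step}")
```

Here the two runs match until the message is published, so that step is where to start looking.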

See the live trace visualization in the Helios Sandbox.

“Helios got us from an error message to the root cause in two minutes.”
Yair Morgenstern, Software Engineer, Twist BioScience

Increase your dev velocity with actionable telemetry data