The Challenges of Collecting Runtime Data

Written by Ran Nozik

Collecting data in real time plays a crucial role in securing, monitoring, and troubleshooting applications. This data, often referred to as ‘runtime data,’ provides unique insights into the application’s behavior that aren’t available through other collection techniques.

The tricky part is that collecting runtime data comes with its own challenges. For example, capturing incoming HTTP traffic requires careful handling so that we don’t overwhelm storage or CPU resources, or block essential application processes. In this article, I will discuss these challenges in detail.

1. Dealing with multiple programming languages

Each programming language requires specific instrumentation techniques to collect runtime data. Interpreted languages (JavaScript on Node.js, Python, etc.) tend to be relatively similar, as they allow function implementations to be replaced directly at runtime. Languages that run on a managed virtual machine (Java, C#) instead rely on built-in mechanisms that are part of the runtime (e.g., Java agents loaded with -javaagent, or JVMTI, in Java). Each of these requires a deep understanding of the language runtime to instrument it effectively and safely. With the rise of microservices, organizations often build complex applications from services written in multiple languages, so providing coverage for all of them, and gaining expertise in each, is a real challenge.
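
In Python, “replacing a function implementation at runtime” usually means monkey-patching: swapping a module attribute for a wrapper that records data and then delegates to the original. The minimal sketch below illustrates the idea; instrument and RECORDED_CALLS are hypothetical names, and a production sensor would also need to handle methods, async functions, and references bound before patching (code that ran “from json import dumps” earlier keeps the unpatched function).

```python
import functools
import json
import time

# Hypothetical in-memory sink standing in for a sensor's export pipeline.
RECORDED_CALLS = []


def instrument(module, attr_name):
    """Replace module.attr_name with a wrapper that records call duration."""
    original = getattr(module, attr_name)

    @functools.wraps(original)  # keeps __name__/__doc__ and sets __wrapped__
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record even if the original call raised an exception.
            RECORDED_CALLS.append({
                "function": f"{module.__name__}.{attr_name}",
                "duration_s": time.perf_counter() - start,
            })

    setattr(module, attr_name, wrapper)
    return original  # keep a handle so the patch can be undone


if __name__ == "__main__":
    original_dumps = instrument(json, "dumps")
    print(json.dumps({"status": "ok"}))  # prints {"status": "ok"}
    print(RECORDED_CALLS[0]["function"])  # prints json.dumps
    json.dumps = original_dumps  # restore the unpatched function
```

Because the wrapper is installed on the module object, every caller that looks up json.dumps at call time is instrumented without any change to application code, which is exactly why this approach is so common in interpreted runtimes.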

2. Dealing with different deployment platforms

When collecting runtime data, you’ll face different challenges depending on where the applications are deployed. Whether you’re using Kubernetes, managed container runtimes like AWS ECS, or serverless systems, each environment has its own quirks that affect your ability to deploy your sensors and collect data. A few of these challenges:

Kubernetes versions – Different versions of Kubernetes support different features that affect data collection. For example, admission webhooks, which enable certain types of instrumentation, were introduced (as beta) in Kubernetes 1.9.

Privileged access – Some collection techniques, specifically eBPF, require the runtime sensor to run as root or with elevated privileges. Not all deployment modes allow this (e.g., EKS on Fargate, which does not support privileged containers).

Sandboxed runtimes – In serverless deployments, the runtime is sandboxed, and access to the file system, the kernel, and even the execution flow is restricted. The cloud provider may offer mechanisms that work around some of these limitations (e.g., the LAMBDA_TASK_ROOT environment variable in AWS Lambda), but generally speaking, such environments require a dedicated approach.
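
These platform differences often surface as a startup probe that decides which collection technique a sensor can use. The Python sketch below is one possible shape for such a probe, under several assumptions: the strategy names and helper functions are hypothetical, the Kubernetes version string would come from something like the gitVersion field returned by the API server’s /version endpoint, and reading CapEff from /proc/self/status is only a heuristic, since the exact privileges an eBPF program needs vary with the program type and kernel version.

```python
import os

CAP_SYS_ADMIN = 21
CAP_BPF = 39  # split out of CAP_SYS_ADMIN in Linux 5.8


def has_capability(bit, status_path="/proc/self/status"):
    """Return True if the process's effective capability set includes `bit`.

    Linux-only: parses the hex CapEff mask from /proc; returns False if the
    file is missing (e.g., on macOS) or the field is absent.
    """
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith("CapEff:"):
                    mask = int(line.split()[1], 16)
                    return bool(mask & (1 << bit))
    except OSError:
        pass
    return False


def k8s_version_at_least(git_version, major, minor):
    """Compare a GitVersion string such as 'v1.27.3-eks-a5565ad' to major.minor."""
    core = git_version.lstrip("v").split("-")[0]
    parts = core.split(".")
    return (int(parts[0]), int(parts[1].rstrip("+"))) >= (major, minor)


def choose_collection_strategy(env=None):
    """Pick a hypothetical collection mode based on what the platform allows."""
    env = os.environ if env is None else env
    # AWS Lambda sets these reserved variables in every function's environment.
    if "AWS_LAMBDA_FUNCTION_NAME" in env or "LAMBDA_TASK_ROOT" in env:
        return "serverless"  # sandboxed: rely on in-process hooks or provider mechanisms
    if has_capability(CAP_BPF) or has_capability(CAP_SYS_ADMIN):
        return "ebpf"  # kernel-level collection is likely permitted
    return "in-process"  # e.g., Fargate or other unprivileged containers


if __name__ == "__main__":
    print(choose_collection_strategy())
    print(k8s_version_at_least("v1.27.3-eks-a5565ad", 1, 9))  # prints True
```

Falling back to in-process instrumentation rather than failing outright keeps the sensor deployable on restricted platforms, at the cost of losing kernel-level visibility there.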

3. Low-friction deployment and maintenance

Low-friction deployment is essential in modern software development, where the goal is seamless deployment with minimal or, ideally, zero manual intervention. For example, Helm charts or other Infrastructure as Code (IaC) solutions can significantly reduce deployment friction. Some cloud providers also offer APIs and add-ons that make deployment smoother, such as Amazon EKS add-ons or AWS Lambda layers.

Additionally, the runtime sensor must have minimal impact on the application’s performance. Some things to take into account:

Efficient instrumentation: Careless instrumentation (e.g., hooking into every function invocation, or recording all TCP traffic and analyzing it on the server side) can degrade performance and increase resource utilization.
Data volume: Generating large volumes of runtime data can overwhelm the available storage and network bandwidth.
I/O and CPU intensity: Resource-intensive collection activities can put pressure on the system’s resources.
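
One common way to address all three points is to keep the application’s hot path cheap: sample events, truncate payloads, and hand events to a bounded queue that drops rather than blocks when it fills up, leaving export to a background thread. The sketch below assumes a Python sensor and an event shape (path, body) invented for illustration; BoundedCollector is a hypothetical class, not a Helios or OpenTelemetry API.

```python
import queue
import random
import threading


class BoundedCollector:
    """Hypothetical sensor-side buffer that never blocks the instrumented app."""

    def __init__(self, max_events=1000, sample_rate=0.1, max_body_bytes=1024, rng=None):
        self._queue = queue.Queue(maxsize=max_events)  # caps memory use
        self.sample_rate = sample_rate
        self.max_body_bytes = max_body_bytes
        self.dropped = 0
        self._dropped_lock = threading.Lock()
        self._rng = rng or random.Random()

    def record(self, event):
        """Called on the application's hot path; must return quickly."""
        if self._rng.random() >= self.sample_rate:
            return False  # sampled out: no copy, no I/O
        body = event.get("body", b"")
        event = dict(event, body=body[: self.max_body_bytes])  # cap payload size
        try:
            self._queue.put_nowait(event)  # never waits for free space
            return True
        except queue.Full:
            with self._dropped_lock:
                self.dropped += 1  # shed load instead of blocking the request
            return False

    def drain(self, max_batch=100):
        """Called from a background exporter thread, off the hot path."""
        batch = []
        while len(batch) < max_batch:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

Dropping under pressure trades completeness for safety. Exposing the dropped count lets operators see when the sample rate or buffer size needs tuning.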

4. Integrating new data collection mechanisms without harming existing ones

Most systems run multiple runtime sensors for different purposes, and adding a sensor that provides security insights should not interfere with existing observability sensors. Some collection techniques are more prone to such collisions; application-layer instrumentation in particular must be done with extreme care to avoid colliding with other solutions that may already be in place. Another consideration is excluding the other sensors themselves from being monitored, since monitoring them can generate large volumes of irrelevant data.
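
Two of these precautions can be sketched in a few lines of Python: patching idempotently on top of whatever is already installed (including another vendor’s wrapper) rather than replacing it, and filtering out traffic that belongs to other agents. safe_patch, the marker attribute, and the EXCLUDED_ENDPOINTS values are all hypothetical; ports 4317 and 4318 are the default OTLP gRPC and HTTP ports, used here only as plausible examples of another sensor’s endpoints.

```python
import functools
from urllib.parse import urlsplit

_MARKER = "_runtime_sensor_wrapped"  # hypothetical marker attribute

# Hypothetical endpoints owned by other agents (here, an OpenTelemetry Collector
# on the default OTLP gRPC/HTTP ports); in practice these would be configurable.
EXCLUDED_ENDPOINTS = {("localhost", 4317), ("otel-collector", 4318)}


def is_other_sensor_traffic(url):
    parts = urlsplit(url)
    return (parts.hostname, parts.port) in EXCLUDED_ENDPOINTS


def safe_patch(owner, attr_name, make_wrapper):
    """Wrap owner.attr_name at most once, on top of whatever is installed."""
    current = getattr(owner, attr_name)
    if getattr(current, _MARKER, False):
        return False  # already wrapped by this sensor: don't double-record
    # Wrap the current callable (possibly another tool's wrapper), not the
    # pristine original, so existing instrumentation keeps working.
    wrapper = functools.update_wrapper(make_wrapper(current), current)
    setattr(wrapper, _MARKER, True)
    setattr(owner, attr_name, wrapper)
    return True


def make_http_wrapper(events):
    """Build a wrapper factory for a hypothetical client call taking a URL first."""
    def make_wrapper(original):
        def wrapper(url, *args, **kwargs):
            if not is_other_sensor_traffic(url):
                events.append(url)  # record application traffic only
            return original(url, *args, **kwargs)
        return wrapper
    return make_wrapper
```

Wrapping the currently installed callable preserves any wrapper another tool added earlier. The trade-off is that uninstalling sensors in a different order than they were installed needs extra care.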

Conclusion

To sum up, runtime data collection is a critical component of application security, monitoring, and troubleshooting, providing crucial insights for each of these use cases. However, collecting runtime data presents several challenges that demand careful consideration and experience with multiple collection techniques.

Given our vast experience with runtime data collection, we’re able to overcome these challenges. Helios combines multiple runtime data collection methods to provide context, granular data, and a comprehensive picture of the application’s security posture at runtime. Learn more about our full-stack approach here.

About Helios

Helios is an applied observability platform that produces actionable security and monitoring insights. We apply our deep runtime data collection capabilities to help Sec, Dev, and Ops teams understand the actual application risk posture, prioritize vulnerabilities, shorten troubleshooting time, and reduce MTTR.

The Author

Ran Nozik

CTO and co-founder of Helios. An experienced R&D leader and a mathematician at heart. Passionate about improving modern software development, and a big fan of and contributor to OpenTelemetry. After serving as an officer in Unit 8200 and leading R&D efforts in the cybersecurity department, working as a Senior Software Developer, and becoming an Engineering Team Leader, Ran co-founded Helios, a production-readiness platform for developers. Ran holds a B.Sc. in Computer Science and Mathematics from the Hebrew University of Jerusalem.
