Traditional software application infrastructure generally followed a monolithic pattern: one application connected to one database. Understanding what was going on inside the application itself was a simple proposition, with relevant logs and metrics all originating from the same source. When it came time to scale, the answer was to get a bigger server or more servers.
In contrast, modern software application infrastructure distributes the logical parts of an application across workloads, platforms, and even data centers and cloud providers. Therefore, traditional, monolithic-focused monitoring tools cannot provide a complete picture of what’s going on inside an application. Engineering teams that don’t have a full understanding of their architecture will be slow to iterate and release new features, putting business outcomes in jeopardy. So the question becomes: what can provide that complete picture?
When most engineers familiar with web-facing software think of monitoring, they think of logs and traditional performance metrics like response time. However, these measures don’t tell the whole story of distributed app performance and behavior. This is where tracing can help.
What is Tracing?
Tracing is generally defined as the act of following a user request through a multi-service architecture. Tracing is a necessity to gain a holistic understanding of application behavior, particularly applications on distributed infrastructure.
A trace can be further broken down into Spans. The first Span of a trace is called the Root Span; it represents the request that triggers the entire trace. Spans triggered by the Root Span represent more granular parts of the request. Spans are typically associated with the individual URIs or services that take part in the larger request context, such as authentication.
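One way to picture this structure is as a tree whose root is the Root Span. The following is a minimal sketch in plain Python, not the data model of any particular tracing library; the `Span` class, its fields, and the service and operation names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical model for illustration)."""
    name: str
    service: str
    children: List["Span"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Span"]]:
        """Yield (depth, span) pairs for this span and all of its descendants."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# The root span is the incoming request; child spans are the work it triggers.
root = Span("GET /fivestar", "frontend", [
    Span("authenticate", "auth"),
    Span("GET /api/fivestar", "api", [
        Span("find restaurants", "mongodb"),
    ]),
])

# Print the trace as an indented tree, one line per span.
for depth, span in root.walk():
    print("  " * depth + f"{span.service}: {span.name}")
```

Walking the tree from the root visits every service that took part in the request, which is the view that logs from a single service cannot provide.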
What is Trace-based Testing?
Distributed tracing is a great visibility tool for understanding application behavior. However, it comes with a caveat: it’s difficult for developers to take advantage of the benefits offered by tracing while working in a distributed system, where traditional testing methodologies may not apply.
Fortunately, trace-based testing fills that gap. Trace-based testing is a form of software testing that utilizes the core functionality of distributed tracing in order to produce more comprehensive, application-wide test results and data.
Trace-based testing allows developers to make validations based on tracing data. Furthermore, trace-based testing can be integrated with existing testing frameworks to provide specific data and query validations throughout the lifecycle of a request. This enhanced visibility becomes possible via test logic that extends beyond the boundaries of an individual function or application component.
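To make the idea of validating against tracing data concrete, here is a small sketch that checks a recorded trace rather than a single function's return value. It assumes spans are available as plain dictionaries; the field names (`service`, `operation`, `attributes`, `db.result`) and the sample data are hypothetical and do not reflect any specific tracing backend.

```python
import re

# Hypothetical spans collected for one request, in the order they occurred.
trace = [
    {"service": "frontend", "operation": "GET /",
     "attributes": {"http.status_code": 200}},
    {"service": "api", "operation": "GET /fivestar",
     "attributes": {"http.status_code": 200}},
    {"service": "mongodb", "operation": "find",
     "attributes": {"db.result": '{"name": "Example Diner", "rating": 5}'}},
]


def spans_for(trace, service):
    """Return every span in the trace that was emitted by the given service."""
    return [span for span in trace if span["service"] == service]


def validate_trace(trace):
    """Return a list of failure messages; an empty list means the trace passed."""
    failures = []

    # Validation 1: the request must actually have reached the database.
    db_spans = spans_for(trace, "mongodb")
    if not db_spans:
        failures.append("request never reached the database")

    # Validation 2: the database result must contain a rating of exactly 5.
    for span in db_spans:
        result = span["attributes"].get("db.result", "")
        if not re.search(r'"rating":\s*5\b', result):
            failures.append("database result does not have rating 5")

    # Validation 3: no HTTP span along the way may report an error status.
    for span in trace:
        status = span["attributes"].get("http.status_code")
        if status is not None and status >= 400:
            failures.append(f"{span['service']} returned HTTP {status}")

    return failures


print(validate_trace(trace))
```

The checks span three services in one test, which is the kind of logic that does not fit inside a unit test for any single component.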
As a result, trace-based testing also improves the overall functionality and usefulness of application testing. Consider unit tests: it’s often left to developers to enumerate various aspects of how their application is expected to behave in a given scenario, and then to create tests to validate that behavior. Unit testing helps improve code quality, but testing complex application logic is difficult and incurs a cognitive tax on the developer. With trace-based testing, developers can still be productive with unit-testing practices they know, but with the enhanced capabilities offered by tracing features added on.
Finally, trace-based testing provides fast feedback, which improves development patterns. Immediate and actionable feedback from trace-based testing empowers developers to quickly gain an understanding of how their proposed changes impact application behavior and allows them to iterate on and deliver features faster.
Trace-based Testing in Action
A great way to highlight the benefits of trace-based testing is to see it in action. In this example, the application is a basic web service for providing restaurant recommendations. The application uses Flask as a frontend request router, API, and web server. Backend data storage is provided by MongoDB, which has been pre-populated with example data. Frontend requests are also routed to a Kafka server for later review or replay. For development, everything is running in local Docker containers.
Before getting into setting up trace-based testing, it will be helpful to understand how developers might use traditional testing methods in this scenario. Consider one of the application’s API endpoints:
Typically, a developer will need to spend time manually designing and implementing tests that will cover as much application behavior as possible. For this use case, Flask applications can take advantage of the pytest framework with fixtures. A developer can use fixtures to set up a mock client, run tests, and then tear down any resources that were used after the tests have concluded.
For the example application, a developer might create this test for the /fivestar endpoint:
def test_get_restaurant(client):
    # `client` is a pytest fixture that provides the Flask test client.
    response = client.get("/fivestar")
    # Parse the JSON body; response.data is raw bytes and cannot be indexed by key.
    assert response.get_json()["rating"] == 5
This test checks that the application returns a restaurant with a rating of 5. However, because the client is provided by a test fixture, the request is not handled in the context of a live application. The developer will also need to create mock data fixtures to mimic the behavior of the backend database.
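As an illustration of what mocking the backend involves, the sketch below replaces a MongoDB collection with a `MagicMock` from the standard library's `unittest.mock`, used here instead of a pytest fixture to keep the example dependency-free. The `get_five_star` function and the sample restaurant are hypothetical; only `find_one` corresponds to a real PyMongo collection method.

```python
from unittest.mock import MagicMock


def get_five_star(collection):
    """Hypothetical handler logic: return one restaurant with a rating of 5."""
    return collection.find_one({"rating": 5})


def make_mock_collection():
    """Build a stand-in for a MongoDB collection that returns canned data."""
    collection = MagicMock()
    collection.find_one.return_value = {
        "name": "Example Diner",
        "cuisine": "Pizza",
        "rating": 5,
    }
    return collection


def test_get_five_star():
    collection = make_mock_collection()
    restaurant = get_five_star(collection)
    assert restaurant["rating"] == 5
    # The mock records how it was called, so the query itself can be checked.
    collection.find_one.assert_called_once_with({"rating": 5})


test_get_five_star()
```

Note that this test passes whether or not the real database holds any five-star restaurants: it verifies the code against the developer's assumptions about the data, not against the running system.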
The problems with the traditional testing approach are now plain: developers have to spend critical cycles manually crafting tests that only emulate application behavior in a sterile environment. Even with these tests, the developer still does not have a clear picture of how a customer request will actually behave in the application infrastructure.
Trace-based Testing: A Tutorial
With trace-based testing, developers have a more comprehensive view of their applications. Let’s see how it works with Helios.
Using the button in the upper left, we can add services to our environment:
Since the example application uses Flask, a Python web framework, Python is the obvious choice for the service:
The environment variables need to be available during application runtime. For Docker-based applications, these are commonly placed in the Dockerfile. Once the variables have been populated, and the application started, there still won’t be any tracing data visible. The missing element is some actual requests.
The demo application will have three distinct services: frontend, API, and web. All requests will pass through the frontend to either the API or web endpoint. All requests will also be recorded in a Kafka queue called “review-app”.
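The routing just described can be sketched roughly as follows. This is not the demo application's actual code: the `route` function, the set of API endpoint names (inferred from the example requests in this walkthrough), and the in-memory list standing in for the Kafka queue are all assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlsplit

REVIEW_TOPIC = "review-app"

# Assumption: endpoint names taken from the example requests in this article.
API_ENDPOINTS = {"fivestar-api", "restaurant"}

# In-memory stand-in for the Kafka queue, so the sketch runs without a broker.
review_log = []


def route(url):
    """Record the request, then return the name of the service that handles it."""
    params = parse_qs(urlsplit(url).query)
    endpoint = params.get("endpoint", [""])[0]

    # Every request is recorded for later review or replay.
    review_log.append({"topic": REVIEW_TOPIC, "url": url})

    # Anything that is not a known API endpoint is treated as a web request.
    return "api" if endpoint in API_ENDPOINTS else "web"


print(route("http://127.0.0.1:4080/?endpoint=fivestar-api"))
print(route("http://127.0.0.1:4080/?endpoint=restaurant&cuisine=Pizza"))
```

Each hop in this flow (frontend, API or web, and the queue) is a place where a span would be recorded once tracing is enabled.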
Request to the API for a randomized 5-star restaurant:
$ curl 'http://127.0.0.1:4080/?endpoint=fivestar-api'
Request to the API for all restaurants serving pizza (the URL is quoted so the shell does not interpret the `&`):
$ curl 'http://127.0.0.1:4080/?endpoint=restaurant&cuisine=Pizza'
And then make a visit to the 5-star and cuisine endpoints in the browser:
Now there should be several services present in the UI, along with their recent request traces:
Individual requests, shown as “operations”, appear in a data-rich, user-friendly representation as they traverse spans and nodes:
With this data available, developers can get real-time insights into application behavior and performance. Clicking through one of the traces presents an extensive, in-depth picture of the entire lifecycle of the request:
Even though this walkthrough uses the UI, Helios makes it possible to generate repeatable test automation, which gives development teams the capability to standardize and scale trace-based testing for the entire stack.
Switching to the “Test” interface allows users to create programmatic test fixtures with options for defining query parameters and expected return values:
Clicking “Generate test code” generates fully functional tests for common language testing modules, in this case, pytest:
Developers can copy this code to existing test libraries, which can be reused across different deployments. To further extend testing capabilities, Helios also lets developers perform validations during tracing tests. Looking at the “Test” tab for one of the five-star endpoint traces:
The empty blue circle on the span leading to the MongoDB node is a validation checkpoint. Enabling this checkpoint provides a regex validation mechanism for query and return values:
A good validation for the `/fivestar` endpoint would be to assert that the return value has the expected value of “5” for the “rating” field. The Data validation should be configured as follows:
Clicking “Generate test code” produces a complete test, including the validation assertion:
This code needs to be copied to the local development test suite. For the purposes of this demonstration, the test can be placed in the root directory in a file called `test_api.py`. The Helios docs provide instructions for running this test with the pytest module. A successful test run will produce a URL linking to the results in the Helios UI. Clicking that URL shows the specific test invocation:
Clicking on the test shows a new trace result, with the test results as added context:
The ability to seamlessly interweave tracing and programmatic tests provides detailed, data-driven insights into application behavior that just aren’t possible with traditional testing methods.
Trace-based Testing Wrap-up
Modern application infrastructure is complex. Traditional testing methods, such as unit tests, are good development practice, but they require extensive work by development teams to provide even partial testing coverage for a complex, distributed application.
Trace-based testing helps developers understand this complexity and gives them confidence in their test suites and, ultimately, in their application code. Automated trace-based testing solutions integrate with existing developer workflows in a frictionless way, avoiding the need to rely on ops teams or heavy modifications to application dependencies. By providing visibility into application behavior and architecture components, trace-based testing provides immediate, actionable feedback for developers to improve their applications and, critically, to improve the overall customer experience.
To get started with trace-based testing for free with Helios, click here.