Testing today’s environments is more challenging than it was a few years ago. The transition to distributed environments has added complexity, overhead, and friction to writing and running backend tests. Because many services communicate asynchronously, these tests require a lot of preparation, infrastructure, and maintenance, and they often miss exceptions thrown in the “deeper layers” of the architecture, which are hard to make testable. In this post, I would like to show how trace-based automated tests let developers validate their data with robust tests and almost no extra effort. Here’s how it can be done.
Backend Testing in a Distributed Environment: A Messy Affair
What does the testing of a distributed backend look like today? The following flow depicts a trace of a typical financial transaction app. As you can see, there are a lot of components that depend on each other: two Kafka topics, Postgres, DynamoDB, third-party APIs, and five microservices. Any code change in any service can, and often does, affect several others.
If this were a monolith, a failure would simply surface as an HTTP 500 status code from the server. In this microservices architecture, however, you might get a 200 from the BFF (Backend for Frontend) microservice while an exception is thrown in another microservice, without any indication of it reaching you.
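The following minimal simulation makes this failure mode concrete. Everything in it is hypothetical: an in-process `queue.Queue` stands in for a Kafka topic, and two plain functions stand in for the BFF and a downstream consumer. The BFF returns 200 before the consumer runs, so the consumer’s exception never reaches the caller.

```python
import queue

# Hypothetical stand-ins: an in-process queue for a Kafka topic,
# and a list for the downstream service's error log.
events: queue.Queue = queue.Queue()
errors: list = []


def bff_deposit(amount: int) -> int:
    """Hypothetical BFF handler: publishes an event and returns an HTTP status code."""
    events.put({"type": "deposit", "amount": amount})
    return 200  # responds before any downstream work has happened


def charge_consumer() -> None:
    """Hypothetical downstream consumer that processes the event later."""
    event = events.get()
    try:
        if event["amount"] <= 0:
            raise ValueError(f"invalid amount: {event['amount']}")
    except ValueError as exc:
        errors.append(str(exc))  # the failure is recorded only inside this service


status = bff_deposit(-5)   # the caller sees this...
charge_consumer()          # ...but the failure happens here, later
print(status, errors)      # 200 ['invalid amount: -5']
```

The caller only ever sees the 200; the error lives in a list (a log, in a real system) inside another service.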
How Developers Usually Build Test Automation Infrastructure for Microservices
Let’s say we want to test the following use cases, both part of the backend end-to-end (E2E) happy flow:
- Each time a POST request reaches the `deposit` endpoint, verify that an email is sent to the customer via SES.
- Verify that the client was charged and that the Stripe call succeeded.
There are a few ways to build test automation for these two scenarios.
1. Log-based Testing
Assuming that the feature’s developer added logs for each operation, we can fetch the logs from those services and validate that the relevant data exists.
For example:
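```python
import unittest

import requests

BASE_URL = "http://localhost:8000"  # placeholder: address of the service under test
TEST_CLIENT_ID = "test-client-id"   # placeholder: ID of a test client


class OrderTest(unittest.TestCase):
    def test_process_order_happy_flow(self):
        # assertLogs captures records emitted on the 'foo' logger in this process only
        with self.assertLogs('foo', level='INFO') as cm:
            requests.get(f'{BASE_URL}/process_order/{TEST_CLIENT_ID}')
        # Each cm.output entry has the form 'LEVEL:logger_name:message'
        self.assertEqual(
            cm.output,
            [f'INFO:foo:send email to {TEST_CLIENT_ID}',
             f'INFO:foo:Charge {TEST_CLIENT_ID} succeeded!'],
        )
```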
What are the challenges with this approach?
- It assumes the developer added logs, which is not always true.
- We can only validate data that was written to the logs. If request payloads aren’t logged, for example, they can’t be validated.
- It couples the tests to log messages, which is a reliable way to create flaky tests.
- It tests the log lines of an operation, not the operation itself, so we don’t actually know whether the operation succeeded.
2. DB Querying
If each operation stores an indication in the database, the test can query the database to check that it exists.
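```python
import unittest

import requests

# Client is the application's ORM model, assumed importable by the tests (import not shown)
BASE_URL = "http://localhost:8000"  # placeholder: address of the service under test
TEST_CLIENT_ID = "test-client-id"   # placeholder: ID of a test client


class OrderTest(unittest.TestCase):
    def test_process_order_happy_flow(self):
        requests.get(f'{BASE_URL}/process_order/{TEST_CLIENT_ID}')
        # Read back the flags the services are expected to set after each operation
        client = Client.get_by_id(TEST_CLIENT_ID)
        assert client.charged_successfully is True
        assert client.email_sent is True
```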
What are the challenges with this approach?
- We need to expose the database to the test project, which sometimes requires heavy lifting.
- We need to design the database around the tests (for example, adding `charged_successfully` and `email_sent` fields). That is overengineering and not the focus of our work.
- It couples the DB model to the tests, so schema changes can break them.
- Just like with the log-based approach, we test a DB update rather than the actual operation, so it can’t detect real issues: a flag being set does not prove that the charge went through or that the email was sent.
These approaches are only the tip of the iceberg of awkward workarounds for testing distributed applications.
A New Backend Testing Paradigm
At Helios, we are bringing a new testing paradigm to the field: trace-based testing. Instead of over-engineering your applications to make them testable, we suggest an approach that requires no changes to your code. It is built on traces.
Traces enable a new type of testing because they show all the operations in a distributed system that were triggered by a single operation. This makes it easy to look at each operation as part of a whole rather than as an isolated action. In addition, unlike logs, traces can be generated automatically, without a developer having to decide where to add them.
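What ties these operations into a single trace is context propagation. By default, OpenTelemetry propagates context with the W3C Trace Context `traceparent` header, which carries the trace ID and the caller’s span ID on every outgoing request. The stdlib-only sketch below shows the header format; the helper functions are our own illustration, not an OpenTelemetry API, and they skip details such as rejecting all-zero IDs.

```python
import re
import secrets

# W3C Trace Context "traceparent": version-traceid-parentid-flags (lowercase hex)
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent() -> str:
    """Start a new trace at the entry point (e.g., the BFF), with the sampled flag set."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(incoming: str) -> str:
    """Keep the caller's trace ID; the new span's ID becomes the parent ID downstream."""
    match = TRACEPARENT_RE.fullmatch(incoming)
    if match is None:
        raise ValueError(f"malformed traceparent: {incoming!r}")
    trace_id, _parent_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(header: str) -> str:
    return header.split("-")[1]


root = new_traceparent()         # request enters the BFF
hop1 = child_traceparent(root)   # BFF -> accounts-service
hop2 = child_traceparent(hop1)   # accounts-service -> emails-service
print(trace_id_of(root) == trace_id_of(hop2))  # True
```

Every hop gets a fresh span ID but keeps the same trace ID, which is what lets a tracing backend stitch the calls from all services into one trace.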
The building block of a trace is a span. A span represents a single operation, typically an interaction between two components in your application, such as an HTTP call or a database query. Spans carry the attributes we want to validate when testing.
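As an illustration, here is a deliberately simplified span model (the real OpenTelemetry data model has more fields, such as timestamps, status, and span kind). The `Span` class, the `descendants` helper, and the service and operation names are hypothetical. Following parent links from the root span recovers everything a single request triggered, across services.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """Simplified, hypothetical span: just enough fields to rebuild a trace tree."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    operation: str
    attributes: Dict[str, object] = field(default_factory=dict)


def descendants(spans: List[Span], root_id: str) -> List[Span]:
    """Return every span triggered, directly or indirectly, by the span `root_id`."""
    children: Dict[str, List[Span]] = {}
    for span in spans:
        if span.parent_id is not None:
            children.setdefault(span.parent_id, []).append(span)
    result: List[Span] = []
    stack = [root_id]
    while stack:  # iterative depth-first walk over parent -> child links
        for child in children.get(stack.pop(), []):
            result.append(child)
            stack.append(child.span_id)
    return result


# One trace from the deposit flow (IDs and names are made up)
trace = [
    Span("t1", "a", None, "bff", "POST /deposit"),
    Span("t1", "b", "a", "accounts-service", "HTTPS POST",
         {"http.url": "https://api.stripe.com/v1/charges"}),
    Span("t1", "c", "a", "accounts-service", "deposits send"),
    Span("t1", "d", "c", "emails-service", "aws.ses.sendEmail"),
]
print([s.operation for s in descendants(trace, "a")])
# ['HTTPS POST', 'deposits send', 'aws.ses.sendEmail']
```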
Because traces and spans are captured automatically:
- Developers don’t need to create test infrastructure that exposes interfaces of ‘inner layers’ in their systems.
- Developers and automation engineers can create meaningful tests without fully understanding the system and how it operates.
- It is easy to see every kind of flow in the application across multiple services.
- If developers see a trace of a bug in production, they can create a test directly from it. This puts into practice the principle of writing a test for every bug you find, so that it can’t come back unnoticed.
How to Test with Traces and Spans in Helios
Once you have Helios installed, spans are created automatically by Helios’s auto-instrumentation SDK, which is based on OpenTelemetry. OpenTelemetry (OTel) is an open-source project that provides a collection of SDKs, APIs, and tools for collecting and correlating telemetry data from interactions in cloud-native, distributed systems.
Through spans, Helios collects the payloads of every communication between two components in the system, i.e., each request and response. With this capability, developers can build robust tests without changing a single line of code.
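Because payloads are recorded on spans, a test can assert on what was actually sent over the wire instead of on a log line or a DB flag. A minimal sketch, assuming a captured span is available as a plain dictionary; the attribute keys (`http.request.body` and friends) are illustrative, not a documented Helios schema. The request body is form-encoded because that is what Stripe’s API expects.

```python
import json
from urllib.parse import parse_qs, urlencode

# A captured span as a plain dict; all attribute keys here are illustrative
stripe_span = {
    "service": "accounts-service",
    "operation": "HTTPS POST",
    "attributes": {
        "http.url": "https://api.stripe.com/v1/charges",
        "http.status_code": 200,
        # Stripe takes form-encoded request bodies and replies with JSON
        "http.request.body": urlencode({"amount": 1000, "currency": "usd"}),
        "http.response.body": json.dumps({"object": "charge", "status": "succeeded"}),
    },
}


def assert_charge_succeeded(span: dict, expected_amount: int) -> None:
    """Validate the actual Stripe call: status code, request payload, response payload."""
    attrs = span["attributes"]
    assert attrs["http.status_code"] == 200, "Stripe call did not return 200"
    request = parse_qs(attrs["http.request.body"])
    charged = int(request["amount"][0])
    assert charged == expected_amount, f"charged {charged}, expected {expected_amount}"
    response = json.loads(attrs["http.response.body"])
    assert response["status"] == "succeeded", f"charge status was {response['status']!r}"


assert_charge_succeeded(stripe_span, expected_amount=1000)  # passes silently
```

A wrong amount or a non-succeeded charge fails the test even when every service returned 200, which neither the log-based nor the DB-based approach can guarantee.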
Here is the code for the use cases from before, generated automatically by Helios from traces:
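```python
import requests

# URL, HEADERS, DATA, the *_SERVICE_NAME / *_SPAN_OPERATION / *_SPAN_SELECTORS
# constants, and test_trace_manager are defined elsewhere in the test module (not shown).


def test_by_helios():
    # Trigger the flow under test
    requests.post(URL, headers=HEADERS, json=DATA)

    # Check that the charge service made the HTTPS POST call to Stripe
    http_post_spans = test_trace_manager.find_spans(
        service=CHARGE_SERVICE_NAME,
        operation=CHARGE_SPAN_OPERATION,
        span_selectors=STRIPE_SPAN_SELECTORS,
    )
    assert http_post_spans, 'HTTPS POST in accounts-service did not occur'

    # Check that the confirmation email was sent through SES
    ses_send_spans = test_trace_manager.find_spans(
        service=SES_SERVICE_NAME,
        operation=SES_SPAN_OPERATION,
        span_selectors=SES_SPAN_SELECTORS,
    )
    assert ses_send_spans, 'aws.ses.sendEmail in emails-service did not occur'
```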
Notice the `test_trace_manager.find_spans(...)` method. It lets developers write tests based on the spans that occurred in a specific trace.
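Helios’s implementation of `find_spans` isn’t shown here, so the following is only a hypothetical in-memory stand-in that illustrates the idea: keep the spans of one trace and filter them by service, operation, and attribute selectors. The `TraceManager` class, the span dictionary layout, and the selector format (attribute key mapped to expected value) are assumptions, not the Helios SDK.

```python
from typing import Any, Dict, Iterable, List, Optional

Span = Dict[str, Any]  # hypothetical layout: {"service", "operation", "attributes"}


class TraceManager:
    """Hypothetical in-memory stand-in for a trace manager; not the Helios SDK."""

    def __init__(self, spans: Iterable[Span]):
        self.spans = list(spans)  # spans belonging to a single trace

    def find_spans(self, service: str, operation: str,
                   span_selectors: Optional[Dict[str, Any]] = None) -> List[Span]:
        """Return spans matching the service, the operation, and every attribute selector."""
        selectors = span_selectors or {}
        return [
            span for span in self.spans
            if span["service"] == service
            and span["operation"] == operation
            and all(span["attributes"].get(key) == value for key, value in selectors.items())
        ]


test_trace_manager = TraceManager([
    {"service": "accounts-service", "operation": "HTTPS POST",
     "attributes": {"http.url": "https://api.stripe.com/v1/charges"}},
    {"service": "emails-service", "operation": "aws.ses.sendEmail", "attributes": {}},
])
stripe_spans = test_trace_manager.find_spans(
    service="accounts-service", operation="HTTPS POST",
    span_selectors={"http.url": "https://api.stripe.com/v1/charges"},
)
assert stripe_spans, "HTTPS POST in accounts-service did not occur"
```

Returning a list rather than a boolean keeps the matched spans available, so a test can go on to assert on their payloads.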
Trace-based testing matters because developers no longer need to make their code testable by design; they get testability automatically, out of the box. Helios enables anyone, regardless of coding experience, to generate backend automation tests.
Generating a Trace-Based Test in Helios
To generate a test from a trace in the Helios app, switch the mode from view to test and then set a validation checkpoint for each span you want to check.
Helios users can also configure each validation checkpoint and select what they want to validate.
After that, they can generate the test code and export it to any supported language.
Here’s what the auto-generated test will look like:
With Helios, developers can put into practice the well-known method of creating a test from a bug. In addition, with the Helios trace visualization tool, developers can create a test from a trace visually, which is a game-changer in the backend testing domain.
What’s Next for Automated Testing in Microservices?
The way developers usually create backend tests for distributed environments requires many changes, tweaks, and extra engineering. With Helios, developers can use trace-based testing to write tests without engineering around their code to make it testable.
No expertise is needed. Even people without coding experience can generate tests with Helios, including tests for very complex architectures that involve Databricks, gRPC communication, message queues, and many other components.
Want to try our microservices testing capabilities? You can sign up for free, or start by playing with the Sandbox and enjoy the ride.
Alternatively, learn more about setup actions for building complex E2E tests in Helios.