Instrumentation Under the Hood: A Technical Explanation
Like all instrumentation libraries, OpenTelemetry operates by wrapping existing function implementations and extracting the necessary pieces of data. These include the function parameters, duration, and results. Sometimes the data is also modified, cautiously (e.g., for context propagation purposes).
The Python `requests` library, for example, exposes a separate function for each HTTP method (`requests.post`, `requests.put`, and so on). But each of these functions eventually calls an internal `request` method, whose parameters are the method, the URL, and all the `kwargs` arguments. The function then returns a response object.
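In simplified terms, instrumenting `requests` means wrapping that internal `request` method in a new function, `wrapped_request`, which records the call's parameters, duration, and result before handing the original response back to the caller.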
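As a rough illustration of that wrapping step, here is a minimal sketch written in JavaScript (the language of the later examples in this post) rather than Python. Everything in it is hypothetical: `request` is a stand-in for the library's internal function, not the real `requests` API, and `recorded` is an in-memory array used in place of a real telemetry exporter.

```javascript
// Hypothetical stand-in for the library's internal `request` function.
function request(method, url, options = {}) {
  return { status: 200, method, url, options };
}

// In-memory sink used here instead of a real telemetry exporter.
const recorded = [];

// The wrapper: capture parameters, duration, and result, then hand the
// untouched response (or the original error) back to the caller.
function wrappedRequest(method, url, options = {}) {
  const start = process.hrtime.bigint();
  let response;
  let error;
  try {
    response = request(method, url, options);
    return response;
  } catch (err) {
    error = err;
    throw err;
  } finally {
    recorded.push({
      method,
      url,
      durationNs: process.hrtime.bigint() - start,
      status: response ? response.status : undefined,
      error,
    });
  }
}

const res = wrappedRequest('PUT', 'https://example.com/items/1', { timeout: 5 });
console.log(res.status, recorded.length);
```

The `try`/`finally` shape is a design choice for this sketch: it makes sure a measurement is recorded even when the wrapped call throws, without swallowing the error.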
To close the loop, the original function implementation only needs to be replaced with the new one, `wrapped_request`. For dynamic languages like JS and Python, this is done by holding a reference to the original implementation and then reassigning the function's name to the wrapper. Real-life instrumentation code is not very far from this description.
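The replacement step can be sketched as follows. This is an illustrative JavaScript version, not the actual instrumentation code: `httpLib` is a made-up object standing in for the `requests` module, and `seen` is a plain array standing in for a telemetry exporter.

```javascript
// Hypothetical stand-in for a library module with one function per HTTP method.
const httpLib = {
  request(method, url) {
    return { status: 200, method, url };
  },
  post(url) {
    // `request` is looked up by name at call time, so a patched version is used.
    return httpLib.request('POST', url);
  },
};

const seen = []; // in-memory sink instead of a real exporter

// 1. Hold a reference to the original implementation.
const originalRequest = httpLib.request;

// 2. Replace the function by its name with a wrapper that delegates to it.
httpLib.request = function wrappedRequest(method, url) {
  const response = originalRequest.call(this, method, url);
  seen.push(`${method} ${url} -> ${response.status}`);
  return response;
};

// Callers keep using the public API exactly as before.
const response = httpLib.post('https://example.com/items');
console.log(response.status, seen);
```

Note that the caller only ever touches `httpLib.post`; the wrapper is reached because `post` resolves `httpLib.request` at call time.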
Users of `requests` will not notice a thing: they will continue calling `requests.post` as they did before, while the auto-instrumentation collects the data needed for monitoring, troubleshooting, and many other use cases.
In JS, functions are first-class values and methods are ordinary object properties, so patching a method is as easy as reassigning a property. Let’s look at a simple example:
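A minimal sketch of such an example, assuming a hypothetical `Person` class with a `sayName` method (the method name and the two instances are illustrative choices):

```javascript
class Person {
  constructor(name) {
    this.name = name;
  }

  sayName() {
    return `My name is ${this.name}`;
  }
}

const alice = new Person('Alice');
const bob = new Person('Bob');

// Keep a reference to the original method, then shadow it on this one instance.
const originalSayName = alice.sayName;
alice.sayName = function (...args) {
  console.log('instrumentation: sayName was called');
  return originalSayName.apply(this, args);
};

console.log(alice.sayName()); // goes through the wrapper
console.log(bob.sayName());   // untouched: still uses Person.prototype.sayName
```

Assigning to `alice.sayName` creates an own property on that object only; assigning to `Person.prototype.sayName` instead would affect every instance.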
What we did was simple: we replaced the method implementation on one specific `Person` instance and added an extra print to show how the instrumentation works. As a side note, notice that this change only affects that specific instance of the `Person` class. To patch all instances, we could have simply replaced the method on the class prototype instead.
In Node.js, the entry point for this kind of patching is the `require` function. This is the function that loads modules by their name, triggering the instrumentation process. For example, when the developer calls `require('kafkajs')`, OTel uses the `require-in-the-middle` module to apply changes to the `kafkajs` module. This change wraps the necessary functions in a manner similar to the `Person` example, using the `shimmer` library, and returns the patched module to the user code. From the end user’s perspective, the change is completely transparent.
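To make the idea of hooking `require` more concrete, here is a deliberately simplified sketch. It is not how `require-in-the-middle` or `shimmer` are implemented, and it uses neither library: it overrides `Module.prototype.require` directly, and it patches the built-in `os` module as a stand-in for `kafkajs` so that it runs without installing anything. The `calls` array is a hypothetical replacement for a telemetry exporter.

```javascript
const Module = require('module');

const calls = [];                 // in-memory sink instead of a real exporter
const alreadyPatched = new Set(); // make sure each module is wrapped only once

// Hold a reference to the original loader, as with any other patched function.
const originalRequire = Module.prototype.require;

Module.prototype.require = function patchedRequire(name) {
  const loaded = originalRequire.apply(this, arguments);

  // Only the module we want to instrument is touched ('os' stands in for 'kafkajs').
  if (name === 'os' && !alreadyPatched.has(name)) {
    alreadyPatched.add(name);
    const originalHostname = loaded.hostname;
    loaded.hostname = function wrappedHostname(...args) {
      calls.push('os.hostname');
      return originalHostname.apply(this, args);
    };
  }

  // The (possibly patched) module goes back to the caller as usual.
  return loaded;
};

// User code: an ordinary require call, unaware of the hook.
const os = require('os');
const host = os.hostname();
console.log(typeof host, calls);
```

The user code at the bottom is unchanged from what it would be without instrumentation, which is the transparency described above.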
You may have noticed that this mechanism implicitly assumes that the `require-in-the-middle` hook was set before the call to `require('kafkajs')`. If `kafkajs` (or any other module we are trying to instrument) is loaded before the hook is set, it will simply “miss” its opportunity to have the necessary functions patched. This is a big potential pitfall: it assumes the developer knows exactly where to put the OpenTelemetry initialization code. In many cases this is not trivial, and we have indeed seen many developers “misplace” the OTel initialization code, causing the instrumentation to behave unexpectedly. Data from modules that were `require`d before OTel is missing (typically HTTP frameworks like express/koa), while data from other modules appears properly.
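One way such gaps can show up is sketched below in a simplified form that involves neither OTel nor `require-in-the-middle`. The `framework` object is a hypothetical stand-in for a module's exports, and `instrumentedCalls` stands in for collected telemetry. The point it illustrates is a plain JavaScript one: a reference captured before the patch keeps pointing at the original function.

```javascript
// Hypothetical stand-in for a module's exports (e.g. an HTTP framework).
const framework = {
  handle(path) {
    return `handled ${path}`;
  },
};

const instrumentedCalls = []; // in-memory sink instead of a real exporter

// Application code that ran "too early" and captured the function itself.
const { handle: earlyHandle } = framework;

// Instrumentation is applied only afterwards.
const originalHandle = framework.handle;
framework.handle = function wrappedHandle(path) {
  instrumentedCalls.push(path);
  return originalHandle.call(this, path);
};

// Code that looks the function up after patching goes through the wrapper.
framework.handle('/late');

// The early reference still points at the original, so nothing is recorded.
earlyHandle('/early');

console.log(instrumentedCalls);
```

Both calls succeed, which is what makes the problem easy to overlook: nothing fails, some data is just never collected.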
How is this problem solved?
As described above, using OpenTelemetry in JS requires a good understanding of the application’s initialization flow. A module that is loaded before OTel will not be properly instrumented, and this often happens implicitly through cascading `require`s. But how can you be 100% sure the module was loaded after OTel?
In some cases (AWS Lambda, for instance) the developer may not even have control over the loaded modules, as the Lambda runtime comes with preloaded modules and calls a handler function that the developer provides. In this case, adding the initialization code at the top of the handler file just won’t work. There are other similar examples – where the code runs as part of homegrown microservices templates, whose initialization flow isn’t accessible (and perhaps even known) to the developer.
The most reliable way to avoid these problems is to use the native Node.js `--require` flag to make sure the OTel initialization code runs before anything else. Preloading this code with `--require` ensures that no module is loaded before the `require-in-the-middle` hook is set. The typical way to do this is to create a file with the OpenTelemetry initialization code (let’s call it `otel_init.js`). Assuming the application’s main file is `app.js`, you can either:
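- Replace the `node app.js` command with `node --require ./otel_init.js app.js`.
- If you’re unable to change the command (or prefer not to), setting the `NODE_OPTIONS` environment variable to `--require ./otel_init.js` will also do the trick.
The `./` prefix matters: `--require` follows the resolution rules of `require()`, so a bare `otel_init.js` would be looked up as a package in `node_modules` rather than as a file in the working directory.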
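Both options can be tried out with the small experiment below. It is only a sketch of the load order: the two files it writes to a temporary directory are placeholders that print a line each, not real OpenTelemetry initialization code, and the `./` prefix on the preload path is an assumption made here so that Node resolves the file relative to the working directory.

```javascript
const { execFileSync } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a throwaway project with placeholder files.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'otel-preload-'));
fs.writeFileSync(path.join(dir, 'otel_init.js'), "console.log('otel_init loaded');\n");
fs.writeFileSync(path.join(dir, 'app.js'), "console.log('app loaded');\n");

// Option 1: pass --require on the command line.
const viaFlag = execFileSync(
  process.execPath,
  ['--require', './otel_init.js', 'app.js'],
  { cwd: dir, encoding: 'utf8' }
);

// Option 2: leave the command alone and set NODE_OPTIONS instead.
const viaEnv = execFileSync(process.execPath, ['app.js'], {
  cwd: dir,
  encoding: 'utf8',
  env: { ...process.env, NODE_OPTIONS: '--require ./otel_init.js' },
});

console.log(viaFlag);
console.log(viaEnv);
```

In both runs the placeholder initialization file is executed before the first line of `app.js`, which is the ordering the instrumentation hook depends on.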
What about cases in which `require-in-the-middle` cannot work at all? webpack, for example, bundles the code with all of its dependencies into a single file, and the modules are not loaded using the native `require` function (except for modules that are defined as externals), but rather by a unique identifier allocated by webpack.
How can OTel work in such conditions? Stay tuned for our next blog posts.
To get started with Helios, which leverages OTel’s capabilities to help engineering teams build production-ready cloud-native applications, sign up here.