
How we instrument Node.js

One of our core principles is that any Opbeat agent can be installed in minutes, with little to no configuration, and begin giving users valuable insights right off the bat. However, when developing our Node.js module, upholding this principle wasn’t as simple as I had originally hoped.

I’m the lead Node developer at Opbeat and a member of the Node.js Tracing Working Group under the Node.js Foundation. My daily job involves maintaining the Opbeat Node.js agent, which our users install on their production servers to monitor the health and performance of their applications.

The Curse of the Event Loop

Before I started programming in Node.js, I was a Ruby developer. I was very skilled at making web applications using the Ruby on Rails web framework, and every time I needed to access some property of the current request, it was right there. It was easy because at any given time, Rails was only processing a single request. Rails would get the request, call the controller, render the view, send the generated HTML back to the client and then be ready to serve yet another request. To process multiple simultaneous requests, I’d have to spin up multiple Ruby processes.

I quickly found that Node.js was very different. Due to the event loop, and given enough memory, a single Node.js process can handle thousands of requests simultaneously. But this nifty feature came with a cost: I could no longer just pull the current request out of thin air. The Node HTTP server might be serving multiple requests simultaneously - so which one is it?

The de facto solution to this problem is to pass the request and response objects along to each function that needs them. This works well enough when you control the entire codebase. But it quickly becomes cumbersome if you’re 10 levels deep in a call stack and suddenly realize you need a header from the current request.
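To make the pain concrete, here is a sketch of what explicit passing looks like. All the function names (handleRequest, renderPage, buildGreeting) and the x-user header are made up for illustration; note how every function in the chain has to accept `req` just so the innermost one can read it:

```javascript
// Hypothetical app code: `req` is threaded through every function,
// even though only the innermost one actually uses it.
function handleRequest (req, res) {
  renderPage(req, res)
}

function renderPage (req, res) {
  res.end(buildGreeting(req))
}

function buildGreeting (req) {
  // only this function actually needs the request...
  return 'Hello ' + (req.headers['x-user'] || 'stranger')
}

// minimal stand-ins so the sketch runs without a real server
var fakeReq = { headers: { 'x-user': 'watson' } }
var fakeRes = { end: function (body) { console.log(body) } }
handleRequest(fakeReq, fakeRes)
```

Add a new function three levels down that needs a request header, and three signatures have to change.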

In the Opbeat agent for Node, this is even more problematic. When an uncaught exception is detected in your application, knowing exactly which HTTP request caused it is invaluable for debugging. The Opbeat web interface shows you a performance breakdown by endpoint, so we need to be able to tie things like database queries to their parent HTTP request – without requiring our users to change anything in their applications.

Keeping state

It turns out that to solve the above problem, we need a way to pass state across the async boundary. Anytime a callback is queued on the event loop, we want a way to store some data that we can get back later when that callback is dequeued.
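To see why a hook is needed, consider what happens without one. This is a deliberately contrived sketch: a shared variable gets overwritten before the queued callbacks run, so the context of the first "request" is lost:

```javascript
var currentRequest

function handle (req) {
  currentRequest = req
  // queue an async continuation, as a database call would
  setTimeout(function () {
    // by the time this runs, currentRequest holds whatever request
    // arrived last, not necessarily the one that queued this callback
    console.log('callback sees:', currentRequest.url)
  }, 10)
}

// two "requests" arrive back to back
handle({ url: '/first' })
handle({ url: '/second' })
// both callbacks log '/second' -- the context of '/first' is lost
```

Storing a snapshot of the variable at queue time, and restoring it at dequeue time, is exactly what the steps above do.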

The way we achieve this in Opbeat is by implementing our own hook into the event loop queue/dequeuing functions. Here’s a simplified explanation of the steps involved:

  1. First, we patch every function in Node.js that can queue something on the event loop so we can be notified every time a callback is queued to or dequeued from the event loop
  2. Then we patch the core http/https servers to get access to all incoming HTTP requests. We store the latest HTTP request in a variable that we can easily access, say currentRequest
  3. Every time a callback is queued on the event loop, we store the current value of currentRequest in a temporary variable
  4. Using a closure to access the temporary variable, every time a callback is dequeued from the event loop, we restore the currentRequest variable with the value of the temporary variable

A quick example

I’ll show how this solution can be implemented using vanilla Node. However, in reality you would likely want to use a shimming module like shimmer because monkey patching is evil! This example is simplified to fit in a blog post and to make it easier to understand, so you probably shouldn’t use it in production ;)
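For reference, shimmer’s core API is wrap(object, methodName, wrapper), where the wrapper receives the original function and returns its replacement. Here is a tiny self-contained helper with the same shape – an illustration of the pattern, not shimmer itself:

```javascript
// Minimal stand-in for shimmer-style wrapping: the wrapper receives
// the original function and returns the function that replaces it.
function wrap (obj, name, wrapper) {
  var original = obj[name]
  obj[name] = wrapper(original)
}

// Example: count calls to Math.max without changing its behavior
var calls = 0
wrap(Math, 'max', function (original) {
  return function () {
    calls++
    return original.apply(this, arguments)
  }
})

console.log(Math.max(1, 2, 3)) // still 3, but now counted
console.log(calls)             // 1
```

The real shimmer module adds safety checks and an unwrap step on top of this idea.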

Here we go. Let’s create a file named async-hook.js and have it patch the HTTP server to store the latest incoming HTTP request in a local variable:

var http = require('http')

// placeholder for the currently active request
var currentRequest

// patch the HTTP server to store a reference to the last request it received
var _createServer = http.createServer
http.createServer = function (onRequest) {
  return _createServer.call(http, function (req, res) {
    // store the current request for later use
    currentRequest = req
    return onRequest(req, res)
  })
}

In the same file, let’s patch every Node core function that queues callbacks on the event loop and have them restore the content of the currentRequest variable every time a callback is dequeued:

// patch places where callbacks are put on the event loop
global.setTimeout = wrapCallbackFirst(global, 'setTimeout')
global.setInterval = wrapCallbackFirst(global, 'setInterval')
global.setImmediate = wrapCallbackFirst(global, 'setImmediate')
process.nextTick = wrapCallbackFirst(process, 'nextTick')
// ...etc etc (this is by no means a complete list of functions that
// perform async operations)

function wrapCallbackFirst (mod, name) {
  var orig = mod[name]
  return function () {
    // store the current request so we can restore it later
    var req = currentRequest

    // clone arguments so we can inject our own callback
    var args = []
    for (var n = 0; n < arguments.length; n++) {
      args[n] = arguments[n]
    }

    // inject our own callback
    var fn = args[0]
    args[0] = function () {
      // restore the current request from the closure
      currentRequest = req
      // call the original callback
      fn.apply(this, arguments)
    }

    orig.apply(mod, args)
  }
}

Notice how we use a closure to keep a reference to the value of currentRequest as the callback is queued on the event loop.
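The closure trick is easier to see in isolation. In this small sketch (the schedule helper is illustrative, not part of the agent), each queued callback captures its own snapshot of the variable at queue time, so restoring it at dequeue time gives each callback the right value:

```javascript
var current

function schedule (fn) {
  var captured = current        // snapshot at queue time
  setTimeout(function () {
    current = captured          // restore at dequeue time
    fn()
  }, 10)
}

current = 'A'
schedule(function () { console.log(current) }) // logs 'A'
current = 'B'
schedule(function () { console.log(current) }) // logs 'B'
```

Without the snapshot-and-restore, both callbacks would see 'B', just as in the problem shown earlier.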

Finally, let’s also add an uncaughtException listener and have it log the active request before terminating the program:

// in case of an uncaught exception, log the request that caused it
process.on('uncaughtException', function (err) {
  var req = currentRequest
  if (req) {
    console.log('Error during HTTP request:', req.method, req.url)
  } else {
    console.log('Error outside the scope of an HTTP request!')
  }
  console.log(err.stack)
  process.exit(1)
})

The app

To test this code, let’s build a small app that requires the async-hook.js file and creates an HTTP server that performs an async operation:

require('./async-hook')
var http = require('http')

http.createServer(function (req, res) {
  console.log(req.method, req.url)

  fetchData(function (err, result) {
    if (err) throw err
    res.end(result)
  })
}).listen(3000)

function fetchData (callback) {
  // perform async operation...
  setTimeout(function () {
    maybeFail()
    callback(null, 'Hello World!')
  }, 2000)
}

// will fail 1 out of 4 times
function maybeFail () {
  if (Math.random() < 0.25) throw new Error('Boom!')
}

Notice how the async fetchData function has a 25% risk of throwing an exception every time a request is received by the HTTP server.

Now let’s take the app for a spin. After starting the app, let’s send 10 simultaneous HTTP requests to the server and see which of them fail:

$ for i in {1..10}; do curl -s http://localhost:3000/$i & done

Example output from running the app:

$ node app.js
GET /1
GET /3
GET /2
GET /4
GET /5
GET /6
GET /10
GET /9
GET /7
GET /8
Error during HTTP request: GET /4
Error: Boom!
    at maybeFail (/Users/watson/code/async-hook/app.js:23:35)
    at null.<anonymous> (/Users/watson/code/async-hook/app.js:16:5)
    at args.(anonymous function) [as _onTimeout] (/Users/watson/code/async-hook/async-hook.js:54:10)
    at Timer.listOnTimeout (timers.js:92:15)

As you can see, the output clearly shows 10 incoming HTTP requests, and that the request for GET /4 failed with an uncaught exception.

What’s next?

Today, patching Node core is necessary to achieve the level of insights we want to give our customers. Patching always calls for careful consideration and requires constant maintenance: modules that use native bindings don’t allow for easy patching, and overhead has to be kept at an absolute minimum.

The Node.js Tracing Working Group is currently working on a new API in Node core called AsyncWrap that will make instrumenting Node.js less patchy. The AsyncWrap API will be backported to Node.js v4.x when it’s released.

To get instant performance insights, built for Node developers, sign up for Opbeat.

About the author
Thomas Watson is our Node.js lead, speaker and committed open source contributor.
You can follow him on Twitter or GitHub.