Introducing boomcatch

Posted by Phil Booth

Boomcatch is a standalone, node.js-based beacon server for boomerang, the foremost client-side RUM library.

RUM, boomerang and the W3C Navigation Timing API

‘RUM’ is an acronym for real-user monitoring, the technique of collecting performance metrics directly from your users’ browsers, rather than from test clients running in your own infrastructure (something commonly referred to as ‘synthetic’ monitoring).

RUM has a number of qualities that distinguish it from synthetic monitoring:

  • It provides real-time visibility of the website performance that users are actually experiencing.

  • It can be used to implement early-warning systems that alert you to performance problems before they get reported by users.

  • It enables the validation of performance-related fixes in the field, reducing the risk of environmental factors, such as local network performance, interfering with results.

With such goals in mind, the W3C’s Web Performance Working Group produced the Navigation Timing API, which allows JavaScript running in a conformant browser to get precise timings for the various parts of each request and response. Unfortunately, the Navigation Timing API is not implemented in any version of Safari or in older versions of Internet Explorer and Opera, so statistics collected through it alone are biased towards the browsers that do support it. Step forward boomerang, which takes timings in a cross-browser fashion and also sends data from the Navigation Timing API when available.

Boomerang works by sending data in the query string of a GET request to a URL of your choosing. This request is straightforward enough to handle when you have a single back-end responsible for your whole site. But, at Science and Education, our front-end resources are shared across a wide range of different sites and back-end technologies. Rather than handle boomerang requests separately in each of those environments, it made sense for us to run a single dedicated server to handle all such requests. To that end, we wrote boomcatch, a standalone server that runs on node.js.
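To make the beacon format concrete, here is a minimal sketch of reading the parameters from such a request in node.js. The request URL and its values are invented for illustration, and parseBeacon is a hypothetical helper rather than part of boomcatch. t_done and u are the names boomerang uses for the page load time and the page URL, but consult the boomerang documentation for the full set of parameters.

```javascript
var querystring = require('querystring');

// Hypothetical beacon request URL; the values are made up.
var beaconUrl = '/beacon?t_done=1234&t_resp=210&u=http%3A%2F%2Fwww.example.com%2F';

// Return the query string parameters of a request URL as an object.
function parseBeacon(requestUrl) {
    var index = requestUrl.indexOf('?');

    if (index === -1) {
        return {};
    }

    return querystring.parse(requestUrl.slice(index + 1));
}

var data = parseBeacon(beaconUrl);

console.log(data.t_done); // 1234
console.log(data.u);      // http://www.example.com/
```

Note that every value arrives as a string, so timings need converting to numbers before any arithmetic is done on them.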

Extensions

A principal aim of boomcatch is to be completely agnostic about what happens to the data after it has been received from boomerang. As such, it contains two extension points that enable you to customise its behaviour: mappers and forwarders.

Mappers transform the data into an appropriate format for whatever subsequent processing is to be performed on it. At the time of writing, one mapper is available out-of-the-box, which produces output ready to be consumed by statsd.
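For readers unfamiliar with statsd, it accepts plain-text metrics of the form name:value|type, where the type ms denotes a timer. The function below is only an illustration of that wire format. It is not the boomcatch mapper and does not follow the extension interface, which is described in the readme. The function name, prefix and metric names are all hypothetical.

```javascript
// Illustrative only: turn an object of timings into statsd timer lines,
// one metric per line.
function toStatsdTimers(prefix, timings) {
    return Object.keys(timings).map(function (name) {
        return prefix + name + ':' + timings[name] + '|ms';
    }).join('\n');
}

console.log(toStatsdTimers('mystats.rum.', { load: 1234, firstbyte: 210 }));
// mystats.rum.load:1234|ms
// mystats.rum.firstbyte:210|ms
```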

Forwarders do the work of sending the data on to stats consumers. Right now, two forwarders have been implemented, which can send the mapped data over UDP or HTTP.
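As a rough idea of what forwarding over UDP involves, the sketch below sends a single string to a host and port using node's dgram module. It is an assumption-laden illustration rather than the boomcatch forwarder, and sendOverUdp is a hypothetical name.

```javascript
var dgram = require('dgram');

// Illustrative only: send one string to a host and port over UDP,
// then close the socket and report any error to the callback.
function sendOverUdp(message, host, port, callback) {
    var socket = dgram.createSocket('udp4'),
        buffer = Buffer.from(message);

    socket.send(buffer, 0, buffer.length, port, host, function (error) {
        socket.close();
        callback(error);
    });
}

// Example call, left commented out so that nothing is sent on load:
// sendOverUdp('mystats.rum.load:1234|ms', '127.0.0.1', 8125, function (error) {
//     if (error) { console.log(error.message); }
// });
```

Because UDP is connectionless, a successful send says nothing about whether the data was received, which is usually an acceptable trade-off for high-volume stats.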

We expect to add further mappers and forwarders in the near future, but it is also very easy to specify your own custom extensions at runtime, as long as they match the interface described in the readme.

Security

As with any publicly accessible web server, boomcatch may become the target of accidental or intentional abuse.

Accidental abuse occurs when your markup is copied for use elsewhere and inadvertently includes the client-side RUM script pointing at your server. This can easily be mitigated by rejecting requests that do not carry an appropriate value in the referer header of the HTTP request. Boomcatch provides a command-line option for specifying a regular expression to match against the referer header for that purpose.
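The idea behind referer checking can be sketched in a few lines. This is not boomcatch's own code; the function name and the pattern are hypothetical, and the sketch chooses to reject requests that carry no referer at all, which may or may not be what you want.

```javascript
// Illustrative only: accept a request when its referer matches the pattern.
function isAllowedReferer(referer, pattern) {
    return typeof referer === 'string' && pattern.test(referer);
}

// Hypothetical pattern: pages served over HTTP(S) from a subdomain of example.com.
var refererPattern = /^https?:\/\/\w+\.example\.com\//;

console.log(isAllowedReferer('http://www.example.com/article', refererPattern));     // true
console.log(isAllowedReferer('http://copycat.example.org/article', refererPattern)); // false
console.log(isAllowedReferer(undefined, refererPattern));                            // false
```

Bear in mind that the referer header holds a full URL rather than just a host name, so patterns need to allow for the scheme and path.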

Intentional attacks are more difficult to handle. One approach is rate-limiting requests based on the originating IP address. Typically you would configure this in your load balancer, but boomcatch also implements a fairly crude rate-limiter that can be used if necessary. Rate-limiting is not without problems, however. Although it protects server resources, it does nothing to ensure the integrity of your data. Worse, it can be circumvented completely by proxying beacon requests or distributing them across multiple clients.
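A crude rate-limiter of the kind described above might look something like the following sketch, which is not taken from boomcatch. It remembers when each IP address was last allowed through and rejects anything arriving sooner than the limit. The names are hypothetical, and the timestamp is passed in as an argument purely to keep the example deterministic.

```javascript
// Illustrative only: allow at most one request per `limit` milliseconds
// from each IP address.
function createRateLimiter(limit) {
    var lastAllowed = Object.create(null);

    return function (address, now) {
        var previous = lastAllowed[address];

        if (previous !== undefined && now - previous < limit) {
            return false;
        }

        lastAllowed[address] = now;
        return true;
    };
}

var isAllowed = createRateLimiter(100);

console.log(isAllowed('192.0.2.1', 1000)); // true
console.log(isAllowed('192.0.2.1', 1050)); // false
console.log(isAllowed('192.0.2.1', 1100)); // true
console.log(isAllowed('192.0.2.2', 1050)); // true
```

The sketch never forgets an address, so a real implementation would also need to expire old entries to keep memory use bounded.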

A better solution is to generate a unique, single-use nonce for each client, then include that nonce in your beacon request for the server to validate. That way, only genuine clients are able to send data successfully. Boomerang’s addVar function can be called to include the nonce in beacon requests. Validating the nonce is outside the scope of boomcatch though, since it involves reading and updating the nonce database, which itself is coupled to the back-end that serves your website. Instead, an extension point exists that allows you to plug in your own function for validating nonces. Each time a validation function returns false, boomcatch fails the beacon request with an HTTP 400 status.
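To illustrate the single-use property, here is a minimal in-memory sketch of issuing and validating nonces. It is an assumption about how your back-end might behave, not part of boomcatch, and it does not follow the validator extension interface, which is described in the readme. All names are hypothetical. On the client, the issued value would be attached to the beacon with something like BOOMR.addVar('nonce', value), where the parameter name nonce is likewise only an example.

```javascript
var crypto = require('crypto');

// Illustrative only: an in-memory store of single-use nonces.
function createNonceStore() {
    var issued = Object.create(null);

    return {
        // Called by the back-end when it serves a page.
        issue: function () {
            var nonce = crypto.randomBytes(16).toString('hex');
            issued[nonce] = true;
            return nonce;
        },

        // Called when a beacon arrives. Each nonce is accepted once only.
        validate: function (nonce) {
            if (typeof nonce !== 'string' || !issued[nonce]) {
                return false;
            }

            delete issued[nonce];
            return true;
        }
    };
}

var store = createNonceStore(),
    nonce = store.issue();

console.log(store.validate(nonce));   // true
console.log(store.validate(nonce));   // false, already used
console.log(store.validate('bogus')); // false, never issued
```

In practice the store would live in a database shared between the back-end that serves your pages and the validation function, which is exactly why it sits outside boomcatch.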

Running boomcatch from the command line

If you’ve installed node.js, you can install boomcatch globally with npm:

npm install -g boomcatch

You can then start a server with the default options by running boomcatch with no arguments:

boomcatch

By default, boomcatch will map data for statsd and send it over UDP to 127.0.0.1 on port 8125.

These options and others can be overridden on the command line. To list the available command line options, run:

boomcatch --help

At the time of writing, valid options are:

  • --host <name>: Host name to accept HTTP connections on. The default is 0.0.0.0 (any host).

  • --port <port>: Port to accept HTTP connections on. The default is 80.

  • --path <path>: URL path to accept requests to. The default is /beacon.

  • --referer <regex>: HTTP referers to accept requests from. The default is .*.

  • --limit <milliseconds>: Minimum elapsed time to allow between requests from the same IP address. The default is 0.

  • --silent: Prevent the command from logging output to the console.

  • --validator <path>: Validator used to accept or reject requests. The default is permissive.

  • --mapper <path>: Data mapper used to transform data before forwarding. The default is statsd.

  • --prefix <prefix>: Prefix for mapped metric names. The default is the empty string (no prefix).

  • --forwarder <path>: Forwarder used to send data, loaded with require. The default is udp.

  • --fwdHost <name>: Host name to forward mapped data to. The default is 127.0.0.1.

  • --fwdPort <port>: Port to forward mapped data on. The default is 8125.

Calling boomcatch from your own code

It is also possible to start a boomcatch server programmatically from another node.js project. You should add boomcatch to the dependencies in your project’s package.json before running:

npm install

You can then require boomcatch and call the returned object’s listen method:

var path = require('path'),
    boomcatch = require('boomcatch');

boomcatch.listen({
    host: 'rum.example.com',                  // Defaults to '0.0.0.0'
    port: 8080,                               // Defaults to 80
    path: '/perf',                            // Defaults to '/beacon'
    referer: /^\w+\.example\.com$/,           // Defaults to /.*/
    limit: 100,                               // Defaults to 0
    log: console.log,                         // Defaults to `function () {}`
    validator: path.resolve('./myvalidator'), // Defaults to 'permissive'
    mapper: path.resolve('./mymapper'),       // Defaults to 'statsd'
    prefix: 'mystats.rum.',                   // Defaults to ''
    forwarder: path.resolve('./myforwarder'), // Defaults to 'udp'
    fwdHost: '192.168.50.4',                  // Defaults to '127.0.0.1'
    fwdPort: 5001                             // Defaults to 8125
});

Development

We welcome contributions in the form of pull requests and issues.

If you want to hack on the code, you can clone the git repo:

git clone git@github.com:nature/boomcatch.git

From the project root, you can then install the dependencies:

npm install

Lint the code:

npm run lint

Run the unit tests:

npm test

Please ensure that you have adhered to the contribution guidelines before submitting any pull requests.

