At StrongLoop, we develop tools to support development and operations throughout the entire lifecycle of API development. We first released Arc Profiler to help you understand the performance characteristics of your Node application. Next, we added Arc Metrics to provide real-time visibility into your staging and production environments. Today we’re announcing the latest Arc module, Tracing (currently in public beta), which enables you to perform root cause analysis and triage incidents in production. (You can watch a short overview and demo of the Tracing module here.)
Those who have cut their teeth on the seemingly endless iterations of the dev lifecycle will appreciate this satirical spin on Dorothy’s line from The Wizard of Oz:
“Toto, I don’t think we’re in staging anymore… There’s no place like production… There’s no place like production…”
Simulated load, automated testing, and all the CI magic in the world won’t prepare you for the “gotchas” that can happen in production. If you’re lucky, you’ll have a canary that keels over as soon as you enter the production mine. But what then?
The answer is Tracing. The Arc Tracing module lets you call in the artillery when you need it. If you see something of interest in Metrics, open Tracing and you’ll see a timeline of memory and CPU usage.
Understanding the Timeline
Locate the point of interest (more often than not a CPU or memory spike, visible as a peak in the line) and start to drill down. When you’ve located the incident, freeze that time slice by clicking on the chart. This draws a black line marking the selected time slice and starts your drill-down.
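To make "find the peak and freeze its slice" concrete, here is a minimal TypeScript sketch, assuming metrics arrive as timestamped samples. The `Sample` shape, `findPeak`, and `sliceFor` are hypothetical names for illustration, not part of Arc. `SLICE_MS` mirrors the 20-second tracing interval mentioned at the end of this post, and the 5-hour window works out to 900 such slices.

```typescript
// Hypothetical sample shape; Arc's internal data model is not described in this post.
interface Sample {
  timestampMs: number; // when the sample was taken (ms)
  value: number;       // e.g. CPU % or heap MB at that moment
}

// Assumption: slices follow the 20-second tracing interval.
const SLICE_MS = 20_000;
// 5 hours of timeline = 18,000,000 ms / 20,000 ms = 900 slices.
const MAX_SLICES = (5 * 60 * 60 * 1000) / SLICE_MS;

// Return the sample with the highest value (the "peak in the line").
function findPeak(samples: Sample[]): Sample | undefined {
  let peak: Sample | undefined;
  for (const s of samples) {
    if (peak === undefined || s.value > peak.value) peak = s;
  }
  return peak;
}

// Map a timestamp to the 20-second slice that contains it: [startMs, endMs).
function sliceFor(timestampMs: number): { startMs: number; endMs: number } {
  const startMs = Math.floor(timestampMs / SLICE_MS) * SLICE_MS;
  return { startMs, endMs: startMs + SLICE_MS };
}

const samples: Sample[] = [
  { timestampMs: 0, value: 12 },
  { timestampMs: 20_000, value: 15 },
  { timestampMs: 45_000, value: 87 }, // the spike
  { timestampMs: 60_000, value: 14 },
];

const peak = findPeak(samples);
if (peak) {
  const slice = sliceFor(peak.timestampMs);
  console.log(`peak ${peak.value} in slice [${slice.startMs}, ${slice.endMs})`);
  // prints "peak 87 in slice [40000, 60000)"
}
```

Freezing a slice then simply means scoping every deeper view (trace sequences, waterfalls, flame graph) to that `[startMs, endMs)` window.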
Level 1: Trace sequences
The first level you’ll see when drilling down is a list of trace sequences. Trace sequences correspond to the tiers of your application architecture, which are exactly where your application may be misbehaving. Time can be:
- Spent generating the request on the client.
- Spent receiving the client request and running any business logic within the Node.js-based API tier.
- Spent querying and/or updating the persistent store.
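One way to picture a trace sequence is as a record of time attributed to each tier. The following TypeScript sketch uses hypothetical types (`Tier`, `TraceSequence`, `dominantTier`) that are not Arc's data model; it just shows how attributing time per tier immediately points you at the tier worth drilling into.

```typescript
// Hypothetical tiers matching the three kinds of time listed above.
type Tier = "client" | "api" | "datastore";
const TIERS: Tier[] = ["client", "api", "datastore"];

// Hypothetical shape for one trace sequence: milliseconds spent in each tier.
interface TraceSequence {
  route: string;                 // e.g. "GET /products"
  timings: Record<Tier, number>; // ms attributed to each tier
}

// Return the tier that consumed the most time and its share of the total.
function dominantTier(seq: TraceSequence): { tier: Tier; share: number } {
  let worst: Tier = "client";
  let total = 0;
  for (const t of TIERS) {
    total += seq.timings[t];
    if (seq.timings[t] > seq.timings[worst]) worst = t;
  }
  return { tier: worst, share: total === 0 ? 0 : seq.timings[worst] / total };
}

const seq: TraceSequence = {
  route: "GET /products",
  timings: { client: 5, api: 30, datastore: 165 },
};
const hot = dominantTier(seq);
console.log(`${seq.route}: ${hot.tier} accounts for ${(hot.share * 100).toFixed(1)}% of the time`);
```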
In the screenshot above, the Node.js app is serving an HTTP response to a GET request, and that is the only operation consuming computing resources.
Level 2: Waterfalls
The second level of the drill-down gives you a sense of how much code is running synchronously versus asynchronously. Why? Because you’ll want to quickly rule out the usual suspects that bog down a Node application. How many times have you required a module you believed ran its operations asynchronously with a callback, only to find out that behind the scenes it was blocking, running a set of operations synchronously and serially? Waterfalls stitch together time spent in code across asynchronous boundaries.
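The "looks async, actually blocks" trap can be demonstrated in a few lines. In this TypeScript sketch, `readConfigBlocking` and `readConfigDeferred` are hypothetical stand-ins for a module's API, and `callsBackSynchronously` is an illustrative check, not a Tracing feature. It tests whether a callback fires before the function returns to its caller, which is the tell-tale sign that the work happened on the current call stack.

```typescript
type Callback = (err: Error | null, data?: string) => void;

// Hypothetical module function: takes a callback, so it *looks* asynchronous,
// but does all of its work synchronously and calls back before returning.
function readConfigBlocking(cb: Callback): void {
  let checksum = 0;
  for (let i = 0; i < 100_000; i++) checksum = (checksum + i) % 97; // stand-in for blocking work
  cb(null, `checksum=${checksum}`);
}

// Hypothetical counterpart that defers its callback to a later microtask.
function readConfigDeferred(cb: Callback): void {
  Promise.resolve().then(() => cb(null, "ok"));
}

// True if `fn` invokes its callback before returning to the caller.
function callsBackSynchronously(fn: (cb: Callback) => void): boolean {
  let called = false;
  fn(() => {
    called = true;
  });
  return called;
}

console.log(callsBackSynchronously(readConfigBlocking)); // true
console.log(callsBackSynchronously(readConfigDeferred)); // false
```

Note that deferring the callback only moves it off the current stack; CPU-heavy work still blocks the event loop whenever it runs. Genuinely non-blocking modules hand I/O off to Node’s underlying thread pool or the OS, which is the kind of behavior a waterfall makes visible.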
A waterfall consists of two parts represented as separate bars:
- Top bar: time spent executing the function call
- Bottom bar: time spent executing the callback function
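The two bars map naturally onto timestamps around a callback-style call. The TypeScript sketch below is a hypothetical wrapper (`traced`, `WaterfallEntry`) written for illustration, not how Arc’s probes are implemented. It records when a function is called and returns (top bar) and when its callback starts and finishes (bottom bar).

```typescript
// One waterfall entry: the two bars described above.
interface WaterfallEntry {
  name: string;
  callStartMs: number;      // top bar: function call begins
  callEndMs: number;        // top bar: function returns to its caller
  callbackStartMs?: number; // bottom bar: callback begins
  callbackEndMs?: number;   // bottom bar: callback returns
}

type NodeCallback<T> = (err: Error | null, result?: T) => void;

// Hypothetical wrapper: each invocation of the returned function pushes an entry to `log`.
function traced<T>(
  name: string,
  fn: (cb: NodeCallback<T>) => void,
  log: WaterfallEntry[],
): (cb: NodeCallback<T>) => void {
  return (cb) => {
    const entry: WaterfallEntry = { name, callStartMs: Date.now(), callEndMs: 0 };
    log.push(entry);
    fn((err, result) => {
      entry.callbackStartMs = Date.now();
      try {
        cb(err, result);
      } finally {
        entry.callbackEndMs = Date.now();
      }
    });
    // Set after fn returns; if fn called back synchronously, the top bar
    // will fully contain the bottom bar.
    entry.callEndMs = Date.now();
  };
}
```

For a genuinely asynchronous function, the bottom bar starts after the top bar ends, with a gap where the event loop was free. For a function that calls back synchronously, the bottom bar sits inside the top bar, which is the pattern to look for when a supposedly async module is blocking.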
While hovering over each waterfall bar, you can view its details in the Inspector pane on the right. As you can see in the screenshot above, the culprit for the memory leak is the aptly named –
Level 3: Flame graph and call stack trace
The third and final level provides a visualization in the form of a flame graph. A flame graph depicts all function calls and callbacks like the classic pineapple upside-down cake (in our case, it’s right-side up), with the encapsulating function at the bottom and its nested calls stacked on top. Each Node module is color-coded differently. The flame graph shows you where time is spent within the application. The screenshot above shows various Express functions culminating in the
The function call stack trace is also shown in tree form below the flame graph. Hovering over the flame graph highlights the corresponding function line in the tree, and vice versa.
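Flame graphs are commonly built by aggregating sampled call stacks into a tree, where each frame’s width is the number of samples that passed through it. This post doesn’t say how Arc builds its graph, so treat the following TypeScript sketch as an illustration of the general technique. `buildFlameGraph`, `render`, and the function names in the samples are all hypothetical.

```typescript
// A flame graph node: a frame, how many samples passed through it,
// and the frames it called (stacked above it in the graph).
interface FlameNode {
  name: string;
  value: number;
  children: Map<string, FlameNode>;
}

// Build a tree from sampled stacks, each ordered root-first (caller before callee).
function buildFlameGraph(stacks: string[][]): FlameNode {
  const root: FlameNode = { name: "root", value: 0, children: new Map() };
  for (const stack of stacks) {
    root.value += 1;
    let node = root;
    for (const frame of stack) {
      let child = node.children.get(frame);
      if (!child) {
        child = { name: frame, value: 0, children: new Map() };
        node.children.set(frame, child);
      }
      child.value += 1;
      node = child;
    }
  }
  return root;
}

// Render the tree as indented lines, widest (most-sampled) frames first.
function render(node: FlameNode, depth = 0, out: string[] = []): string[] {
  out.push(`${"  ".repeat(depth)}${node.name} (${node.value})`);
  const kids = Array.from(node.children.values()).sort((a, b) => b.value - a.value);
  for (const kid of kids) render(kid, depth + 1, out);
  return out;
}

const samples: string[][] = [
  ["app", "handleRequest", "loadProducts", "serialize"],
  ["app", "handleRequest", "loadProducts", "serialize"],
  ["app", "handleRequest", "loadProducts", "queryDb"],
  ["app", "handleRequest", "logRequest"],
];
console.log(render(buildFlameGraph(samples)).join("\n"));
// root (4)
//   app (4)
//     handleRequest (4)
//       loadProducts (3)
//         serialize (2)
//         queryDb (1)
//       logRequest (1)
```

The indented output is essentially the tree view shown below the flame graph; drawing each node as a bar whose width is proportional to `value`, with children stacked on top, gives the flame graph itself. Classic flame graphs order siblings alphabetically rather than by width; sorting by width here just makes the text output easier to scan.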
A couple of things you should know:
- Tracing incurs 10-15% overhead on your processes.
- Tracing occurs in 20-second intervals, and the timeline can display up to 5 hours.
- Trace sequences have been augmented with the same Strong Agent probes used in Arc Metrics, so Tracing can show you time spent in MySQL, PostgreSQL, Oracle, Redis, MongoDB, Memcached, and Memcache.
- Get started with transaction tracing by signing up for StrongLoop Arc and unlocking the tracing module by contacting us at firstname.lastname@example.org.
- Read the in-depth “Node.js Transaction Tracing by Example” blog post.
- Want to see the tracing module in action? Sign up for our free “Node.js Transaction Tracing” webinar happening on Friday, June 26 at 10 AM Pacific.
- Learn more about how tracing works in the official documentation.