Please note that as of Aug 3, 2015, StrongOps has been EOL’d. Check out StrongLoop Arc for the same capabilities and more.
A few weeks ago our friend Matt Debergalis from Meteor reached out and said they’d been seeing intermittent slowdowns on Percolate Studio’s AtmosphereJS, the package management system for Meteor packages. They could reproduce the problem with enough time and load on certain applications and suspected an issue with libuv or with how the event loop interacts with the fibers infrastructure in Meteor. Could StrongLoop lend some of its expertise to help get to the root of the problem? To investigate, we worked with the Percolate team to use the pre-release version of lapse, an upcoming strong-agent feature.
- The strong-agent module is StrongLoop’s monitoring agent that pipes performance data to either the StrongOps console or your favorite visualization tool like DataDog or Graphite.
- Lapse is a tool that monitors the event loop; when it detects a stall, it starts profiling the application with a built-in sampling CPU profiler.
Upon running with lapse for some time they were able to catch the problem. For now lapse creates a log file because we don’t yet have a visualization. In analyzing the log file, the critical portions I noticed looked like this:
ticks parent name
2274 7.3% v8::internal::Isolate::FindOrAllocatePerThreadDataForThisThread()
1325 58.3% LazyCompile: ~<anonymous> packages/meteor.js:683
1325 100.0% LazyCompile: _tickCallback node.js:399
FindOrAllocatePerThreadDataForThisThread() is a small and fast function so for it to show up so prominently, something must be calling it frequently.
node-fibers, a module that implements coroutines and is used by Meteor for its work queue, hacks the thread-local storage to trick V8 into running multiple JS execution contexts in the same VM, where each execution context is mapped to a coroutine.
FindOrAllocatePerThreadDataForThisThread() is involved when switching from one coroutine to another.
Coroutines are cooperative: the current coroutine has to yield control before another one can run, and that is what Meteor does in its process.nextTick() callback. It essentially builds concurrent (but not parallel) green threads on top of a simple round-robin scheduler.
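To make the idea of cooperative green threads concrete, here is a minimal sketch of a round-robin scheduler. It is an illustration only: it uses plain JavaScript generators rather than node-fibers, it is not Meteor’s code, and the names worker and runRoundRobin are hypothetical. For brevity the scheduler drains its queue synchronously in one go.

```javascript
// Each "green thread" is a generator. `yield` hands control back to the
// scheduler; nothing else runs until the current coroutine yields.
function* worker(name, steps, log) {
  for (let i = 1; i <= steps; i += 1) {
    log.push(`${name}:${i}`);
    yield;
  }
}

// Hypothetical round-robin scheduler: resume the coroutine at the head of
// the queue, then move it to the back if it still has work to do.
function runRoundRobin(coroutines) {
  const queue = coroutines.slice();
  while (queue.length > 0) {
    const current = queue.shift();
    const { done } = current.next();
    if (!done) {
      queue.push(current);
    }
  }
}

const log = [];
runRoundRobin([worker('a', 2, log), worker('b', 3, log)]);
console.log(log.join(' ')); // a:1 b:1 a:2 b:2 b:3
```

The work is interleaved (concurrent) but only one coroutine runs at any moment (not parallel). Note that while the scheduler loop is running, nothing else on the same thread gets a turn.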
Knowing this and looking at the lapse log we were able to deduce that somehow Meteor’s work queue blocks the event loop every now and then. We can also deduce that it’s not a single work item that is responsible for the stall because none show up in the log file and we wouldn’t be seeing so much activity related to thread-local storage in that case.
From that we can conclude that occasionally the work queue must fill up with many small tasks. But how can that block the event loop for so long?
process.nextTick() has a failsafe mechanism where it will process only so many tick callbacks before deferring the remaining ones to the next event loop tick.
It turns out that the native MongoDB driver disabled the failsafe to silence a warning message that was added in Node v0.10. It is generally inadvisable for libraries to change global settings; the side effects may have unforeseen consequences, as they did in this case.
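The effect of disabling the failsafe can be sketched with a simplified model of a tick queue. This is not Node’s actual implementation: drainTickQueue, makeWorkload and the limit of 1000 are assumptions chosen for illustration. The model only shows that a bounded drain hands control back to the event loop, while an unbounded one keeps going until the queue is empty.

```javascript
// Simplified model of a tick queue, NOT Node's real implementation.
// `maxTickDepth` stands in for the failsafe: at most that many callbacks
// run in one turn before control would return to the event loop.
function drainTickQueue(queue, maxTickDepth) {
  let processed = 0;
  while (queue.length > 0 && processed < maxTickDepth) {
    const callback = queue.shift();
    callback();
    processed += 1;
  }
  return processed;
}

// Hypothetical workload: many small tasks, each of which schedules a
// follow-up task until `total` tasks have been created.
function makeWorkload(total) {
  const queue = [];
  let created = 1;
  function task() {
    if (created < total) {
      created += 1;
      queue.push(task);
    }
  }
  queue.push(task);
  return queue;
}

// With the failsafe, one turn stops after 1000 callbacks and work remains.
const guarded = makeWorkload(5000);
const guardedRun = drainTickQueue(guarded, 1000);
console.log(guardedRun, guarded.length); // 1000 1

// With the failsafe disabled, one turn runs every task before returning.
const unguarded = makeWorkload(5000);
const unguardedRun = drainTickQueue(unguarded, Infinity);
console.log(unguardedRun, unguarded.length); // 5000 0
```

In the unguarded case, timers and I/O callbacks would have to wait until all 5000 small tasks are done, which matches the kind of stall described above.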
The workaround turned out to be surprisingly easy: after Meteor switched from process.nextTick() to setImmediate(), the problem went away entirely. See the CPU profile graph below:
The event loop blockage appears to be solved, along with a much improved CPU usage profile, at the point when the change away from process.nextTick() was deployed. The change in behavior can be seen on 7/4 at 00:00 hrs. The only CPU spike beyond that point is a forced one for validating that the graphs are actually working!
Additionally, the nextTick issue was previously forcing the Percolate team to restart the webservers to recover them (in the graph this is what causes the CPU to drop back from 100%). Resolving the root cause helped them get past the midnight wake-up calls.
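For readers who want to see the scheduling difference for themselves, the sketch below runs a chain of small tasks and records how many had completed when a 0 ms timer fired. It is a toy reproduction, not Meteor’s work queue: tasksBeforeTimer and the task count of 50000 are hypothetical. Current Node versions have no tick-depth failsafe, so process.nextTick() here behaves like the failsafe-disabled case in the story.

```javascript
// Hypothetical helper: runs `count` small tasks, each rescheduling the next
// one with `schedule`, and resolves with the number of tasks that had
// already run when a 0 ms timer got its turn on the event loop.
function tasksBeforeTimer(schedule, count) {
  return new Promise((resolve) => {
    let done = 0;
    let seenByTimer = null;

    function maybeResolve() {
      if (seenByTimer !== null && done === count) {
        resolve(seenByTimer);
      }
    }

    setTimeout(() => {
      seenByTimer = done;
      maybeResolve();
    }, 0);

    function step() {
      done += 1;
      if (done < count) {
        schedule(step);
      } else {
        maybeResolve();
      }
    }
    schedule(step);
  });
}

async function main() {
  const withNextTick = await tasksBeforeTimer((fn) => process.nextTick(fn), 50000);
  const withSetImmediate = await tasksBeforeTimer((fn) => setImmediate(fn), 50000);
  // withNextTick equals the full task count: the timer was starved until the
  // tick queue was empty. withSetImmediate varies from run to run but is
  // expected to be far smaller, because the timer runs between tasks.
  console.log({ withNextTick, withSetImmediate });
}

main();
```

With setImmediate(), each rescheduled task waits for the next pass through the event loop, so timers and I/O get serviced in between; that is the property that made the stall disappear.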
- Learn more about the Atmosphere Package Manager by Percolate Studio
- Learn more about Meteor
- Check out how the strong-agent can help you monitor your Node apps.