In last week’s performance tip, we discussed in detail how to diagnose memory leak problems with Node.js applications. In this go around we look at CPU profiling and how a good diagnosis can lead us to find the performance hotspots.
Node applications are designed to maximize throughput and efficiency, using non-blocking I/O and asynchronous events. Node apps essentially run single-threaded, even though file and network events could leverage multiple threads. This architecture thereby binds the performance of each application instance/process to one logical CPU core that the thread is attached to. So, in case of a CPU running hot, throwing additional cores into the mix and using clustering, may enable you to scale horizontally. However, it does not resolve the fundamental bottleneck. Profiling the CPU under load is the go to diagnosis technique in this scenario.
In this blog we’ll show you how to use StrongLoop Arc to perform CPU profiling on your apps. What’s Arc? It’s a performance monitoring and DevOps console specifically for Node.
The V8 internal profiler
Google’s V8 has built-in sample based profiling. Profiling is turned off by default, but can be enabled via the
Client-side CPU profiling
Server-side CPU profiling
The V8 profiler has been extended by StrongLoop to provide deep diagnostics and visual snapshotting across any application process in both clustered and non-clustered mode at any point of time. Take for example a Node app which is running as a single master, two worker process cluster. Let’s startup the application, put some load on it and run the profiler.
Startup the cluster
Here we are using the
slc run command with clustering option and specifying the number of CPU cores to be attached.
Here we are using a Jmeter script to simulate 100 simultaneous users on the clustered app. We’ll go with an ideal case, where each application instance/CPU core is handling a 50 users workload.
Profiling the master
As expected, we can see that the master process is spending most of the CPU time in idle state, with root process consuming less than 1 millisecond of CPU profile time.
Profiling the first worker
The first worker process/CPU core shows about 30% utilization during the profiling period. Hovering over the concentric CPU cycle rings show us that there are 3 main segments of code which can be potential hotspots. The first segment covered here shows the times spent in the connect module of express framework. The plot is provided as a dependency tree. Here we are highlighting the time spent on
multipart.js class constructor (line 92) which is 551.68 ms as compared to the entire call tree of 22, 926.61 ms or around 0.025% of the stack
The second segment covered here shows the times spent in the MySQL calls. The call stack would initially start with driver calls and eventually end with queries. Here we are highlighting the time spent in
Query.js class EofPacket() function (line 109) which is 922.70 ms as compared to the entire call tree of 9017.29 ms or around 10% of that stack.
The third segment covered here shows the times spent in the
loadtest calls. This seems to be the top CPU hotspot of the application. Here we are highlighting the time spent in
dataoperations.js class, function
aboutToSendResponse() (line 111) which is almost 99% of the call stack and seemingly looks like a wait sequence. Let us see if we see a similar behavior on the other CPU core attached to this application as the second process.
Profiling the second worker
The second worker process/CPU core shows about 70% overall utilization during the profiling period. Hovering over the concentric CPU cycle rings show us that there are 8-10 main segments of code which can be potential hotspots.
One segment covered here shows the times spent in the memcache modules. The plot is provided as a dependency tree. Here we are highlighting the time spent on
jackpot/index.js class, allocate() function (line 142)…and we could go on to find the other code segments. However, it strikes us that we should be looking at long call chains, not just ones with a single child. This could be of relevance as some functions could be singleton pattern and don’t have too much scope of optimization. Long call chains could be refactored for optimization.
So, collapsing the call chains with single children, we are now able to pinpoint major code segments. Highlighted here is the
http.js file, which has a function
socket.ondata() on line # 1964, which is our top consumer!
StrongLoop Arc is a graphical UI for the StrongLoop API Platform, which includes LoopBack, that complements the slc command line tools for developing APIs quickly and getting them connected to data. Arc also includes tools for building, profiling and monitoring Node apps. It takes just a few simple steps to get started!