The long development cycle for v0.12 (nine months and counting, the longest one to date) has given the core team and contributors ample opportunity to introduce a number of performance optimizations. This blog post aims to cover the most notable ones.
Cork support for writable streams
Writable streams now support a “corked” mode, similar to the TCP_CORK and TCP_NOPUSH socket options from Linux and the BSDs, respectively.
When corked, data written to the stream is queued up until the stream is uncorked again. This lets Node.js combine smaller writes into larger ones, resulting in fewer system calls and TCP roundtrips.
The http module has been updated to use corked mode transparently when sending a chunked request or response body. If you look at strace(1) output often, you will notice more writev(2) and fewer write(2) system calls.
TLS performance improvements
The tls module has been considerably reworked in Node.js v0.12.
In Node.js v0.10, the tls module sits on top of the net module as transform stream that transparently encrypts and decrypts network traffic. Such layering is desirable from an engineering perspective but it introduces overhead – more moving around of memory and many more calls in and out of the V8 VM than are strictly necessary – and gets in the way of optimizations.
That is why in Node.js v0.12, the tls module has been rewritten to use libuv directly. It now pulls incoming network traffic directly off the wire and decrypts it without going through intermediate layers.
Non-scientific benchmarks using a null cipher suggest that TLS is now generally 10% faster while consuming less memory. (I should note that the reduced memory footprint may in part be the result of the reworked memory manager, another v0.12 optimization.)
(And, in case you’re wondering, a null cipher is a cipher that doesn’t encrypt the payload; they’re useful for measuring infrastructure and protocol overhead.)
Crypto performance improvements
Several cryptographic algorithms are now noticeably faster – sometimes *much* faster. A little background:
Cryptography in Node.js is implemented using the OpenSSL library. Algorithms in OpenSSL have portable reference implementations written in C with hand-rolled assembly versions for specific platforms and architectures.
Node.js v0.10 already uses assembly versions for some things and v0.12 greatly expands that. What’s more, AES-NI is now used when the CPU supports it, which most x86 processors produced in the last three or four years do.
On Linux systems, you can check for AES-NI support with:

    grep ^flags /proc/cpuinfo | grep -w aes

If that finds any matches, your system supports AES-NI. Note that hypervisors like VMware or VirtualBox may hide CPU capabilities from the guest operating system, including AES-NI.
An amusing result of enabling AES-NI is that an industrial strength cipher such as AES128-GCM-SHA256 is now faster than a no-encryption cipher like NULL-MD5!
Reduced garbage collector strain
A side effect of the multi-context refactoring is that it greatly reduces the number of persistent handles in Node.js core.
A persistent handle is a strong reference to an object on the V8 heap that prevents the object from being reclaimed by the garbage collector until the reference is removed again. (In GC speak, it’s an artificial GC root.)
Node.js uses persistent handles to cache often-used values, like strings or object prototypes. However, persistent handles need a special post-processing step in the garbage collector and as such have an overhead that scales linearly with the number of handles.
As part of the multi-context cleanup work, a great many persistent handles have been eliminated or switched over to a more lightweight mechanism (called ‘eternal handles’ – what’s in a name?).
The net effect is that your application spends less time inside the garbage collector and more time doing useful work. v8::internal::GlobalHandles::PostGarbageCollectionProcessing() should now show up a great deal less in node --prof output.
Better cluster performance
The cluster module in Node.js v0.10 depends on the operating system to distribute incoming connections evenly among the worker processes.
It turns out that on Solaris and Linux, some workloads cause very unbalanced distributions among the workers. To mitigate that, Node.js v0.12 has switched to round-robin by default. See this blog post for more details.
Faster timers, faster setImmediate(), faster process.nextTick()
setTimeout() and friends now use a time source that is both faster and immune to clock skew. This optimization is enabled on all platforms but on Linux we take it one step further and read the current time directly from the VDSO, thereby greatly reducing the number of gettimeofday(2) and clock_gettime(2) system calls.
setImmediate() and process.nextTick() also saw performance tweaks that add fast paths for dispatch in the general case. Said functions were already pretty fast but now they’re faster still.
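For the curious, here is a rough, unscientific way to put a number on nextTick dispatch on your own machine – a hypothetical micro-benchmark, not anything shipped with Node.js:

```javascript
// Schedule a million process.nextTick() callbacks and measure how
// quickly the queue drains.
var N = 1e6;
var done = 0;
var start = process.hrtime();
for (var i = 0; i < N; i++) {
  process.nextTick(function () {
    done++;
    if (done === N) {
      var t = process.hrtime(start);
      var rate = N / (t[0] + t[1] / 1e9);
      console.log('nextTick callbacks/sec:', Math.round(rate));
    }
  });
}
```

The usual micro-benchmark caveats apply: this measures scheduling and dispatch overhead in isolation, which real applications rarely exercise in such a tight loop.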