The long development cycle for v0.12 (nine months and counting, the longest one to date) has given the core team and contributors ample opportunity to introduce a number of performance optimizations. This blog post aims to cover the most notable ones.
Cork support for writable streams
Writable streams now support a “corked” mode, similar to the TCP_CORK socket option on Linux and TCP_NOPUSH on the BSDs (see `man tcp`).
When corked, data written to the stream is queued up until the stream is uncorked again. This lets Node.js combine smaller writes into larger ones, resulting in fewer system calls and TCP roundtrips.
The http module has been updated to use corked mode transparently when sending a chunked request or response body. If you watch strace(1) output, you will notice more writev(2) and fewer write(2) system calls.
TLS performance improvements
The tls module has been considerably reworked in Node.js v0.12.
In Node.js v0.10, the tls module sits on top of the net module as transform stream that transparently encrypts and decrypts network traffic. Such layering is desirable from an engineering perspective but it introduces overhead – more moving around of memory and many more calls in and out of the V8 VM than are strictly necessary – and gets in the way of optimizations.
That is why in Node.js v0.12, the tls module has been rewritten to use libuv directly. It now pulls incoming network traffic directly off the wire and decrypts it without going through intermediate layers.
Non-scientific benchmarks using a null cipher suggest that TLS is now generally 10% faster while consuming less memory. (I should note that the reduced memory footprint may in part be the result of the reworked memory manager, another v0.12 optimization.)
(And, in case you’re wondering, a null cipher is a cipher that doesn’t encrypt the payload; they’re useful for measuring infrastructure and protocol overhead.)
Crypto performance improvements
Several cryptographic algorithms should now be faster, sometimes *much* faster. A little background:
Cryptography in Node.js is implemented using the OpenSSL library. Algorithms in OpenSSL have portable reference implementations written in C with hand-rolled assembly versions for specific platforms and architectures.
Node.js v0.10 already uses assembly versions for some things and v0.12 greatly expands that. What’s more, AES-NI is now used when the CPU supports it, as most x86 processors produced in the last three or four years do.
On Linux systems, if `grep ^flags /proc/cpuinfo | grep -w aes` finds any matches, then your system supports AES-NI. Note that hypervisors like VMware or VirtualBox may hide CPU capabilities from the guest operating system, including AES-NI.
An amusing result of enabling AES-NI is that an industrial strength cipher such as AES128-GCM-SHA256 is now faster than a no-encryption cipher like NULL-MD5!
Reduced garbage collector strain
A side effect of the multi-context refactoring is that it greatly reduces the number of persistent handles in Node.js core.
A persistent handle is a strong reference to an object on the V8 heap that prevents the object from being reclaimed by the garbage collector until the reference is removed again. (In GC speak, it’s an artificial GC root.)
Node.js uses persistent handles to cache often-used values, like strings or object prototypes. However, persistent handles need a special post-processing step in the garbage collector and as such have an overhead that scales linearly with the number of handles.
As part of the multi-context cleanup work, a great many persistent handles have been eliminated or switched over to a more lightweight mechanism (called ‘eternal handles’; what’s in a name?).
The net effect is that your application spends less time inside the garbage collector and more time doing useful work. Now `v8::internal::GlobalHandles::PostGarbageCollectionProcessing()` should show up a great deal less in `node --prof` output.
Better cluster performance
The cluster module in Node.js v0.10 depends on the operating system to distribute incoming connections evenly among the worker processes.
It turns out that on Solaris and Linux, some workloads cause very unbalanced distributions among the workers. To mitigate that, Node.js v0.12 has switched to round-robin by default. See this blog post for more details.
Faster timers, faster setImmediate(), faster process.nextTick()
setTimeout() and friends now use a time source that is both faster and immune to clock skew. This optimization is enabled on all platforms but on Linux we take it one step further and read the current time directly from the VDSO, thereby greatly reducing the number of gettimeofday(2) and clock_gettime(2) system calls.
setImmediate() and process.nextTick() also saw performance tweaks that add fast paths for dispatch in the general case. Said functions were already pretty fast but now they’re faster still.
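The speed-ups do not change the scheduling semantics of these functions. As a quick refresher, this sketch shows the one ordering guarantee you can rely on:

```javascript
var order = [];

setTimeout(function () { order.push('timeout'); }, 0);
setImmediate(function () { order.push('immediate'); });
process.nextTick(function () { order.push('nextTick'); });

// process.nextTick() callbacks always run before timers and
// immediates. The relative order of a 0 ms timer and setImmediate()
// is NOT guaranteed when both are scheduled from the main module.
setTimeout(function () {
  console.log(order[0]); // 'nextTick'
}, 20);
```

In other words, code that depends on a 0 ms timer firing before (or after) a setImmediate() callback was relying on luck in v0.10 and still is in v0.12.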