In our last weekly performance tip, we discussed in detail how the Node.js event-loop works as the orchestrator of requests, events and callbacks. We also troubleshot a blocked event-loop, which can wreak havoc on application performance. In this week’s post we’ll dive into the fundamentals of garbage collection (GC) in V8 and how it holds the “keys to the kingdom” in the optimization of Node applications. We will also look at some tools to triage GC and memory management issues in V8.
Where did it all start?
In some systems or languages, it is up to the application program to manage all the bookkeeping details of allocating memory from the heap and freeing it when it is no longer required. This is known as manual memory management. Manual memory management may be appropriate for small programs, but it does not scale well in general, nor does it encourage modular or object-oriented programming.
Who/What is the Garbage Collector?
V8 embraces garbage collection (GC), also known as managed memory. Built-in GC is a huge simplification for developers, who no longer have to handle memory bookkeeping explicitly in code as was done in the “C” era. It reduces a large class of errors and memory leaks, which are typical in large, long-running applications, and in some cases it can even improve performance.
Performance-wise, it is roughly a wash. In C, allocating (malloc) and freeing objects can be costly, since heap bookkeeping tends to be more complicated. With managed memory, allocation usually means just incrementing a pointer, but you pay for it eventually when you run out of memory and the garbage collector kicks in. The fact is that V8 uses garbage collection, for better or worse.
How does V8 organize the heap?
V8 divides the heap into several different spaces for effective memory management:
New-space: Most objects are allocated here. New-space is small and is designed to be garbage collected very quickly.
Old-pointer-space: Contains most objects which may have pointers to other objects. Most objects are moved here after surviving in new-space for a while.
Old-data-space: Contains objects which just contain raw data (no pointers to other objects). Strings, boxed numbers, and arrays of unboxed doubles are moved here after surviving in new-space for a while.
Large-object-space: Contains objects which are larger than the size limits of other spaces. Each object gets its own mmap’d region of memory. Large objects are never moved by the garbage collector.
Code-space: Code objects, which contain JITed instructions, are allocated here. This is the only space with executable memory.
Cell-space, property-cell-space and map-space: Contain Cells, PropertyCells, and Maps, respectively. Each of these spaces contains objects which are all the same size and are constrained in what kinds of objects they can point to, which simplifies collection.
Each space is composed of a set of “pages”. A Page is a contiguous chunk of memory, allocated from the operating system with mmap. Pages are always 1 MB in size and 1 MB aligned, except in large-object-space, where they may be larger.
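To see how the heap is divided in the V8 build you are actually running, a minimal sketch using Node's built-in `v8.getHeapSpaceStatistics()` is shown here. This API arrived in Node releases later than the ones this post was written against, and newer V8 versions report a different set of spaces than the list above, so treat the names it prints as version-dependent. The helper name `listHeapSpaces` is ours.

```javascript
// Lists the heap spaces of the V8 build that is running this script.
const v8 = require('v8');

// listHeapSpaces is a hypothetical helper name, not part of Node.
function listHeapSpaces() {
  return v8.getHeapSpaceStatistics().map((space) => ({
    name: space.space_name,
    sizeMB: space.space_size / (1024 * 1024),
    usedMB: space.space_used_size / (1024 * 1024),
  }));
}

for (const { name, sizeMB, usedMB } of listHeapSpaces()) {
  console.log(`${name}: ${usedMB.toFixed(2)} MB used of ${sizeMB.toFixed(2)} MB`);
}
```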
How does the Garbage Collector work?
A distinguished set of objects are assumed to be reachable or in live scope: these are known as the “roots.” Typically, these include all the objects referenced from anywhere in the call stack (that is, all local variables and parameters in the functions currently being invoked), and any global variables.
Objects are kept in memory while they are accessible from roots through a reference or a chain of references.
Root objects are pointed to directly by V8 or by the web browser (DOM elements, for example).
The fundamental problem garbage collection solves is identifying dead memory regions (unreachable objects, or garbage): those that cannot be reached through any chain of pointers from a live object. Once identified, these regions can be re-used for new allocations or released back to the operating system.
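The reachability rule can be illustrated with a toy object graph. This is only a sketch of the idea, not V8's implementation: the heap is modelled as a `Map` from made-up object ids to the ids they reference, and the root names are hypothetical.

```javascript
// Toy heap: each object id maps to the ids it references (hypothetical data).
const heap = new Map([
  ['global', ['config', 'cache']],
  ['stackFrame', ['request']],
  ['config', []],
  ['cache', ['entry']],
  ['entry', []],
  ['request', ['config']],
  ['orphanA', ['orphanB']], // the orphans reference each other, but no root reaches them
  ['orphanB', ['orphanA']],
]);

// Returns the set of ids reachable from the roots through any chain of references.
function findLive(heap, roots) {
  const live = new Set();
  const pending = [...roots];
  while (pending.length > 0) {
    const id = pending.pop();
    if (live.has(id)) continue;
    live.add(id);
    pending.push(...heap.get(id));
  }
  return live;
}

const live = findLive(heap, ['global', 'stackFrame']);
const garbage = [...heap.keys()].filter((id) => !live.has(id));
console.log(garbage); // [ 'orphanA', 'orphanB' ]
```

Note that the two orphans point at each other and are still garbage: what matters is reachability from a root, not whether something references the object.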
To ensure fast object allocation, short garbage collection pauses, and no memory fragmentation, V8 employs a stop-the-world, generational, accurate garbage collector. V8 essentially:
stops program execution when performing a garbage collection cycle.
processes only part of the object heap in most garbage collection cycles. This minimizes the impact of stopping the application.
always knows exactly where all objects and pointers are in memory. This avoids falsely identifying objects as pointers which can result in memory leaks.
In V8, the object heap is segmented into several parts, and objects can be moved between them. If an object is moved in a garbage collection cycle, V8 updates all pointers to the object.
No, no… how does it really work?
The GC needs to follow pointers in order to discover live objects. Most garbage collection algorithms can migrate objects from one part of memory to another (to reduce fragmentation and increase locality), so we also need to be able to rewrite pointers without disturbing plain old data.
V8 uses “tagged” pointers: each word on the heap is either a pointer or a small integer, and a tag in its low bits tells the collector which one it is. Most objects on the heap just contain a list of tagged words, so the garbage collector can quickly scan them, following the pointers and ignoring the integers. Some objects, such as strings, are known to contain only data (no pointers), so their contents do not have to be tagged.
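As a rough illustration of tagging, the sketch below models heap words as 32-bit integers whose lowest bit says whether the word is a pointer or a small integer. It is a simplified model for intuition only; the helper names and addresses are made up, and the real encoding in V8 has more detail than this.

```javascript
// Simplified model of tagged words: the low bit distinguishes pointers from small integers.
const POINTER_TAG = 1;

const tagInteger = (value) => value << 1;              // low bit 0
const tagPointer = (address) => address | POINTER_TAG; // low bit 1 (addresses assumed even)
const isPointer = (word) => (word & POINTER_TAG) === 1;
const untagInteger = (word) => word >> 1;
const untagPointer = (word) => word & ~POINTER_TAG;

// An object body as the collector sees it: a list of tagged words (hypothetical values).
const objectBody = [tagInteger(42), tagPointer(0x1000), tagInteger(-7), tagPointer(0x2040)];

// The collector can follow pointers and skip integers without knowing the object's type.
const pointers = objectBody.filter(isPointer).map(untagPointer);
const integers = objectBody.filter((word) => !isPointer(word)).map(untagInteger);

console.log(pointers.map((p) => '0x' + p.toString(16))); // [ '0x1000', '0x2040' ]
console.log(integers); // [ 42, -7 ]
```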
Next, let’s check which algorithms V8 uses to execute garbage collection.
V8 divides the heap into two generations. Objects are allocated in new-space, which is fairly small (between 1 and 8 MB, depending on behavior heuristics). Allocation in new space is very cheap: we just have an allocation pointer which we increment whenever we want to reserve space for a new object. When the allocation pointer reaches the end of new space, a scavenge (minor garbage collection cycle) is triggered, which quickly removes the dead objects from new space.
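Here is a toy model of that allocation pointer, to show why allocating in new-space is so cheap. The class name, the 64-byte capacity and the "everything is dead" scavenge are assumptions made to keep the sketch short; a real scavenge keeps the survivors.

```javascript
// Toy new-space: allocation is just a pointer increment until the space is full.
class NewSpace {
  constructor(capacityBytes) {
    this.capacity = capacityBytes;
    this.allocationPointer = 0;
    this.scavenges = 0;
  }

  // Reserves sizeBytes and returns the start offset of the reserved block.
  // Assumes sizeBytes never exceeds the capacity of the space.
  allocate(sizeBytes) {
    if (this.allocationPointer + sizeBytes > this.capacity) {
      this.scavenge();
    }
    const address = this.allocationPointer;
    this.allocationPointer += sizeBytes;
    return address;
  }

  // Stand-in for a real scavenge: pretend every object was dead and reset the pointer.
  scavenge() {
    this.scavenges += 1;
    this.allocationPointer = 0;
  }
}

const space = new NewSpace(64);
const offsets = [24, 24, 24, 16].map((size) => space.allocate(size));
console.log(offsets, space.scavenges); // [ 0, 24, 0, 24 ] 1
```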
Scavenging/copying GC is a kind of tracing garbage collection that operates by relocating reachable objects and then reclaiming objects that are left behind, which must be unreachable and therefore dead.
A copying garbage collection relies on being able to find and correct all references to copied objects.
The Scavenge algorithm is great for quickly collecting and compacting a small amount of memory, but it has large space overhead: new-space is split into two equal halves, to-space and from-space, and we need physical memory backing both even though only one of them holds objects between collections. This is acceptable as long as we keep new-space small, but it’s impractical to use this approach for more than a few megabytes.
Scavenging is supposed to be fast by design. Hence, it is suitable for frequently occurring short GC cycles.
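The copying idea can be sketched in a few lines. This is a toy version in the spirit of a semispace copying collector, not V8's code: objects are array entries, references are array indices, and the `forwarding` map stands in for the forwarding addresses a real collector leaves behind in from-space.

```javascript
// Toy scavenge: objects live in fromSpace; fields hold indices of other objects (or null).
function scavenge(fromSpace, rootIndices) {
  const toSpace = [];
  const forwarding = new Map(); // old index -> new index

  // Copies one object into to-space (only once) and returns its new index.
  function copy(oldIndex) {
    if (oldIndex === null) return null;
    if (!forwarding.has(oldIndex)) {
      forwarding.set(oldIndex, toSpace.length);
      toSpace.push({ ...fromSpace[oldIndex], fields: [...fromSpace[oldIndex].fields] });
    }
    return forwarding.get(oldIndex);
  }

  const newRoots = rootIndices.map((index) => copy(index));

  // Scan pointer: each copied object has its references copied and rewritten in turn.
  for (let scan = 0; scan < toSpace.length; scan++) {
    toSpace[scan].fields = toSpace[scan].fields.map((index) => copy(index));
  }
  return { toSpace, newRoots };
}

// Hypothetical from-space contents; only 'a', 'c' and 'e' are reachable from the root.
const fromSpace = [
  { name: 'a', fields: [2] },
  { name: 'dead1', fields: [0] },
  { name: 'c', fields: [null, 4] },
  { name: 'dead2', fields: [] },
  { name: 'e', fields: [0] },
];
const { toSpace, newRoots } = scavenge(fromSpace, [0]);
console.log(toSpace.map((o) => o.name), newRoots); // [ 'a', 'c', 'e' ] [ 0 ]
console.log(toSpace[2].fields); // [ 0 ]
```

The dead objects are never touched at all, which is why the cost of a scavenge depends on the number of survivors rather than on the amount of garbage.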
Full GC/mark-sweep & mark-compact
Objects which have survived two minor garbage collections are promoted to “old-space.” Old-space is garbage collected in full GC (major garbage collection cycle), which is much less frequent. A full GC cycle is triggered when we get over a certain amount of memory in old space.
To collect old space, which may contain several hundred megabytes of data, we use two closely related algorithms, Mark-sweep and Mark-compact.
Mark-sweep collection is a kind of tracing garbage collection that operates by marking reachable objects, then sweeping over memory and recycling objects that are unmarked (which must be unreachable), putting them on a free list.
The mark phase follows reference chains to mark all reachable objects. Once marking is complete, we can reclaim memory by either sweeping or compacting. Both algorithms work at a page level. The sweep phase performs a sequential (address-order) pass over memory to recycle all unmarked objects. A mark-sweep collector doesn’t move objects.
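A toy mark-sweep over an array of slots looks like this. It is a sketch for intuition under simplifying assumptions (one object per slot, indices instead of addresses, a mark flag stored on the object), not a description of V8's page-level sweeper.

```javascript
// Toy mark-sweep over a heap of fixed slots; live objects are never moved.
function markSweep(slots, rootIndices) {
  // Mark phase: follow reference chains from the roots.
  const pending = [...rootIndices];
  while (pending.length > 0) {
    const obj = slots[pending.pop()];
    if (obj.marked) continue;
    obj.marked = true;
    pending.push(...obj.refs);
  }

  // Sweep phase: one address-order pass; unmarked slots go on the free list.
  const freeList = [];
  for (let index = 0; index < slots.length; index++) {
    if (slots[index].marked) {
      slots[index].marked = false; // clear the mark for the next cycle
    } else {
      slots[index] = null;
      freeList.push(index);
    }
  }
  return freeList;
}

// Hypothetical heap contents; refs are slot indices.
const slots = [
  { name: 'root', refs: [2], marked: false },
  { name: 'garbage1', refs: [2], marked: false },
  { name: 'child', refs: [0], marked: false },
  { name: 'garbage2', refs: [1], marked: false },
];
const freeList = markSweep(slots, [0]);
console.log(freeList); // [ 1, 3 ]
console.log(slots.map((o) => o && o.name)); // [ 'root', null, 'child', null ]
```

The survivors stay exactly where they were, so the free slots end up scattered between them. That fragmentation is what compaction addresses.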
Mark-compact collection starts with the same mark phase; its compaction phase then typically performs a number of sequential passes over memory to move objects and update references. After compaction, all the marked objects sit in a single contiguous block of memory (or a small number of such blocks), and the memory left unused is recycled. The compacting algorithm also tries to reduce actual memory usage by migrating objects from fragmented pages (containing a lot of small free spaces) to free spaces on other pages.
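Compaction can be sketched as two passes over an already-marked heap: one to decide where each survivor will go, and one to move it and rewrite its references. As before, this is a toy with indices standing in for addresses, and it assumes marking has already run, so live objects only reference other live objects.

```javascript
// Toy compaction: slides marked objects to the start of the heap and rewrites references.
function compact(slots) {
  // Pass 1: compute the new index of every marked (live) object.
  const newIndex = new Map();
  slots.forEach((obj, oldIndex) => {
    if (obj.marked) newIndex.set(oldIndex, newIndex.size);
  });

  // Pass 2: move live objects and update their references to the new indices.
  const compacted = [];
  slots.forEach((obj) => {
    if (!obj.marked) return;
    compacted.push({
      name: obj.name,
      refs: obj.refs.map((ref) => newIndex.get(ref)),
      marked: false,
    });
  });
  return compacted;
}

// Hypothetical heap after the mark phase; refs are slot indices.
const slots = [
  { name: 'root', refs: [2], marked: true },
  { name: 'garbage1', refs: [2], marked: false },
  { name: 'child', refs: [0, 4], marked: true },
  { name: 'garbage2', refs: [1], marked: false },
  { name: 'leaf', refs: [], marked: true },
];
const compacted = compact(slots);
console.log(compacted.map((o) => `${o.name} -> [${o.refs}]`));
// [ 'root -> [1]', 'child -> [0,2]', 'leaf -> []' ]
```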
However, the pause times of mark-sweep and mark-compact were found to be high, slowing down overall performance.
Incremental marking & lazy sweeping
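In mid-2012, Google introduced two improvements that reduced garbage collection pauses significantly: incremental marking and lazy sweeping. With incremental marking, the heap is marked in a series of small pauses interleaved with program execution instead of in one long pause.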
If the allocation rate is high during incremental GC, the engine may run out of memory before finishing the incremental cycle. If so, the engine must immediately restart a full, non-incremental GC in order to reclaim some memory and continue execution.
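The idea of marking in small slices can be sketched with a worklist and a per-step budget. This toy deliberately leaves out the hard part: a real incremental marker also has to notice when the program changes references between slices, which is not modelled here. The class name and the budget of two objects per step are assumptions for illustration.

```javascript
// Toy incremental marker: marks at most `budget` objects per step, then yields to the program.
class IncrementalMarker {
  constructor(heap, roots) {
    this.heap = heap; // Map: id -> array of referenced ids
    this.marked = new Set();
    this.worklist = [...roots];
  }

  // Performs one small slice of marking; returns true once marking is complete.
  step(budget) {
    let processed = 0;
    while (this.worklist.length > 0 && processed < budget) {
      const id = this.worklist.pop();
      if (this.marked.has(id)) continue;
      this.marked.add(id);
      this.worklist.push(...this.heap.get(id));
      processed += 1;
    }
    return this.worklist.length === 0;
  }
}

// Hypothetical object graph; 'dead' is not reachable from the root.
const heap = new Map([
  ['root', ['a', 'b']],
  ['a', ['c']],
  ['b', []],
  ['c', []],
  ['dead', ['a']],
]);
const marker = new IncrementalMarker(heap, ['root']);
let slices = 1;
while (!marker.step(2)) {
  slices += 1; // the application would run here between marking slices
}
console.log(slices, [...marker.marked].sort()); // 2 [ 'a', 'b', 'c', 'root' ]
```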
After marking, lazy sweeping begins. All objects have been marked live or dead, and the heap knows exactly how much memory could be freed by sweeping. All this memory doesn’t necessarily have to be freed up right away though, and delaying the sweeping doesn’t hurt. Hence, instead of sweeping everything in one go, the garbage collector sweeps pages on an as-needed basis until all pages have been swept. At that point, the garbage collection cycle is complete, and incremental marking is free to start again.
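Sweeping on demand can be sketched as follows. The page names and byte counts are made up, and the model ignores fragmentation by treating freed bytes as one pool; it only aims to show that pages are swept when an allocation needs the memory and not before.

```javascript
// Toy lazy sweeper: pages are swept one at a time, only when an allocation needs free memory.
class LazySweeper {
  constructor(pages) {
    this.unsweptPages = [...pages]; // each page: { name, deadBytes } as determined by marking
    this.freeBytes = 0;
    this.sweptPages = [];
  }

  // Sweeps just enough pages to satisfy the request; returns false if it cannot be satisfied.
  allocate(sizeBytes) {
    while (this.freeBytes < sizeBytes && this.unsweptPages.length > 0) {
      const page = this.unsweptPages.shift();
      this.freeBytes += page.deadBytes;
      this.sweptPages.push(page.name);
    }
    if (this.freeBytes < sizeBytes) return false;
    this.freeBytes -= sizeBytes;
    return true;
  }

  // The GC cycle is complete once every page has been swept.
  get cycleComplete() {
    return this.unsweptPages.length === 0;
  }
}

const sweeper = new LazySweeper([
  { name: 'page0', deadBytes: 100 },
  { name: 'page1', deadBytes: 300 },
  { name: 'page2', deadBytes: 50 },
]);
sweeper.allocate(80);
console.log(sweeper.sweptPages, sweeper.cycleComplete); // [ 'page0' ] false
sweeper.allocate(200);
console.log(sweeper.sweptPages, sweeper.cycleComplete); // [ 'page0', 'page1' ] false
```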
Monitoring Garbage Collection with StrongLoop Arc
The Heap Size graph of Arc monitors three key metrics of memory performance over time:
Heap: Current heap size (MB)
RSS: Resident set size (MB)
V8 Full GC: Heap size sampled immediately after a full garbage collection (MB)
These metrics on historical timescales (1/3/6/12/24/48 hrs) help detect patterns for effective GC. A well-tuned system should see neither long-running full GC cycles nor overly frequent short GC events. Every GC event essentially stalls the processing of requests.
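If you want a quick look at similar numbers from inside a process, Node's built-in `process.memoryUsage()` reports the resident set size and the V8 heap size. The sketch below is not how Arc collects its data; the helper name `sampleMemory` and the rounding are our own choices.

```javascript
// Samples RSS and V8 heap size using Node's built-in process.memoryUsage().
// sampleMemory is a hypothetical helper name.
function sampleMemory() {
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  const toMB = (bytes) => Math.round((bytes / (1024 * 1024)) * 100) / 100;
  return { rssMB: toMB(rss), heapTotalMB: toMB(heapTotal), heapUsedMB: toMB(heapUsed) };
}

console.log(sampleMemory());
```

Calling this on a timer and logging the result over a few hours gives a rough, homemade version of the graph described above.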
The Heap profiler tool provides a detailed breakdown of heap allocation size and instance counts of each object over iterative GC cycles. Watch out for objects whose count does not drop back to a stable baseline after each GC cycle. Such objects may be leaking memory.
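The "does not bottom out" rule of thumb can be turned into a small check over samples taken after each GC cycle, whether those are instance counts for one type or post-GC heap sizes. This is a hypothetical heuristic of our own, not something the profiler does; the function name and the five-sample window are assumptions, and a rising baseline is a hint to investigate, not proof of a leak.

```javascript
// Flags a possible leak when samples taken after each GC cycle keep rising.
// looksLikeLeak and minSamples are hypothetical; tune the window to your workload.
function looksLikeLeak(samplesAfterGc, minSamples = 5) {
  if (samplesAfterGc.length < minSamples) return false;
  const recent = samplesAfterGc.slice(-minSamples);
  return recent.every((value, i) => i === 0 || value > recent[i - 1]);
}

console.log(looksLikeLeak([40, 42, 41, 43, 42, 41])); // false
console.log(looksLikeLeak([40, 44, 49, 55, 60, 68])); // true
```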
Upcoming features in Garbage Collection
StrongLoop co-founder Ben Noordhuis has also been working on exposing additional GC metrics like per-generation/per-space GC activity. Note: V8 GC is generational with separate spaces for code, data and large objects. Hence, such visibility will be key to troubleshooting real GC problems. Ben is also working on creating Heap Allocation stack tracing in Node. So, stay tuned to the StrongLoop Blog or our Twitter feed for updates in this space.
StrongLoop Arc is a graphical UI for the StrongLoop API Platform, which includes LoopBack, that complements the slc command line tools for developing APIs quickly and getting them connected to data. Arc also includes tools for building, profiling and monitoring Node apps. It takes just a few simple steps to get started!