Reference Counting

Dear Computer

Chapter 11: Managing Memory

Reference Counting

Tracing garbage collectors periodically run a task that searches for and reclaims garbage. The collector runs separately from the main program flow, which means the collector either pauses the main flow or carefully synchronizes with it. In contrast, reference counting is a more passive garbage collection scheme that frees unreachable objects as part of the normal program flow.

When an object is first allocated on the heap, its pointer is paired with a counter that tracks how many active references point to the object. This reference count is mutated in the following ways across an object's lifetime:

If we can track these four events in an object's lifecycle, we can implement our own automatic memory management scheme. C++ provides hooks for all of these events. When an object is constructed, its constructor is called. When a new object is initialized from another, a special copy constructor is called. When an existing object is reassigned, a special operator= method is called. When an object goes out of scope, its deconstructor is called. We could write our own class that wraps around a heap allocation and tracks the reference count. However, such a class is already available in the C++ standard library: shared_ptr.

Some languages, including Swift and Objective-C, add reference counting automatically to all heap allocations. To get reference counting in C++, we must explicitly wrap pointers up in a shared_ptr. With automatic reference counting (ARC), pointers are implicitly wrapped in a reference-counted abstraction provided by the language's runtime library.

Unlike tracing, reference counting is deterministic. Counting and freeing happens when a function is called or returned from, when the end of a block is reached, or when a reference is assigned. Since the counting is an integrated part of the flow, the main flow of execution is never paused. Also, reference counting is cheap. We store just one extra number per allocation, and those numbers must occasionally get incremented and decremented. Reference counting has a lot going for it.

Alas, reference counting has a serious vulnerability: it may fail to correctly identify garbage. Suppose we have written a doubly-linked list with reference-counted nodes. Node \(n\) holds a reference to node \(n + 1\). And node \(n + 1\) holds a reference to node \(n\). If no other references point at this pair of nodes, then both counts are exactly 1. The two nodes should be garbage, but they are not. Their mutual embrace keeps them afloat in the heap. Any such cycle constitutes a memory leak.

Languages that use reference counting rely on the programmer to prevent cycles from forming. In the case of the doubly-linked list, the forward link may be treated as a strong reference that contributes to the count, while the backward link is treated as a weak reference that doesn't count. If the number of strong references reaches 0, the object is freed, no matter the number of weak references.

We have yet to find a perfect garbage collection scheme. Maybe Rust has something better to offer.

← TracingStatic Ownership →