Reference Counting
Tracing garbage collectors periodically run a task that searches for and reclaims garbage. The collector runs separately from the main program flow, which means the collector either pauses the main flow or carefully synchronizes with it. In contrast, reference counting is a more passive garbage collection scheme that frees unreachable objects as part of the normal program flow.
When an object is first allocated on the heap, its pointer is paired with a counter that tracks how many active references point to the object. This reference count is mutated in the following ways across an object's lifetime:
-
When a reference-counted object is constructed, its count is initialized to 1. For example, suppose the
Playlist
class uses reference counting. This variable declaration and initialization sets the instance's count to 1:The count is stored alongside the rest of the object's state.let playlist = new Playlist("study") // study=1
-
When one reference is assigned to another, as in
a = b
, the count of the object thata
previously referred to is decremented, and the count of the object thatb
refers to is incremented. For example, the counts change as follows when an alias is made forplaylist
:Whenever a reference count is decremented, the object checks to see if the count has reached 0. If it has, it frees its heap memory. If not, some reference still exists and the data must stay in the heap. In the example, the workout list is freed because the second assignment drops its count to 0.let beats = new Playlist("workout") // workout=1 beats = playlist // workout=0, study=2
-
When a reference is passed to a function, as in
f(a)
, the count of the object thata
refers to is incremented because now the function has an additional reference. For example, if eitherplaylist
orbeats
is passed to a function, the count changes as follows:function shuffle(songs) // study=3 // ... shuffle(playlist)
-
When a reference goes out of scope, the count of its object is decremented. References go out of the scope when the block in which they are declared ends or the object to which they belong gets destroyed. For example, after the
shuffle
method returns, thesongs
parameter goes out of scope and the count drops:shuffle(playlist) // study=2
If we can track these four events in an object's lifecycle, we can implement our own automatic memory management scheme. C++ provides hooks for all of these events. When an object is constructed, its constructor is called. When a new object is initialized from another, a special copy constructor is called. When an existing object is reassigned, a special operator=
method is called. When an object goes out of scope, its deconstructor is called. We could write our own class that wraps around a heap allocation and tracks the reference count. However, such a class is already available in the C++ standard library: shared_ptr
.
Some languages, including Swift and Objective-C, add reference counting automatically to all heap allocations. To get reference counting in C++, we must explicitly wrap pointers up in a shared_ptr
. With automatic reference counting (ARC), pointers are implicitly wrapped in a reference-counted abstraction provided by the language's runtime library.
Unlike tracing, reference counting is deterministic. Counting and freeing happens when a function is called or returned from, when the end of a block is reached, or when a reference is assigned. Since the counting is an integrated part of the flow, the main flow of execution is never paused. Also, reference counting is cheap. We store just one extra number per allocation, and those numbers must occasionally get incremented and decremented. Reference counting has a lot going for it.
Alas, reference counting has a serious vulnerability: it may fail to correctly identify garbage. Suppose we have written a doubly-linked list with reference-counted nodes. Node \(n\) holds a reference to node \(n + 1\). And node \(n + 1\) holds a reference to node \(n\). If no other references point at this pair of nodes, then both counts are exactly 1. The two nodes should be garbage, but they are not. Their mutual embrace keeps them afloat in the heap. Any such cycle constitutes a memory leak.
Languages that use reference counting rely on the programmer to prevent cycles from forming. In the case of the doubly-linked list, the forward link may be treated as a strong reference that contributes to the count, while the backward link is treated as a weak reference that doesn't count. If the number of strong references reaches 0, the object is freed, no matter the number of weak references.
We have yet to find a perfect garbage collection scheme. Maybe Rust has something better to offer.