von Neumann Architecture
Named variables are not essential to programming. An early and enduring model of a computer is the von Neumann architecture, which distills the computer down to just a few components: a processor, a bank of memory, and a bus transporting data between the processor and memory. As the program executes, the processor reads from and writes to the cells of memory. The cells are uniquely identified by their addresses; no names are needed.
The von Neumann model still underlies modern computing systems, but only a few mainstream languages fully expose memory as an addressable bank of cells. Some languages, like assembly and spreadsheets, primarily use addresses. Others, like C and C++, support both named and addressable memory. Even languages that don't provide addressable memory are built atop the von Neumann model. In such languages, a variable is a cell that has been given a nickname for easy future reference. We reserve a cell in such languages by declaring a variable. When we assign a value to a variable, we are storing the value in a cell. When we create an array, we are usually guaranteed a collection of cells that are next to each other in memory, which makes them fast to iterate through.
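To make the idea of nameless, addressable cells concrete, here is a minimal sketch in C of a toy memory bank. The names bank, store, load, and sum_cells are hypothetical and chosen only for illustration. A real machine's memory is far larger and holds instructions as well as data.

```c
#include <assert.h>
#include <stddef.h>

#define BANK_SIZE 16

/* A toy bank of memory. Cells have no names, only addresses (indices). */
static int bank[BANK_SIZE];

/* Write a value into the cell at the given address. */
void store(size_t address, int value) {
    assert(address < BANK_SIZE);
    bank[address] = value;
}

/* Read the value held in the cell at the given address. */
int load(size_t address) {
    assert(address < BANK_SIZE);
    return bank[address];
}

/* Sum `count` adjacent cells starting at address `first`,
   the way a loop walks the neighboring cells of an array. */
int sum_cells(size_t first, size_t count) {
    assert(first <= BANK_SIZE && count <= BANK_SIZE - first);
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += load(first + i);
    }
    return total;
}
```

Calling store(3, 10), store(4, 20), and store(5, 30) fills three neighboring cells, and sum_cells(3, 3) then visits them in address order. Declaring a variable in a higher-level language amounts to letting the compiler pick the address and remember it under a name.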
Sometimes we use a variable name to refer to the cell itself. For example, in the assignment ncountries = 195, the name ncountries is used only to locate the cell that needs to be updated, making the variable reference an lvalue. In the expression ncountries - 1, the location of the cell associated with ncountries is unimportant. What matters is the value stored in that cell. In this context, the variable is an rvalue.
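The two roles can be separated in a short C sketch. The function names one_less and set_count are hypothetical. The first function only ever needs the value of ncountries, while the second needs to know where the cell lives.

```c
#include <assert.h>
#include <stddef.h>

/* Rvalue use: only the value stored in the cell matters.
   The caller's cell is never located or modified. */
int one_less(int ncountries) {
    return ncountries - 1;
}

/* Lvalue use: the address identifies the cell to overwrite.
   The cell's old value is irrelevant. */
void set_count(int *cell, int value) {
    assert(cell != NULL);
    *cell = value;
}
```

Given a declaration int ncountries, the call set_count(&ncountries, 195) plays the part of the assignment, and one_less(ncountries) plays the part of the expression ncountries - 1.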
Historically, an lvalue appeared on the left-hand side of an assignment statement, while an rvalue appeared on the right. These definitions are not adequate, however, as lvalues may appear on the right and rvalues may appear on the left:
numbers[i] = 87; // rvalue i appears on the left
p = &tmp; // & treats tmp as an lvalue
The important distinction today is that in an lvalue context, a variable provides its associated cell's address, and in an rvalue context, it provides its cell's value. Pointers muddy this a bit. A pointer's value is the address of some place in memory, and the pointer itself is stored in a cell with its own address. If we want the address that the pointer holds, perhaps to follow it to the value it points at, we treat the pointer as an rvalue. If we want the address of the pointer itself, we treat it as an lvalue.
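A small C sketch may help keep the two addresses apart. The function names read_through and retarget are hypothetical. The first uses a pointer as an rvalue, and the second needs the pointer's own cell, so it receives a pointer to the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Rvalue use of a pointer: read the address it holds and follow it. */
int read_through(const int *p) {
    assert(p != NULL);
    return *p;
}

/* Lvalue use of a pointer: locate the pointer's own cell
   and overwrite the address stored there. */
void retarget(int **pointer_cell, int *new_target) {
    assert(pointer_cell != NULL);
    *pointer_cell = new_target;
}
```

With int tmp = 7, int other = 9, and int *p = &tmp, the call read_through(p) follows the stored address to tmp, while retarget(&p, &other) changes what is stored in the cell belonging to p.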
There are programming language designers who reject the von Neumann architecture. They criticize it for departing from a more mathematical view of equality and identity. Mathematics, for example, doesn't have a bank of memory cells that change over time. When mathematicians equate a name with a value, as in \(\pi = 3.14159\ldots\), they are declaring a synonym: \(\pi\) is and will always be shorthand for \(3.14159\ldots\). When we name a cell in the von Neumann architecture and equate it with a value, we are making a temporary association. We may write the declaration double pi = 3.14159 only to reassign it later with pi = 3.14. We'll see a language that rejects the mutability of the von Neumann architecture when we discuss Haskell.
Since the mathematical notion of equality has different semantics than variable assignment, some programming languages use an operator other than = for assignment. ALGOL, Pascal, and Ada use :=, which is aptly called the walrus operator. Here's an assignment statement in Ada:
Opacity := 0.25;
In R, we may assign variables with the = operator championed by Fortran and C, or with the arrow operators:
opacity = 0.25
opacity <- 0.25
0.25 -> opacity
When we ask if two values are equal in mathematics, we are always comparing rvalues. There are no lvalues in mathematics. In a program, however, we may be able to choose between comparing two values as lvalues or as rvalues. Two values have the same lvalue if they are stored at the same address; we say that they have the same identity. Two values have the same rvalue if their cells contain the same contents, regardless of where they're stored. Two values with the same identity necessarily have the same rvalue, but the converse does not hold.
Suppose p and q are two object references in a Java program. They are compared as lvalues using the == operator, and they are compared as rvalues using the equals method:
if (p == q) {
  System.out.println("p and q point to the same object.");
} else if (p.equals(q)) {
  System.out.println("p and q hold equivalent values.");
}
Understanding the difference between lvalue and rvalue comparison is an important part of writing correct programs. Comparing lvalues is extremely fast since just two addresses are compared, but it is only appropriate when identity is what matters. Comparing rvalues is more versatile, allowing two values in different cells to still be equal, but it's also more expensive since the comparison may need to examine every byte associated with the two values.
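The same two kinds of comparison can be sketched in C, where the cost difference is easy to see. The struct point type and the functions same_identity and same_value are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record type used only for illustration. */
struct point {
    int x;
    int y;
};

/* Identity (lvalue) comparison: one comparison of two addresses. */
bool same_identity(const struct point *a, const struct point *b) {
    return a == b;
}

/* Value (rvalue) comparison: every field must be inspected. */
bool same_value(const struct point *a, const struct point *b) {
    assert(a != NULL && b != NULL);
    return a->x == b->x && a->y == b->y;
}
```

Two separately declared points that both hold {3, 4} pass same_value but fail same_identity. The sketch compares field by field rather than calling memcmp because a struct may contain padding bytes whose contents are unspecified, and the work grows with the number of fields while the identity check stays a single comparison.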