von Neumann Architecture

Dear Computer

Chapter 2: Naming Things

Named variables are not essential to programming. An early and enduring model of a computer is the von Neumann architecture, which distills the computer down to just a few components: a processor, a bank of memory, and a bus transporting data between the processor and memory. As the program executes, the processor reads from and writes to the cells of memory. The cells are uniquely identified by their addresses; no names are needed.

A computer distilled down to its von Neumann architecture.

The von Neumann model still underlies modern computing systems, but only a few mainstream languages fully expose memory as an addressable bank of cells. Some languages, like assembly and spreadsheets, primarily use addresses. Others, like C and C++, support both named and addressable memory. Even languages that don't provide addressable memory are built atop the von Neumann model. In such languages, a variable is a cell that has been given a nickname for easy future reference. We reserve a cell in such languages by declaring a variable. When we assign a value to a variable, we store the value in a cell. When we create an array, we are usually guaranteed a collection of cells that are next to each other in memory, which makes them fast to iterate through.
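As a rough illustration of cells sitting next to each other, here is a minimal C sketch. The function names bytes_between_neighbors and sum_cells are hypothetical, chosen only for this example. The first measures how far apart two neighboring array cells are, and the second walks the cells from one address to the next.

```c
#include <assert.h>
#include <stddef.h>

// Returns the distance in bytes between the cells numbers[i] and
// numbers[i + 1]. The caller must ensure that i is a valid index.
size_t bytes_between_neighbors(const int *numbers, size_t i) {
  assert(numbers != NULL);
  const char *here = (const char *) &numbers[i];
  const char *next = (const char *) &numbers[i + 1];
  return (size_t) (next - here);
}

// Sums n adjacent cells by stepping from one address to the next.
int sum_cells(const int *first, size_t n) {
  assert(first != NULL || n == 0);
  int total = 0;
  for (const int *cell = first; cell != first + n; ++cell) {
    total += *cell;
  }
  return total;
}
```

For an int array, the distance between neighbors should equal sizeof(int), which means no other data sits between the elements.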

Sometimes we use a variable name to refer to the cell itself. For example, in the assignment ncountries = 195, the name ncountries is used only to locate the cell that needs to be updated, making the variable reference an lvalue. In the expression ncountries - 1, the location of the cell associated with ncountries is unimportant. What matters is the value stored in that cell. In this context, the variable is an rvalue.

Historically, an lvalue appeared on the left-hand side of an assignment statement, while an rvalue appeared on the right. These definitions are not adequate, however, as lvalues may appear on the right and rvalues may appear on the left:

C
numbers[i] = 87;  // rvalue i appears on the left
p = &tmp;         // & treats tmp as an lvalue

The important distinction today is that in an lvalue context, a variable provides its cell's address. An lvalue is therefore a locatable value, one that is likely to be assigned to or used as a base address from which other cells are accessed. In an rvalue context, the variable provides its cell's value. Its location is irrelevant.
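One way to make the two contexts visible in C is to look at what a function receives. The sketch below uses two hypothetical helpers, store_count and one_fewer. The first is handed a cell's address, so it can update the cell. The second is handed only a value and has no way of knowing where that value was stored.

```c
#include <assert.h>
#include <stddef.h>

// Lvalue context: the caller passes the address of a cell, and we
// write a new value into that cell.
void store_count(int *cell, int value) {
  assert(cell != NULL);
  *cell = value;
}

// Rvalue context: the caller passes only the value. The cell it came
// from is untouched and its location is unknown to this function.
int one_fewer(int count) {
  return count - 1;
}
```

Given int ncountries, the call store_count(&ncountries, 195) treats the variable as an lvalue, while one_fewer(ncountries) treats it as an rvalue and leaves the cell unchanged.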

Pointers muddy this a bit. A pointer holds the address of some place in memory, and the pointer itself is stored at some address. If we want the address that a pointer holds, we treat the pointer as an rvalue. If we want the address of the pointer itself, we treat it as an lvalue.
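The two views of a pointer can be sketched in C as follows. The names value_at and retarget are hypothetical. In value_at, the pointer is used as an rvalue: we read the address it holds and follow it. In retarget, we receive the address of the pointer's own cell, an lvalue use, so that we can make the pointer point somewhere else.

```c
#include <assert.h>
#include <stddef.h>

// Rvalue use of a pointer: read the address it holds and fetch the
// value stored there.
int value_at(const int *p) {
  assert(p != NULL);
  return *p;
}

// Lvalue use of a pointer: pointer_cell is the address of the
// pointer's own cell, which we overwrite with a new target address.
void retarget(int **pointer_cell, int *new_target) {
  assert(pointer_cell != NULL);
  *pointer_cell = new_target;
}
```

If p initially holds the address of a, then after retarget(&p, &b) it holds the address of b, and value_at(p) yields whatever is stored in b.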

There are programming language designers who reject the von Neumann architecture. They criticize it for departing from a more mathematical view of equality and identity. Mathematics, for example, doesn't have a bank of memory cells that change over time. When mathematicians equate a name with a value, as in \(\pi = 3.14159\ldots\), they are declaring a synonym: \(\pi\) is and will always be shorthand for \(3.14159\ldots\). When we name a cell in the von Neumann architecture and equate it with a value, we are making a temporary association. We may write the declaration double pi = 3.14159 only to reassign it later with pi = 3.14. We'll see a language that rejects the mutability of the von Neumann architecture when we discuss Haskell.

Since the mathematical notion of equality has different semantics than variable assignment, some programming languages use an operator other than = for assignment. ALGOL, Pascal, and Ada use :=, which is aptly called the walrus operator. Here's an assignment statement in Ada:

Ada
Opacity := 0.25;

In R, we may assign variables with the = operator championed by Fortran and C, or we may use the arrow operators:

R
opacity = 0.25
opacity <- 0.25
0.25 -> opacity

When we ask if two values are equal in mathematics, we are always comparing rvalues. There are no lvalues in mathematics. In a program, however, we may be able to choose between comparing two values as lvalues or as rvalues. Two values have the same lvalue if they are stored at the same address; such values have the same identity. Two values have the same rvalue if their cells contain the same value, regardless of where they're stored. Two values with the same identity necessarily have the same rvalue.

Suppose p and q are two object references in a Java program. They are compared as lvalues using the == operator, and they are compared as rvalues using the equals method:

Java
if (p == q) {
  System.out.println("same identity, equivalent values");
} else if (p.equals(q)) {
  System.out.println("different identities, equivalent values");
}

Having two ways of comparing values is a source of bugs. Do we really need both of them? Yes.

Imagine we are developing a program for drawing shapes. Suppose a user draws two circles on top of each other. The circles have the same center and radius, so they are rvalue equivalent. If the user attempts to delete one of the circles, we won't know which to delete if we only consider rvalue equivalence. We might choose one or the other or both. The ambiguity is resolved if we have a notion of lvalue equivalence. We delete the circle that has the matching identity.
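The circle scenario might look something like the following C sketch. The Circle struct and the functions same_value, same_identity, and remove_circle are all assumptions made for illustration; a real drawing program would likely store its shapes differently. The drawing is modeled as an array of pointers, and deletion keeps every circle whose address differs from the target's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
  double x;
  double y;
  double radius;
} Circle;

// Rvalue equivalence: compare the contents of the two cells.
bool same_value(const Circle *a, const Circle *b) {
  return a->x == b->x && a->y == b->y && a->radius == b->radius;
}

// Lvalue equivalence: compare the addresses of the two cells.
bool same_identity(const Circle *a, const Circle *b) {
  return a == b;
}

// Removes the circle whose identity matches target, shifting the
// remaining pointers down. Returns the new number of circles.
size_t remove_circle(Circle **circles, size_t n, const Circle *target) {
  assert(circles != NULL || n == 0);
  size_t kept = 0;
  for (size_t i = 0; i < n; ++i) {
    if (!same_identity(circles[i], target)) {
      circles[kept++] = circles[i];
    }
  }
  return kept;
}
```

Had remove_circle tested same_value instead, two overlapping circles with the same center and radius would both be deleted.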

Elsewhere in the program, the account creation dialog prompts the user to enter their password twice. If we only consider lvalue equivalence, the two passwords won't ever match because the entered passwords will be at different memory locations. We need rvalue equivalence to compare the characters instead of the addresses.
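In C, the password check could be sketched with two hypothetical helpers, same_cell and same_characters. The first compares addresses and the second uses the standard strcmp function to compare the characters. This is only an illustration of the two kinds of equivalence, not a recommendation for how to handle passwords securely.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Lvalue equivalence: true only if both strings start at the same
// address.
bool same_cell(const char *a, const char *b) {
  return a == b;
}

// Rvalue equivalence: true if the strings hold the same characters,
// wherever they are stored.
bool same_characters(const char *a, const char *b) {
  assert(a != NULL && b != NULL);
  return strcmp(a, b) == 0;
}
```

Two separately entered copies of the same password live in different arrays, so same_cell reports false for them while same_characters reports true.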

Comparing lvalues is extremely fast since just two addresses are compared. But lvalue comparison is only appropriate when identity is what matters. Comparing rvalues is more versatile, allowing two values in different cells to still be equal, but it's also more expensive since we compare all the bytes associated with the two values.
