Combinatorics of Names

Dear Computer

Chapter 1: Naming Things

We've divided the concept of a variable into a name and a value. Presumably, a single name is associated with a single value. But what if we don't presume? Let's explore other ways we might combine names and values.

Values without Names

Programs are full of unnamed values, like 1, false, and "move". An unnamed value is a literal. Not only do literals lack a name, they also don't have a usable location.

Names without Values

Likewise, a name might not have a value. We might have forgotten to give it one, or we may have no need of one.

Uninitialized

Some languages allow us to declare a name without initializing it. What happens when we access the missing value? Consider a C program that declares the name a and prints its value without ever assigning one.

Such a program compiles and runs. But its behavior is undefined, which means the C standard does not specify what number will be printed. Other languages, like Java and Kotlin, detect references to uninitialized local variables at compile time.

What about Ruby?
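One way to explore the question is to try it. The sketch below is only an experiment, and the names read_unassigned, mystery, Entity, and heading are made up for illustration. It reads a local name that was never assigned, and then an instance variable that was never assigned.

```ruby
# Hypothetical helper: tries to read a local name that was never assigned.
def read_unassigned
  mystery   # no assignment to `mystery` appears anywhere in this method
rescue NameError => e
  "NameError: #{e.name}"
end

# Hypothetical class whose instance variable is read before being set.
class Entity
  def heading
    @heading   # never assigned
  end
end

puts read_unassigned      # NameError: mystery
p Entity.new.heading      # nil
```

In this experiment Ruby neither rejects the program up front nor produces an arbitrary number. The unassigned local raises an exception at run time, while the unassigned instance variable quietly evaluates to nil.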

Symbols

Could you imagine a situation where you wanted just a name without an associated value? Consider this snippet of Ruby that updates an xy-position according to the direction an entity is facing:

ruby
if direction == 1
  y += 1
elsif direction == 2
  x += 1
elsif direction == 3
  y -= 1
else
  x -= 1
end

The problem with this code is that the direction numbers—the values 1, 2, 3, and an implicit 4—have no inherent meaning. Developers must remember that 1 means north, 2 means east, 3 means south, and 4 means west. Meaning can be restored by using strings as flags instead of numbers:

ruby
if direction == 'north'
  y += 1
elsif direction == 'east'
  x += 1
elsif direction == 'south'
  y -= 1
else
  x -= 1
end

This reduces the cognitive burden, but it has an impact on performance. Comparing two strings may require examining them character by character, whereas comparing two integers is a single operation.

A few languages provide a feature for introducing meaningful names that have no meaningful value and that can be compared quickly—in constant time. These are symbols. In Ruby, a symbol literal is an identifier preceded by a colon. This code is both readable and fast:

ruby
if direction == :north
  y += 1
elsif direction == :east
  x += 1
elsif direction == :south
  y -= 1
else
  x -= 1
end
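Symbols also work well as hash keys, so the chain of comparisons could be restructured as a lookup table. This is only one possible sketch, not something the text prescribes, and the names STEPS and step are hypothetical. Unlike the else branch above, which treats every unrecognized direction as west, this version raises an error for an unknown direction.

```ruby
# Hypothetical table mapping each direction symbol to a [dx, dy] step.
STEPS = {
  north: [0, 1],
  east:  [1, 0],
  south: [0, -1],
  west:  [-1, 0],
}.freeze

# Returns the new [x, y] after moving one unit in `direction`.
def step(x, y, direction)
  dx, dy = STEPS.fetch(direction)   # raises KeyError for an unknown direction
  [x + dx, y + dy]
end

p step(0, 0, :north)   # [0, 1]
p step(5, 5, :west)    # [4, 5]
```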

The first time the Ruby interpreter encounters a particular symbol, it assigns the symbol a unique integer and stores, or interns, the association in a global dictionary. Any later references to that symbol in any scope will look up the symbol in the dictionary and evaluate to that same integer.

The particular integer assigned to a symbol is unimportant beyond its uniqueness. If you try to print a symbol, you just see its name. But Ruby does let us inspect the integer with the object_id method.
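A quick experiment can make interning visible. This sketch compares object_id results for two mentions of the same symbol and for two separately built strings. The variable names are made up for illustration, and the specific integers will vary between runs and Ruby versions, so only the comparisons are shown.

```ruby
# Two mentions of the same symbol refer to the same interned object.
first_mention = :north
second_mention = :north
puts first_mention.object_id == second_mention.object_id   # true

# Two strings built separately are distinct objects, even with equal contents.
first_string = String.new("north")
second_string = String.new("north")
puts first_string == second_string                         # true: same characters
puts first_string.object_id == second_string.object_id     # false: different objects

# Converting a string to a symbol finds the already-interned symbol.
puts "north".to_sym.equal?(:north)                         # true

# Printing a symbol shows only its name.
puts :north                                                # north
```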

Symbols are supported in several other languages, including Lisp and JavaScript. A similar effect can be achieved in C, C++, and Java using enums.

Name with Multiple Values

Can one name be associated with multiple values? Chances are you have done this on many occasions. You called it an array. Arrays are stored contiguously, so it might have looked like this in memory:

[Figure: a diagram showing the cells of an array packed tightly in memory]

This Ruby statement creates such an array:

Ruby
gift_ideas = ["kidney", "plunger", "corset", "radium"]

To get at one of the values, we add a subscript or index to the name that serves as an offset from the base cell:

Ruby
puts gift_ideas[3]
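To see the index acting as an offset, it may help to read several cells of the same array. This is a small sketch that reuses the gift_ideas array from above. The out-of-range read at the end is an extra case added here for illustration.

```ruby
gift_ideas = ["kidney", "plunger", "corset", "radium"]

puts gift_ideas[0]       # offset 0 from the base cell: kidney
puts gift_ideas[3]       # offset 3 from the base cell: radium
puts gift_ideas.length   # 4
p gift_ideas[4]          # one past the last cell: nil
```

Because the first cell sits at offset 0, index 3 names the fourth and last value. Ruby answers an out-of-range index with nil instead of raising an error.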

Value with Multiple Names

A single value can also be associated with multiple names. This can happen in two different ways. Some languages, including C, C++, Rust, and C#, allow additional names through pointers. A pointer is a separate memory cell that stores the address of the value. Consider this C code, which names 17 with both x and p:

C
int x = 17;
int *p = &x;

After these statements, memory holds two cells: one named x that contains 17, and one named p that contains the address of x.

A pointer is an indirect name. It introduces a second value in memory: the address of the first value. To get at the first value, we must dereference the pointer with *p. When the value is a structure, p-> dereferences the pointer and selects a member in one step.

Java, Python, and Ruby implement multiple naming through pointers, but their address nature is hidden from the programmer. These constrained pointers are called references. Dereferencing in these languages is automatic.
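A short Ruby experiment suggests how this plays out in practice. In this sketch, whose variable names are made up for illustration, assigning one name to another copies the reference rather than the string, so a change made through either name is visible through both.

```ruby
original = String.new("plunger")
other_name = original               # copies the reference, not the string

other_name << " (deluxe)"           # mutate the object through the second name

puts original                       # plunger (deluxe)
puts original.equal?(other_name)    # true: one object, two names

copy = original.dup                 # a genuinely separate object
copy << "!"
puts original                       # plunger (deluxe)
puts original.equal?(copy)          # false
```

No explicit dereferencing appears anywhere. Ruby follows the reference automatically whenever a name is used.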

Other languages, like C++ and Rust, support additional names more directly, with aliases. This C++ program names 17 with both x and y:

C++
int x = 17;
int& y = x;

After these statements, memory holds a single cell containing 17, and both x and y name it.

In C++, an alias is declared by following the type with &, and such aliases are called references. Unfortunately, the same term is used for both aliases and constrained pointers, despite their very different mechanics.

The names x and y are synonymous, producing the same value and address:

C++
++x;
++y;

printf("%d == %d\n", x, y);    // prints 19 twice
printf("%p == %p\n", (void*) &x, (void*) &y);  // prints the same address twice