Strings
Rust has two string types: the str type for fixed-length character sequences and the String type for heap-allocated, growable character sequences. A str is like an array in that it's fixed in size. String, on the other hand, is like an ArrayList. It wraps a growable buffer of UTF-8 bytes and adds a bunch of operations, like resizing, inserting, and removing. Both abstractions are provided because sometimes we want the power of String but other times we don't want to pay its performance costs.
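A minimal sketch of the difference, using only the standard library; the helper describe is a hypothetical name used for illustration. The literal stays a borrowed &str, while String::from copies the text into an owned heap buffer that push_str can grow. Passing &growable where a &str is expected works because Rust coerces a &String to a &str.

```rust
// Hypothetical helper: accepts any borrowed text, whether it came from a literal or a String.
fn describe(text: &str) -> String {
    // format! allocates a new String on the heap.
    format!("{} ({} bytes)", text, text.len())
}

fn main() {
    // A string literal is a &str; its text is fixed when the program is compiled.
    let fixed: &str = "Port Vila";

    // String::from copies the text into a heap buffer that we own and can grow.
    let mut growable: String = String::from(fixed);
    growable.push_str(", Vanuatu");

    // &growable is a &String, which coerces to &str at the call site.
    println!("{}", describe(fixed)); // prints Port Vila (9 bytes)
    println!("{}", describe(&growable)); // prints Port Vila, Vanuatu (18 bytes)
}
```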
The type str is used both for string literals and for cheaply sharing character data that's wrapped up in a String. In both cases, the text is owned by some other entity, and we are merely borrowing it. When we borrow a value, we are given a pointer to the original memory. If the original value has type T, the borrowed value has type &T, which we read as “a reference to a T”. In practice, we almost never use a bare str. Its size isn't known at compile time, so it can't be stored directly in a variable, and text of this kind is nearly always borrowed. We use &str instead, as in this declaration:
let country: &str = "Vanuatu";
The actual str data for a string literal is stored in the read-only data section of the compiled program. The variable country is a reference to that memory, a &str. The Rust community calls a reference to a contiguous sequence of values a slice, so &str is a string slice. Later we will see slices for other collections.
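Slicing a String with a byte range yields another &str that points into the same buffer. The sketch below, built around a hypothetical helper prefix, takes the first few bytes of an owned String and checks two things: the slice starts at the same address as the original, and a &str occupies two machine words, one for the pointer and one for the length.

```rust
use std::mem::size_of;

// Hypothetical helper: the first `n` bytes of `text`, borrowed rather than copied.
// Assumes `n` falls on a character boundary; otherwise the slicing panics.
fn prefix(text: &str, n: usize) -> &str {
    &text[..n]
}

fn main() {
    let owned = String::from("Vanuatu");
    let first = prefix(&owned, 4);
    println!("{}", first); // prints Vanu

    // Nothing was copied: the slice starts at the same address as the String's buffer.
    assert_eq!(first.as_ptr(), owned.as_ptr());

    // A &str is a pointer plus a length, so it is two machine words wide.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
}
```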
String slices support a narrower set of operations than String, mostly ones that read the text rather than change it. The contains, find, rfind, starts_with, and ends_with methods search for a substring. The lines method yields an iterator over the lines of the text, much like the lines function of Haskell. The polymorphic parse method behaves like Haskell's read function, except that it returns a Result instead of failing outright when the text can't be parsed. The trim methods yield a new slice that doesn't include leading and trailing whitespace. The split methods yield an iterator over subslices separated by delimiters. This code splits a list of comma-separated weekdays and prints each day on its own line:
let series = "lunes,martes,miércoles";
let days = series.split(",");
for day in days {
    println!("{}", day);
}
The variable series has type &str, but here we are calling str methods on it. A &str is effectively a pointer, while a str is the actual text. When we call a method, Rust automatically dereferences the &str as needed to reach the str methods.
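The search, trim, and parse methods listed earlier combine naturally. This sketch uses a hypothetical sum_values helper and a made-up "label: number" line format: it walks the text with lines, locates each colon with find, trims the remainder, and converts it with parse. Because parse returns a Result, the ? operator hands bad input back to the caller instead of crashing.

```rust
use std::num::ParseIntError;

// Hypothetical helper: sum the numbers after the colon on each "label: number" line.
// Lines without a colon are skipped.
fn sum_values(text: &str) -> Result<i32, ParseIntError> {
    let mut total = 0;
    for line in text.lines() {
        // find returns the byte offset of the first ':' if there is one.
        if let Some(pos) = line.find(':') {
            // ':' is one byte wide, so pos + 1 is a valid slice start.
            // trim drops the surrounding spaces; parse converts the rest to an i32.
            let value: i32 = line[pos + 1..].trim().parse()?;
            total += value;
        }
    }
    Ok(total)
}

fn main() {
    let report = "lunes: 3\nmartes: 5\nmiércoles: 4";
    assert!(report.contains("martes"));
    assert!(report.starts_with("lunes"));
    println!("{:?}", sum_values(report)); // prints Ok(12)

    // parse reports text it can't convert through Err.
    println!("{}", sum_values("jueves: many").is_err()); // prints true
}
```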
Where possible, str methods neither modify nor copy the character data. For example, the trim method returns a slice that refers to a portion of the exact same memory as the original slice. A similar function in C would have to either mutate the original string or allocate memory for a trimmed copy, because C strings must end in a null terminator. Rust slices don't use null terminators. Instead, they mark off a window of memory using a starting pointer and a length. The slice that trim returns pairs a pointer to the first character after the leading whitespace with a length that ends the slice just before the trailing whitespace.
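We can observe that window directly by comparing addresses. In this sketch, the hypothetical helper trim_window reports where trim's result sits inside the original text as a (byte offset, byte length) pair; subtracting the two start addresses confirms that no copy was made.

```rust
// Hypothetical helper: locate trim's result inside `text` as (byte offset, byte length).
fn trim_window(text: &str) -> (usize, usize) {
    let trimmed = text.trim();
    // Both pointers refer to the same buffer, so their difference is the offset.
    let offset = trimmed.as_ptr() as usize - text.as_ptr() as usize;
    (offset, trimmed.len())
}

fn main() {
    let padded = String::from("   lunes  ");
    // The slice starts after the 3 leading spaces and stops before the 2 trailing ones.
    assert_eq!(trim_window(&padded), (3, 5));
    println!("[{}]", padded.trim()); // prints [lunes]
}
```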
When str operations do need to modify the character data, they typically return a new String, which causes a heap allocation. The to_uppercase, to_lowercase, and replace methods do this. This code converts an HTML element name to lowercase:
let element: &str = "IMG";
let element: String = element.to_lowercase();
println!("{}", element);
The second element
shadows the first and uses a different type. Explicit types are not necessary here, but they emphasize the type change.
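The same pattern holds for replace and for the other case conversions: the original text is left alone and a fresh String comes back. This sketch uses a hypothetical closing_tag helper. The last line shows that case conversion can even change the length of the text, since the German ß uppercases to two letters.

```rust
// Hypothetical helper: build a closing HTML tag such as "</img>".
fn closing_tag(name: &str) -> String {
    // to_lowercase allocates a new String; `name` itself is untouched.
    format!("</{}>", name.to_lowercase())
}

fn main() {
    let element: &str = "IMG";
    let tag = closing_tag(element);
    // replace also builds a new String instead of editing `tag` in place.
    let renamed = tag.replace("img", "picture");
    println!("{} {} {}", element, tag, renamed); // prints IMG </img> </picture>

    // "straße" (with ß written as an escape) becomes one byte longer when uppercased.
    assert_eq!("stra\u{df}e".to_uppercase(), "STRASSE");
}
```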
Noticeably absent from str is an integer subscript operator or a char_at method. The Rust designers left these out because they take the complexity of Unicode more seriously than most other languages you have used. When we write text.charAt(i) in Java, we get back a 16-bit UTF-16 code unit located through fast but naive offset arithmetic, and for symbols outside the Basic Multilingual Plane that unit is only half of a character. Human languages cannot always be encoded with one fixed-size unit per symbol. Characters in UTF-8, for example, may be between one and four bytes wide. There's no way to determine which character is at position i without decoding every character before it to learn its width. Rust makes it difficult to ignore the complexities of human language by forbidding strings from being viewed as arrays of characters that can be randomly accessed in constant time. Instead, reaching the i-th character means walking the text from the start, much as we would traverse a linked list, which takes linear time.
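What str does allow is slicing by byte range, and the byte/character mismatch shows up immediately. In this sketch, the hypothetical helper safe_prefix uses str::get, which returns None instead of panicking when a cut would land inside a multi-byte character. The é is written as an escape so the example doesn't depend on how the source file is encoded.

```rust
// Hypothetical helper: the first `bytes` bytes of `text`, or None if that cut
// would split a character (or run past the end).
fn safe_prefix(text: &str, bytes: usize) -> Option<&str> {
    text.get(..bytes)
}

fn main() {
    let day = "mi\u{e9}rcoles"; // "miércoles" with a precomposed é (U+00E9)

    // len counts bytes, and é takes two bytes in UTF-8.
    assert_eq!(day.len(), 10);
    assert_eq!(day.chars().count(), 9);

    // Byte 3 is the second half of é, so a cut there is rejected.
    assert!(!day.is_char_boundary(3));
    assert_eq!(safe_prefix(day, 3), None);
    assert_eq!(safe_prefix(day, 4), Some("mi\u{e9}"));
    // By contrast, &day[..3] would compile but panic at runtime.
}
```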
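To visit individual characters, we iterate instead. The chars method returns an iterator over the text's characters, decoding each one's width as it goes. The bytes method returns an iterator over the raw bytes, which match the characters one-to-one only if we are certain the text contains nothing but single-byte characters. This code walks through the characters of a word: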
let theme = "FLUTE";
// Print characters one per line
for c in theme.chars() {
    println!("{}", c);
}
println!("{:?}", theme.chars().nth(3)); // prints Some('T')
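The last statement uses the nth method to retrieve the character at a particular index, emulating the behavior of charAt. It returns an Option<char>, which is None if the index is past the end of the text. Unlike charAt, nth must step through every character before the one requested, so it takes linear time.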
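The difference between bytes and chars only becomes visible with non-ASCII text. This sketch wraps the nth idiom in a hypothetical charAt-style helper, char_at, and compares both iterators on "año", whose ñ (written as an escape) occupies two bytes in UTF-8.

```rust
// Hypothetical charAt-style helper: walks from the start, so it costs O(i), not O(1).
fn char_at(text: &str, i: usize) -> Option<char> {
    text.chars().nth(i)
}

fn main() {
    let word = "a\u{f1}o"; // "año"

    // bytes yields the raw UTF-8 bytes: ñ is encoded as 0xC3 0xB1.
    let bytes: Vec<u8> = word.bytes().collect();
    assert_eq!(bytes, vec![b'a', 0xC3, 0xB1, b'o']);

    // chars decodes those four bytes into three characters.
    let chars: Vec<char> = word.chars().collect();
    assert_eq!(chars, vec!['a', '\u{f1}', 'o']);

    println!("{:?}", char_at(word, 2)); // prints Some('o')
    println!("{:?}", char_at(word, 5)); // prints None
}
```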
The String type provides operations that may alter the size of the character data. The pop method removes and returns the last character, if there is one. The push method appends a character, and push_str appends another string. The insert method adds a character, and insert_str a string, at an arbitrary index. The index is a usize that counts bytes, not characters, and it must fall on a character boundary or the program panics. The clear method empties the string of all its characters. The replace_range method replaces a range of the string with some other string of arbitrary length. The String type also supports all methods of str, because a String automatically dereferences to a str.
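Here are those methods applied one after another. The hypothetical function build keeps to ASCII text so that every byte index equals a character index, and the comment on each line shows the string's contents after that call. The final lines show a str method working on a String and clear emptying it.

```rust
// Hypothetical demo: each call edits `s` in place, so `s` must be declared mut.
fn build() -> String {
    let mut s = String::from("lune");
    s.push('s');                     // "lunes"
    s.push_str(" y martes");         // "lunes y martes"
    let last = s.pop();              // "lunes y marte"
    assert_eq!(last, Some('s'));
    s.insert(0, '[');                // "[lunes y marte"
    s.push(']');                     // "[lunes y marte]"
    s.insert_str(1, "el ");          // "[el lunes y marte]"
    s.replace_range(4..9, "jueves"); // bytes 4..9 held "lunes": "[el jueves y marte]"
    s
}

fn main() {
    let mut s = build();
    println!("{}", s); // prints [el jueves y marte]

    // str methods are available on a String through deref.
    assert!(s.contains("jueves"));

    s.clear();
    assert!(s.is_empty());
}
```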
Many of the methods of String require the receiver to be mutable. This code mutates the variable verb by inserting some extra characters:
let mut verb = String::from("mutate");
verb.insert_str(3, "il");
println!("{}", verb); // prints mutilate
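Because insert_str takes a byte index, the index 3 above is safe only because "mutate" is pure ASCII. For text like "mañana", passing a character position straight to insert_str could land inside ñ and panic. This sketch shows one way around that, using a hypothetical helper insert_at_char that converts a character position to a byte index with char_indices. Appending when the position is past the end is a design choice made for this example, not standard-library behavior.

```rust
// Hypothetical helper: insert `piece` before the character at `char_pos`,
// translating that position into the byte index insert_str expects.
fn insert_at_char(s: &mut String, char_pos: usize, piece: &str) {
    let byte_idx = s
        .char_indices()
        .nth(char_pos)
        .map(|(i, _)| i)
        .unwrap_or(s.len()); // past the last character: append at the end
    s.insert_str(byte_idx, piece);
}

fn main() {
    let mut word = String::from("ma\u{f1}ana"); // "mañana"
    // Character 3 (the 'a' after ñ) starts at byte 4, because ñ is two bytes wide.
    insert_at_char(&mut word, 3, "-");
    println!("{}", word); // prints mañ-ana

    let mut verb = String::from("mutate");
    insert_at_char(&mut verb, 3, "il");
    println!("{}", verb); // prints mutilate
}
```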