Processes

Dear Computer

Chapter 13: Concurrency and Parallelism

Operating systems provide several mechanisms for starting up multiple flows of execution. A common one is the process, an instance of a program. When we double-click on a program or run one from the shell, the OS pulls its code from disk into memory and starts it running. Each process is given its own memory space with sections for its code, global and static data, stack, and heap. A child process does not share any memory with the parent process that started it.

We use a process when our program needs to leverage services provided by some other code that was previously compiled into its own executable. Our program either starts up the process or communicates with a server process that is already running. Because processes are insulated from one another, our program can't just call the other process's functions or access its memory. Instead we use communication channels provided by the operating system, such as signals, files, messaging systems, or sockets.
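Both points, the separate memory and the need for an operating system channel, can be seen in a small experiment. What follows is a minimal Ruby sketch, assuming a Unix-like system where fork is available. The names counter and message are made up for illustration. The child changes its own copy of a variable and reports the new value through a pipe; the parent's copy is untouched.

```ruby
counter = 0
reader, writer = IO.pipe   # OS-provided channel with a read end and a write end

# fork copies the current process; the block runs only in the child.
pid = fork do
  reader.close
  counter += 1             # changes only the child's copy
  writer.puts counter      # report the new value through the pipe
  writer.close
end

writer.close               # the parent only reads
message = reader.gets.chomp
reader.close
Process.wait(pid)          # wait for the child to exit

puts "child reported: #{message}"     # child reported: 1
puts "parent still has: #{counter}"   # parent still has: 0
```

The only way the parent learns the child's value is by reading it from the pipe. Assigning to counter in the child has no effect on the parent.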

Ruby

Ruby provides several means of starting up a separate process. The system method accepts the name of the executable followed by any command-line arguments, each as its own string. This call runs the unzip utility to unpack a ZIP archive:

Ruby
system('unzip', 'archive.zip')
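The return value of system tells us how the other process fared: true for a zero exit status, false for a non-zero one, and nil if the command could not be run at all. This sketch is only illustrative. It uses the running Ruby interpreter as a stand-in executable so that it doesn't depend on unzip being installed, and no-such-program-xyz is a hypothetical name assumed not to exist on the system.

```ruby
require 'rbconfig'

# RbConfig.ruby is the path of the running Ruby interpreter. It stands in
# for an arbitrary executable here.
ruby = RbConfig.ruby

ok = system(ruby, '-e', 'exit 0')         # child exits with status 0
failed = system(ruby, '-e', 'exit 3')     # child exits with status 3
status = $?.exitstatus                    # exit status of the most recent child
missing = system('no-such-program-xyz')   # hypothetical name; nothing to run

p ok        # true
p failed    # false
p status    # 3
p missing   # nil
```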

With system, the output of the unzip process is merged with the output of our script's process. If we need to process the output programmatically, we may use a backtick string to capture it:

Ruby
first_line = `head -n 1 #{path}`

A backtick string is interpolated and then passed to a shell process as a single command line, so care must be taken to avoid injection vulnerabilities. For example, if path somehow gets assigned "/dev/null; rm -rf /", say goodbye to all your files. The system example above does not exhibit this vulnerability: when system receives its arguments as separate strings, it starts the executable directly without a shell, and each argument reaches the program as a single token.
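The difference is easy to observe with a harmless command. In this sketch, echo plays the role of the destructive command and the value of path is a made-up example of hostile input. The array form of IO.popen is one way to capture output without involving a shell; it is shown as an option, not as the only fix.

```ruby
# A harmless stand-in for hostile input.
path = 'notes.txt; echo INJECTED'

# Backticks: the shell sees two commands separated by the semicolon.
unsafe = `echo #{path}`

# Array form of IO.popen: no shell runs, and path is a single argument.
safe = IO.popen(['echo', path], &:read)

p unsafe   # "notes.txt\nINJECTED\n"
p safe     # "notes.txt; echo INJECTED\n"
```

In the first call the text after the semicolon ran as a command of its own. In the second it was just part of the string that echo printed.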

If we need to both write to a process and read from it, we use the IO.popen method or one of its cousins. The P stands for pipe, a file-like communication channel that the operating system creates as a bridge between the two processes. This Ruby program creates a pipe between itself and the sha256sum utility to generate a cryptographic hash:

Ruby
pipe = IO.popen(['sha256sum', '-'], 'r+')

# Write to other process's stdin.
pipe.puts 'worm peelings'
pipe.close_write

# Read from other process's stdout.
hash = pipe.gets
pipe.close

puts hash

The spawned sha256sum process reads data from its standard input and writes the hash to its standard output. It doesn't have any clue that both of these are connected to a pipe created by the Ruby script.
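One of popen's cousins lives in Ruby's standard open3 library. As a rough sketch, Open3.capture2 handles the write, close, and read steps in a single call when all of the input is known up front. The tr utility is used here only because its output is easy to predict; any filter that reads standard input would do.

```ruby
require 'open3'

# capture2 starts the process, writes stdin_data to its standard input,
# closes that end of the pipe, and returns what the process printed
# along with its exit status.
output, status = Open3.capture2('tr', 'a-z', 'A-Z', stdin_data: 'worm peelings')

puts output            # WORM PEELINGS
puts status.success?   # true
```

IO.popen remains the better fit when the conversation with the other process goes back and forth, since capture2 sends all of its input before reading anything.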

Rust

Rust provides the Command struct for creating new processes. Command-line arguments are passed via one or more calls to its arg method, input and output are redirected using the stdin and stdout methods, and the spawn method starts up the process. This program accomplishes the same task as the Ruby script but is a bit more verbose because Rust forces programmers to contend with failure:

Rust
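use std::process::{Command, Stdio};
use std::io::Write;

fn main() {
    // Start sha256sum with its stdin and stdout connected to pipes.
    let process = Command::new("sha256sum")
        .arg("--text")
        .arg("-")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .expect("failed to start sha256sum");

    // Write to other process's stdin. writeln! appends a newline, as
    // Ruby's puts does, so both programs hash the same bytes.
    let mut stdin = process.stdin.as_ref().unwrap();
    writeln!(&mut stdin, "worm peelings").expect("write failed");

    // wait_with_output closes stdin, waits for the process to exit,
    // and collects everything it wrote to stdout.
    let output = process.wait_with_output().expect("wait failed");
    let hash = String::from_utf8(output.stdout).expect("bad string");
    println!("{}", hash);
}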

The separation of processes makes interprocess communication clumsy compared to calling a function within our program. Processes also consume a fair bit of memory since each gets its own independent memory space. There's another vehicle for concurrency and parallelism that suffers from neither of these problems: threads.
