Processes
Operating systems provide several mechanisms for starting up multiple flows of execution. A common one is the process, an instance of a program. When we double-click on a program or run one from the shell, the OS pulls its code from disk into memory and starts it running. Each process is given its own memory space with sections for its code, global and static data, stack, and heap. By default, a child process does not share memory with the parent process that started it; any shared region must be set up explicitly through the operating system.
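The isolation between parent and child can be observed directly. The sketch below assumes a Unix-like OS, because Ruby's Kernel#fork is unavailable on Windows. Unlike launching a separate program, fork duplicates the running Ruby process. The child starts with a copy of the parent's memory, so when it changes a variable, the parent's copy is unaffected. Since no memory is shared, the child reports back only through its exit status. The variable names are illustrative.

```ruby
# A minimal sketch for Unix-like systems; Kernel#fork is not available on Windows.
counter = 0

pid = fork do
  # This block runs in the child, which starts with a copy of the parent's memory.
  counter += 100
  # No memory is shared, so the child reports back through its exit status.
  # exit!(true) exits with status 0, exit!(false) with status 1, and neither
  # runs at_exit handlers.
  exit!(counter == 100)
end

# Back in the parent: wait for the child; $? then holds its Process::Status.
Process.wait(pid)
child_ok = $?.success?

puts "child updated its own copy: #{child_ok}"
puts "parent's counter is still #{counter}"
```

The parent's counter is still 0 after the child exits, even though the child saw 100.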
We use a process when our program needs to leverage services provided by some other code that was previously compiled into its own executable. Our program either starts up the process or communicates with a server process that is already running. Because processes are insulated from one another, our program can't just call the other process's functions or access its memory. Instead we use communication channels provided by the operating system, such as signals, files, messaging systems, or sockets.
Ruby
Ruby provides several means of starting up a separate process. The system method accepts the name of an executable and any command-line arguments. This call runs the unzip utility to unpack a ZIP archive:
system('unzip', 'archive.zip')
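Besides running the command, system reports how it ended. It returns true if the command exited with status 0, false if it exited with a nonzero status, and nil if the command could not be executed at all. The sketch below distinguishes these cases. Its helper name, describe_run, and the deliberately missing program no-such-program-xyz are hypothetical. It uses the standard true and false utilities as stand-ins for unzip so that the outcome is predictable.

```ruby
# Runs a command without a shell and describes how it ended.
# Kernel#system returns true (exit status 0), false (nonzero exit status),
# or nil (the command could not be executed at all).
def describe_run(*cmd)
  result = system(*cmd)
  if result.nil?
    "#{cmd.first}: could not be started"
  elsif result
    "#{cmd.first}: succeeded"
  else
    # $? holds the Process::Status of the most recently finished child.
    "#{cmd.first}: exited with status #{$?.exitstatus}"
  end
end

puts describe_run('true')
puts describe_run('false')
puts describe_run('no-such-program-xyz') # hypothetical missing command
```

Since Ruby 2.6, passing exception: true (as in system('unzip', 'archive.zip', exception: true)) makes system raise an error instead of returning false or nil.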
With system, the unzip process inherits our script's standard output and standard error, so its output is mixed in with our script's own. If we need to process the output programmatically, we may use a backtick string to capture it:
first_line = `head -n 1 #{path}`
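A backtick string is interpolated and then handed to a shell as a single command line, which the shell parses. Care must be taken to avoid injection vulnerabilities. For example, if path somehow gets assigned "/dev/null; rm -rf ~", the shell treats the semicolon as a command separator, and you can say goodbye to your home directory. The system example above does not exhibit this vulnerability because it passes each argument separately: no shell is involved, and each argument reaches unzip as a single token. Calling system with one string that contains shell metacharacters, however, does go through the shell and is just as vulnerable as a backtick string.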
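One way to capture output without involving a shell is Open3.capture2 from Ruby's standard library. Like the multi-argument form of system, it passes each argument to the program as its own token, and it returns the captured standard output together with the exit status. The sketch below is a minimal illustration; the method name first_line is hypothetical.

```ruby
require 'open3'

# Returns the first line of the file at path, or raises if head fails.
# 'head', '-n', '1', and path are separate arguments, so no shell parses
# path, and characters such as ; and | have no special meaning in it.
def first_line(path)
  output, status = Open3.capture2('head', '-n', '1', path)
  raise "head failed: #{status}" unless status.success?
  output
end
```

With this version, a path like "/dev/null; rm -rf ~" just asks head to open a file with that odd name. head reports that no such file exists, and first_line raises instead of deleting anything.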
If we need to both write to a process and read from it, we use IO.popen or one of its cousins. The p stands for pipe, a file-like communication channel that the operating system provides as a bridge between two processes. This Ruby program opens pipes to and from the sha256sum utility, which computes a cryptographic hash:
pipe = IO.popen(['sha256sum', '-'], 'r+')
# Write to other process's stdin.
pipe.puts 'worm peelings'
pipe.close_write
# Read from other process's stdout.
hash = pipe.gets
pipe.close
puts hash
The spawned sha256sum process reads data from its standard input and writes the hash to its standard output. It doesn't have any clue that both of these are connected to pipes created by the Ruby script.
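IO.popen also accepts a block. The pipes are closed automatically when the block finishes, the block's value is returned, and $? is set to the child's exit status. In the sketch below, the tr utility stands in for sha256sum so that the result is easy to check by eye. Swapping tr in is purely an illustrative choice.

```ruby
# Block form: IO.popen closes the pipes when the block ends, returns the
# block's value, and stores the child's exit status in $?.
shouted = IO.popen(['tr', 'a-z', 'A-Z'], 'r+') do |pipe|
  pipe.write('worm peelings')
  pipe.close_write # send EOF so tr knows the input is complete
  pipe.read        # read everything tr wrote before it exited
end

puts shouted # prints WORM PEELINGS
puts "tr exited successfully: #{$?.success?}"
```

Writing all the input before reading is fine for small inputs like this one. With a large input, a filter such as tr writes output while it is still reading. If we are not reading at the same time, the pipe carrying its output can fill up, and both processes will block.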
Rust
Rust provides the std::process::Command struct for creating new processes. Command-line arguments are passed via one or more calls to its arg method, input and output are redirected using the stdin and stdout methods, and the spawn method starts up the process. This program accomplishes the same task as the Ruby script but is a bit more verbose because Rust forces programmers to contend with failure. Each fallible step returns a Result or an Option, which this program handles with expect or unwrap, both of which panic if something goes wrong:
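use std::io::Write;
use std::process::{Command, Stdio};

fn main() {
    // Spawn sha256sum with pipes attached to its stdin and stdout.
    let mut process = Command::new("sha256sum")
        .arg("--text")
        .arg("-")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        .expect("Process failed!");

    // Write to the other process's stdin. writeln! appends a newline, just as
    // Ruby's puts does, so both programs hash the same bytes. Taking the handle
    // out of the Child means it is dropped at the end of this block, which
    // closes the pipe and tells sha256sum that the input is complete.
    {
        let mut stdin = process.stdin.take().expect("stdin was not piped");
        writeln!(stdin, "worm peelings").expect("write failed");
    }

    // Wait for the process to exit and collect what it wrote to stdout.
    let output = process.wait_with_output().expect("wait failed");
    let hash = String::from_utf8(output.stdout).expect("bad string");
    // sha256sum's output already ends with a newline.
    print!("{}", hash);
}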
The separation of processes makes interprocess communication clumsy compared to calling a function within our program. Processes also consume a fair bit of memory since each gets its own independent memory space. There's another vehicle for concurrency and parallelism that suffers neither of these problems: threads.