Rust Disemvowel
This is a simple lab where we'll use Rust to implement the disemvowel
function that we covered in a previous C lab.
What is Rust?
Rust is a systems programming language designed to give the performance of C or C++ while providing the memory safety of garbage collected languages like Java. It does this through a very strict compiler that will refuse to compile code that is not memory safe. This means that rather than using Valgrind when you run your code like you did in C, the compiler will tell you if and where any potential memory problems exist when you compile your code. This is incredibly useful, but also can be frustrating as code that feels like it should work will refuse to compile over what might appear to be small errors.
Currently no other language promises memory safety while delivering the sort of performance seen in Rust, and in situations where speed and safety are the highest concern, Rust is becoming a more popular choice. Of all security bugs in Microsoft software, 70% of them are memory related, and Rust has been identified as a potential solution to these issues. Additionally, companies like Discord have been moving to Rust to gain the performance benefits of the language.
Rust Background
Running code
Cargo is Rust's package manager and build tool. You can run your code in the command prompt by using
cargo run
And you can run the tests by using
cargo test
Rust Variables
Types
The two data types you will be working with in this lab are Strings and Vectors.
Vectors are a variable sized collection that can store data of the same type; it is fairly similar to something like ArrayList
in Java. You can not have a vector with both numbers and strings in it, but you can add or remove as many items as you want to a vector. You can create a vector in a few ways.
//Initializes a vector containing 5 numbers.
let v = vec![1, 2, 3, 4, 5];
//Initializes an empty vector
let v2 = Vec::new();
The String
data type in Rust stores text data, but those are implemented as a vector of characters behind the scenes. You initialize a String
in a number of ways, including.
let s = String::from("Hello World");
Ownership
The main method of ensuring memory safety that Rust uses is ownership. Ownership means that only one variable can own a value at a given time. This is important because it means that when that owner goes out of scope (e.g., when the function returns), the value can be safely dropped. If two variables were pointing to that value we would not be able to drop the value as the other variable could still exist and need the value.
To get around this restriction, borrowing is used to allow a variable to have a read-only pointer or reference to a value. This allows a new variable to take the same value as another without taking ownership. This is often used to pass a value to a function without losing ownership of the original value.
In this snippet, for example, if we didn't use &s
to indicate that calculate_length
is borrowing the string (instead of taking ownership of it), then we wouldn't be able to print s
after the call to calculate_length
because we would have lost ownership over s
.
let s = String::from("Hello");
let len = calculate_length(&s);
println!("{}", s);
fn calculate_length(str: &String) -> usize {
s.len()
}
Panics and Error Handling
Rust has a few methods of handling errors or missing values, but the one we will be focusing on is the Panic. When your code reaches an unrecoverable error, you can use the panic!
macro to terminate the code with a custom error messsage such as:
panic!("There's a problem!");
Disemvowel
As you've done in your prior lab, "Disemvoweling" is the act of removing all the vowels ('a', 'e', 'i', 'o', and 'u', both upper and lowercase) from a piece of text. This time your code will take an input and output file as arguments and write the disemvoweled contents of the input file to the output file. You will finish a few lines of code to read and write to files, then finish the disemvowel function. Make sure your code passes all cargo tests.
Your job here is to write a small Rust program that takes in two command line arguments, each of which are file names. The first should be the file containing the text to disemvowel, and the second should be the name of the file we'll write the disemvowelled text to.
The skeleton code is in src/main.rs
, which includes a number of unit tests that (among other things) demonstrate Rust's ability to work with complex unicode text. See comments in that file for more guidance on what needs to be done to complete the lab.
Running cargo test
should run the tests, and if you want to run it "by hand"
cargo run this.txt that.txt
will run your program with the given command line arguments (this.txt
and that.txt
in the example).
The file input.txt
has a variety of test text, and target_output.txt
has the expected result of disemvowelling input.txt
. If you run
cargo run input.txt my_output.txt
then
diff my_output.txt target_output.txt
should show you any differences between your output and the expected output.
Note: Because Rust can deal with a variety of languages via unicode, the question of what a vowel is, and how it should be handled is actually a little complicated and we don't attempt to deal with it completely here. There are some edge cases where an accented vowel shows up in Rust as two separate chars, and disemvowelling can remove the vowel but leave behind a dangling accent. We leave addressing that for another day.