Stacking bits
Photo by Daria Nepriakhina 🇺🇦 on Unsplash
We are digressing a little from our previous path about trees to meet a data structure which has a deep connection with (binary) trees.
I will show a very compact (memory efficient) and fast implementation of a binary stack.
In this post I will also show how to use Rust types to implement failures and success as well as some Test Driven Design example.
For those who want to have a look to previous posts, you can check
The source code for this post is here
what is a stack
A stack is a data structure which allow you to retrieve the last datum you put into.
This is very similar to a stack of books on your desk: the most common operations you can do are:
- to add a book on the top; also known as
push
- to look at the topmost book; also known as
top
- to remove the topmost book; also known as
pop
- to count how many books are in our stack;
size
It is also known as Last In First Out queue, sometime written with the LIFO achronym.
defining a binary stack
Instead of creating a generic stack I’d like to focus on a very specific case: a stack which can only hold boolean values.
The signature of the api could be as follows
// warning: this code does not compile struct Stack {} fn push(stack : Stack, value: bool) {} fn top(stack : Stack) -> bool {} fn pop(stack : Stack) -> bool {} fn size(stack : Stack) -> u32 {}
To start this project I chose to create a library:
cargo init --lib .
This creates a lib.rs
file in the src
directory with some example code
Rust allow to create a trait implementation for this structure; I will add the minimum code needed to compile it
pub struct Stack0 {} impl Stack0 { // adding also a constructor pub fn new() -> Stack0 { Stack0 {} } pub fn push(self: & mut Stack0, value: bool) { } pub fn top(self: & Stack0) -> bool { true } pub fn pop(self: & mut Stack0) -> bool { true } pub fn size(self: & Stack0) -> u32 { 1 } }
test first design
What laws should this data structure obey?
Let’s check a few:
- if I
push
a value,pop
should return it - if I
push
a value the size should increase by one - also
top
should return the last thing Ipush
-ed - after
pop
the size should decrease - a new stack should have size 0
Rust has a convenient toolkit to create and execute tests: let’s add a submodule in our library
// this macro is used by compiler to // conditionally compile the next section #[cfg(test)] // this defines a module within current one mod tests { // this statement imports everithing defined // in the upper module use super::*; // this macro identifies a test #[test] fn empty_when_created() { let result = Stack0::new().size(); assert_eq!(result, 0); } }
by executing
cargo test
we will compile the library with the tests and execute them, a report is printed which may look like this:
running 1 test test tests::empty_when_created ... FAILED failures: ---- tests::empty_when_created stdout ---- thread 'tests::empty_when_created' panicked at src/lib.rs:53:9: assertion `left == right` failed left: 1 right: 0 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace failures: tests::empty_when_created test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
we can add more tests in the same module: e.g.
#[test] fn size_increase_when_push() { let mut stack = Stack0::new(); stack.push(false); let result1 = stack.size(); stack.push(true); let result2 = stack.size(); assert_eq!(result2, result1 + 1); }
managing unwanted status
Now I have couple of questions
- What happens when I try to get a book from an empty stack?
- Can a stack grow forever? Or can I decide it is too tall to grow?
Rust uses the Result
enumeration to express this concept; a Result
can be either be Ok(some_value)
or Err(error)
.
Being Result<R, E>
a generic type, I need to specify both the result type R
and the error type E
; the result type will be of course a bool
while for this example I will choose a simple String
for the error.
The stack API can be now modified to tackle our questions:
pub struct Stack1 {} impl Stack1 { pub fn new() -> Stack1 { Stack1 {} } pub fn push(self: & mut Stack1, value: bool) -> Result<bool, String>{ Ok(value) } pub fn top(self: & Stack1) -> Result<bool, String> { Ok(true) } pub fn pop(self: & mut Stack1) -> Result<bool, String> { Ok(true) } pub fn size(self: & Stack1) -> u32 { 1 } }
and so our tests must be adapted; we need to add two rules and their type signature.
#[cfg(test)] mod tests { use super::*; #[test] fn push_and_pop() { let mut stack = Stack1::new(); stack.push(false); stack.push(true); let result1 = stack.pop(); let result2 = stack.pop(); assert_eq!(result1, Ok(true)); assert_eq!(result2, Ok(false)); } #[test] fn size_increase_when_push() { let mut stack = Stack1::new(); stack.push(false); let result1 = stack.size(); stack.push(true); let result2 = stack.size(); assert_eq!(result2, result1 + 1); } #[test] fn push_and_top() { let mut stack = Stack1::new(); stack.push(false); let result1 = stack.top(); stack.push(true); let result2 = stack.top(); assert_eq!(result1, Ok(false)); assert_eq!(result2, Ok(true)); } #[test] fn size_decreases_when_pop() { let mut stack = Stack1::new(); stack.push(false); stack.push(true); let result1 = stack.size(); stack.pop(); let result2 = stack.size(); assert_eq!(result1, result2 - 1); } #[test] fn empty_when_created() { let result = Stack1::new().size(); assert_eq!(result, 0); } #[test] fn empty_does_not_pop() { let mut stack = Stack1::new(); let result = stack.pop(); assert_eq!(result, Err("Empty stack".into())); } }
implementing the stack
The idea here is to use an integer value to host a sequence of bits, each representing a value pushed in the stack.
We can emulate the push behavior by executing a left shift of one and adding 1 only if the input value is true
The following example is made with a 4 bit value just to make it easier to read
e.g.
- stack status is is
0b0001
- stack receive a
PUSH true
- left shift the status; now is
0b0010
- add one to the status; now is
0b0011
- left shift the status; now is
- stack receive a
PUSH false
- left shift the status; now is
0b0110
- left shift the status; now is
the right shift can be used as pop e.g
- stack status is
0b0110
- stack receive a
POP
- evaluate
status & 0b0001
and store the result - right shift the status; now is
0b0011
- return the stored result (is now
false
)
- evaluate
- stack receive a
POP
- evaluate
status & 0b0001
and store the result - right shift the status; now is
0b0001
- return the stored result (is now
true
)
- evaluate
as 0b0000 >> 1 == 0b000
(shift right 0 returns 0) we need a way to know when our stack is empty; a possible way is to define the empty status equal to 0b0001
I will use an unsigned integer of type usize
in this example. The number of bytes of this integer depends on the cpu architecture, it may be 64 or 32 in most cases.
How can we calculate the size of the stack? we use one bit to mark the start point, so if we count all bits after it we have the result.
Rust has two interesting features for this kind of calculation; we know the number of bits of usize
from the constant usize::BITS
and the function usize::leading_zeros
returns the count of the bits before our marker.
here is the full implementation:
pub struct Stack1 { stack: usize } impl Stack1 { pub fn new() -> Stack1 { Stack1 { stack: 1 } } pub fn push(& mut self, value: bool) -> Result<bool, String>{ self.stack = self.stack << 1; if value{ self.stack += 1; } Ok(value) } pub fn top(& self) -> Result<bool, String> { if self.stack == 1 { return Err("Empty stack".into()) } Ok((self.stack & 1) == 1) } pub fn pop(& mut self) -> Result<bool, String> { if self.stack == 1 { return Err("Empty stack".into()) } let result = (self.stack & 1) == 1; self.stack = self.stack >> 1; Ok(result) } pub fn size(& self) -> u32 { usize::BITS - usize::leading_zeros(self.stack) - 1 } }
a last remark on error types and borrow checker rules
It may sound strange to have a very compact and fast implementation like this one with an error type like String
; we are actually using constant strings so a type like &str
should be enough right?
Well, no.
If we use a signature like
fn pop(& mut self) -> Result<bool, & str>;
we are actually using this lifetime signature
fn pop<'a>(& mut 'a self) -> Result<bool, & 'a str>;
This means that we cannot execute something like
let result1 = stack.pop(true); let result2 = stack.pop(true);
because the first call may return an Err(& 'a str)
which is like a borrow of a mutable pointer that cannot be done twice.
So, can we have a more efficient definition of the error type? Sure we can use an enumeration.
But this is for another post.