Climbing a (binary) Tree
Photo by Annie Spratt on Unsplash
In this post I will show how to transform a recursive depth first traversal function into an iterator for a binary tree
The complete source code for this post is available here.
After a couple of digressions we’re back to my beloved tree; here are the link of our journey so far:
While in previous posts we traversed and print the content of the tree data structure with a recursive function, this may not be very convenient in the general case: what can we do to make it more general?
- pass a lambda argument to the recursive traversal function
- transform the traversal function into an iterator coroutine
The second approach leverage the whole Rust iterator interface, providing an easy connection with other data structures.
Fixing our tree
Let’s first recall our previous implementation with some small fixes: first is to have a public tree interface which clearly separates from nodes
use std::fmt::Debug; #[derive(Debug)] pub struct Node<T>{ value : T, left : Option<Box<Node<T>>>, right : Option<Box<Node<T>>> } impl<T> Node<T>{ fn new(value : T) -> Node<T>{ Node{ value, left : None, right : None } } } #[derive(Debug,Default)] pub struct Tree<T>{ root: Option<Box<Node<T>>> } impl<T> Tree<T>{ fn new() -> Tree<T>{ Tree{ root : None } } }
the second fix is quite unnecessary but remove some boilerplate code when creating a node
impl<T> From<Node<T>> for Option<Box<Node<T>>>{ fn from(value: Node<T>) -> Self { Some(Box::new(value)) } }
finally let’s copy our simple implementation of the sorting insertion limited to those values which implement a total order trait
impl<T : Ord> Tree<T>{ fn insert(& mut self, value : T ) { match self.root{ None => { self.root = Node::new(value).into(); } Some(ref mut node) => { Tree::<T>::insert_recursive(node, value); } } } fn insert_recursive(node : & mut Node<T>, value : T){ if value > node.value{ match node.right{ None => { node.right = Node::new(value).into(); } Some(ref mut n) => { Tree::<T>::insert_recursive(n, value); } } }else if value < node.value{ match node.left{ None => { node.left = Node::new(value).into(); } Some(ref mut n) => { Tree::<T>::insert_recursive(n, value); } } } } }
Now we’re ready to start the journey.
Iterable and iterators
A data structure is iterable when you can get a suitable iterator from it; this object may be different from the source data structure, as each iterator must keep an internal status.
Iterators have a wealth of useful methods, like map
or filter
which allow to easily create lazy pipelines. If you need you can directly use them in a for
loop.
I personally do not like iterators which allow mutations to the source data structure while looping, so I won’t focus on this subject in this post.
In Rust a struct is iterable if it implements the IntoIter
trait which defines the into_iter
method, returning an iterator.
Iterators are structs which implement the Iterator
trait which defines the next
method. At each call the next
method returns either Some(value)
or None
if the iterator exhausted its sequence of values or nor respectively.
So let’s create some stub for our goal with a couple of caveats:
- we want to have a generic content type
T
in our tree which may possibly have no restriction, so instead of returning it by value we may want to return it as a borrowed reference& T
- the lifetime of these reference must be the same of the tree so if the returned type has lifetime
'a
also the iterator should be have at least the same lifetime
pub struct TreeIter{ // we have to figure out what to put here } impl<'a, T> Iterator for TreeIter<'a, T>{ // this is the type signature of what we are returning type Item = & 'a T; fn next(& mut self) -> Option<Self::Item> { // what do we put here? } } impl<'a , T> IntoIterator for & 'a Tree<T>{ // this is the type signature of what we are returning type Item = & 'a T; // this is the type signature of the iterator type IntoIter = TreeIter<'a, T>; fn into_iter(self) -> Self::IntoIter { // here we create the iterator from a Tree reference } }
Transform recursive into iterative
Ok, this is going to be quite complex.
In order to understand this transformation I will first write a pseudo-assembler sequence showing how a compiler could transform the recursive call of our traversal function
- Set node with input variable
- If node.left null jump to 7
- Push stack frame
- Set return address to 7
- Set input variable to node.left
- Jump to 1
- Print node.value
- If node.right null jump to 13
- Push stack frame
- Set return address to 13
- Set input to node.right
- Jump to 1.
- Pop stack frame
- Jump to return address
Then I will create an iteration which performs an equivalent algorithm: instead of the application stack I need a real stack where I push all the variable bindings and the return address
- While the stack is not empty
- pop address, node
- match address
case A // enter node
- if node.left Some(left)
- push B, node
- push A, left
- else
2.1. push B, node
- if node.left Some(left)
- case B // left explored
- print node.value
- push C, node
- case C // yielded node
- if node.right Some(right)
- push D, node
- push A, right
- else
- push D, node
- if node.right Some(right)
- case D // completed
- no op
This may sound quite redundant but please bear with me as clarity is more important now than optimizations we can add later
Implementing the coroutine object
The more important point we did here is to transform address jump into an enumeration of states, which can then be used when creating an iterator coroutine; the magic step here is composed of two ideas:
- to mess up the execution stack changing the return address
- to return the value instead of printing it
First let’s create an enum representing our return addresses
#[derive(Debug, Copy, Clone)] enum Address{ Enter, LeftCompleted, ValueYield, Completed }
Then we need to host our stack reification into our main coroutine object, each stack frame will contain the return address and our variable environment which luckily is composed of just one variable: the current node.
pub struct TreeIter<'a, T> { stack : Vec<(Address, & 'a Node<T>)> }
in our implementation let’s first have a creator that initialize the stack if any root node is available
impl<'a, T> TreeIter<'a, T>{ // this creator initialize the stack // with the root element if it exists fn new(tree : & 'a Tree<T>) -> TreeIter<'a, T>{ match tree.root { None => { TreeIter{ stack : Vec::new() } } Some(ref node) => { TreeIter{ stack: vec![(Address::Enter, & node)] } } } } }
then we can add the method implementing the coroutine call
impl<'a, T> TreeIter<'a, T>{ // here I cut the creator fn next_item(& mut self) -> Option<& 'a T>{ while let Some((address,node)) = self.stack.pop(){ match address { Address::Enter => { match node.left{ None => { // if no left node jumps to yield stage self.stack.push((Address::LeftCompleted, node)); }, Some(ref left) => { // otherwise set the return address to yield stage // and call recursively self.stack.push((Address::LeftCompleted, node)); self.stack.push((Address::Enter, left)); } } }, Address::LeftCompleted => { // the coroutine step // set the return address to the next sttep and // yield the value self.stack.push((Address::ValueYield, node)); return Some(& node.value); }, Address::ValueYield => { match node.right{ None => { // jump to to end of function self.stack.push((Address::Completed, node)); }, Some(ref right) => { // set the reurn address to end of function // recursive call on the right node self.stack.push((Address::Completed, node)); self.stack.push((Address::Enter, right)); } } }, Address::Completed => { // ok this is just an address }, } } None } }
Wrapping up traits
Now we can return to implement the IntoIter
and Iterator
traits for our tree:
impl<'a, T> Iterator for TreeIter<'a, T>{ type Item = & 'a T; fn next(& mut self) -> Option<Self::Item> { self.next_item() } } impl<'a , T> IntoIterator for & 'a Tree<T>{ type Item = & 'a T; type IntoIter = TreeIter<'a, T>; fn into_iter(self) -> Self::IntoIter { TreeIter::new(self) } }
and we can also test it; here are a couple of details:
- iterators allow us to use
map
andcollect
- as returned values are of type
& i64
we need to clone their value to easily make the test
#[cfg(test)] mod tests { use super::*; #[test] fn create_a_root_node() { let mut tree : Tree<i64>= Tree::new(); tree.insert(8); tree.insert(10); tree.insert(4); tree.insert(6); tree.insert(5); println!("{:?}",tree); let result : Vec<i64> = tree.into_iter() .map(|x| (*x).clone()). .collect(); assert_eq!(result,vec![4,5,6,8,10]); } }
A note about this post and related subjects
When I started my Rust exploration the binary tree was my first experiment.
I soon realized that the subject involved a deep understanding of Rust borrowing rules and that missing coroutines was going to make a depth first iterator a major task, so a single post idea quickly grow up to multiple posts.
While working on this solution I learned a lot and tried to create the simplest possible code. At a certain point in time I tought to create a double linked tree using Rc
and Weak
reference and found a great book on the subject.
Luckily I was able to use just Box
and Vec
to complete an acceptable iterator so I completely dropped doubly linked trees
Hi,
This article is interesting. However it is very hard to read because of the very thin font and the grey on white color scheme. Even with good eyesight I struggle to read it.
Best,
Yes, you are right.
I'm just a beginner with WordPress themes, I'd like to learn more about better ones
The iterator implementation can be simplified. It should be possible to store just a stack of node references, without the enum state. To get there, notice that the LeftCompleted state is an unnecessary intermediate step before ValueYield, and Completed is a no-op, so there's no point pushing it to the stack in the first place. Also, we can push the right child to the stack in Enter instead of waiting until ValueYield, which makes ValueYield unnecessary. So the final algorithm is:
– The stack initially contains the root node (or is empty, if there isn't one)
– next() returns None if the stack is empty. Otherwise, it pops a node from the stack, pushes all its children that aren't None, and yields its value.
Thanks for your comment, there are actually a few optimization which can be done.
Unfortunately the simplification you propose does not fit my goal: I created a depth-first traversal algorithm,
while your proposal looks like a breadth-first traversal.e.g. my algorithm always return the leftmost leaf as the first element while your returns the root as the first element.
Edited on June 2nd, 2024
Ok here I am back with an apology.
We both were talking about depth first traversal but with two different flavours:
I finally had some time to focus on this piece of content.
For a deeper explanation please check here Tree traversal – Wikipedia