Climbing a (binary) Tree

annie-spratt-gNI4_88t9Rs-unsplash_reduced.jpg

Photo by Annie Spratt on Unsplash

In this post I will show how to transform a recursive depth first traversal function into an iterator for a binary tree

The complete source code for this post is available here.

After a couple of digressions we’re back to my beloved tree; here are the link of our journey so far:

  1. Growing a (binary)Tree
  2. Growing a (sorting) Tree
  3. Stacking Bits
  4. Prime Time

While in previous posts we traversed and print the content of the tree data structure with a recursive function, this may not be very convenient in the general case: what can we do to make it more general?

  1. pass a lambda argument to the recursive traversal function
  2. transform the traversal function into an iterator coroutine

The second approach leverage the whole Rust iterator interface, providing an easy connection with other data structures.

Fixing our tree

Let’s first recall our previous implementation with some small fixes: first is to have a public tree interface which clearly separates from nodes

use std::fmt::Debug;

#[derive(Debug)]
pub struct Node<T>{
    value : T,
    left : Option<Box<Node<T>>>,
    right : Option<Box<Node<T>>>
}

impl<T> Node<T>{
    fn new(value : T) -> Node<T>{
        Node{
            value,
            left : None,
            right : None
        }
    }
}

#[derive(Debug,Default)]
pub struct Tree<T>{
    root: Option<Box<Node<T>>>
}

impl<T> Tree<T>{
    fn new() -> Tree<T>{
        Tree{
            root : None
        }
    }
}

the second fix is quite unnecessary but remove some boilerplate code when creating a node

impl<T> From<Node<T>> for Option<Box<Node<T>>>{
    fn from(value: Node<T>) -> Self {
        Some(Box::new(value))
    }
}

finally let’s copy our simple implementation of the sorting insertion limited to those values which implement a total order trait

impl<T : Ord> Tree<T>{

    fn insert(& mut self, value : T ) {
        match self.root{
            None => {
                self.root = Node::new(value).into();
            }
            Some(ref mut node) => {
                Tree::<T>::insert_recursive(node, value);
            }
        }
    }

    fn insert_recursive(node : & mut Node<T>, value : T){
        if value > node.value{
            match node.right{
                None => {
                    node.right = Node::new(value).into();
                }
                Some(ref mut n) => {
                    Tree::<T>::insert_recursive(n, value);
                }
            }
        }else if value < node.value{
            match node.left{
                None => {
                    node.left = Node::new(value).into();
                }
                Some(ref mut n) => {
                    Tree::<T>::insert_recursive(n, value);
                }
            }
        }
    }
}

Now we’re ready to start the journey.

Iterable and iterators

A data structure is iterable when you can get a suitable iterator from it; this object may be different from the source data structure, as each iterator must keep an internal status.

Iterators have a wealth of useful methods, like map or filter which allow to easily create lazy pipelines. If you need you can directly use them in a for loop.

I personally do not like iterators which allow mutations to the source data structure while looping, so I won’t focus on this subject in this post.

In Rust a struct is iterable if it implements the IntoIter trait which defines the into_iter method, returning an iterator.

Iterators are structs which implement the Iterator trait which defines the next method. At each call the next method returns either Some(value) or None if the iterator exhausted its sequence of values or nor respectively.

So let’s create some stub for our goal with a couple of caveats:

  1. we want to have a generic content type T in our tree which may possibly have no restriction, so instead of returning it by value we may want to return it as a borrowed reference & T
  2. the lifetime of these reference must be the same of the tree so if the returned type has lifetime 'a also the iterator should be have at least the same lifetime
pub struct TreeIter{
    // we have to figure out what to put here
}

impl<'a, T> Iterator for TreeIter<'a, T>{
    // this is the type signature of what we are returning
    type Item = & 'a T;
    fn next(& mut self) -> Option<Self::Item> {
        // what do we put here?
    }
}

impl<'a , T> IntoIterator for & 'a Tree<T>{
    // this is the type signature of what we are returning
     type Item = & 'a T;
    // this is the type signature of the iterator
     type IntoIter = TreeIter<'a, T>;
     fn into_iter(self) -> Self::IntoIter {
         // here we create the iterator from a Tree reference
     }
}

Transform recursive into iterative

Ok, this is going to be quite complex.

In order to understand this transformation I will first write a pseudo-assembler sequence showing how a compiler could transform the recursive call of our traversal function

  1. Set node with input variable
  2. If node.left null jump to 7
  3. Push stack frame
  4. Set return address to 7
  5. Set input variable to node.left
  6. Jump to 1
  7. Print node.value
  8. If node.right null jump to 13
  9. Push stack frame
  10. Set return address to 13
  11. Set input to node.right
  12. Jump to 1.
  13. Pop stack frame
  14. Jump to return address

Then I will create an iteration which performs an equivalent algorithm: instead of the application stack I need a real stack where I push all the variable bindings and the return address

  1. While the stack is not empty
    1. pop address, node
    2. match address
      1. case A // enter node

        1. if node.left Some(left)
          1. push B, node
          2. push A, left
        2. else

        2.1. push B, node

      2. case B // left explored
        1. print node.value
        2. push C, node
      3. case C // yielded node
        1. if node.right Some(right)
          1. push D, node
          2. push A, right
        2. else
          1. push D, node
      4. case D // completed
        1. no op

This may sound quite redundant but please bear with me as clarity is more important now than optimizations we can add later

Implementing the coroutine object

The more important point we did here is to transform address jump into an enumeration of states, which can then be used when creating an iterator coroutine; the magic step here is composed of two ideas:

  1. to mess up the execution stack changing the return address
  2. to return the value instead of printing it

First let’s create an enum representing our return addresses

#[derive(Debug, Copy, Clone)]
enum Address{
    Enter,
    LeftCompleted,
    ValueYield,
    Completed
}

Then we need to host our stack reification into our main coroutine object, each stack frame will contain the return address and our variable environment which luckily is composed of just one variable: the current node.

pub struct TreeIter<'a, T> {
    stack : Vec<(Address, & 'a Node<T>)>
}

in our implementation let’s first have a creator that initialize the stack if any root node is available

impl<'a, T> TreeIter<'a, T>{
    // this creator initialize the stack
    // with the root element if it exists
    fn new(tree : & 'a Tree<T>) -> TreeIter<'a, T>{
        match tree.root {
            None => {
                TreeIter{
                    stack : Vec::new()
                }
            }
            Some(ref node) => {
                TreeIter{
                    stack: vec![(Address::Enter, & node)]
                }
            }
        }
    }
}

then we can add the method implementing the coroutine call

impl<'a, T> TreeIter<'a, T>{

    // here I cut the creator

    fn next_item(& mut self) -> Option<& 'a T>{
        while let Some((address,node)) = self.stack.pop(){
            match address {
                Address::Enter => {
                    match node.left{
                        None => {
                            // if no left node jumps to yield stage
                            self.stack.push((Address::LeftCompleted, node));
                        },
                        Some(ref left) => {
                            // otherwise set the return address to yield stage
                            // and call recursively
                            self.stack.push((Address::LeftCompleted, node));
                            self.stack.push((Address::Enter, left));
                        }
                    }
                },
                Address::LeftCompleted => {
                    // the coroutine step
                    // set the return address to the next sttep and
                    // yield the value
                    self.stack.push((Address::ValueYield, node));
                    return Some(& node.value);
                },
                Address::ValueYield => {
                    match node.right{
                        None => {
                            // jump to to end of function
                            self.stack.push((Address::Completed, node));
                        },
                        Some(ref right) => {
                            // set the reurn address to end of function
                            // recursive call on the right node
                            self.stack.push((Address::Completed, node));
                            self.stack.push((Address::Enter, right));
                        }
                    }
                },
                Address::Completed => {
                    // ok this is just an address
                },
            }
        }
        None
    }
}

Wrapping up traits

Now we can return to implement the IntoIter and Iterator traits for our tree:

impl<'a, T> Iterator for TreeIter<'a, T>{
    type Item = & 'a T;
    fn next(& mut self) -> Option<Self::Item> {
        self.next_item()
    }
}

impl<'a , T> IntoIterator for & 'a Tree<T>{
     type Item = & 'a T;
     type IntoIter = TreeIter<'a, T>;
     fn into_iter(self) -> Self::IntoIter {
         TreeIter::new(self)
     }
}

and we can also test it; here are a couple of details:

  • iterators allow us to use map and collect
  • as returned values are of type & i64 we need to clone their value to easily make the test
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn create_a_root_node() {
        let mut tree : Tree<i64>= Tree::new();
        tree.insert(8);
        tree.insert(10);
        tree.insert(4);
        tree.insert(6);
        tree.insert(5);
        println!("{:?}",tree);
        let result : Vec<i64> = tree.into_iter()
            .map(|x| (*x).clone()).
            .collect();
        assert_eq!(result,vec![4,5,6,8,10]);
    }
}

A note about this post and related subjects

When I started my Rust exploration the binary tree was my first experiment.

I soon realized that the subject involved a deep understanding of Rust borrowing rules and that missing coroutines was going to make a depth first iterator a major task, so a single post idea quickly grow up to multiple posts.

While working on this solution I learned a lot and tried to create the simplest possible code. At a certain point in time I tought to create a double linked tree using Rc and Weak reference and found a great book on the subject.

Luckily I was able to use just Box and Vec to complete an acceptable iterator so I completely dropped doubly linked trees

marco.p.v.vezzoli

Self taught assembler programming at 11 on my C64 (1983). Never stopped since then -- always looking up for curious things in the software development, data science and AI. Linux and FOSS user since 1994. MSc in physics in 1996. Working in large semiconductor companies since 1997 (STM, Micron) developing analytics and full stack web infrastructures, microservices, ML solutions

You may also like...

4 Responses

  1. Soso says:

    Hi,

    This article is interesting. However it is very hard to read because of the very thin font and the grey on white color scheme. Even with good eyesight I struggle to read it.

    Best,

  2. Caleb Sander Mateos says:

    The iterator implementation can be simplified. It should be possible to store just a stack of node references, without the enum state. To get there, notice that the LeftCompleted state is an unnecessary intermediate step before ValueYield, and Completed is a no-op, so there's no point pushing it to the stack in the first place. Also, we can push the right child to the stack in Enter instead of waiting until ValueYield, which makes ValueYield unnecessary. So the final algorithm is:
    – The stack initially contains the root node (or is empty, if there isn't one)
    – next() returns None if the stack is empty. Otherwise, it pops a node from the stack, pushes all its children that aren't None, and yields its value.

    • Thanks for your comment, there are actually a few optimization which can be done.

      Unfortunately the simplification you propose does not fit my goal: I created a depth-first traversal algorithm, while your proposal looks like a breadth-first traversal.
      e.g. my algorithm always return the leftmost leaf as the first element while your returns the root as the first element.

      Edited on June 2nd, 2024

      Ok here I am back with an apology.

      We both were talking about depth first traversal but with two different flavours:

      1. I was interested in *post-order* depth first traversal
      2. you proposed a working solution for *pre-order* depth first traversal

      I finally had some time to focus on this piece of content.

      For a deeper explanation please check here Tree traversal – Wikipedia