Growing A (Binary) Tree in Rust

toa-heftiba-W1yjvf5idqA-unsplash_reduced.jpg

Photo by Toa Heftiba on Unsplash

The code for this post is here

What is a binary tree

A tree is a data structure, familiar to most developers; it is composed at least of a “root” node which is its entry point. Each node can have one or many or no “chidren” nodes, so this is a kind of graph. Unlike other graphs however, “children” nodes in trees won’t have their “parents” as chidrens, nor any “ancestor”. In other words this graph has no circular paths, which creates a “hierarchical” structure.

post011_rust_tree.png

If we draw this data structure with the root node in the bottom and chidren nodes above it looks like a plant, hence its name. We are going to grow a specific species of tree which has at maximum two “children” per each node: it is called “binary tree”.

Those nodes who don’t have any childred are called “leaves” while those who have are called “branches”.

Binary trees are used in machine learning models or to sort data etc.

A naive example from C

Those who programmed in languages in the C family know that to define a tree each data structure representing a node should “connect” to similar structures; this is why we call them “recursive” data structures.

To avoid infinite “loops” C uses “pointers” to connect a parent node with its childrens, so a first attempt to create this structures in rust can be using references

struct BTree0 {left : & BTree0, right : & BTree0}

In C missing childrens are represented by null pointers which are quite dangerous, so Rust uses the Option enumeration to represent possibly missing data. An Option can be either None or Some(thing).

struct BTree0 {left : Option<& BTree0>, right : Option<& BTree0>}

The compiler need to be sure that each reference to BTree0 node lives as long as the parent node, so we need to add a lifetime signature

struct BTree0<'a> {left : Option<& 'a BTree0<'a>>, right : Option<& 'a BTree0<'a>>}

let’s derive the Debug trait to print this data structure

#[derive(Debug)]
struct BTree0<'a> {left : Option<& 'a BTree0<'a>>, right : Option<& 'a BTree0<'a>>}

Now we can create some structure and print it:

fn main() {
    // create a leaf node in this scope
    let test0 : BTree0 = BTree0 { left: None, right: None };

    // create the root node and join the leaf
    let test : BTree0 = BTree0 { left: Some(& test0), right: None };

    // print it!
    println!("Hello, world! {:?}", test);
}

Nice huh? Yes but this is not enough.

Why the example doesn’t actually work

Recursive data structures are usually created by recursive function calls, i.e. functions which are calling themselves. Let’s make a toy example.

fn main() {
    // create a leaf node in this scope
    let test0 : BTree0=BTree0 { left: None, right: None };

    // create a branch calling a recursive function
    let test1 = add_test(false);

    // create the root node and join the leaf
    let test : BTree0 = BTree0 { left: Some(& test0), right: Some(& test1) };

    // print it!
    println!("Hello, world! {:?}", test);
}

fn add_test<'a>(stop : bool) -> BTree0<'a>{
    let value = BTree0 {
        left: (if stop {
                   // stops recursion
                   None
               } else {
                   // calls itself but stop after next call
                   let branch = add_test(true);
                   Some (& branch)
               }),
        right: None
    };
    return value;
}

while the code is syntactically correct the branch variable does not live as much as the value which holds it.

Why? because this variable is created in the stack frame of this function call, which will be wiped away as soon as the function returns.

Grow a tree in a box

So the whole point is to create a node outside of the process stack, i.e. in the heap.

The Box object provided by the standard library does exactly this.

#[derive(Debug)]
struct BTree1 {left : Option<Box<BTree1>>, right : Option<Box<BTree1>>}

this code now compiles and works:

fn main() {
    // create a leaf node in this scope
    let test0 : BTree1 = BTree1 { left: None, right: None };

    // create a branch calling a recursive function
    let test1 = add_test(false);

    // create the root node and join the leaf
    let test : BTree1 = BTree1 { left: Some(Box::new(test0)), right: Some(Box::new(test1)) };

    // print it!
    println!("Hello, world! {:?}", test);
}

fn add_test<'a>(stop : bool) -> BTree1{
    let value = BTree1 {
        left: (
            if stop {
                // stops recursion
                None
            } else {
                // calls itself but stop after next call
                let branch = add_test(true); Some (Box::new(branch))
            }),
        right: None
    };
    return value;
}

the new method of the Box trait creates the object in the heap; the compiler deletes it only when it is no more needed.

marco.p.v.vezzoli

Self taught assembler programming at 11 on my C64 (1983). Never stopped since then -- always looking up for curious things in the software development, data science and AI. Linux and FOSS user since 1994. MSc in physics in 1996. Working in large semiconductor companies since 1997 (STM, Micron) developing analytics and full stack web infrastructures, microservices, ML solutions

You may also like...