Growing A (Binary) Tree in Rust
Photo by Toa Heftiba on Unsplash
The code for this post is here
What is a binary tree
A tree is a data structure, familiar to most developers; it is composed at least of a “root” node which is its entry point. Each node can have one or many or no “chidren” nodes, so this is a kind of graph. Unlike other graphs however, “children” nodes in trees won’t have their “parents” as chidrens, nor any “ancestor”. In other words this graph has no circular paths, which creates a “hierarchical” structure.
If we draw this data structure with the root node in the bottom and chidren nodes above it looks like a plant, hence its name. We are going to grow a specific species of tree which has at maximum two “children” per each node: it is called “binary tree”.
Those nodes who don’t have any childred are called “leaves” while those who have are called “branches”.
Binary trees are used in machine learning models or to sort data etc.
A naive example from C
Those who programmed in languages in the C family know that to define a tree each data structure representing a node should “connect” to similar structures; this is why we call them “recursive” data structures.
To avoid infinite “loops” C uses “pointers” to connect a parent node with its childrens, so a first attempt to create this structures in rust can be using references
struct BTree0 {left : & BTree0, right : & BTree0}
In C missing childrens are represented by null
pointers which are quite dangerous, so Rust uses the Option
enumeration to represent possibly missing data. An Option
can be either None
or Some(thing)
.
struct BTree0 {left : Option<& BTree0>, right : Option<& BTree0>}
The compiler need to be sure that each reference to BTree0
node lives as long as the parent node, so we need to add a lifetime signature
struct BTree0<'a> {left : Option<& 'a BTree0<'a>>, right : Option<& 'a BTree0<'a>>}
let’s derive the Debug
trait to print this data structure
#[derive(Debug)] struct BTree0<'a> {left : Option<& 'a BTree0<'a>>, right : Option<& 'a BTree0<'a>>}
Now we can create some structure and print it:
fn main() { // create a leaf node in this scope let test0 : BTree0 = BTree0 { left: None, right: None }; // create the root node and join the leaf let test : BTree0 = BTree0 { left: Some(& test0), right: None }; // print it! println!("Hello, world! {:?}", test); }
Nice huh? Yes but this is not enough.
Why the example doesn’t actually work
Recursive data structures are usually created by recursive function calls, i.e. functions which are calling themselves. Let’s make a toy example.
fn main() { // create a leaf node in this scope let test0 : BTree0=BTree0 { left: None, right: None }; // create a branch calling a recursive function let test1 = add_test(false); // create the root node and join the leaf let test : BTree0 = BTree0 { left: Some(& test0), right: Some(& test1) }; // print it! println!("Hello, world! {:?}", test); } fn add_test<'a>(stop : bool) -> BTree0<'a>{ let value = BTree0 { left: (if stop { // stops recursion None } else { // calls itself but stop after next call let branch = add_test(true); Some (& branch) }), right: None }; return value; }
while the code is syntactically correct the branch
variable does not live as much as the value
which holds it.
Why? because this variable is created in the stack frame of this function call, which will be wiped away as soon as the function returns.
Grow a tree in a box
So the whole point is to create a node outside of the process stack, i.e. in the heap.
The Box
object provided by the standard library does exactly this.
#[derive(Debug)] struct BTree1 {left : Option<Box<BTree1>>, right : Option<Box<BTree1>>}
this code now compiles and works:
fn main() { // create a leaf node in this scope let test0 : BTree1 = BTree1 { left: None, right: None }; // create a branch calling a recursive function let test1 = add_test(false); // create the root node and join the leaf let test : BTree1 = BTree1 { left: Some(Box::new(test0)), right: Some(Box::new(test1)) }; // print it! println!("Hello, world! {:?}", test); } fn add_test<'a>(stop : bool) -> BTree1{ let value = BTree1 { left: ( if stop { // stops recursion None } else { // calls itself but stop after next call let branch = add_test(true); Some (Box::new(branch)) }), right: None }; return value; }
the new
method of the Box
trait creates the object in the heap; the compiler deletes it only when it is no more needed.