Prime Time

by-pils-GiudN8NZhGY-unsplash-reduced.jpg Photo by By Pils on Unsplash

In this post I will create a prime number iterator to demonstrate how to emulate a coroutine in Rust using a ’Typestate’ finite state machine.

This exercise will also show how to use phantom types and ownership to guarantee proper function call and how to use take to consume a value behind a mutable reference.

This is a second digression from the tree posts I started, but will prove useful with them.

previous posts were

The source code for this post is here

What is a coroutine?

a coroutine is a code unit whose execution can be suspended and resumed multiple times. It reminds of low level interrupts, and it is used among other to implement cooperative concurrent code.

Many languages implement this functionality, including the possibility to receive some value when the coroutine suspend and (in some languages) to send some data when it is resumed.

Another interesting usage of coroutines is to write generators, i.e. iterators which creates their content.

When I write this post, Rust language support for coroutines is not in the stable distribution yet, so I will try to emulate it using a Finite State Machine. Possiby a rusty one.

A Prime example in Python

In order to show the algorithm we are using as an example I will write it in Python first

def prime_coroutine():
    # first we create a list of prime numbers discovered so far
    primes = []
    # let's start with 2
    current = 2
    # loop forever
    while True:
        # check if the current number is prime
        found = True
        for prime in primes:
            # we wish to check all primes which are
            # smaller than the sqrt(current)
            if prime * prime > current:
                break
            # check primality
            if current % prime == 0:
                found = False
                break
        if found:
            primes.append(current)
            # this does the magic
            yield current
        # let's go with the next
        current += 1

# create the coroutine object
coroutine = prime_coroutine()

# each call will restart from the latest yield
print(next(coroutine))
print(next(coroutine))
print(next(coroutine))
2
3
5

Finite Typestate Machine

Rust has a reserved word for yield which is not active yet. The Unstable Rust Book reports the current implementation in the nightly based on a finite state machine .

post015_coroutine_state.png

A finite state machine can be represented by a directed graph where each node represent a state and each edge a transition.

Usually transitions are used to perform some kind of operation: in our case we’d like to return a prime number from 2 upward. Each coroutine call will return the next one. Of course this can be easily done with a struct and a trait but coroutines fit to a more general case.

In our case we want to express the possibility to have our coroutine to

  • arrive in a stop state where it can’t ever exit
  • resume one or multiple time from the suspended state, performing some operation and re-entering the suspended state
  • allow to exit from the suspended state when no more operation can be performed
  • fail its initialization by movig directly to the stop state

Let’s first define a few data types representing our status

struct Uninitialized;
struct Suspended;
struct Completed;

these are empty strcuts we will use as type labels.

We need a structure where to store our coroutine status, this changes depending on the kind of output we are looking for. in our case we want to emulate the prime algorithm in the previous section so we need

  • a vector of prime numbers found so far
  • the number we are checking
use std::marker::PhantomData;

struct PrimesCoroutine<State = Uninitialized>{
    primes : Vec<u64>,
    current : u64,
    state : std::marker::PhantomData<State>,
}

this structure is parametrized with a type which represents our state: the PhantomData field tells the compiler to not use any space for this “label” type, but to consider it when evaluating the type of the object

The whole idea is to use the type checker as a theorem prover which will demonstrate that a well formed program only performs the allowed transitions from the current state.

First we implement the transition from Uninitialized to Suspended. We want to express the following ideas

  • this first step can already return either a value and the suspended state or a failed initialization: this is represented by the Result enumeration
  • the current status is consumed i.e. it is passed to the transition function and cannot be used anymore: this is represented by a signature not using a reference
impl PrimesCoroutine::<Uninitialized>{
    fn init(self) -> Result<(u64, PrimesCoroutine<Suspended>), PrimesCoroutine<Completed>>{
        Ok((
            2,
        PrimesCoroutine{
            primes : self.primes,
            current : 2,
            state : PhantomData,
        }
        ))
    }
}

Then we implement the suspended state which contains most of our algorithm:

  • we want to avoid to look up for numbers that cannot be stored in our u64 so we iterate until u64::MAX
impl PrimesCoroutine<Suspended>{
    fn resume(mut self) -> Result<(u64, PrimesCoroutine<Suspended>), PrimesCoroutine<Completed>>{
        self.primes.push(self.current);
        while self.current < u64::MAX{
            self.current += 1;
            let mut found : bool = true;
            for prime in self.primes.iter(){
                if prime * prime > self.current{
                    // early interruption for square rule
                    break;
                }
                if self.current % prime == 0 {
                    // early interruption for division
                    found = false;
                    break;
                }
            }
            if found {
                // this is a prime number
                return Ok(
                    (self.current
                    ,self)
                )
            }
        }

        Err(
            PrimesCoroutine{
                primes : self.primes,
                current : 0,
                state : PhantomData
            }
        )
    }
}

We may wish to have the list of all primes so far: it is possible to add a “generic” trait implementation

impl<T> PrimesCoroutine<T>{
    fn get_primes(& self) -> & Vec<u64>{
        &self.primes
    }
}

finally it is convenient to have a starting point for our state object: this can be done by creating an associated function

impl PrimesCoroutine{
    fn new() -> PrimesCoroutine<Uninitialized>{
        PrimesCoroutine{
            primes : Vec::new(),
            current : 2,
            state : PhantomData,
        }
    }
}

Now we can test our code

    #[test]
    fn it_works() {
        let primes = PrimesCoroutine::new();
        if let Ok((_result, primes)) = primes.init(){
            let result = primes.resume();
            match result{
                Ok((value,_)) => {assert_eq!(value,3)}
                Err(_) => {panic!("closed stream")}
            }
        }
    }

Iterator

Everybody love iterators.

I first met them when the C++ Standard Template Libray originally designed by Alexander Stepanov became part of standard C++, also exposing some features of parametric polymorphism and functional programming.

The basic idea is to explore a data structure by looping over its contents, without the need to know its internal details. Rust defines the Iterator trait in the standard library. I will now show how to use our coroutine code to implement it.

The first problem to address is to create a place to store the current status of our Finite State Machine: as each state is represented by a different type we need ad enumeration to store them in a single place

enum CoroutineStatus{
    Created(PrimesCoroutine<Uninitialized>),
    Ready(PrimesCoroutine<Suspended>),
    Closed(PrimesCoroutine<Completed>)
}
// this is for convenience
use CoroutineStatus::*;

With this enumeration type we can move the coroutine semantic to an higher level: define a next method for the Created and Ready states which will perform all the required matching, consume the status and optionally return a status

impl CoroutineStatus{
    fn next(self) -> (CoroutineStatus, Option<u64>){
        match self{
            Created(coroutine) => {
                match coroutine.init(){
                    Ok((result, coroutine)) =>{
                        ( Ready(coroutine),
                          Some(result))
                    },
                    Err(coroutine) => {
                        ( Closed(coroutine),
                        None)
                    }
                }
            }
,
            Ready(coroutine) => {
                match coroutine.resume(){
                    Ok((result, coroutine)) =>{
                        ( Ready(coroutine),
                          Some(result))
                    },
                    Err(coroutine) => {
                        ( Closed(coroutine),
                        None)
                    }
                }
            },
            _ => (self, None),
        }
    }
}

Now we can store this value into a structure; values may change in time but there will be an object which we will always refer to for storing our state.

struct Prime{
    coroutine : CoroutineStatus
}

impl Prime{
    fn new() -> Prime{
        Prime{
            coroutine: Created(PrimesCoroutine::new())
        }
    }
}

as usual an associated function can guarantee that a proper initialization is done

Now let’s try to implement the Iterator trait for this struct:

  • we have to define the returned type (will be u64)
  • we have to define a next method which return an Option enumeration:
    • when the iterator returns a value we have Some(value)
    • when the iterator is exhausted it will returnt None
impl Iterator for Prime{
    type Item = u64;
    fn next(& mut self) -> Option<Self::Item>{
        match self.coroutine.next(){
            (status, Some(value)) => {
                self.coroutine = status;
                Some(value)
            },
            (status, None) => {
                self.coroutine = status;
                None
            }
        }
    }
}

But this don’t work!

Why? when we execute next on our coroutine we are consuming this value (this was made by design); but this value is taken from a mutable reference thus invalidanting the content of the pointed struct.

The solution is to temporarily replace the coroutine value with a placeholder (which does not invalidate the referenced struct), then calculate the real next state and finally set it. This can be done with the std::mem::take function.

take returns the content a the mutable reference and subsitute it with a “default” value, thus requiring us to implement the Default trait of the type involved.

As we wish to minimize the memory copied we just add an empty element in our enumeration and choose it as the default value

enum CoroutineStatus{
    Undefined,
    Created(PrimesCoroutine<Uninitialized>),
    Ready(PrimesCoroutine<Suspended>),
    Closed(PrimesCoroutine<Completed>)
}
use CoroutineStatus::*;

// implementing the Default trait
impl Default for CoroutineStatus {
    fn default() -> Self { Undefined }
}

Now we can use take safely

impl Iterator for Prime{
    type Item = u64;
    fn next(& mut self) -> Option<Self::Item>{
        let coroutine = take(& mut self.coroutine);
        match coroutine.next(){
            (status, Some(value)) => {
                self.coroutine = status;
                Some(value)
            },
            (status, None) => {
                self.coroutine = status;
                None
            }
        }
    }
}

An alternative to take

This has been already a long journey.

But what if we want to avoid using take ?

When using our Typestate pattern the state is always consumed to avoid its reuse when invalid at a later time, so take is necessary when we want to store this state in a struct.

The alternative is to create a simpler Finite State Machine using an enumeration for the states and some pattern matching for each transition.

While this works, illegal transitions are detectable only at run time, so we need to create an extra Error state and manage it in our code later. This may be inevitable if you can’t use the std crate, which is the case for embedded code.

More on this point on other posts

marco.p.v.vezzoli

Self taught assembler programming at 11 on my C64 (1983). Never stopped since then -- always looking up for curious things in the software development, data science and AI. Linux and FOSS user since 1994. MSc in physics in 1996. Working in large semiconductor companies since 1997 (STM, Micron) developing analytics and full stack web infrastructures, microservices, ML solutions

You may also like...