Prime Time
In this post I will create a prime number iterator to demonstrate how to emulate a coroutine in Rust using a ’Typestate’ finite state machine.
This exercise will also show how to use phantom types and ownership to guarantee proper function call and how to use take
to consume a value behind a mutable reference.
This is a second digression from the tree posts I started, but will prove useful with them.
previous posts were
The source code for this post is here
What is a coroutine?
a coroutine is a code unit whose execution can be suspended and resumed multiple times. It reminds of low level interrupts, and it is used among other to implement cooperative concurrent code.
Many languages implement this functionality, including the possibility to receive some value when the coroutine suspend and (in some languages) to send some data when it is resumed.
Another interesting usage of coroutines is to write generators, i.e. iterators which creates their content.
When I write this post, Rust language support for coroutines is not in the stable distribution yet, so I will try to emulate it using a Finite State Machine. Possiby a rusty one.
A Prime example in Python
In order to show the algorithm we are using as an example I will write it in Python first
def prime_coroutine(): # first we create a list of prime numbers discovered so far primes = [] # let's start with 2 current = 2 # loop forever while True: # check if the current number is prime found = True for prime in primes: # we wish to check all primes which are # smaller than the sqrt(current) if prime * prime > current: break # check primality if current % prime == 0: found = False break if found: primes.append(current) # this does the magic yield current # let's go with the next current += 1 # create the coroutine object coroutine = prime_coroutine() # each call will restart from the latest yield print(next(coroutine)) print(next(coroutine)) print(next(coroutine))
2 3 5
Finite Typestate Machine
Rust has a reserved word for yield
which is not active yet. The Unstable Rust Book reports the current implementation in the nightly based on a finite state machine .
A finite state machine can be represented by a directed graph where each node represent a state and each edge a transition.
Usually transitions are used to perform some kind of operation: in our case we’d like to return a prime number from 2 upward. Each coroutine call will return the next one. Of course this can be easily done with a struct
and a trait
but coroutines fit to a more general case.
In our case we want to express the possibility to have our coroutine to
- arrive in a
stop
state where it can’t ever exit - resume one or multiple time from the
suspended
state, performing some operation and re-entering thesuspended
state - allow to exit from the
suspended
state when no more operation can be performed - fail its initialization by movig directly to the
stop
state
Let’s first define a few data types representing our status
struct Uninitialized; struct Suspended; struct Completed;
these are empty strcuts we will use as type labels.
We need a structure where to store our coroutine status, this changes depending on the kind of output we are looking for. in our case we want to emulate the prime algorithm in the previous section so we need
- a vector of prime numbers found so far
- the number we are checking
use std::marker::PhantomData; struct PrimesCoroutine<State = Uninitialized>{ primes : Vec<u64>, current : u64, state : std::marker::PhantomData<State>, }
this structure is parametrized with a type which represents our state: the PhantomData
field tells the compiler to not use any space for this “label” type, but to consider it when evaluating the type of the object
The whole idea is to use the type checker as a theorem prover which will demonstrate that a well formed program only performs the allowed transitions from the current state.
First we implement the transition from Uninitialized
to Suspended
. We want to express the following ideas
- this first step can already return either a value and the
suspended
state or a failed initialization: this is represented by theResult
enumeration - the current status is consumed i.e. it is passed to the transition function and cannot be used anymore: this is represented by a signature not using a reference
impl PrimesCoroutine::<Uninitialized>{ fn init(self) -> Result<(u64, PrimesCoroutine<Suspended>), PrimesCoroutine<Completed>>{ Ok(( 2, PrimesCoroutine{ primes : self.primes, current : 2, state : PhantomData, } )) } }
Then we implement the suspended
state which contains most of our algorithm:
- we want to avoid to look up for numbers that cannot be stored in our
u64
so we iterate untilu64::MAX
impl PrimesCoroutine<Suspended>{ fn resume(mut self) -> Result<(u64, PrimesCoroutine<Suspended>), PrimesCoroutine<Completed>>{ self.primes.push(self.current); while self.current < u64::MAX{ self.current += 1; let mut found : bool = true; for prime in self.primes.iter(){ if prime * prime > self.current{ // early interruption for square rule break; } if self.current % prime == 0 { // early interruption for division found = false; break; } } if found { // this is a prime number return Ok( (self.current ,self) ) } } Err( PrimesCoroutine{ primes : self.primes, current : 0, state : PhantomData } ) } }
We may wish to have the list of all primes so far: it is possible to add a “generic” trait implementation
impl<T> PrimesCoroutine<T>{ fn get_primes(& self) -> & Vec<u64>{ &self.primes } }
finally it is convenient to have a starting point for our state object: this can be done by creating an associated function
impl PrimesCoroutine{ fn new() -> PrimesCoroutine<Uninitialized>{ PrimesCoroutine{ primes : Vec::new(), current : 2, state : PhantomData, } } }
Now we can test our code
#[test] fn it_works() { let primes = PrimesCoroutine::new(); if let Ok((_result, primes)) = primes.init(){ let result = primes.resume(); match result{ Ok((value,_)) => {assert_eq!(value,3)} Err(_) => {panic!("closed stream")} } } }
Iterator
Everybody love iterators.
I first met them when the C++ Standard Template Libray originally designed by Alexander Stepanov became part of standard C++, also exposing some features of parametric polymorphism and functional programming.
The basic idea is to explore a data structure by looping over its contents, without the need to know its internal details. Rust defines the Iterator trait in the standard library. I will now show how to use our coroutine code to implement it.
The first problem to address is to create a place to store the current status of our Finite State Machine: as each state is represented by a different type we need ad enumeration to store them in a single place
enum CoroutineStatus{ Created(PrimesCoroutine<Uninitialized>), Ready(PrimesCoroutine<Suspended>), Closed(PrimesCoroutine<Completed>) } // this is for convenience use CoroutineStatus::*;
With this enumeration type we can move the coroutine semantic to an higher level: define a next
method for the Created
and Ready
states which will perform all the required matching, consume the status and optionally return a status
impl CoroutineStatus{ fn next(self) -> (CoroutineStatus, Option<u64>){ match self{ Created(coroutine) => { match coroutine.init(){ Ok((result, coroutine)) =>{ ( Ready(coroutine), Some(result)) }, Err(coroutine) => { ( Closed(coroutine), None) } } } , Ready(coroutine) => { match coroutine.resume(){ Ok((result, coroutine)) =>{ ( Ready(coroutine), Some(result)) }, Err(coroutine) => { ( Closed(coroutine), None) } } }, _ => (self, None), } } }
Now we can store this value into a structure; values may change in time but there will be an object which we will always refer to for storing our state.
struct Prime{ coroutine : CoroutineStatus } impl Prime{ fn new() -> Prime{ Prime{ coroutine: Created(PrimesCoroutine::new()) } } }
as usual an associated function can guarantee that a proper initialization is done
Now let’s try to implement the Iterator trait for this struct:
- we have to define the returned type (will be
u64
) - we have to define a
next
method which return anOption
enumeration:- when the iterator returns a value we have
Some(value)
- when the iterator is exhausted it will returnt
None
- when the iterator returns a value we have
impl Iterator for Prime{ type Item = u64; fn next(& mut self) -> Option<Self::Item>{ match self.coroutine.next(){ (status, Some(value)) => { self.coroutine = status; Some(value) }, (status, None) => { self.coroutine = status; None } } } }
But this don’t work!
Why? when we execute next
on our coroutine we are consuming this value (this was made by design); but this value is taken from a mutable reference thus invalidanting the content of the pointed struct.
The solution is to temporarily replace the coroutine value with a placeholder (which does not invalidate the referenced struct), then calculate the real next state and finally set it. This can be done with the std::mem::take
function.
take
returns the content a the mutable reference and subsitute it with a “default” value, thus requiring us to implement the Default
trait of the type involved.
As we wish to minimize the memory copied we just add an empty element in our enumeration and choose it as the default value
enum CoroutineStatus{ Undefined, Created(PrimesCoroutine<Uninitialized>), Ready(PrimesCoroutine<Suspended>), Closed(PrimesCoroutine<Completed>) } use CoroutineStatus::*; // implementing the Default trait impl Default for CoroutineStatus { fn default() -> Self { Undefined } }
Now we can use take
safely
impl Iterator for Prime{ type Item = u64; fn next(& mut self) -> Option<Self::Item>{ let coroutine = take(& mut self.coroutine); match coroutine.next(){ (status, Some(value)) => { self.coroutine = status; Some(value) }, (status, None) => { self.coroutine = status; None } } } }
An alternative to take
This has been already a long journey.
But what if we want to avoid using take
?
When using our Typestate pattern the state is always consumed to avoid its reuse when invalid at a later time, so take
is necessary when we want to store this state in a struct.
The alternative is to create a simpler Finite State Machine using an enumeration for the states and some pattern matching for each transition.
While this works, illegal transitions are detectable only at run time, so we need to create an extra Error
state and manage it in our code later. This may be inevitable if you can’t use the std
crate, which is the case for embedded code.
More on this point on other posts