Programming in a statically typed functional language (think Haskell) gives quite a few safety guarantees as well as a lot of expressivity thanks to the use of higher-order functions. However the type system lets you write some problematic functions like:
loop :: Int -> Int loop n = 1 + loop n
Why is that problematic? For one, this function never terminates, that can't be good. But also, and this is very annoying, we can not anymore use equational reasoning:
loop 0 = 1 + loop 0 // substract (loop 0) from both sides, and use `x - x = 0` 0 = 1
0 = 1 can then allow you to prove anything
In this post I'm going to present the ideas of "Strong functional programming" (or "Total functional programming") and show how they can help avoid this situation above where unrestricted recursion goes wrong.
Life without ⊥ (bottom)
Int which is used for the definition of the
loop function is not as simple as it looks. It is actually Int (⊥) where ⊥ (called "bottom") a value denoting run-time errors and non-terminating programs. "Total functional programming" doesn't allow this type of values and this why it is called "Total": functions are always defined on their domain (their "input values"), they cannot return an
Not having to deal with ⊥ has a lot of advantages:
As we've seen in the introduction, we can't really assume that
(e - e = 0) when
(e: Nat) without having to worry about the ⊥ case. When ⊥ is present, proof by induction can not be easily used as read in math textbooks. To refresh your memory, this proof principle states that for any property
if P(0) and for all n, P(n) => P(n + 1) then for all n, P(n)
However if you allow ⊥ to sneak in, this principle becomes a tad more complex.
Simpler language design
With no bottom value, one divide between programming languages just disappears. We don't really care anymore about strict evaluation (as in Scheme, Scala) vs lazy evaluation (as in Haskell), the evaluation of a function doesn't depend on the evaluation of its arguments:
-- a function returning the first argument first a b = a -- with strict evaluation first 1 ⊥ = ⊥ -- with lazy evaluation first 1 ⊥ = 1
But that's not the only difference, another one is pattern matching. In Haskell, when you expect a pair of values, you won't be able to match that against ⊥:
-- will not match if (a, b) is ⊥ first (a, b) = a
You can argue that this is a reasonable language decision to make but that's not the only possible one. In Miranda for instance, the match will always succeed:
-- a bottom value can be "lifted" to a pair of bottom values (⊥a, ⊥b) = ⊥
Same thing goes with the evaluation of operators like
& (logical AND). It is defined on Booleans by:
True & True = True True & False = False False & True = False False & False = False
If you introduce ⊥ you have to define:
⊥ & y = ? x & ⊥ = ?
Most of the programming languages decide to be "left-strict", i.e they evaluate the first argument before trying to evaluate the whole expression:
⊥ & y = ⊥ x & ⊥ = False if x = False x & ⊥ = ⊥ otherwise
Again, that might seem completely obvious to us after years of using this convention, but we can imagine other alternatives (doubly-strict, right-strict, doubly-non-strict), and this breaks the classical symmetry of the
& operator in mathematics.
More flexibility in language implementation
Let's introduce a bit of programming languages theory. When you define a programming language, you need to specify:
how to build language expressions from "atomic" blocks. For example, how to build an
ifexpression from a boolean expression and 2 other expressions
if booleanExpression then expression1 else expression2
how to "execute" the language expressions, usually by specifying how to replace values in expressions with other values. For example using a specific boolean value in an
a = true
if a then b else c => c
As you can see, executing a program means "reducing" large expressions to smaller ones, using specific rules. This raises some immediate questions:
can we reach a final, irreducible, expression (a normal form)?
if several "reduction paths" are possible (by using the reduction rules in different ways), do they converge on the same normal form?
If you can show that your programming language satisfies 2. when 1. is true, then you can say that it has the "Church-Rosser" property or it is confluent. Another version of this property, the "strong" form says that every reduction leads to a normal form and it is unique whatever reduction path you took.
Total functional programming can be called "Strong" because it has this property. The consequence is that the language implementation is free to select various evaluation strategies, for example to get better memory consumption or parallelism, without any fear for changing the program meaning.
Not all green
If Strong Functional Programming is so easy don't we all use it right now? What are the disadvantages then?
Not Turing complete
The first issue with Strong Functional Programming is that it is not Turing complete. It computes programs which always terminate but cannot compute all the programs that always terminate. In particular it can not compute its own interpreter. Why is that?
Let's say the interpreter is a function
eval taking as arguments, a program
P and its input
I (this example comes from here). Any program is a big sequence of symbol and can be coded into a number, so our eval interpreter can be seen as a function :: (Nat, Nat) -> Nat. Based on that I can build an
evil function in my total language:
evil :: Nat -> Nat evil code = 1 + eval code code
But there is also a number corresponding to the
evil program. Let's call that number
-- this means "apply the evil program aka '666' to the value 666" -- by definition of the eval interpreter eval 666 666 = evil 666 -- but if we go back to the definition of evil evil 666 = 1 + (eval 666 666) -- we can't get a value for 'evil 666', because it's equivalent to the equation 0 = 1 evil 666 = 1 + evil 666
evil is a non-terminating program of my language, which is however supposed to terminate because built from
eval (a terminating program by our hypothesis)
This is a troubling conclusion. One way out is to create a hierarchy of languages where each language above has enough power to write an interpreter for the language below. And, at the top of the hierarchy, we use a Turing-complete language. This is not unsimilar to isolating a typed, functional world from a non-typed, side-effecting one (aka "the real world").
Yes, by definition. But non-terminating programs are super useful, you wouldn't want your operating system stop after a few keystrokes saying "I'm done, buddy".
The good news is that we can actually deal with this situation by using codata. But before we look at codata in detail, we're going to enumerate some of the rules of an "elementary" Strong Programming Functional language (or ESFPL).
The rules for non-termination
Let's adopt a Haskell-like syntax for our ESPFL. One thing we must be able to do is to define the "usual" datatypes:
data Bool = True | False data Nat = Zero | Suc Nat data List a = Nil | Cons a (List a) data Tree a = Nilt | Node a (Tree a) (Tree a)
and some functions on those types:
-- size of a Tree size a :: Tree a -> Nat size Nilt = 0 size (Node a l r) = 1 + size l + size r -- filter a List with a function filter f Nil = Nil filter f (Cons a rest) if (f a) = Cons a (filter f rest) otherwise = filter f rest
Rule n.1: all case analysis must be complete
This one should feel obvious to anyone having used data types and pattern matching before. If you forget a case in your pattern matching analysis, you face the risk of a runtime exception:
But this rule has a larger impact. You will also have to change your "standard" library. For example you can't define the function taking the first element of a list,
head, as before. You need to provide a value to use when the list is empty:
-- taking the first element of a list head a :: List a a -> a head Nil default = default head (Cons a rest) default = a
The other alternative is to program with a data type where things can't go wrong:
data NonEmptyList a = NCons a (NonEmptyList a) -- taking the first element of a non-empty list head a :: NonEmptyList a -> a head (NCons a rest) = a
This kind of discipline doesn't seem so harsh, but other difficulties arise with the use of built-in arithmetic operators, starting with
divide. What do you do about
divide 1 0?
There are propositions to build data types where "
1 / 0" is defined and equal to "
1"! It looks so weird to me that I'd certainly try to go with a
NonZeroNat data type before anything else.
Rule n.2: type recursion must be covariant
I like this rule because it was not obvious to me at all when I first read it. What does it mean?
- You can define data types recursively
- You can use functions as values
So you can build the following data type
data Silly a = Very (Silly -> a)
Silly data type I can recurse indefinitely:
bad a :: Silly a -> a bad (Very f) = f (Very f) ouch :: a ouch = bad (Very bad)
ouch value is a non-terminating one, because the evaluation of
bad will use
bad itself with no case for termination. Rule n.2 is here to prohibit this. You can't define a data type with a function that use this data type as an input value. The fact that this rule was not obvious raises questions:
- how do we know that we have found all the rules for ESFP?
- is there a "minimal" set of rules?
I have no answer to those questions unfortunately :-(.
Rule n.3: recursion must be structural
The previous rule was about data type recursion, this one is about function recursion. More precisely the rule says that: "each recursive function call must be on a syntactic sub-component of its formal parameter". This means that if you make a recursive call with the function being defined, it has to be on only a part of the input data, like with the ubiquitous
factorial :: Nat -> Nat factorial Zero = 0 factorial (Suc Zero) = 1 -- we recurse with a sub-component of (Suc n) factorial (Suc n) = (Suc n) * (factorial n)
This rule is not too hard to understand. If you always recurse on data that is provably "smaller" than your input data, you can prove that your function will terminate. This rule can also accomodate the Ackermann function, which has 2
ack :: Nat Nat -> Nat ack 0 n = n + 1 -- m + 1 is a shortcut for (Suc m) ack (m + 1) 0 = ack m 1 ack (m + 1) (n + 1) = ack m (ack (m + 1) n)
By the way, if you don't know the Ackermann function, look it up! It has an amazing growth, with very small numbers. Try to evaluate
ack 4 3 for fun :-)
How restrictive is this rule? Not so much. This rule authorizes to program:
"primitive recursive" functions. Those are the functions, like
factorial, which are defined using only simple recursion (think how
addition (Suc n) mcan be defined in terms of
addition n m) and by composing other primitive functions
other "total recursive" functions, like
ack("total" meaning "terminating" here). Indeed
ackis not "primitive recursive". It was actually especially created to show that not all "total recursive" functions were "primitive recursive"
It turns out that all the functions which we can be proven to terminate by using first-order logic (without
exists) can be programmed with structural recursion. That's a lot of functions, but we have to change or programming practices in order to use only structural recursion. Let's look at an example.
Coding fast exponentiation
We want to code the "power" function. Here's a naive version:
pow :: Nat -> Nat -> pow x n = 1, if n == 0 = x * (pow x (n - 1)), otherwise
It is primitive recursive but not very efficient because we need
n calls to get the result. We can do better than this:
pow :: Nat -> Nat -> Nat pow x n = 1, if n == 0 = x * pow (x * x) (n / 2), if odd n = pow (x * x) (n / 2), otherwise
Unfortunately this version is not primitive recursive, because when we call
pow we're not going from
n + 1 to
n. It is however obvious that we're "descending" and that this algorithm will terminate. How can we re-code this algorithm with primitive recursion?
The trick is to encode the "descent" in a data type:
-- representation of a binary digit data Bit = On | Off -- we assume a built-in bits function bits :: Nat -> List Bit -- then the definition of pow is primitive recursive, because we descend on the Bit data type pow :: Nat -> Nat -> Nat pow x n = pow1 x (bits n) pow1 :: Nat -> List Bit -> Nat pow1 x n = 1 pow1 x (Cons On rest) = x * (pow1 (x * x) rest) pow1 x (Cons Off rest) = pow1 (x * x) rest
David Turner conjectures that many of our algorithms can be coded this way and for cases where it get a bit hairier (like for Euclid's
gcd) we could authorize another type of recursion called "Walther's recursion". With this kind of recursion we can recognize a larger class of programs where recursion is guaranteed to terminate because we only use operations effectively "reducing" the data types (acknowledging that
n / 2 is necessarily lower than
n). Using "Walther's recursion" with a language having functions as first-class values is still an unsolved problem though.
Codata for "infinite" computations
After having looked at the rules for bounding recursion and avoiding non-termination, we would still like to be able to program an operating system. The key insight here is that our functions need to terminate but our data doesn't need to. For example, the
Stream data type is infinite:
data Stream a = Cons a (Stream a)
In this perspective, an operating system is a bunch of functions acting on an infinite stream of input values. Let's call this type of infinite data, "codata", and see how we can keep things under control.
Here's our first definition,
Colist (similar to the
Stream type above):
-- "a <> Colist a" stands for "Cocons a (Colist a)" codata Colist a = Conil | a <> Colist a
Does it breach the "Strong normalization" property that we described earlier? It doesn't if we say that every expression starting with a coconstructor is not reducable (or is in "normal form"), there is nothing which can be substituted. Whereas with a regular
List you can evaluate:
Cons (1 + 2) rest = Cons 3 rest
This is why we need to have a new keyword
codata to explicitly define this type of data. In a way, this is making a strong distinction between "lazy" and "strict" data, instead of using lazy evaluation to implement infinite data types.
The equivalent of Rule n.3 for codata is that all recursion on codata must be "primitive corecursive". You have any idea of what it is? It is kind of the opposite / dual of the structural recursion for data types. Instead of proving that we always reduce the input value when making a recursive call we need to prove that we "augment" the result, by always using a "coconstructor" to create the result:
-- functions on codata must always use a coconstructor for their result function a :: Colist a -> Colist a function a <> rest = 'something' <> (function 'something else')
Then we can define functions like:
ones :: Colist Nat ones = 1 <> ones fibonacci :: Colist Nat fibonacci = f 0 1 where f a b = a <> (fibonacci b (a + b))
Another way of looking at the difference between data/recursion and codata/corecursion is that:
- recursion on data is safe if we break-up the data in smaller pieces then recurse on those pieces
- corecursion on codata is safe if we apply the recursion first then put that in a coconstructor which declares that we have infinite data
Codata is data which is (potentially) infinite, but observable. There are methods to safely extract results, one at the time, in a terminating way. That's exactly what the fibonacci function does. It builds a sequence of results, so that you can safely extract the first one, then you have a way to build the rest of the values.
This is why people also talk about the duality between data and codata as a duality between constructors and destructors. Or a duality between accessing the internals of a data type and observing the behavior of a codata type.
When we try to reason about codata and corecursion, we can use coinduction (you saw that one coming didn't you :-)?). The principle of coinduction is the following:
2 pieces of codata are the same, "bisimilar" in the literature, if:
- their finite part are equal (the "
a" in "
a <> rest")
- their infinite part are the same
In other words, if the 2 sequences always produce the same values. Using that principle we can show the following theorem on infinite structures:
-- iterate a function: x, f x, f (f x), f (f (f x)),... iterate f x = x <> iterate f (f x) -- map a function on a colist comap f Conil = Conil comap f a <> rest = (f a) <> (comap f rest)
Now, we'd like to show that
iterate f (f x) = comap f (iterate f x):
iterate f (f x) = (f x) <> iterate f (f (f x)) -- 1. by definition of iterate = (f x) <> comap f (iterate f (f x)) -- 2. by hypothesis = comap f (x <> iterate f (f x)) -- 3. by definition of comap = comap f (iterate f x) -- 4. by definition of iterate
The coinduction principle is used exactly in step 2. If "
iterate f (f (f x))" and "
comap f (iterate f (f x))" are the same, then adding a new value "
(f x)" will preserve equality.
The definition of "primitive corecursive" is a bit restrictive. It prevents useful, and well-founded, definitions like:
-- "evens" would need to come first after <> for this definition to be corecursive evens = 2 <> (comap (+2) evens)
Note that we can't allow any kind of construction with "<>". For example the
bad function below is not terminating:
-- infinite lists data Colist a = a <> Colist a -- the tail of an infinite list cotail a :: Colist a -> Colist a cotail a <> rest = rest -- don't do this at home bad = 1 <> (cotail bad)
In "Ensuring streams flow", David Turner shows that it is possible to develop an algorithm which will check where the corecursion is safe and where it isn't. The formal argument takes a few pages of explanation but the idea is simple. We need to count the "levels of guardedness":
comapis not a problem, because
comapwill add a new coconstructor
cotailis a problem, because
cotail"removes" a coconstructor
So the idea of the algorithm is to count the number of times where we "add" or "remove" coconstructors to determine if the corecursion will be safe or not.
To conclude with the presentation of codata, I want to briefly introduce comonads and give a short example to give an intuition of what it is.
If you look at the definition in the books and listen to the category theory people, you will read or hear something like: "get the monad definition, change the arrows direction and you have a comonad":
-- the comonad operations -- the dual of return or unit for monads extract :: W a -> a -- the dual of bind or flatMap for monads cobind :: W a -> b -> W a -> W b
Intuitively you can think of a comonad as:
"I can extract things from a Context" (to contrast with the "I can put things in a context" of monads)
If I have a function which "computes values from inputs values which might change depending on the context":
W a -> b, then I can use "values in context",
W a, to return other "values in context",
This is still not quite clear, so let's give a simple example based on
Colist is a comonad which we can use like that:
-- a Colist of Nats where each new value is the previous value + 1 nats = 0 <> comap (+1) nats -- a function taking the first 2 elements of a Colist firstTwo a :: Colist a -> (a, a) firstTwo a <> b <> rest = (a, b) -- now, let's cobind firstTwo to nats cobind firstTwo nats = (0, 1) <> (1, 2) <> (2, 3) <> ...
Sometimes you can read that
List is a monad representing the fact that a function can produce non-deterministic results ("zero or many"). By analogy with the
List monad, we can say that
Colist is a comonad that represents the function can have non-deterministic inputs.
I touched on a lot of subjects with this post, for which there would be a lot to say (and certainly with less approximation that I did). In the first part we saw that, when dealing with finite data, the notions of termination, computation and recursivity and intimately related. In the second part we saw that it can be beneficial to give a special status to infinite data and to provide for it the same kind of tools (proofs with corecursion, comonads) that we usually have for data.
I hope that the subject of codata will become more and more popular (this recent proposal for a
codo notation in Haskell might be a good sign). We can see a lot of architectures these days turning to "Event Sourcing". I suspect that Codata, Comonads, Costate and all the Cotypeclassopedia will prove very useful in dealing with these torrents of data.