February 08, 2019
This article assumes the reader already know about
Fix point type is a type that generically describes recursive data. By abstracting over a recursive data structure, we can define a generic operation that works on any recursive data structure.
To learn more about HKT and Fix point type, check these links.
You can run all code snippets in ammonite-repl, for ADT definition, you need to enclose them in a curly brace like { ...adt definition }
before pasting to ammonite repl.
You can find all of the code used in this post here
This post aims to document how to retain type information of Generalized Algebraic Data Type (GADT) with Fix point type. I will illustrate the technique by refactoring a GADT with direct recursion into another GADT without recursion and implement catamorphism
method for it.
This technique can be useful when the type param of your original recursive GADT propagates through the layers of your recursive data structure. It might not make sense now, I suggest carry on reading :)
The following code snippet shows a GADT that describes how to query a recursive data (e.g., JSON). It is super simple, it can only query String or Boolean by path, but we can add things like QueryByWithCondition later (not in the scope of this article).
// simplified
sealed trait Query[A]
case object QueryString extends Query[String]
case object QueryBool extends Query[Boolean]
case class QueryPath[A](path: String, next: Query[A]) extends Query[A]
// sample data
// {
// "oh": {
// "my": "zsh"
// }
// }
// expression to query _.oh.my from JSON above
val expression = QueryPath("oh", QueryPath("my", QueryString))
As you might have noticed, the Query
ADT is recursive, and the type param A
is recursive too, when constructing QueryPath
which extends Query
, the type param is determined by the next
argument of QueryPath
.
def typeMatch[A, B](a: A, b: B)(implicit eq: A=:=B) = ()
val queryString = QueryPath("my", QueryString)
val queryNestedString = QueryPath("my", QueryPath("oh", QueryString))
typeMatch(queryString, queryNestedString) // compiles
Having a type param is useful because, for any recursive query, we can know the result type statically without traversing the tree
, which is a runtime property.
The following steps are common when applying recursion schemes.
catamorphism
and anamorphism
on our recursive data structure(For common use cases, we can use operations from recursion schemes library, e.g. Matryoshka)
We are going to follow the same steps here.
The goal of this step is to remove recursion from our ADT, in our case, QueryPath
should no longer refer to Query
in its definition
sealed trait QueryF[+F[_], A]
case object QueryStringF extends QueryF[Nothing, String]
case object QueryBoolF extends QueryF[Nothing, Boolean]
case class QueryPathF[F[_], A](path: String, next: F[A]) extends QueryF[F, A]
Above is the GADT after transformation, they are suffixed by F
to differentiate with the previous GADT, few things to note
F[_]
to Query
base trait, and make it a covariantQueryStringF
, we specify Nothing
on the F[_]
position, so that QueryStringF
is a subtype of QueryF
, same for QueryBoolF
QueryPathF
, by abstracting next: QueryF[A]
into next: F[A]
, this is why we have to introduce F[_]
on QueryF
, so that we can abstract over the original Query[A]
which is a HKT.Now let’s see how to form a query using the new ADT.
// compiles with `-Ypartial-unification` compiler flag
val expression = QueryPathF("oh", QueryPathF("my", QueryStringF))
As you can see, it’s similar to our previous example, which is a hint that our new generic representation is as powerful as the previous, more concrete representation in terms of expressivity.
Let’s revise what Fix Point Type is, fix
is a structure that conforms to the following rule:
fix(f) = f(fix(f)) for all f
The rule above is abstract, in the sense that it can apply to different domain, for example when f
is function, then fix
is commonly known as Y-combinator
which describes recursive function, when f
is a type then we get Fix Point Type
which describe recursive type.
Below is a straight forward way to define Fix point type
in Scala:
case class Fix[F[_]](unfix: F[Fix[F]])
It follows the rule above in that every Fix[F]
is equivalent to F[Fix[F]]
for all F.
Unfortunately, we cannot use this definition of Fix
to construct a recursive version of QueryF
// does not compile
Fix(QueryPathF("key", Fix(QueryStringF))
Because the type signature does not match, Fix
constructor takes a type param with * -> *
kind (commonly pronounce as Star to Star), encoded as F[_]
in Scala, but QueryF
has kind of (* -> *) -> * -> *
, as it takes a F[_]
and A
and returns a proper type (ie. a Star type)
Insights: Fundamentally,
Fix point
exists for any kind of functions. The regularFix
type is actually a fix point ofF[_]
which is a type level function, it takes aF[_]
and produceF[Fix[F]]
The solution for this problem is to create a different Fix
type that can handle QueryF
, which is a different function.
We can start by writing out the kind/shape of QueryF
, it looks like this: (* -> *) -> * -> *
, meaning it takes a type level function and a proper type and returns a proper type.
The fix point of such a kind should takes this type as an input, and Scala is powerful enough to express such idea using HKT syntax
F[_[_], _]
is the syntax to represent a type level function with this kind: (* -> *) -> * -> *
, then our special HFix for QueryF will looks like below:
case class HFix[F[_[_], _], A](unfix: F[HFix[F, ?], A])
This looks a bit scary, let’s try to break it down.
F[_[_], _]
is a type constructor that takes 2 type params, a HKT * -> *
and a proper type *
A
unfix
becomes F[HFix[F, ?], A]
, remember F
takes a * -> *
on 1st type param, which is the kind of HFix[F, ?]
, the 2nd type param would then be A
so that we propagate it across layers of recursionHFix
?To determine the shape of HFix
, let’s go back to basic:
fix(f) = f(fix(f))
With this formula, we use algebra to solve:
((* -> *) -> * -> *) -> * -> *
The shape of HFix should be ((* -> *) -> * -> *) -> * -> *
, the key is to figure out what k
should be
I think it’s worth to emphasize 2 points
HFix[F[_[_], _], A]
and F[HFix[F, ?], A]
contains type A
(* -> *) -> * -> *
, encoded as F[_[_], _]
, which match our QueryF
typeSample usage of HFix
def queryString = HFix(QueryStringF: QueryF[HFix[Query,?], String])
def queryBool = HFix(QueryBoolF: Query[HFix[QueryF,?], Boolean])
def queryPath[A](p: String, next: HFix[QueryF, A]) = HFix(QueryPathF(p, HFix(next)))
queryPath(
"oh",
queryPath(
"my",
queryString
)
)
res32: HFix[QueryF, String] = HFix(QueryPathF("oh", HFix(QueryPathF("my", HFix(QueryStringF)))))
Cool, so we can create a recursive structure that expresses querying a string at path oh.my
, note the type of this structure is HFix[QueryF, String]
, which is great because we retain the knowledge that it results in a String
.
Defining a Functor instance is essential, it describes how to transform recursive data structure. Unfortunately, due to the usage of GADT, a standard Functor wouldn’t work, it’s similar to the issue with standard Fix Point type.
trait Functor[F[_]]
Above is the interface of Functor, we can define Functor
for any type with kind * -> *
, but Query has a type of (* -> *) -> * -> *
as we’ve seen earlier.
The solution is similar to how to solve the issue with Fix; we need to define another Functor like structure that fit our type.
When using ordinary ADT, we want a Functor instance so that we can transform the type param of our ADT, eg. F[A] => F[B]
, but now we have QueryF[F[_], A]
, which contains 2 type params, what should we transform?
The most generic way would be to transform both type params but to simplify the problem, we are only transforming the 1st type param, e.g., QueryF[F[_], A] => QueryF[G[_], A]
import $ivy.`org.typelevel::cats-core:1.6.0`
import cats.~>
trait HFunctor[F[_[_], _]] {
def hmap[I[_], J[_]](nt: I ~> J): F[I, ?] ~> F[J, ?]
}
This typeclass describes the ability transform the 1st type param of an arbitrary type F, where F has kind of (* -> *) -> * -> *
, which is what we need.
Next, we need to implement an instance of HFunctor
for QueryF
implicit val queryHFunctor: HFunctor[QueryF] = new HFunctor[QueryF] {
def hmap[I[_], J[_]](nt: I ~> J): QueryF[I, ?] ~> QueryF[J, ?] = {
new (QueryF[I, ?] ~> QueryF[J, ?]) {
def apply[A](a: QueryF[I, A]): QueryF[J, A] = {
a match {
case QueryStringF => QueryStringF
case QueryBoolF => QueryBoolF
case query: QueryPathF[I, A] => QueryPathF(query.path, nt(query.next))
}
}
}
}
}
The implementation is relatively straight-forward, basically we try to handle every case of GADT and change the F[_]
type param, for bottom type (ie. when F is Nothing), we dont have to change anything because we define F as covariant.
Finally, after all the ceremony, we are in the position to define catamorphism
On a high-level view, catamorphism
is an operation that collapses a recursive structure into a single value by collapsing each layer of the data structure recursively.
It is generic in that user can define how to collapse each layer.
It looks like this
type HAlgebra[F[_[_], _], G[_]] = F[G, ?] ~> G
def hCata[F[_[_], _], G[_], I](alg: HAlgebra[F, G],hfix: HFix[F, I])(implicit F: HFunctor[F]): G[I] = {
val inner = hfix.unfix
val nt = F.hmap(
new (HFix[F, ?] ~> G) {
def apply[A](fa: HFix[F, A]): G[A] = hCata(alg, fa)
}
)(inner)
alg(nt)
}
The flow of hCata
is similar to normal cata
method, by trying to peel off the outer layer of HFix[F, I]
, we recursively dive into the inner-most layer of the recursive structure, then we apply HAlgebra
while traversing back to the top layer.
Note: You can try to implement anamorphism and other operations :D
We defined everything we need, let’s try to use it.
We’ll go through 2 examples, using the query we created earlier.
a) Print recursive query in a human-readable format
val nestedQuery = queryPath(
"oh",
queryPath(
"my",
queryString
)
)
// a trick to fold into String, this is interesting as it shows that
// generalized type constructor is super powerful; it can represent a
// more specialized type easily
type JustString[A] = String
// important part: convert each layer of query into a string
val print: HAlgebra[QueryF, JustString] = new HAlgebra[QueryF, JustString] {
override def apply[A](fa: QueryF[JustString, A]): JustString[A] = {
fa match {
case QueryStringF => "as[String]"
case QueryBoolF => "as[Bool]"
case q: QueryPathF[JustString, A] => s"${q.path}.${q.next}"
}
}
}
hCata(print, nestedQuery)
// res44: JustString[String] = "oh.my.as[String]"
b) Convert our query description a circe Decoder
import $ivy.`io.circe::circe-parser:0.10.0`
import io.circe.Decoder
val toDecoder: HAlgebra[QueryF, Decoder] = new HAlgebra[QueryF, Decoder] {
override def apply[A](fa: QueryF[Decoder, A]): Decoder[A] = fa match {
case QueryBoolF => Decoder.decodeBoolean
case QueryStringF => Decoder.decodeString
case q: QueryPathF[Decoder, A] =>
Decoder.instance { cursor =>
cursor.get(q.path)(q.next)
}
}
}
val decoder = hCata(toDecoder, nestedQuery)
val json = parse("""
{
"oh": {
"my": "20202"
}
}
""").right.get
decoder.decode(json) // Right("20202")
I tried to use simpler examples, for more complicated examples, check these projects:
We’ve walked through how to apply recursion schemes on a GADT. The basic idea is similar to applying recursion schemes on ordinary ADT; the main difference is that we need to operate on higher order types, eg.
Functor
that takes a function (* -> *
) becomes HFunctor that takes natural transformation (* -> *) -> (* -> *)
Fix
with shape (* -> *) -> *
becomes HFix
with shape ((* -> *) -> (* -> *)) -> (* -> *)
Algebra
with (* -> *) -> * -> *
becomes HAlgebra ((* -> *) -> (* -> *)) -> (* -> *) -> (* -> *)
Notice there’s a common theme, the number of stars doubled in the process, I haven’t fully understood why but I hope this article is still useful.
I learn most of the techniques described here from Xenomorph by Nuttycom, I find it interesting and thought would be good to write it down.
Thanks Nuttycomb for sharing his idea with the open source world! I also wish to thanks my coworker Olivier for introducing the idea of recursion scheme to me, and Alex for correcting me on notation of higher kinded type.