Homage to the Transducer
Clojure transducers and reducers in scala. [With minor updates - 24/10/14]
Empirical learning might be characterized as repeated application of a function (S, A) => S, assuming S is existing knowledge and A is a new observation.
In clojure terms this signature is the basis of something called a reducer.
A transducer is something that transforms data that will ultimately be reduced. It is formulated as a function taking a reducer and producing a new reducer that accepts different input or treats the input differently.
Here is Rich Hickey's post introducing transducers and here is a post about Haskell types for transducers. The latter has some discussion between Rich Hickey and Franklin Chen, the author, in the comments.
I thought I would have a crack at formulating this idea in scala. Admittedly, this may impress neither clojure nor scala programmers, for different reasons.
Stream and iteratee libraries are plentiful in the scala world already. But the pitch for this particular formulation, from Rich Hickey, is:
"...for core.async, I became more and more convinced of the superiority of reducing function transformers over channel->channel functions..."
I've found it works well to define asynchronous processes in terms of foldables and fold functions, abstracting away messaging and other machinery. So I would like to see if that can be taken further.
My scala transducer code so far is on github.
Reducer
A Reducer has an initial value, a completion step and a reduction step. In clojure these are arity 0, 1, and 2 overloads of the reducer function. In scala:
trait Reducer[-A, +S] {
  type State
  def init: State
  def apply(s: State, a: A): Context[State]
  def isReduced(s: State): Boolean
  def complete(s: State): S
}
type Context[State] = State
This is a Reducer over A's ultimately producing an S. The reduction step, apply, operates on values of type State starting with init.
The result is produced by applying the complete method to the final state. The result type S is not necessarily the same as the internal reduction type State.
The isReduced predicate says that a given state is final. The reduction should stop when such a state is encountered, ignoring remaining A values. In clojure this is handled by the reduced? function.
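To make the pieces concrete, here is a hand-rolled example (illustrative only, not from the library) whose internal State differs from its result type S, and which becomes reduced once a bound is reached:

val bounded = new Reducer[Double, Long] {
  type State = Double
  def init = 0.0
  def apply(s: State, a: Double) = s + a   // Context[State] is just State here
  def isReduced(s: State) = s >= 100.0     // stop once the running total reaches 100
  def complete(s: State) = s.toLong        // the result type differs from the state type
}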
Creating Reducers and Running Reductions
A Reducer can be constructed from an initial value and a function by calling reducer. It can be run over a reducible source of values such as a List by reduce.
val l = List(1.0, 2.0, 3.0)
def f(s: Double, a: Double) = s*0.9 + a*0.1
val r = reducer(0.0)(f)
val s = reduce(l)(r) // 0.561
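The reducer constructor itself lives in the github code; a minimal sketch of what it might look like, assuming the identity Context, is:

def reducer[A, S](s0: S)(f: (S, A) => S): Reducer[A, S] = new Reducer[A, S] {
  type State = S                     // the state and the result coincide
  def init = s0
  def apply(s: State, a: A) = f(s, a)
  def isReduced(s: State) = false    // a plain fold never terminates early
  def complete(s: State) = s
}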
Context
The type constructor Context adds nothing to the foregoing definitions because Context[State] is defined simply as State.
However, Context can be defined as any functor. For example, scala.util.Try to capture errors or scala.concurrent.Future for an asynchronous reduction.
The result of the reduction is obtained in whatever context is used, that is, reduce returns a Context[S], given a Reducer[A, S].
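For instance, if Context[State] were defined as scala.util.Try[State] rather than the identity, a reduction step could capture a failure instead of throwing. An illustrative fragment (not the library's actual wiring):

import scala.util.Try

val sumParsed = new Reducer[String, Double] {
  type State = Double
  def init = 0.0
  def apply(s: State, a: String): Try[State] = Try(s + a.toDouble)  // a bad number becomes a Failure
  def isReduced(s: State) = false
  def complete(s: State) = s
}

The failure would then surface in the Context[S] returned by reduce rather than aborting the reduction.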
If Context is defined as scalaz.concurrent.Task or flowlib.Process the entire reduction is captured as a suspended, potentially asynchronous, computation which can be subsequently run or composed with others.
Reducible
A typeclass Reducible is used to make sources of data eligible for reduction.
trait Reducible[R[_]] {
  def apply[A, S](ra: R[A])(f: Reducer[A, S]): Context[S]
}
implicit val listIsReducible = new Reducible[List] { ... }
implicit val optionIsReducible = new Reducible[Option] { ... }
...
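A minimal sketch of what the List instance might look like, assuming the identity Context (the instances on github are defined per concrete Context and will differ):

implicit val listIsReducible = new Reducible[List] {
  def apply[A, S](la: List[A])(f: Reducer[A, S]): Context[S] = {
    @annotation.tailrec
    def loop(rest: List[A], s: f.State): S = rest match {
      case a :: as if !f.isReduced(s) => loop(as, f(s, a))  // keep folding until reduced or exhausted
      case _                          => f.complete(s)      // then run the completion step
    }
    loop(la, f.init)
  }
}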
An implementation note: Reducible instances are defined differently for each concrete Context. In theory, if Context was required to have a flatMap operation, we could consolidate these definitions. In practice, many concrete Context types don't support deeply nested flatMap calls.
Transducer
A Transducer[A, B] converts a reducer of A's to a reducer of B's. It does not depend on the result type, S.
trait Transducer[+A, -B] { tb =>
  def apply[S](fa: Reducer[A, S]): Reducer[B, S]
  def andThen[C](tc: Transducer[B, C]) = new Transducer[A, C] { ... }
}
def mapper[A, B](f: B => A): Transducer[A, B]
def filter[A](p: A => Boolean): Transducer[A, A]
def flatMapper[A, B, R[_]: Reducible](f: B => R[A]): Transducer[A, B]
def takeN[A](n: Int): Transducer[A, A]
...
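As a sketch of the shape these take (not necessarily the github implementation), mapper could wrap the given reducer and apply its function to each input before delegating:

def mapper[A, B](f: B => A): Transducer[A, B] = new Transducer[A, B] {
  def apply[S](fa: Reducer[A, S]): Reducer[B, S] = new Reducer[B, S] {
    type State = fa.State                    // reuse the wrapped reducer's state
    def init = fa.init
    def apply(s: State, b: B) = fa(s, f(b))  // transform the input, then delegate
    def isReduced(s: State) = fa.isReduced(s)
    def complete(s: State) = fa.complete(s)
  }
}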
Nor does a Transducer depend on the type of the internal State of the given reducer or the concrete Context in which reductions are produced.
Transducers are composed using the andThen operation.
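For example, a hypothetical pipeline built from the combinators declared above, applied to an ordinary reducer:

val parseAndSum: Transducer[Double, String] =
  mapper[Double, String](_.toDouble) andThen filter[String](_.nonEmpty)

val sum = reducer(0.0)((s: Double, a: Double) => s + a)
val total = reduce(List("1.5", "", "2.5"))(parseAndSum(sum))  // empty strings dropped, expect 4.0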
Transducer State
A Transducer may require state that evolves during the reduction. For example, takeN must add state to the reduction representing the number of values seen so far.
val l = List(1.0, 2.0, 3.0)
def f(s: Double, a: Double) = s*0.9 + a*0.1
val r1 = reducer(0.0)(f) // r1.State = Double
val r2 = takeN(2)(r1) // r2.State = (Double, Int)
val s = reduce(l)(r2) // 0.29: Double
In this example, r2 has result type Double but internally r2.State is (Double, Int). The Double tracks the state of r1 and the Int tracks the count of values seen. Ultimately, r2.complete is called which strips the count from the result.
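A sketch of how takeN could achieve this, assuming the identity Context (the github version may differ):

def takeN[A](n: Int): Transducer[A, A] = new Transducer[A, A] {
  def apply[S](fa: Reducer[A, S]): Reducer[A, S] = new Reducer[A, S] {
    type State = (fa.State, Int)                 // pair the wrapped state with a count
    def init = (fa.init, 0)
    def apply(s: State, a: A) = (fa(s._1, a), s._2 + 1)
    def isReduced(s: State) = s._2 >= n || fa.isReduced(s._1)
    def complete(s: State) = fa.complete(s._1)   // the count is stripped here
  }
}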
Note that reducers and transducers are immutable. Reduction state is created and contained within the reduce operation.
Conclusion
The point of this is not to compare clojure and scala but to attempt to borrow from clojure.
I think Reducer, Reducible and Transducer have emerged as quite reasonable types in scala. Each is a polymorphic function, which is not surprising given we are coming from clojure. The types surely don't model every rule, but they say quite a bit. And the whole thing is pure (important for my concurrent programming use case).
I now want to see how useful this is in practice, particularly with type Context[S] = flowlib.Process[S].