Let's look at the problems with actors from another angle. Yes, actors are all about side effects, side effects don't compose, etc, etc, but more specifically, the actor model takes the wrong default of starting with an overly nondeterministic programming model, and then attempting (via some very first-order, out-of-band mechanisms that don't compose) to prune away this nondeterminism in order to obtain predictable performance. In particular, due to its excessive nondeterminism, the actor model lacks predictable tools for controlling queueing and memory usage. Where have we heard this sort of argument before? The exact same problems were noted with threads:
Threads... are wildly nondeterministic. The job of the programmer is to prune away that nondeterminism. We have, of course, developed tools to assist in the pruning. Semaphores, monitors, and more modern overlays on threads... offer the programmer ever more effective pruning. But pruning a wild mass of brambles rarely yields a satisfactory hedge.
To offer another analogy, suppose that we were to ask a mechanical engineer to design an internal combustion engine by starting with a pot of iron, hydrocarbon, and oxygen molecules, moving randomly according to thermal forces. The engineer’s job is to constrain these motions until the result is an internal combustion engine. Thermodynamics and chemistry tell us that this is a theoretically valid way to think about the design problem. But is it practical?
We could almost do a search-and-replace to turn the paper into an argument against using actors! Even though an actor (generally) mutates only local state, these side effects are observable to any piece of code capable of messaging the actor, and observability of side effects propagates transitively. So while actors avoid a class of bugs that arise when using preemptive multithreading with mutable state for communication between threads, they don't really change the overall programming model of having unrestricted mutable state, with the possibility of completely different emergent behavior based on nondeterministic interleaving of actor message processing. We have the same general class of errors to worry about and the same problems with reasoning about our code.
Let's look at an example. Suppose we have two actors, `P` and `C`, in a producer/consumer relationship. `P` produces values and sends them to `C`. In the actor model, at any point during execution `P` may issue a `C ! msg`. This means the memory usage of our program is an emergent runtime property, since every `C ! msg` results in a message queued in the mailbox of `C`, and any number of messages may pile up depending on the relative processing rates of `P` and `C`. Likewise, the latency of the overall pipeline is very much an emergent property, and depends on the nondeterminism of actor scheduling. Yes, there are workarounds: we can build an ad hoc protocol specific to `P` and `C`, or we can have the actor scheduler implement some global policy for congestion control (like making message-send cost a function of the target's current mailbox size), and so on, but approaches like these are band-aids that don't scale. As with locks and semaphores, the deeper issue is a flawed underlying conceptual model.
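To make the emergent memory usage concrete, here is a toy sketch in plain Scala: an "actor" with an unbounded mailbox drained by a background thread. All names here are illustrative (this is not the API of any real actor library), but the essential behavior is the same: `!` always succeeds immediately, nothing pushes back on the sender, and the backlog is whatever the relative rates of `P` and `C` happen to make it.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicBoolean

// Toy "actor": an unbounded mailbox plus a thread that drains it.
final class ToyActor[A](handler: A => Unit) {
  private val mailbox = new ConcurrentLinkedQueue[A]()
  private val running = new AtomicBoolean(true)
  private val worker = new Thread(() => {
    while (running.get || !mailbox.isEmpty) {
      val msg = mailbox.poll()
      if (msg != null) handler(msg) else Thread.sleep(1)
    }
  })
  worker.setDaemon(true)
  worker.start()

  // Sending always succeeds immediately; there is no backpressure.
  def !(msg: A): Unit = mailbox.add(msg)

  def mailboxSize: Int = mailbox.size
  def stop(): Unit = { running.set(false); worker.join() }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // C takes ~1ms per message; P sends 1000 messages as fast as it can.
    val c = new ToyActor[Int](_ => Thread.sleep(1))
    (1 to 1000).foreach(i => c ! i)
    // The backlog at this point is an emergent property of scheduling
    // and relative processing rates, not anything visible in the types.
    println(s"pending: ${c.mailboxSize}")
    c.stop()
  }
}
```

Nothing in this programming model distinguishes a send that will be answered promptly from one that contributes to an unbounded pile-up.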
There's a general principle here which the actor model violates: any decision affecting a program's asymptotic runtime or space complexity should be under explicit programmer control, expressed in a clear and composable programming model. Actors violate this principle by making a key decision affecting memory usage (when to sequence a message send, and what sort of nondeterminism to allow) an emergent runtime property, and by forcing us to encode these decisions using ad hoc protocols, which aren't really first class or composable.
Purely functional approaches to concurrency are explicit. There are no side effects, we have to explicitly sequence the effects of our program, and if we want nondeterminism, we ask for it explicitly. So, rather than starting with unrestricted nondeterminism, with the possibility of unbounded memory usage depending on relative processing rates and what interleaving we get, we start with a fully deterministic model and add nondeterminism only as needed and where helpful.
Here's an example from scalaz-stream. For background, the scalaz-stream library represents sources, sinks, effectful channels, and pure stream transducers using a single basic type, `Process[F,O]`:

- `Process[F,O]`: A process which emits `O` values, and which may issue requests of type `F[R]` for any choice of `R`. When `F` is a `Monad`, this can be thought of as an effectful source of `O` values.
- `Sink[F,O]`: A sink is represented as a source of effectful functions, or a `Process[F, O => F[Unit]]`.
- `Channel[F,I,O]`: More generally, a `Channel` is a source of effectful functions, or a `Process[F, I => F[O]]`.
- `Wye[I1,I2,O]`: A pure stream transducer of two inputs. Emits values of type `O`, and may request values from the left, from the right, or from both, allowing for nondeterminism.
- `Process1[I,O]`: A pure stream transducer of one input. Emits values of type `O`, and may request values from the input of type `I`.
- `Tee[I1,I2,O]`: A pure transducer of two inputs. Emits values of type `O`, and may request values from the left or from the right.
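The core idea of this representation can be sketched in a few lines. What follows is a drastically simplified version in the spirit of the stream-processing chapter of the book, not scalaz-stream's actual implementation (which also handles errors and resource cleanup): a process either emits a value, halts, or requests a single effect and continues with its result.

```scala
// A simplified Process: emit a value, request an effect F[A] and
// continue with its result, or halt. (Hypothetical sketch; the real
// scalaz-stream type is richer.)
sealed trait Process[F[_], O]
case class Emit[F[_], O](head: O, tail: Process[F, O]) extends Process[F, O]
case class Await[F[_], A, O](req: F[A], recv: A => Process[F, O]) extends Process[F, O]
case class Halt[F[_], O]() extends Process[F, O]

object Process {
  // A Sink really is just a source of effectful functions, as described.
  type Sink[F[_], O] = Process[F, O => F[Unit]]
}
```

The point of the encoding is that a `Process` is a pure value: requests for effects are data (`Await` nodes), so an interpreter gets to decide when and how they run.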
To feed a source, `src: Process[F,String]`, to a `snk: Sink[F,String]`, we write `src.to(snk)`. This is fully deterministic: we repeatedly read an element from `src`, send it to `snk`, and await a response from `snk`, halting if either `src` or `snk` halts. If we want to allow nondeterminism, we can do so, but we do so explicitly, using a more general combinator that takes a queueing strategy. We're still connecting `src` to `snk`, but we're now using a first-class object, a `Wye`, to handle the queueing strategy. We happen to be using the very simple queueing strategy `wye.boundedQueue(20)`, which will run `src` and `snk` concurrently, blocking on `src` only if 20 elements have enqueued unanswered at `snk`. But the `Wye` type is an arbitrary stream transducer, and we can build up more complex logic for deciding what sort of nondeterminism is allowed (block if more than 20 elements enqueue unanswered, or if four consecutive values received from `src` satisfy some predicate, and so on). The decision about nondeterminism, which in turn affects the memory usage of our program, is encoded as a first-class object, and we can compose and abstract over these objects using the usual powerful tools of FP.
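The runtime behavior that a strategy like `wye.boundedQueue(20)` is meant to capture can be modeled directly with a blocking queue from the JDK. This plain-Scala sketch only illustrates the semantics, not how scalaz-stream implements it: the producer runs freely until 20 elements sit unanswered, and only then blocks.

```scala
import java.util.concurrent.ArrayBlockingQueue

object BoundedDemo {
  def main(args: Array[String]): Unit = {
    // Capacity 20: the producer blocks only once 20 elements are pending.
    val q = new ArrayBlockingQueue[Int](20)
    val consumer = new Thread(() => {
      var done = false
      while (!done) {
        val n = q.take()        // answer one element at a time
        if (n < 0) done = true
        else Thread.sleep(1)    // deliberately slow consumer
      }
    })
    consumer.start()
    (1 to 100).foreach(q.put)   // put blocks whenever the queue is full
    q.put(-1)                   // sentinel: shut the consumer down
    consumer.join()
    println(s"leftover: ${q.size}")
    // Memory usage is bounded by construction: never more than 20 pending.
  }
}
```

The difference in the scalaz-stream version is that the bound is not buried in imperative plumbing like this; it's a pure `Wye` value you can swap out, compose, and test in isolation.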
So to summarize, actors are overly nondeterministic. Message sends are actions with side effects, rather than first-class values we explicitly sequence and whose nondeterminism we explicitly control. Hence we leave an important property of our program, namely its queueing and memory usage, to various out-of-band mechanisms that aren't visible to the typechecker.

Related problems with `Future`
The Scala standard library `Future` type has a similar problem: it represents a running computation, rather than a description of a computation. This unfortunately means that the actual parallelism, and hence the runtime complexity, one obtains for a `Future[A]` is an emergent runtime property, one which even depends on arbitrary details of how the various functions that produce futures are factored. This is something we talk about in chapter 7 of the book, when we work through a design exercise of a library for describing parallel computations.
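Here's a small illustration of that sensitivity to factoring, using only the standard library. Because the computation inside a `Future` starts running at the point of construction, hoisting a call into a `val` (a refactoring that would be meaning-preserving for a pure value) changes how many times the work is done:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import java.util.concurrent.atomic.AtomicInteger

object FactoringDemo {
  val count = new AtomicInteger(0)
  def expensive(): Future[Int] = Future { count.incrementAndGet(); 21 }

  def main(args: Array[String]): Unit = {
    // Version A: two calls, so the work runs twice (and in parallel).
    val a = expensive().zip(expensive()).map { case (x, y) => x + y }
    Await.result(a, 5.seconds)
    val afterA = count.get               // work ran twice

    // Version B: "the same" expression with the call hoisted into a val;
    // now the work runs once and both reads share its result.
    val f = expensive()
    val b = f.zip(f).map { case (x, y) => x + y }
    Await.result(b, 5.seconds)
    println(count.get - afterA)          // work ran once: factoring matters
  }
}
```

Both versions compute 42, but they have different parallelism and do different amounts of work, and nothing in the types distinguishes them.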
In contrast, the `scalaz.concurrent.Task` type is a purely functional description of a (possibly) parallel computation. Rather than nondeterminism being a runtime property determined by the order in which strict futures are created, `Task` is completely deterministic, generally instantiated lazily, and allowed points of nondeterminism are supplied explicitly using the functions of the `Nondeterminism` typeclass. It's a more explicit model, but that's not a problem: with the usual tools of FP, we can abstract over any common patterns we uncover when building `Task` values.
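The distinction is easy to model. Below is a minimal, hypothetical "Task" (not scalaz's actual implementation, which also handles asynchrony and error handling): just a lazy description, where parallelism is an explicit combinator in the spirit of `Nondeterminism[Task].both`, rather than an accident of construction order.

```scala
// A toy Task: nothing happens until `run` is invoked, so constructing
// and composing Task values is completely deterministic.
final class Task[A](thunk: () => A) {
  def run(): A = thunk()
  def map[B](f: A => B): Task[B] = new Task(() => f(run()))
  def flatMap[B](f: A => Task[B]): Task[B] = new Task(() => f(run()).run())
}

object Task {
  def delay[A](a: => A): Task[A] = new Task(() => a)

  // The only source of nondeterminism, opted into explicitly: run both
  // tasks in parallel and pair their results.
  def both[A, B](ta: Task[A], tb: Task[B]): Task[(A, B)] =
    new Task(() => {
      var b: Option[B] = None
      val t = new Thread(() => { b = Some(tb.run()) })
      t.start()
      val a = ta.run()
      t.join()       // join establishes visibility of b's write
      (a, b.get)
    })
}
```

With this shape, `Task.delay(readLine())` merely describes the read; refactoring how `Task` values are created and shared cannot change when effects happen or how much parallelism you get, because both are fixed by `run` and `both`.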