Modes of Operation - laforge49/Asynchronous-Functional-Programming GitHub Wiki
We tend to think of asynchronous code as being much faster than the synchronous alternative, and there is no question that responding to an event is generally much better than polling. But we rarely consider how much slower it is to execute a chain of asynchronous operations using multiple threads compared to executing those same operations synchronously on a single thread. The sad truth is that when we try to write scalable logic that uses multiple CPUs, too often the result is software that runs slower than if it ran on a single thread. This explains, in part, why single-threaded frameworks like Twisted and node.js are so successful.
In-line code is, without question, the fastest, though often the most inconvenient. A while loop runs a bit slower. Recursion is slower still. A single-threaded event queue is even slower. And a multi-threaded messaging system, i.e. actors, is probably the worst possible approach in terms of raw speed. Which brings us to a dilemma. Actors are a great way to manage multi-threaded processes but tend to force us into implementing everything using actors.
The best approach when using actors has been to keep them coarse grained, doing as much as possible when processing each message. This can result in a lot of messy and cryptic code, but that is a given when speed is the issue, right? But does it have to be this way? The idea behind this project is that by decoupling actors from their mailboxes, actors which use the same mailbox can directly inter-operate on the same thread without recourse to a message queue. This we call synchronous operation, in contrast to asynchronous operation when messages are queued and processed on a different thread.
The processing speed for synchronous operation differs most from asynchronous operation when message processing is brief, with synchronous processing running about 110 times (11,000%) faster than asynchronous processing. Small actors now become practical, and that means both cleaner code and more code reuse.
Actors are useful in multi-threaded applications because they ensure the thread safety of their message processing logic. This is done through the use of a mailbox. Actors place messages in each other's mailboxes, and they read and process messages from their own mailboxes one at a time, so an actor never processes two messages at the same time. What we have done here is to decouple actors from their mailboxes to allow actors to share mailboxes, though any given actor only ever uses a single mailbox. Now only one message is processed at a time for all the actors sharing the same mailbox. And this means that these actors can now inter-operate synchronously.
Incoming messages from actors which use a different mailbox are put in the destination actor's mailbox and processed later when a thread is available. But now when a message comes from an actor which uses the same mailbox, the receiving actor can safely process the message immediately. There are two exceptions to this: safe functions and immutable actors. Safe functions are always called synchronously to process the type of message they are bound to, without any restrictions. The thread safety of a safe function is the developer's responsibility, and care must be taken, as there is no constraint on when a safe function can execute.
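The dispatch rule described above can be sketched in a few lines. This is a single-threaded toy model, not the project's implementation: `Mailbox`, `SketchActor`, `send` and `drain` are hypothetical names invented for the illustration, and `drain()` merely stands in for a thread becoming available to service the other mailbox.

```scala
import scala.collection.mutable

object DispatchSketch {
  // Hypothetical mailbox: a plain FIFO queue of deferred work
  // (an assumption for this sketch, not the project's ReactorMailbox).
  final class Mailbox {
    private val queue = mutable.Queue.empty[() => Unit]
    def enqueue(task: () => Unit): Unit = queue.enqueue(task)
    // Run everything queued so far, including tasks queued while draining.
    def drain(): Unit = while (queue.nonEmpty) queue.dequeue()()
  }

  // Hypothetical actor: it owns no queue of its own, it only refers to a mailbox.
  final class SketchActor(val name: String, val mailbox: Mailbox, log: mutable.Buffer[String]) {
    def process(msg: String): Unit = log += s"$name processed $msg"

    // Same mailbox: call the target directly. Different mailbox: queue for later.
    def send(target: SketchActor, msg: String): Unit =
      if (target.mailbox eq mailbox) target.process(msg)
      else target.mailbox.enqueue(() => target.process(msg))
  }

  def run(): List[String] = {
    val log = mutable.ListBuffer.empty[String]
    val mb1 = new Mailbox
    val mb2 = new Mailbox
    val a = new SketchActor("a", mb1, log)
    val sameBox = new SketchActor("sameBox", mb1, log)
    val otherBox = new SketchActor("otherBox", mb2, log)

    a.send(otherBox, "m1") // queued on mb2, nothing happens yet
    a.send(sameBox, "m2")  // processed immediately, on the caller's thread
    log += "a finished sending"
    mb2.drain()            // stands in for a thread becoming available
    log.toList
  }
}
```

Calling `DispatchSketch.run()` shows the message to the actor sharing the sender's mailbox being processed first, even though it was sent second.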
Immutable actors, which are actors whose state never changes, can safely process all the messages they receive synchronously and do not use a mailbox. We designate an actor as immutable by constructing it without a mailbox. But there are restrictions placed on immutable actors.
- Unlike regular actors, which only process one message at a time, an immutable actor may find itself processing any number of messages at the same time. So an immutable actor must be completely thread-safe, i.e. it must not change its state, nor can its state be changed by any other means.
- An immutable actor can only send messages to other immutable actors.
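As a rough illustration of the first restriction, here is a minimal sketch of an object that is safe to call from several threads at once because it never writes to anything it owns. `Lookup` and its `process` method are hypothetical names for this example only; they do not use the project's `Actor` class.

```scala
import java.util.concurrent.atomic.AtomicInteger

object ImmutableActorSketch {
  // Hypothetical immutable actor: all state is fixed at construction in an immutable Map.
  final class Lookup(table: Map[String, Int]) {
    // Safe to call from any number of threads at once: nothing is ever written.
    def process(key: String, rf: Option[Int] => Unit): Unit = rf(table.get(key))
  }

  def run(): Int = {
    val lookup = new Lookup(Map("one" -> 1, "two" -> 2))
    val total = new AtomicInteger(0) // thread-safe accumulator owned by the callers, not the actor
    val threads = (1 to 4).map { _ =>
      new Thread(() => lookup.process("two", r => r.foreach(n => total.addAndGet(n))))
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    total.get
  }
}
```

Four threads each look up the same key concurrently, with no mailbox and no locking inside `Lookup`; any mutable state (here the accumulator) lives with the callers and must be thread-safe in its own right.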
The behavior of an actor does differ depending on its mode of operation, so some care needs to be exercised in the patterns we use to ensure that the code is not sensitive to these differences. The difference between synchronous and asynchronous operation is, of course, the time when a message is processed. Let's look at some code.
First we define AMessage, to which we can bind some processing logic.
case class AMessage()
Actor A binds AMessage to afunc, which prints when afunc starts and ends, and when it gets a result from another actor. The constructor for this actor has one parameter: sub, the actor to which the message is to be passed.
class A(sub: Actor) extends Actor {
  bind(classOf[AMessage], afunc)
  def afunc(msg: AnyRef, rf: Any => Unit) {
    println("start afunc")
    sub(msg) {rsp =>
      println("got result")
      rf("all done")
    }
    println("end afunc")
  }
}
Actor B simply returns some result on receiving AMessage.
class B extends Actor {
  bind(classOf[AMessage], bfunc)
  def bfunc(msg: AnyRef, rf: Any => Unit){rf("ta ta")}
}
Our first test is to pass a message synchronously, i.e. with both actors using the same mailbox.
val mb1 = new ReactorMailbox
val mb2 = new ReactorMailbox
val b = new B
b.setMailbox(mb1)
println("synchronous test")
val sa = new A(b)
sa.setMailbox(mb1)
Future(sa, AMessage())
In this case, the results are received by actor A immediately.
synchronous test
start afunc
got result
end afunc
Our second test is to pass the message asynchronously, i.e. with each actor using a different mailbox.
println("asynchronous test")
val aa = new A(b)
aa.setMailbox(mb2)
Future(aa, AMessage())
Now actor A does not receive the result until after afunc returns.
asynchronous test
start afunc
end afunc
got result
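Since the only difference between the two runs is when the callback fires, code is safe in both modes if everything that depends on the response lives inside the callback. The sketch below contrasts a fragile and a robust pattern. `Sender`, `fragile`, `robust` and `run` are hypothetical names made up for this illustration; `Sender` is a toy stand-in that either replies before `send` returns or defers the reply until `runPending()` is called.

```scala
import scala.collection.mutable

object OrderingSketch {
  // Hypothetical stand-in for sending a message: in synchronous mode the reply callback
  // runs before send returns; in asynchronous mode it is deferred until runPending().
  final class Sender(synchronous: Boolean) {
    private val pending = mutable.Queue.empty[() => Unit]
    def send(msg: String)(rf: String => Unit): Unit =
      if (synchronous) rf(msg.toUpperCase)
      else pending.enqueue(() => rf(msg.toUpperCase))
    def runPending(): Unit = while (pending.nonEmpty) pending.dequeue()()
  }

  // Fragile: uses `reply` right after send, so it only works in synchronous mode.
  def fragile(sender: Sender, done: String => Unit): Unit = {
    var reply = "no reply yet"
    sender.send("hello") { rsp => reply = rsp }
    done(reply) // runs before the reply arrives in asynchronous mode
  }

  // Robust: everything that needs the reply happens inside the callback.
  def robust(sender: Sender, done: String => Unit): Unit =
    sender.send("hello") { rsp => done(rsp) }

  // Small harness: run one pattern in one mode and report what `done` received.
  def run(pattern: (Sender, String => Unit) => Unit, synchronous: Boolean): String = {
    val sender = new Sender(synchronous)
    var result = ""
    pattern(sender, r => result = r)
    sender.runPending()
    result
  }
}
```

`run(fragile, synchronous = false)` reports the placeholder value because `done` was called before the reply arrived, while `robust` reports the reply in both modes.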
Because synchronous processing is so much faster than asynchronous processing, actors should, as much as possible, use the same mailbox. So when should they use different mailboxes? One clear case is when an actor blocks for I/O.
When an actor blocks for I/O, it ties up a thread until the operation is completed. So if you have too many actors blocking for I/O you will be using a lot of threads. And that uses up a lot of memory, fast. So it is best to have only a few actors which perform I/O. For example, you can have one actor which reads the entire contents of a file, the file pathname being passed in a message and the result returned being the file contents. But now you want this file reader actor to have its own mailbox, because when it blocks for I/O it will block the execution of all the other actors using the same mailbox. So in general, every actor which blocks for I/O should have its own mailbox.
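One way to picture the file reader is with two single-thread executors, each standing in for a mailbox. This is only a sketch under that assumption: it uses `java.util.concurrent` rather than the project's mailbox classes, and `readFile`, `readerMailbox` and `replyMailbox` are hypothetical names.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.util.concurrent.{CountDownLatch, ExecutorService, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicReference

object FileReaderSketch {
  // Hypothetical reader actor: the blocking read runs on the reader's own "mailbox"
  // (a single-thread executor), and the reply is handed back to the requester's mailbox.
  def readFile(path: Path, readerMailbox: ExecutorService, replyMailbox: ExecutorService)
              (rf: String => Unit): Unit =
    readerMailbox.execute { () =>
      // Blocks only the reader's thread; actors on replyMailbox keep running.
      val text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
      replyMailbox.execute(() => rf(text))
    }

  def run(): String = {
    val shared = Executors.newSingleThreadExecutor()     // mailbox shared by ordinary actors
    val readerOnly = Executors.newSingleThreadExecutor() // mailbox reserved for the reader
    val path = Files.createTempFile("sketch", ".txt")
    Files.write(path, "file contents".getBytes(StandardCharsets.UTF_8))
    val done = new CountDownLatch(1)
    val result = new AtomicReference[String]("")
    try {
      readFile(path, readerOnly, shared) { text => result.set(text); done.countDown() }
      // Waiting here is for the demo only; an actor would return instead of blocking.
      if (done.await(5, TimeUnit.SECONDS)) result.get else "timed out"
    } finally {
      shared.shutdown()
      readerOnly.shutdown()
      Files.delete(path)
    }
  }
}
```

The design point is that only the reader's thread is ever parked in the read call, so the number of blocked threads is bounded by the number of I/O actors rather than by the number of requests.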
A second clear case of when you should be using different mailboxes is when a server needs to process multiple requests in parallel. In this case you want all the actors involved in processing the same request to use the same mailbox, but have a separate mailbox for each request.
When implementing Actor subclasses, there is one important thing to keep in mind. Actors do not wait for a response to a message sent to another actor. This makes it easy to implement a high-speed file cache. The caching actor need never block: it either immediately returns the requested data, or sends a read request to a reader actor to do the I/O and returns the requested data when it later receives it from the reader actor. Since the caching actor does not wait for the response from the reader actor, it is free to process other requests. This all works very nicely, but it does mean that the actor's state may have changed in the time it takes another actor to answer a request. This needs to be kept in mind when writing code which processes responses.
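The file cache can be sketched as follows. This is a single-threaded toy model with hypothetical names (`Reader`, `Cache`, `deliver`, and the `/tmp/a` key are all invented for the example), in which the reader's replies are deferred until `deliver()` is called. Note how the response handler re-reads the cache's state instead of assuming it is unchanged since the request was sent.

```scala
import scala.collection.mutable

object CacheSketch {
  // Hypothetical reader: replies are deferred until deliver(), mimicking an actor
  // on another mailbox. File contents come from an in-memory Map, not the disk.
  final class Reader(files: Map[String, String]) {
    private val pending = mutable.Queue.empty[() => Unit]
    var reads = 0
    def read(path: String)(rf: String => Unit): Unit = {
      reads += 1
      pending.enqueue(() => rf(files(path)))
    }
    def deliver(): Unit = while (pending.nonEmpty) pending.dequeue()()
  }

  // Hypothetical caching actor: it never waits; on a miss it records who is waiting and returns.
  final class Cache(reader: Reader) {
    private val cached = mutable.Map.empty[String, String]
    private val waiting = mutable.Map.empty[String, List[String => Unit]]

    def get(path: String)(rf: String => Unit): Unit =
      cached.get(path) match {
        case Some(data) => rf(data) // hit: answer immediately
        case None =>
          val firstRequest = !waiting.contains(path)
          waiting(path) = rf :: waiting.getOrElse(path, Nil)
          // More requests may arrive before the reply does, so the handler
          // looks up `waiting` again when it finally runs.
          if (firstRequest) reader.read(path) { data =>
            cached(path) = data
            waiting.remove(path).getOrElse(Nil).reverse.foreach(_(data))
          }
      }
  }

  def run(): (List[String], Int) = {
    val reader = new Reader(Map("/tmp/a" -> "A"))
    val cache = new Cache(reader)
    val log = mutable.ListBuffer.empty[String]
    cache.get("/tmp/a")(d => log += s"first $d")
    cache.get("/tmp/a")(d => log += s"second $d") // arrives while the read is outstanding
    log += "cache is free"
    reader.deliver()
    cache.get("/tmp/a")(d => log += s"third $d")  // now a cache hit
    (log.toList, reader.reads)
  }
}
```

In this run the cache stays free to accept the second request while the read is outstanding, both waiting callers are answered from a single read, and the third request is served directly from the cache.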