Parallel Universe

Lately I have seen a lot of excitement around writing asynchronous, non-blocking, lock-free code, etc. This is good, as we are moving towards more parallelized and concurrent software and libraries, and there are nice performance improvements ahead. But I fear we are heading towards code that is hard to understand and therefore hard to debug.

Some people are becoming so afraid of a thread being blocked that they take every available countermeasure to keep their threads from blocking. They start writing callbacks. And no matter how cool the language is, callbacks make code hard to read: the computation flow is no longer written down the way it actually happens. Obviously, for scripting or small projects, quick coding calls for quick coding structures like callbacks. But then high concurrency is not needed in the first place. If we start changing the way we code because of performance issues in the underlying stack, it means we are using the wrong stack.

This rant was provoked by a presentation and by the example shown in this doc: http://docs.scala-lang.org/overviews/core/futures.html#promises Take a few minutes to figure this example out. You’ll see there is some implicit code at line 9: "forkAndGoto line 16". Easy reading, isn’t it? And this is a simple example.

As always, there are exceptions. You may be in a hurry: some requests have high latencies, it hurts your business, and you need to fix it quickly. In other words, you need to hack. So hack and use callbacks. But don’t call this proper, maintainable code.

Let’s imagine the ubiquitous use case of handling HTTP requests. On reception of a request, there is a call to the database (ouch, a blocking call!). Depending on the result of the database query, there is a call to a third-party API (ouch, ouch, another blocking call!). Last but not least, there is another call to the database to render the final result (OMG!! a third blocking call!). How should we code that? As a callback of a callback of a callback? I am usually one of those people who debug things they don’t know, reading a lot of libraries’ internal code to figure out what the hell is happening that breaks my app. For my sake, let’s avoid that mess. For instance, how cool do you think this example is: https://gist.github.com/benjchristensen/4677544
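
To make the "callback of a callback of a callback" concrete, here is a minimal Java sketch of that handler, written once in plain blocking style and once with nested callbacks. The three operations (queryDatabase, callThirdPartyApi, renderFromDatabase) and their Async wrappers are hypothetical stand-ins for a real database client and HTTP client, and error handling is deliberately left out, which is exactly where the callback version would get even worse.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

class RequestHandler {

    // Daemon threads so this pool never keeps the JVM alive on its own.
    static final ExecutorService POOL = Executors.newFixedThreadPool(4, runnable -> {
        Thread thread = new Thread(runnable);
        thread.setDaemon(true);
        return thread;
    });

    // Hypothetical stand-ins for the three blocking calls (not a real API).
    static String queryDatabase(String request) { return "row(" + request + ")"; }
    static String callThirdPartyApi(String row) { return "api(" + row + ")"; }
    static String renderFromDatabase(String apiResult) { return "page(" + apiResult + ")"; }

    // Blocking style: the code reads in the order it runs.
    static String handleBlocking(String request) {
        String row = queryDatabase(request);
        String apiResult = callThirdPartyApi(row);
        return renderFromDatabase(apiResult);
    }

    // Hypothetical non-blocking wrappers: run the call on the pool, then invoke the callback.
    static void queryDatabaseAsync(String request, Consumer<String> callback) {
        POOL.execute(() -> callback.accept(queryDatabase(request)));
    }

    static void callThirdPartyApiAsync(String row, Consumer<String> callback) {
        POOL.execute(() -> callback.accept(callThirdPartyApi(row)));
    }

    static void renderFromDatabaseAsync(String apiResult, Consumer<String> callback) {
        POOL.execute(() -> callback.accept(renderFromDatabase(apiResult)));
    }

    // Callback style: a callback of a callback of a callback.
    // No caller thread blocks, but the flow is now inside-out, and an exception
    // thrown in any step would simply never reach `respond`.
    static void handleWithCallbacks(String request, Consumer<String> respond) {
        queryDatabaseAsync(request, row ->
            callThirdPartyApiAsync(row, apiResult ->
                renderFromDatabaseAsync(apiResult, respond)));
    }
}
```

Both versions compute the same page; only the readability differs, and that gap widens as soon as real error handling is added to each nested step.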

When trying to improve latency, and when threads block too many times while handling a command, there are Futures for that. In the code handling the command, we just need to launch whatever can be done in parallel as Futures, and then have one final blocking wait for all of them to finish. Things get parallelized, the control flow stays as clear as it can be, and there is just one last block to gather the required data. Now I hear you telling me that futures may need some callbacks. But those callbacks will each do one thing and basta: hopefully no side effects, and nothing triggered behind the scenes like in the previous example.
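
A minimal Java sketch of that approach, using the JDK's CompletableFuture (any futures library would do): three independent lookups are launched in parallel and the handler blocks exactly once, at the end. The lookups (fetchProfile, fetchOrders, fetchRecommendations) are hypothetical and only simulate latency with Thread.sleep.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

class CommandHandler {

    // Hypothetical independent lookups; each one simulates ~100 ms of blocking I/O.
    static String fetchProfile(String userId) { pause(100); return "profile:" + userId; }
    static String fetchOrders(String userId) { pause(100); return "orders:" + userId; }
    static String fetchRecommendations(String userId) { pause(100); return "recos:" + userId; }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static List<String> handle(ExecutorService pool, String userId) {
        // Launch everything that can run in parallel.
        CompletableFuture<String> profile = CompletableFuture.supplyAsync(() -> fetchProfile(userId), pool);
        CompletableFuture<String> orders = CompletableFuture.supplyAsync(() -> fetchOrders(userId), pool);
        CompletableFuture<String> recos = CompletableFuture.supplyAsync(() -> fetchRecommendations(userId), pool);

        // The single blocking point: wait for all of them, then gather the data.
        // join() rethrows a failure from any lookup, wrapped in a CompletionException.
        CompletableFuture.allOf(profile, orders, recos).join();
        return List.of(profile.join(), orders.join(), recos.join());
    }
}
```

With a pool of at least three threads, the three lookups overlap, so the handler takes roughly one lookup's latency instead of the sum of all three, while the method still reads top to bottom.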

If the constraints are such that any blocked thread becomes a drama, then it’s time to consider changing paradigm and entering the real parallel universe: the actor model. Let’s play with Erlang and Akka. Parallelism cannot be easier than in these environments. In this universe, the code waits for input as long as needed, yet no actual thread is blocked while it waits. We get straight to the point, and it shows what parallelism is really about: coordination, asynchronous error handling, dispatching and routing asynchronous messages, load balancing, back pressure… And the pointless juggling of variable scopes across callbacks disappears.
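
Erlang and Akka do this for real, but the core idea fits in a few lines. Below is a toy actor in plain Java; it is not Akka's API, and MiniActor and CounterActor are hypothetical names. Messages go into a mailbox and are processed one at a time on a shared thread pool, and a drain task is only scheduled when there is something to process, so an idle actor holds no thread. The counter's state needs no lock because only the actor's own behavior ever touches it. A real runtime adds everything listed above (supervision, routing, back pressure, and so on).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Toy actor (hypothetical, NOT Akka): a mailbox drained sequentially on a shared executor.
class MiniActor<M> {
    private final ConcurrentLinkedQueue<M> mailbox = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final Executor executor;
    private final Consumer<M> behavior;

    MiniActor(Executor executor, Consumer<M> behavior) {
        this.executor = executor;
        this.behavior = behavior;
    }

    // Fire-and-forget send: never blocks the caller.
    void tell(M message) {
        mailbox.offer(message);
        schedule();
    }

    private void schedule() {
        // At most one drain task per actor, so messages are handled one at a time.
        if (scheduled.compareAndSet(false, true)) {
            executor.execute(this::drain);
        }
    }

    private void drain() {
        M message;
        while ((message = mailbox.poll()) != null) {
            behavior.accept(message);
        }
        scheduled.set(false);
        // A message may have arrived between the last poll and the reset above.
        if (!mailbox.isEmpty()) {
            schedule();
        }
    }
}

// Example actor: a counter whose state is only ever touched by its own behavior.
class CounterActor {
    // Hypothetical message protocol for this sketch.
    static final Object INCREMENT = new Object();

    static final class Get {
        final CompletableFuture<Integer> replyTo;
        Get(CompletableFuture<Integer> replyTo) { this.replyTo = replyTo; }
    }

    static MiniActor<Object> create(Executor executor) {
        int[] count = {0}; // private state, confined to the actor
        return new MiniActor<>(executor, message -> {
            if (message == INCREMENT) {
                count[0]++;
            } else if (message instanceof Get) {
                ((Get) message).replyTo.complete(count[0]);
            }
        });
    }
}
```

Many threads can tell the counter to increment at the same time without any lock; replies come back as messages too (here through a CompletableFuture), instead of through shared variables captured in callbacks.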

There is also RxJava, which seems interesting. The "pipelining" of asynchronous calls is described quite naturally (if you can bear the builder pattern). But then again, in massively parallel code, under so many constraints that no resource can be wasted, how is the number of instances running in parallel controlled? How is back pressure implemented? How are errors handled? Here comes the real mess, and those nice few lines of pipelining end up a real pain to read.
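
To give an idea of what back pressure looks like in code, here is a minimal sketch using the JDK's java.util.concurrent.Flow interfaces (Java 9+), which mirror the Reactive Streams specification that RxJava 2 builds on, together with the JDK's SubmissionPublisher. This is not RxJava's API, and BackPressureDemo and the doubling "work" are hypothetical. The subscriber explicitly requests one item at a time, the publisher buffers at most 8 items per subscriber so submit() blocks the producer when the consumer lags behind, and failures arrive through onError.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

class BackPressureDemo {

    // Pulls one item at a time: demand is signalled explicitly with request(1).
    static final class OneAtATimeSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> results = new CopyOnWriteArrayList<>();
        final CountDownLatch finished = new CountDownLatch(1);
        volatile Throwable failure;
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            subscription.request(1); // ask for the first item only
        }

        @Override
        public void onNext(Integer item) {
            results.add(item * 2);   // stand-in for the real work
            subscription.request(1); // ready for exactly one more
        }

        @Override
        public void onError(Throwable throwable) {
            failure = throwable;
            finished.countDown();
        }

        @Override
        public void onComplete() {
            finished.countDown();
        }
    }

    static List<Integer> run(int count) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        OneAtATimeSubscriber subscriber = new OneAtATimeSubscriber();
        // At most 8 buffered items per subscriber: submit() blocks the producer when full.
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>(executor, 8)) {
            publisher.subscribe(subscriber);
            for (int i = 0; i < count; i++) {
                publisher.submit(i);
            }
        } // close() signals onComplete once the already submitted items are delivered
        subscriber.finished.await();
        executor.shutdown();
        if (subscriber.failure != null) {
            throw new IllegalStateException("stream failed", subscriber.failure);
        }
        return subscriber.results;
    }
}
```

Even this small example needs explicit demand signalling, a completion latch and an error path, which is the kind of machinery that hides behind a few lines of pipelining.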

Now that I have explained why I am right (because I am always right, except when I’m wrong), I should admit that I don’t have tremendous experience with these "concurrent"/"parallel" technologies. Did I waste your time reading all this? Maybe :). I indeed don’t have years of practice in highly concurrent programming, so this is really an intuition; let’s call it some French flair!

I have practiced a little bit of Erlang though, to get a taste of the language. Coding in Erlang requires a different mindset: parallelism becomes very natural, and sometimes (most of the time?) you don’t even think about it. That’s mainly why I get migraines looking at Scala’s promises; it’s so easy with the proper language or framework. I have also played a bit further with Akka: there was some software I needed to parallelize more to improve performance, by using all the available CPU cores. That is another story, which will get its own chapter here.

Cleaner code is particularly important to me because, as I wrote earlier, I am an open source addict, so I often read other people’s code. Hence one of my favorite quotes:

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

— Brian Kernighan

And I’ll conclude with my favorite acronym: KISS.