BlockingQueues and Communicating Sequential Processes

A particularly important objective for developers of concurrent systems is to produce designs that always give consistent results and cannot deadlock (see The Four Horsemen on why this can be difficult).

Since Java 1.5, the Java API has included the java.util.concurrent package, which contains a range of classes to support thread synchronisation, communication and safe sharing of data. This Java 5 enhancement introduced blocking queues for passing messages between threads, making it possible to write concurrent Java applications that benefit from decades of theoretical progress in how to design deadlock-free systems.
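For example, here is a minimal sketch (the class and message names are illustrative) of two threads exchanging messages through a bounded ArrayBlockingQueue: put blocks while the queue is full and take blocks while it is empty, so neither thread needs any explicit locking. It is written with lambdas for brevity; the queue API itself dates from Java 5.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BlockingQueueExample {
    public static void main(String[] args) throws InterruptedException {
        // A bounded queue: put() blocks when full, take() blocks when empty.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    queue.put("message " + i);   // blocks if the queue is full
                }
                queue.put("DONE");               // sentinel telling the consumer to stop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                String msg;
                while (!(msg = queue.take()).equals("DONE")) {   // blocks if the queue is empty
                    System.out.println("received: " + msg);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}
```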

When starting out with multi-threaded programming, the simplest approach is to use the synchronized keyword. synchronized is a great asset - enforcing single-thread-only access to sections of code allows us to prevent the race conditions that otherwise cause erratic and inconsistent behaviour. But there’s a big catch. Two big catches, really. The obvious catch is that overuse of synchronized means that a greater fraction of the program runs in just one thread, so performance can suffer - in extremis, dropping to that of the original single-threaded version. Amdahl’s Law bites. The second catch is harder for anyone new to multi-threaded Java - “it can’t deadlock … can it?”  Well, er, yes, maybe it can. Everywhere synchronized appears there is the possibility of waiting to acquire a lock, and therefore the possibility of waiting forever.
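To make the second catch concrete, here is a small hypothetical example of how nested synchronized blocks can deadlock: each thread holds one lock while waiting for the other’s, so if the timing is unlucky both wait forever.

```java
public class DeadlockExample {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    public static void main(String[] args) {
        // Thread 1 takes lockA then lockB; thread 2 takes lockB then lockA.
        // If each acquires its first lock before the other releases, neither can proceed.
        new Thread(() -> {
            synchronized (lockA) {
                pause();                          // widen the window for the deadlock
                synchronized (lockB) {
                    System.out.println("thread 1 acquired both locks");
                }
            }
        }).start();

        new Thread(() -> {
            synchronized (lockB) {
                pause();
                synchronized (lockA) {
                    System.out.println("thread 2 acquired both locks");
                }
            }
        }).start();
    }

    private static void pause() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```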

One simple solution is to sidestep the problem using the cleverly-designed work of others. To wit, minimise or eliminate code that uses synchronized and instead use the higher-level primitives in java.util.concurrent.

Many developers use the carefully-designed data structures for sharing data between threads - for example ConcurrentMap.  But there’s another easy alternative. By treating the classes used by each thread as non-thread-safe, unshared, private resources (i.e. ‘thread-local’, with a small ‘t’ and a small ‘l’), there isn’t any need to do anything other than write single-threaded code. The application then consists of domains of self-contained activity communicating by passing messages. Each domain is in itself simple because it has private data only - there are never any data races to worry about.

A central tenet of the new Go language is “do not communicate by sharing memory; instead, share memory by communicating.”  Java programs can do this too: blocking queues can connect the domains, completely encapsulating the use of synchronized and keeping Object’s tricky wait and notify methods out of sight.  Well-implemented alternatives to java.util.concurrent also exist, notably JCSP and CTJ.
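As a sketch of this style (the WordCounter class and the “EOF” sentinel are illustrative, not from any library), here is a domain that keeps an ordinary, non-thread-safe HashMap as private state and talks to the rest of the program only through blocking queues:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A self-contained "domain": ordinary single-threaded code whose only contact
// with other threads is through the blocking queues it is given.
public class WordCounter implements Runnable {
    private final BlockingQueue<String> input;
    private final BlockingQueue<Map<String, Integer>> output;
    // Private, non-thread-safe state: never shared, so no locking is needed.
    private final Map<String, Integer> counts = new HashMap<>();

    WordCounter(BlockingQueue<String> input, BlockingQueue<Map<String, Integer>> output) {
        this.input = input;
        this.output = output;
    }

    @Override
    public void run() {
        try {
            String word;
            while (!(word = input.take()).equals("EOF")) {   // blocks until a message arrives
                counts.merge(word, 1, Integer::sum);
            }
            output.put(new HashMap<>(counts));               // send a copy; the original stays private
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> words = new ArrayBlockingQueue<>(100);
        BlockingQueue<Map<String, Integer>> results = new ArrayBlockingQueue<>(1);
        new Thread(new WordCounter(words, results)).start();

        for (String w : new String[] {"to", "be", "or", "not", "to", "be", "EOF"}) {
            words.put(w);
        }
        System.out.println(results.take());   // e.g. {not=1, be=2, or=1, to=2}
    }
}
```

Because the HashMap is never shared, the domain needs no synchronisation of its own; all of the locking is hidden inside the queues.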

Of course, we’re used to doing this when we write networking programs: passing messages between processes using sockets is very similar to passing messages between threads using blocking queues. But when we couple threads together with blocking queues there’s a risk of deadlock once more.  This might seem like we’re back in the tricky situation discussed earlier, but now things are different - we have the help of mathematicians who have been beavering away on process algebras for decades and have contributed very useful insights.  Fortunately, we “mere mortals” don’t have to do the hard maths - just understand that CSP and Pi-Calculus are all about processes that do exactly what Go expressed so clearly (do not communicate by sharing memory; instead, share memory by communicating).  Dealing with deadlock problems now usually comes down to applying a handful of design patterns that are known to be deadlock-free - of which more will follow in a later posting.

 