Lock-free vs wait-free concurrency

Posted by 111qqz on Tuesday, March 21, 2017



看leveldb源码中遇到的,关于lock-free 和 wait-free..感觉这个讲得不错,我试着翻译一下?

There are two types of [non-blocking thread synchronization](http://en.wikipedia.org/wiki/Non-blocking_synchronization) algorithms - lock-free, and wait-free. Their meaning is often confused. In lock-free systems, while any particular computation may be blocked for some period of time, all CPUs are able to continue performing other computations. To put it differently, while a given thread might be blocked by other threads in a lock-free system, all CPUs can continue doing other useful work without stalls. Lock-free algorithms increase the overall throughput of a system by occassionally increasing the latency of a particular transaction. Most high- end database systems are based on lock-free algorithms, to varying degrees.

有两种 无阻塞线程同步算法,一种是lock-free,一种是wait-free,它们的含义经常被搞混。在一个lock-free系统中,尽管某个特定的计算会被阻碍一段时间,所有的cpu还是能够继续其他计算。换一种说法,尽管在一个lock-free的系统里,一个线程可能被其他线程阻碍,所有的cpu仍然可以继续做其他工作而不做停顿。lock-free算法通过偶然增加一个特定事物的延迟,增加了系统总体的吞吐率。大多数高端数据库系统都在某种程度上基于lock-free 算法。

By contrast, wait-free algorithms ensure that in addition to all CPUs continuing to do useful work, no computation can ever be blocked by another computation. Wait-free algorithms have stronger guarantees than lock-free algorithms, and ensure a high thorughput without sacrificing latency of a particular transaction. They’re also much harder to implement, test, and debug. The [lockless page cache](http://lwn.net/Articles/291826/)patches to the Linux kernel are an example of a wait-free system.


In a situation where a system handles dozens of concurrent transactions and has [soft latency requirements](http://en.wikipedia.org/wiki/Real-time_computing#Hard_and_soft_real-time_systems), lock-free systems are a good compromise between development complexity and high concurrency requirements. A database server for a website is a good candidate for a lock-free design. While any given transaction might block, there are always more transactions to process in the meantime, so the CPUs will never stay idle. The challenge is to build a transaction scheduler that maintains a good mean latency, and a well bounded standard deviation.


In a scenario where a system has roughly as many concurrent transactions as CPU cores, or has hard real-time requirements, the developers need to spend the extra time to build wait-free systems. In these cases blocking a single transaction isn’t acceptable - either because there are no other transactions for the CPUs to handle, minimizing the throughput, or a given transaction needs to complete with a well defined non-probabilistic time period. Nuclear reactor control software is a good candidate for wait-free systems.