Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

1. Software transactional memory

Goes back to Shavit and Touitou (Shavit and Touitou, Software transactional memory, PODC 1995) based on earlier proposals for hardware support for transactions by Herlihy and Moss. Recently very popular in programming language circles.

We start with the basic idea of a transaction: read a bunch of registers and update their values, with the usual database ACID properties. The transaction either happens completely or not at all, and it serializes with other transactions as if each occurred instantaneously. Our goal is to implement this with minimal hardware support, and to use it for everything.

Generally we only consider static transactions where the set of memory locations accessed is known in advance, as opposed to dynamic transactions where it may vary depending on what we read (e.g. if we have to follow pointers). Static transactions are easier because we can treat them as multi-word RMW.
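To make "static transaction as multi-word RMW" concrete, here is a minimal sketch of the semantics we are after. The class and function names are illustrative, not from the paper, and the single global lock only pins down the intended behavior; it is exactly the blocking implementation the rest of the notes try to avoid.

```python
import threading

class StaticTransaction:
    """A static transaction: its address set is fixed in advance, so it
    is semantically just a multi-word read-modify-write. (Illustrative
    sketch; the class and field names are not from the paper.)"""
    def __init__(self, addresses, update):
        self.addresses = addresses  # locations known before execution
        self.update = update        # maps old values to new values

# Reference semantics via one global lock. This is NOT non-blocking;
# it only specifies the behavior an STM has to implement.
_lock = threading.Lock()

def run(memory, txn):
    with _lock:
        old = [memory[a] for a in txn.addresses]   # R: read all locations
        new = txn.update(old)                      # M: compute new values
        for a, v in zip(txn.addresses, new):       # W: write them back
            memory[a] = v
        return old

memory = {0: 10, 1: 20}
# Move 5 units from location 0 to location 1, atomically.
txn = StaticTransaction([0, 1], lambda old: [old[0] - 5, old[1] + 5])
old = run(memory, txn)
```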

Implementations are usually non-blocking: some infinite stream of transactions succeeds, but not necessarily yours. This excludes simple methods based on acquiring locks, since we have to keep going even if a lock-holder crashes, but it is weaker than wait-freedom, since individual processes can starve.

2. Motivation

Some selling points for STM:

  1. We get atomic operations without having to use our brains much. Unlike hand-coded AtomicSnapshots, counters, queues, etc., we have a universal construction that converts any sequential data structure built on top of stock memory into a concurrent data structure. This is useful since most programmers don't have very big brains. We also avoid burdening the programmer with having to remember to lock things.

  2. We can build large shared data structures with the possibility of concurrent access. For example, we can implement AtomicSnapshots so that concurrent updates don't interfere with each other, or an atomic queue where enqueues and dequeues can happen concurrently so long as the queue always has a few elements in it to separate the enqueuers and dequeuers.

  3. We can execute atomic operations on distinct data structures, even if the data structures weren't originally designed to work together, provided they are all implemented using the STM mechanism. This is handy in all the classic database-like settings, as when we want to take $5 from my bank account and put it in yours.

On the other hand, we now have to deal with the possibility that operations may fail. There is a price to everything.

3. Basic approaches

4. Implementing multi-word RMW

We'll describe Shavit and Touitou's method, which essentially follows the locking approach but allows other processes to help dead ones so that locks are always released.

The synchronization primitive used is LL/SC: LL (load-linked) reads a register and leaves our id attached to it; SC (store-conditional) writes a register only if our id is still attached, and clears any other ids that might also be attached. It's easy to build 1-register CAS (CAS1) out of this, though S&T exploit some additional power of LL/SC.
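The LL/SC semantics just described, and the standard CAS1 construction from them, can be sketched as follows. This is a toy single-threaded model of the semantics, not of any real hardware.

```python
class LLSCRegister:
    """Toy model of an LL/SC register: LL tags the register with the
    caller's id; SC succeeds only if that tag is still present, and a
    successful SC clears everyone's tags. (Sketch of the semantics
    only; real hardware invalidates tags on any write.)"""
    def __init__(self, value):
        self.value = value
        self.tags = set()

    def LL(self, pid):
        self.tags.add(pid)      # leave our id attached
        return self.value

    def SC(self, pid, new):
        if pid in self.tags:
            self.tags.clear()   # invalidate all other outstanding LLs
            self.value = new
            return True
        return False            # our tag was cleared: someone else wrote

def cas1(reg, pid, old, new):
    """1-register CAS built from LL/SC, as mentioned above."""
    while True:
        if reg.LL(pid) != old:
            return False
        if reg.SC(pid, new):
            return True
        # SC failed due to interference; reread and try again
```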

Overview of a transaction (no contention or failures):

  1. Process initializes the local record rec for the transaction: an operation field describes what it wants to do; Addresses[] lists the relevant memory addresses; OldValues[] (initially null) provides space for storing old values from the R part of the RMW; Status (initially null) records whether the transaction succeeds or fails; a version number distinguishes reuses of the same storage; and a Stable bit is set at the end to tell other processes that all fields have been initialized.

  2. Process acquires ownership of registers in Addresses[]. See AcquireOwnerships code in paper. Basic idea is to use LL/SC to set Owner[address] to the transaction record only if it is currently Nobody; however, in between the LL and the SC the process checks that the record status is still null and that the version number hasn't changed. The status check is done with an overlapping LL/SC: LL(status) LL(owner) SC(status) SC(owner). If both SC's succeed, go on to the next register. If either fails, attempt to fail the transaction (by doing SC(Failure, bad-address) in the status field) and return if the failure is successful. Otherwise try again.

  3. Do a LL on status to see if AcquireOwnerships succeeded. If so, update the memory, store the old results in OldValues and release the ownerships. If it failed on bad address j, release the ownership and then attempt to carry out the transaction that owns j, provided we are the initiator of the current transaction.

With failures or concurrency, a second process may come in and attempt to acquire ownership of a register we've already grabbed. In this case, it will fail its own transaction and then start helping ours, attempting to acquire more registers for our transaction. If it succeeds before we do, it will update the registers and store the previous values in OldValues.
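The steps above can be condensed into a sequential sketch. Field names paraphrase the description of the record; the overlapping LL/SC validation, version numbers, and helping are replaced by plain sequential checks, so this shows only the control flow of one uncontended transaction, not the concurrency handling.

```python
NOBODY = None

class TxRecord:
    """Transaction record, roughly following the fields listed above.
    Sequential sketch: no real LL/SC, versioning, or helping."""
    def __init__(self, addresses, update):
        self.addresses = sorted(addresses)   # acquire in address order
        self.update = update                 # old values -> new values
        self.old_values = None
        self.status = None                   # undecided until set
        self.stable = True                   # all fields initialized

def acquire_ownerships(owner, rec):
    for a in rec.addresses:
        # Paper: overlapping LL(status) LL(owner) SC(status) SC(owner);
        # here we just check that the status is still undecided.
        if rec.status is not None:
            return
        if owner[a] is NOBODY or owner[a] is rec:
            owner[a] = rec
        else:
            # Blocked: fail on the bad address. (In the real algorithm
            # the initiator would then go help the blocking transaction.)
            rec.status = ('Failure', a)
            return
    rec.status = 'Success'

def run_transaction(memory, owner, rec):
    acquire_ownerships(owner, rec)
    if rec.status == 'Success':
        rec.old_values = [memory[a] for a in rec.addresses]  # save R part
        for a, v in zip(rec.addresses, rec.update(rec.old_values)):
            memory[a] = v                                    # W part
    for a in rec.addresses:          # release ownerships either way
        if owner[a] is rec:
            owner[a] = NOBODY
    return rec.status
```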

4.1. Linearizability

Intuition is: a successful transaction holds ownership of all of its registers at the moment its status is set to Success, so we can pretend the whole multi-word RMW happens at that instant; transactions thus serialize in the order in which their statuses are decided.

4.2. Non-blocking

To show that the protocol is non-blocking we must show that if an unbounded number of transactions are attempted, one eventually succeeds. First observe that in order to fail, a transaction must be blocked by another transaction that acquired ownership of a higher-address location than it did; eventually we run out of higher-address locations, so there is some transaction that doesn't fail. Of course, this transaction may not succeed (e.g., if its initiator dies), but either (a) it blocks some other transaction, and that transaction's initiator will complete it or die trying, or (b) it blocks no future transactions. In the second case we can repeat the argument for the n-1 surviving processes to show that some of them complete transactions, ignoring the stalled transaction from case (b).

5. Improvements

One downside of S&T is that it uses LL/SC very aggressively (e.g. with overlapping LL/SC operations) and uses non-trivial (though bounded, if you ignore the ever-increasing version numbers) amounts of extra space. Subsequent work has aimed at knocking these down; for example, Harris et al, A practical multi-word compare-and-swap, DISC 2002 builds multi-register CAS out of single-register CAS with O(1) extra bits per register. The proofs of these later results can be quite involved; Harris et al, for example, base their algorithm on an implementation of 2-register CAS whose correctness has been verified only by machine (which may be a plus in some views).


2014-06-17 11:58