Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

For more up-to-date notes see http://www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf.

Contents

Synchronous leader election in rings
Synchronous leader election in general networks
Asynchronous leader election in rings

1. Synchronous leader election in rings

See AttiyaWelch chapter 3 or LynchBook Chapter 3 for details.

Basic ideas:

Want a single process to declare itself leader.
No leader election without breaking symmetry [Angluin 1980]. Proof is that if everybody is in the same state at every step, they all put on the crown at the same time.
With ordered identities, a simple algorithm due to Le Lann, Chang, and Roberts solves the problem in O(N) time with O(N²) messages: I send out my own id clockwise and forward any id bigger than mine. If I get my id back I win. Works with a unidirectional ring, doesn't require synchrony, never produces multiple leaders.
On a bidirectional ring we can get O(N log N) messages and O(N) time with power-of-2 probing [Hirschbirg and Sinclair 1980]. Complexity analysis is mildly painful but basically comes down to the fact that any node that sends a message 2^k hops had to be a winner at phase 2^k-1, which means that it is the largest of some group of 2^k-1 ids. Thus the 2^k-hop senders are spaced at least 2^k-1 away from each other and there are at most N/2^k-1 of them. Summing up over all O(log N) phases we get ∑_k 2^k N / 2^k-1 = O(N log N) messages and ∑_k 2^k = O(N) time.
Evil trick: if we have synchronized starting, known N, and known id space, we can have process with id i wait until round i*N to start sending its id around, and have everybody else drop out when they receive it; this way only one process (the one with smallest id) ever starts a message and only N messages are sent. But the running time can be pretty bad.

1.1. Details of LCR

Formally, we'll let the state space for each process i consist of two variables: leader, initially 0, which is set to 1 if i decides it's a leader; and maxId, the largest id seen so far. We assume that i denotes i's position rather than its id, which we'll write as id_i. We will also treat all positions as values mod N, to simplify the arithmetic.

Code for LCR is:

initially, leader ← 0, maxId ← id_i, send i to clockwise neighbor.
upon receiving j:
- if j = id_i, set leader ← 1.
- if j > maxId, set maxId ← j and send j to right neighbor.

1.1.1. Proof of correctness

By induction on the round number k. The induction hypothesis is that at the in round k, each process i's leader bit is 0, its maxId value is equal to the largest id in the range (i-k)..i, and that it sends id_i-k if and only if id_i-k is the largest id in the range (i-k)..i. The base case is that when k = 0, maxId = id_i is the largest id in i..i, and i sends id_i. For the induction step, observe that in round k-1, i-1 sends id_(i-1)-(k-1) = id_i-k iff it is the largest in the range (i-k)..(i-1), and that i adopts it as the new value of maxId and sends it just in case it is larger than the previous largest value in (i-k+1)..(i-1), i.e. if it is the largest value in (i-k)..i.

Finally, in round N-1, i-1 sends id_i-N = id_i if and only if i is the largest id in (i-N)..i-1, the whole state space. So i receives id_i and sets leader_i = 1 if and only if it has the maximum id.

1.1.2. Performance

It's immediate from the correctness proof that the protocols terminates after exactly N+1 rounds.

To count message traffic, observe that each process sends at most 1 message per round, for a total of O(N²) messages. This is a tight bound since if the ids are in decreasing order N N-1 N-2 ... 1, then no messages get eaten until they hit N.

1.2. Details of Hirschbirg and Sinclair

Basically the same as for LCR but both the protocol and the invariant get much messier. To specify the protocol, it may help to think of messages as mobile agents and the state of each process as being of the form (local-state, { agents I'm carrying }). Then the sending rule for a process becomes ship any agents in whatever direction they want to go and the transition rule is accept any incoming agents and update their state in terms of their own internal transition rules. An agent state for LCR will be something like (original-sender, direction, hop-count, max-seen) where direction is R or L depending on which way the agent is going, hop-count is initially 2^k when the agent is sent and drops by 1 each time the agent moves, and max-seen is the biggest id of any node the agent has visited. An agent turns around (switches direction) when hop-count reaches 0.

To prove this works, we can mostly ignore the early phases (though we have to show that the max-id node doesn't drop out early, which is not too hard). The last phase involves any surviving node probing all the way around the ring, so it will declare itself leader only when it receives its own agent from the left. That exactly one node does so is immediate from the same argument for LCR.

To prove the message bound, compute the sum of all active agents per phase times messages sent as above.

1.3. Lower bounds

We'll do the easiest case: an Omega(N log N) lower bound on messages for synchronous-start comparison-based algorithms in bidirectional synchronous rings. For full details see LynchBook Section 3.6, AttiyaWelch Section 3.4.2, or the original paper in JACM by Frederickson and Lynch.

Basic ideas:

Two fragments i..i+k and j..j+k of a ring are order-equivalent if id_i+a > id_i+b iff id_j+a > id_j+b for b = 0..k.
An algorithm is comparison-based if it can't do anything to IDs but copy them and test for <. The state of such an algorithm is modeled by some non-ID state together with a big bag of IDs, messages have a pile of IDs attached to them, etc. Two states/messages are equivalent under some mapping of IDs if you can translate the first to the second by running all IDs through the mapping. Alternate version: Executions of p₁ and p₂ are similar if they send messages in the same direction(s) in the same rounds, declare themselves leader at the same round; an algorithm is comparison-based based if order-equivalent rings yield similar executions for corresponding processes. This can be turned into the explicit-copying-IDs model by replacing the original protocol with a full-information protocol in which each message is replaced by the ID and a complete history of the sending process (including all messages it has every received).
Define an active round as a round in which at least 1 message is sent. Claim: actions of i after k active rounds depends up to an order-equivalent mapping of ids only on the order-equivalence class of ids in i-k..i+k (the k-neighborhood of i). Proof: by induction on k. Suppose i and j have order-equivalent (k-1)-neighborhoods; then after k-1 active rounds they have equivalent states by ind. hyp. In inactive rounds i and j both receive no messages and update their states in the same way. In active rounds, i and j receive order-equivalent messages and update their states in an order-equivalent way.
If we have an order of ids with a lot of order-equivalent k-neighborhoods, then after k active rounds if one process sends a message, so do a lot of other ones.

Now we just need to build a ring with a lot of order-equivalent neighborhoods. For N a power of 2 we can use the bit-reversal ring e.g. 000 100 010 110 001 101 011 111. Here's a picture of what this looks like for N=32:

For N not a power of 2 we look up Frederickson and Lynch or Attiya et al. In either case we get Omega(N/k) order-equivalent members of each equivalence class after k active rounds, giving Omega(N/k) messages per active round, which sums to Omega(N log N).

For non-comparison-based algorithms we can still prove Omega(N log N) messages for time-bounded algorithms, but it requires Ramsey_theory. See AttiyaWelch Section 3.4.2 or LynchBook Section 3.7. Here "time-bounded" means that the running time can't depend on the size of the ID space. The intuition is that for any fixed protocol, if the ID space is larger enough, then (using Ramsey theory) there exists a subset of the ID space where the protocol acts like a comparison-based protocol. So the existence of a O(f(N))-message time-bounded protocol implies the existence of an O(f(N))-message comparison-based protocol, and from the previous lower bound we know f(N) is Omega(N lg N). Note that time-boundedness is necessary: we can't prove the lower bound for non-time-bounded algorithms because of the i*N trick.

2. Synchronous leader election in general networks

Basic assumptions:

Strongly-connected network, i.e. there is a path from every node to every other node.
Upper bound on the diameter (max length of shortest path from one node to another) is known.

Mechanism is the same as in the ring: biggest ID wins. Assuming the network is strongly-connected is essential: otherwise biggest-ID node might not be able to talk to anybody. Diameter bound can be dropped with effort but allows for a simple flooding algorithm as in the ring.

Basic flooding algorithm: state is (maxID, count), message function is send(q, i) = q.maxID if count < diameter and send(q, i) = null otherwise, update function is update(q, msgs) = (max(q.maxID, max(msgs)), count + 1.
- Terminates after exactly diameter rounds.
- Message complexity is exactly |E| * diameter.
- Proof of correctness: by induction on round number, show that q_i.count = round number and that q_i.maxID = max[all u such that d(u,i) ≤ round number] u.ID.
Optimized flooding algorithm: as in basic algorithm, but don't send messages unless q.maxID has changed. This requires adding sendMsg field to state.
- Proof of correctness: show that state of each node at round r except sendMsg is the same in both algorithms. Requires additional invariant that when uv is in E and u.maxID > v.maxID, then u.sendMsg is true.

What if we don't know the diameter bound? Then we need much more machinery. Simplest algorithm is to have each node initiate a DistributedBreadthFirstSearch algorithm but give up if it sees a larger ID than its own.

3. Asynchronous leader election in rings

See also LynchBook §15.1. We'll use AsynchronousMessagePassing model based on IOAutomata.

Assumptions:

Reliable FIFO channels, modeled as universal reliable FIFO channel automaton from AsynchronousMessagePassing.
Processes have identical code (modulo names of actions—IOAutomata model requires that distinct automata have distinct output actions, so we have to embed some sort of position identifier into the action names) but start with distinct initial states that include a unique ID from some totally-ordered set.
A process waits at most l time before issuing a send and a channel waits at most d time before issuing a recv.

Requirements: exactly one process eventually executes a leader output action. (With some tinkering we can get non-leaders to announce non-leader, but we'll keep things simple.)

Actions for the leader-election-in-a-unidirectional-ring system:

send(i,i+1,m) output of p_i; note i+1 wraps around mod N.
recv(i,i+1,m) input of p_i+1.
leader(i) output of p_i.

3.1. A simple algorithm for a unidirectional ring

Here's the code for a simplified version of LCR for an asynchronous system. Note that this is even further stripped down than the AsynchLCR algorithm of LynchBook §15.1.1, since we don't bother building a queue of outgoing IDs, but instead send the largest ID seen so far.

Here is the code for p_i:

states

(myid, maxid, unsent, status) initially (myid, myid, true, waiting)

transitions

send(i,i+1,m)

precondition: m = maxid and unsent = true
effect: unsent := false

recv(i,i+1,m)

effect

if m > maxid: maxid := m, unsent = true
if m = myid and status = waiting: status := winning
if m < maxid: no effect

leader(i)

precondition: status = winning effect: status = won

tasks

all output actions in one task

The system is the composition of N of these p_i together with N universal reliable FIFO channels to connect them; we will refer to the channel from i to i+1 as channel[i]. We now wish to show that it solves leader election assuming all initial IDs are distinct. The intuition is the same as for the original synchronous version of LCR: the max ID happily cruises around the ring until it returns to its source—which then declares itself leader—while all other ID's are eaten.

3.1.1. Correctness proof

The simplest way to express this intuition is by writing some invariants. Below, we let max be the maximum ID and imax be such that myid_imax = max.

Max-spread invariant

There is a process i such that maxid_j = max for all j in [imax,i], maxid_j < max for all j not in [imax,i], and maxid does not appear in channel[j].queue for j not in [imax,i]. Furthermore, at least one of the following holds:

unsent_i = true.
channel[i].queue contains max.
status_imax <> waiting.

Eaten-message invariant

If j is in the range [imax,i) then maxid_j <> i and channel[j].queue does not contain i.

It is not hard to see that both invariants hold in the initial state (for the max-spread invariant we are looking at case (1) with i=imax). To show that both invariants continue to hold requires a painful and exhaustive case analysis that we will omit here.

That nobody but imax can declare itself leader follows from the eaten-message invariant together with the further observation that status_i = waiting until recv(i-1,i,id_i) occurs, which is forbidden by the invariant when i <> imax.

3.1.2. Termination

To show termination, observe that if case (1) of the max-spread invariant holds for some i, eventually i must issue send(i,i+1,max). At this point case (2) holds, and eventually channel[i] must issue recv(i,i+1,max) (possibly after delivering some finite number of messages that were clogging the queue when send(i,i+1,max) occurred). This leads either back to case (1) with i increased by 1, or to case (3), depending on whether max was delivered to some i+1 <> imax or to imax. If case (3) occurs, then we must start with status_imax = winning and it's simply a matter of time before leader(imax) is issued.

3.1.3. Time analysis

For a time analysis, we can repeat the termination argument with explicit delay bounds. The time between case (1) for i and case (2) for i is at most l, where l is the bound on the time between enabling a send action and executing it. The time between case (2) for i and case (1) for i+1 is at most nd, where d is the maximum channel delay, since there are at most N messages clogging up the channel[i] queue (proving this rigorously requires yet another invariant that each id appears either in a single message or as maxid of a single process with unsent = true). Altogether we must wait at most nl + N²d time in the worst case.

However, this analysis gives up too much in assuming that every channel will be full of junk. In fact, we can argue that send(i+r,i+r+1,id_i) occurs no later than time r(l+d)+l (if it occurs at all) and recv(i+r,i+r+1,id_i) occurs no later than time (r+1)(l+d). The proof is by induction on r: for r = 0, send(i,i,id_i) occurs no later than time l, since l is the upper bound on i's task, and if send(i,i,id_i) occurs it must be the first message sent by i, so recv(i,i+1,id_i) occurs no later than time l+d. For larger r, we have that recv(i+r-1,i+r,id_i) occurs no later than time r(l+d), so the next send from i+r either discards id_i or sends id_i at time no later than r(l+d)+l. Similarly once send(i+r,i+r+1,id_i) occurs, we have that any earlier value is delivered by time r(l+d), so id_i if still present at this time must be the first in the queue, and so is delivered no later than r(l+d)+l+d = (r+1)(l+d).

3.1.4. Message complexity

Message complexity is similar to LCR, since we can construct an execution that is effectively synchronous and thus works identically to LCR. This gives a worst-case message complexity of O(N²). We can do better.

3.2. Peterson's algorithm for the unidirectional ring

See LynchBook §15.1.3. This gets O(N lg N) message complexity in all executions.

The basic idea (2-way communication version): Start with N candidate leaders. In each of at most lg N asynchronous phases, each candidate probes its nearest neighbors to the left and right; if its ID is larger than the IDs of both neighbors, it survives to the next phase. Non-candidates act as relays passing messages between candidates. As in Hirschberg and Sinclair, the probing operations in each phase take O(N) messages, and at least half of the candidates drop out in each phase. The last surviving candidate wins when it finds that it's its own neighbor.

To make this work in a 1-way ring, we have to simulate 2-way communication by moving the candidates clockwise around the ring to catch up with their unsendable counterclockwise messages. Peterson's algorithm does this with a two-hop approach that is inspired by the 2-way case above; in each phase k, a candidate effectively moves two positions to the right, allowing it to look at the ids of three phase-k candidates before deciding to continue in phase k+1 or not. Here is a very high-level description; it assumes that we can buffer and ignore incoming messages from the later phases until we get to the right phase, and that we can execute sends immediately upon receiving messages. Doing this formally in terms of I/O automata means that we have to build explicit internal buffers into our processes, which we can easily do but won't do here (see LynchBook pp 483-484 to see how to do this the right way.)

Candidate algorithm:

    phase := 0
    current := myid

    while true do
        send probe(phase, current)
        wait for probe(phase, someid)
        uid2 := someid
        send probe(phase, someid)
        wait for probe(phase, someid)
        uid3 := someid

        if uid2 = current:
            I am the leader!
        else if uid2 > current and uid2 > uid3:
            current := uid2
            phase := phase + 1
        else
            switch to relay algorithm

Relay algorithm:

    upon receiving probe(somephase, someid):
        send probe(somephase, someid)

Note: the phase arguments in the probe messages are useless if one has FIFO channels, which is why LynchBook doesn't use them. Note also that the algorithm does not elect the process with the highest ID, but the process that is incubating the sole surviving candidate in the last phase.

Proof of correctness is essentially the same as for the 2-way algorithm. For any pair of adjacent candidates, at most one of their current IDs survives to the next phase. So we get a sole survivor after lg N phases. Each process sends or relays at most 2 messages per phases, so we get at most 2 N lg N total messages.

3.3. A simple randomized O(N lg N)-message algorithm

Run LCR where each id is constructed by prepending a long random bit-string to the real id. This gives uniqueness (since the real id's act as tie-breakers) and something very close to a random permutation on the constructed id's. When we have unique random id's, a simple argument shows that the i-th largest id only propagates an expected N/i hops, giving a total of O(N H_N) = O(N log N) hops. Unique random id's occur with high probability provided the range of the random sequence is >> N².

The downside of this algorithm compared to e.g. Peterson's is that knowledge of N is required to pick random id's from a large enough range. It also has higher bit complexity since Peterson's algorithm is sending only IDs (in the official version) without any random padding.

3.4. Lower bound on message complexity

Short version: reduce to the synchronous case and get Omega(N log N). Longer version: avoid Ramsey theory by exploiting asynchrony, which gives the adversary substantially more power. See LynchBook §15.1.4 if you are interested—I don't think we will do this one in class.

CategoryDistributedComputingNotes