Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

For more up-to-date notes see http://www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf.

Algorithms for building a BreadthFirstSearch tree in a network. All assume that there is a designated initiator node that starts the algorithm. At the end of the algorithm, each node except the initiator has a parent pointer and every node has a list of children. These are consistent and define a BFS tree, i.e., nodes at distance k from the initiator appear at level k of the tree.

1. Synchronous algorithms

In a synchronous network, Flooding solves BFS; see AttiyaWelch, Lemma 2.8, page 21. (Or see some old notes on SynchronousBreadthFirstSearch based on LynchBook §4.2.)
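
To make the round structure concrete, here is a minimal round-by-round sketch of flooding-as-BFS in Python (my own illustration, not the AttiyaWelch code): a global loop stands in for the synchronous rounds, and the first message a node receives fixes its parent.

    # Flooding in a synchronous network: a node at distance k is first reached
    # in round k, so the parent pointers form a BFS tree.
    def synchronous_bfs(graph, initiator):
        parent = {initiator: None}
        frontier = [initiator]
        while frontier:
            next_frontier = []
            for u in frontier:              # every frontier node floods this round
                for v in graph[u]:
                    if v not in parent:     # first message received fixes the parent
                        parent[v] = u
                        next_frontier.append(v)
            frontier = next_frontier
        return parent

    g = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    print(synchronous_bfs(g, 0))            # {0: None, 1: 0, 2: 0, 3: 1}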

2. Asynchronous algorithms

Here the complication is that we can no longer rely on synchronous communication to reach all nodes at distance d at the same time. So instead we need to keep track of distances explicitly, or possibly enforce some approximation to synchrony in the algorithm. (A general version of this last approach is to apply a synchronizer to one of the synchronous algorithms; see Synchronizers.)

To keep things simple, we'll drop the requirement that a parent learn the IDs of its children, since this can be tacked on as a separate notification protocol, in which each child just sends one message to its parent once it figures out who its parent is.

2.1. A simple algorithm using explicit distances

This is the AsynchBFS automaton from LynchBook §15.4. It's a very simple algorithm, closely related to Dijkstra's algorithm for shortest paths, but there is otherwise no particular reason to use it; it is dominated by the O(D) time and O(DE) message complexity synchronizer-based algorithm described later.

The idea is to run an AsynchronousBroadcast with distances attached. Each node sets its distance to 1 plus the smallest distance sent by its neighbors and its parent to the neighbor supplying that smallest distance. A node notifies all its neighbors of its new distance whenever its distance changes.

In pseudocode:

States: distance, initially 0 for the initiator and ∞ for all other nodes; internal buffers for outgoing messages

Initiator initialization code:
    send distance to all neighbors

All processes:
    upon receiving d from p:
        if d+1 < distance:
            distance := d+1
            parent := p
            send distance to all neighbors

(See LynchBook for a precondition-effect description, which also includes code for buffering outgoing messages.)
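
For concreteness, here is a small event-driven sketch of this algorithm in Python (my own illustration, not Lynch's code): a single FIFO queue stands in for the asynchronous channels, so the run below is just one possible schedule.

    # AsynchBFS as an event-driven simulation: relax on each received distance
    # and re-send whenever the local distance improves.
    from collections import deque

    def asynch_bfs(graph, initiator):
        dist = {v: float('inf') for v in graph}
        dist[initiator] = 0
        parent = {}
        queue = deque((initiator, v, 0) for v in graph[initiator])  # (sender, dest, d)
        messages = 0
        while queue:
            p, q, d = queue.popleft()
            messages += 1
            if d + 1 < dist[q]:             # the relaxation step from the pseudocode
                dist[q] = d + 1
                parent[q] = p
                for r in graph[q]:
                    queue.append((q, r, dist[q]))
        return dist, parent, messages

    g = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
    print(asynch_bfs(g, 0))   # distances {0: 0, 1: 1, 2: 1, 3: 2}

With FIFO delivery this behaves much like synchronous flooding; the bad message bound comes from adversarial schedules that deliver long-path distances first.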

The claim is that after at most O(VE) messages and O(D) time, all distance values are equal to the length of the shortest path from the initiator to the appropriate node. The proof is by showing the following

Invariant

distance_p is always the length of some path from initiator to p, and any message sent by p is also the length of some path from initiator to p.

Proof
The second part follows from the first; any message sent equals p's current value of distance. For the first part, suppose p updates its distance; then it sets it to 1 plus the length of some path from the initiator to some neighbor p', which is the length of that same path extended by adding the p'p edge.

We also need a liveness argument that says that distance_p = d(initiator, p) no later than time d(initiator, p). Note that we can't detect this condition occurring without a lot of additional work.

In LynchBook, there's an extra |V| term in the time complexity that comes from message pile-ups, since the model used there only allows one incoming message to be processed per time unit.1 The trick to arranging for this to happen often is to build a graph where node 1 is connected to nodes 2 and 3, node 2 to 3 and 4, node 3 to 4 and 5, etc. This allows us to quickly generate many paths of distinct lengths from node 1 to node k, which produce on the order of k outgoing messages from node k. (It may be that a more clever analysis can avoid this blowup, by showing that it only happens in a few places.)
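
A quick way to see where these pile-ups come from is to count the distinct lengths of forward paths in this graph; the following small dynamic program (my own, for illustration) does so:

    # In the graph where node i is joined to i+1 and i+2, node k can be reached
    # from node 1 by forward paths of roughly k/2 distinct lengths, each of which
    # can arrive as a separate distance update under an adversarial schedule.
    def forward_path_lengths(k):
        lengths = {1: {0}}
        for v in range(2, k + 1):
            lengths[v] = {l + 1 for u in (v - 1, v - 2) if u >= 1 for l in lengths[u]}
        return lengths[k]

    print(sorted(forward_path_lengths(10)))   # [5, 6, 7, 8, 9]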

2.2. Layering

Here we run up to |V| instances of the simple algorithm with a distance bound on each: instead of sending out just 0, the initiator sends out (0, bound), where bound is initially 1 and increases at each phase. A process only sends out its improved distance if it is less than the bound. When the bound increases, notification of the increase is distributed only through the partial BFS tree constructed so far. With some effort, it is possible to prove that in a bidirectional network this approach guarantees that each edge is only probed once with a new distance (since distance-1 nodes are recruited before distance-2 nodes, and so on), and the bound-update and acknowledgement messages contribute at most |V| messages per phase. So we get O(E + V*diam) total messages. But the time complexity is bad: O(diam^2) in the worst case.
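
The following is a rough centralized sketch (my own bookkeeping, not a message-passing implementation) of where the O(E + V*diam) message count comes from: each phase pays O(|tree|) messages for the bound update and acknowledgements, while the recruiting probes add up to O(E) over all phases.

    # Phase structure of the layered algorithm: in phase `bound`, the current
    # frontier recruits nodes at distance `bound`, and the bound update plus
    # acknowledgements travel over the tree built so far.
    def layered_bfs_phases(graph, initiator):
        level = {initiator: 0}
        frontier = [initiator]
        probe_msgs = tree_msgs = 0
        bound = 1
        while frontier:
            tree_msgs += 2 * (len(level) - 1)   # bound update down + acks up the tree
            next_frontier = []
            for u in frontier:
                for v in graph[u]:
                    probe_msgs += 1             # each edge is probed at most twice overall
                    if v not in level:
                        level[v] = bound
                        next_frontier.append(v)
            frontier, bound = next_frontier, bound + 1
        return level, probe_msgs, tree_msgs

    g = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
    print(layered_bfs_phases(g, 0))   # levels plus the two message counts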

2.3. An O(diam) time algorithm for a bidirectional network

The reason the layered algorithm takes so long is that at each phase we have to phone all the way back up the tree to the initiator to get permission to go on to the next phase. We need to do this to make sure that a node is only recruited into the tree once: otherwise we can get pile-ups on the channels as in the simple algorithm. But we don't necessarily need to do this globally. Instead, we'll require each node at distance d to delay sending out a recruiting message until it has confirmed that none of its neighbors will be sending it a smaller distance. We do this by having two classes of messages (in an earlier version of these notes, these messages were called distance(d) and not-distance(d); the more self-explanatory exactly and more-than terminology is taken from the Boulinier et al. paper cited in the text):

exactly(d)
    "I know that my distance is d."
more-than(d)
    "I know that my distance is > d."

The rules for sending these messages for a non-initiator are:

  1. I can send exactly(d) as soon as I have received exactly(d-1) from at least one neighbor and more-than(d-2) from all neighbors.
  2. I can send more-than(d) if d = 0 or as soon as I have received more-than(d-1) from all neighbors.

The initiator sends exactly(0) to all neighbors at the start of the protocol (these are the only messages the initiator sends).

My distance will be the unique distance that I am allowed to send in an exactly(d) message. Note that this algorithm terminates in the sense that every node learns its distance at some finite time.
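
Here is a small event-driven simulation of these rules in Python (the code and names are my own, not from the notes or from the Boulinier et al. paper); a single FIFO queue plays the role of the asynchronous channels, which is just one legal schedule.

    # Simulation of the exactly/more-than rules.  Non-initiators send
    # more-than(0) immediately; everything else is driven by received messages.
    from collections import deque, defaultdict

    def exactly_morethan_bfs(graph, initiator):
        n = len(graph)
        exact_from = defaultdict(set)   # (node, d) -> neighbors that sent exactly(d)
        more_from = defaultdict(set)    # (node, d) -> neighbors that sent more-than(d)
        sent_more = defaultdict(set)    # node -> values d already sent as more-than(d)
        dist, parent = {initiator: 0}, {}
        queue = deque()

        def send(u, kind, d):           # u sends (kind, d) to all of its neighbors
            for v in graph[u]:
                queue.append((u, v, kind, d))

        send(initiator, 'exactly', 0)   # the initiator's only messages
        for u in graph:
            if u != initiator:
                sent_more[u].add(0)
                send(u, 'more-than', 0)

        while queue:
            u, v, kind, d = queue.popleft()
            if v == initiator:
                continue                # the initiator ignores incoming messages
            (exact_from if kind == 'exactly' else more_from)[(v, d)].add(u)
            for k in range(1, n):
                # Rule 1: exactly(k) after exactly(k-1) from one neighbor and
                # more-than(k-2) from all neighbors (vacuous for k < 2).
                if v not in dist and exact_from[(v, k - 1)] and \
                   (k < 2 or more_from[(v, k - 2)] == set(graph[v])):
                    dist[v] = k
                    parent[v] = next(iter(exact_from[(v, k - 1)]))
                    send(v, 'exactly', k)
                # Rule 2: more-than(k) after more-than(k-1) from all neighbors.
                if k not in sent_more[v] and more_from[(v, k - 1)] == set(graph[v]):
                    sent_more[v].add(k)
                    send(v, 'more-than', k)
        return dist, parent

    ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
    print(exactly_morethan_bfs(ring, 0))   # node 0 at distance 0, nodes 1 and 4 at 1, nodes 2 and 3 at 2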

Comment regarding Synchronizers: The algorithm essentially corresponds to building the alpha synchronizer into the synchronous BFS algorithm, just as the layered algorithm builds in the beta synchronizer. See AttiyaWelch §11.3.2 for a discussion of BFS using synchronizers. The original approach of applying synchronizers to get BFS is due to Awerbuch (JACM, 1985).

Proof of correctness

Under the assumption that local computation takes zero time and message delivery takes at most 1 time unit, we'll show that if distance(initiator, p) = d, then (a) p sends more-than(d') for any d' < d by time d', (b) p sends exactly(d) by time d, (c) p never sends more-than(d') for any d' ≥ d, and (d) p never sends exactly(d') for any d' > d. For parts (c) and (d) we use induction on distance; for (a) and (b), induction on time. This is not terribly surprising: (c) and (d) are safety properties, so we don't need to talk about time. But (a) and (b) are liveness properties, so time comes in.

  • Let's start with (c) and (d). The base case is that the initiator (distance 0) never sends anything but exactly(0). Now consider a node p at distance d > 0, and let p' be a neighbor of p at distance d-1. By the induction hypothesis, p' never sends more-than(d'-1) for any d' ≥ d (since then d'-1 ≥ d-1). Since a precondition for p sending more-than(d') is receiving more-than(d'-1) from p', p never sends more-than(d') for any d' ≥ d, so (c) holds for p. Similarly, (d) holds for p: to send exactly(d') with d' > d, p must first receive more-than(d'-2) from p', but d' > d implies d'-2 ≥ d-1, so more-than(d'-2) is never sent by p'.

  • Now for (a) and (b). The base case is that the initiator sends exactly(0) to all neighbors at time 0, giving (b), and there is no more-than(d') with d' < 0 for it to send, so (a) holds vacuously; and any non-initiator sends more-than(0) immediately. At time t+1, we have that (a) more-than(t) was sent by any node at distance t+1 or greater by time t and (b) exactly(t) was sent by any node at distance t by time t; so any node at distance t+2 or greater sends more-than(t+1) no later than time t+1 (because it has already received more-than(t) from all its neighbors) and any node at distance t+1 sends exactly(t+1) no later than time t+1 (because it has received all the preconditions for doing so by this time).

Message complexity

A node at distance d sends more-than(d') for all 0 ≤ d' < d, plus exactly(d), and no other messages; each of these is sent to all of its neighbors. So we have message complexity bounded by O(|E|*diam) in the worst case.

Time complexity
It's immediate from (a) and (b) that all messages that are sent are sent by time diam, and indeed that a node learns its distance by time distance(initiator, node). So we have optimal time complexity.

Comment: Our time proof assumes that messages don't pile up on edges, or that such pile-ups don't affect delivery time (this is the default assumption used in AttiyaWelch). A more sophisticated proof could remove this assumption.

One downside of this algorithm is that it has to be started simultaneously at all nodes. Alternatively, we could trigger "time 0" at each node by a broadcast from the initiator, using the usual asynchronous broadcast algorithm; this would give us a BFS tree in O(|E|*diam) messages (since the O(|E|) messages of the broadcast disappear into the constant) and 2*diam time. The analysis of time goes through as before, except that the starting time 0 becomes the time at which the last node in the system is woken up by the broadcast. Further optimizations are possible; see, for example, this improved version due to Boulinier et al.


CategoryDistributedComputingNotes

  1. The model in AttiyaWelch doesn't have this restriction.

