
Note: You are looking at a static copy of the former PineWiki site, used for class notes by James Aspnes from 2003 to 2012. Many mathematical formulas are broken, and there are likely to be other bugs as well. These will most likely not be fixed. You may be able to find more up-to-date versions of some of these notes at http://www.cs.yale.edu/homes/aspnes/#classes.

1. Randomized merging

Suppose we start with n piles of 1 object each, and at each step we pick two remaining piles uniformly at random, and merge them together at a cost equal to the size of the first of the two piles chosen. (For example, we may need to change a pointer in every element of the first pile to point to the second pile.)

  1. What is the exact expected cost of all merges if we stop when there is only one pile left? Hint: compute E[S] = ∑ E[X_m], where X_m is the cost of the merge that starts with m piles.

  2. Use moment generating functions to get the best reasonable-looking bound you can on the upper tail of the total cost S. Ideally, your bound should look like Pr[S ≥ (1+δ)(something)] ≤ exp(something).

Note that you can't necessarily use Chernoff bounds directly, since the X_m aren't likely to be independent. Instead, you may find it helpful to try to compute a worst-case upper bound on E[exp(αX_m) | X_n, ..., X_{m+1}]. A useful fact here is that if ∑ x_i = c, and you want to maximize ∑ f(x_i) subject to x_i ≥ a for all i, then when f is convex the maximum value is obtained by making all but one of the x_i equal to a, and when f is concave the maximum value is obtained by making all of the x_i equal (each to c divided by the number of variables).
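As a quick numerical illustration of this maximization fact (a sketch, not part of the original notes; the particular values of c, a, and the number of variables are arbitrary), compare a balanced allocation against an extreme one for the convex function e^x:

    import math

    # Maximize sum of f(x_i) subject to sum(x_i) = c and x_i >= a (illustration only).
    c, a, k = 20.0, 1.0, 5                       # total, lower bound, number of variables
    f = math.exp                                 # a convex function

    balanced = [c / k] * k                       # all variables equal
    extreme = [a] * (k - 1) + [c - a * (k - 1)]  # all but one variable at the lower bound

    print(sum(f(x) for x in balanced))           # smaller total for convex f
    print(sum(f(x) for x in extreme))            # larger total for convex f

Swapping in a concave function (math.log, say) reverses the comparison, matching the second half of the fact.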

1.1. Solution

  1. Let X_m be the cost of the merge when there are m piles. Then E[X_m] = E[N], where N is the size of the first of the two randomly chosen piles. Calculate E[N] = (1/m) ∑ n_i = n/m, giving E[X_m] = n/m exactly. Now sum from m=2..n to get n(H_n - 1). (A quick simulation sketch checking this formula appears after this solution.)

  2. Let S = ∑ X_m. We want to compute an upper bound on E[exp(αS)]. As observed above, we do not expect the X_m to be independent, as the creation of a big pile early may allow the creation of still larger piles later. But if we assume the worst possible arrangement of the piles at each stage, we should be able to get an upper bound on E[exp(αX_m) | X_n, ..., X_{m+1}].

    To do so, we need to maximize E[exp(αN)] given that ∑ n_i = n. Using the fact that exp(αx) is a convex function of x for α > 0, the maximum is obtained when one of the n_i (n_1, say) is as large as possible and the rest are as small as possible. This gives E[exp(αX_m) | X_n, ..., X_{m+1}] ≤ (1/m)exp(α(n-m+1)) + (1-1/m)exp(α) = exp(α)(1 + (1/m)(exp(α(n-m)) - 1)) ≤ exp(α)exp((1/m)(exp(α(n-m)) - 1)) < exp(α + (1/m)exp(αn)). Now take the product over m=2..n to get E[exp(αS)] < exp((n-1)α + (H_n - 1)exp(αn)) < exp(αn + exp(αn) ln n).

    This gives Pr[S ≥ x] ≤ exp(αn + exp(αn) ln n - αx) = exp(α(n-x) + exp(αn) ln n). The right-hand side is minimized by minimizing the exponent α(n-x) + exp(αn) ln n; differentiating with respect to α gives (n-x) + n exp(αn) ln n, and setting this to zero gives α = (1/n) ln((x-n)/(n ln n)). We can plug this value of α back into the original bound to get

    \begin{align*}
\Pr[S \ge x]
&\le 
\exp\left(
  \frac{n-x}{n} \ln \frac{x-n}{n \ln n}
  + 
  \ln n \exp\left(
    \ln \frac{x-n}{n \ln n}
  \right)
\right)
\\
&=
\exp\left(
  \frac{n-x}{n} \ln \frac{x-n}{n \ln n}
  + \frac{x-n}{n}
\right)
\\
&=
\exp\left(
  \frac{x-n}{n}
  \left(
    1 - \ln \frac{x-n}{n \ln n}
  \right)
\right)
\end{align*}

    This is still a pretty horrifying expression. We can simplify it somewhat by writing x as (1+δ) n ln n; for n ≥ 3 this makes (x-n)/n at least δ ln n and (x-n)/(n ln n) at least δ. The exponent above is (x-n)/n times (1 - ln((x-n)/(n ln n))); when δ ≥ e the second factor is at most 1 - ln δ ≤ 0, so the exponent is at most δ ln n (1 - ln δ). This gives Pr[S ≥ (1+δ) n ln n] ≤ exp(-δ (ln δ - 1) ln n), which is nontrivial once δ exceeds e.
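Here is a small simulation sketch (not part of the original notes; n, the number of trials, and the helper name total_merge_cost are arbitrary choices) that runs the merging process, compares the empirical mean of S with the exact value n(H_n - 1) from part 1, and reports the empirical frequency of the tail event from part 2 with δ = 1:

    import math
    import random

    # Simulation sketch (assumed setup): random merging where the cost of each merge
    # is the size of the first of the two piles chosen.
    def total_merge_cost(n):
        piles = [1] * n
        cost = 0
        while len(piles) > 1:
            i, j = random.sample(range(len(piles)), 2)  # ordered pair of distinct piles
            cost += piles[i]        # pay the size of the first pile chosen
            piles[j] += piles[i]    # merge the first pile into the second
            piles.pop(i)
        return cost

    n, trials = 100, 10000
    samples = [total_merge_cost(n) for _ in range(trials)]
    h_n = sum(1.0 / k for k in range(1, n + 1))
    threshold = 2 * n * math.log(n)   # the tail event S >= (1+δ) n ln n with δ = 1
    print("empirical mean of S:", sum(samples) / trials)
    print("n (H_n - 1):        ", n * (h_n - 1))
    print("fraction of runs with S >= 2 n ln n:", sum(s >= threshold for s in samples) / trials)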

2. A Collatz-like process

The Collatz conjecture says that if you start with any positive integer x_0, and let x_{t+1} = (3x_t+1)/2 if x_t is odd and x_t/2 if x_t is even, then you will eventually get to 1. Since the Collatz conjecture seems to be hard to prove, let's simplify the game by sending x either to 3x/2 (without the extra 1) or to x/2, with the choice made by a fair coin-flip. Note that in this simplified game, x_t does not have to be an integer.

  1. What is the expected value of x_t (in the simplified game) as a function of t and x_0?

  2. Show that there is a constant c such that ln x_t - ct is a martingale.

  3. Assuming that x_0 = 1, use the Azuma-Hoeffding inequality to get a bound on the probability that x_t ≥ N for any fixed N > 0.

2.1. Solution

  1. Compute E[x_{t+1} | x_t] = (1/2)((3/2)x_t + (1/2)x_t) = x_t. So {x_t} is a martingale and E[x_t] = x_0.

  2. We'll prove this in general for any process of the form Z_{t+1} = Z_t M_{t+1}, where the multipliers M_{t+1} are independent of the past and identically distributed. Observe that ln Z_{t+1} = ln Z_t + ln M_{t+1}, so E[ln Z_{t+1} | ℱ_t] = ln Z_t + E[ln M_{t+1} | ℱ_t]. This shows that {ln Z_t} is a martingale plus a predictable drift; to make it a martingale, subtract off the predictable process ∑_{s<t} E[ln M_{s+1} | ℱ_s] = t E[ln M_1]. For our process this gives c = (ln(3/2) + ln(1/2))/2 = ln(3/4)/2 = ln(√3/2) ≅ -0.1438, so ln x_t - t ln(√3/2) is a martingale.

  3. Again it's probably easiest to do this first in general and then plug in the specific values for our process. Let Z_t, M_t be as above, and consider the martingale X_t = ln Z_t - t E[ln M_1]. If M_t always lies in the range [a,b], each increment lies between ln a - E[ln M_1] and ln b - E[ln M_1]. For the specific case a = 1/2, b = 3/2, the upper value is ln(3/2) - ln(√3/2) = ln √3 and the lower value is ln(1/2) - ln(√3/2) = ln(1/√3) = -ln √3, so each increment is bounded in absolute value by ln √3. Azuma-Hoeffding then gives Pr[(ln x_t - t ln(√3/2)) - ln x_0 ≥ α] ≤ exp(-α^2/(2t (ln √3)^2)). Under our assumption that x_0 = 1, the left-hand side simplifies to Pr[ln x_t - t ln(√3/2) ≥ α], or Pr[x_t ≥ N] where ln N = α + t ln(√3/2), which gives α = ln N - t ln(√3/2). Plugging this value into the right-hand side gives Pr[x_t ≥ N] ≤ exp(-(ln N - t ln(√3/2))^2/(2t (ln √3)^2)).

This is not a very nice-looking bound, but it does have the very interesting property that for large t the right-hand side is exp(-Θ(t)) (since ln(√3/2) < 0). This means that even though the expected value of x_t is still 1, the probability that it exceeds any fixed bound N > 0 (including values of N much less than 1) goes to zero exponentially quickly. Generalization to small-stakes gambling games is left as an exercise for the reader.
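A small simulation sketch (not in the original notes; the cutoff N and the trial counts are arbitrary) makes the exponential decay visible. Note that the empirical mean is a poor estimate of E[x_t] = 1 for large t, since the mean is dominated by very rare runs with many up-moves.

    import random

    # Simulation sketch (assumed setup): x_{t+1} is (3/2)x_t or (1/2)x_t with a fair coin, x_0 = 1.
    # Estimates Pr[x_t >= N], which decays in t since E[ln M] = ln(sqrt(3)/2) < 0.
    def run(t):
        x = 1.0
        for _ in range(t):
            x *= 1.5 if random.random() < 0.5 else 0.5
        return x

    trials, N = 20000, 0.01
    for t in (10, 25, 50, 100, 200):
        tail = sum(run(t) >= N for _ in range(trials)) / trials
        print(f"t = {t}: estimated Pr[x_t >= {N}] = {tail:.4f}")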

3. Random back-off

A collection of n processes wish to transmit information across a shared broadcast channel, where a message can be transmitted successfully in a given time step only if exactly one process attempts to broadcast. If more than one process attempts to transmit, we get a collision, which is detected by the processes' hardware. Otherwise all processes either see the message (if one process transmits) or the absence of a message (if no process transmits).

Suppose that at time 0, all n processes are active (and thus attempt to transmit). After any step where there is a collision, each active process remains active with some probability p, where 0 < p < 1. After a successful transmission (one active process) or silence (no active processes), all processes become active again.

  1. Show that some process eventually successfully transmits a message, with probability 1, and that the expected time E[T] until this happens is finite. (You do not need to calculate E[T].)
  2. Show that there is a constant c (which may depend on n and p) such that Pr[T ≥ α E[T]] ≤ exp(-cα).
  3. Let A_t be the event that a message is transmitted at time t. Show that for fixed n and p, Pr[A_t] converges to a positive constant as t → ∞.

3.1. Solution

The basic idea: the process described is an ergodic Markov chain (with state space 2^[n], the possible sets of active processes), so we can use general properties of Markov chains to prove results about it. Irreducibility is immediate from the fact that every state is reachable with nonzero probability from the all-active state, while the all-active state is reachable in at most two steps from every state via the all-empty state. Aperiodicity follows from observing that every state with at least two active processes satisfies p_{ii} > 0.

We now proceed to the specific claims:

  1. Immediate from the fact that all states in an irreducible finite chain are non-null recurrent. More specifically, let Y be the set of states in which exactly one process is active, and let q be the minimum, over states x with at least two active processes, of the probability that the next state is in Y; q > 0, since from any such state there is positive probability that exactly one process remains active. Let T be the worst-case expected time to reach Y starting from an arbitrary initial state, and let S be the set of all processes. If the initial state is ∅, the next state is S, so T ≤ 2 + (1-q)T. If the initial state is nonempty (and not already in Y), we have T ≤ 1 + (1-q)T. In either case we get T ≤ 2/q.

  2. This is true for hitting times in any finite Markov chain. By Markov's inequality, Pr[T ≥ e E[T]] ≤ exp(-1); and if we fail, the Markov property lets us try again from whatever state we have reached, whose expected time to a success is also bounded (by 2/q, as above). Chaining these attempts over successive blocks of steps gives Pr[T ≥ α E[T]] ≤ exp(-cα) for a constant c depending on n and p.
  3. Immediate consequence of the ergodic theorem: let A be the set of states with exactly one active process; then lim_{t→∞} Pr[A_t] = lim_{t→∞} Pr[X_t ∈ A] = π(A), which is positive because every state of an irreducible finite chain has positive stationary probability.
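For intuition, here is a simulation sketch of the protocol (not part of the original notes; the particular n, p, horizon, and trial count are arbitrary choices) that estimates Pr[A_t] for several values of t, illustrating how it settles toward a positive constant:

    import random

    # Simulation sketch (assumed setup): the state is the number of active processes.
    # A_t is the event that exactly one process is active (a successful transmission) at step t.
    def simulate(n, p, horizon):
        active = n                      # all processes start active at time 0
        indicators = []
        for _ in range(horizon):
            indicators.append(1 if active == 1 else 0)
            if active >= 2:             # collision: each active process stays with probability p
                active = sum(random.random() < p for _ in range(active))
            else:                       # success or silence: all processes become active again
                active = n
        return indicators

    n, p, horizon, trials = 10, 0.5, 60, 10000
    counts = [0] * horizon
    for _ in range(trials):
        for t, s in enumerate(simulate(n, p, horizon)):
            counts[t] += s
    for t in (1, 5, 20, 40, 59):
        print(f"estimated Pr[A_{t}] = {counts[t] / trials:.3f}")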

