(See http://www.cs.yale.edu/homes/aspnes/classes/469/notes.pdf for a more up-to-date version of these notes.)
Let us consider probabilistic recurrences of the form T(n) = 1 + T(n-Xₙ), where Xₙ is a random variable with 0 < Xₙ ≤ n and T(0) = 0. We assume we can compute a lower bound on E[Xₙ] for each n, and we want to translate this lower bound into an upper bound on E[T(n)].
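To make the setup concrete, here is a minimal Python sketch (the names simulate_T and estimate_ET are ours, not part of the notes) that runs the recurrence directly and estimates E[T(n)] by Monte Carlo, given a sampler for Xₙ:

    import random

    def simulate_T(n, sample_X):
        """One run of T(n) = 1 + T(n - X_n) with T(0) = 0.

        sample_X(n) should return a realization of X_n with 0 <= X_n <= n;
        X_n = 0 is allowed (the theorem below permits it), so termination
        relies on Pr[X_n > 0] being bounded away from 0.
        """
        steps = 0
        while n > 0:
            x = sample_X(n)
            if not 0 <= x <= n:
                raise ValueError("sampler must return 0 <= X_n <= n")
            n -= x
            steps += 1
        return steps

    def estimate_ET(n, sample_X, trials=10000):
        """Monte Carlo estimate of E[T(n)]."""
        return sum(simulate_T(n, sample_X) for _ in range(trials)) / trials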
1. Examples
Hoare's QuickSelect algorithm for finding the k-th smallest element of an unsorted array works like QuickSort, except that after partitioning the array around a random pivot we throw away the part that doesn't contain our target and recurse only on the surviving piece. How many rounds of this must we do? Here E[Xₙ] is more complicated, since after splitting our array of size n into piles of size n' and n-n'-1, we have to pick one or the other (or possibly just the pivot alone) based on the value of k.
Suppose we start with n biased coins that each come up heads with probability p. In each round, we flip all the coins and throw away the ones that come up tails. How many rounds does it take to get rid of all of the coins? (This essentially tells us how tall a skip list can get.) Here Xₙ is the number of coins that come up tails, so E[Xₙ] = (1-p)n.
In the coupon collector problem, we sample from 1..n with replacement until we see every value at least once. We can model this by a recurrence in which T(k) is the time to get all the coupons given that there are k left that we haven't seen. Here Xₖ is 1 with probability k/n and 0 with probability (n-k)/n, giving E[Xₖ] = k/n.
Chutes and ladders without the chutes and ladders. We start at location n, and whenever it's our turn, we roll a fair six-sided die X and move to n-X unless this value is negative, in which case we stay put until the next turn. How many turns does it take to get to 0?
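For illustration, here are samplers for three of these processes (a sketch reusing the hypothetical simulate_T/estimate_ET helpers above; the parameter defaults are ours):

    import random

    def coins_X(n, p=0.5):
        """Coin tossing: each of n coins survives (heads) with probability p;
        X_n = number of tails discarded, so E[X_n] = (1-p)n."""
        return sum(random.random() >= p for _ in range(n))

    def coupon_X(k, n=100):
        """Coupon collector with k coupons still unseen out of n:
        X_k = 1 with probability k/n, else 0, so E[X_k] = k/n."""
        return 1 if random.random() < k / n else 0

    def die_X(n):
        """Dice walk: roll a fair d6, but stay put (X = 0) if the roll
        would take us below 0."""
        roll = random.randint(1, 6)
        return roll if roll <= n else 0

    # e.g. estimate_ET(100, coupon_X) should come out near 100 * H_100 ~ 519.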
2. The Karp-Upfal-Wigderson bound
Let a be a constant, let T(n) = 1 + T(n-X), where X is an integer-valued random variable satisfying 0 ≤ X ≤ n-a, and let T(a) = 0. Suppose that E[X] ≥ μ(n) for all n > a, where μ is a positive non-decreasing function of n. Then

E[T(n)] ≤ ∫_a^n dt/μ(t).
Proof: This is essentially the same proof as in MotwaniRaghavan, but we add some extra detail to allow for the possibility that X = 0. Let p = Pr[X = 0], q = 1-p = Pr[X ≠ 0]. Note we have q > 0 because otherwise E[X] = 0 < μ(n). Then we have

E[T(n)] = 1 + E[T(n-X)] = 1 + p·E[T(n)] + q·E[T(n-X) | X ≠ 0].
Now we have E[T(n)] on both sides, which we don't like very much. So we collect it on the left-hand side:

(1-p)·E[T(n)] = 1 + q·E[T(n-X) | X ≠ 0],
divide both sides by q = 1-p, and apply the induction hypothesis:

E[T(n)] = 1/q + E[T(n-X) | X ≠ 0]
  ≤ 1/q + E[∫_a^(n-X) dt/μ(t) | X ≠ 0]
  = 1/q + ∫_a^n dt/μ(t) - E[∫_(n-X)^n dt/μ(t) | X ≠ 0]
  ≤ 1/q + ∫_a^n dt/μ(t) - E[X/μ(n) | X ≠ 0]
  = 1/q + ∫_a^n dt/μ(t) - E[X | X ≠ 0]/μ(n).
The second-to-last step uses the fact that μ(t)≤μ(n) for t≤n.
It may seem like we don't know what E[X | X ≠ 0] is. But we know that X ≥ 0, so we have E[X] = p·E[X | X = 0] + q·E[X | X ≠ 0] = q·E[X | X ≠ 0]. So we can solve for E[X | X ≠ 0] = E[X]/q. So let's plug this in:

E[T(n)] ≤ 1/q + ∫_a^n dt/μ(t) - E[X]/(q·μ(n))
  ≤ 1/q + ∫_a^n dt/μ(t) - μ(n)/(q·μ(n))
  = ∫_a^n dt/μ(t).
This concludes the proof. Now we just need to find some applications.
2.1. Quickselect
In Quickselect, we pick a random pivot and split the original array of size n into three piles of size m (less than the pivot), 1 (the pivot itself), and n-m-1 (greater than the pivot). We then figure out which of the three piles contains the k-th smallest element (depending on whether k is less than, equal to, or greater than m+1) and recurse, stopping when we hit a pile with 1 element. It's easiest to analyze this by assuming that we recurse on the largest of the three piles, i.e., that our recurrence is T(n) = 1 + max(T(m), T(n-m-1)), where m is uniform in 0..n-1. The exact value of E[max(m, n-m-1)] is a little messy to compute (among other things, it depends on whether n is odd or even), but it's not hard to see that it's always less than (3/4)n, so the expected decrease in problem size is at least n/4. Letting μ(n) = n/4, we get

E[T(n)] ≤ ∫_1^n 4/t dt = 4 ln n.
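As a sanity check (our own sketch, reusing the hypothetical estimate_ET helper from above), the worst-pile recurrence can be simulated and compared against the 4 ln n bound:

    import math, random

    def quickselect_X(n):
        """Pessimistic Quickselect: split around a uniform pivot position m
        and recurse on the larger pile; return the decrease in size."""
        m = random.randrange(n)          # m is uniform in 0..n-1
        return n - max(m, n - m - 1)     # always at least 1

    # The simulator runs down to 0 rather than stopping at piles of size 1,
    # which costs at most one extra round:
    #   estimate_ET(1000, quickselect_X)   # typically in the low 20s
    #   4 * math.log(1000)                 # ~ 27.6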
2.2. Tossing coins
Here we have E[Xₙ] = (1-p)n. If we let μ(n) = (1-p)n and plug into the formula without thinking about it too much, we get

E[T(n)] ≤ ∫_0^n dt/((1-p)t) = (ln n - ln 0)/(1-p).
That ln 0 is trouble. We can fix it by making μ(n) = (1-p)⌈n⌉, so that μ stays bounded away from zero near 0, to get

E[T(n)] ≤ ∫_0^n dt/((1-p)⌈t⌉) = (1/(1-p)) ∑_{k=1}^n 1/k = Hₙ/(1-p) ≤ (ln n + 1)/(1-p).
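Numerically (a sketch; harmonic is our helper, and the commented line reuses the earlier hypothetical samplers), the bound is not far off for p = 1/2:

    def harmonic(n):
        """H_n = 1 + 1/2 + ... + 1/n."""
        return sum(1.0 / k for k in range(1, n + 1))

    p, n = 0.5, 1024
    print(harmonic(n) / (1 - p))   # bound ~ 15.0 rounds
    # estimate_ET(n, lambda m: coins_X(m, p))   # simulation gives ~ 11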
2.3. Coupon collector
Now that we know how to avoid dividing by zero, this is easy and fun. Let μ(x) = ⌈x⌉/n; then we have

E[T(n)] ≤ ∫_0^n n/⌈t⌉ dt = n ∑_{k=1}^n 1/k = n·Hₙ.
As it happens, this is the exact answer for this case. This will happen whenever X is always a 0-1 variable and we define μ(x) = E[Xₖ] for k = ⌈x⌉, which can be seen by spending far too much time thinking about the precise sources of error in the inequalities in the proof.
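A quick numerical check of the exactness claim (again our sketch; the commented line reuses the earlier hypothetical helpers):

    n = 50
    H_n = sum(1.0 / k for k in range(1, n + 1))
    print(n * H_n)   # ~ 224.96
    # estimate_ET(n, lambda k: coupon_X(k, n))   # simulation agrees, ~ 225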
2.4. Chutes and ladders
Let μ(n) be the expected drop from position n. We have to be a little bit careful about small n, but we can compute that in general μ(n) = (1/6) ∑_{k=1}^{min(n,6)} k = min(n,6)·(min(n,6)+1)/12. For fractional values x we will set μ(x) = μ(⌈x⌉) as before. Then we have

E[T(n)] ≤ ∫_0^n dt/μ(⌈t⌉) = ∑_{k=1}^n 1/μ(k).
We can summarize the values in the following table:

n   | μ(n) | 1/μ(n) | ∑_{k=1}^n 1/μ(k)
----|------|--------|------------------
1   | 1/6  | 6      | 6
2   | 1/2  | 2      | 8
3   | 1    | 1      | 9
4   | 5/3  | 3/5    | 48/5
5   | 5/2  | 2/5    | 10
≥6  | 7/2  | 2/7    | 10 + (2/7)(n-5) = (2/7)n + 60/7
This is a slight overestimate; for example, we can calculate by hand that the expected waiting time for n=2 is 6 and for n=3 is 20/3 = 6 2/3.
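The table is easy to reproduce exactly (a sketch using Python's fractions module; mu is our name for μ):

    from fractions import Fraction

    def mu(k):
        """Expected drop from position k: (1/6) * sum of the feasible rolls."""
        m = min(k, 6)
        return Fraction(m * (m + 1), 12)

    total = Fraction(0)
    for k in range(1, 9):
        total += 1 / mu(k)
        print(k, mu(k), 1 / mu(k), total)
    # Rows 1..5 match the table; each row with k >= 6 adds another 2/7.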
We can also consider the generalized version of the game where we start at n and drop by 1..n each turn as long as the drop wouldn't take us below 0. Now the expected drop from position k is k(k+1)/(2n), and so we can apply the formula to get

E[T(n)] ≤ ∑_{k=1}^n 2n/(k(k+1)).
The sum of 1/(k(k+1)) for k = 1..n happens to have a very nice value; it's exactly n/(n+1).¹ So in fact we can rewrite the bound as 2n·(n/(n+1)) = 2n²/(n+1).
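The footnote's identity is easy to confirm exactly (our check, using fractions):

    from fractions import Fraction

    n = 10
    s = sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))
    assert s == Fraction(n, n + 1)   # terms telescope: 1/(k(k+1)) = 1/k - 1/(k+1)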
3. High-probability bounds
So far we have only considered bounds on the expected value of T(n). Suppose we want to show that T(n) is in fact small with high probability, i.e. a statement of the form Pr[T(n) ≥ t] ≤ ε. There are two natural ways to do this: we can repeatedly apply Markov's inequality to the expectation bound, or we can attempt to analyze the recurrence in more detail. The first method tends to give weaker bounds but it's easier.
3.1. Converting expectation bounds to high-probability bounds
Given E[T(n)] ≤ m, Markov's inequality gives Pr[T(n) ≥ αm] ≤ α⁻¹. This does not give a very good bound on the probability; if we want to show Pr[T(n) ≥ t] ≤ n⁻ᶜ for some constant c (a typical high-probability bound), we need t ≥ nᶜm. But we can get a better bound if m bounds the expected time starting from any reachable state, as is the case for the class of problems we have been considering.
The idea is that if T(n) exceeds αm, we restart the analysis and argue that Pr[T(n) ≥ 2αm | T(n) ≥ αm] ≤ α⁻¹, from which it follows that Pr[T(n) ≥ 2αm] ≤ α⁻². In general, for any non-negative integer k, we have Pr[T(n) ≥ kαm] ≤ α⁻ᵏ. Now we just need to figure out how to pick α to minimize this quantity for fixed t.
Let t = kαm. Then k = t/(αm), and we seek to minimize α^(-t/(αm)). Taking the logarithm gives -(t/m)(ln α)/α. The t/m factor is irrelevant to the minimization problem, so we are left with minimizing -(ln α)/α. Taking the derivative with respect to α gives -α⁻² + α⁻² ln α; this is zero when ln α = 1, i.e., α = e. (This popular constant shows up often in problems like this.) So we get Pr[T(n) ≥ kem] ≤ e⁻ᵏ, or, letting k = ln(1/ε), Pr[T(n) ≥ em ln(1/ε)] ≤ ε.
So, for example, we can get an n⁻ᶜ bound on the probability of running too long by setting our time bound to em ln(nᶜ) = cem ln n = O(m log n). We can't hope to do better than O(m), so this bound is tight up to a log factor.
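In code form (whp_time_bound is our name for this conversion; a sketch):

    import math

    def whp_time_bound(m, n, c):
        """Time t such that Pr[T(n) >= t] <= n**(-c), assuming m bounds the
        expected remaining time from every reachable state:
        t = e * m * ln(n**c) = c * e * m * ln(n)."""
        return math.e * m * c * math.log(n)

    # Example: whp_time_bound(10, 10**6, 2) ~ 751 steps.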
3.2. Detailed analysis of the recurrence
As Lance Fortnow explains, getting rid of log factors is what theoretical computer science is all about. So we'd like to do better than an O(m log n) bound if we can. In some cases this is not too hard.
Suppose that for each n, T(n) = 1 + T(X), where E[X] ≤ αn for a fixed constant α < 1. Let X₀ = n, and let X₁, X₂, etc. be the sequence of sizes of the remaining problem at time 1, 2, etc. Then we have E[X₁] ≤ αn from our assumption. But we also have E[X₂] = E[E[X₂|X₁]] ≤ E[αX₁] = αE[X₁] ≤ α²n, and by induction we can show that E[Xₖ] ≤ αᵏn for all k. Since Xₖ is integer-valued, E[Xₖ] is an upper bound on Pr[Xₖ > 0]; we thus get Pr[T(n) > k] = Pr[Xₖ > 0] ≤ αᵏn. We can solve for the value of k that makes this less than ε: k = -log(n/ε)/log α = log_{1/α} n + log_{1/α}(1/ε).
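A simulation sketch of this sharper tail bound, taking X binomial (one distribution with E[X] = αn; all names here are ours):

    import random

    def sharp_tail_demo(n=1000, alpha=0.5, k=12, trials=10000):
        """Empirically compare Pr[T(n) > k] with the bound alpha**k * n
        for T(n) = 1 + T(X), X ~ Binomial(n, alpha)."""
        exceed = 0
        for _ in range(trials):
            size, steps = n, 0
            while size > 0:
                # each of the `size` elements survives with probability alpha
                size = sum(random.random() < alpha for _ in range(size))
                steps += 1
            if steps > k:
                exceed += 1
        return exceed / trials, alpha ** k * n

    # sharp_tail_demo() typically returns about (0.22, 0.244).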
For comparison, the bound on the expectation of T(n) from the KUW formula is Hₙ/(1-α). This is actually pretty close to log_{1/α} n when α is close to 1, and is not too bad even for smaller α. But the difference is that the dependence on log(1/ε) is additive with the tighter analysis, so for fixed c we get Pr[T(n) ≥ t] ≤ n⁻ᶜ at t = O(log n + log nᶜ) = O(log n) instead of O(log n · log nᶜ) = O(log² n).
1. Proof: Trivially true for n = 0; for larger n, compute ∑_{k=1}^n 1/(k(k+1)) = (n-1)/n + 1/(n(n+1)) = ((n-1)(n+1) + 1)/(n(n+1)) = n²/(n(n+1)) = n/(n+1).