The expected value or expectation of a quantity X (written E[X], or sometimes just EX) is the average of all possible values of X, weighted by their probabilities. Formally, if X takes on values in some set S, then
E[X] = Σ_{x in S} x·Pr[X=x].
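For concreteness, here is a minimal sketch that evaluates this definition directly; the fair-die distribution and the helper name `expectation` are illustrative choices, not part of the text above:

```python
from fractions import Fraction

def expectation(dist):
    """E[X] = sum over x in S of x * Pr[X=x], straight from the definition."""
    return sum(x * p for x, p in dist.items())

# A fair six-sided die: each value 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation(die))  # 7/2
```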
From the definition it is not hard to show several useful properties of expectations:
- E[aX] = a E[X] when a is a constant.
- E[X+Y] = E[X]+E[Y].
These properties together are known as linearity of expectation.
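Both properties can be checked by brute-force enumeration over a small joint distribution, as in the sketch below; the particular distribution, in which Y is a deterministic function of X, is invented purely for illustration:

```python
from fractions import Fraction

# A small joint distribution of (X, Y): X is uniform on {1, 2, 3} and Y = X**2,
# so the two variables are certainly not independent.
joint = {(x, x * x): Fraction(1, 3) for x in (1, 2, 3)}
a = 5

E_X   = sum(x * p for (x, y), p in joint.items())
E_Y   = sum(y * p for (x, y), p in joint.items())
E_aX  = sum(a * x * p for (x, y), p in joint.items())
E_sum = sum((x + y) * p for (x, y), p in joint.items())

assert E_aX == a * E_X      # E[aX] = a E[X]
assert E_sum == E_X + E_Y   # E[X+Y] = E[X] + E[Y], with no independence needed
print(E_X, E_Y, E_sum)      # 2, 14/3, 20/3
```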
The additive property applies to any random variables X and Y, even if they are not independent. For example, consider an algorithm that splits an input array of n elements into two arrays, where one contains anywhere from 0 to n elements (with all n+1 possibilities equally likely) and the other contains the rest; it then runs a Θ(n^2) subroutine on both subarrays. Let T1 be the running time of the subroutine on the first array, let T2 be the running time on the second array, and let T be the total running time. Then E[T] = E[T1+T2] = E[T1]+E[T2] = 2E[T1] (by symmetry). The value of E[T1] can be calculated from the definition (it's Θ(n^2)). In this analysis, we can consider each of T1 and T2 separately, even though their actual values are closely linked.
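To see roughly how this plays out numerically, here is a hedged simulation sketch; the function name `estimate_split_costs` and the choice to charge exactly k^2 time for a k-element subarray are assumptions made for illustration, standing in for the Θ(n^2) subroutine:

```python
import random

def estimate_split_costs(n, trials=100_000, seed=1):
    """Estimate E[T1], E[T2], and E[T] when an n-element array is split at a
    uniformly random point k in 0..n and a quadratic-time subroutine
    (modeled here as costing exactly k**2 on k elements) runs on both parts."""
    rng = random.Random(seed)
    t1 = t2 = 0
    for _ in range(trials):
        k = rng.randint(0, n)   # size of the first subarray
        t1 += k ** 2            # stand-in cost for the subroutine on k elements
        t2 += (n - k) ** 2      # same subroutine on the remaining n-k elements
    return t1 / trials, t2 / trials, (t1 + t2) / trials

e1, e2, e = estimate_split_costs(1000)
print(e1, e2, e)   # all three are Theta(n^2); E[T] is about 2 * E[T1]
```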
For more complicated calculations it can sometimes be useful to split the possibilities into several cases. The conditional expectation of a variable X given an event A (written E[X|A]) is the average value of X over all cases where A occurs. Formally, it is defined for a variable X that takes on values in S by
E[X|A] = Σ_{x in S} x·Pr[X=x|A],
where Pr[X=x|A] is the probability that X equals x conditioned on A, which is in turn defined formally by
- Pr[X=x|A] = Pr[X=x and A]/Pr[A].
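As a small illustrative sketch (the fair-die distribution and the event "the roll is even" are invented for the example, not taken from the text), E[X|A] can be computed directly from these two definitions:

```python
from fractions import Fraction

# Fair die again; A is the event that the roll is even.
die = {x: Fraction(1, 6) for x in range(1, 7)}
A = {2, 4, 6}

Pr_A = sum(p for x, p in die.items() if x in A)

# Pr[X=x|A] = Pr[X=x and A] / Pr[A], then E[X|A] = sum of x * Pr[X=x|A].
E_X_given_A = sum(x * (p / Pr_A) for x, p in die.items() if x in A)
print(E_X_given_A)   # 4
```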
The usefulness of conditional expectations is that if we have a set of mutually exclusive events A1, A2, ..., An whose probabilities sum to 1 (representing some set of n cases that together exhaust all possibilities), we can compute the expectation of X by taking a weighted average of the conditional expectations in each case:
E[X] = Σ_{i=1}^n Pr[Ai]·E[X|Ai].
The formal definition of E[X] that we gave above is actually a special case of this formula, obtained by taking each Ai to be the event that X equals a particular value x in S.
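A quick sketch of this case analysis, again using the illustrative fair die with the two cases "even" and "odd" (an assumed example, not from the text), recovers the unconditional expectation:

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}
cases = [{2, 4, 6}, {1, 3, 5}]   # mutually exclusive events whose probabilities sum to 1

def cond_expectation(dist, A):
    """E[X|A] computed from the definition of conditional expectation."""
    pr_A = sum(p for x, p in dist.items() if x in A)
    return sum(x * p for x, p in dist.items() if x in A) / pr_A

# E[X] = sum over cases of Pr[Ai] * E[X|Ai]
total = sum(
    sum(p for x, p in die.items() if x in A) * cond_expectation(die, A)
    for A in cases
)
print(total)   # 7/2, matching the direct definition of E[X]
```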
1. Examples
Suppose X takes on the values 1 through n with equal probabilities. What is E[X^2]? Use the conditional expectation formula with Ai being the event that X=i: E[X^2] = Σ_{i=1}^n i^2·(1/n) = (1/n)·Σ_{i=1}^n i^2 = (1/n)·Θ(n^3) = Θ(n^2).
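A short check of this calculation (the helper name `expected_square` is an illustrative assumption) computes E[X^2] exactly for a few values of n and is consistent with the Θ(n^2) growth:

```python
from fractions import Fraction

def expected_square(n):
    """E[X^2] for X uniform on 1..n, straight from the conditional-expectation sum."""
    return sum(Fraction(i * i, n) for i in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, expected_square(n), float(expected_square(n)) / n ** 2)
# The last column approaches 1/3, matching E[X^2] = (n+1)(2n+1)/6 = Theta(n^2).
```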
- A startup offers you a job that involves working 120 hours per week for a year, at the end of which you will be vested with stock options that will have a value somewhere between $0 and $10 million, with all values in this range equally likely. However, there is a 40% chance that you will die of overwork before you can collect, which standard formulas used in cost-benefit analysis count as a $4 million loss. Your expected return if you don't die is $5 million; your total expected return is thus (-4000000)*0.40+(5000000)*0.60 = $1,400,000, an excellent salary for a year's work. (This example may show why looking only at expectations may not always give a complete picture of random outcomes.)
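The arithmetic in this example can be reproduced in one line; the dollar figures and probabilities are taken directly from the example above:

```python
# Expected return: a 40% chance of the $4 million loss (death from overwork)
# versus a 60% chance of the $5 million expected value of the options.
expected_return = 0.40 * (-4_000_000) + 0.60 * 5_000_000
print(expected_return)   # 1400000.0
```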