Variance is the average of the square of the distance from the mean. For this reason, variance is sometimes called the “mean square deviation.” Then we take its square root to get the standard deviation—which in turn is called “root mean square deviation.” But why bother squaring? Why not study the actual distance from the mean, namely, the absolute value of \(R - \text{Ex}[R]\), instead of its root mean square? The answer is that variance and standard deviation have useful properties that make them much more important in probability theory than average absolute deviation. In this section, we’ll describe some of those properties. In the next section, we’ll see why these properties are important.
Applying linearity of expectation to the formula for variance yields a convenient alternative formula.

Lemma 19.3.1.
\[\nonumber \text{Var}[R] = \text{Ex}[R^2] - \text{Ex}^2[R],\]
for any random variable, \(R\).

Here we use the notation \(\text{Ex}^2[R]\) as shorthand for \((\text{Ex}[R])^2\).

Proof. Let \(\mu = \text{Ex}[R].\) Then
\[\begin{aligned}
\text{Var}[R] &= \text{Ex}[(R - \text{Ex}[R])^2] & (\text{def of variance}) \\
&= \text{Ex}[(R - \mu)^2] & (\text{def of } \mu) \\
&= \text{Ex}[R^2 - 2\mu R + \mu^2] \\
&= \text{Ex}[R^2] - 2\mu\,\text{Ex}[R] + \mu^2 & (\text{linearity of expectation}) \\
&= \text{Ex}[R^2] - 2\mu^2 + \mu^2 & (\text{def of } \mu) \\
&= \text{Ex}[R^2] - \mu^2 \\
&= \text{Ex}[R^2] - \text{Ex}^2[R]. & (\text{def of } \mu) \\
& & \quad \blacksquare
\end{aligned}\]

A simple and very useful formula for the variance of an indicator variable is an immediate consequence.

Corollary 19.3.2. If \(B\) is a Bernoulli variable where \(p ::= \Pr[B = 1]\), then
\[\nonumber \text{Var}[B] = p - p^2 = p(1-p).\]

Proof. By Lemma 18.4.2, \(\text{Ex}[B] = p\). But \(B\) only takes values 0 and 1, so \(B^2 = B\), and the formula follows immediately from Lemma 19.3.1. \(\quad \blacksquare\)
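For example, if \(D\) is the outcome of a roll of a fair six-sided die, then \(\text{Ex}[D] = 7/2\) and \(\text{Ex}[D^2] = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6\), so Lemma 19.3.1 gives
\[\nonumber \text{Var}[D] = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}.\]
Likewise, Corollary 19.3.2 shows that the variance of an indicator variable is never more than \(1/4\), since \(p(1-p)\) is maximized at \(p = 1/2\).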
For example, let \(C\) be the number of steps up to and including the first failure, when failures occur with probability \(p\) independently at each step. Recall that the mean time to failure is \(\text{Ex}[C] = 1/p\), so by Lemma 19.3.1 all we need is a formula for \(\text{Ex}[C^2]\). Conditioning on the first step gives one: with probability \(p\) the first step is a failure and \(C^2 = 1\), while with probability \(1-p\) the first step is a success and the remaining wait behaves like a fresh copy of \(C\), so \(C^2\) has the expectation of \((C + 1)^2\). Hence
\[\begin{aligned}
\text{Ex}[C^2] &= p \cdot 1^2 + (1-p)\,\text{Ex}[(C+1)^2] \\
&= p + (1-p)\left(\text{Ex}[C^2] + \frac{2}{p} + 1 \right) & (\text{since } \text{Ex}[C] = 1/p) \\
&= p + (1-p)\text{Ex}[C^2] + (1-p)\left(\frac{2}{p} + 1 \right),
\end{aligned}\]
so
\[\begin{aligned}
p\,\text{Ex}[C^2] &= p + (1-p) \left(\frac{2}{p} + 1 \right) \\
&= \frac{2-p}{p},
\end{aligned}\]
and therefore
\[\nonumber \text{Ex}[C^2] = \frac{2-p}{p^2}.\]
Combining this with the fact that \(\text{Ex}^2[C] = 1/p^2\) proves

Lemma 19.3.3. If failures occur with probability \(p\) independently at each step, and \(C\) is the number of steps until the first failure, then
\[\text{Var}[C] = \frac{1-p}{p^2}.\]
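For instance, if failures occur with probability \(p = 1/6\) at each step, then the mean time to failure is \(\text{Ex}[C] = 6\), while Lemma 19.3.3 gives
\[\nonumber \text{Var}[C] = \frac{1 - 1/6}{(1/6)^2} = 30,\]
so the standard deviation is \(\sigma_C = \sqrt{30} \approx 5.5\), nearly as large as the mean itself: waiting times are highly variable.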
Theorem 19.3.4 [Square Multiple Rule for Variance]. Let \(R\) be a random variable and \(a\) a constant. Then
\[\text{Var}[aR] = a^2\,\text{Var}[R].\]

Proof. Beginning with the definition of variance and repeatedly applying linearity of expectation, we have:
\[\begin{aligned}
\text{Var}[aR] &::= \text{Ex}[(aR - \text{Ex}[aR])^2] \\
&= \text{Ex}[(aR)^2 - 2aR\,\text{Ex}[aR] + \text{Ex}^2[aR]] \\
&= \text{Ex}[(aR)^2] - \text{Ex}[2aR\,\text{Ex}[aR]] + \text{Ex}^2[aR] \\
&= a^2\,\text{Ex}[R^2] - 2\,\text{Ex}[aR]\,\text{Ex}[aR] + \text{Ex}^2[aR] \\
&= a^2\,\text{Ex}[R^2] - a^2\,\text{Ex}^2[R] \\
&= a^2 (\text{Ex}[R^2] - \text{Ex}^2[R]) \\
&= a^2\,\text{Var}[R] & (\text{Lemma 19.3.1}) \\
& & \quad \blacksquare
\end{aligned}\]
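As a quick check, let \(B\) be a Bernoulli variable with \(\Pr[B = 1] = p\). Then \(2B\) takes the value 2 with probability \(p\) and 0 otherwise, so \(\text{Ex}[2B] = 2p\) and \(\text{Ex}[(2B)^2] = 4p\), giving
\[\nonumber \text{Var}[2B] = 4p - (2p)^2 = 4p(1-p) = 2^2\,\text{Var}[B],\]
in agreement with Theorem 19.3.4.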
It’s even simpler to prove that adding a constant does not change the variance, as the reader can verify:
Theorem 19.3.5. Let \(R\) be a random variable and \(b\) a constant. Then
\[\text{Var}[R + b] = \text{Var}[R].\]
Recalling that the standard deviation is the square root of variance, this implies that the standard deviation of \(aR + b\) is simply \(|a|\) times the standard deviation of \(R\):

Corollary 19.3.6.
\[\nonumber \sigma_{aR + b} = |a|\,\sigma_R.\]
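For example, if \(T\) is a temperature measured in degrees Celsius, then \(1.8T + 32\) is the same temperature in degrees Fahrenheit, and Corollary 19.3.6 says that \(\sigma_{1.8T + 32} = 1.8\,\sigma_T\): rescaling multiplies the spread by 1.8, while the added constant 32 has no effect on it.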
In general, the variance of a sum is not equal to the sum of the variances, but variances do add for independent variables. In fact, mutual independence is not necessary: pairwise independence will do. This is useful to know because there are some important situations, such as Birthday Matching in Section 16.4, that involve variables that are pairwise independent but not mutually independent.
Theorem 19.3.7. If \(R\) and \(S\) are independent random variables, then
\[\label{19.3.6} \text{Var}[R + S] = \text{Var}[R] + \text{Var}[S].\]

Proof. We may assume that \(\text{Ex}[R] = 0\), since we could always replace \(R\) by \(R - \text{Ex}[R]\) in equation (\ref{19.3.6}); likewise for \(S\). This substitution preserves the independence of the variables, and by Theorem 19.3.5, does not change the variances.

But for any variable \(T\) with expectation zero, we have \(\text{Var}[T] = \text{Ex}[T^2]\), so we need only prove
\[\nonumber \text{Ex}[(R + S)^2] = \text{Ex}[R^2] + \text{Ex}[S^2].\]
This follows from linearity of expectation and the fact that
\[\nonumber \text{Ex}[RS] = \text{Ex}[R]\,\text{Ex}[S]\]
since \(R\) and \(S\) are independent:
\[\begin{aligned}
\text{Ex}[(R + S)^2] &= \text{Ex}[R^2 + 2RS + S^2] \\
&= \text{Ex}[R^2] + 2\,\text{Ex}[RS] + \text{Ex}[S^2] \\
&= \text{Ex}[R^2] + 2\,\text{Ex}[R]\,\text{Ex}[S] + \text{Ex}[S^2] & (\text{independence of } R \text{ and } S) \\
&= \text{Ex}[R^2] + 2 \cdot 0 \cdot 0 + \text{Ex}[S^2] \\
&= \text{Ex}[R^2] + \text{Ex}[S^2]. \\
& & \quad \blacksquare
\end{aligned}\]
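For example, let \(R\) and \(S\) be indicators of two independent fair coin flips. Each has variance \(1/4\) by Corollary 19.3.2, so Theorem 19.3.7 predicts \(\text{Var}[R + S] = 1/2\). Checking directly, \(R + S\) takes the values 0, 1, 2 with probabilities \(1/4, 1/2, 1/4\), so \(\text{Ex}[R + S] = 1\) and \(\text{Ex}[(R + S)^2] = 3/2\), giving \(\text{Var}[R + S] = 3/2 - 1 = 1/2\), as expected.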
It’s easy to see that additivity of variance does not generally hold for variables that are not independent. For example, if \(R = S\), then equation (\ref{19.3.6}) becomes \(\text{Var}[R + R] = \text{Var}[R] + \text{Var}[R]\). By the Square Multiple Rule, Theorem 19.3.4, this holds iff \(4\,\text{Var}[R] = 2\,\text{Var}[R]\), which implies that \(\text{Var}[R] = 0\). So equation (\ref{19.3.6}) fails when \(R = S\) and \(R\) has nonzero variance.

The proof of Theorem 19.3.7 carries over to the sum of any finite number of variables. So we have:
Theorem 19.3.8 [Pairwise Independent Additivity of Variance]. If \(R_1, R_2, \ldots, R_n\) are pairwise independent random variables, then
\[\nonumber \text{Var}[R_1 + R_2 + \cdots + R_n] = \text{Var}[R_1] + \text{Var}[R_2] + \cdots + \text{Var}[R_n].\]
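To see pairwise independence at work, let \(X\) and \(Y\) be indicators of two independent fair coin flips, and let \(Z\) be the indicator of the event that the two flips come up differently. Any two of \(X, Y, Z\) are independent, but the three are not mutually independent, since any two of them determine the third. Each has variance \(1/4\), so Theorem 19.3.8 gives \(\text{Var}[X + Y + Z] = 3/4\). Indeed, \(X + Y + Z\) equals 0 with probability \(1/4\) and 2 with probability \(3/4\), so its expectation is \(3/2\), the expectation of its square is 3, and its variance is \(3 - 9/4 = 3/4\).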
Now we have a simple way of computing the variance of a variable, \(J\), that has an \((n, p)\)-binomial distribution. We know that \(J = \sum_{k=1}^{n} I_k\), where the \(I_k\) are mutually independent indicator variables with \(\Pr[I_k = 1] = p\). Each \(I_k\) has variance \(p(1-p)\) by Corollary 19.3.2, so by Theorem 19.3.8,
\[\nonumber \text{Var}[J] = np(1-p).\]
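For example, if \(J\) counts heads in \(n = 100\) independent flips of a fair coin, then \(\text{Ex}[J] = 50\) while \(\text{Var}[J] = 100 \cdot \frac{1}{2} \cdot \frac{1}{2} = 25\), so \(\sigma_J = 5\). That is, the count of heads typically deviates from its mean of 50 by only about 5.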