SP18:Lecture 33 Independence

We define independence and give examples; we also introduce random variables and define their sum and product.


Independence

A common situation when modeling experiments is that different events "don't influence each other". For example, if I roll a die twice, it is reasonable to assume that the second roll has no influence on the first, and vice-versa.

Another way of stating this is that, given that the first roll was a 1, the probability that the second roll is 1 is unchanged. Formally:


Definition: Independent events
Two events [math]E_1 [/math] and [math]E_2 [/math] are independent if [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(E_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} E_1) = \href{/cs2800/wiki/index.php/Pr}{Pr}(E_2) [/math].

Equivalently, we have

Definition: Independent events
Two events [math]E_1 [/math] and [math]E_2 [/math] are independent if [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(E_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} E_2) = \href{/cs2800/wiki/index.php/Pr}{Pr}(E_1)\href{/cs2800/wiki/index.php/Pr}{Pr}(E_2) [/math].

The former definition more closely matches the intuition described above, but the latter definition works even if [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(E_1) = 0 [/math] (in which case [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(E_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} E_1) [/math] is undefined), and it also makes it clear that independence is symmetric.
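
To see why the two definitions agree when [math]Pr(E_1) [/math] is nonzero, expand the conditional probability: [math]Pr(E_1 ∩ E_2) = Pr(E_2 \mid E_1)Pr(E_1) = Pr(E_2)Pr(E_1) [/math]. For a concrete example, roll a fair die twice and let [math]E_1 [/math] be "the first roll is 1" and [math]E_2 [/math] be "the second roll is 1"; then [math]Pr(E_1 ∩ E_2) = 1/36 = (1/6)(1/6) = Pr(E_1)Pr(E_2) [/math], so under this model the two rolls are indeed independent.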

Warning: you should not assume events are independent unless you have a good reason for doing so. This is one of the most common mistakes people make when reasoning about probability.

Example: repeated medical test

Consider the medical test example. We saw that, under some reasonable assumptions, the probability that someone has the disease given that the test is positive was only about 0.1%.

Perhaps this is not a high enough risk to justify performing an invasive procedure. Can we increase our confidence by taking a second test?

In our model, we defined various events: [math]D [/math] is the event that we have the disease, while [math]H [/math] is the event that we are healthy.

Let us use [math]P_1 [/math] to indicate the event that the first test is positive, and [math]N_1 [/math] to indicate the event that the first test is negative; define [math]P_2 [/math] and [math]N_2 [/math] similarly.

We were given in the problem that the false positive rate is 1%; this means that [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(P_1 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} H) = 1/100 [/math] and [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(P_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} H) = 1/100 [/math]. Similarly, we have that the false negative rate is 2%; this means that [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(N_1 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} D) = \href{/cs2800/wiki/index.php/Pr}{Pr}(N_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} D) = 2/100 [/math]. Finally, we were given that [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(D) = 1/100000 [/math].

Using these facts, we were able to compute that [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(D \href{/cs2800/wiki/index.php/%5Cmid}{\mid} P_1) \approx 1/1000 [/math]; the same computation shows that [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(D \href{/cs2800/wiki/index.php/%5Cmid}{\mid} P_2) \approx 1/1000 [/math].
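
As a quick numerical check of this computation, here is a short Python sketch that plugs in the figures above:

<pre>
# Bayes' rule for Pr(D | P1), using the law of total probability for Pr(P1).
pr_D = 1e-5          # Pr(D): prevalence of the disease
pr_H = 1 - pr_D      # Pr(H)
pr_P_given_H = 0.01  # Pr(P1 | H): false positive rate
pr_P_given_D = 0.98  # Pr(P1 | D): 1 - false negative rate

pr_P = pr_P_given_D * pr_D + pr_P_given_H * pr_H  # law of total probability
print(pr_P_given_D * pr_D / pr_P)                 # ≈ 0.00098, i.e. roughly 1/1000
</pre>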

Now, suppose both tests come back positive. What is the probability that we have the disease?

We want to compute [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(D \href{/cs2800/wiki/index.php/%5Cmid}{\mid} P_1 \href{/cs2800/wiki/index.php/%5Ccap}{\cap} P_2) [/math].

We can organize this information into a probability tree:

Repeated-medical-test-tree.svg

However, we don't know what [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(P_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} H \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_1) [/math] is. And this is sensible: depending on how the test works and what causes false positives, this probability could be anything:

  • Perhaps the test gives a false positive if the patient has a genetic anomaly (which 1% of the population has). In this case, rerunning the test will give exactly the same result, so [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(P_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} H \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_1) = 1 [/math]. Using this, we would find that [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(D \href{/cs2800/wiki/index.php/%5Cmid}{\mid} P_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_2) = \href{/cs2800/wiki/index.php/Pr}{Pr}(D \href{/cs2800/wiki/index.php/%5Cmid}{\mid} P_1) \approx 1/1000 [/math].
  • Perhaps the test gives a false positive because the lab technician dropped one of the 100 samples that they were testing and caused an incorrect result. In this case, a second run of the test cannot also give a false positive, because only one of the samples was affected; therefore [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(P_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} H \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_1) = 0 [/math]. Using this assumption, we would find that [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(D \href{/cs2800/wiki/index.php/%5Cmid}{\mid} P_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_2) = 1 [/math].
  • Perhaps different iterations of the test fail independently. In this case, a false positive on the first test doesn't change the probability that the second test is also a false positive, so we have [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(P_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} H \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_1) = \href{/cs2800/wiki/index.php/Pr}{Pr}(P_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} H) = 1/100 [/math]. Using this assumption, we can compute [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(D \href{/cs2800/wiki/index.php/%5Cmid}{\mid} P_1 \href{/cs2800/wiki/index.php/%5Ccap}{\cap} P_2) [/math] using a probability tree (or using Bayes' rule and the law of total probability):

Repeated-medical-test-tree-independent.svg

Focusing on the [math]H [/math] branch, we see that [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(P_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} H) = 10^{-4} [/math]. In the [math]D [/math] branch, we see that [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(P_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} D) = 9604/10^4 \approx 1 [/math]. By the law of total probability, we have


[math]\href{/cs2800/wiki/index.php/Pr}{Pr}(P_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_2) = \href{/cs2800/wiki/index.php/Pr}{Pr}(P_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} H)\href{/cs2800/wiki/index.php/Pr}{Pr}(H) + \href{/cs2800/wiki/index.php/Pr}{Pr}(P_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} D)\href{/cs2800/wiki/index.php/Pr}{Pr}(D) \approx 10^{-4} [/math]


Using Bayes' rule, we have

[math]\href{/cs2800/wiki/index.php/Pr}{Pr}(D \href{/cs2800/wiki/index.php/%5Cmid}{\mid} P_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_2) = \frac{\href{/cs2800/wiki/index.php/Pr}{Pr}(P_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_2 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} D)\href{/cs2800/wiki/index.php/Pr}{Pr}(D)}{\href{/cs2800/wiki/index.php/Pr}{Pr}(P_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} P_2)} \approx 10^{-5} / 10^{-4} = 1/10 [/math]
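
The same arithmetic can be checked numerically; here is a Python sketch, again plugging in the figures above and using the independence assumption:

<pre>
# Posterior probability of disease given two positive, conditionally independent tests.
pr_D = 1e-5           # Pr(D)
pr_H = 1 - pr_D       # Pr(H)
pr_P_given_H = 0.01   # Pr(P_i | H): false positive rate
pr_P_given_D = 0.98   # Pr(P_i | D): 1 - false negative rate

# Conditional independence of the two tests (given H, and given D) lets us multiply.
pr_PP_given_H = pr_P_given_H ** 2  # Pr(P1 ∩ P2 | H) = 10^-4
pr_PP_given_D = pr_P_given_D ** 2  # Pr(P1 ∩ P2 | D) ≈ 0.96

pr_PP = pr_PP_given_H * pr_H + pr_PP_given_D * pr_D  # law of total probability
print(pr_PP_given_D * pr_D / pr_PP)                  # ≈ 0.09, i.e. roughly 1/10
</pre>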

Random variables

Outcomes describe the qualitative aspect of an experiment: what happened? Often, after performing a probabilistic experiment, we want to measure various quantitative aspects of the outcome and relate them to each other.

Random variables are the technical tool we use for this. A random variable gives a numeric value to each outcome:

Definition: Random variable
A (real-valued) random variable [math]X [/math] on a probability space [math](\href{/cs2800/wiki/index.php/S}{S},\href{/cs2800/wiki/index.php/Pr}{Pr}) [/math] is a function [math]X : S \href{/cs2800/wiki/index.php/%E2%86%92}{→} \href{/cs2800/wiki/index.php?title=%E2%84%9D&action=edit&redlink=1}{ℝ} [/math]. More generally, if [math]A [/math] is any set, an [math]A [/math]-valued random variable [math]X [/math] is a function [math]X : \href{/cs2800/wiki/index.php/S}{S} \href{/cs2800/wiki/index.php/%E2%86%92}{→} A [/math].
  • For example, if we were to model a game where I roll a die, and I win $10 if I roll a 6 and lose $3 if I roll 3 or less, then a reasonable sample space would be the set [math]\href{/cs2800/wiki/index.php/S}{S} \href{/cs2800/wiki/index.php/Definition}{:=} \href{/cs2800/wiki/index.php/Enumerated_set}{\{1,2,\dots,6\}} [/math], and the winnings would be described by a random variable [math]W : \href{/cs2800/wiki/index.php/S}{S} \href{/cs2800/wiki/index.php/%E2%86%92}{→} \href{/cs2800/wiki/index.php?title=%E2%84%9D&action=edit&redlink=1}{ℝ} [/math] given by [math]W(1) := W(2) := W(3) := -3 [/math], [math]W(4) := W(5) := 0 [/math], and [math]W(6) := 10 [/math] (see the sketch after these examples).
  • For example, if we were to model an experiment where I select a person and sample their height, then a reasonable sample space would be the set of people, and the random variable of interest would be the function [math]H : \href{/cs2800/wiki/index.php/S}{S} \href{/cs2800/wiki/index.php/%E2%86%92}{→} \href{/cs2800/wiki/index.php?title=%E2%84%9D&action=edit&redlink=1}{ℝ} [/math] where [math]H(p) [/math] is the height of person [math]p [/math].
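
Here is a minimal Python sketch of the die-winnings example; the dictionary W below plays the role of the function [math]W : S → ℝ [/math]:

<pre>
# A random variable assigns a number to each outcome in the sample space.
S = {1, 2, 3, 4, 5, 6}                        # sample space: the result of one die roll
W = {1: -3, 2: -3, 3: -3, 4: 0, 5: 0, 6: 10}  # winnings W(s) for each outcome s

print(W[6])  # 10: rolling a 6 wins $10
print(W[2])  # -3: rolling a 2 loses $3
</pre>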

Combining random variables

Random variables are neither "random" nor "variable". However, by defining arithmetic operations on them, we can put them into equations, where they can act like variables.

If [math]X [/math] and [math]Y [/math] are random variables on a probability space [math](\href{/cs2800/wiki/index.php/S}{S},\href{/cs2800/wiki/index.php/Pr}{Pr}) [/math], then [math]X \href{/cs2800/wiki/index.php?title=%2B&action=edit&redlink=1}{+} Y [/math] is the random variable on [math](\href{/cs2800/wiki/index.php/S}{S},\href{/cs2800/wiki/index.php/Pr}{Pr}) [/math] given by [math](X \href{/cs2800/wiki/index.php?title=%2B&action=edit&redlink=1}{+} Y)(s) := X(s) + Y(s) [/math].

Note: You cannot add random variables on different sample spaces.

Similarly, we can define other operations:

If [math]X [/math] and [math]Y [/math] are random variables on a probability space [math](\href{/cs2800/wiki/index.php/S}{S},\href{/cs2800/wiki/index.php/Pr}{Pr}) [/math], then [math]X \href{/cs2800/wiki/index.php?title=%C2%B7&action=edit&redlink=1}{·} Y [/math] is the random variable on [math](\href{/cs2800/wiki/index.php/S}{S},\href{/cs2800/wiki/index.php/Pr}{Pr}) [/math] given by [math](X \href{/cs2800/wiki/index.php?title=%C2%B7&action=edit&redlink=1}{·} Y)(s) \href{/cs2800/wiki/index.php/Definition}{:=} X(s)·Y(s) [/math].

Note: You cannot multiply random variables on different sample spaces.

If [math]X [/math] is a random variable on a probability space [math](\href{/cs2800/wiki/index.php/S}{S},\href{/cs2800/wiki/index.php/Pr}{Pr}) [/math], then [math]\href{/cs2800/wiki/index.php?title=-&action=edit&redlink=1}{-} X [/math] is the random variable on [math](\href{/cs2800/wiki/index.php/S}{S},\href{/cs2800/wiki/index.php/Pr}{Pr}) [/math] given by [math](\href{/cs2800/wiki/index.php?title=-&action=edit&redlink=1}{-} X)(s) := -X(s) [/math].


As usual, [math]X \href{/cs2800/wiki/index.php?title=-&action=edit&redlink=1}{-} Y [/math] is shorthand for [math]X \href{/cs2800/wiki/index.php?title=%2B&action=edit&redlink=1}{+} (\href{/cs2800/wiki/index.php?title=-&action=edit&redlink=1}{-}Y) [/math].

For example, suppose we modeled an experiment where we randomly selected a rectangle from a given set. We might have random variables [math]W [/math] and [math]H [/math] that give the width and height of the selected rectangle. We could then define a new "area" random variable [math]A := W·H [/math] by multiplying [math]W [/math] and [math]H [/math]; this would work as expected: to find the area of a given outcome, you would measure the width and the height and then multiply them (since by definition, [math]A(s) = (W·H)(s) = W(s)H(s) [/math]).
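
Here is a minimal Python sketch of these pointwise operations; the three rectangles and their dimensions are made up for illustration:

<pre>
# Combining random variables pointwise: (X + Y)(s) = X(s) + Y(s) and (X · Y)(s) = X(s) · Y(s).
S = {"r1", "r2", "r3"}                 # sample space: three (hypothetical) rectangles
W = {"r1": 2.0, "r2": 3.0, "r3": 5.0}  # width of each rectangle
H = {"r1": 1.0, "r2": 4.0, "r3": 2.0}  # height of each rectangle

A = {s: W[s] * H[s] for s in S}           # the "area" random variable A = W · H
half_perim = {s: W[s] + H[s] for s in S}  # the random variable W + H

print(A["r2"])           # 12.0 = 3.0 * 4.0
print(half_perim["r2"])  # 7.0 = 3.0 + 4.0
</pre>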

Because we define operations on random variables pointwise, random variables behave the same way as real numbers do. For example,


If [math]X [/math], [math]Y [/math], and [math]Z [/math] are random variables on a probability space [math](\href{/cs2800/wiki/index.php/S}{S},\href{/cs2800/wiki/index.php/Pr}{Pr}) [/math], then [math]X(Y + Z) = XY + XZ [/math].
Proof:
Choose an arbitrary [math]s \href{/cs2800/wiki/index.php/%E2%88%88}{∈} \href{/cs2800/wiki/index.php/S}{S} [/math]. We have

[math]\begin{align*} \left(X(Y + Z)\right)(s) &= X(s)\left(Y+Z\right)(s) && \href{/cs2800/wiki/index.php/%C2%B7_(random_variables)}{\text{by definition of ·}} \\ &= X(s)\left(Y(s) + Z(s)\right) && \href{/cs2800/wiki/index.php/%2B_(random_variables)}{\text{by definition of +}} \\ &= X(s)Y(s) + X(s)Z(s) && \href{/cs2800/wiki/index.php/Arithmetic}{arithmetic} \\ &= (XY)(s) + (XZ)(s) && \href{/cs2800/wiki/index.php/%C2%B7_(random_variables)}{\text{by definition of ·}} \\ &= (XY + XZ)(s) && \href{/cs2800/wiki/index.php/%2B_(random_variables)}{\text{by definition of +}} \\ \end{align*} [/math]

Thus [math]X(Y+Z) \href{/cs2800/wiki/index.php/Equality_(functions)}{=} XY + XZ [/math].
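
Continuing the rectangle sketch above, the identity can also be checked numerically on a small example (a toy check, not a substitute for the proof):

<pre>
# Pointwise check that X(Y + Z) and XY + XZ agree on every outcome of a small sample space.
S = {"r1", "r2", "r3"}
X = {"r1": 2.0, "r2": 3.0, "r3": 5.0}
Y = {"r1": 1.0, "r2": 4.0, "r3": 2.0}
Z = {"r1": 0.5, "r2": 1.5, "r3": 2.5}

lhs = {s: X[s] * (Y[s] + Z[s]) for s in S}       # X(Y + Z)
rhs = {s: X[s] * Y[s] + X[s] * Z[s] for s in S}  # XY + XZ
print(lhs == rhs)  # True: the two random variables are equal as functions on S
</pre>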