SP18:Lecture 32 Conditional probability

From CS2800 wiki
Revision as of 12:42, 8 May 2018 by Mdg39

We introduce conditional probability, a tool for stating and interpreting facts about a probability space. We give definitions and prove some basic results.

Conditional probability

The conditional probability of [math]B [/math] given [math]A [/math] is the probability of [math]B ∩ A [/math], "scaled up" so that the probability of [math]A [/math] given [math]A [/math] is 1.

Conditional-probability.svg

Formally:

If [math]A [/math] and [math]B \href{/cs2800/wiki/index.php/%E2%8A%86}{⊆} \href{/cs2800/wiki/index.php/S}{S} [/math] are events, then the conditional probability of [math]B [/math] given [math]A [/math] (written [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A) [/math]) is given by


[math]\href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A) \href{/cs2800/wiki/index.php/Definition}{:=} \frac{\href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} A)}{\href{/cs2800/wiki/index.php/Pr}{Pr}(A)} [/math]
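To make the definition concrete, here is a small sketch in Python (not part of the original lecture) that computes conditional probabilities on a finite probability space, using a fair die as the sample space:

```python
from fractions import Fraction

# A finite probability space: each outcome of a fair die has probability 1/6.
pr = {s: Fraction(1, 6) for s in range(1, 7)}

def prob(event):
    """Pr(E): the sum of the probabilities of E's outcomes."""
    return sum(pr[x] for x in event)

def cond_prob(b, a):
    """Pr(B | A) := Pr(B ∩ A) / Pr(A)."""
    return prob(b & a) / prob(a)

even = {2, 4, 6}
big = {4, 5, 6}               # the roll is greater than 3
print(cond_prob(even, big))   # Pr(even | >3) = (2/6)/(3/6) = 2/3
```

Note that scaling by [math]Pr(A) [/math] is exactly what makes [math]Pr(A \mid A) = 1 [/math]: `cond_prob(big, big)` returns 1.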


For example, suppose we wish to model the following experiment: we first select one of two coins. The first coin (coin a) is weighted: it lands heads 3/4 of the time. The second coin (coin b) is fair: it lands heads 1/2 of the time. We choose the first coin 1/3 of the time. We want to find the probability of getting heads.

How do we interpret the facts given in the problem?

We first construct a sample space: there are 4 things that can happen: we can choose coin a and flip heads, we can choose coin a and flip tails, we can choose coin b and flip heads, or we can choose coin b and flip tails. A reasonable sample space would be [math]\href{/cs2800/wiki/index.php/S}{S} = \href{/cs2800/wiki/index.php/Enumerated_set}{\{a,b\}} \href{/cs2800/wiki/index.php/%5Ctimes}{\times} \href{/cs2800/wiki/index.php/Enumerated_set}{\{h,t\}} = \{(a,h),(a,t),(b,h),(b,t)\} [/math].

It is (always) helpful to define some events: let [math]A \href{/cs2800/wiki/index.php/Definition}{:=} \{(a,h),(a,t)\} [/math] be the event that we pick coin a, and [math]H \href{/cs2800/wiki/index.php/Definition}{:=} \{(a,h),(b,h)\} [/math] be the event that we flip heads; define [math]B [/math] and [math]T [/math] similarly.

Now we need to interpret the probabilities given in the problem. When we say "[coin a] lands heads 3/4 of the time", we don't mean that 3/4 of the time we choose coin a and flip it and get heads (this would be [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(A \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} H) [/math]). Rather, we mean that if we restrict our attention to the outcomes where we chose coin a, then the probability of getting heads in that restricted experiment is 3/4. Put more simply, the probability that we get heads given that we choose coin a is 3/4.

We interpret this in our model by setting [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(H \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A) = 3/4 [/math]. Since we choose coin a with probability 1/3, we see that [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(H \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} A) = \href{/cs2800/wiki/index.php/Pr}{Pr}(H \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A) \cdot \href{/cs2800/wiki/index.php/Pr}{Pr}(A) = 1/4 [/math]: we would expect to select coin a and flip heads in about a quarter of the experiments.

Similarly, [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(H \href{/cs2800/wiki/index.php/%5Cmid}{\mid} B) = 1/2 [/math], so [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(H \href{/cs2800/wiki/index.php/%5Ccap}{\cap} B) = \href{/cs2800/wiki/index.php/Pr}{Pr}(H \href{/cs2800/wiki/index.php/%5Cmid}{\mid} B) \cdot \href{/cs2800/wiki/index.php/Pr}{Pr}(B) = 1/2 \cdot 2/3 = 1/3 [/math].

Since we can only select one of the coins, the events [math]A [/math] and [math]B [/math] are disjoint, so we can use the third Kolmogorov axiom to compute [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(H) [/math]:

[math]\href{/cs2800/wiki/index.php/Pr}{Pr}(H) = \href{/cs2800/wiki/index.php/Pr}{Pr}((H \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} A) \href{/cs2800/wiki/index.php/%E2%88%AA}{∪} (H \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} B)) = \href{/cs2800/wiki/index.php/Pr}{Pr}(H \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} A) + \href{/cs2800/wiki/index.php/Pr}{Pr}(H \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} B) = 1/4 + 1/3 = 7/12 [/math]
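The whole computation can be checked mechanically. Here is a minimal sketch in Python using exact fractions, with every number taken from the problem statement:

```python
from fractions import Fraction

# Facts from the problem, interpreted as (conditional) probabilities.
pr_A = Fraction(1, 3)            # we choose coin a 1/3 of the time
pr_B = 1 - pr_A                  # A and B partition the choices, so Pr(B) = 2/3
pr_H_given_A = Fraction(3, 4)    # coin a lands heads 3/4 of the time
pr_H_given_B = Fraction(1, 2)    # coin b is fair

# Pr(H ∩ A) = Pr(H | A) Pr(A), and likewise for B.
pr_HA = pr_H_given_A * pr_A      # 1/4
pr_HB = pr_H_given_B * pr_B      # 1/3

# A and B are disjoint, so Pr(H) = Pr(H ∩ A) + Pr(H ∩ B).
pr_H = pr_HA + pr_HB
print(pr_H)                      # 7/12
```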

Probability trees

A useful way to organize information about events in a probability space is by drawing a probability tree. Here the vertices correspond to events, and the edges are weighted by the corresponding conditional probabilities.

Consider the experiment where we choose coin a 1/3 of the time, and coin b 2/3 of the time, and where coin a lands heads 3/4 of the time and coin b lands heads 1/2 of the time.

We can organize these events into a tree:


Probability-tree.svg


The vertices in the tree represent events; the event [math]E_1 [/math] is a child of [math]E_2 [/math] if [math]E_1 \href{/cs2800/wiki/index.php/%E2%8A%86}{⊆} E_2 [/math]. The number on the edge from a parent [math]E_2 [/math] to a child [math]E_1 [/math] is the conditional probability of [math]E_1 [/math] given [math]E_2 [/math].

The probability of an event in the tree can be found by multiplying the probabilities on the path leading to that event. This comes from the definition of conditional probability: if [math]E_1 \href{/cs2800/wiki/index.php/%E2%8A%86}{⊆} E_2 [/math] then [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(E_1) = \href{/cs2800/wiki/index.php/Pr}{Pr}(E_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} E_2) = \href{/cs2800/wiki/index.php/Pr}{Pr}(E_1 \href{/cs2800/wiki/index.php/%5Cmid}{\mid} E_2)\href{/cs2800/wiki/index.php/Pr}{Pr}(E_2) [/math].
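As an illustration (a sketch, not part of the original page), the coin-flipping tree can be encoded as nested dictionaries whose values are the edge weights; each leaf's probability is then the product of the weights along its path:

```python
from fractions import Fraction

# The probability tree: each edge weight is a conditional probability.
# Top level: Pr(coin); second level: Pr(face | coin).
tree = {
    'a': (Fraction(1, 3), {'h': Fraction(3, 4), 't': Fraction(1, 4)}),
    'b': (Fraction(2, 3), {'h': Fraction(1, 2), 't': Fraction(1, 2)}),
}

# Pr of each leaf = product of the edge weights on the path to it.
leaf_prob = {(coin, face): w_coin * w_face
             for coin, (w_coin, faces) in tree.items()
             for face, w_face in faces.items()}

print(leaf_prob[('a', 'h')])   # Pr(A ∩ H) = 1/3 · 3/4 = 1/4
```

As a sanity check, the leaf probabilities sum to 1, since the leaves partition the sample space.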

Useful formulas

Bayes' rule

Bayes' rule (also called Bayes' law or Bayes' identity) is a simple equation relating [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A) [/math] and [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(A \href{/cs2800/wiki/index.php/%5Cmid}{\mid} B) [/math]:

[math]\href{/cs2800/wiki/index.php/Pr}{Pr}(A \href{/cs2800/wiki/index.php/%5Cmid}{\mid} B) = \frac{\href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A) \href{/cs2800/wiki/index.php/Pr}{Pr}(A)}{\href{/cs2800/wiki/index.php/Pr}{Pr}(B)} [/math]
Proof:
By definition, [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(A \href{/cs2800/wiki/index.php/%5Cmid}{\mid} B) = \href{/cs2800/wiki/index.php/Pr}{Pr}(A \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} B) / \href{/cs2800/wiki/index.php/Pr}{Pr}(B) [/math] and [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A) = \href{/cs2800/wiki/index.php/Pr}{Pr}(A \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} B) / \href{/cs2800/wiki/index.php/Pr}{Pr}(A) [/math]. Multiplying by the denominators gives [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(A \href{/cs2800/wiki/index.php/%5Cmid}{\mid} B) \href{/cs2800/wiki/index.php/Pr}{Pr}(B) = \href{/cs2800/wiki/index.php/Pr}{Pr}(A \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} B) = \href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A)\href{/cs2800/wiki/index.php/Pr}{Pr}(A) [/math]. Dividing by [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(B) [/math] gives the result.
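As a quick numeric check (a sketch, not part of the original page), we can verify Bayes' rule on the coin experiment above, where [math]Pr(A) = 1/3 [/math], [math]Pr(H \mid A) = 3/4 [/math], and [math]Pr(H) = 1/4 + 1/3 = 7/12 [/math]:

```python
from fractions import Fraction

pr_A = Fraction(1, 3)             # we choose coin a 1/3 of the time
pr_H_given_A = Fraction(3, 4)     # coin a lands heads 3/4 of the time
pr_H = Fraction(7, 12)            # computed earlier for the coin experiment

# Bayes' rule: Pr(A | H) = Pr(H | A) Pr(A) / Pr(H)
pr_A_given_H = pr_H_given_A * pr_A / pr_H
print(pr_A_given_H)               # 3/7

# Sanity check against the definition: both sides equal Pr(A ∩ H) / Pr(H).
pr_AH = pr_H_given_A * pr_A
assert pr_A_given_H == pr_AH / pr_H
```

So if we flipped heads, the probability that we were holding the weighted coin is 3/7.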

Law of total probability

Often, we have several events that partition the sample space. For example, we may have events like "the die is even" (call this event [math]A_1 [/math]) and "the die is odd" (this event is [math]A_2 [/math]); one of the two must happen (so [math]A_1 \href{/cs2800/wiki/index.php/%E2%88%AA}{∪} A_2 = S [/math]) but they cannot both happen (so [math]A_1 \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} A_2 = \href{/cs2800/wiki/index.php/%E2%88%85}{∅} [/math]).

In this case, there is an easy way to compute the probability of another event [math]B [/math] by considering it separately in the [math]A_1 [/math] case and the [math]A_2 [/math] case:

If [math]A_1 [/math], [math]A_2 [/math], [math]\dots [/math], [math]A_n [/math] partition the sample space, then for any [math]B [/math],


[math]\href{/cs2800/wiki/index.php/Pr}{Pr}(B) = \sum \href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A_i)\href{/cs2800/wiki/index.php/Pr}{Pr}(A_i) = \href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A_1)\href{/cs2800/wiki/index.php/Pr}{Pr}(A_1) + \href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A_2)\href{/cs2800/wiki/index.php/Pr}{Pr}(A_2) + \cdots + \href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A_n)\href{/cs2800/wiki/index.php/Pr}{Pr}(A_n) [/math]
Proof: Law of total probability
The proof is pretty clear from the following picture:

Law-of-total-probability.svg

Since the [math]A_i [/math] are disjoint, we have that the sets [math]B \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} A_i [/math] are disjoint; and since every element of [math]S [/math] is in one of the [math]A_i [/math], we have that every element of [math]B [/math] is in one of the [math]B \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} A_i [/math].

Therefore, we can apply the third Kolmogorov axiom to conclude

[math]\begin{aligned} \href{/cs2800/wiki/index.php/Pr}{Pr}(B) &= \href{/cs2800/wiki/index.php/Pr}{Pr}((B \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} A_1) \href{/cs2800/wiki/index.php/%E2%88%AA}{∪} \cdots \href{/cs2800/wiki/index.php/%E2%88%AA}{∪} (B \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} A_n)) && \text{as argued above} \\ &= \href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} A_1) + \cdots + \href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%E2%88%A9}{∩} A_n) && \href{/cs2800/wiki/index.php/Kolmogorov_axiom}{\text{Kolmogorov's third axiom}} \\ &= \href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A_1)\href{/cs2800/wiki/index.php/Pr}{Pr}(A_1) + \cdots + \href{/cs2800/wiki/index.php/Pr}{Pr}(B \href{/cs2800/wiki/index.php/%5Cmid}{\mid} A_n)\href{/cs2800/wiki/index.php/Pr}{Pr}(A_n) && \href{/cs2800/wiki/index.php/Conditional_probability}{\text{by definition of }Pr(B \mid A)} \end{aligned} [/math]

as required.
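The law is easy to check on a small example, such as the even/odd partition of a fair die mentioned above (a sketch, not part of the original page):

```python
from fractions import Fraction

pr = {s: Fraction(1, 6) for s in range(1, 7)}   # a fair die

def prob(e):
    """Pr(E): the sum of the probabilities of E's outcomes."""
    return sum(pr[x] for x in e)

A1, A2 = {2, 4, 6}, {1, 3, 5}   # even / odd: a partition of the sample space
B = {1, 2}                      # the roll is at most 2

# Law of total probability: Pr(B) = Σ_i Pr(B | A_i) Pr(A_i)
total = sum((prob(B & Ai) / prob(Ai)) * prob(Ai) for Ai in (A1, A2))
print(total, prob(B))           # both are 1/3
```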

Medical test example

Suppose a patient takes a medical test to see if they have a rare disease. The disease is rare: only 1 in 10,000 people have it. The test is quite accurate: the false positive rate is 1% (that is, of the people who don't have the disease, 1% of them still test positive) and the false negative rate is 2% (of the people who do have the disease, 2% of them test negative).

If a patient takes the test and gets a positive result, what is the probability that they have the disease?

We can model this problem probabilistically. Let [math]D [/math] represent the event where the patient has the disease, and let [math]H [/math] be the event where the patient is healthy. Let [math]P [/math] be the event representing a positive test result, and let [math]N [/math] be the event that the test is negative.

We can interpret the facts from the problem:

  • the disease is rare: [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(D) = 1/10000 [/math] (and therefore [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(H) = 1 - \href{/cs2800/wiki/index.php/Pr}{Pr}(D) = 9999/10000 [/math]).
  • the false positive rate is 1%: [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(P|H) = 1/100 [/math].
  • the false negative rate is 2%: [math]\href{/cs2800/wiki/index.php/Pr}{Pr}(N|D) = 2/100 [/math].

We are interested in the probability that the patient has the disease, given that they tested positive. In other words, we want to find [math]Pr(D|P) [/math].

We can apply Bayes' rule and the law of total probability (since [math]H [/math] and [math]D [/math] partition the sample space):


[math]Pr(D|P) = \frac{Pr(P|D)Pr(D)}{Pr(P)} = \frac{Pr(P|D)Pr(D)}{Pr(P|D)Pr(D) + Pr(P|H)Pr(H)} [/math]


We need [math]Pr(P|D) [/math]: the probability that the test result is positive, given that the patient has the disease. Intuitively, this should be 98% (that is, [math]1 - Pr(N|D) [/math]), and indeed it is; you can prove this using the fact that conditional probabilities satisfy Kolmogorov's axioms.


Plugging this in, we get


[math]\href{/cs2800/wiki/index.php/Pr}{Pr}(D\href{/cs2800/wiki/index.php/%5Cmid}{\mid}P) = \frac{98/100 \cdot 1/10000}{98/100 \cdot 1/10000 + 1/100 \cdot 9999/10000} = 98 / (98 + 9999) \approx 1/100 [/math]
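The same computation can be done with exact fractions in Python (a sketch, with all numbers taken from the problem statement):

```python
from fractions import Fraction

pr_D = Fraction(1, 10000)        # the disease is rare
pr_H = 1 - pr_D                  # healthy: H and D partition the space
pr_P_given_H = Fraction(1, 100)  # false positive rate
pr_N_given_D = Fraction(2, 100)  # false negative rate
pr_P_given_D = 1 - pr_N_given_D  # = 98/100

# Bayes' rule, with the law of total probability in the denominator.
pr_P = pr_P_given_D * pr_D + pr_P_given_H * pr_H
pr_D_given_P = pr_P_given_D * pr_D / pr_P

print(pr_D_given_P)              # 98/10097
print(float(pr_D_given_P))       # ≈ 0.0097, i.e. roughly 1/100
```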


Perhaps this is surprising; you might expect that a positive result on a good test means you have the disease with high probability. And indeed, you have learned a great deal: your chances of having the disease went up by a factor of about 100. However, because the disease is so rare, you are still not particularly likely to have it.

However, you might want to have further testing done; see the repeated medical test example.