# SP18:Lecture 33 Independence

We define independence and give examples; we also introduce random variables and define their sum and product.

# Independence

A common situation when modeling experiments is that different events "don't influence each other". For example, if I roll a die twice, it is reasonable to assume that the second roll has no influence on the first, and vice-versa.

Another way of stating this is that, given that the first roll was a 1, the probability that the second roll is 1 is unchanged. Formally:

Definition: Independent events
Two events $A$ and $B$ are independent if $\Pr(A \mid B) = \Pr(A)$.

Equivalently, we have

Definition: Independent events
Two events $A$ and $B$ are independent if $\Pr(A \cap B) = \Pr(A)\Pr(B)$.

The former definition more closely matches the intuition described above, but the latter definition also works if $\Pr(B) = 0$, and it makes clear that independence is symmetric.

Warning: you should not assume events are independent unless you have a good reason for doing so. This is one of the most common mistakes people make when reasoning about probability.
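The product form of the definition is easy to verify by brute force for the two-dice example. The following sketch (the helper `pr` and the event names are my own, for illustration) enumerates the 36 equally likely outcomes of two die rolls:

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two die rolls.
S = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event (a subset of S) under the uniform measure."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == 1}  # first roll is 1
B = {s for s in S if s[1] == 1}  # second roll is 1

# The product form of independence: Pr(A ∩ B) = Pr(A) · Pr(B)
assert pr(A & B) == pr(A) * pr(B)  # 1/36 == 1/6 · 1/6
```

Since the second roll's outcome places no constraint on the first, the events factor exactly as the definition requires.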

## Example: repeated medical test

Consider the medical test example. We saw that under some reasonable assumptions, the probability that someone has a disease given that the test is positive was 0.1%.

Perhaps this is not a high enough risk to justify performing an invasive procedure. Can we increase our confidence by taking a second test?

In our model, we defined various events: $D$ is the event that we have the disease, while $H$ is the event that we are healthy.

Let us use $+_1$ to indicate the event that the first test is positive, and $-_1$ to indicate the event that the first test is negative; define $+_2$ and $-_2$ similarly.

We were given in the problem that the false positive rate is 1%; this means that $\Pr(+_1 \mid H) = 0.01$ and $\Pr(-_1 \mid H) = 0.99$. Similarly, we have that the false negative rate is 2%; this means that $\Pr(-_1 \mid D) = 0.02$, and thus $\Pr(+_1 \mid D) = 0.98$. Finally, we were given that $\Pr(D) = 10^{-5}$.

Using these facts, we were able to compute that $\Pr(D \mid +_1) \approx 0.001$; the same computation shows that $\Pr(D \mid +_2) \approx 0.001$.

Now, suppose both tests come back positive. What is the probability that we have the disease?

We want to compute $\Pr(D \mid +_1 \cap +_2)$.

We can organize this information into a probability tree, branching first on $D$ versus $H$ and then on the outcomes of the two tests.

However, we don't know what $\Pr(+_2 \mid +_1 \cap H)$ is. And this is sensible: depending on how the test works and what causes false positives, this probability could be anything:

• Perhaps the test gives a false positive if the patient has a genetic anomaly (which 1% of the population has). In this case, rerunning the test will give exactly the same result, so $\Pr(+_2 \mid +_1 \cap H) = 1$. Using this, we would find that $\Pr(D \mid +_1 \cap +_2) = \Pr(D \mid +_1) \approx 0.001$; the second test gives us no new information.
• Perhaps the test gives a false positive because the lab technician dropped one of the 100 samples that they were testing and caused an incorrect result. In this case, a second run of the test cannot possibly fail, because there is only one incorrect test; therefore $\Pr(+_2 \mid +_1 \cap H) = 0$. Using this assumption, we would find that $\Pr(D \mid +_1 \cap +_2) = 1$.
• Perhaps different iterations of the test fail independently. In this case, the false positive on the first test doesn't change the probability that the second test is a false positive, so $\Pr(+_2 \mid +_1 \cap H) = \Pr(+_2 \mid H) = 0.01$; using the assumption that the first and second tests are independent, we can compute $\Pr(D \mid +_1 \cap +_2)$ using a probability tree (or using Bayes' rule and the law of total probability):

Focusing on the $D$ branch, we see that $\Pr(+_1 \cap +_2 \cap D) = \Pr(D)\Pr(+_1 \mid D)\Pr(+_2 \mid D) = 10^{-5} \cdot 0.98 \cdot 0.98 \approx 9.6 \times 10^{-6}$. In the $H$ branch, we see that $\Pr(+_1 \cap +_2 \cap H) = \Pr(H)\Pr(+_1 \mid H)\Pr(+_2 \mid H) = 0.99999 \cdot 0.01 \cdot 0.01 \approx 10^{-4}$. By the law of total probability, we have

$$\Pr(+_1 \cap +_2) = \Pr(+_1 \cap +_2 \cap D) + \Pr(+_1 \cap +_2 \cap H) \approx 1.1 \times 10^{-4}.$$

Using Bayes' rule, we have

$$\Pr(D \mid +_1 \cap +_2) = \frac{\Pr(+_1 \cap +_2 \cap D)}{\Pr(+_1 \cap +_2)} \approx \frac{9.6 \times 10^{-6}}{1.1 \times 10^{-4}} \approx 9\%,$$

so under this assumption, two positive tests still leave the probability of disease below 10%.
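The computation can also be sketched numerically. This snippet assumes the figures used in the notes (prior $10^{-5}$, false positive rate 1%, false negative rate 2%) and the independence assumption of the third bullet:

```python
# Bayes' rule for the repeated medical test, assuming the tests are
# independent conditioned on the patient's true status. The figures here
# follow the notes: Pr(D) = 1e-5, false positive 1%, false negative 2%.
p_d = 1e-5          # Pr(D)
p_h = 1 - p_d       # Pr(H)
p_pos_d = 0.98      # Pr(+ | D) = 1 - false negative rate
p_pos_h = 0.01      # Pr(+ | H) = false positive rate

def posterior(n):
    """Pr(D | n independent positive tests), via Bayes + total probability."""
    num = p_d * p_pos_d ** n
    den = num + p_h * p_pos_h ** n  # law of total probability
    return num / den

print(posterior(1))  # ≈ 0.001 (0.1%): one positive test
print(posterior(2))  # ≈ 0.09 (about 9%): two positive tests
```

Each additional independent positive test multiplies the odds by $0.98/0.01 = 98$, which is why the posterior climbs so quickly.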

# Random variables

Outcomes describe the qualitative aspect of an experiment: what happened? Often, after performing a probabilistic experiment, we want to measure various quantitative aspects of the outcome and relate them to each other.

Random variables are the technical tool we use for this. A random variable gives a numeric value to each outcome:

Definition: Random variable
A (real-valued) random variable on a probability space $(S, \Pr)$ is a function $X : S \to \mathbb{R}$. More generally, if $A$ is any set, an $A$-valued random variable is a function $X : S \to A$.
• For example, if we were to model a game where I roll a die, and I win \$10 if I roll a 6 and lose \$3 if I roll 3 or less, then a reasonable sample space would be the set $S = \{1, 2, 3, 4, 5, 6\}$, and the winnings would be described by a random variable $W$ given by $W(6) = 10$, $W(1) = W(2) = W(3) = -3$, and $W(4) = W(5) = 0$.
• For example, if we were to model an experiment where I select a person and sample their height, then a reasonable sample space would be the set of people, and the random variable of interest would be the function $H$ where $H(p)$ is the height of person $p$.
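The die-game example can be encoded directly: a random variable is nothing more than a function on the sample space. A minimal sketch (the encoding below is my own framing of the game described above):

```python
from fractions import Fraction

# The die game from the notes: win $10 on a 6, lose $3 on 3 or less.
S = [1, 2, 3, 4, 5, 6]

def W(s):
    """Winnings random variable: a plain function on the sample space."""
    if s == 6:
        return 10
    if s <= 3:
        return -3
    return 0

# Probabilities of numeric events can be read off directly, e.g. Pr(W = -3):
pr_lose = Fraction(sum(1 for s in S if W(s) == -3), len(S))
assert pr_lose == Fraction(1, 2)
```

Nothing about `W` itself is random; the randomness lives entirely in the choice of the outcome `s`.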

## Combining random variables

Random variables are neither "random" nor "variable". However, by defining arithmetic operations on them, we can put them into equations, where they can act like variables.

If $X$ and $Y$ are random variables on a probability space $(S, \Pr)$, then $X + Y$ is the random variable on $(S, \Pr)$ given by $(X + Y)(s) := X(s) + Y(s)$.

Note: You cannot add random variables on different sample spaces.

Similarly, we can define other operations:

If $X$ and $Y$ are random variables on a probability space $(S, \Pr)$, then $X \cdot Y$ is the random variable on $(S, \Pr)$ given by $(X \cdot Y)(s) := X(s) \cdot Y(s)$.

Note: You cannot multiply random variables on different sample spaces.

If $X$ is a random variable on a probability space $(S, \Pr)$, then $-X$ is the random variable on $(S, \Pr)$ given by $(-X)(s) := -X(s)$.

As usual, $X - Y$ is shorthand for $X + (-Y)$.

For example, suppose we modeled an experiment where we randomly selected a rectangle from a given set. We might have random variables $W$ and $H$ that give the width and height of the selected rectangle. We could then define a new "area" random variable $A := W \cdot H$; this would work as expected: to find the area of a given outcome, you would measure the width and the height and then multiply them (since by definition, $A(s) = W(s) \cdot H(s)$).
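The pointwise definitions above can be sketched in code. The wrapper class below is an illustrative encoding (the name `RV` and the operator-overloading approach are my own, not from the notes), applied to the rectangle example:

```python
# Pointwise operations on random variables: an RV wraps a function on the
# sample space; + and * combine values outcome by outcome.
class RV:
    def __init__(self, f):
        self.f = f

    def __call__(self, s):
        return self.f(s)

    def __add__(self, other):
        # (X + Y)(s) := X(s) + Y(s)
        return RV(lambda s: self(s) + other(s))

    def __mul__(self, other):
        # (X · Y)(s) := X(s) · Y(s)
        return RV(lambda s: self(s) * other(s))

# Outcomes are rectangles, encoded as (width, height) pairs.
W = RV(lambda s: s[0])
H = RV(lambda s: s[1])
A = W * H  # the "area" random variable

assert A((3, 4)) == 12        # measure width and height, then multiply
assert (W + H)((3, 4)) == 7   # pointwise sum
```

Note that `W` and `H` must be functions on the *same* sample space for `W * H` to make sense, matching the warnings above.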

Because we define operations on random variables pointwise, random variables behave the same way as real numbers do. For example,

If $X$, $Y$, and $Z$ are random variables on a probability space $(S, \Pr)$, then $X + (Y + Z) = (X + Y) + Z$.
Proof:
Choose an arbitrary $s \in S$. We have

$$(X + (Y + Z))(s) = X(s) + (Y + Z)(s) = X(s) + (Y(s) + Z(s)) = (X(s) + Y(s)) + Z(s) = (X + Y)(s) + Z(s) = ((X + Y) + Z)(s).$$

Thus $X + (Y + Z) = (X + Y) + Z$, since the two functions agree on every outcome $s$.