RANDOM VARIABLES
PART 1 Discrete random variables
PART 2 Combining random variables
PART 3 Binomial random variables
PART 4 Geometric random variables
PART 1 Discrete random variables
1. Random variable: a variable whose numerical value is determined by the outcome of a random experiment. It's a way to map/quantify the outcomes of random processes as numbers
- [Eg] Define a random variable X, where X = 1 if a coin flip lands heads and X = 0 if it lands tails
2. Random variable vs. Traditional algebra variable
(1) A random variable is usually denoted by a capital letter, such as X; a traditional algebra variable is usually denoted by a lower-case letter, such as x
(2) We can assign values to traditional variables or solve for their values; a random variable, in contrast, can take many values with different probabilities, so it makes more sense to talk about the probability that a random variable equals a value, is less than/greater than a value, or has some property
3. Types of random variable — discrete vs. continuous
(1) Discrete random variable: has a countable number of possible values. The number of values can be finite or countably infinite, but we can list the values that the random variable could take on
(2) Continuous random variable: can take all values in a given interval. We can’t count the number of possible values and can’t list the values.
[Eg1] X takes the value 1 if a coin flip lands heads and 0 if it lands tails
- X is a discrete variable
[Eg2] Y is the exact mass of a random animal selected at the New Orleans zoo.
- Y is a continuous variable.
[Eg3] X is the year a random student in the class was born
- X is a discrete variable.
[Eg4] X is the number of ants born tomorrow in the universe
- X is a discrete variable (the count is potentially huge, but still countable)
[Eg5] X is the exact winning time for the men's 100m dash in the 2016 Olympics
- X is a continuous variable
[Eg6] X is the winning time for the men's 100m dash in the 2016 Olympics, rounded to the nearest hundredth
- X is a discrete variable
4. Expected value of discrete random variable
(1) Assume a discrete random variable X can take the values x1, x2, …, xn with probabilities p1, p2, …, pn, respectively. The expected value of X is E(X) = x1·p1 + x2·p2 + … + xn·pn
(2) Expected value uses probability to tell us what outcomes to expect in the long run.
[EXERCISE] John just bought a brand new cell phone and is considering buying a warranty. The warranty costs 200 euros and is worth 1000 euros if his phone breaks. John estimates that there is a 10% chance of his phone breaking. Find the expected value of buying the warranty.
[ANSWER]
- If the cell phone breaks, the value of the warranty is 1000 − 200 = 800, with 10% probability
- If the cell phone doesn't break, the value of the warranty is 0 − 200 = −200, with 90% probability
- So the expected value is E = 0.1 × 800 + 0.9 × (−200) = 80 − 180 = −100 euros
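As a quick check, the warranty computation can be sketched in Python (hypothetical variable names; the values come from the exercise above):

```python
# Expected value of a discrete random variable: E(X) = sum of value * probability.
# Values taken from the warranty exercise above.
outcomes = [
    (1000 - 200, 0.10),  # phone breaks: warranty pays 1000, costs 200 -> net +800
    (0 - 200, 0.90),     # phone doesn't break: warranty pays nothing -> net -200
]
expected_value = sum(value * prob for value, prob in outcomes)
print(expected_value)  # -100.0
```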
5. Variance and Standard deviation of discrete random variable
Assume a discrete random variable X can take the values x1, x2, …, xn with probabilities p1, p2, …, pn, respectively, and let μ = E(X).
(1) The variance of X is Var(X) = p1·(x1 − μ)² + p2·(x2 − μ)² + … + pn·(xn − μ)²
(2) The standard deviation of X is SD(X) = √Var(X)
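A minimal Python sketch of these formulas, using a fair six-sided die as a hypothetical example distribution:

```python
import math

# Variance and standard deviation of a discrete random variable:
# Var(X) = sum p_i * (x_i - mu)^2, SD(X) = sqrt(Var(X)).
# Hypothetical example: one roll of a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

mu = sum(x * p for x, p in zip(values, probs))               # E(X) = 3.5
var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))  # 35/12 ≈ 2.917
sd = math.sqrt(var)
print(mu, var, sd)
```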
6. Probability density function: see Unit04-Modeling data distribution-Part 7
(1) A probability density function (PDF) describes the relative likelihood of outcomes for a continuous random variable; probabilities come from areas under the curve
(2) A probability mass function (PMF) gives the probability of each outcome for a discrete random variable
7. Impact of transforming random variables — scaling and shifting
Assume k is a constant and X is a random variable with mean μ_X and standard deviation σ_X. Then
(1) the impact of shifting: X + k has mean μ_X + k and standard deviation σ_X (shifting moves the center but leaves the spread unchanged)
(2) the impact of scaling: kX has mean k·μ_X and standard deviation |k|·σ_X (scaling changes both the center and the spread)
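These shift/scale rules can be verified numerically; the sketch below assumes a fair-die distribution and the hypothetical transformation Y = 2X + 10:

```python
import math

# Checking the shift/scale rules on a hypothetical distribution (fair die)
# with the hypothetical transformation Y = a*X + b.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

def mean_sd(vals, ps):
    mu = sum(x * p for x, p in zip(vals, ps))
    var = sum(p * (x - mu) ** 2 for x, p in zip(vals, ps))
    return mu, math.sqrt(var)

a, b = 2, 10
mu_x, sd_x = mean_sd(values, probs)
mu_y, sd_y = mean_sd([a * x + b for x in values], probs)

print(mu_y, a * mu_x + b)   # means agree: shift adds b, scale multiplies by a
print(sd_y, abs(a) * sd_x)  # sds agree: only the scale factor matters
```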
PART 2 Combining random variables
1. Effect on mean, standard deviation, and variance
We can form new distributions by combining random variables. If we know the mean and standard deviation of the original distributions, we can use that information to find the mean and standard deviation of the resulting distribution.
- we can always combine means directly
- we can combine variances/standard deviations only when the random variables are independent
|             | Mean                | Variance               |
| Adding      | μ_(X+Y) = μ_X + μ_Y | σ²_(X+Y) = σ²_X + σ²_Y |
| Subtracting | μ_(X−Y) = μ_X − μ_Y | σ²_(X−Y) = σ²_X + σ²_Y |
- Make sure that the variables are independent or that it’s reasonable to assume independence, before combining variances.
- Even when we subtract two random variables, we still add their variances; subtracting two variables increases the overall variability in the outcomes.
- We can find the standard deviation of the combined distributions by taking the square root of the combined variances.
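A simulation sketch (with hypothetical normal variables, not taken from the text) showing that for independent X and Y both Var(X+Y) and Var(X−Y) land near Var(X) + Var(Y):

```python
import random
import statistics

# Simulation sketch with hypothetical independent variables:
# X ~ Normal(5, sd=2) so Var(X) = 4, Y ~ Normal(3, sd=1) so Var(Y) = 1.
random.seed(0)
n = 200_000
xs = [random.gauss(5, 2) for _ in range(n)]
ys = [random.gauss(3, 1) for _ in range(n)]

var_sum = statistics.pvariance([x + y for x, y in zip(xs, ys)])
var_diff = statistics.pvariance([x - y for x, y in zip(xs, ys)])
print(round(var_sum, 2), round(var_diff, 2))  # both close to 4 + 1 = 5
```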
2. Why independence matters for variance of sum
Assume X is the number of hours a randomly selected person slept yesterday, and Y is the number of hours that same person was awake yesterday, each with some nonzero variance.
- If X and Y were independent, we would have σ²_(X+Y) = σ²_X + σ²_Y
- But X and Y are not independent: for every person X + Y = 24, so X + Y is a constant and σ²_(X+Y) = 0 ≠ σ²_X + σ²_Y; therefore the variances cannot simply be added
3. Combining normal random variables
When we combine variables that each follow a normal distribution, the resulting distribution is also normally distributed.
PART 3 Binomial random variables
1. Binomial variables: a binomial random variable represents the number of successes in n repeated trials of a Bernoulli experiment. Conditions that make a random variable a binomial random variable:
(1) the outcome of each trial can be classified as either success or failure
(2) each trial is independent of the others
(3) there is a fixed number of trials
(4) the probability of success on each trial remains constant
2. 10% rule of assuming “independence” between trials
If the sample size is less than or equal to 10% of the population, then we can assume that each trial is independent of the others
[EXERCISE] A division of a company has over 200 employees, 48% of which are male. The company is going to randomly select 10 of these employees to attend a conference. Let X be the number of male employees chosen. Is X a binomial variable? Why or why not?
A. Each trial isn't being classified as success or failure, so X is not a binomial variable
B. There is no fixed number of trials, so X is not a binomial variable
C. The trials are not independent, so X is not a binomial variable
D. This situation satisfies each of the conditions for a binomial variable, so X has a binomial distribution
[ANSWER] D is the correct answer
A. False. It’s a success when the person chosen is a male, and failure when the person chosen is not a male.
B. False. The number of trials is fixed at 10 employees.
C. False. Even though they are being sampled without replacement, the sample size of 10 employees is less than 10% of the population size, so we can consider the trials independent.
D. True. It satisfies the 4 conditions for a binomial random variable: (1) each trial has two outcomes (male or not); (2) the trials can be considered independent because of the 10% rule; (3) there is a fixed number of trials, which is 10 employees; (4) the probability of success is approximately the same for each trial, 48%, because of the 10% rule.
3. Binomial distribution
(1) Definition: the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in n independent Bernoulli trials, each with success probability p
(2) If a random variable X follows the binomial distribution with parameters n and p, then we write X ~ B(n, p)
(3) Probability mass function: if X ~ B(n, p), the probability of getting exactly k successes in n independent Bernoulli trials is P(X = k) = C(n, k)·p^k·(1−p)^(n−k), for k = 0, 1, …, n, where C(n, k) = n!/(k!(n−k)!)
(4) Cumulative distribution function: P(X ≤ k) = Σ_{i=0}^{k} C(n, i)·p^i·(1−p)^(n−i)
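The PMF and CDF can be written directly with Python's math.comb; the example probability below reuses the numbers from the conference exercise (n = 10, p = 0.48):

```python
from math import comb

# Binomial PMF and CDF written out directly.
def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    # P(X <= k): sum the PMF from 0 through k
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Probability that exactly 5 of the 10 chosen employees are male (p = 0.48):
print(round(binom_pmf(5, 10, 0.48), 4))
```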
4. Why is the binomial distribution important?
Many discrete processes can reasonably be modeled with a binomial distribution: it describes the count of successes across repeated events that each have only two outcomes, and such two-outcome events are very common in practice.
5. Normal distribution vs. Binomial distribution
(1) Normal distribution describes continuous data which have a symmetric distribution, with a characteristic “bell” shape
(2) Binomial distribution describes the distribution of binary data from a finite sample
(3) The binomial distribution can be thought of as a discrete analogue of the normal distribution; as the number of trials grows, the binomial distribution approaches a normal distribution (so the normal distribution can be used as an approximation to the binomial distribution)
(4) A common rule of thumb is that the normal approximation works well when np ≥ 10 and n(1−p) ≥ 10.
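A rough check of this rule of thumb, comparing the exact binomial CDF with the normal approximation (hypothetical parameters n = 100, p = 0.5, with a continuity correction):

```python
from math import comb, sqrt
from statistics import NormalDist

# Comparing the exact binomial CDF with its normal approximation for
# hypothetical parameters n = 100, p = 0.5 (np = n(1-p) = 50, well above 10).
n, p = 100, 0.5
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(46))  # P(X <= 45)
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = normal.cdf(45.5)  # 45.5 rather than 45: continuity correction
print(round(exact, 4), round(approx, 4))
```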
6. Bernoulli distribution
(1) Definition: the Bernoulli distribution is the discrete probability distribution of a single Bernoulli trial: X = 1 (success) with probability p and X = 0 (failure) with probability 1 − p
(2) Expected value for Bernoulli distribution: E(X) = 1·p + 0·(1 − p) = p
(3) Variance for Bernoulli distribution: Var(X) = p·(1 − p)
7. Bernoulli distribution vs. Binomial distribution
(1) The Bernoulli distribution represents the success or failure of a single Bernoulli trial
(2) The Binomial distribution represents the number of successes in n independent Bernoulli trials
(3) Expected value
- Bernoulli distribution: E(X) = p
- Binomial distribution: E(X) = np
(4) Variance
- Bernoulli distribution: Var(X) = p(1 − p)
- Binomial distribution: Var(X) = np(1 − p)
PART 4 Geometric random variables
1. Geometric random variable vs. Binomial random variable
(1) Binomial random variable: must follow the following conditions:
- the outcome of each trial can be classified as either success or failure
- each trial is independent of the others
- there is a fixed number of trials
- the probability of success on each trial remains constant
[Eg] X is the number of 6s in 12 rolls of a fair die; X is a binomial random variable
- It answers "How many successes in a fixed, finite number of trials?"
(2) Geometric random variable: must follow the following conditions:
- the outcome of each trial can be classified as either success or failure
- each trial is independent of the others
- the probability of success on each trial remains constant
[Eg] X is the number of rolls until a 6 comes up on a fair die; X is a geometric random variable
- It answers "How many trials until the first success?"
2. Geometric distribution: the probability distribution of the number (X) of Bernoulli trials needed to get one success, where each trial succeeds with probability p
(1) Expected value: E(X) = 1/p
(2) Variance: Var(X) = (1 − p)/p²
3. Probability mass function and Cumulative distribution function for Geometric distribution
(1) PMF: P(X = k) = (1 − p)^(k−1)·p, for k = 1, 2, 3, …
(2) CDF: P(X ≤ k) = 1 − (1 − p)^k
(3) P(X > k) = P(the first k trials all fail) = (1 − p)^k
(4) P(X ≤ k) = 1 − P(X > k) = 1 − (1 − p)^k
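A small sketch of these geometric formulas, using the fair-die example above (p = 1/6):

```python
# Geometric distribution sketch for the fair-die example: roll until a 6, p = 1/6.
p = 1 / 6

def geom_pmf(k, p):
    # P(X = k): the first k-1 trials fail, then the k-th succeeds
    return (1 - p) ** (k - 1) * p

def geom_cdf(k, p):
    # P(X <= k) = 1 - P(X > k) = 1 - P(first k trials all fail)
    return 1 - (1 - p) ** k

expected_rolls = 1 / p  # E(X) = 1/p = 6 rolls on average
print(round(geom_cdf(6, p), 4))  # chance of seeing a 6 within the first 6 rolls
```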
[EXERCISE] Violet is taking a computer-adaptive test, where each time she answers a question correctly, the computer gives her a more difficult question. Let be the number of questions Violet answers correctly before she misses one. What type of variable is ?
[ANSWER] It’s neither binomial random variable nor geometric random variable.
- Because it doesn’t have fixed times of trials, it’s not a binomial random variable.
- The probability of success in each trial is not the same because the difficulty of the question changes each time, it’s not a geometric random variable.
PART 5 Law of large numbers
In probability and statistics, the law of large numbers states that as a sample size grows, the mean of the sample gets closer to the expected value of the whole population
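The law can be sketched by simulation with fair-die rolls (expected value 3.5); the sample sizes below are arbitrary:

```python
import random

# Law of large numbers, sketched by simulation: the running mean of fair-die
# rolls approaches the expected value 3.5 as the sample size grows.
random.seed(1)
rolls = [random.randint(1, 6) for _ in range(100_000)]

for n in (10, 1_000, 100_000):
    print(n, sum(rolls[:n]) / n)  # the gap from 3.5 typically shrinks as n grows
```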
PART 6 Poisson distribution
1. Poisson random variable: a count of the number of occurrences of an event in a given unit of time, distance, area, or volume
2. Poisson distribution: is the discrete probability distribution of the number of events occurring in a given time period, given the average number of times the event occurs over that time period.
3. Conditions for Poisson distribution / Poisson random variable:
(1) Events occur independently
- If an event occurs, it doesn’t affect the probability of another event occurring in the same time period.
(2) The rate of occurrence is constant. / The probability that an event occurs in a given length of time doesn’t change through time
(3) The expected number of events is proportional to the length of the time period.
- For example, if a certain fast-food restaurant gets an average of 3 visitors to the drive-through per minute and the count follows a Poisson distribution, then the expected number of visitors in a 2-minute period is 6.
4. Probability mass function and Cumulative distribution function for Poisson distribution
A discrete random variable X is said to have a Poisson distribution with parameter λ > 0, where λ represents the average number of times an event occurs in a given time interval. Then for k = 0, 1, 2, …
(1) PMF: P(X = k) = λ^k·e^(−λ)/k!, where k represents the number of times the event actually occurs in that given time interval
(2) CDF: P(X ≤ k) = Σ_{i=0}^{k} λ^i·e^(−λ)/i!
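These Poisson formulas in a short Python sketch, reusing the drive-through rate λ = 3 from the earlier example:

```python
from math import exp, factorial

# Poisson PMF and CDF, with lam = 3 visitors per minute (from the
# drive-through example above).
def poisson_pmf(k, lam):
    # P(X = k) = lam^k * e^(-lam) / k!
    return lam**k * exp(-lam) / factorial(k)

def poisson_cdf(k, lam):
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

print(round(poisson_pmf(3, 3), 4))  # P(exactly 3 visitors in one minute)
```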
5. Expected value and Variance for Poisson distribution
(1) Expected value: E(X) = λ
(2) Variance: Var(X) = λ