Probability Theory 3.1-3.3
3.1 Random Variables
DEFINITION:
For a given sample space S of some experiment, a random variable (rv) is any rule that associates a number with each outcome in S. In mathematical language, a random variable is a function whose domain is the sample space and whose range is the set of real numbers.
Random variables are customarily denoted by uppercase letters near the end of our alphabet, such as X and Y. We will use lowercase letters to represent particular values of the corresponding random variable.
The notation X(s) = x means that x is the value associated with the outcome s by the rv X.
DEFINITION:
Any random variable whose only possible values are 0 and 1 is called a Bernoulli random variable.
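The definition of an rv as a function on the sample space can be made concrete with a minimal Python sketch. The single-coin-toss sample space and the rule X below are illustrative assumptions, not from the text:

```python
# A random variable is a function from the sample space to the reals.
# Hypothetical example: toss one fair coin; X(s) = 1 if s is "H", else 0,
# which makes X a Bernoulli random variable (only possible values: 0 and 1).
sample_space = ["H", "T"]

def X(s):
    """Bernoulli rv: associates a number (0 or 1) with each outcome s."""
    return 1 if s == "H" else 0

print({X(s) for s in sample_space})  # {0, 1}
```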
Two Types of Random Variables
A discrete random variable is an rv whose possible values either constitute a finite set or else can be listed in an infinite sequence in which there is a first element, a second element, and so on ("countably" infinite).
A random variable is continuous if both of the following apply:
- Its set of possible values consists either of all numbers in a single interval on the number line (possibly infinite in extent, e.g., from $-\infty$ to $\infty$) or all numbers in a disjoint union of such intervals (e.g., $[0,10] \cup [20,30]$)
- No possible value of the variable has positive probability, that is, P(X = c) = 0 for any possible value c.
3.2 Probability Distributions for Discrete Random Variables
p(x) will denote the probability assigned to the value x, i.e., p(x) = P(X = x).
DEFINITION:
The probability distribution or probability mass function (pmf) of a discrete rv is defined for every number x by $p(x) = P(X = x) = P(\text{all } s \in S : X(s) = x)$.
The conditions $p(x) \ge 0$ and $\sum_{\text{all possible } x} p(x) = 1$ are required of any pmf.
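The two pmf conditions are easy to check numerically. A minimal sketch, assuming a hypothetical pmf (number of heads in two fair coin tosses):

```python
import math

# Hypothetical pmf: X = number of heads in two fair coin tosses.
p = {0: 0.25, 1: 0.50, 2: 0.25}

# Condition 1: p(x) >= 0 for every possible x.
assert all(prob >= 0 for prob in p.values())
# Condition 2: the probabilities sum to 1.
assert math.isclose(sum(p.values()), 1.0)
```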
A Parameter of a Probability Distribution
DEFINITION:
Suppose p(x) depends on a quantity that can be assigned any one of a number of possible values, with each different value determining a different probability distribution. Such a quantity is called a parameter of the distribution. The collection of all probability distributions for different values of the parameter is called a family of probability distributions.
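The Bernoulli family is the standard illustration: each value of the parameter $\alpha \in (0, 1)$ determines a different pmf. A minimal sketch (the factory function name is my own):

```python
# The Bernoulli family of distributions, parameterized by alpha:
# p(1) = alpha, p(0) = 1 - alpha. Each alpha gives a different pmf.
def bernoulli_pmf(alpha):
    """Return the Bernoulli pmf (as a dict) for parameter value alpha."""
    return {0: 1 - alpha, 1: alpha}

# Three members of the family, one per parameter value.
family = {alpha: bernoulli_pmf(alpha) for alpha in (0.1, 0.5, 0.9)}
print(family[0.5])  # {0: 0.5, 1: 0.5}
```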
The Cumulative Distribution Function
DEFINITION:
The cumulative distribution function (cdf) F(x) of a discrete rv variable X with pmf p(x) is defined for every number x by
$F(x) = P(X \le x) = \sum_{y:\, y \le x} p(y)$
For any number x, F(x) is the probability that the observed value of X will be at most x.
For X a discrete rv, the graph of F(x) will have a jump at every possible value of X and will be flat between possible values. Such a graph is called a step function.
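Computing the cdf from a pmf is a direct sum. A minimal sketch, again assuming the hypothetical two-coin pmf:

```python
# Hypothetical pmf: X = number of heads in two fair coin tosses.
p = {0: 0.25, 1: 0.50, 2: 0.25}

def F(x, pmf):
    """cdf: F(x) = P(X <= x) = sum of p(y) over all possible y <= x."""
    return sum(prob for y, prob in pmf.items() if y <= x)

print(F(1, p))    # 0.75 -- jump at the possible value 1
print(F(1.5, p))  # 0.75 -- flat between possible values (step function)
```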
PROPOSITION:
For any two numbers a and b with $a \le b$,
$P(a \le X \le b) = F(b) - F(a-)$
where “a-” represents the largest possible X value that is strictly less than a.
In particular, if the only possible values are integers and if a and b are integers, then
$P(a \le X \le b) = P(X = a \text{ or } a+1 \text{ or } \dots \text{ or } b) = F(b) - F(a-1)$
Taking a=b yields P(X = a) = F(a) - F(a - 1) in this case.
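The integer-valued identity can be verified numerically. A minimal sketch, assuming the same hypothetical two-coin pmf:

```python
import math

# Hypothetical pmf: X = number of heads in two fair coin tosses.
p = {0: 0.25, 1: 0.50, 2: 0.25}

def F(x, pmf):
    """cdf of the discrete rv with the given pmf."""
    return sum(prob for y, prob in pmf.items() if y <= x)

a, b = 1, 2
direct = sum(p[x] for x in range(a, b + 1))  # P(X = 1) + P(X = 2)
via_cdf = F(b, p) - F(a - 1, p)              # F(b) - F(a - 1)
assert math.isclose(direct, via_cdf)         # both equal 0.75

# Taking a = b recovers the pmf: P(X = a) = F(a) - F(a - 1).
assert math.isclose(F(1, p) - F(0, p), p[1])
```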
3.3 Expected Values
The Expected Value of X
DEFINITION:
Let X be a discrete rv with set of possible values D and pmf p(x). The expected value or mean value of X, denoted by E(X) or $\mu_X$ or just $\mu$, is
$E(X) = \mu_X = \sum_{x \in D} x \cdot p(x)$
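The definition translates directly into a weighted sum. A minimal sketch, assuming the hypothetical two-coin pmf:

```python
# Hypothetical pmf: X = number of heads in two fair coin tosses.
p = {0: 0.25, 1: 0.50, 2: 0.25}

# E(X) = sum over x in D of x * p(x)
mean = sum(x * prob for x, prob in p.items())
print(mean)  # 1.0
```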
The Expected Value of a Function
Sometimes interest will focus on the expected value of some function h(X) rather than on just E(X).
PROPOSITION:
If the rv X has a set of possible values D and pmf p(x), then the expected value of any function h(X), denoted by E[h(X)] or $\mu_{h(X)}$, is computed by
$E[h(X)] = \sum_{D} h(x) \cdot p(x)$
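Note that h(x) replaces x as the weight on p(x), so no new distribution is needed. A minimal sketch with the illustrative choice $h(x) = x^2$, assuming the hypothetical two-coin pmf:

```python
# Hypothetical pmf: X = number of heads in two fair coin tosses.
p = {0: 0.25, 1: 0.50, 2: 0.25}

def h(x):
    """An illustrative function of X (here h(x) = x squared)."""
    return x ** 2

# E[h(X)] = sum over D of h(x) * p(x) -- no new pmf required.
e_h = sum(h(x) * prob for x, prob in p.items())
print(e_h)  # 1.5
```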
Rules of Expected Value
The h(X) function of interest is quite frequently a linear function aX + b. In this case, E[h(X)] is easily computed from E(X).
PROPOSITION:
$E(aX + b) = a \cdot E(X) + b$
(Or, using alternative notation, $\mu_{aX+b} = a \cdot \mu_X + b$.)
Two special cases of the proposition yield two important rules of expected value.
- For any constant a, $E(aX) = a \cdot E(X)$ (take b = 0)
- For any constant b, $E(X + b) = E(X) + b$ (take a = 1)
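The linearity rule and both special cases can be checked numerically. A minimal sketch, assuming the hypothetical two-coin pmf and arbitrary illustrative constants a and b:

```python
import math

# Hypothetical pmf: X = number of heads in two fair coin tosses.
p = {0: 0.25, 1: 0.50, 2: 0.25}

def E(f):
    """E[f(X)] = sum of f(x) * p(x) over all possible x."""
    return sum(f(x) * prob for x, prob in p.items())

a, b = 3.0, 2.0                     # illustrative constants
mean = E(lambda x: x)               # E(X) = 1.0

assert math.isclose(E(lambda x: a * x + b), a * mean + b)  # E(aX+b) = aE(X)+b
assert math.isclose(E(lambda x: a * x), a * mean)          # special case b = 0
assert math.isclose(E(lambda x: x + b), mean + b)          # special case a = 1
```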
The Variance of X
The expected value of X describes where the probability distribution is centered. Using the physical analogy of placing point mass p(x) at the value x on a one-dimensional axis, if the axis were then supported by a fulcrum placed at $\mu$, there would be no tendency for the axis to tilt.
DEFINITION:
Let X have pmf p(x) and expected value $\mu$. Then the variance of X, denoted by V(X) or $\sigma_X^2$, or just $\sigma^2$, is
$V(X) = \sum_{D} (x - \mu)^2 \cdot p(x) = E[(X - \mu)^2]$
The standard deviation (SD) of X is
$\sigma_X = \sqrt{\sigma_X^2}$
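Both quantities follow directly from the definitions. A minimal sketch, assuming the hypothetical two-coin pmf:

```python
import math

# Hypothetical pmf: X = number of heads in two fair coin tosses.
p = {0: 0.25, 1: 0.50, 2: 0.25}

mu = sum(x * prob for x, prob in p.items())               # E(X) = 1.0
var = sum((x - mu) ** 2 * prob for x, prob in p.items())  # V(X) by definition
sd = math.sqrt(var)                                       # standard deviation
print(var)  # 0.5
print(sd)   # 0.7071067811865476
```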
A Shortcut Formula for $\sigma^2$
The number of arithmetic operations necessary to compute $\sigma^2$ can be reduced by using an alternative formula.
PROPOSITION:
$V(X) = \sigma^2 = \left[\sum_{D} x^2 \cdot p(x)\right] - \mu^2 = E(X^2) - [E(X)]^2$
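The definitional and shortcut formulas agree, which a quick numerical check confirms. A minimal sketch, assuming the hypothetical two-coin pmf:

```python
import math

# Hypothetical pmf: X = number of heads in two fair coin tosses.
p = {0: 0.25, 1: 0.50, 2: 0.25}

mu = sum(x * prob for x, prob in p.items())
# Definitional formula: E[(X - mu)^2]
definitional = sum((x - mu) ** 2 * prob for x, prob in p.items())
# Shortcut formula: E(X^2) - mu^2
shortcut = sum(x ** 2 * prob for x, prob in p.items()) - mu ** 2
assert math.isclose(definitional, shortcut)  # both equal 0.5
```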
Rules of Variance
PROPOSITION:
$V(aX + b) = \sigma_{aX+b}^2 = a^2 \cdot \sigma_X^2 \quad \text{and} \quad \sigma_{aX+b} = |a| \cdot \sigma_X$
In particular,
$\sigma_{aX} = |a| \cdot \sigma_X, \quad \sigma_{X+b} = \sigma_X$
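The scaling and shift rules can also be checked numerically; note that a negative a still yields the positive factor |a| in the standard deviation. A minimal sketch, assuming the hypothetical two-coin pmf and an illustrative negative constant:

```python
import math

# Hypothetical pmf: X = number of heads in two fair coin tosses.
p = {0: 0.25, 1: 0.50, 2: 0.25}

def var_of(f):
    """Variance of f(X) under pmf p, computed from the definition."""
    mean = sum(f(x) * prob for x, prob in p.items())
    return sum((f(x) - mean) ** 2 * prob for x, prob in p.items())

a, b = -3.0, 2.0                      # illustrative constants (a < 0 on purpose)
vx = var_of(lambda x: x)              # V(X) = 0.5

assert math.isclose(var_of(lambda x: a * x + b), a ** 2 * vx)  # V(aX+b) = a^2 V(X)
assert math.isclose(math.sqrt(var_of(lambda x: a * x)),
                    abs(a) * math.sqrt(vx))                    # sigma_aX = |a| sigma_X
assert math.isclose(var_of(lambda x: x + b), vx)               # shifting changes nothing
```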