Course grade composition
- Daily performance: 25%
- Closed-book final exam: 75%
Chapter 1 Introduction to Probability
Experiments and Events
- Experiment: any process, real or hypothetical, in which the possible outcomes can be identified ahead of time.
- Sample space: the set of all possible outcomes.
- Exhaustive: all possible outcomes must be included.
- Mutually exclusive (互斥的): no two outcomes can occur at the same time.
- Event: any subset of the sample space.
Set Theory
- Membership: element $s$ belongs to set $S$: $s \in S$
- Containment: $A \subset B$, equivalently $B \supset A$
- Empty set: $A = \varnothing$
- Countable; uncountable
- Complement: $A^c = \{s\in S\mid s\notin A\}$
- Union of sets: $A\cup B=\{ s\in S \mid s\in A \text{ or } s\in B\}$
- Union of many sets: $\bigcup_{i=1}^n A_i$
- Intersection of sets: $A\cap B$
- Difference of sets: $A-B = A\cap B^c = A-(A\cap B)$
- Disjoint / mutually exclusive: $A\cap B = \varnothing$
- Theorem 1.4.11 (Partitioning a set): $A = (A\cap B)\cup (A\cap B^c)$ and $B = (A\cap B)\cup(A^c\cap B)$
- properties of set operations
- commutative laws
- associative laws
- distributive laws: $A\cup (B\cap C) = (A\cup B)\cap (A\cup C)$, $A\cap(B\cup C) = (A\cap B)\cup (A\cap C)$
- De Morgan's laws: $(A\cup B)^c = A^c\cap B^c$, $(A\cap B)^c = A^c\cup B^c$
- Certain event (必然事件): an event that must occur, i.e., the sample space $S$ itself.
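The set operations and laws above can be illustrated with Python's built-in `set` type; a minimal sketch (the sample space `S` and events `A`, `B` are made-up examples):

```python
# Illustrating the set operations above with Python's built-in set type.
# S is a small sample space; A and B are events (subsets of S).
S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}
B = {2, 4, 6}

union = A | B          # A ∪ B
intersection = A & B   # A ∩ B
difference = A - B     # A − B = A ∩ Bᶜ
A_complement = S - A   # Aᶜ (complement taken inside S)

# De Morgan's laws: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
assert S - (A | B) == (S - A) & (S - B)
assert S - (A & B) == (S - A) | (S - B)

# Partition theorem 1.4.11: A = (A ∩ B) ∪ (A ∩ Bᶜ)
assert A == (A & B) | (A & (S - B))

print(union, intersection, difference, A_complement)
```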
The definition of probability: Axioms and Theorems
- Probability: a probability measure, or simply a probability, on a sample space $S$ is a specification of numbers $Pr(A)$ for all events $A$ that satisfies Axioms 1, 2, and 3.
- Not disjoint (share common elements): $Pr(E\cup F) = Pr(E)+Pr(F)-Pr(E\cap F)$
finite sample space
- Finite sample space: a sample space with a finite number of possible outcomes (有限样本空间). When all $n$ outcomes are equally likely, $Pr(A) = \frac{m}{n}$
- $m$ is the number of outcomes in the event $A$
- $n$ is the number of outcomes in the sample space
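A quick counting check of this rule, using one roll of a fair die as the (made-up) example:

```python
# Simple finite sample space: one roll of a fair die.
# Pr(A) = m/n, where m = |A| and n = |S|; this counting rule assumes
# every outcome is equally likely.
S = {1, 2, 3, 4, 5, 6}
A = {x for x in S if x % 2 == 0}   # event "roll an even number"
pr_A = len(A) / len(S)
print(pr_A)  # 0.5
```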
Counting Methods
- Multiplication Rule
- Permutations
- sampling without replacement (没有放回的抽样)
- $P_{n,k} = n(n-1)\cdots (n-k+1) = \frac{n!}{(n-k)!}$; the probability that $k$ ordered draws with replacement are all distinct is $p = \frac{P_{n,k}}{n^k}$
- Combinatorial Methods
- $C_{n,k} = \binom{n}{k} = \frac{n!}{k!(n-k)!}$
- Only the number $k$ of items chosen matters; the internal order of the $k$ chosen items is irrelevant.
- Unordered sampling with replacement (e.g., blood types, selecting baked goods): $C_{n+k-1,k}$
Multinomial Coefficients
$\binom{n}{n_1\ n_2\ \cdots\ n_k}=\frac{n!}{n_1!\,n_2!\cdots n_k!}$
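The counting formulas above can be sketched with Python's standard `math` module (the specific numbers are illustrative only):

```python
from math import comb, factorial, perm

# P_{n,k} = n!/(n-k)! — ordered sampling without replacement
assert perm(10, 3) == 10 * 9 * 8 == 720

# C_{n,k} = n!/(k!(n-k)!) — unordered sampling without replacement
assert comb(10, 3) == 120

# Unordered sampling WITH replacement: C_{n+k-1, k}
def multichoose(n, k):
    return comb(n + k - 1, k)

# Multinomial coefficient: n! / (n1! n2! ... nk!)
def multinomial(*groups):
    n = sum(groups)
    out = factorial(n)
    for g in groups:
        out //= factorial(g)
    return out

# Splitting 10 items into groups of sizes 3, 3, 4:
assert multinomial(3, 3, 4) == 4200
print(multichoose(4, 2), multinomial(3, 3, 4))  # 10 4200
```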
The probability of a Union of Events
- For 3 events: $Pr(A\cup B\cup C)= Pr(A)+Pr(B)+Pr(C)-Pr(A\cap B)-Pr(A\cap C)-Pr(B\cap C)+Pr(A\cap B\cap C)$
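The three-event inclusion–exclusion formula can be brute-force checked on a small equally likely sample space (the events below are arbitrary examples):

```python
# Brute-force check of inclusion–exclusion for three events over a
# small equally likely sample space (outcomes 0..29).
S = set(range(30))
A = {s for s in S if s % 2 == 0}
B = {s for s in S if s % 3 == 0}
C = {s for s in S if s % 5 == 0}

def pr(E):
    return len(E) / len(S)

lhs = pr(A | B | C)
rhs = (pr(A) + pr(B) + pr(C)
       - pr(A & B) - pr(A & C) - pr(B & C)
       + pr(A & B & C))
assert abs(lhs - rhs) < 1e-12
print(lhs)
```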
Chapter 2 Conditional Probability
definition of the conditional probability
$Pr(A\mid B) = \frac{Pr(A\cap B)}{Pr(B)}$, provided $Pr(B)>0$.
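The definition can be checked by counting on a finite sample space; a small sketch with two fair-die rolls (the events are made-up examples):

```python
from itertools import product

# Conditional probability from counts: two fair-die rolls.
# A = "the sum is 8", B = "the first die shows an even number".
S = set(product(range(1, 7), repeat=2))
A = {s for s in S if s[0] + s[1] == 8}
B = {s for s in S if s[0] % 2 == 0}

pr_B = len(B) / len(S)
pr_A_and_B = len(A & B) / len(S)
pr_A_given_B = pr_A_and_B / pr_B   # Pr(A|B) = Pr(A∩B)/Pr(B)
print(pr_A_given_B)  # 1/6
```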
Multiplication rule:
$Pr(A\cap B) = Pr(A\mid B)Pr(B)$
More terms:
$Pr(A\cap B\cap C\cap D) = Pr(A)\,Pr(B\mid A)\,Pr(C\mid A\cap B)\,Pr(D\mid A\cap B\cap C)$
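A classic worked instance of this chain: the probability that the first four cards drawn from a shuffled 52-card deck are the four aces, computed with exact fractions:

```python
from fractions import Fraction

# Multiplication rule chain: Pr(A1 ∩ A2 ∩ A3 ∩ A4)
#   = Pr(A1) Pr(A2|A1) Pr(A3|A1∩A2) Pr(A4|A1∩A2∩A3),
# where Ai = "the i-th card drawn (without replacement) is an ace".
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50) * Fraction(1, 49)
print(p)  # 1/270725
```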
For complements:
$Pr(A^c\mid B) = 1-Pr(A\mid B)$
$Pr(A^c\mid B\cap C) = 1-Pr(A\mid B\cap C)$
conditional version of the Multiplication Rule
$Pr(A_1\cap A_2\cap \cdots \cap A_n \mid B) = Pr(A_1\mid B)\,Pr(A_2\mid A_1\cap B)\,Pr(A_3\mid A_1\cap A_2\cap B)\cdots Pr(A_n\mid A_1\cap A_2 \cap \cdots\cap A_{n-1}\cap B)$
partitions
$Pr(A) = Pr(B)Pr(A\mid B)+Pr(B^c)Pr(A\mid B^c)$
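A numeric sketch of this law with the partition $\{B, B^c\}$; the scenario and all numbers below are hypothetical:

```python
from fractions import Fraction

# Law of total probability with the partition {B, Bc}.
# Hypothetical setup: machine B makes 30% of all parts with a 2%
# defect rate; the other machines (Bc) have a 1% defect rate.
# A = "a randomly chosen part is defective".
pr_B = Fraction(3, 10)
pr_A_given_B = Fraction(2, 100)    # Pr(A|B)
pr_A_given_Bc = Fraction(1, 100)   # Pr(A|Bc)

pr_A = pr_B * pr_A_given_B + (1 - pr_B) * pr_A_given_Bc
print(pr_A)  # 13/1000
```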
Augmented Experiment
- if desired, any experiment can be augmented to include the potential or hypothetical observation of as much additional information as we would find useful to help us calculate any probabilities that we desire.
independent events
- If the occurrence of $B$ does not affect $A$, we say $A$ and $B$ are independent.
- If $A$ and $B$ are not independent, but would be independent once $C$ occurs, then $A$ and $B$ are conditionally independent given $C$.
- Two events $A$ and $B$ are independent if $Pr(A\mid B) = Pr(A)$
- Theorem 2.2.1
If two events $A$ and $B$ are independent, then $A$ and $B^c$ are also independent.
- Mutually Independent Events
- Mutually exclusive $\ne$ mutually independent
- Theorem 2.2.3
Let $n>1$ and let $A_1, \cdots, A_n$ be events that are mutually exclusive. The events are also mutually independent if and only if all the events, except possibly one, have probability 0.
$Pr(A_i\cap A_j) = Pr(A_i)Pr(A_j) = 0$
Summary:
- A collection of events is independent if and only if learning that some of them occur does not change the probability that any combination of the rest of them occurs.
- Equivalently, a collection of events is independent if and only if the probability of the intersection of every subcollection equals the product of the individual probabilities; for two events, $Pr(A\cap B) = Pr(A)Pr(B)$
- The concept of independence has a version conditional on another event.
Bayes’ Theorem
$Pr(B_i\mid A )= \frac{Pr(A \cap B_i)}{Pr(A)} = \frac{Pr(B_i)Pr(A\mid B_i)}{\sum\limits_{j=1}^k Pr(B_j)Pr(A\mid B_j)}$
- Prior and posterior probabilities
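A sketch of updating a prior into a posterior via Bayes' theorem; the test scenario and all rates below are hypothetical numbers chosen for illustration:

```python
from fractions import Fraction

# Bayes' theorem with a two-event partition {B1, B2}.
# Hypothetical numbers: prior Pr(B1) = 1% (has the condition),
# sensitivity Pr(A|B1) = 99%, false-positive rate Pr(A|B2) = 5%,
# where A = "the test is positive".
priors = [Fraction(1, 100), Fraction(99, 100)]       # Pr(B1), Pr(B2)
likelihoods = [Fraction(99, 100), Fraction(5, 100)]  # Pr(A|B1), Pr(A|B2)

# Denominator: Pr(A) by the law of total probability.
evidence = sum(p * l for p, l in zip(priors, likelihoods))
posterior = priors[0] * likelihoods[0] / evidence    # Pr(B1|A)
print(posterior)  # 1/6
```

Even with a 99%-sensitive test, the posterior is only 1/6 because the prior is so small; this is why distinguishing prior from posterior probabilities matters.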
summary:
- Definition of Conditional Probability
- Law of Total Probability
- Independent Events
- Bayes’ Theorem
Chapter 3 Random Variables And Distributions
definition of a Random Variable
- Let $S$ be the sample space for an experiment. A real-valued function defined on $S$ is called a random variable.
The distribution of a random variable
- Let $X$ be a random variable. The distribution of $X$ is the collection of all probabilities of the form $Pr(X\in C)$ for all sets $C$ of real numbers such that $\{X\in C\}$ is an event.
- In other words, the distribution of a random variable is the collection of probabilities that the variable takes values in given sets.
- Discrete Distribution / Random Variable
A random variable $X$ is called discrete if it takes a finite number $k$ of different values $x_1,\cdots,x_k$, or, at most, an infinite sequence of different values $x_1, x_2,\cdots$
- Probability function:
$f(x) = Pr(X=x)$, also called the probability mass function (p.m.f.)
- The closure of the set $\{x: f(x)>0\}$ is called the support of (the distribution of) $X$
- important properties:
$f(x)\ge 0$
$\sum\limits_{i=1}^{\infty}f(x_i) = 1$
$Pr(X\in C) = \sum\limits_{x_i\in C}f(x_i)$
Typical Discrete Distributions
- Bernoulli Distribution
- a random variable $Z$ takes only the values 0 and 1, with $Pr(Z=1) = p$
- $f(x\mid p) = \begin{cases}p^x(1-p)^{1-x}& x=0,1\\0&\text{otherwise}\end{cases}$
- It is also called the two-point distribution.
- Bernoulli Trials/Process
- independent and identically distributed (i.i.d.)
- every random variable $X_i$ has a Bernoulli distribution
- e.g.
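For instance, a Bernoulli process can be simulated with the standard library; a minimal sketch (the values of `p` and `n` are arbitrary):

```python
import random

# Simulating n i.i.d. Bernoulli trials with Pr(X_i = 1) = p.
# By the law of large numbers, the fraction of 1s should be close to p.
random.seed(0)  # fixed seed so the run is reproducible
p, n = 0.3, 100_000
trials = [1 if random.random() < p else 0 for _ in range(n)]
print(sum(trials) / n)  # close to 0.3
```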
Continuous Distribution/Random Variable
$Pr(a\le X\le b) = \int_a^b f(x)\,dx$
- Uniform distribution on an interval: $f(x) = \begin{cases}\frac1{b-a}&a\le x\le b\\0&\text{otherwise}\end{cases}$, written $X\sim U(a,b)$
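The integral above can be approximated numerically; a minimal midpoint-rule sketch for $X\sim U(0,10)$, where $Pr(2\le X\le 5)$ should come out to $3/10$:

```python
# Pr(a <= X <= b) for a continuous random variable is the integral of
# its density. Midpoint-rule sketch for X ~ U(0, 10).
def uniform_pdf(x, a=0.0, b=10.0):
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(f, lo, hi, steps=10_000):
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

p = integrate(uniform_pdf, 2.0, 5.0)   # Pr(2 <= X <= 5)
print(round(p, 6))  # 0.3
```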
The cumulative distribution function
The distribution function or cumulative distribution function (
c
.
d
.
f
c.d.f
c.d.f) F of a random variable
X
X
X is the function
F
(
x
)
=
P
r
(
X
≤
x
)
−
∞
<
x
<
∞
F(x) = Pr(X\le x)\ \ \ -\infty<x<\infty
F(x)=Pr(X≤x) −∞<x<∞
Properties:
- Nondecreasing function
- Limits: $\lim\limits_{x\to-\infty}F(x)=0$ and $\lim\limits_{x\to\infty}F(x)=1$
- Continuous from the right: $F(x) = F(x^+)$
Bernoulli c.d.f: $F(x) = \begin{cases}0&x<0\\1-p&0\le x<1\\1&x\ge1\end{cases}$
Binomial c.d.f: $F(x) = \begin{cases}0&x<0\\\sum\limits_{i=0}^{\lfloor x\rfloor}\binom ni p^i(1-p)^{n-i}&0\le x< n\\1&x\ge n\end{cases}$
Uniform c.d.f (uniform distribution on the interval $[a,b]$): $F(x) = \begin{cases}0&x\le a\\\int_a^x\frac 1{b-a}\,dy = \frac{x-a}{b-a}&a< x\le b\\1&x> b\end{cases}$
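The uniform c.d.f above translates directly into a piecewise function; a minimal sketch (the endpoints used in the checks are arbitrary):

```python
# Piecewise c.d.f of the uniform distribution on [a, b], as given above.
def uniform_cdf(x, a, b):
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# F is nondecreasing and runs from 0 to 1.
assert uniform_cdf(-1, 0, 4) == 0.0
assert uniform_cdf(1, 0, 4) == 0.25
assert uniform_cdf(9, 0, 4) == 1.0
print(uniform_cdf(3, 0, 4))  # 0.75
```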
Standard Normal c.d.f:
$F(x)=\Phi(x) = \int\limits_{-\infty}^x\frac 1{\sqrt{2\pi}}e^{-\frac{t^2}2}\,dt$
Theorems:
$\Phi(-x) = 1-\Phi(x)$
$Pr(X>x) = 1-F(x)$
$Pr(x_1<X\le x_2) = F(x_2) - F(x_1)$
$Pr(X=x) = F(x) - F(x^-)$
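These identities can be checked numerically. One common way to express the standard normal c.d.f is through the error function, $\Phi(x) = \frac12\left(1 + \operatorname{erf}(x/\sqrt 2)\right)$, which is available in Python's `math` module:

```python
from math import erf, sqrt

# Standard normal c.d.f via the error function:
# Φ(x) = (1 + erf(x / √2)) / 2
def phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Symmetry theorem: Φ(−x) = 1 − Φ(x)
assert abs(phi(-1.5) - (1.0 - phi(1.5))) < 1e-12

# Pr(x1 < X <= x2) = F(x2) − F(x1): the familiar one-sigma interval.
print(round(phi(1.0) - phi(-1.0), 4))  # 0.6827
```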
c.d.f of a Discrete Distribution: $F(x)$ will have a jump of magnitude $f(x_i)$ at each possible value $x_i$ of $X$, and $F(x)$ will be constant between every pair of successive jumps.
The c.d.f of a continuous distribution:
$F(x) = \int_{-\infty}^x f(t)\,dt, \qquad \frac{dF(x)}{dx}=f(x)$
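The relation $f(x) = dF(x)/dx$ can be sketched with a central-difference approximation, here using the uniform-on-$[0,2]$ c.d.f (the interval is an arbitrary example):

```python
# For a continuous distribution, f(x) = dF(x)/dx.
# Central-difference check on the c.d.f of the uniform distribution
# on [a, b] = [0, 2], whose density is 1/(b−a) = 0.5 inside the interval.
def F(x, a=0.0, b=2.0):
    return max(0.0, min(1.0, (x - a) / (b - a)))

h = 1e-6
x = 1.0
f_approx = (F(x + h) - F(x - h)) / (2 * h)   # ≈ f(1.0) = 0.5
print(round(f_approx, 6))  # 0.5
```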