Probability and Mathematical Statistics 4: Continuous Random Variables and Probability Distributions (Part 1)

Latest recommended article published on 2022-06-20 13:26:59

Lum0s!

Category: Probability and Mathematical Statistics. Tags: learning, probability theory, statistics

Article link: https://blog.csdn.net/m0_59751822/article/details/123908342

Probability Theory: Sections 4.1, 4.2, and 4.3

4.1 Probability Density Functions
- Probability Distributions for Continuous Variables
4.2 Cumulative Distribution Functions and Expected Values
4.3 The Normal Distribution

4.1 Probability Density Functions

A discrete random variable (rv) is one whose possible values either constitute a finite set or else can be listed in an infinite sequence (a list in which there is a first element, a second element, etc.). A random variable whose set of possible values is an entire interval of numbers is not discrete.

Probability Distributions for Continuous Variables

DEFINITION:

Let X be a continuous rv. Then a probability distribution or probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with $a \leq b$,

$P(a \leq X \leq b) = \int_{a}^{b} f(x)dx$

That is, the probability that X takes on a value in the interval [a, b] is the area above this interval and under the graph of the density function, as illustrated in the figure below. The graph of f(x) is often referred to as the density curve.

[Figure: P(a ≤ X ≤ b) shown as the area under the density curve between a and b]

For f(x) to be a legitimate pdf, it must satisfy the following two conditions:

$f(x) \ge 0$ for all x
$\int_{-\infty}^{\infty} f(x)dx = \text{area under the entire graph of } f(x) = 1$

DEFINITION:

A continuous rv X is said to have a uniform distribution on the interval [A, B] if the pdf of X is

$f(x; A, B)=\begin{cases} \frac{1}{B-A}, & A \leq x \leq B \\ 0, & \text{otherwise} \end{cases}$
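The two pdf conditions and the "probability as area" idea can be checked numerically for the uniform density. The sketch below is only an illustration: the endpoints A = 0 and B = 10, the helper names `uniform_pdf` and `integrate`, and the midpoint rule are assumptions made here, not part of the original notes.

```python
def uniform_pdf(x, A, B):
    """pdf of the uniform distribution on [A, B]."""
    return 1.0 / (B - A) if A <= x <= B else 0.0


def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


A, B = 0.0, 10.0  # illustrative endpoints (assumed, not from the text)

# Condition 2: the total area under the density curve should be 1.
total_area = integrate(lambda x: uniform_pdf(x, A, B), A, B)

# P(2 <= X <= 5) is the area above [2, 5], i.e. (5 - 2) / (B - A).
prob_2_to_5 = integrate(lambda x: uniform_pdf(x, A, B), 2.0, 5.0)

print(total_area)   # close to 1
print(prob_2_to_5)  # close to 0.3
```

For a uniform density the probability of an interval depends only on its length, which is why the second result matches (5 - 2)/(B - A).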

When X is a discrete random variable, each possible value is assigned positive probability. This is not true of a continuous random variable, for which P(X = c) = 0 for every number c, because the area under a density curve that lies above any single value is zero:

$P(X=c)=\int_{c}^{c}f(x)dx=\lim_{\epsilon \to 0} \int_{c -\epsilon}^{c+\epsilon}f(x)dx=0$

The fact that P(X=c)=0 when X is continuous has an important practical consequence: the probability that X lies in some interval between a and b does not depend on whether the lower limit a or the upper limit b is included in the probability calculation:

$P(a \leq X \leq b)=P(a < X < b)=P(a < X \leq b)=P(a \leq X < b)$
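As a quick check of this fact, take the uniform distribution on [A, B] defined above and any c strictly between A and B. This worked example is added for illustration and assumes $\epsilon$ is small enough that the interval stays inside [A, B]:

$P(c-\epsilon \leq X \leq c+\epsilon)=\int_{c-\epsilon}^{c+\epsilon}\frac{1}{B-A}dx=\frac{2\epsilon}{B-A} \to 0 \text{ as } \epsilon \to 0$

So a single point carries no probability, and including or excluding an endpoint leaves the interval probability unchanged.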

4.2 Cumulative Distribution Functions and Expected Values

The Cumulative Distribution Function

DEFINITION:

The cumulative distribution function(cdf) F(x) for a continuous rv X is defined for every number x by

$F(x)=P(X \leq x)=\int_{-\infty}^{x}f(y)dy$

For each x, F(x) is the area under the density curve to the left of x. This is illustrated in the figure below, where F(x) increases smoothly as x increases.

[Figure: a pdf and its cdf; F(x) is the area under the density curve to the left of x]

Using F(x) to Compute Probabilities

PROPOSITION:

Let X be a continuous rv with pdf f(x) and cdf F(x). Then for any number a,

$P (X > a) = 1 - F (a)$

and for any two numbers a and b with a < b,

$P(a \leq X \leq b)=F(b)-F(a)$

The figure below illustrates the second part of this proposition; the desired probability is the shaded area under the density curve between a and b, and it equals the difference between the two shaded cumulative areas. This is different from what is appropriate for a discrete integer valued random variable (e.g., binomial or Poisson): P(a $\leq$ X $\leq$ b) = F(b) - F(a - 1) when a and b are integers.

[Figure: P(a ≤ X ≤ b) as the difference between the cumulative areas F(b) and F(a)]
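The proposition turns probability questions into simple evaluations of F. The following sketch uses a density chosen purely for illustration, f(x) = 2x on [0, 1] with cdf F(x) = x² (this density and the function names are assumptions, not taken from the notes).

```python
def cdf(x):
    """cdf of the illustrative density f(x) = 2x on [0, 1]: F(x) = x**2."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x * x


def prob_greater(a):
    """P(X > a) = 1 - F(a)."""
    return 1.0 - cdf(a)


def prob_between(a, b):
    """P(a <= X <= b) = F(b) - F(a), for a < b."""
    return cdf(b) - cdf(a)


print(prob_greater(0.5))       # 1 - 0.25 = 0.75
print(prob_between(0.2, 0.6))  # approximately 0.36 - 0.04 = 0.32
```

Because X is continuous, `prob_between` gives the same answer whether or not the endpoints are included; for an integer-valued discrete variable the lower endpoint would have to be handled with F(a - 1) instead.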

Obtaining f(x) from F(x)

PROPOSITION:

If X is a continuous rv with pdf f(x) and cdf F(x), then at every x at which the derivative F’(x) exists, F’(x)=f(x).
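The qualifier "at every x at which the derivative exists" matters. A small numerical sketch, assuming a uniform distribution on [0, 10] (the endpoints, the step size h, and the helper names are illustrative choices), shows the slope of F matching f in the interior and the one-sided slopes disagreeing at the endpoint A:

```python
def uniform_cdf(x, A=0.0, B=10.0):
    """cdf of the uniform distribution on [A, B]."""
    if x < A:
        return 0.0
    if x > B:
        return 1.0
    return (x - A) / (B - A)


def slope(F, x0, x1):
    """Difference quotient of F between x0 and x1."""
    return (F(x1) - F(x0)) / (x1 - x0)


h = 1e-6  # small step for the difference quotients

# Interior point: the slope of F approximates f(3) = 1 / (B - A) = 0.1.
interior = slope(uniform_cdf, 3.0 - h, 3.0 + h)

# At the endpoint A = 0 the left and right slopes differ,
# so F'(0) does not exist and the proposition says nothing there.
left_at_A = slope(uniform_cdf, 0.0 - h, 0.0)
right_at_A = slope(uniform_cdf, 0.0, 0.0 + h)

print(interior, left_at_A, right_at_A)
```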

Percentiles of a Continuous Distribution

DEFINITION:

Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by $\eta(p)$ , is defined by

$p=F(\eta(p))=\int_{-\infty}^{\eta(p)}f(y)dy$

[Figure: the (100p)th percentile of a continuous distribution]

DEFINITION:

The median of a continuous distribution, denoted by $\tilde{\mu}$ , is the 50th percentile, so $\tilde{\mu}$ satisfies $.5=F(\tilde{\mu})$ . That is, half the area under the density curve is to the left of $\tilde{\mu}$ and half is to the right of $\tilde{\mu}$ .

A continuous distribution whose pdf is symmetric—the graph of the pdf to the left of some point is a mirror image of the graph to the right of that point—has median $\tilde{\mu}$ equal to the point of symmetry, since half the area under the curve lies to either side of this point.

[Figure: medians of symmetric distributions]
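When F has no convenient inverse, the defining equation p = F(η(p)) can be solved numerically. The sketch below uses bisection, which is one possible approach and not one prescribed by the notes; the density f(x) = 2x on [0, 1] with F(x) = x² is again an assumed example, chosen because its percentiles are known exactly (η(p) = √p).

```python
def cdf(x):
    """cdf of the illustrative density f(x) = 2x on [0, 1]: F(x) = x**2."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x * x


def percentile(F, p, lo, hi, tol=1e-10):
    """Solve F(x) = p by bisection.

    Assumes F is continuous and increasing on [lo, hi] with F(lo) <= p <= F(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


median = percentile(cdf, 0.5, 0.0, 1.0)  # 50th percentile
p90 = percentile(cdf, 0.9, 0.0, 1.0)     # 90th percentile

print(median)  # close to sqrt(0.5)
print(p90)     # close to sqrt(0.9)
```

This density is not symmetric, so its median differs from the midpoint of [0, 1]; only for a symmetric pdf does the median coincide with the point of symmetry.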

Expected Values

DEFINITION:

The expected or mean value of a continuous rv X with pdf f(x) is

$\mu_X=E(X)=\int_{-\infty}^{\infty}x\cdot f(x)dx$

PROPOSITION:

If X is a continuous rv with pdf f(x) and h(X) is any function of X, then

$E[h(X)]=\mu_{h(X)}=\int_{-\infty}^{\infty}h(x)\cdot f(x)dx$

DEFINITION:

The variance of a continuous random variable X with pdf f(x) and mean value $\mu$ is

$\sigma_{X}^{2}=V(X)=\int_{-\infty}^{\infty}(x-\mu)^2\cdot f(x)dx=E[(X-\mu)^2]$

The standard deviation (SD) of X is $\sigma_X=\sqrt{V(X)}$ .

PROPOSITION:

$V(X)=E(X^2)-[E(X)]^2$
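The definitions of E(X) and V(X) and the shortcut formula can be compared numerically. This is a rough sketch under stated assumptions: a uniform distribution on [0, 10] is used as the example, and the integrals are approximated by a midpoint rule over [A, B] only, since the pdf is zero elsewhere. For this distribution the exact values are E(X) = (A + B)/2 = 5 and V(X) = (B - A)²/12.

```python
def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


A, B = 0.0, 10.0  # illustrative endpoints (assumed)


def pdf(x):
    """Uniform pdf on [A, B]."""
    return 1.0 / (B - A) if A <= x <= B else 0.0


# E(X) and E(X^2) as integrals of h(x) * f(x), with h(x) = x and h(x) = x^2.
mean = integrate(lambda x: x * pdf(x), A, B)
second_moment = integrate(lambda x: x * x * pdf(x), A, B)

# Variance from the definition and from the shortcut V(X) = E(X^2) - [E(X)]^2.
var_definition = integrate(lambda x: (x - mean) ** 2 * pdf(x), A, B)
var_shortcut = second_moment - mean ** 2

print(mean)                          # close to 5
print(var_definition, var_shortcut)  # both close to 100 / 12
```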

4.3 The Normal Distribution

DEFINITION:

A continuous rv X is said to have a normal distribution with parameters $\mu$ and $\sigma$ (or $\mu$ and $\sigma^2$), where $-\infty < \mu < \infty$ and $\sigma > 0$, if the pdf of X is

$f(x;\mu ,\sigma)= \frac{1}{\sqrt{2 \pi}\,\sigma}e^{-(x-\mu)^2/(2\sigma^2)} \qquad -\infty < x < \infty$

The statement that X is normally distributed with parameters $\mu$ and $\sigma^2$ is often abbreviated $X \sim N(\mu, \sigma^2)$.

The Standard Normal Distribution

DEFINITION:

The normal distribution with parameter values $\mu=0$ and $\sigma=1$ is called the standard normal distribution. A random variable having a standard normal distribution is called a standard normal random variable and will be denoted by Z. The pdf of Z is

$f(z;0,1)=\frac{1}{\sqrt{2 \pi}}e^{-z^2/2} \qquad -\infty < z < \infty$

The graph of f(z; 0, 1) is called the standard normal (or z) curve. Its inflection points are at 1 and -1. The cdf of Z is $P(Z \leq z)=\int_{-\infty}^{z}f(y;0,1)dy$, which we will denote by $\Phi(z)$.
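$\Phi(z)$ has no closed form in elementary functions, which is why tables are used, but it can be written in terms of the error function as $\Phi(z)=\frac{1}{2}[1+\mathrm{erf}(z/\sqrt{2})]$. The sketch below evaluates this identity with Python's standard library and compares it with `statistics.NormalDist`; the function name `phi` and the chosen z values are illustrative.

```python
import math
from statistics import NormalDist


def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Compare with the standard library's normal distribution object.
for z in (-1.0, 0.0, 1.0, 1.96):
    print(z, round(phi(z), 4), round(NormalDist().cdf(z), 4))
```

By symmetry of the z curve, phi(-z) equals 1 - phi(z), which is why tables only need to cover part of the range.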

Percentiles of the Standard Normal Distribution

For any p between 0 and 1, Appendix Table A.3 can be used to obtain the (100p)th percentile of the standard normal distribution.

$z_{\alpha}$ Notation for z Critical Values

In statistical inference, we will need the values on the horizontal z axis that capture certain small tail areas under the standard normal curve.

Notation:

$z_{\alpha}$ will denote the value on the z axis for which $\alpha$ of the area under the z curve lies to the right of $z_{\alpha}$ .

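[Figure: $z_{\alpha}$ notation; the area under the z curve to the right of $z_{\alpha}$ is $\alpha$]

The $z_{\alpha}$'s are usually referred to as z critical values. Table 4.1 lists the most useful z percentiles and $z_{\alpha}$ values.

[Table 4.1: standard normal percentiles and critical values]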
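Since $z_{\alpha}$ leaves area $\alpha$ to its right, it is the 100(1 - $\alpha$)th percentile of the standard normal distribution. A minimal sketch using `statistics.NormalDist.inv_cdf` from the standard library follows; the helper name `z_critical` and the list of $\alpha$ values are choices made here for illustration, not a reproduction of the table.

```python
from statistics import NormalDist


def z_critical(alpha):
    """z_alpha: the value with area alpha to its right under the z curve."""
    return NormalDist().inv_cdf(1.0 - alpha)


for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
    print(f"alpha = {alpha:<6} z_alpha = {z_critical(alpha):.3f}")
```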

Nonstandard Normal Distributions

When $X \sim N(\mu,\sigma^2)$, probabilities involving X are computed by “standardizing”. The standardized variable is $(X-\mu)/\sigma$. Subtracting $\mu$ shifts the mean from $\mu$ to zero, and then dividing by $\sigma$ scales the variable so that the standard deviation is 1 rather than $\sigma$.

PROPOSITION:

If X has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then

$Z=\frac{X-\mu}{\sigma}$

has a standard normal distribution. Thus

$P(a \leq X \leq b)=P(\frac{a-\mu}{\sigma}\leq Z \leq \frac{b-\mu}{\sigma}) = \Phi(\frac{b-\mu}{\sigma})-\Phi(\frac{a-\mu}{\sigma})$

$P(X \leq a)=\Phi(\frac{a-\mu}{\sigma}) \hspace{1cm} P(X\ge b)=1-\Phi(\frac{b-\mu}{\sigma})$

If the population distribution of a variable is (approximately) normal, then

Roughly 68% of the values are within 1 SD of the mean.
Roughly 95% of the values are within 2 SDs of the mean.
Roughly 99.7% of the values are within 3 SDs of the mean.
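The three percentages above follow from standardizing: the probability of falling within k SDs of the mean is $\Phi(k)-\Phi(-k)$, whatever $\mu$ and $\sigma$ are. The sketch below checks this with the standard library; the values $\mu$ = 100 and $\sigma$ = 15 and the helper name `normal_prob` are assumptions made for the example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1


def normal_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), computed by standardizing."""
    return Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)


mu, sigma = 100.0, 15.0  # illustrative parameters (assumed)

for k in (1, 2, 3):
    p = normal_prob(mu - k * sigma, mu + k * sigma, mu, sigma)
    print(f"within {k} SD: {p:.4f}")
```

Changing `mu` or `sigma` leaves the three printed probabilities unchanged, because the standardized limits are always -k and k.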

Percentiles of an Arbitrary Normal Distribution

The (100p)th percentile of a normal distribution with mean $\mu$ and standard deviation $\sigma$ is easily related to the (100p)th percentile of the standard normal distribution.

PROPOSITION:

$\text{(100p)th percentile for normal } (\mu,\sigma)=\mu +[\text{(100p)th for standard normal}] \cdot \sigma$
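This relation is just the standardization step run in reverse. A short sketch, with $\mu$ = 100, $\sigma$ = 15, and p = 0.95 chosen only as an example, compares the formula against a direct percentile computation from `statistics.NormalDist`:

```python
from statistics import NormalDist

mu, sigma, p = 100.0, 15.0, 0.95  # illustrative values (assumed)

# Percentile of the standard normal distribution.
z_p = NormalDist().inv_cdf(p)

# Percentile of N(mu, sigma^2) via the proposition: mu + z_p * sigma.
via_standard = mu + z_p * sigma

# The same percentile computed directly from the nonstandard distribution.
direct = NormalDist(mu, sigma).inv_cdf(p)

print(round(z_p, 3))                             # about 1.645
print(round(via_standard, 2), round(direct, 2))  # the two values agree
```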

The Normal Distribution and Discrete Populations

The normal distribution is often used as an approximation to the distribution of values in a discrete population. In such situations, extra care should be taken to ensure that probabilities are computed in an accurate manner.

The correction for discreteness of the underlying distribution is often called a continuity correction. It is useful in the following application of the normal distribution to the computation of binomial probabilities.

Approximating the Binomial Distribution

PROPOSITION:

Let X be a binomial rv based on n trials with success probability p, and let q = 1 - p. Then if the binomial probability histogram is not too skewed, X has approximately a normal distribution with $\mu=np$ and $\sigma=\sqrt{npq}$. In particular, for x = a possible value of X,

$P(X \leq x)=B(x,n,p) \approx (\text{area under the normal curve to the left of } x + .5)=\Phi(\frac{x+.5-np}{\sqrt{npq}})$

In practice, the approximation is adequate provided that both $np \ge 10$ and $nq \ge 10$, since there is then enough symmetry in the underlying binomial distribution.
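To see how close the approximation is, the exact binomial cdf can be compared with the continuity-corrected normal value. The sketch below uses n = 50, p = 0.25, and x = 10 as an assumed example (here np = 12.5 and nq = 37.5, so both exceed 10); the function names are illustrative.

```python
import math
from statistics import NormalDist


def binomial_cdf(x, n, p):
    """Exact P(X <= x) for X ~ Bin(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_approx_cdf(x, n, p):
    """Normal approximation to P(X <= x) with the continuity correction x + 0.5."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist().cdf((x + 0.5 - mu) / sigma)


n, p, x = 50, 0.25, 10  # illustrative values (assumed)

exact = binomial_cdf(x, n, p)
approx = normal_approx_cdf(x, n, p)

print(f"exact  = {exact:.4f}")
print(f"approx = {approx:.4f}")
```

Dropping the + 0.5 in `normal_approx_cdf` would ignore the half-unit of the histogram bar centered at x, which is the discrepancy the continuity correction is meant to remove.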