GitModel: Hands-On Mathematical Statistics 01 (Python)

1 Hands-On Mathematical Statistics 01

The PDF and ipynb versions are on GitHub: https://github.com/cx-333/Math-Modeling

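1.1 Population and Sample

  • Population: the set of all possible observed values of an experiment is called the population. The number of values may be finite or infinite, giving a finite population or an infinite population respectively. Each possible observed value is called an individual.

Every individual in a population is an observed value of a random experiment, so it is a value of some random variable $X$. A population therefore corresponds to a random variable $X$, and studying $X$ is the same as studying the population: $X$ and the population share the same distribution function and numerical characteristics.

  • Sample: let $X$ be a random variable with distribution function $F$. If $X_{1}, X_{2}, \cdots, X_{n}$ are mutually independent random variables that all have the distribution function $F$, then $X_{1}, X_{2}, \cdots, X_{n}$ is called a simple random sample of size $n$ from the distribution function $F$ (or from the population $F$, or from the population $X$), or a sample for short. Their observed values $x_{1}, x_{2}, \cdots, x_{n}$ are called the sample values, also known as $n$ independent observations of $X$.

From the definition of a sample (the $n$ random variables are mutually independent) it follows that:

    1. The joint distribution function of the sample $(X_{1}, X_{2}, \cdots, X_{n})$ is $F^{*}(x_{1}, x_{2}, \cdots, x_{n})=\prod_{i=1}^{n}F(x_{i})$
    2. If $X$ has probability density $f$, the joint probability density of the sample $(X_{1}, X_{2}, \cdots, X_{n})$ is $f^{*}(x_{1}, x_{2}, \cdots, x_{n})=\prod_{i=1}^{n}f(x_{i})$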
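
The product formula for the joint density can be checked numerically. The sketch below assumes, purely for illustration, a standard normal population and a sample of size 4; the names `joint_from_product` and `joint_reference` are hypothetical. It compares the product of the marginal densities with the density of a 4-dimensional standard normal vector, which is the joint density of four independent $N(0, 1)$ variables.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Assumption for illustration: population N(0, 1), sample size n = 4.
rng = np.random.default_rng(0)
n = 4
sample = rng.standard_normal(n)

# Joint density computed as the product of the marginal densities f(x_i).
joint_from_product = np.prod(norm.pdf(sample))

# Reference: density of an n-dimensional normal vector with identity covariance.
joint_reference = multivariate_normal(mean=np.zeros(n), cov=np.eye(n)).pdf(sample)

print(joint_from_product, joint_reference)
```

The two printed numbers should agree up to floating-point error.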

1.2 Empirical Distribution Function, Histogram, and Box Plot

  • Empirical distribution function: let $x_{1}, x_{2}, \cdots, x_{n}$ be the observed values of a sample from a population with distribution function $F(x)$. Sorting the observations in ascending order gives $x_{(1)}, x_{(2)}, \cdots, x_{(n)}$, which is called the ordered sample. Use the ordered sample to define the function
    $$F_{n}(x)=\left\{\begin{array}{ll} 0, & x<x_{(1)}, \\ k / n, & x_{(k)} \leqslant x<x_{(k+1)},\ k=1,2, \cdots, n-1, \\ 1, & x \geqslant x_{(n)}. \end{array}\right.$$
    $F_{n}(x)$ is a non-decreasing, right-continuous function that satisfies
    $$F_{n}(-\infty)=0 \text{ and } F_{n}(\infty)=1.$$

$F_{n}(x)$ is called the empirical distribution function of the sample.

The empirical distribution function $F_{n}(x)$ is a good approximation of the population distribution function $F(x)$: by the Glivenko-Cantelli theorem, $F_{n}$ converges to $F$ uniformly in $x$ with probability 1 as $n \rightarrow \infty$.

🔥Example: a sample of size 10 is observed from the population $X$:
$$3.2, \quad 2.5, \quad -2, \quad 2.5, \quad 0, \quad 3, \quad 2, \quad 2.5, \quad 2, \quad 4$$
Find the empirical distribution function of $X$.

🦊Solution:

  1. Sort: $-2, \quad 0, \quad 2, \quad 2, \quad 2.5, \quad 2.5, \quad 2.5, \quad 3, \quad 3.2, \quad 4$
  2. Apply the formula:
    $$F_{n}(x)=\left\{\begin{array}{ll} 0, & x<x_{(1)}, \\ k / n, & x_{(k)} \leqslant x<x_{(k+1)},\ k=1,2, \cdots, n-1, \\ 1, & x \geqslant x_{(n)}. \end{array}\right.$$
  3. Tied values make the function jump by more than $1/10$ at that point (2 occurs twice, 2.5 three times), which gives:
    $$F_{10}(x)=\left\{\begin{array}{ll} 0, & x<-2 \\ 1 / 10, & -2 \leq x<0 \\ 2 / 10, & 0 \leq x<2 \\ 4 / 10, & 2 \leq x<2.5 \\ 7 / 10, & 2.5 \leq x<3 \\ 8 / 10, & 3 \leq x<3.2 \\ 9 / 10, & 3.2 \leq x<4 \\ 1, & x \geq 4 \end{array}\right.$$
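
The same result can be reproduced in a few lines of Python. This is a minimal sketch rather than part of the original notebook: the helper name `ecdf` is hypothetical, and it relies on the fact that $F_{n}(x)$ equals the fraction of sample values less than or equal to $x$.

```python
import numpy as np

def ecdf(sample, x):
    """Return F_n(x): the fraction of sample values less than or equal to x."""
    ordered = np.sort(np.asarray(sample, dtype=float))
    # With side="right", searchsorted returns how many ordered values are <= x.
    return np.searchsorted(ordered, x, side="right") / ordered.size

sample = [3.2, 2.5, -2, 2.5, 0, 3, 2, 2.5, 2, 4]
for point in [-3, -2, 1, 2, 2.5, 3.1, 3.5, 4]:
    print(point, ecdf(sample, point))
```

Evaluating at one point inside each interval of the piecewise formula should give 0, 1/10, 2/10, 4/10, 7/10, 8/10, 9/10 and 1 in turn.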
  • Histogram: to study the distribution of a population, obtain sample observations $x_{1}, x_{2}, \cdots, x_{n}$ through independent repeated trials, organize the data, and present them as a table or chart from which the population distribution can be inferred. A histogram shows how the sample values are distributed. Because the sample follows the same distribution as its population, the histogram of a reasonably large sample approximates the shape of the population's probability density. There are two kinds: the count histogram and the relative frequency histogram.

Steps for drawing a histogram, assuming a sample with $n$ values $(x_{1}, x_{2}, \cdots, x_{n})$:

    1. Choose an interval $[a, b]$, with $a$ smaller than the smallest sample value and $b$ larger than the largest sample value;
    2. Divide the interval into $k$ subintervals of equal length $\Delta = \frac{b-a}{k}$; 💡Tip: when $n < 50$, take $k$ in $5 \sim 6$; when $n$ is larger, take $k$ in $10 \sim 20$. If $k$ is too large, some subintervals end up with a count of $0$, which should be avoided where possible;
    3. For each subinterval $[a+i\Delta, a+(i+1)\Delta)$, $i = 0, 1, \cdots, k-1$, count the number of sample values that fall in it, $\{f_{j}, j = 1, 2, \cdots, k\}$, or compute the relative frequency $\{f_{j}/n, j = 1, 2, \cdots, k\}$. The subintervals are half-open so that a value on a shared boundary is counted only once;
    4. Use the interval $[a, b]$ as the horizontal axis and the counts $\{f_{j}, j = 1, 2, \cdots, k\}$ or the relative frequencies $\{f_{j}/n, j = 1, 2, \cdots, k\}$ as the vertical axis;
    5. Draw a bar over each subinterval whose height is the corresponding count (or relative frequency). The result is the histogram.

A histogram whose vertical axis shows the counts $\{f_{j}, j = 1, 2, \cdots, k\}$ is a count histogram; one whose vertical axis shows the relative frequencies $\{f_{j}/n, j = 1, 2, \cdots, k\}$ is a relative frequency histogram.

🔥Example: draw the histograms of the following sample
$$\begin{aligned} &138, \quad 142, \quad 148, \quad 145, \quad 140, \quad 141 \\ &138, \quad 139, \quad 144, \quad 138, \quad 139, \quad 136 \\ &138, \quad 137, \quad 137, \quad 133, \quad 140, \quad 130\\ &145, \quad 141, \quad 135, \quad 131, \quad 136, \quad 131\\ &134, \quad 132, \quad 135, \quad 134, \quad 132, \quad 134\\ &130, \quad 135, \quad 135, \quad 134, \quad 136, \quad 131\\ &139, \quad 140, \quad 141, \quad 138, \quad 137, \quad 137\\ &131, \quad 127, \quad 136, \quad 128, \quad 138, \quad 132\\ &134, \quad 136, \quad 137, \quad 133, \quad 121, \quad 129\\ &137, \quad 132, \quad 131, \quad 139, \quad 136, \quad 135 \end{aligned}$$

Python code (solving the example)

# 1. Build the histograms step by step, following the procedure above
import matplotlib.pyplot as plt
# Render figures inline (Jupyter magic)
%matplotlib inline
import numpy as np

# Sample values
x = [138, 142, 148, 145, 140, 141,
    138, 139, 144, 138, 139, 136,
    138, 137, 137, 133, 140, 130,
    145, 141, 135, 131, 136, 131,
    134, 132, 135, 134, 132, 134,
    130, 135, 135, 134, 136, 131,
    139, 140, 141, 138, 137, 137,
    131, 127, 136, 128, 138, 132,
    134, 136, 137, 133, 121, 129,
    137, 132, 131, 139, 136, 135]

# 1. Choose the interval [a, b]
a = np.min(x) - 1
b = np.max(x) + 1

# 2. Split [a, b] into k subintervals
n = len(x)
if n < 50:
    k = 6
elif n < 100:
    k = 8
else:
    k = 15

delta = (b - a) / k

# 3. Count
region_ab = np.zeros(k)   # midpoint of each subinterval of [a, b]
fi = np.zeros(k)          # count of sample values in each subinterval
for i in range(k):
    region_ab[i] = a + i*delta + (delta / 2)

for idx, cen in enumerate(region_ab):
    for data in x:
        # Half-open subinterval [left, right): a value on a shared boundary is counted once
        if (cen - delta/2) <= data < (cen + delta/2):
            fi[idx] += 1

fi_n = fi / n     # relative frequencies

# 4. Plot
plt.bar(region_ab, fi, width=delta)   # count histogram
plt.title('Count histogram')
plt.xlabel('x')
plt.ylabel('fi')
plt.show()

plt.bar(region_ab, fi_n, width=delta)  # relative frequency histogram
plt.title('Relative frequency histogram')
plt.xlabel('x')
plt.ylabel('fi/n')
plt.show()


# 2. Draw the histograms directly with matplotlib.pyplot.hist
import matplotlib.pyplot as plt
# Render figures inline (Jupyter magic)
%matplotlib inline
import numpy as np

# Sample values
x = [138, 142, 148, 145, 140, 141,
    138, 139, 144, 138, 139, 136,
    138, 137, 137, 133, 140, 130,
    145, 141, 135, 131, 136, 131,
    134, 132, 135, 134, 132, 134,
    130, 135, 135, 134, 136, 131,
    139, 140, 141, 138, 137, 137,
    131, 127, 136, 128, 138, 132,
    134, 136, 137, 133, 121, 129,
    137, 132, 131, 139, 136, 135]

n = len(x)
a = np.min(x) - 1
b = np.max(x) + 1
k = 8

# By default hist plots counts: the count histogram
plt.hist(x, bins=k, alpha=0.8, range=(a, b))
plt.title('Count histogram')
plt.xlabel('x')
plt.ylabel('fi')
plt.show()

# A weight of 1/n per observation makes the bar heights equal to fi/n.
# density=True would instead plot fi/(n*delta), so that the bar areas sum to 1.
plt.hist(x, bins=k, alpha=0.8, range=(a, b), weights=np.ones(n) / n)
plt.title('Relative frequency histogram')
plt.xlabel('x')
plt.ylabel('fi/n')
plt.show()


  • Box plot

  First, sample quantiles. Given sample observations $x_{1}, x_{2}, \cdots, x_{n}$ of size $n$, the sample $p$-quantile $(0<p<1)$, denoted $x_{p}$, has the following properties: (1) at least $np$ observations are less than or equal to $x_{p}$; (2) at least $n(1-p)$ observations are greater than or equal to $x_{p}$.

Steps for computing a sample quantile:

    1. Sort $x_{1}, x_{2}, \cdots, x_{n}$ in ascending order as $x_{(1)}\le x_{(2)}\le \cdots\le x_{(n)}$
    2. Compute the $p$-quantile $x_{p}$ with $$x_{p}=\left\{\begin{array}{ll} x_{([np]+1)}, & \text{if } np \text{ is not an integer} \\ \frac{1}{2}\left[x_{(np)}+x_{(np+1)}\right], & \text{if } np \text{ is an integer} \end{array}\right.$$ where $[\cdot]$ denotes the integer part (rounding down).

In particular, for $p=0.25$ the quantile $x_{0.25}$ is also written $Q_{1}$ and called the first quartile; for $p=0.5$ the quantile $x_{0.5}$ is also written $Q_{2}$ or $M$ and called the sample median; for $p=0.75$ the quantile $x_{0.75}$ is also written $Q_{3}$ and called the third quartile.
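
The two-case rule translates directly into code. The following is a minimal sketch, not taken from the original notebook; the function name `sample_quantile` is hypothetical, and the data are the eight blood-pressure readings used in this section's box plot example.

```python
import math

def sample_quantile(values, p):
    """Sample p-quantile (0 < p < 1) using the two-case rule from the text."""
    ordered = sorted(values)
    n = len(ordered)
    position = n * p
    nearest = round(position)
    # Treat n*p as an integer when it is within floating-point noise of one.
    if nearest >= 1 and math.isclose(position, nearest, abs_tol=1e-9):
        # x_(np) and x_(np+1) are ordered[np - 1] and ordered[np] in 0-based indexing.
        return 0.5 * (ordered[nearest - 1] + ordered[nearest])
    # x_([np]+1) is ordered[floor(np)] in 0-based indexing.
    return ordered[math.floor(position)]

pressures = [110, 102, 117, 122, 118, 150, 132, 123]
q1 = sample_quantile(pressures, 0.25)
median = sample_quantile(pressures, 0.5)
q3 = sample_quantile(pressures, 0.75)
print(q1, median, q3)
```

Library routines such as `numpy.quantile` interpolate linearly between order statistics by default, so their quartiles can differ slightly from the values this rule produces.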

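Drawing a box plot: a box plot summarizes the data with $5$ numbers: the minimum $Min$, the first quartile $Q_{1}$, the median $M$, the third quartile $Q_{3}$, and the maximum $Max$. The box spans from $Q_{1}$ to $Q_{3}$ with a line inside it at $M$, and the whiskers extend from the two ends of the box toward $Min$ and $Max$.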

🔥Example: the following are the blood pressure readings (systolic, $mmHg$) of $8$ patients. Draw the box plot.
$$110 \quad 102 \quad 117 \quad 122 \quad 118 \quad 150 \quad 132 \quad 123$$

🦊Solution:

  1. Sort
    $$102 \quad 110 \quad 117 \quad 118 \quad 122 \quad 123 \quad 132 \quad 150$$

  2. Compute the quartiles, the minimum, and the maximum
    $$\begin{aligned} &\because np=8\times 0.25 = 2, \quad &\therefore Q_{1}=\frac{1}{2}(110+117)=113.5 \\ &\because np=8\times 0.5 = 4, \quad &\therefore Q_{2}=\frac{1}{2}(118+122)=120 \\ &\because np=8\times 0.75 = 6, \quad &\therefore Q_{3}=\frac{1}{2}(123+132)=127.5 \\ & Min = 102, \quad Max = 150. \end{aligned}$$

  3. Plot

Python code (drawing the box plot)

import matplotlib.pyplot as plt
# Render figures inline (Jupyter magic)
%matplotlib inline

x = [102, 110, 117, 118, 122, 123, 132, 150]

# boxplot flags outliers automatically: points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, where IQR = Q3 - Q1
# Pass figsize to subplots; a separate plt.figure() call would open a second, empty figure
fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot(x)
plt.show()


1.3 Statistics and the Three Major Sampling Distributions

  • Statistic: let $X_{1}, X_{2}, \cdots, X_{n}$ be a sample from the population $X$ and let $g(X_{1}, X_{2}, \cdots, X_{n})$ be a function of $X_{1}, X_{2}, \cdots, X_{n}$. If $g$ contains no unknown parameters, then $g(X_{1}, X_{2}, \cdots, X_{n})$ is called a statistic.

Common statistics: let $X_{1}, X_{2}, \cdots, X_{n}$ be a sample from the population $X$ and $x_{1}, x_{2}, \cdots, x_{n}$ its observed values.

    1. Sample mean $\overline{X} = \frac{1}{n} \sum_{i=1}^{n}X_{i}$, with observed value $\overline{x} = \frac{1}{n} \sum_{i=1}^{n}x_{i}$
    2. Sample variance $$\begin{aligned} &1)\ S_{n}^{2} = \frac{1}{n} \sum_{i=1}^{n}(X_{i} - \overline{X})^{2} \\ &2)\ S^{2} = \frac{1}{n-1} \sum_{i=1}^{n}(X_{i} - \overline{X})^{2} \quad \text{(unbiased variance, used more often)} \end{aligned}$$ with observed values $s_{n}^{2} = \frac{1}{n} \sum_{i=1}^{n}(x_{i} - \overline{x})^{2}$ and $s^{2} = \frac{1}{n-1} \sum_{i=1}^{n}(x_{i} - \overline{x})^{2}$
    3. Sample standard deviation $S = \sqrt{S^{2}} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n}(X_{i} - \overline{X})^{2}}$, with observed value $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n}(x_{i} - \overline{x})^{2}}$
    4. Sample $k$-th (raw) moment $A_{k} = \frac{1}{n}\sum_{i=1}^{n}X_{i}^{k}, k =1, 2, \cdots$, with observed value $a_{k} = \frac{1}{n}\sum_{i=1}^{n}x_{i}^{k}, k =1, 2, \cdots$
    5. Sample $k$-th central moment $B_{k} = \frac{1}{n}\sum_{i=1}^{n}(X_{i} - \overline{X})^{k}, k =1, 2, \cdots$, with observed value $b_{k} = \frac{1}{n}\sum_{i=1}^{n}(x_{i} - \overline{x})^{k}, k =1, 2, \cdots$
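
The observed values of these statistics are one-liners in NumPy. The sketch below is an illustration only: it reuses the eight blood-pressure readings from the box plot example in section 1.2, and the helper names `raw_moment` and `central_moment` are hypothetical. The `ddof` argument of `np.var` and `np.std` selects the divisor $n - \text{ddof}$.

```python
import numpy as np

x = np.array([102, 110, 117, 118, 122, 123, 132, 150], dtype=float)
n = x.size

mean = x.mean()                    # sample mean
var_biased = np.var(x)             # s_n^2, divides by n (ddof=0 is the default)
var_unbiased = np.var(x, ddof=1)   # s^2, divides by n - 1
std = np.std(x, ddof=1)            # s, square root of s^2

def raw_moment(values, k):
    """Observed k-th raw moment a_k."""
    return np.mean(values ** k)

def central_moment(values, k):
    """Observed k-th central moment b_k."""
    return np.mean((values - values.mean()) ** k)

print(mean, var_biased, var_unbiased, std)
print(raw_moment(x, 2), central_moment(x, 2))
```

Note that $b_{2}$ coincides with $s_{n}^{2}$, and $a_{1}$ coincides with $\overline{x}$.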
  • The three major sampling distributions

   (1) $\chi^{2}$ distribution: let $X_{1}, X_{2}, \cdots, X_{n}$ be a sample from the population $N(0, 1)$. Then the statistic
$$\chi^{2} = X_{1}^{2} + X_{2}^{2} + \cdots + X_{n}^{2}$$
is said to follow the $\chi^{2}$ distribution with $n$ degrees of freedom, written $\chi^{2} \sim \chi^{2}(n)$. The degrees of freedom are the number of independent variables on the right-hand side of the formula above.

   The probability density function of the $\chi^{2}(n)$ distribution (no need to memorize it) is
$$f(y) = \left\{\begin{array}{ll} \frac{1}{2^{n/2}\Gamma(n/2)}y^{n/2-1}e^{-y/2}, & y>0 \\ 0, & \text{otherwise} \end{array}\right.$$

Python code (plotting the $\chi^{2}$ densities)

import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import chi2
import numpy as np

fig, ax = plt.subplots(1, 1)
x = np.linspace(0.01, 30, 10000)
ax.plot(x, chi2.pdf(x, df=2), '-', label='n = 2')
ax.plot(x, chi2.pdf(x, 4), '--', label='n = 4')
ax.plot(x, chi2.pdf(x, df=10), '-.', label='n = 10')
ax.set_ylim([0, 0.5])
ax.set_xlabel("y")
ax.set_ylabel("f(y)")
ax.legend()
plt.show()


# Plot the chi-square distribution from its definition (a sum of n squared N(0, 1) variables)
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import norm, chi2
import numpy as np

def demonstate_chi(n):
    x = 0
    for i in range(n):
        x += np.square(norm(loc=0, scale=1).rvs(size=10000))
    
    return x

x = np.linspace(0.01, 30, 10000)

n_2 = demonstate_chi(2)
n_4 = demonstate_chi(4)
n_10 = demonstate_chi(10)

plt.figure(figsize=(10, 5))
plt.subplot(1,3, 1)
plt.plot(x, chi2.pdf(x, 2), '-', label='n = 2', c='blue')
plt.hist(n_2, density=True, histtype='stepfilled', alpha=0.5)
plt.legend()
plt.subplot(1,3, 2)
plt.plot(x, chi2.pdf(x, df = 4), '--', label='n = 4', c='gray')
plt.hist(n_4, density=True, histtype='stepfilled', alpha=0.5)
plt.legend()
plt.subplot(1,3, 3)
plt.plot(x, chi2.pdf(x, 10), '-.', label='n = 10', c='red')
plt.hist(n_10, density=True, histtype='stepfilled', alpha=0.5)
plt.legend()
plt.tight_layout(w_pad=3)
plt.show()


Properties of the $\chi^{2}$ distribution

    1. Additivity: let $\chi_{1}^{2} \sim \chi^{2}(n_{1})$ and $\chi_{2}^{2} \sim \chi^{2}(n_{2})$, with $\chi_{1}^{2}$ and $\chi_{2}^{2}$ independent. Then $\chi_{1}^{2} + \chi_{2}^{2} \sim \chi^{2}(n_{1} + n_{2})$
    2. Mean and variance: if $\chi^{2} \sim \chi^{2}(n)$, then $E(\chi^{2}) = n$ and $D(\chi^{2}) = 2n$. Proof: $$\begin{aligned} &\chi^{2} = X_{1}^{2} + X_{2}^{2} + \cdots + X_{n}^{2}, \quad X_{i} \sim N(0, 1) \text{ independent} \\ &\text{so } E(X_{i})=0, \quad E(X_{i}^{2}) = D(X_{i}) = 1 \\ &E(\chi^{2}) = \sum_{i=1}^{n}E(X_{i}^{2}) = n \\ &D(X_{i}^{2}) = E(X_{i}^{4}) - E^{2}(X_{i}^{2}) = 3 - 1 = 2 \quad \text{(since } E(X_{i}^{4}) = 3 \text{ for } N(0, 1)\text{)} \\ &D(\chi^{2}) = \sum_{i=1}^{n}D(X_{i}^{2}) = 2n \end{aligned}$$
    3. Quantiles: for a given $\alpha$ with $0 < \alpha < 1$, the point $\chi_{\alpha}^{2}(n)$ that satisfies $P\{\chi^{2} > \chi_{\alpha}^{2}(n)\} = \int_{\chi_{\alpha}^{2}(n)}^{\infty}f(y)dy = \alpha$ is called the upper $\alpha$ quantile of the $\chi^{2}(n)$ distribution.
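
These properties can be looked up numerically with `scipy.stats.chi2`. The sketch below assumes $n = 10$ and $\alpha = 0.05$ as illustrative values. Because the text defines the quantile through the right-tail probability, the matching SciPy call is `isf` (inverse survival function), which is equivalent to `ppf(1 - alpha)`.

```python
from scipy.stats import chi2

n, alpha = 10, 0.05  # illustrative values, not from the text

# Upper alpha quantile: the point with right-tail probability alpha.
upper_q = chi2.isf(alpha, df=n)
same_q = chi2.ppf(1 - alpha, df=n)

# Right-tail probability at that point; should reproduce alpha.
tail = chi2.sf(upper_q, df=n)

# Mean and variance of chi2(n); the text derives n and 2n.
mean, var = chi2.stats(df=n, moments="mv")

print(upper_q, same_q, tail)
print(float(mean), float(var))
```

For these values the quantile is approximately 18.307, the figure found in standard $\chi^{2}$ tables.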

   (2) $t$ distribution: let $X \sim N(0, 1)$ and $Y \sim \chi^{2}(n)$, with $X$ and $Y$ independent. Then the random variable
$$t = \frac{X}{\sqrt{Y/n}}$$
is said to follow the $t$ distribution with $n$ degrees of freedom, written $t \sim t(n)$.

   The probability density function of the $t(n)$ distribution is:
$$h(t) = \frac{\Gamma[(n+1)/2]}{\sqrt{\pi n}\,\Gamma(n/2)}\left(1 + \frac{t^{2}}{n}\right)^{-(n+1)/2}, \quad -\infty < t < \infty$$

Python code (plotting the $t$ densities)

import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import t
import numpy as np

fig, ax = plt.subplots(1, 1)
x = np.linspace(-10, 10, 10000)
ax.plot(x, t.pdf(x, df=2), '-', label='n = 2', c='blue')
ax.plot(x, t.pdf(x, 9), '--', label='n = 9', c='gray')
ax.plot(x, t.pdf(x, df=10000), '-.', label='n = 10000', c='red')
ax.set_xlabel("t")
ax.set_ylabel("h(t)")
ax.legend()
plt.show()


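# Plot the t distribution from its definition
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import norm, chi2, t   # t is needed for t.pdf in the plots below
import numpy as np

def demonstate_t(n):
    size = 10000
    x = norm(loc=0, scale=1).rvs(size=size)
    # Draw one independent chi-square value per normal value, not a single shared one
    y = chi2.rvs(df=n, size=size)
    return x / np.sqrt(y / n)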

x = np.linspace(-10, 10, 10000)

n_2 = demonstate_t(2)
n_9 = demonstate_t(9)
n_10000 = demonstate_t(10000)

plt.figure(figsize=(10, 5))
plt.subplot(1,3, 1)
plt.plot(x, t.pdf(x, 2), '-', label='n = 2', c='blue')
plt.hist(n_2,bins=15, density=True, histtype='stepfilled', alpha=0.5)
plt.legend()
plt.subplot(1,3, 2)
plt.plot(x, t.pdf(x, df = 9), '--', label='n = 9', c='gray')
plt.hist(n_9, density=True, histtype='stepfilled', alpha=0.5)
plt.legend()
plt.subplot(1,3, 3)
plt.plot(x, t.pdf(x, 10000), '-.', label='n = 10000', c='red')
plt.hist(n_10000, density=True, histtype='stepfilled', alpha=0.5)
plt.legend(loc="upper right")
plt.tight_layout(w_pad=3)
plt.show()


As $n \rightarrow \infty$, the $t$ distribution approaches the $N(0, 1)$ distribution.

  • Quantiles of the $t$ distribution: for a given $\alpha$ with $0 < \alpha < 1$, the point $t_{\alpha}(n)$ that satisfies $P\{t > t_{\alpha}(n)\} = \int_{t_{\alpha}(n)}^{\infty}h(t)dt = \alpha$ is called the upper $\alpha$ quantile of the $t(n)$ distribution.
  • The graph of $h(t)$ is symmetric about $t = 0$, so $t_{1 - \alpha}(n) = -t_{\alpha}(n)$
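
Both statements are easy to check with `scipy.stats.t`. The sketch below assumes $\alpha = 0.05$ and $n = 9$ for illustration. As with the $\chi^{2}$ case, the upper quantile defined through the right tail corresponds to `isf`.

```python
from scipy.stats import norm, t

alpha, n = 0.05, 9  # illustrative values, not from the text

t_upper = t.isf(alpha, df=n)        # t_alpha(n)
t_lower = t.isf(1 - alpha, df=n)    # t_{1-alpha}(n)
print(t_upper, t_lower)             # equal magnitude, opposite sign

# For a very large number of degrees of freedom the quantile is close to the N(0, 1) one.
gap = abs(t.isf(alpha, df=10000) - norm.isf(alpha))
print(gap)
```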

   (3) $F$ distribution: let $U \sim \chi^{2}(n_{1})$ and $V \sim \chi^{2}(n_{2})$, with $U$ and $V$ independent. Then the random variable
$$F = \frac{U/n_{1}}{V/n_{2}}$$
is said to follow the $F$ distribution with degrees of freedom $(n_{1}, n_{2})$, written $F \sim F(n_{1}, n_{2})$.

   The probability density function of the $F(n_{1}, n_{2})$ distribution is:
$$\psi(y) = \left\{\begin{array}{ll} \frac{\Gamma[(n_{1}+n_{2})/2]\,(n_{1}/n_{2})^{n_{1}/2}\,y^{(n_{1}/2)-1}}{\Gamma(n_{1}/2)\Gamma(n_{2}/2)\,[1+(n_{1}y/n_{2})]^{(n_{1}+n_{2})/2}}, & y>0 \\ 0, & \text{otherwise} \end{array}\right.$$

Python code (plotting the $F$ densities)

import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import f
import numpy as np

fig, ax = plt.subplots(1, 1)
x = np.linspace(0.01, 10, 10000)
ax.plot(x, f.pdf(x, dfn=10, dfd=40), '-', label='F~(10, 40)', c='blue')
ax.plot(x, f.pdf(x, dfn=40, dfd=10), '--', label='F~(40, 10)', c='orange')
ax.plot(x, f.pdf(x, dfn=11, dfd=3), '-.', label='F~(11, 3)', c='red')
ax.set_xlabel("y")
ax.set_ylabel("f(y)")
ax.legend()
plt.show()


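# Plot the F distribution from its definition
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import chi2, f   # f is needed for f.pdf in the plots below
import numpy as np

def demonstate_f(n1, n2):
    u = chi2.rvs(df=n1, size=10000)
    v = chi2.rvs(df=n2, size=10000)
    # Ratio of two independent chi-square variables, each divided by its degrees of freedom
    return (u / n1) / (v / n2)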

x = np.linspace(0.01, 10, 10000)

n_10_40 = demonstate_f(10, 40)
n_40_10 = demonstate_f(40 ,10)
n_11_3 = demonstate_f(11, 3)

plt.figure(figsize=(10, 5))
plt.subplot(1,3, 1)
plt.plot(x, f.pdf(x, dfn=10, dfd=40), '-', label='F~(10, 40)', c='blue')
plt.hist(n_10_40, bins=300, density=True, histtype='stepfilled', alpha=0.5)
plt.legend()
plt.subplot(1,3, 2)
plt.plot(x, f.pdf(x, dfn=40, dfd=10), '--', label='F~(40, 10)', c='orange')
plt.hist(n_40_10, bins=300, density=True, histtype='stepfilled', alpha=0.5)
plt.legend()
plt.subplot(1,3, 3)
plt.plot(x, f.pdf(x, dfn=11, dfd=3), '-.', label='F~(11, 3)', c='red')
plt.hist(n_11_3,bins=550, density=True, histtype='stepfilled', alpha=0.5)
plt.xlim([0, 10])
plt.legend(loc="upper right")
plt.tight_layout(w_pad=3)
plt.show()



  • Quantiles of the $F$ distribution: for a given $\alpha$ with $0 < \alpha < 1$, the point $F_{\alpha}(n_{1}, n_{2})$ that satisfies $P\{F > F_{\alpha}(n_{1}, n_{2})\} = \int_{F_{\alpha}(n_{1}, n_{2})}^{\infty}\psi(y)dy = \alpha$ is called the upper $\alpha$ quantile of the $F(n_{1}, n_{2})$ distribution.
  • If $F \sim F(n_{1}, n_{2})$, then $\frac{1}{F} \sim F(n_{2}, n_{1})$
  • $F_{1-\alpha}(n_{1}, n_{2}) = \frac{1}{F_{\alpha}(n_{2}, n_{1})}$
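
The reciprocal relation between the quantiles can be confirmed numerically. This is a small sketch with assumed values $\alpha = 0.05$, $n_{1} = 10$, $n_{2} = 40$; it uses `scipy.stats.f.isf`, which returns the point with the given right-tail probability.

```python
from scipy.stats import f

alpha, n1, n2 = 0.05, 10, 40  # illustrative values, not from the text

# Left-hand side: F_{1-alpha}(n1, n2), the point with right-tail probability 1 - alpha.
lhs = f.isf(1 - alpha, dfn=n1, dfd=n2)

# Right-hand side: reciprocal of F_alpha(n2, n1), with the degrees of freedom swapped.
rhs = 1.0 / f.isf(alpha, dfn=n2, dfd=n1)

print(lhs, rhs)
```

This identity is why printed $F$ tables usually list only small values of $\alpha$: quantiles for $1-\alpha$ follow from the swapped-degrees-of-freedom entry.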
  • 📕Important theorems: the distributions of the sample mean and the sample variance of a normal population

  Theorem 1: let $X_{1}, X_{2}, \cdots, X_{n}$ be a sample from the normal population $N(\mu, \sigma^{2})$ and let $\overline{X}$ be the sample mean. Then
$$\overline{X} \sim N(\mu, \sigma^{2}/n).$$

  Theorem 2: let $X_{1}, X_{2}, \cdots, X_{n}$ be a sample from the normal population $N(\mu, \sigma^{2})$, and let $\overline{X}$ and $S^{2}$ be the sample mean and the sample variance. Then
$$\begin{aligned} & 1.\ \frac{(n-1)S^{2}}{\sigma^{2}} \sim \chi^{2}(n-1) \\ & 2.\ \overline{X} \text{ and } S^{2} \text{ are independent} \end{aligned}$$

  Theorem 3: let $X_{1}, X_{2}, \cdots, X_{n}$ be a sample from the normal population $N(\mu, \sigma^{2})$, and let $\overline{X}$ and $S^{2}$ be the sample mean and the sample variance. Then
$$\frac{\overline{X} - \mu}{S/\sqrt{n}} \sim t(n-1).$$

  Theorem 4: let $X_{1}, X_{2}, \cdots, X_{n_{1}}$ and $Y_{1}, Y_{2}, \cdots, Y_{n_{2}}$ be samples from the normal populations $N(\mu_{1}, \sigma_{1}^{2})$ and $N(\mu_{2}, \sigma_{2}^{2})$ respectively, with the two samples independent of each other. Let $\overline{X}, \overline{Y}$ be the two sample means and $S_{1}^{2}, S_{2}^{2}$ the two sample variances. Then
$$\begin{aligned} & 1.\ \frac{S_{1}^{2}/S_{2}^{2}}{\sigma_{1}^{2}/\sigma_{2}^{2}} \sim F(n_{1}-1, n_{2}-1) \\ & 2.\ \text{when } \sigma_{1}^{2} = \sigma_{2}^{2} = \sigma^{2}: \quad \frac{(\overline{X} - \overline{Y}) - (\mu_{1} - \mu_{2})}{S_{w}\sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}}} \sim t(n_{1}+n_{2}-2) \end{aligned}$$
where $S_{w}^{2} = \frac{(n_{1}-1)S_{1}^{2}+(n_{2}-1)S_{2}^{2}}{n_{1}+n_{2}-2}$ and $S_{w} = \sqrt{S_{w}^{2}}$.

Python code (verifying the theorems)

import matplotlib.pyplot as plt
%matplotlib inline
from scipy.stats import chi2, t, norm, f
import numpy as np

def theory_1(mu, sigma, n):
    x_mean = []
    for i in range(10000):
        x_mean.append(np.sum(norm.rvs(loc=mu, scale=sigma, size=n))/n)
    return x_mean

def theory_2(mu, sigma, n):
    res = []
    for i in range(10000):
        x = norm.rvs(loc=mu, scale=sigma, size=n)
        x_mean = np.mean(x)
        s2 = np.sum(np.square(x - x_mean))/(n-1)
        res.append((n-1)*s2/(sigma**2))
    return res

def theory_3(mu, sigma, n):
    res = []
    for i in range(10000):
        x = norm.rvs(loc=mu, scale=sigma, size=n)
        x_mean = np.mean(x)
        s = np.sqrt(np.sum(np.square(x - x_mean))/(n-1))
        res.append((x_mean-mu)/(s/np.sqrt(n)))
    return res

def theory_4(mu1, mu2, sigma1, sigma2, n1, n2):
    res = []
    for i in range(10000):
        x1 = norm.rvs(loc=mu1, scale=sigma1, size=n1)
        x1_mean = np.mean(x1)
        x2 = norm.rvs(loc=mu2, scale=sigma2, size=n2)
        x2_mean = np.mean(x2)
        s1_2 = np.sum(np.square(x1-x1_mean)) / (n1-1)
        s2_2 = np.sum(np.square(x2-x2_mean)) / (n2-1)
        temp1 = (s1_2/s2_2)
        temp2 = (sigma1**2/sigma2**2)
        res.append(temp1/temp2)
    return res 

mu = 5
sigma = 10
n = 5
mu1, mu2 = 1, 2
sigma1, sigma2 = 3, 4
n1, n2 = 10, 40
x_mean = theory_1(mu, sigma, n)
t2 = theory_2(mu, sigma, n)
t_ = theory_3(mu, sigma, n)
f_ = theory_4(mu1, mu2, sigma1, sigma2, n1, n2)

x1 =np.linspace(-10, 20, 10000)
x2 = np.linspace(0.01, 30, 10000)
x3 = np.linspace(-5, 5, 10000)
x4 = np.linspace(0.01, 10, 10000)

plt.figure(figsize=(10, 8))
plt.subplot(2,2, 1)
plt.plot(x1, norm.pdf(x1,loc=mu, scale=sigma/np.sqrt(n)), '-', label='N({}, {})'.format(mu, sigma**2/n), c='blue')
plt.hist(x_mean,bins=50, density=True, histtype='stepfilled', alpha=0.5)
plt.title("Theory_1")
plt.xlabel("x")
plt.ylabel("p(x)")
plt.legend()
plt.subplot(2,2, 2)
plt.plot(x2, chi2.pdf(x2, df=n-1), '--', label='X({})'.format(n-1), c='orange')
plt.hist(t2, bins=50,  density=True, histtype='stepfilled', alpha=0.5)
plt.title("Theory_2")
plt.xlabel("x")
plt.ylabel("p(x)")
# plt.xlim([0, 30])
plt.legend()
plt.subplot(2,2, 3)
plt.plot(x3, t.pdf(x3, df=n-1), '-.', label='t({})'.format(n-1), c='red')
plt.hist(t_,bins=50, density=True, histtype='stepfilled', alpha=0.5)
plt.title("Theory_3")
plt.xlabel("x")
plt.ylabel("p(x)")
plt.legend(loc="upper right")
plt.subplot(2,2, 4)
plt.plot(x4, f.pdf(x4, dfn=n1-1, dfd=n2-1), '--', label='F({}, {})'.format(n1-1, n2-1), c='orange')
plt.hist(f_, bins=50, density=True, histtype='stepfilled', alpha=0.5)
plt.title("Theory_4")
plt.xlabel("x")
plt.ylabel("p(x)")
plt.xlim([0, 10])
plt.legend()

plt.tight_layout(w_pad=3)
plt.show()
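
The function `theory_4` above checks only part 1 of Theorem 4. Part 2 (the pooled-variance $t$ statistic) can be checked in the same spirit. The sketch below is an addition, not part of the original notebook: the parameter values are assumptions, and instead of a plot it uses a Kolmogorov-Smirnov test from `scipy.stats.kstest` to compare the simulated statistic with $t(n_{1}+n_{2}-2)$.

```python
import numpy as np
from scipy.stats import kstest, t

rng = np.random.default_rng(0)
mu1, mu2, sigma = 1.0, 2.0, 3.0   # assumed values; both populations share sigma
n1, n2, reps = 10, 40, 10000

# Each row is one simulated sample.
x = rng.normal(mu1, sigma, size=(reps, n1))
y = rng.normal(mu2, sigma, size=(reps, n2))

s1_2 = x.var(axis=1, ddof=1)   # S_1^2 for each replication
s2_2 = y.var(axis=1, ddof=1)   # S_2^2 for each replication
sw = np.sqrt(((n1 - 1) * s1_2 + (n2 - 1) * s2_2) / (n1 + n2 - 2))

stat = ((x.mean(axis=1) - y.mean(axis=1)) - (mu1 - mu2)) / (sw * np.sqrt(1 / n1 + 1 / n2))

# A small KS statistic means the empirical distribution is close to t(n1 + n2 - 2).
result = kstest(stat, t(df=n1 + n2 - 2).cdf)
print(result.statistic, result.pvalue)
```

A KS statistic near zero is consistent with the theorem; a histogram of `stat` overlaid with `t.pdf`, in the style of the plots above, would show the same agreement visually.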


1.4 Parameter Estimation: the Concept of Point Estimation

  • Point estimation: suppose the form of the distribution function $F(x;\theta)$ of the population $X$ is known and $\theta$ is the parameter to be estimated. Let $X_{1}, X_{2}, \cdots, X_{n}$ be a sample from $X$ and $x_{1}, x_{2}, \cdots, x_{n}$ the corresponding sample values. The point estimation problem is to construct a suitable statistic $\hat{\theta}(X_{1}, X_{2}, \cdots, X_{n})$ and use its observed value $\hat{\theta}(x_{1}, x_{2}, \cdots, x_{n})$ as an approximation of the unknown parameter $\theta$. $\hat{\theta}(X_{1}, X_{2}, \cdots, X_{n})$ is called an estimator of $\theta$, and $\hat{\theta}(x_{1}, x_{2}, \cdots, x_{n})$ is called an estimate of $\theta$. Both are referred to simply as estimates and abbreviated as $\hat{\theta}$.

Point estimation uses a sample statistic to estimate an unknown parameter of the population distribution. Because an estimator is a function of the sample, different sample values generally give different estimates of $\theta$.
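
A short simulation makes the last point concrete. The sketch below assumes a normal population with $\mu = 5$ and $\sigma = 10$ (values chosen only for illustration) and uses the sample mean as the estimator of $\mu$; every new sample yields a different estimate.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, n = 5.0, 10.0, 50   # assumed population parameters and sample size

# Draw five independent samples and compute the estimate (sample mean) from each.
estimates = [rng.normal(mu, sigma, size=n).mean() for _ in range(5)]

for i, estimate in enumerate(estimates, start=1):
    print("sample {}: estimate of mu = {:.3f}".format(i, estimate))
```

The estimator (the rule "take the sample mean") is the same each time; only its observed value changes with the sample.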

1.5 Point Estimation Methods: the Method of Moments

  • Method of moments: let $X$ be a continuous random variable with probability density $f(x;\theta_{1},\theta_{2}, \cdots, \theta_{k})$, or a discrete random variable with probability mass function $P\{X=x\}=p(x;\theta_{1},\theta_{2}, \cdots, \theta_{k})$, where $\theta_{1},\theta_{2}, \cdots, \theta_{k}$ are the parameters to be estimated and $X_{1}, X_{2}, \cdots, X_{n}$ is a sample from $X$. Suppose the first $k$ moments of the population $X$ exist:
    $$\mu_{l} = E(X^{l}) = \int_{-\infty}^{\infty}x^{l}f(x;\theta_{1},\theta_{2}, \cdots, \theta_{k})dx \quad (X \text{ continuous})$$
    or
    $$\mu_{l} = E(X^{l}) = \sum_{x} x^{l}p(x;\theta_{1},\theta_{2}, \cdots, \theta_{k}) \quad (X \text{ discrete})$$
    for $l = 1, 2, \cdots, k$. Then set each sample moment $A_{l}$ equal to the corresponding population moment $\mu_{l}$, that is, $A_{l} = \mu_{l}$ for $l = 1, 2, \cdots, k$. Using sample moments to estimate population moments, and through them the unknown parameters, is called the method of moments.

Sample moment formulas

    1. Sample raw moment $A_{k} = \frac{1}{n}\sum_{i=1}^{n}X_{i}^{k}, k =1, 2, \cdots$
    2. Sample central moment $B_{k} = \frac{1}{n}\sum_{i=1}^{n}(X_{i} - \overline{X})^{k}, k =1, 2, \cdots$

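Steps for solving a problem with the method of moments:

    1. Determine the number $k$ of parameters $\theta_{i}$ to be estimated in the population distribution (written $k$ here to keep it distinct from the sample size $n$)
    2. Write out the first $k$ population moments $\mu_{1}$ to $\mu_{k}$; each $\mu_{l}$ is a function of the parameters $\theta_{i}$
    3. Treat $\mu_{1}$ to $\mu_{k}$ as a system of equations and solve it for the parameters $\theta_{i}$
    4. In the solution for each $\theta_{i}$, replace every $\mu_{l}$ with the corresponding sample moment $A_{l}$ to obtain the estimator of the parameter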
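
As a sketch of these four steps on a population other than the one in the example that follows, assume $X \sim N(\mu, \sigma^{2})$ (an assumption made only for this illustration). There are two parameters; $\mu_{1} = \mu$ and $\mu_{2} = \sigma^{2} + \mu^{2}$; solving gives $\mu = \mu_{1}$ and $\sigma^{2} = \mu_{2} - \mu_{1}^{2}$; substituting sample moments gives $\hat{\mu} = A_{1}$ and $\hat{\sigma}^{2} = A_{2} - A_{1}^{2}$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_real, sigma_real, n = 5.0, 2.0, 100000   # assumed true values and sample size
x = rng.normal(mu_real, sigma_real, size=n)

# Sample raw moments A_1 and A_2.
A1 = np.mean(x)
A2 = np.mean(x ** 2)

# Step 4: replace the population moments in the solution with the sample moments.
mu_hat = A1
sigma2_hat = A2 - A1 ** 2

print("mu_hat = {:.3f}, sigma2_hat = {:.3f}".format(mu_hat, sigma2_hat))
```

The estimate `sigma2_hat` is exactly the second sample central moment $B_{2}$, which is the quantity `np.var(x)` computes with its default divisor $n$.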

🔥Example: let the population $X$ be uniformly distributed on $[a, b]$ with $a, b$ unknown, and let $X_{1}, X_{2}, \cdots, X_{n}$ be a sample from $X$. Find the moment estimators of $a$ and $b$.

🦊Solution:

  1. Determine the number of parameters to estimate: there are $2$ of them, $a$ and $b$
  2. Compute the first $2$ population moments
    $$\begin{aligned} &\mu_{1} = E(X) = \frac{a+b}{2} \\ &\mu_{2} = E(X^{2}) = D(X) + E^{2}(X) = \frac{(b-a)^{2}}{12} + \frac{(a+b)^{2}}{4} \end{aligned}$$
  3. Set up the system of equations and solve it
    $$\left\{\begin{aligned} &\mu_{1} = \frac{a+b}{2} \\ &\mu_{2} = \frac{(b-a)^{2}}{12} + \frac{(a+b)^{2}}{4} \end{aligned}\right.$$
    Since $\mu_{2} - \mu_{1}^{2} = \frac{(b-a)^{2}}{12}$, we have $a + b = 2\mu_{1}$ and $b - a = 2\sqrt{3(\mu_{2}-\mu_{1}^{2})}$, which gives
    $$a = \mu_{1} - \sqrt{3(\mu_{2}-\mu_{1}^{2})}, \quad b = \mu_{1} + \sqrt{3(\mu_{2}-\mu_{1}^{2})}$$
  4. Replace each $\mu_{l}$ with the corresponding $A_{l}$
    $$\begin{aligned} &\hat{a} = A_{1} - \sqrt{3(A_{2}-A_{1}^{2})} = \frac{1}{n}\sum_{i=1}^{n}X_{i} - \sqrt{3\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}\right)^{2}\right)} \\ &\hat{b} = A_{1} + \sqrt{3(A_{2}-A_{1}^{2})} = \frac{1}{n}\sum_{i=1}^{n}X_{i} + \sqrt{3\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}\right)^{2}\right)} \end{aligned}$$

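Python code (solving the example above)

import numpy as np
from scipy.stats import uniform

a_real = 1
b_real = 6
n = 1000
# scipy's uniform is defined on [loc, loc + scale]
x = uniform.rvs(loc=a_real, scale=b_real - a_real, size=n)

# Sample raw moments A_1 and A_2
A1 = np.sum(x) / n
A2 = np.sum(np.square(x)) / n

a_estimate = A1 - np.sqrt(3 * (A2 - A1**2))
b_estimate = A1 + np.sqrt(3 * (A2 - A1**2))
print("True a: {}, true b: {}".format(a_real, b_real))
print("Moment estimate of a: {:.2f}, moment estimate of b: {:.2f}".format(a_estimate, b_estimate))

Output of one run (the sample is random, so the estimates vary between runs):
True a: 1, true b: 6
Moment estimate of a: 1.02, moment estimate of b: 6.06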