Notes on Statistical Thinking in Python (Part 1), Section 4

曦彧

Tags: data analysis

Article link: https://blog.csdn.net/sinat_41942988/article/details/104221749

1. From np.random.normal() to fitting a Normal distribution: https://blog.csdn.net/lanchunhui/article/details/50163669

matplotlib.pyplot.hist() (reposted): a detailed explanation of the plt.hist parameters in Python

matplotlib tutorial

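2. Introduction to the Normal distribution

In this exercise, you will explore the Normal PDF and learn how to plot the PDF of a known distribution using hacker statistics, that is, by drawing many random samples and plotting their normalized histogram. Specifically, you will plot Normal PDFs with the same mean (20) and different spreads, given here as standard deviations of 1, 3, and 10.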

import numpy as np
import matplotlib.pyplot as plt

# Draw 100000 samples from Normal distributions with mean 20 and the stds of interest:
# samples_std1, samples_std3, samples_std10
samples_std1 = np.random.normal(20, 1, size=100000)
samples_std3 = np.random.normal(20, 3, size=100000)
samples_std10 = np.random.normal(20, 10, size=100000)

# Make histograms; density=True normalizes each histogram to unit area
# (it replaces the older normed=True argument, which newer Matplotlib releases no longer accept)
_ = plt.hist(samples_std1, density=True, histtype='step', bins=100)
_ = plt.hist(samples_std3, density=True, histtype='step', bins=100)
_ = plt.hist(samples_std10, density=True, histtype='step', bins=100)


# Make a legend, set limits and show plot
_ = plt.legend(('std = 1', 'std = 3', 'std = 10'))
plt.ylim(-0.01, 0.42)
plt.show()
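
As a rough check on the histograms above, the sampled density can be compared with the analytic Normal PDF, f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). The following is a minimal sketch and not part of the original exercise; the helper name normal_pdf, the fixed seed, and the use of np.histogram instead of plt.hist are assumptions made for illustration.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Analytic Normal PDF evaluated at x (illustrative helper, not from the course)."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(42)  # fixed seed so the check is reproducible (assumption)
mu = 20
max_abs_errors = {}
for sigma in (1, 3, 10):
    samples = rng.normal(mu, sigma, size=100000)
    # density=True normalizes the histogram so that it integrates to 1
    heights, edges = np.histogram(samples, bins=100, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Largest difference between the histogram heights and the analytic PDF
    max_abs_errors[sigma] = np.max(np.abs(heights - normal_pdf(centers, mu, sigma)))

for sigma, err in max_abs_errors.items():
    print(f"std = {sigma}: largest gap between histogram and analytic PDF = {err:.4f}")
```

With 100000 samples the gaps should be small compared with the peak heights, which is why the step histograms can stand in for the PDFs.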

The Normal CDF

Using the samples you generated in the previous exercise (samples_std1, samples_std3, and samples_std10 are in your namespace), generate and plot their CDFs.
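
The ecdf() function called in the following code is not defined in these notes; it was presumably written in an earlier part of the course. A minimal sketch of such a helper, assuming it takes a one-dimensional array and returns the sorted values together with their cumulative fractions, might look like this (the demo array is made up for illustration):

```python
import numpy as np

def ecdf(data):
    """Compute the ECDF of a one-dimensional array of measurements."""
    n = len(data)
    x = np.sort(data)            # sorted data values
    y = np.arange(1, n + 1) / n  # fraction of points less than or equal to each x
    return x, y

# Tiny made-up array, only to show the shape of the output
x_demo, y_demo = ecdf(np.array([3.0, 1.0, 2.0, 4.0]))
print(x_demo)
print(y_demo)
```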

# Generate CDFs
x_std1, y_std1 = ecdf(samples_std1)
x_std3, y_std3 = ecdf(samples_std3)
x_std10, y_std10 = ecdf(samples_std10)


# Plot CDFs
_ = plt.plot(x_std1, y_std1, marker='.', linestyle='none')
_ = plt.plot(x_std3, y_std3, marker='.', linestyle='none')
_ = plt.plot(x_std10, y_std10, marker='.', linestyle='none')


# Make a legend and show the plot
_ = plt.legend(('std = 1', 'std = 3', 'std = 10'), loc='lower right')
plt.show()

The Normal distribution: Properties and warnings

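Use np.random.normal() to draw samples from a Normal distribution whose mean and standard deviation are taken from the data, and plot the CDF of those samples.

Compute the CDF of the theoretical samples and the ECDF of the Belmont winners' data, then compare the two. The array belmont_no_outliers is already in the namespace.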

# Compute mean and standard deviation: mu, sigma
mu = np.mean(belmont_no_outliers)
sigma = np.std(belmont_no_outliers)

# Sample out of a normal distribution with this mu and sigma: samples
samples = np.random.normal(mu,sigma,size=100000)

# Get the CDF of the samples and of the data
x_theor,y_theor = ecdf(samples)
x, y = ecdf(belmont_no_outliers)


# Plot the CDFs and show the plot
_ = plt.plot(x_theor, y_theor)
_ = plt.plot(x, y, marker='.', linestyle='none')
_ = plt.xlabel('Belmont winning time (sec.)')
_ = plt.ylabel('CDF')
plt.show()
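
The visual comparison above can also be summarized as a single number: the largest vertical gap between the ECDF of the data and the Normal CDF with the same mean and standard deviation. The sketch below is only an illustration. Since belmont_no_outliers is not included in these notes, it uses a synthetic stand-in array with arbitrary parameters, and it evaluates the analytic Normal CDF with math.erf instead of sampling.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for belmont_no_outliers; these are NOT real race times (assumption)
stand_in_times = rng.normal(150.0, 1.5, size=90)

mu = np.mean(stand_in_times)
sigma = np.std(stand_in_times)

# ECDF of the stand-in data
x = np.sort(stand_in_times)
y = np.arange(1, len(x) + 1) / len(x)

# Analytic Normal CDF: 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))
cdf_theor = np.array(
    [0.5 * (1 + math.erf((xi - mu) / (sigma * math.sqrt(2)))) for xi in x]
)

max_gap = np.max(np.abs(y - cdf_theor))
print(f"largest vertical gap between ECDF and Normal CDF: {max_gap:.3f}")
```

A small gap is consistent with the data being approximately Normally distributed, while a large gap suggests the Normal model is a poor description.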

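The Exponential distribution

Define a function with the call signature successive_poisson(tau1, tau2, size=1) that samples the total waiting time for a no-hitter followed by a hit for the cycle.
- Draw waiting times for the no-hitter (size samples) from an exponential distribution with mean tau1 and assign them to t1.
- Draw waiting times for hitting the cycle (size samples) from an exponential distribution with mean tau2 and assign them to t2.
- The function returns the sum of the waiting times for the two events.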

def successive_poisson(tau1, tau2, size=1):
    """Compute time for arrival of 2 successive Poisson processes."""
    # Draw samples out of first exponential distribution (mean tau1): t1
    t1 = np.random.exponential(tau1, size)

    # Draw samples out of second exponential distribution (mean tau2): t2
    t2 = np.random.exponential(tau2, size)

    return t1 + t2

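Distribution of no-hitters and cycles

Now you will use your sampling function to compute the waiting time to observe a no-hitter and then a hit for the cycle. The mean waiting time for a no-hitter is 764 games, and the mean waiting time for hitting the cycle is 715 games.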

# Draw samples of waiting times: waiting_times
waiting_times = successive_poisson(764,715,size=100000)

# Make the histogram
_ = plt.hist(waiting_times, density=True, histtype='step', bins=100)


# Label axes
plt.xlabel("waiting time")
plt.ylabel("PDF")


# Show the plot
plt.show()
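
Two sanity checks on these samples are possible without plotting. First, the mean of a sum is the sum of the means, so the sample mean should be close to 764 + 715 = 1479 games. Second, for tau1 != tau2 the sum of two independent exponential waiting times has the density f(t) = (exp(-t/tau1) - exp(-t/tau2)) / (tau1 - tau2), which the normalized histogram should follow. The sketch below is an illustration under these assumptions and not part of the original exercise; the fixed seed and the variable names sample_mean, pdf_theor, and max_pdf_gap are chosen here.

```python
import numpy as np

def successive_poisson(tau1, tau2, size=1):
    """Waiting time for the arrival of two successive Poisson processes."""
    t1 = np.random.exponential(tau1, size)
    t2 = np.random.exponential(tau2, size)
    return t1 + t2

np.random.seed(42)  # fixed seed for reproducibility (assumption)
tau1, tau2 = 764, 715
waiting_times = successive_poisson(tau1, tau2, size=100000)

# Check 1: the sample mean should be close to tau1 + tau2
sample_mean = np.mean(waiting_times)
print(f"sample mean = {sample_mean:.1f}, expected about {tau1 + tau2}")

# Check 2: compare the normalized histogram with the analytic density (valid for tau1 != tau2)
heights, edges = np.histogram(waiting_times, bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
pdf_theor = (np.exp(-centers / tau1) - np.exp(-centers / tau2)) / (tau1 - tau2)
max_pdf_gap = np.max(np.abs(heights - pdf_theor))
print(f"largest gap between histogram and analytic PDF = {max_pdf_gap:.2e}")
```

Unlike a single exponential distribution, this density starts at zero and rises to a peak before decaying, because both events have to occur before the clock stops.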