Hypothesis Test for the Population Mean μ

import pandas as pd #data manipulation and analysis
import numpy as np #lib used for working with arrays
import matplotlib.pyplot as plt #lib for plots and visualizations
import seaborn as sns #lib for visualizations
%matplotlib inline

import scipy.stats as stats #probability distribution

One Sample Z-test (when population standard deviation is known)
In practice you rarely know the population standard deviation without also knowing the mean, but let's assume that is the case here.

It is known from experience that for a certain E-commerce company the mean delivery time of the product is 5 days with a standard deviation of 1.3 days.

The new customer service manager of the company is afraid that the company is slipping and collects a random sample of 45 orders. The mean delivery time of this sample comes out to be 5.25 days.

Is there enough statistical evidence to support the manager's apprehension that the mean delivery time of products is greater than 5 days?

This is a one-tailed test concerning the population mean μ, the mean delivery time of products.

Use a level of significance of α=0.05.

Let's write the null and alternative hypotheses.
Let μ be the mean delivery time of the products.
The manager will test the null hypothesis
H_0:\mu=5

against the alternative hypothesis

H_a:\mu > 5

Are the assumptions of Z-test satisfied?

  • Sample means are normally distributed - since the sample size is 45 (which is >30), the Central Limit Theorem tells us that the distribution of sample means will be approximately normal. If the sample size were less than 30, we could apply the Z-test only if we knew that the population itself was normal.
  • Observations are from a simple random sample - we are told that the manager collected a random sample.
  • Population standard deviation is known - yes, it is 1.3 days.

Voila! We can use Z-test for this problem
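The normality assumption above can be illustrated with a short simulation. This is only a sketch: the exponential population (skewed, with mean 5 days) is a hypothetical choice made for illustration and is not part of the delivery-time problem, and the seed and number of repetitions are arbitrary.

```python
import numpy as np

# Hypothetical skewed population: exponential delivery times with mean 5 days.
# This distribution is an assumption made for illustration only.
rng = np.random.default_rng(0)
n, n_repeats = 45, 10_000

# Draw n_repeats samples of size n and take the mean of each sample
sample_means = rng.exponential(scale=5.0, size=(n_repeats, n)).mean(axis=1)

# CLT prediction: sample means centre on 5 with spread close to 5 / sqrt(45),
# because an exponential population with mean 5 has standard deviation 5
print(sample_means.mean())
print(sample_means.std(ddof=1), 5.0 / np.sqrt(n))
```

Plotting a histogram of sample_means (for example with plt.hist) should show a roughly bell-shaped curve even though the individual delivery times in this sketch are strongly skewed.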

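Next, we find the Z test statistic.

The z-score, also called the standard score, is obtained by subtracting the mean from a value and dividing the difference by the standard deviation. In statistics, the standard score is the signed number of standard deviations by which an observation or data point lies above the mean of what is being observed or measured.
Set the values of the population mean and population standard deviation to 5 and 1.3 respectively.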

mu,sigma=5,1.3 #population mean under H_0 and known population standard deviation
n=45 #sample size

#set the sample mean to 5.25
x_bar=5.25

#calculate the test statistic:
#(sample mean - population mean) divided by the standard error sigma/sqrt(n)
test_stat=(x_bar - mu)/(sigma/np.sqrt(n))
test_stat
1.2900392177883402
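Written out by hand, the same calculation looks as follows. The symbols \mu_0 (the mean under the null hypothesis) and n (the sample size) are notation introduced here for clarity; the numbers are the ones given in the problem.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
  = \frac{5.25 - 5}{1.3 / \sqrt{45}}
  \approx \frac{0.25}{0.1938}
  \approx 1.29
```

In words, the observed sample mean lies about 1.29 standard errors above the hypothesised mean of 5 days.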

Introduction to the Rejection Region and p-value Approaches

The test statistic on its own does not tell us whether the evidence is significant enough to reject the null hypothesis in favour of the alternative hypothesis. To decide, we use one of the following approaches:

1. Rejection region approach

2. p-value approach

1. Rejection Region Approach

For this approach, we follow the steps below.

1. We choose a level of significance (α).

α is the probability of rejecting the null hypothesis when it is true.

2. We then find the rejection region under the distribution of the test statistic.

3. We reject the null hypothesis if the test statistic falls in the rejection region. Otherwise, we do not reject the null hypothesis.

In the given example, the Z test statistic follows a standard normal distribution under the null hypothesis, as shown in the plot below. A Z value lying far in the right tail of the distribution gives strong evidence against the null hypothesis. To find the rejection region, we find the value of Z (called the critical value) that leaves an area of α in the right tail.

from scipy.stats import norm
critical_val=norm.ppf(1-.05) #the chance of being above it is 5%
critical_val
1.6448536269514722
#plot the standard normal density and shade the rejection region
x=np.linspace(-4,4,100)
plt.plot(x,norm.pdf(x,0,1))
plt.axvline(x=critical_val,c='r') #critical value
x1=np.linspace(critical_val,4,50)
plt.fill_between(x1,norm.pdf(x1,0,1),color='r') #area of α in the right tail
plt.annotate('Reject Null',(2,0.20))
plt.annotate('Do Not Reject\n  Null',(-1,0.2))
plt.show()

As our test statistic (~1.29) is smaller than the critical value (~1.64), it does not lie in the rejection region, so we cannot reject the null hypothesis. Thus we do not have statistical evidence to say that the mean delivery time of a product is greater than 5 days.

2. p-value Approach

#calculate the p-value: the area under the standard normal curve to the right of the test statistic
1-norm.cdf(test_stat)
0.09851852092578695

As the p-value (~0.098) is greater than the level of significance 0.05, we cannot reject the null hypothesis. Thus we do not have statistical evidence to say that the mean delivery time of a product is greater than 5 days.

Key takeaway

  • Both the rejection region approach and the p-value approach give the same result: the manager does not have enough statistical evidence to say that the mean delivery time of a product is greater than 5 days.
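To make that agreement concrete, both approaches can be wrapped in one small helper. This is a minimal sketch: the function name one_sample_z_test_upper and the keys of the returned dictionary are hypothetical names chosen for illustration, and norm.sf(z) is used as an equivalent of 1 - norm.cdf(z).

```python
import numpy as np
from scipy.stats import norm

def one_sample_z_test_upper(x_bar, mu_0, sigma, n, alpha=0.05):
    """Upper-tailed one-sample z-test (H_a: mu > mu_0) with known sigma.

    Hypothetical helper written for illustration only.
    """
    z = (x_bar - mu_0) / (sigma / np.sqrt(n))  # test statistic
    critical_val = norm.ppf(1 - alpha)         # boundary of the rejection region
    p_value = norm.sf(z)                       # right-tail area, same as 1 - cdf(z)
    return {
        "z": z,
        "critical_val": critical_val,
        "p_value": p_value,
        "reject_by_region": bool(z > critical_val),
        "reject_by_p_value": bool(p_value < alpha),
    }

# Values from the delivery-time example
result = one_sample_z_test_upper(x_bar=5.25, mu_0=5, sigma=1.3, n=45)
print(result)
```

For an upper-tailed test the two decisions always coincide, because z exceeds the critical value exactly when the area to its right is smaller than α.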

Exercise:

Level of significance: the probability of rejecting the null hypothesis when it is true. It is fixed before the hypothesis test is carried out.

The p-value is the probability of observing the test statistic or more extreme results under the null hypothesis.

The z-test statistic follows a standard normal distribution under the null hypothesis.
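The definition of the level of significance can be checked with a simulation. The sketch below assumes the null hypothesis is true and that individual delivery times are normally distributed (an assumption made only for this illustration); the seed and number of trials are arbitrary.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
mu_0, sigma, n, alpha = 5, 1.3, 45, 0.05
n_trials = 20_000

# Simulate many samples of 45 orders when H_0 is true (mean really is 5 days).
# Normally distributed delivery times are an assumption for illustration.
samples = rng.normal(loc=mu_0, scale=sigma, size=(n_trials, n))
z_stats = (samples.mean(axis=1) - mu_0) / (sigma / np.sqrt(n))

# Upper-tailed p-value for each simulated sample
p_values = norm.sf(z_stats)

# Fraction of samples in which H_0 is (wrongly) rejected
rejection_rate = np.mean(p_values < alpha)
print(rejection_rate)
```

The printed rejection rate should be close to α=0.05: even when the null hypothesis is true, about 5% of samples lead to a rejection.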
