One-way ANOVA
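    # Xiabu Xiabu (呷哺呷哺): user ratings collected in 3 cities
    from scipy.stats import f_oneway

    a = [10, 9, 9, 8, 8, 7, 7, 8, 8, 9]
    b = [10, 8, 9, 8, 7, 7, 7, 8, 9, 9]
    c = [9, 9, 8, 8, 8, 7, 6, 9, 8, 9]

    # f is the F statistic, p is the corresponding p-value
    f, p = f_oneway(a, b, c)
    print(f, p)

Output:

    0.101503759398 0.903820890369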
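The first number is the F statistic and the second is the p-value, 0.903820890369.
Because the p-value is far above the usual 0.05 threshold, we cannot conclude that the factor under test (the city) has a significant effect on the observed ratings.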
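To make the F statistic less of a black box, the following minimal sketch recomputes it by hand from the between-group and within-group sums of squares. The helper name `one_way_f` is made up for this illustration and is not part of scipy; the data are the three rating lists from above.

```python
# Manual one-way ANOVA F statistic (pure-Python sketch, no scipy needed).
a = [10, 9, 9, 8, 8, 7, 7, 8, 8, 9]
b = [10, 8, 9, 8, 7, 7, 7, 8, 9, 9]
c = [9, 9, 8, 8, 8, 7, 6, 9, 8, 9]


def one_way_f(*groups):
    """Return the one-way ANOVA F statistic (hypothetical helper)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


f_manual = one_way_f(a, b, c)
print(round(f_manual, 6))
```

The three city means (8.3, 8.2, 8.1) are nearly identical while the spread inside each city is comparatively large, which is why the ratio is so small and agrees with the F value of about 0.1015 reported by `f_oneway`.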
Multi-factor ANOVA
    # Xiabu Xiabu (呷哺呷哺): 2 factors, environment grade and ingredient grade
    from scipy import stats
    import pandas as pd
    import numpy as np
    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm

    environmental = [5,5,5,5,5,4,4,4,4,4,3,3,3,3,3,2,2,2,2,2,1,1,1,1,1]
    ingredients   = [5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1,5,4,3,2,1]
    score         = [5,5,4,3,2,5,4,4,3,2,4,4,3,3,2,4,3,2,2,2,3,3,3,2,1]

    # One row per (environment grade, ingredient grade) combination
    data = {'E': environmental, 'I': ingredients, 'S': score}
    df = pd.DataFrame(data)
    df.head()
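Meaning of the formula symbols:
- (~) separates the dependent variable from the independent variables (dependent variable on the left, independent variables on the right)
- (+) separates the individual independent variables
- (:) denotes the interaction between two independent variables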
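As a rough illustration of what these symbols expand to, the sketch below builds the regression columns for `S ~ E + I + E:I` by hand. It assumes that E and I are treated as numeric columns (they are plain integers here), in which case the `E:I` term is the element-wise product of the two. The helper `design_row` is hypothetical, and only the first three observations are used to keep the example short.

```python
# What 'S ~ E + I + E:I' expands to for numeric E and I (illustrative sketch).
environmental = [5, 5, 5]   # first three E values from the data set
ingredients = [5, 4, 3]     # first three I values from the data set


def design_row(e, i):
    """Columns of the regression for one observation (hypothetical helper)."""
    # intercept, E, I, and the interaction E:I as the product E * I
    return [1, e, i, e * i]


rows = [design_row(e, i) for e, i in zip(environmental, ingredients)]
for row in rows:
    print(row)
```

Each term contributes a single column under this assumption, so each term uses one degree of freedom in the ANOVA table.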
    formula = 'S~E+I+E:I'
    model = ols(formula, df).fit()
    results = anova_lm(model)
    print(results)
Output:

                df  sum_sq    mean_sq           F        PR(>F)
    E          1.0    7.22   7.220000   54.539568  2.896351e-07
    I          1.0   18.00  18.000000  135.971223  1.233581e-10
    E:I        1.0    0.64   0.640000    4.834532  3.924030e-02
    Residual  21.0    2.78   0.132381         NaN           NaN
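Note: the values 2.896351e-07 for E and 1.233581e-10 for I are p-values (the PR(>F) column), not F values. Both are far below 0.05, so E and I each have a significant effect on the score. The interaction term E:I has a p-value of about 0.039, which is below 0.05 but above 0.01, so the interaction is at most weakly significant and much smaller than the two main effects.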
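To see how the F column relates to the other columns, this small sketch recomputes it from the sums of squares and degrees of freedom printed in the table. The numbers are copied from the output above; the variable names are chosen only for this illustration.

```python
# Recompute the F column of the ANOVA table from sum_sq and df.
sum_sq = {"E": 7.22, "I": 18.00, "E:I": 0.64}
df_term = {"E": 1, "I": 1, "E:I": 1}
resid_ss, resid_df = 2.78, 21

# Mean square of the residuals is the denominator of every F ratio
resid_ms = resid_ss / resid_df

f_values = {name: (ss / df_term[name]) / resid_ms
            for name, ss in sum_sq.items()}

for name, f_val in f_values.items():
    print(name, round(f_val, 4))
```

A large F means the term explains much more variation than the residual noise. The p-value in PR(>F) is then the probability of seeing an F at least this large if the term had no effect.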