Statistics, Week 7
import pandas as pd
import numpy as np
path = r'E:\english\data.xlsx'  # raw string so the backslashes are not treated as escapes
data=pd.read_excel(path)
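# Group by port of embarkation and compute summary statistics
embark = data.groupby(['Embarked'])
# Restrict to the two numeric columns used below; 'mean', 'var' and 'std' fail on text columns
embark_basic = embark[['Age', 'Fare']].agg(['count', 'min', 'max', 'median', 'mean', 'var', 'std'])
age_basic = embark_basic['Age']
fare_basic = embark_basic['Fare']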
age_basic
          count   min   max  median       mean         var        std
Embarked
C           130  0.42  71.0    29.0  30.814769  238.234892  15.434860
Q            28  2.00  70.5    27.0  28.089286  286.130622  16.915396
S           554  0.67  80.0    28.0  29.445397  200.029876  14.143192
fare_basic
          count     min       max   median       mean          var        std
Embarked
C           130  4.0125  512.3292  36.2521  68.296767  8200.719153  90.557822
Q            28  6.7500   90.0000   7.7500  18.265775   477.142064  21.843582
S           554  0.0000  263.0000  13.0000  27.476284  1335.636543  36.546362
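As a quick sanity check on the output above, the std column should equal the square root of the var column in every row. The sketch below tests this using only the printed figures. The names `reported` and `checks` are introduced here for illustration, and because the values are copied at six decimal places, agreement is only expected up to rounding.

```python
import math

# (variance, standard deviation) pairs copied from the printed output above
reported = {
    "Age/C": (238.234892, 15.434860),
    "Age/Q": (286.130622, 16.915396),
    "Age/S": (200.029876, 14.143192),
    "Fare/C": (8200.719153, 90.557822),
    "Fare/Q": (477.142064, 21.843582),
    "Fare/S": (1335.636543, 36.546362),
}

# std should equal sqrt(var), up to the rounding of the printed values
checks = {
    label: math.isclose(math.sqrt(var), std, abs_tol=1e-5)
    for label, (var, std) in reported.items()
}

for label, consistent in checks.items():
    print(label, "consistent" if consistent else "MISMATCH")
```

Note that pandas computes `var` and `std` with `ddof=1` by default, so both columns are sample (n - 1) estimates.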
Check whether age follows a normal distribution
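Alongside a plot, one rough numeric check is to compare the sample skewness and excess kurtosis with those of a normal distribution, which are both 0. The following is a minimal sketch using only the standard library. The function name `skew_and_excess_kurtosis` and the `sample_ages` list are hypothetical and are not taken from data.xlsx.

```python
def skew_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) computed from population moments."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Hypothetical ages for illustration only (assumption, not the real dataset)
sample_ages = [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 2.0, 27.0, 14.0, 4.0]
skew, kurt = skew_and_excess_kurtosis(sample_ages)
print(f"skewness={skew:.3f}, excess kurtosis={kurt:.3f}")
```

Values close to 0 for both statistics are consistent with normality but do not prove it, so this complements rather than replaces a histogram or a formal test.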
import seaborn as sns
sns.set_palette(