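Pandas data types
- Series: one-dimensional, labeled array
In essence a Series is built from two arrays: one holds the keys (the index) and the other holds the values.
Many ndarray methods also work on a Series, for example argmax and clip.
Series has a where method, but its result differs from that of NumPy's where.
- DataFrame: two-dimensional, a container of Series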
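The points in the list above can be sketched as follows. This is only an illustration; the names `s`, `clipped`, `largest_pos`, `kept`, and `picked` are made up for the example.

```python
import numpy as np
import pandas as pd

# A Series pairs an index array with a values array.
s = pd.Series([1, 2, 31, 12, 3, 4], index=list("abcdef"))
print(s.index.tolist())   # the labels
print(s.values)           # the underlying ndarray

# ndarray-style methods work directly on the Series.
clipped = s.clip(2, 12)   # values limited to the range [2, 12]
largest_pos = s.argmax()  # integer position of the largest value

# Series.where keeps the original shape and puts NaN where the condition fails.
kept = s.where(s > 3)

# np.where called with only a condition returns the positions where it holds.
picked = np.where(s.values > 3)
print(kept)
print(picked)
```

The practical difference is that `Series.where` never changes the length of the data, while `np.where(condition)` returns positions only.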
Series
Creating Pandas data
import pandas as pd
t1 = pd.Series([1,2,31,12,3,4])
print(t1)
print('*'*50)
t1 = t1.astype('float')
print(t1)
print('*'*50)
0 1
1 2
2 31
3 12
4 3
5 4
dtype: int64
**************************************************
0 1.0
1 2.0
2 31.0
3 12.0
4 3.0
5 4.0
dtype: float64
**************************************************
import string
temp_dict = {'name':'HHVic','age':19,'tel':800800}
t3 = pd.Series(temp_dict)
print(t3)
print('*'*50)
a = {string.ascii_uppercase[i]:i for i in range(10)}
print(pd.Series(a))
b= pd.Series(a, index=list(string.ascii_uppercase[5:15]))
print(b)
print('*'*50)
name HHVic
age 19
tel 800800
dtype: object
**************************************************
A 0
B 1
C 2
D 3
E 4
F 5
G 6
H 7
I 8
J 9
dtype: int64
F 5.0
G 6.0
H 7.0
I 8.0
J 9.0
K NaN
L NaN
M NaN
N NaN
O NaN
dtype: float64
**************************************************
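The last output above changes from int64 to float64. A small sketch of why, assuming the same dict and labels as in the example (the names `labels`, `missing`, and `restored` are illustrative):

```python
import string
import pandas as pd

a = {string.ascii_uppercase[i]: i for i in range(10)}   # keys A..J
labels = list(string.ascii_uppercase[5:15])             # labels F..O
b = pd.Series(a, index=labels)

# Labels found in the dict keep their value; the others become NaN.
missing = b[b.isna()].index.tolist()
print(missing)
print(b.dtype)  # float64, because NaN cannot be stored in an int64 array

# After dropping the NaN entries the values can be cast back to integers.
restored = b.dropna().astype("int64")
print(restored)
```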
Pandas: Series slicing and indexing
import pandas as pd
import string
temp_dict = {'name':'HHVic','age':19,'tel':800800}
t3 = pd.Series(temp_dict)
print(t3)
print('*'*50)
print(t3['age'])
print('*'*50)
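print(t3.iloc[0])  # positional access; plain t3[0] on a labeled Series is deprecated in recent pandas
print('*'*50)
# Take the second and third rows (positions 1 and 2)
print(t3.iloc[[1,2]])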
print('*'*50)
a = {string.ascii_uppercase[i]:i for i in range(10)}
b = pd.Series(a)
print(b)
print('*'*50)
print(b[['A','F']])
name HHVic
age 19
tel 800800
dtype: object
**************************************************
19
**************************************************
HHVic
**************************************************
age 19
tel 800800
dtype: object
**************************************************
A 0
B 1
C 2
D 3
E 4
F 5
G 6
H 7
I 8
J 9
dtype: int64
**************************************************
A 0
F 5
dtype: int64
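Label-based and position-based access on a Series are easy to mix up. The sketch below separates them with `.loc` and `.iloc`; it reuses the `t3` data from above, and the result names are only illustrative.

```python
import pandas as pd

t3 = pd.Series({"name": "HHVic", "age": 19, "tel": 800800})

by_label = t3.loc["age"]                 # lookup by label
by_position = t3.iloc[0]                 # lookup by position
two_by_position = t3.iloc[[1, 2]]        # second and third entries
two_by_label = t3.loc[["name", "tel"]]   # a list of labels

label_slice = t3.loc["name":"age"]       # label slices include the end label
position_slice = t3.iloc[0:1]            # position slices exclude the end

print(label_slice)
print(position_slice)
```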
Comparison operators:
t1 = pd.Series([1,2,31,12,3,4],index=list('abcdef'))
print(t1)
print(t1[t1>10])
a 1
b 2
c 31
d 12
e 3
f 4
dtype: int64
c 31
d 12
dtype: int64
For an unfamiliar Series, how to get its index and values:
a = {string.ascii_uppercase[i]:i for i in range(10)}
b = pd.Series(a)
print(b)
print('*'*50)
print(b.index)
print(b.values)
A 0
B 1
C 2
D 3
E 4
F 5
G 6
H 7
I 8
J 9
dtype: int64
**************************************************
Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype='object')
[0 1 2 3 4 5 6 7 8 9]
Pandas: reading external data
import pandas as pd
df = pd.read_csv('dogNames2.csv')
print(df)
Row_Labels Count_AnimalName
0 1 1
1 2 2
2 40804 1
3 90201 1
4 90203 1
... ... ...
16215 37916 1
16216 38282 1
16217 38583 1
16218 38948 1
16219 39743 1
[16220 rows x 2 columns]
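To try `read_csv` without the data file, the sketch below reads from an in-memory string instead. The CSV text is a hypothetical stand-in for a file such as dogNames2.csv, with only the two column names taken from the output above.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of a file such as dogNames2.csv.
csv_text = "Row_Labels,Count_AnimalName\nBELLA,1195\nMAX,1153\nROCKY,823\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)
```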
DataFrame
Creating a DataFrame
A DataFrame object has both a row index and a column index.
Row index: index, axis 0 (axis=0)
Column index: columns, axis 1 (axis=1)
import numpy as np
import pandas as pd
t = pd.DataFrame(np.arange(12).reshape((3,4)))
print(t)
0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
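What axis=0 and axis=1 mean in practice is easiest to see with a reduction. A minimal sketch on the same 3x4 frame (the names `col_sums` and `row_sums` are illustrative):

```python
import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(12).reshape((3, 4)))

# axis=0 runs down the row index, so the rows are collapsed: one result per column.
col_sums = t.sum(axis=0)

# axis=1 runs across the columns, so the columns are collapsed: one result per row.
row_sums = t.sum(axis=1)

print(col_sums)
print(row_sums)
```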
Changing the index
t1 = pd.DataFrame(np.arange(12).reshape((3,4)),index=list('abc'),columns=list('WXYZ'))
print(t1)
print('*'*60)
************************************************************
W X Y Z
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
************************************************************
Passing a dictionary
d1 = {'name':['HHVic','HHV'],'age':[20,21],'tel':[10010,10086]}
d1 = pd.DataFrame(d1)
print(d1)
print(type(d1))
print('*'*60)
************************************************************
name age tel
0 HHVic 20 10010
1 HHV 21 10086
<class 'pandas.core.frame.DataFrame'>
************************************************************
d2 = [{'name':'HHVic','age':22,'tel':10010},{'name':'HHVV','tel':10000},{'name':'HHHHV','age':21}]
print(d2)
print('*'*60)
d2 = pd.DataFrame(d2)
print(d2)
************************************************************
[{'name': 'HHVic', 'age': 22, 'tel': 10010}, {'name': 'HHVV', 'tel': 10000}, {'name': 'HHHHV', 'age': 21}]
************************************************************
name age tel
0 HHVic 22.0 10010.0
1 HHVV NaN 10000.0
2 HHHHV 21.0 NaN
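Basic DataFrame attributes
Command | Meaning |
---|---|
df.shape | Number of rows, number of columns |
df.dtypes | Data type of each column |
df.ndim | Number of dimensions |
df.index | Row index |
df.columns | Column index |
df.values | The values as a two-dimensional ndarray |
df.head(3) | First rows; 5 by default |
df.tail(3) | Last rows; 5 by default |
df.info() | Overview: row count, column count, column index, non-null count per column, column types, memory usage |
df.describe() | Quick summary statistics: count, mean, standard deviation, max, quartiles, min |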
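The entries in the table can be tried on a small frame. This is a minimal sketch; the frame and the name `summary` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape((3, 4)),
                  index=list("abc"), columns=list("WXYZ"))

print(df.shape)      # (rows, columns)
print(df.dtypes)     # one dtype per column
print(df.ndim)       # 2 for a DataFrame
print(df.index)      # row labels
print(df.columns)    # column labels
print(df.values)     # two-dimensional ndarray
print(df.head(2))    # first two rows
print(df.tail(2))    # last two rows
df.info()            # prints its overview directly and returns None

# describe is a method, so it needs parentheses.
summary = df.describe()
print(summary)
```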
DataFrame example
Find which dog names are used most often
df = pd.read_csv('dogNames2.csv')
# Sort the DataFrame
df=df.sort_values(by='Count_AnimalName',ascending=False)
print(df.head(5))
Row_Labels Count_AnimalName
1156 BELLA 1195
9140 MAX 1153
2660 CHARLIE 856
3251 COCO 852
12368 ROCKY 823
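The same top-N result can be reproduced on a small frame. In this sketch the rows are a hypothetical sample shaped like the output above, and `nlargest` is shown as an alternative to sorting and then taking the head.

```python
import pandas as pd

# Hypothetical sample rows with the same column names as dogNames2.csv.
df = pd.DataFrame({
    "Row_Labels": ["COCO", "BELLA", "ROCKY", "MAX"],
    "Count_AnimalName": [852, 1195, 823, 1153],
})

# Sort in descending order, then take the first rows.
top_sorted = df.sort_values(by="Count_AnimalName", ascending=False).head(2)

# nlargest returns the n rows with the largest values in the given column.
top_nlargest = df.nlargest(2, "Count_AnimalName")

print(top_sorted)
print(top_nlargest)
```

Note that the original row labels are kept after sorting, which is why the index in the output above is not 0, 1, 2, and so on.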
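Pandas: selecting rows or columns
# Points to note when selecting rows or columns:
# - A slice inside the square brackets selects rows
# - A string inside the square brackets selects a column by its label
print(df[:20]) # first 20 rows
print('*'*60)
print(df[:20]['Row_Labels']) # one column of the first 20 rows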
print('*'*60)
************************************************************
Row_Labels Count_AnimalName
1156 BELLA 1195
9140 MAX 1153
2660 CHARLIE 856
3251 COCO 852
12368 ROCKY 823
8417 LOLA 795
8552 LUCKY 723
8560 LUCY 710
2032 BUDDY 677
3641 DAISY 649
11703 PRINCESS 603
829 BAILEY 532
9766 MOLLY 519
14466 TEDDY 485
2913 CHLOE 465
14779 TOBY 446
8620 LUNA 432
6515 JACK 425
8788 MAGGIE 393
13762 SOPHIE 383
************************************************************
1156 BELLA
9140 MAX
2660 CHARLIE
3251 COCO
12368 ROCKY
8417 LOLA
8552 LUCKY
8560 LUCY
2032 BUDDY
3641 DAISY
11703 PRINCESS
829 BAILEY
9766 MOLLY
14466 TEDDY
2913 CHLOE
14779 TOBY
8620 LUNA
6515 JACK
8788 MAGGIE
13762 SOPHIE
Name: Row_Labels, dtype: object
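The rule above (a slice selects rows, a string selects a column) can be checked on a small frame. The data and names here are a hypothetical sample.

```python
import pandas as pd

# Hypothetical sample rows with the same column names as dogNames2.csv.
df = pd.DataFrame({
    "Row_Labels": ["BELLA", "MAX", "CHARLIE"],
    "Count_AnimalName": [1195, 1153, 856],
})

first_rows = df[:2]                                   # slice: rows
one_column = df["Row_Labels"]                         # string: one column, as a Series
two_columns = df[["Row_Labels", "Count_AnimalName"]]  # list of strings: a DataFrame
chained = df[:2]["Row_Labels"]                        # rows first, then the column

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(chained)
```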
Pandas: loc
import pandas as pd
import numpy as np
t3 = pd.DataFrame(np.arange(12).reshape(3,4), index=list('abc'),columns=list('WXYZ'))
print(t3)
print('*'*60)
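print(t3.loc['a','Z']) # a single value
print('*'*60)
print(t3.loc['a']) # row a, returned as a Series
print('*'*60)
print(t3.loc['a',:]) # also row a, returned as a Series
print('*'*60)
print(t3.loc[:,'Y']) # column Y
#print(t3.loc['Y']) # raises KeyError: 'Y' is a column label, not a row label
print('*'*60)
print(t3.loc[['a','c'],:]) # rows a and c
print('*'*60)
print(t3.loc[:,['W','Z']]) # columns W and Z
print('*'*60)
print(t3.loc[['a','b'],['W','Z']]) # rows a, b crossed with columns W, Z
print('*'*60)
print(t3.loc['a':'c',['W','Z']]) # label slices in loc include both endpoints, so row c is returned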
W X Y Z
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
************************************************************
3
************************************************************
W 0
X 1
Y 2
Z 3
Name: a, dtype: int32
************************************************************
W 0
X 1
Y 2
Z 3
Name: a, dtype: int32
************************************************************
a 2
b 6
c 10
Name: Y, dtype: int32
************************************************************
W X Y Z
a 0 1 2 3
c 8 9 10 11
************************************************************
W Z
a 0 3
b 4 7
c 8 11
************************************************************
W Z
a 0 3
b 4 7
************************************************************
W Z
a 0 3
b 4 7
c 8 11
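Beyond the lookups above, `loc` also accepts boolean masks and can be used for assignment. A minimal sketch on the same frame, with illustrative result names:

```python
import numpy as np
import pandas as pd

t3 = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list("abc"), columns=list("WXYZ"))

rows_ab = t3.loc["a":"b"]        # label slice, end label included
cols_xz = t3.loc[:, "X":"Z"]     # label slices work on columns too

# A boolean mask for the rows combined with a list of column labels.
masked = t3.loc[t3["W"] > 0, ["W", "Z"]]

# loc can also assign to a single cell.
t3.loc["a", "Z"] = 30

print(rows_ab)
print(cols_xz)
print(masked)
print(t3)
```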
Pandas: iloc
import pandas as pd
import numpy as np
t3 = pd.DataFrame(np.arange(12).reshape(3,4), index=list('abc'),columns=list('WXYZ'))
print(t3)
print('*'*60)
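print(t3.iloc[1]) # second row
print('*'*60)
print(t3.iloc[0]) # first row
print('*'*60)
print(t3.iloc[1,:]) # second row
print('*'*60)
print(t3.iloc[:,2]) # third column
print('*'*60)
print(t3.iloc[[0,2],[2,1]]) # first and third rows crossed with the third and second columns
print('*'*60)
print(t3.iloc[1:,:2]) # rows from the second onward crossed with the columns before the third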
print('*'*60)
t3.iloc[1:,:2] =100
print(t3)
print('*'*60)
t3.iloc[1:,:2] =np.nan
print(t3)
W X Y Z
a 0 1 2 3
b 4 5 6 7
c 8 9 10 11
************************************************************
W 4
X 5
Y 6
Z 7
Name: b, dtype: int32
************************************************************
W 0
X 1
Y 2
Z 3
Name: a, dtype: int32
************************************************************
W 4
X 5
Y 6
Z 7
Name: b, dtype: int32
************************************************************
a 2
b 6
c 10
Name: Y, dtype: int32
************************************************************
Y X
a 2 1
c 10 9
************************************************************
W X
b 4 5
c 8 9
************************************************************
W X Y Z
a 0 1 2 3
b 100 100 6 7
c 100 100 10 11
************************************************************
W X Y Z
a 0.0 1.0 2 3
b NaN NaN 6 7
c NaN NaN 10 11
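Two details from the output above are worth isolating: `iloc` slices exclude the end position, and columns that receive NaN end up as floats. The sketch below casts to float64 explicitly before assigning NaN, which is a choice made for this example so that it does not rely on an implicit dtype change.

```python
import numpy as np
import pandas as pd

t3 = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list("abc"), columns=list("WXYZ"))

first_two_rows = t3.iloc[0:2]          # positions 0 and 1; position 2 is excluded
reordered = t3.iloc[[0, 2], [2, 1]]    # rows a, c and columns Y, X, in that order

# Cast to float first so that NaN can be stored without an implicit upcast.
t_float = t3.astype("float64")
t_float.iloc[1:, :2] = np.nan
nan_count = int(t_float.isna().sum().sum())

print(first_two_rows)
print(reordered)
print(t_float)
```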
Pandas: boolean indexing
List the dog names that are used more than 800 times
import numpy as np
import pandas as pd
df = pd.read_csv('dogNames2.csv')
# Sort the DataFrame
df=df.sort_values(by='Count_AnimalName',ascending=False)
t1 = df[df['Count_AnimalName']>800] # keep the filtered frame; print() itself returns None
print(t1)
Row_Labels Count_AnimalName
1156 BELLA 1195
9140 MAX 1153
2660 CHARLIE 856
3251 COCO 852
12368 ROCKY 823
import numpy as np
import pandas as pd
df = pd.read_csv('dogNames2.csv')
# Sort the DataFrame
df=df.sort_values(by='Count_AnimalName',ascending=False)
t1 = df[df['Count_AnimalName']>800] # keep the filtered frame; print() itself returns None
print(t1)
print('*'*60)
t2 = df[(800<df['Count_AnimalName'])&(df['Count_AnimalName']<1000)] # and
print(t2)
print('*'*60)
t2 = df[(800<df['Count_AnimalName'])|(df['Count_AnimalName']<1000)] # or
print(t2)
print('*'*60)
# Names used more than 700 times whose length is greater than 4 characters
t3 = df[(700<df['Count_AnimalName'])&(df['Row_Labels'].str.len()>4)]
print(t3)
print('*'*60)
Row_Labels Count_AnimalName
1156 BELLA 1195
9140 MAX 1153
2660 CHARLIE 856
3251 COCO 852
12368 ROCKY 823
************************************************************
Row_Labels Count_AnimalName
2660 CHARLIE 856
3251 COCO 852
12368 ROCKY 823
************************************************************
Row_Labels Count_AnimalName
1156 BELLA 1195
9140 MAX 1153
2660 CHARLIE 856
3251 COCO 852
12368 ROCKY 823
... ... ...
6881 JJUJJU 1
6882 JJYODAA 1
6883 J-K 1
6884 J-LO 1
8106 LEELO 1
[16212 rows x 2 columns]
************************************************************
Row_Labels Count_AnimalName
1156 BELLA 1195
2660 CHARLIE 856
12368 ROCKY 823
8552 LUCKY 723
************************************************************
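The combinations above can be tried without the CSV file. This sketch uses six rows copied from the outputs in this section; the names `both`, `either`, `negated`, and `long_names` are illustrative. Each condition needs its own parentheses because `&`, `|`, and `~` bind more tightly than comparisons.

```python
import pandas as pd

# Six rows taken from the outputs above.
df = pd.DataFrame({
    "Row_Labels": ["BELLA", "MAX", "CHARLIE", "COCO", "ROCKY", "LUCKY"],
    "Count_AnimalName": [1195, 1153, 856, 852, 823, 723],
})
count = df["Count_AnimalName"]

both = df[(count > 800) & (count < 1000)]     # and
either = df[(count > 1100) | (count < 800)]   # or
negated = df[~(count > 800)]                  # not

# String methods are reached through the .str accessor.
long_names = df[(count > 700) & (df["Row_Labels"].str.len() > 4)]

print(both)
print(either)
print(negated)
print(long_names)
```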
Dropping NaN data
import pandas as pd
import numpy as np
t3 = pd.DataFrame(np.arange(12).reshape(3,4), index=list('abc'),columns=list('WXYZ'))
print('*'*60)
t3.iloc[1:,:2] =np.nan
print(t3)
print('*'*60)
print(pd.isnull(t3))
print('*'*60)
print(pd.isnull(t3))
print('*'*60)
print(pd.notnull(t3))
print('*'*60)
t4 = t3[pd.notnull(t3['W'])]
print(t4)
print('*'*60)
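t5 = t3.dropna(axis=0,how='all') # drop rows in which every value is NaN
print(t5)
print('*'*60)
t5 = t3.dropna(axis=0,how='any') # drop rows that contain any NaN
print(t5)
print('*'*60)
t3.dropna(axis=0,how='any',inplace=True) # drops in place; with inplace=True dropna returns None, so the result is not assigned
print(t3)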
print('*'*60)
************************************************************
W X Y Z
a 0.0 1.0 2 3
b NaN NaN 6 7
c NaN NaN 10 11
************************************************************
W X Y Z
a False False False False
b True True False False
c True True False False
************************************************************
W X Y Z
a False False False False
b True True False False
c True True False False
************************************************************
W X Y Z
a True True True True
b False False True True
c False False True True
************************************************************
W X Y Z
a 0.0 1.0 2 3
************************************************************
W X Y Z
a 0.0 1.0 2 3
b NaN NaN 6 7
c NaN NaN 10 11
************************************************************
W X Y Z
a 0.0 1.0 2 3
************************************************************
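The `dropna` options can be compared side by side. This is a sketch on the same shape of data, built as floats from the start so that NaN can be assigned directly; `subset` is an extra option not used in the text above.

```python
import numpy as np
import pandas as pd

t3 = pd.DataFrame(np.arange(12, dtype="float64").reshape(3, 4),
                  index=list("abc"), columns=list("WXYZ"))
t3.iloc[1:, :2] = np.nan

rows_all = t3.dropna(axis=0, how="all")   # no row is entirely NaN, so nothing is dropped
rows_any = t3.dropna(axis=0, how="any")   # rows b and c contain NaN and are dropped
cols_any = t3.dropna(axis=1, how="any")   # columns W and X contain NaN and are dropped
by_subset = t3.dropna(subset=["Y"])       # only column Y is checked

# With inplace=True the frame itself is modified and the call returns None.
result = t3.dropna(axis=0, how="any", inplace=True)

print(rows_any)
print(cols_any)
print(result)
print(t3)
```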
Filling NaN data
import numpy as np
import pandas as pd
d2 = [{'name':'HHVic','age':22,'tel':10010},{'name':'HHVV','tel':10000},{'name':'HHHHV','age':21}]
print(d2)
print('*'*60)
d2 = pd.DataFrame(d2)
print(d2)
print('*'*60)
print(d2.fillna(100))
print('*'*60)
print(d2.fillna(d2.mean(numeric_only=True))) # fill each numeric column with its own mean; numeric_only skips the string column
print('*'*60)
print(d2['age'].fillna(d2['age'].mean()))
print('*'*60)
[{'name': 'HHVic', 'age': 22, 'tel': 10010}, {'name': 'HHVV', 'tel': 10000}, {'name': 'HHHHV', 'age': 21}]
************************************************************
name age tel
0 HHVic 22.0 10010.0
1 HHVV NaN 10000.0
2 HHHHV 21.0 NaN
************************************************************
name age tel
0 HHVic 22.0 10010.0
1 HHVV 100.0 10000.0
2 HHHHV 21.0 100.0
************************************************************
name age tel
0 HHVic 22.0 10010.0
1 HHVV 21.5 10000.0
2 HHHHV 21.0 10005.0
************************************************************
0 22.0
1 21.5
2 21.0
Name: age, dtype: float64
************************************************************
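Filling per column can also be written with a dict. The sketch below reuses the `d2` data from above; the fill values 0 and -1 are arbitrary placeholders chosen for illustration.

```python
import pandas as pd

d2 = pd.DataFrame([
    {"name": "HHVic", "age": 22, "tel": 10010},
    {"name": "HHVV", "tel": 10000},
    {"name": "HHHHV", "age": 21},
])

# mean skips NaN by default; numeric_only=True leaves out the string column.
means = d2.mean(numeric_only=True)
filled = d2.fillna(means)

# A dict gives each column its own fill value.
filled_dict = d2.fillna({"age": 0, "tel": -1})

print(means)
print(filled)
print(filled_dict)
```

`fillna` returns a new frame here, so `d2` still contains its NaN values afterwards.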
Pandas example 1
Given data on the 1000 most popular movies from 2006 to 2016, we want to know things such as the average rating and the number of directors.
import pandas as pd
import numpy as np
file_path='IMDB-Movie-Data.csv'
df = pd.read_csv(file_path)
print(df.info())
print('*'*70)
print(df.head(1))
# Average rating
print('The average of Rating is :{}'.format(df['Rating'].mean()))
# Number of directors
print('The amount of Director is (Round1) : {}'.format(len(set(df['Director'].tolist()))))
# Number of directors (second approach)
print('The amount of Director is (Round2) : {}'.format(len(df['Director'].unique())))
# Number of actors
temp_actors_list = df['Actors'].str.split(',').tolist()
actors_list = [i for j in temp_actors_list for i in j]
# actors_list = np.array(temp_actors_list).flatten().tolist() # this approach does not work
actors_num = len(set(actors_list))
print('The amount of Actors is (Round2) : {}'.format(actors_num))
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rank 1000 non-null int64
1 Title 1000 non-null object
2 Genre 1000 non-null object
3 Description 1000 non-null object
4 Director 1000 non-null object
5 Actors 1000 non-null object
6 Year 1000 non-null int64
7 Runtime (Minutes) 1000 non-null int64
8 Rating 1000 non-null float64
9 Votes 1000 non-null int64
10 Revenue (Millions) 872 non-null float64
11 Metascore 936 non-null float64
dtypes: float64(3), int64(4), object(5)
memory usage: 93.9+ KB
None
**********************************************************************
Rank Title ... Revenue (Millions) Metascore
0 1 Guardians of the Galaxy ... 333.13 76.0
[1 rows x 12 columns]
The average of Rating is :6.723199999999999
The amount of Director is (Round1) : 644
The amount of Director is (Round2) : 644
The amount of Actors is (Round2) : 2394
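An alternative way to count distinct values is `nunique` together with `explode`. This is a sketch on hypothetical rows; whether names in the real file are separated by a comma plus a space is an assumption, and the example shows why it matters.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the data set.
df = pd.DataFrame({
    "Director": ["Dir A", "Dir B", "Dir A"],
    "Actors": ["Ann, Bob", "Bob, Cid", "Cid, Ann"],
})

director_count = df["Director"].nunique()

# explode turns each list element into its own row.
raw_actors = df["Actors"].str.split(",").explode()
raw_count = raw_actors.nunique()            # " Bob" and "Bob" count as different names

clean_count = raw_actors.str.strip().nunique()   # strip the surrounding spaces first

print(director_count, raw_count, clean_count)
```

If the separator includes a space, splitting on a bare comma can overcount, so stripping whitespace before counting is the safer choice.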
Pandas example 2
How can the distributions of Rating and Runtime be presented?
Runtime distribution:
import pandas as pd
from matplotlib import pyplot as plt
file_path = 'IMDB-Movie-Data.csv'
df = pd.read_csv(file_path)
print(df.head(1))
print(df.info())
# Runtime distribution
# Chart type: histogram
# Prepare the data
runtime_data=df['Runtime (Minutes)'].values
max_runtime = runtime_data.max()
min_runtime = runtime_data.min()
# Compute the number of bins (bin width of 5 minutes)
print(max_runtime-min_runtime)
num_bin=(max_runtime-min_runtime)//5
# Set the figure size
plt.figure(figsize=(20,8),dpi=80)
plt.hist(runtime_data,num_bin)
plt.xticks(range(min_runtime,max_runtime+5,5))
# Save before calling show(); after show() returns, the saved figure may be empty
plt.savefig('day5-1.png')
plt.show()
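The bin arithmetic can be checked without drawing anything. The sketch below uses `np.histogram` on hypothetical runtimes; it assumes a bin width of 5 and a range that is an exact multiple of 5.

```python
import numpy as np

# Hypothetical runtimes in minutes; the range 191 - 66 = 125 is divisible by 5.
runtime_data = np.array([66, 90, 95, 101, 118, 120, 143, 191])

bin_width = 5
min_runtime = runtime_data.min()
max_runtime = runtime_data.max()

# Number of bins = range // bin width.
num_bin = (max_runtime - min_runtime) // bin_width

# With an integer bin count, the range is split into equally wide bins.
counts, edges = np.histogram(runtime_data, bins=num_bin)

print(num_bin)
print(edges)
print(counts)
```

When the range is not an exact multiple of the bin width, the bins come out slightly wider than 5 and no longer line up with ticks placed every 5 units.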
Rating distribution:
import numpy as np
from matplotlib import pyplot as plt
runtime_data = np.array([8.1, 7.0, 7.3, 7.2, 6.2, 6.1, 8.3, 6.4, 7.1, 7.0, 7.5, 7.8, 7.9, 7.7, 6.4, 6.6, 8.2, 6.7, 8.1, 8.0, 6.7, 7.9, 6.7, 6.5, 5.3, 6.8, 8.3, 4.7, 6.2, 5.9, 6.3, 7.5, 7.1, 8.0, 5.6, 7.9, 8.6, 7.6, 6.9, 7.1, 6.3, 7.5, 2.7, 7.2, 6.3, 6.7, 7.3, 5.6, 7.1, 3.7, 8.1, 5.8, 5.6, 7.2, 9.0, 7.3, 7.2, 7.4, 7.0, 7.5, 6.7, 6.8, 6.5, 4.1, 8.5, 7.7, 7.4, 8.1, 7.5, 7.2, 5.9, 7.1, 7.5, 6.8, 8.1, 7.1, 8.1, 8.3, 7.3, 5.3, 8.8, 7.9, 8.2, 8.1, 7.2, 7.0, 6.4, 7.8, 7.8, 7.4, 8.1, 7.0, 8.1, 7.1, 7.4, 7.4, 8.6, 5.8, 6.3, 8.5, 7.0, 7.0, 8.0, 7.9, 7.3, 7.7, 5.4, 6.3, 5.8, 7.7, 6.3, 8.1, 6.1, 7.7, 8.1, 5.8, 6.2, 8.8, 7.2, 7.4, 6.7, 6.7, 6.0, 7.4, 8.5, 7.5, 5.7, 6.6, 6.4, 8.0, 7.3, 6.0, 6.4, 8.5, 7.1, 7.3, 8.1, 7.3, 8.1, 7.1, 8.0, 6.2, 7.8, 8.2, 8.4, 8.1, 7.4, 7.6, 7.6, 6.2, 6.4, 7.2, 5.8, 7.6, 8.1, 4.7, 7.0, 7.4, 7.5, 7.9, 6.0, 7.0, 8.0, 6.1, 8.0, 5.2, 6.5, 7.3, 7.3, 6.8, 7.9, 7.9, 5.2, 8.0, 7.5, 6.5, 7.6, 7.0, 7.4, 7.3, 6.7, 6.8, 7.0, 5.9, 8.0, 6.0, 6.3, 6.6, 7.8, 6.3, 7.2, 5.6, 8.1, 5.8, 8.2, 6.9, 6.3, 8.1, 8.1, 6.3, 7.9, 6.5, 7.3, 7.9, 5.7, 7.8, 7.5, 7.5, 6.8, 6.7, 6.1, 5.3, 7.1, 5.8, 7.0, 5.5, 7.8, 5.7, 6.1, 7.7, 6.7, 7.1, 6.9, 7.8, 7.0, 7.0, 7.1, 6.4, 7.0, 4.8, 8.2, 5.2, 7.8, 7.4, 6.1, 8.0, 6.8, 3.9, 8.1, 5.9, 7.6, 8.2, 5.8, 6.5, 5.9, 7.6, 7.9, 7.4, 7.1, 8.6, 4.9, 7.3, 7.9, 6.7, 7.5, 7.8, 5.8, 7.6, 6.4, 7.1, 7.8, 8.0, 6.2, 7.0, 6.0, 4.9, 6.0, 7.5, 6.7, 3.7, 7.8, 7.9, 7.2, 8.0, 6.8, 7.0, 7.1, 7.7, 7.0, 7.2, 7.3, 7.6, 7.1, 7.0, 6.0, 6.1, 5.8, 5.3, 5.8, 6.1, 7.5, 7.2, 5.7, 7.7, 7.1, 6.6, 5.7, 6.8, 7.1, 8.1, 7.2, 7.5, 7.0, 5.5, 6.4, 6.7, 6.2, 5.5, 6.0, 6.1, 7.7, 7.8, 6.8, 7.4, 7.5, 7.0, 5.2, 5.3, 6.2, 7.3, 6.5, 6.4, 7.3, 6.7, 7.7, 6.0, 6.0, 7.4, 7.0, 5.4, 6.9, 7.3, 8.0, 7.4, 8.1, 6.1, 7.8, 5.9, 7.8, 6.5, 6.6, 7.4, 6.4, 6.8, 6.2, 5.8, 7.7, 7.3, 5.1, 7.7, 7.3, 6.6, 7.1, 6.7, 6.3, 5.5, 7.4, 7.7, 6.6, 7.8, 6.9, 5.7, 7.8, 7.7, 6.3, 8.0, 5.5, 6.9, 7.0, 5.7, 6.0, 6.8, 6.3, 6.7, 6.9, 5.7, 6.9, 7.6, 7.1, 6.1, 7.6, 7.4, 6.6, 7.6, 7.8, 7.1, 5.6, 6.7, 6.7, 6.6, 6.3, 5.8, 7.2, 5.0, 5.4, 
7.2, 6.8, 5.5, 6.0, 6.1, 6.4, 3.9, 7.1, 7.7, 6.7, 6.7, 7.4, 7.8, 6.6, 6.1, 7.8, 6.5, 7.3, 7.2, 5.6, 5.4, 6.9, 7.8, 7.7, 7.2, 6.8, 5.7, 5.8, 6.2, 5.9, 7.8, 6.5, 8.1, 5.2, 6.0, 8.4, 4.7, 7.0, 7.4, 6.4, 7.1, 7.1, 7.6, 6.6, 5.6, 6.3, 7.5, 7.7, 7.4, 6.0, 6.6, 7.1, 7.9, 7.8, 5.9, 7.0, 7.0, 6.8, 6.5, 6.1, 8.3, 6.7, 6.0, 6.4, 7.3, 7.6, 6.0, 6.6, 7.5, 6.3, 7.5, 6.4, 6.9, 8.0, 6.7, 7.8, 6.4, 5.8, 7.5, 7.7, 7.4, 8.5, 5.7, 8.3, 6.7, 7.2, 6.5, 6.3, 7.7, 6.3, 7.8, 6.7, 6.7, 6.6, 8.0, 6.5, 6.9, 7.0, 5.3, 6.3, 7.2, 6.8, 7.1, 7.4, 8.3, 6.3, 7.2, 6.5, 7.3, 7.9, 5.7, 6.5, 7.7, 4.3, 7.8, 7.8, 7.2, 5.0, 7.1, 5.7, 7.1, 6.0, 6.9, 7.9, 6.2, 7.2, 5.3, 4.7, 6.6, 7.0, 3.9, 6.6, 5.4, 6.4, 6.7, 6.9, 5.4, 7.0, 6.4, 7.2, 6.5, 7.0, 5.7, 7.3, 6.1, 7.2, 7.4, 6.3, 7.1, 5.7, 6.7, 6.8, 6.5, 6.8, 7.9, 5.8, 7.1, 4.3, 6.3, 7.1, 4.6, 7.1, 6.3, 6.9, 6.6, 6.5, 6.5, 6.8, 7.8, 6.1, 5.8, 6.3, 7.5, 6.1, 6.5, 6.0, 7.1, 7.1, 7.8, 6.8, 5.8, 6.8, 6.8, 7.6, 6.3, 4.9, 4.2, 5.1, 5.7, 7.6, 5.2, 7.2, 6.0, 7.3, 7.2, 7.8, 6.2, 7.1, 6.4, 6.1, 7.2, 6.6, 6.2, 7.9, 7.3, 6.7, 6.4, 6.4, 7.2, 5.1, 7.4, 7.2, 6.9, 8.1, 7.0, 6.2, 7.6, 6.7, 7.5, 6.6, 6.3, 4.0, 6.9, 6.3, 7.3, 7.3, 6.4, 6.6, 5.6, 6.0, 6.3, 6.7, 6.0, 6.1, 6.2, 6.7, 6.6, 7.0, 4.9, 8.4, 7.0, 7.5, 7.3, 5.6, 6.7, 8.0, 8.1, 4.8, 7.5, 5.5, 8.2, 6.6, 3.2, 5.3, 5.6, 7.4, 6.4, 6.8, 6.7, 6.4, 7.0, 7.9, 5.9, 7.7, 6.7, 7.0, 6.9, 7.7, 6.6, 7.1, 6.6, 5.7, 6.3, 6.5, 8.0, 6.1, 6.5, 7.6, 5.6, 5.9, 7.2, 6.7, 7.2, 6.5, 7.2, 6.7, 7.5, 6.5, 5.9, 7.7, 8.0, 7.6, 6.1, 8.3, 7.1, 5.4, 7.8, 6.5, 5.5, 7.9, 8.1, 6.1, 7.3, 7.2, 5.5, 6.5, 7.0, 7.1, 6.6, 6.5, 5.8, 7.1, 6.5, 7.4, 6.2, 6.0, 7.6, 7.3, 8.2, 5.8, 6.5, 6.6, 6.2, 5.8, 6.4, 6.7, 7.1, 6.0, 5.1, 6.2, 6.2, 6.6, 7.6, 6.8, 6.7, 6.3, 7.0, 6.9, 6.6, 7.7, 7.5, 5.6, 7.1, 5.7, 5.2, 5.4, 6.6, 8.2, 7.6, 6.2, 6.1, 4.6, 5.7, 6.1, 5.9, 7.2, 6.5, 7.9, 6.3, 5.0, 7.3, 5.2, 6.6, 5.2, 7.8, 7.5, 7.3, 7.3, 6.6, 5.7, 8.2, 6.7, 6.2, 6.3, 5.7, 6.6, 4.5, 8.1, 5.6, 7.3, 6.2, 5.1, 4.7, 4.8, 7.2, 6.9, 6.5, 7.3, 6.5, 6.9, 7.8, 6.8, 4.6, 6.7, 6.4, 6.0, 6.3, 6.6, 7.8, 6.6, 
6.2, 7.3, 7.4, 6.5, 7.0, 4.3, 7.2, 6.2, 6.2, 6.8, 6.0, 6.6, 7.1, 6.8, 5.2, 6.7, 6.2, 7.0, 6.3, 7.8, 7.6, 5.4, 7.6, 5.4, 4.6, 6.9, 6.8, 5.8, 7.0, 5.8, 5.3, 4.6, 5.3, 7.6, 1.9, 7.2, 6.4, 7.4, 5.7, 6.4, 6.3, 7.5, 5.5, 4.2, 7.8, 6.3, 6.4, 7.1, 7.1, 6.8, 7.3, 6.7, 7.8, 6.3, 7.5, 6.8, 7.4, 6.8, 7.1, 7.6, 5.9, 6.6, 7.5, 6.4, 7.8, 7.2, 8.4, 6.2, 7.1, 6.3, 6.5, 6.9, 6.9, 6.6, 6.9, 7.7, 2.7, 5.4, 7.0, 6.6, 7.0, 6.9, 7.3, 5.8, 5.8, 6.9, 7.5, 6.3, 6.9, 6.1, 7.5, 6.8, 6.5, 5.5, 7.7, 3.5, 6.2, 7.1, 5.5, 7.1, 7.1, 7.1, 7.9, 6.5, 5.5, 6.5, 5.6, 6.8, 7.9, 6.2, 6.2, 6.7, 6.9, 6.5, 6.6, 6.4, 4.7, 7.2, 7.2, 6.7, 7.5, 6.6, 6.7, 7.5, 6.1, 6.4, 6.3, 6.4, 6.8, 6.1, 4.9, 7.3, 5.9, 6.1, 7.1, 5.9, 6.8, 5.4, 6.3, 6.2, 6.6, 4.4, 6.8, 7.3, 7.4, 6.1, 4.9, 5.8, 6.1, 6.4, 6.9, 7.2, 5.6, 4.9, 6.1, 7.8, 7.3, 4.3, 7.2, 6.4, 6.2, 5.2, 7.7, 6.2, 7.8, 7.0, 5.9, 6.7, 6.3, 6.9, 7.0, 6.7, 7.3, 3.5, 6.5, 4.8, 6.9, 5.9, 6.2, 7.4, 6.0, 6.2, 5.0, 7.0, 7.6, 7.0, 5.3, 7.4, 6.5, 6.8, 5.6, 5.9, 6.3, 7.1, 7.5, 6.6, 8.5, 6.3, 5.9, 6.7, 6.2, 5.5, 6.2, 5.6, 5.3])
max_runtime = runtime_data.max()
min_runtime = runtime_data.min()
print(min_runtime,max_runtime)
# Use unequal bin widths; hist treats a bin as a half-open interval such as [1.9, 3.5)
num_bin_list = [1.9,3.5]
i=3.5
while i<=max_runtime:
    i += 0.5
    num_bin_list.append(i)
print(num_bin_list)
# Set the figure size
plt.figure(figsize=(20,8),dpi=80)
plt.hist(runtime_data,num_bin_list)
# Use the bin edges as xticks so the ticks line up with the bars
plt.xticks(num_bin_list)
plt.savefig('day5-2.png')
plt.show()
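How values fall into unequal bins can be verified with `np.histogram`, which accepts the same kind of edge list. The ratings below are hypothetical; the edge list is built with the same loop as above.

```python
import numpy as np

# Hypothetical ratings chosen to sit on and between the bin edges.
rating_data = np.array([1.9, 2.7, 3.5, 3.9, 4.0, 6.7, 8.1, 9.0])

# One wide first bin [1.9, 3.5), then bins of width 0.5.
edges = [1.9, 3.5]
i = 3.5
while i <= rating_data.max():
    i += 0.5
    edges.append(i)

counts, _ = np.histogram(rating_data, bins=edges)

print(edges)
print(counts)
```

The value 3.5 is counted in the second bin, not the first, because each bin excludes its right edge; only the last bin includes both edges.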