Data Analysis - Day5 - Pandas

HHVic

Category: Data Analysis    Tags: python, data analysis

Original link: https://blog.csdn.net/landian0531/article/details/117201587

Table of Contents

Pandas data types
Pandas: selecting rows or columns
Pandas case study 1
Pandas case study 2

Pandas data types

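Series: one-dimensional, labeled array
  It is essentially made of two arrays: one holds the keys (the index), the other holds the values.
  Many ndarray methods also work on a Series, e.g. argmax and clip.
  Series has a where method, but it behaves differently from np.where on an ndarray.
DataFrame: two-dimensional, a container of Series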
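To make the two Series points above concrete, here is a small sketch (the values are borrowed from the next example): argmax and clip work on a Series just as on an ndarray, while Series.where keeps the index and fills the rejected positions with NaN, whereas np.where(cond, x, y) chooses element by element and returns a plain ndarray.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 31, 12, 3, 4], index=list("abcdef"))

# ndarray-style methods work on a Series as well
pos = s.argmax()          # position of the largest value
clipped = s.clip(2, 10)   # values limited to the range [2, 10]

# Series.where keeps values where the condition is True and puts NaN
# (or `other`, if given) everywhere else; the index is preserved
kept = s.where(s > 10)
kept_zero = s.where(s > 10, other=0)

# np.where(cond, x, y) picks from x or y element by element
# and returns a plain ndarray without the index
picked = np.where(s > 10, s, 0)

print(pos)   # 2
print(kept)
print(picked)
```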

Series

Creating a Series

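import pandas as pd

t1 = pd.Series([1,2,31,12,3,4])
print(t1)
print('*'*50)
# Give the Series custom index labels, then convert the values to float
t1 = pd.Series([1,2,31,12,3,4], index=list('abcdef'))
t1 = t1.astype('float')
print(t1)
print('*'*50)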

0     1
1     2
2    31
3    12
4     3
5     4
dtype: int64
**************************************************
a     1.0
b     2.0
c    31.0
d    12.0
e     3.0
f     4.0
dtype: float64
**************************************************

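import string
import pandas as pd

# Build a Series from a dict: the keys become the index
temp_dict = {'name':'HHVic','age':19,'tel':800800}
t3 = pd.Series(temp_dict)
print(t3)
print('*'*50)

# {'A': 0, 'B': 1, ..., 'J': 9}
a = {string.ascii_uppercase[i]:i for i in range(10)}
print(pd.Series(a))

# Index labels F..O: K..O are not keys of a, so they become NaN (and the dtype becomes float64)
b = pd.Series(a, index=list(string.ascii_uppercase[5:15]))

print(b)
print('*'*50)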


name     HHVic
age         19
tel     800800
dtype: object
**************************************************
A    0
B    1
C    2
D    3
E    4
F    5
G    6
H    7
I    8
J    9
dtype: int64
F    5.0
G    6.0
H    7.0
I    8.0
J    9.0
K    NaN
L    NaN
M    NaN
N    NaN
O    NaN
dtype: float64
**************************************************
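Passing index= together with a dict, as above, behaves like building the Series from the dict and then calling reindex with the new labels. A minimal sketch checking that equivalence; fill_value is a reindex parameter that substitutes a value for missing labels instead of NaN, which also keeps the integer dtype.

```python
import string
import pandas as pd

a = {string.ascii_uppercase[i]: i for i in range(10)}   # A->0 ... J->9
labels = list(string.ascii_uppercase[5:15])             # F ... O

b = pd.Series(a, index=labels)                     # K..O are not keys of a -> NaN
c = pd.Series(a).reindex(labels)                   # same result via reindex
d = pd.Series(a).reindex(labels, fill_value=-1)    # fill missing labels instead of NaN

print(b.dtype, d.dtype)
```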

Pandas: Series slicing and indexing

import pandas as pd
import string

temp_dict = {'name':'HHVic','age':19,'tel':800800}
t3 = pd.Series(temp_dict)
print(t3)
print('*'*50)
print(t3['age'])
print('*'*50)
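print(t3.iloc[0])  # by position; t3[0] on a label index is deprecated in pandas 2.x
print('*'*50)
# Take the 2nd and 3rd entries (positions 1 and 2)
print(t3.iloc[[1,2]])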
print('*'*50)

a = {string.ascii_uppercase[i]:i for i in range(10)}
b = pd.Series(a)
print(b)
print('*'*50)
print(b[['A','F']])

name     HHVic
age         19
tel     800800
dtype: object
**************************************************
19
**************************************************
HHVic
**************************************************
age        19
tel    800800
dtype: object
**************************************************
A    0
B    1
C    2
D    3
E    4
F    5
G    6
H    7
I    8
J    9
dtype: int64
**************************************************
A    0
F    5
dtype: int64
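Because t3[0] mixes label and position lookup, it is safer to state the intent explicitly: .loc for labels and .iloc for positions. A short sketch using the same dict; note that a label slice includes its end label, while a positional slice excludes its end position.

```python
import pandas as pd

t3 = pd.Series({"name": "HHVic", "age": 19, "tel": 800800})

by_label = t3.loc["age"]            # label-based lookup
by_pos = t3.iloc[0]                 # position-based lookup
rows_1_2 = t3.iloc[[1, 2]]          # 2nd and 3rd entries (age, tel)
first_two = t3.iloc[:2]             # positional slice: end excluded
up_to_age = t3.loc["name":"age"]    # label slice: end included

print(by_label, by_pos)
```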

Comparison operators (boolean filtering):

t1 = pd.Series([1,2,31,12,3,4],index=list('abcdef'))
print(t1)
print(t1[t1>10])

a     1
b     2
c    31
d    12
e     3
f     4
dtype: int64
c    31
d    12
dtype: int64

For an unfamiliar Series, how to get its index and its values:

a = {string.ascii_uppercase[i]:i for i in range(10)}
b = pd.Series(a)
print(b)
print('*'*50)
print(b.index)
print(b.values)

A    0
B    1
C    2
D    3
E    4
F    5
G    6
H    7
I    8
J    9
dtype: int64
**************************************************
Index(['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], dtype='object')
[0 1 2 3 4 5 6 7 8 9]

Pandas: reading external data

import pandas as pd
df = pd.read_csv('dogNames2.csv')
print(df)

      Row_Labels  Count_AnimalName
0              1                 1
1              2                 2
2          40804                 1
3          90201                 1
4          90203                 1
...          ...               ...
16215      37916                 1
16216      38282                 1
16217      38583                 1
16218      38948                 1
16219      39743                 1

[16220 rows x 2 columns]
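read_csv also accepts any file-like object, which is handy for trying things out without the CSV on disk. In the sketch below, the first two rows reuse values that appear later in this post; the third row and the use of io.StringIO are illustrative assumptions, not part of the original dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for dogNames2.csv (same column names; REX is made up)
csv_text = """Row_Labels,Count_AnimalName
BELLA,1195
MAX,1153
REX,3
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)    # (3, 2)
print(df.dtypes)
```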

DataFrame

Creating a DataFrame

A DataFrame has both a row index and a column index:

Row index: index, axis 0 (axis=0)
Column index: columns, axis 1 (axis=1)
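One way to remember the axis numbers: an operation along axis=0 works down the rows and gives one result per column, while axis=1 works across the columns and gives one result per row. A sketch on the same 3x4 frame used in the examples that follow:

```python
import numpy as np
import pandas as pd

t = pd.DataFrame(np.arange(12).reshape((3, 4)),
                 index=list("abc"), columns=list("WXYZ"))

col_sums = t.sum(axis=0)     # collapse the rows (axis 0): one value per column
row_sums = t.sum(axis=1)     # collapse the columns (axis 1): one value per row
no_b = t.drop("b", axis=0)   # drop a row label
no_w = t.drop("W", axis=1)   # drop a column label

print(col_sums)
print(row_sums)
```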

import numpy as np
import pandas as pd
t = pd.DataFrame(np.arange(12).reshape((3,4)))
print(t)

   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Custom row and column labels

t1 = pd.DataFrame(np.arange(12).reshape((3,4)),index=list('abc'),columns=list('WXYZ'))
print(t1)
print('*'*60)

************************************************************
   W  X   Y   Z
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11
************************************************************

Passing a dict (each key becomes a column)

d1 = {'name':['HHVic','HHV'],'age':[20,21],'tel':[10010,10086]}
d1 = pd.DataFrame(d1)
print(d1)
print(type(d1))
print('*'*60)

************************************************************
    name  age    tel
0  HHVic   20  10010
1    HHV   21  10086
<class 'pandas.core.frame.DataFrame'>
************************************************************

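# A list of dicts: each dict is one row; keys missing from a row become NaN (so age and tel turn into floats)
d2 = [{'name':'HHVic','age':22,'tel':10010},{'name':'HHVV','tel':10000},{'name':'HHHHV','age':21}]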
print(d2)
print('*'*60)
d2 = pd.DataFrame(d2)
print(d2)

************************************************************
[{'name': 'HHVic', 'age': 22, 'tel': 10010}, {'name': 'HHVV', 'tel': 10000}, {'name': 'HHHHV', 'age': 21}]
************************************************************
    name   age      tel
0  HHVic  22.0  10010.0
1   HHVV   NaN  10000.0
2  HHHHV  21.0      NaN

DataFrame basic attributes

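Command	Description
df.shape	(number of rows, number of columns)
df.dtypes	data type of each column
df.ndim	number of dimensions
df.index	row index
df.columns	column index
df.values	the values as a 2-D ndarray
df.head(3)	first n rows (default 5)
df.tail(3)	last n rows (default 5)
df.info()	overview: number of rows and columns, column labels, non-null count and dtype per column, memory usage
df.describe()	summary statistics of the numeric columns: count, mean, standard deviation, min, quartiles, max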
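A quick sketch exercising several of these attributes on the small list-of-dicts frame built above. Note that info() prints its report directly and returns None, which is why wrapping it in print() adds a trailing None line.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["HHVic", "HHVV", "HHHHV"],
                   "age": [22, np.nan, 21],
                   "tel": [10010, 10000, np.nan]})

print(df.shape)       # (3, 3): rows, columns
print(df.ndim)        # 2
print(df.dtypes)      # one dtype per column
print(df.head(2))     # first 2 rows
df.info()             # prints its report itself and returns None
print(df.describe())  # statistics of the numeric columns only
```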

DataFrame example

Find which dog names are used most often

df = pd.read_csv('dogNames2.csv')
# Sort the DataFrame by a column, largest count first
df=df.sort_values(by='Count_AnimalName',ascending=False)
print(df.head(5))

      Row_Labels  Count_AnimalName
1156       BELLA              1195
9140         MAX              1153
2660     CHARLIE               856
3251        COCO               852
12368      ROCKY               823
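Sorting the whole frame just to read the top rows works, but DataFrame.nlargest(n, columns) returns the same top-n rows directly and may be cheaper on large data. A sketch on a tiny hypothetical sample with the same column names (three counts come from the output above; the REX row is made up):

```python
import pandas as pd

df = pd.DataFrame({"Row_Labels": ["REX", "BELLA", "COCO", "MAX"],
                   "Count_AnimalName": [3, 1195, 852, 1153]})

top = df.sort_values(by="Count_AnimalName", ascending=False).head(2)
top_alt = df.nlargest(2, "Count_AnimalName")   # same rows without a full sort

print(top)
```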

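Pandas: selecting rows or columns

# Notes on selecting with df[...]:
# - a slice of numbers inside the brackets selects rows
# - a string selects the column with that label
print(df[:20])  # first 20 rows
print('*'*60)
print(df[:20]['Row_Labels']) # the Row_Labels column of those first 20 rows
print('*'*60)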

************************************************************
      Row_Labels  Count_AnimalName
1156       BELLA              1195
9140         MAX              1153
2660     CHARLIE               856
3251        COCO               852
12368      ROCKY               823
8417        LOLA               795
8552       LUCKY               723
8560        LUCY               710
2032       BUDDY               677
3641       DAISY               649
11703   PRINCESS               603
829       BAILEY               532
9766       MOLLY               519
14466      TEDDY               485
2913       CHLOE               465
14779       TOBY               446
8620        LUNA               432
6515        JACK               425
8788      MAGGIE               393
13762     SOPHIE               383
************************************************************
1156        BELLA
9140          MAX
2660      CHARLIE
3251         COCO
12368       ROCKY
8417         LOLA
8552        LUCKY
8560         LUCY
2032        BUDDY
3641        DAISY
11703    PRINCESS
829        BAILEY
9766        MOLLY
14466       TEDDY
2913        CHLOE
14779        TOBY
8620         LUNA
6515         JACK
8788       MAGGIE
13762      SOPHIE
Name: Row_Labels, dtype: object
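A compact sketch of the df[...] rules above, plus one case the notes do not mention: a list of column names returns a DataFrame holding just those columns. The data is a small sample copied from the output above.

```python
import pandas as pd

df = pd.DataFrame({"Row_Labels": ["BELLA", "MAX", "CHARLIE", "COCO"],
                   "Count_AnimalName": [1195, 1153, 856, 852]})

first_two = df[:2]                              # slice -> rows
names = df["Row_Labels"]                        # string -> one column (a Series)
sub = df[["Row_Labels", "Count_AnimalName"]]    # list of strings -> a DataFrame
names_first_two = df[:2]["Row_Labels"]          # rows first, then a column

print(names_first_two)
```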

Pandas: loc (selection by label)

import pandas as pd
import numpy as np

t3 = pd.DataFrame(np.arange(12).reshape(3,4), index=list('abc'),columns=list('WXYZ'))

print(t3)

print('*'*60)

print(t3.loc['a','Z'])  # a single value

print('*'*60)

print(t3.loc['a'])  # row a, returned as a Series

print('*'*60)

print(t3.loc['a',:]) # also row a, as a Series

print('*'*60)

print(t3.loc[:,'Y']) # column Y

#print(t3.loc['Y'])  # KeyError: the first key of loc is a row label, and there is no row 'Y'
print('*'*60)
print(t3.loc[['a','c'],:]) # rows a and c
print('*'*60)
print(t3.loc[:,['W','Z']]) # columns W and Z
print('*'*60)
print(t3.loc[['a','b'],['W','Z']]) # rows a, b and columns W, Z
print('*'*60)
print(t3.loc['a':'c',['W','Z']]) # a slice in loc includes both ends, so row c is included

   W  X   Y   Z
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11
************************************************************
3
************************************************************
W    0
X    1
Y    2
Z    3
Name: a, dtype: int32
************************************************************
W    0
X    1
Y    2
Z    3
Name: a, dtype: int32
************************************************************
a     2
b     6
c    10
Name: Y, dtype: int32
************************************************************
   W  X   Y   Z
a  0  1   2   3
c  8  9  10  11
************************************************************
   W   Z
a  0   3
b  4   7
c  8  11
************************************************************
   W  Z
a  0  3
b  4  7
************************************************************
   W   Z
a  0   3
b  4   7
c  8  11

Pandas: iloc (selection by position)

import pandas as pd
import numpy as np

t3 = pd.DataFrame(np.arange(12).reshape(3,4), index=list('abc'),columns=list('WXYZ'))

print(t3)
print('*'*60)
print(t3.iloc[1]) # second row
print('*'*60)
print(t3.iloc[0])  # first row
print('*'*60)
print(t3.iloc[1,:]) # second row
print('*'*60)
print(t3.iloc[:,2]) # third column
print('*'*60)
print(t3.iloc[[0,2],[2,1]]) # rows 1 and 3, columns 3 and 2 (in that order)
print('*'*60)
print(t3.iloc[1:,:2]) # rows from the second on, intersected with the first two columns
print('*'*60)
t3.iloc[1:,:2] =100
print(t3)
print('*'*60)
t3.iloc[1:,:2] =np.nan
print(t3)

   W  X   Y   Z
a  0  1   2   3
b  4  5   6   7
c  8  9  10  11
************************************************************
W    4
X    5
Y    6
Z    7
Name: b, dtype: int32
************************************************************
W    0
X    1
Y    2
Z    3
Name: a, dtype: int32
************************************************************
W    4
X    5
Y    6
Z    7
Name: b, dtype: int32
************************************************************
a     2
b     6
c    10
Name: Y, dtype: int32
************************************************************
    Y  X
a   2  1
c  10  9
************************************************************
   W  X
b  4  5
c  8  9
************************************************************
     W    X   Y   Z
a    0    1   2   3
b  100  100   6   7
c  100  100  10  11
************************************************************
     W    X   Y   Z
a  0.0  1.0   2   3
b  NaN  NaN   6   7
c  NaN  NaN  10  11
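The last output shows W and X turned into floats (0.0, 1.0): NaN is a floating-point value, so an integer column cannot hold it. A sketch that makes this conversion explicit with astype before assigning NaN, rather than relying on pandas to upcast the columns silently:

```python
import numpy as np
import pandas as pd

t3 = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list("abc"), columns=list("WXYZ"))

t3 = t3.astype({"W": "float64", "X": "float64"})   # explicit int -> float for W and X
t3.iloc[1:, :2] = np.nan                            # now NaN fits without upcasting

print(t3)
```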

Pandas: boolean indexing

List the dog names used more than 800 times

import numpy as np
import pandas as pd
df = pd.read_csv('dogNames2.csv')
# Sort the DataFrame by a column, largest count first
df=df.sort_values(by='Count_AnimalName',ascending=False)

t1 = df[df['Count_AnimalName']>800]
print(t1)

      Row_Labels  Count_AnimalName
1156       BELLA              1195
9140         MAX              1153
2660     CHARLIE               856
3251        COCO               852
12368      ROCKY               823

import numpy as np
import pandas as pd
df = pd.read_csv('dogNames2.csv')
# Sort the DataFrame by a column, largest count first
df=df.sort_values(by='Count_AnimalName',ascending=False)

t1 = df[df['Count_AnimalName']>800]
print(t1)
print('*'*60)

t2 = df[(800<df['Count_AnimalName'])&(df['Count_AnimalName']<1000)] # AND
print(t2)
print('*'*60)

t2 = df[(800<df['Count_AnimalName'])|(df['Count_AnimalName']<1000)]  # OR: true for every non-missing count
print(t2)
print('*'*60)

# Names used more than 700 times and longer than 4 characters
t3 = df[(700<df['Count_AnimalName'])&(df['Row_Labels'].str.len()>4)]
print(t3)
print('*'*60)

      Row_Labels  Count_AnimalName
1156       BELLA              1195
9140         MAX              1153
2660     CHARLIE               856
3251        COCO               852
12368      ROCKY               823
************************************************************
      Row_Labels  Count_AnimalName
2660     CHARLIE               856
3251        COCO               852
12368      ROCKY               823
************************************************************
      Row_Labels  Count_AnimalName
1156       BELLA              1195
9140         MAX              1153
2660     CHARLIE               856
3251        COCO               852
12368      ROCKY               823
...          ...               ...
6881      JJUJJU                 1
6882     JJYODAA                 1
6883         J-K                 1
6884        J-LO                 1
8106       LEELO                 1

[16212 rows x 2 columns]
************************************************************
      Row_Labels  Count_AnimalName
1156       BELLA              1195
2660     CHARLIE               856
12368      ROCKY               823
8552       LUCKY               723
************************************************************
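The conditions above follow a few rules worth spelling out: combine masks with &, | and ~ (Python's and/or raise a ValueError on a Series), wrap each comparison in parentheses because & binds more tightly than >, and note that (x > 800) | (x < 1000) is true for every non-missing number. A sketch on a small sample; five counts come from the output above and the REX row is hypothetical. Series.between(..., inclusive="neither") is shown as an equivalent spelling of the AND range.

```python
import pandas as pd

df = pd.DataFrame({"Row_Labels": ["BELLA", "MAX", "CHARLIE", "COCO", "LUCKY", "REX"],
                   "Count_AnimalName": [1195, 1153, 856, 852, 723, 3]})
counts = df["Count_AnimalName"]

both = df[(counts > 800) & (counts < 1000)]       # AND
either = df[(counts > 800) | (counts < 1000)]     # OR: keeps every row here
not_big = df[~(counts > 800)]                     # NOT
same_as_both = df[counts.between(800, 1000, inclusive="neither")]
long_popular = df[(counts > 700) & (df["Row_Labels"].str.len() > 4)]

print(long_popular)
```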

Removing NaN values

import pandas as pd
import numpy as np

t3 = pd.DataFrame(np.arange(12).reshape(3,4), index=list('abc'),columns=list('WXYZ'))

print('*'*60)
t3.iloc[1:,:2] =np.nan
print(t3)
print('*'*60)
print(pd.isnull(t3))
print('*'*60)
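print(t3.isna())  # method form, same result as pd.isnull(t3)
print('*'*60)
print(pd.notnull(t3))
print('*'*60)
t4 = t3[pd.notnull(t3['W'])]  # keep rows where column W is not NaN
print(t4)
print('*'*60)
t5 = t3.dropna(axis=0,how='all') # drop rows whose values are all NaN
print(t5)
print('*'*60)
t5 = t3.dropna(axis=0,how='any') # drop rows containing any NaN
print(t5)
print('*'*60)

t3.dropna(axis=0,how='any',inplace=True) # same, but modifies t3 itself; with inplace=True dropna returns None
print(t3)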
print('*'*60)

************************************************************
     W    X   Y   Z
a  0.0  1.0   2   3
b  NaN  NaN   6   7
c  NaN  NaN  10  11
************************************************************
       W      X      Y      Z
a  False  False  False  False
b   True   True  False  False
c   True   True  False  False
************************************************************
       W      X      Y      Z
a  False  False  False  False
b   True   True  False  False
c   True   True  False  False
************************************************************
       W      X     Y     Z
a   True   True  True  True
b  False  False  True  True
c  False  False  True  True
************************************************************
     W    X  Y  Z
a  0.0  1.0  2  3
************************************************************
     W    X   Y   Z
a  0.0  1.0   2   3
b  NaN  NaN   6   7
c  NaN  NaN  10  11
************************************************************
     W    X  Y  Z
a  0.0  1.0  2  3
************************************************************
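A sketch of a few more dropna options on the same frame: axis=1 drops columns instead of rows, subset restricts the check to some columns, and inplace=True changes the frame and returns None. The frame is converted to float up front so assigning NaN does not depend on implicit upcasting.

```python
import numpy as np
import pandas as pd

t3 = pd.DataFrame(np.arange(12).reshape(3, 4).astype(float),
                  index=list("abc"), columns=list("WXYZ"))
t3.iloc[1:, :2] = np.nan

any_nan = t3.dropna(axis=0, how="any")   # only row a survives
all_nan = t3.dropna(axis=0, how="all")   # nothing dropped: no row is entirely NaN
cols = t3.dropna(axis=1, how="any")      # drop columns W and X instead
only_w = t3.dropna(subset=["W"])         # check column W only
result = t3.dropna(inplace=True)         # modifies t3 and returns None

print(cols)
```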

Filling NaN values

import numpy as np
import pandas as pd

d2 = [{'name':'HHVic','age':22,'tel':10010},{'name':'HHVV','tel':10000},{'name':'HHHHV','age':21}]
print(d2)
print('*'*60)
d2 = pd.DataFrame(d2)
print(d2)
print('*'*60)
print(d2.fillna(100))
print('*'*60)
print(d2.fillna(d2.mean(numeric_only=True))) # fill each numeric column with its own mean; numeric_only skips the text column 'name'
print('*'*60)
print(d2['age'].fillna(d2['age'].mean()))
print('*'*60)

[{'name': 'HHVic', 'age': 22, 'tel': 10010}, {'name': 'HHVV', 'tel': 10000}, {'name': 'HHHHV', 'age': 21}]
************************************************************
    name   age      tel
0  HHVic  22.0  10010.0
1   HHVV   NaN  10000.0
2  HHHHV  21.0      NaN
************************************************************
    name    age      tel
0  HHVic   22.0  10010.0
1   HHVV  100.0  10000.0
2  HHHHV   21.0    100.0
************************************************************
    name   age      tel
0  HHVic  22.0  10010.0
1   HHVV  21.5  10000.0
2  HHHHV  21.0  10005.0
************************************************************
0    22.0
1    21.5
2    21.0
Name: age, dtype: float64
************************************************************
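In pandas 2.0 and later, d2.mean() raises a TypeError when the frame contains a text column such as name, so the mean has to be restricted with numeric_only=True. A sketch, which also shows that fillna accepts a dict mapping each column to its own fill value:

```python
import pandas as pd

d2 = pd.DataFrame([{"name": "HHVic", "age": 22, "tel": 10010},
                   {"name": "HHVV", "tel": 10000},
                   {"name": "HHHHV", "age": 21}])

means = d2.mean(numeric_only=True)          # skip the text column "name"
filled = d2.fillna(means)                   # each column filled with its own mean
custom = d2.fillna({"age": 0, "tel": -1})   # a different fill value per column

print(filled)
```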

Pandas case study 1

Given data on the 1000 most popular movies from 2006 to 2016, find the average rating, the number of directors, and similar information.

import pandas as pd
import numpy as np

file_path='IMDB-Movie-Data.csv'
df = pd.read_csv(file_path)

print(df.info())
print('*'*70)
print(df.head(1))

# Average rating
print('The average of Rating is :{}'.format(df['Rating'].mean()))

# Number of distinct directors
print('The amount of Director is (Round1) : {}'.format(len(set(df['Director'].tolist()))))

# Number of distinct directors (second method)
print('The amount of Director is (Round2) : {}'.format(len(df['Director'].unique())))

# Number of distinct actors: split each comma-separated string, then flatten the nested lists
temp_actors_list = df['Actors'].str.split(',').tolist()
actors_list = [i for j in temp_actors_list for i in j]
# actors_list = np.array(temp_actors_list).flatten().tolist() # does not work: the sublists have different lengths
actors_num = len(set(actors_list))
print('The amount of Actors is (Round2) : {}'.format(actors_num))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rank                1000 non-null   int64  
 1   Title               1000 non-null   object 
 2   Genre               1000 non-null   object 
 3   Description         1000 non-null   object 
 4   Director            1000 non-null   object 
 5   Actors              1000 non-null   object 
 6   Year                1000 non-null   int64  
 7   Runtime (Minutes)   1000 non-null   int64  
 8   Rating              1000 non-null   float64
 9   Votes               1000 non-null   int64  
 10  Revenue (Millions)  872 non-null    float64
 11  Metascore           936 non-null    float64
dtypes: float64(3), int64(4), object(5)
memory usage: 93.9+ KB
None
**********************************************************************
   Rank                    Title  ... Revenue (Millions) Metascore
0     1  Guardians of the Galaxy  ...             333.13      76.0

[1 rows x 12 columns]
The average of Rating is :6.723199999999999
The amount of Director is (Round1) : 644
The amount of Director is (Round2) : 644
The amount of Actors is (Round2) : 2394
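If a space follows each comma in the Actors column, splitting on ',' alone keeps that space, so the same actor can be counted twice (as 'Vin Diesel' and ' Vin Diesel'). The sketch below assumes ', ' separators; it flattens with explode, strips the spaces, and counts with nunique, which also works for the director count. The two rows are a hypothetical sample, not the real file.

```python
import pandas as pd

# Hypothetical sample; ", " separators are an assumption about the real CSV
df = pd.DataFrame({"Director": ["James Gunn", "James Gunn"],
                   "Actors": ["Chris Pratt, Vin Diesel, Zoe Saldana",
                              "Vin Diesel, Chris Pratt"]})

raw = df["Actors"].str.split(",").explode()   # one actor per row, spaces kept
clean = raw.str.strip()                       # remove leading/trailing spaces

print(raw.nunique())               # 5: "Vin Diesel" and " Vin Diesel" differ
print(clean.nunique())             # 3
print(df["Director"].nunique())    # 1
```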

Pandas case study 2

How can we present the distribution of Rating and Runtime?

Runtime distribution:

import pandas as pd
from matplotlib import pyplot as plt
file_path = 'IMDB-Movie-Data.csv'

df = pd.read_csv(file_path)

print(df.head(1))
print(df.info())

# Runtime distribution
# Chart type: histogram
# Prepare the data
runtime_data=df['Runtime (Minutes)'].values
max_runtime = runtime_data.max()
min_runtime = runtime_data.min()

# Number of bins: about 5 minutes per bin
print(max_runtime-min_runtime)
num_bin=(max_runtime-min_runtime)//5

# Figure size
plt.figure(figsize=(20,8),dpi=80)
plt.hist(runtime_data,num_bin)

plt.xticks(range(min_runtime,max_runtime+5,5))
# Save before show(): after the window is closed, savefig may write an empty image
plt.savefig('day5-1.png')
plt.show()

[Figure: histogram of movie runtimes, saved as day5-1.png]
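With bins=num_bin, hist divides the range into num_bin equal bins of width (max - min) / num_bin, which is only approximately 5, so the 5-minute x ticks need not line up with the bin edges. A sketch with np.histogram (the function plt.hist uses for counting) on hypothetical runtimes, comparing that with explicit 5-minute edges:

```python
import numpy as np

# Hypothetical runtimes in minutes (not the real dataset)
runtime_data = np.array([91, 95, 102, 118, 121, 135, 140, 66, 187])

min_runtime, max_runtime = runtime_data.min(), runtime_data.max()
num_bin = (max_runtime - min_runtime) // 5             # 121 // 5 = 24 bins
counts, edges = np.histogram(runtime_data, bins=num_bin)
print(np.diff(edges)[0])                               # width about 5.04, not exactly 5

# Alternative: fixed 5-minute edges that still cover the maximum
edges5 = np.arange(min_runtime, max_runtime + 5, 5)
counts5, _ = np.histogram(runtime_data, bins=edges5)
```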
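Rating distribution:

import numpy as np
from matplotlib import pyplot as plt

# Rating values, hard-coded (the variable name runtime_data is reused from the previous example)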
runtime_data = np.array([8.1, 7.0, 7.3, 7.2, 6.2, 6.1, 8.3, 6.4, 7.1, 7.0, 7.5, 7.8, 7.9, 7.7, 6.4, 6.6, 8.2, 6.7, 8.1, 8.0, 6.7, 7.9, 6.7, 6.5, 5.3, 6.8, 8.3, 4.7, 6.2, 5.9, 6.3, 7.5, 7.1, 8.0, 5.6, 7.9, 8.6, 7.6, 6.9, 7.1, 6.3, 7.5, 2.7, 7.2, 6.3, 6.7, 7.3, 5.6, 7.1, 3.7, 8.1, 5.8, 5.6, 7.2, 9.0, 7.3, 7.2, 7.4, 7.0, 7.5, 6.7, 6.8, 6.5, 4.1, 8.5, 7.7, 7.4, 8.1, 7.5, 7.2, 5.9, 7.1, 7.5, 6.8, 8.1, 7.1, 8.1, 8.3, 7.3, 5.3, 8.8, 7.9, 8.2, 8.1, 7.2, 7.0, 6.4, 7.8, 7.8, 7.4, 8.1, 7.0, 8.1, 7.1, 7.4, 7.4, 8.6, 5.8, 6.3, 8.5, 7.0, 7.0, 8.0, 7.9, 7.3, 7.7, 5.4, 6.3, 5.8, 7.7, 6.3, 8.1, 6.1, 7.7, 8.1, 5.8, 6.2, 8.8, 7.2, 7.4, 6.7, 6.7, 6.0, 7.4, 8.5, 7.5, 5.7, 6.6, 6.4, 8.0, 7.3, 6.0, 6.4, 8.5, 7.1, 7.3, 8.1, 7.3, 8.1, 7.1, 8.0, 6.2, 7.8, 8.2, 8.4, 8.1, 7.4, 7.6, 7.6, 6.2, 6.4, 7.2, 5.8, 7.6, 8.1, 4.7, 7.0, 7.4, 7.5, 7.9, 6.0, 7.0, 8.0, 6.1, 8.0, 5.2, 6.5, 7.3, 7.3, 6.8, 7.9, 7.9, 5.2, 8.0, 7.5, 6.5, 7.6, 7.0, 7.4, 7.3, 6.7, 6.8, 7.0, 5.9, 8.0, 6.0, 6.3, 6.6, 7.8, 6.3, 7.2, 5.6, 8.1, 5.8, 8.2, 6.9, 6.3, 8.1, 8.1, 6.3, 7.9, 6.5, 7.3, 7.9, 5.7, 7.8, 7.5, 7.5, 6.8, 6.7, 6.1, 5.3, 7.1, 5.8, 7.0, 5.5, 7.8, 5.7, 6.1, 7.7, 6.7, 7.1, 6.9, 7.8, 7.0, 7.0, 7.1, 6.4, 7.0, 4.8, 8.2, 5.2, 7.8, 7.4, 6.1, 8.0, 6.8, 3.9, 8.1, 5.9, 7.6, 8.2, 5.8, 6.5, 5.9, 7.6, 7.9, 7.4, 7.1, 8.6, 4.9, 7.3, 7.9, 6.7, 7.5, 7.8, 5.8, 7.6, 6.4, 7.1, 7.8, 8.0, 6.2, 7.0, 6.0, 4.9, 6.0, 7.5, 6.7, 3.7, 7.8, 7.9, 7.2, 8.0, 6.8, 7.0, 7.1, 7.7, 7.0, 7.2, 7.3, 7.6, 7.1, 7.0, 6.0, 6.1, 5.8, 5.3, 5.8, 6.1, 7.5, 7.2, 5.7, 7.7, 7.1, 6.6, 5.7, 6.8, 7.1, 8.1, 7.2, 7.5, 7.0, 5.5, 6.4, 6.7, 6.2, 5.5, 6.0, 6.1, 7.7, 7.8, 6.8, 7.4, 7.5, 7.0, 5.2, 5.3, 6.2, 7.3, 6.5, 6.4, 7.3, 6.7, 7.7, 6.0, 6.0, 7.4, 7.0, 5.4, 6.9, 7.3, 8.0, 7.4, 8.1, 6.1, 7.8, 5.9, 7.8, 6.5, 6.6, 7.4, 6.4, 6.8, 6.2, 5.8, 7.7, 7.3, 5.1, 7.7, 7.3, 6.6, 7.1, 6.7, 6.3, 5.5, 7.4, 7.7, 6.6, 7.8, 6.9, 5.7, 7.8, 7.7, 6.3, 8.0, 5.5, 6.9, 7.0, 5.7, 6.0, 6.8, 6.3, 6.7, 6.9, 5.7, 6.9, 7.6, 7.1, 6.1, 7.6, 7.4, 6.6, 7.6, 7.8, 7.1, 5.6, 6.7, 6.7, 6.6, 6.3, 5.8, 7.2, 5.0, 5.4, 
7.2, 6.8, 5.5, 6.0, 6.1, 6.4, 3.9, 7.1, 7.7, 6.7, 6.7, 7.4, 7.8, 6.6, 6.1, 7.8, 6.5, 7.3, 7.2, 5.6, 5.4, 6.9, 7.8, 7.7, 7.2, 6.8, 5.7, 5.8, 6.2, 5.9, 7.8, 6.5, 8.1, 5.2, 6.0, 8.4, 4.7, 7.0, 7.4, 6.4, 7.1, 7.1, 7.6, 6.6, 5.6, 6.3, 7.5, 7.7, 7.4, 6.0, 6.6, 7.1, 7.9, 7.8, 5.9, 7.0, 7.0, 6.8, 6.5, 6.1, 8.3, 6.7, 6.0, 6.4, 7.3, 7.6, 6.0, 6.6, 7.5, 6.3, 7.5, 6.4, 6.9, 8.0, 6.7, 7.8, 6.4, 5.8, 7.5, 7.7, 7.4, 8.5, 5.7, 8.3, 6.7, 7.2, 6.5, 6.3, 7.7, 6.3, 7.8, 6.7, 6.7, 6.6, 8.0, 6.5, 6.9, 7.0, 5.3, 6.3, 7.2, 6.8, 7.1, 7.4, 8.3, 6.3, 7.2, 6.5, 7.3, 7.9, 5.7, 6.5, 7.7, 4.3, 7.8, 7.8, 7.2, 5.0, 7.1, 5.7, 7.1, 6.0, 6.9, 7.9, 6.2, 7.2, 5.3, 4.7, 6.6, 7.0, 3.9, 6.6, 5.4, 6.4, 6.7, 6.9, 5.4, 7.0, 6.4, 7.2, 6.5, 7.0, 5.7, 7.3, 6.1, 7.2, 7.4, 6.3, 7.1, 5.7, 6.7, 6.8, 6.5, 6.8, 7.9, 5.8, 7.1, 4.3, 6.3, 7.1, 4.6, 7.1, 6.3, 6.9, 6.6, 6.5, 6.5, 6.8, 7.8, 6.1, 5.8, 6.3, 7.5, 6.1, 6.5, 6.0, 7.1, 7.1, 7.8, 6.8, 5.8, 6.8, 6.8, 7.6, 6.3, 4.9, 4.2, 5.1, 5.7, 7.6, 5.2, 7.2, 6.0, 7.3, 7.2, 7.8, 6.2, 7.1, 6.4, 6.1, 7.2, 6.6, 6.2, 7.9, 7.3, 6.7, 6.4, 6.4, 7.2, 5.1, 7.4, 7.2, 6.9, 8.1, 7.0, 6.2, 7.6, 6.7, 7.5, 6.6, 6.3, 4.0, 6.9, 6.3, 7.3, 7.3, 6.4, 6.6, 5.6, 6.0, 6.3, 6.7, 6.0, 6.1, 6.2, 6.7, 6.6, 7.0, 4.9, 8.4, 7.0, 7.5, 7.3, 5.6, 6.7, 8.0, 8.1, 4.8, 7.5, 5.5, 8.2, 6.6, 3.2, 5.3, 5.6, 7.4, 6.4, 6.8, 6.7, 6.4, 7.0, 7.9, 5.9, 7.7, 6.7, 7.0, 6.9, 7.7, 6.6, 7.1, 6.6, 5.7, 6.3, 6.5, 8.0, 6.1, 6.5, 7.6, 5.6, 5.9, 7.2, 6.7, 7.2, 6.5, 7.2, 6.7, 7.5, 6.5, 5.9, 7.7, 8.0, 7.6, 6.1, 8.3, 7.1, 5.4, 7.8, 6.5, 5.5, 7.9, 8.1, 6.1, 7.3, 7.2, 5.5, 6.5, 7.0, 7.1, 6.6, 6.5, 5.8, 7.1, 6.5, 7.4, 6.2, 6.0, 7.6, 7.3, 8.2, 5.8, 6.5, 6.6, 6.2, 5.8, 6.4, 6.7, 7.1, 6.0, 5.1, 6.2, 6.2, 6.6, 7.6, 6.8, 6.7, 6.3, 7.0, 6.9, 6.6, 7.7, 7.5, 5.6, 7.1, 5.7, 5.2, 5.4, 6.6, 8.2, 7.6, 6.2, 6.1, 4.6, 5.7, 6.1, 5.9, 7.2, 6.5, 7.9, 6.3, 5.0, 7.3, 5.2, 6.6, 5.2, 7.8, 7.5, 7.3, 7.3, 6.6, 5.7, 8.2, 6.7, 6.2, 6.3, 5.7, 6.6, 4.5, 8.1, 5.6, 7.3, 6.2, 5.1, 4.7, 4.8, 7.2, 6.9, 6.5, 7.3, 6.5, 6.9, 7.8, 6.8, 4.6, 6.7, 6.4, 6.0, 6.3, 6.6, 7.8, 6.6, 
6.2, 7.3, 7.4, 6.5, 7.0, 4.3, 7.2, 6.2, 6.2, 6.8, 6.0, 6.6, 7.1, 6.8, 5.2, 6.7, 6.2, 7.0, 6.3, 7.8, 7.6, 5.4, 7.6, 5.4, 4.6, 6.9, 6.8, 5.8, 7.0, 5.8, 5.3, 4.6, 5.3, 7.6, 1.9, 7.2, 6.4, 7.4, 5.7, 6.4, 6.3, 7.5, 5.5, 4.2, 7.8, 6.3, 6.4, 7.1, 7.1, 6.8, 7.3, 6.7, 7.8, 6.3, 7.5, 6.8, 7.4, 6.8, 7.1, 7.6, 5.9, 6.6, 7.5, 6.4, 7.8, 7.2, 8.4, 6.2, 7.1, 6.3, 6.5, 6.9, 6.9, 6.6, 6.9, 7.7, 2.7, 5.4, 7.0, 6.6, 7.0, 6.9, 7.3, 5.8, 5.8, 6.9, 7.5, 6.3, 6.9, 6.1, 7.5, 6.8, 6.5, 5.5, 7.7, 3.5, 6.2, 7.1, 5.5, 7.1, 7.1, 7.1, 7.9, 6.5, 5.5, 6.5, 5.6, 6.8, 7.9, 6.2, 6.2, 6.7, 6.9, 6.5, 6.6, 6.4, 4.7, 7.2, 7.2, 6.7, 7.5, 6.6, 6.7, 7.5, 6.1, 6.4, 6.3, 6.4, 6.8, 6.1, 4.9, 7.3, 5.9, 6.1, 7.1, 5.9, 6.8, 5.4, 6.3, 6.2, 6.6, 4.4, 6.8, 7.3, 7.4, 6.1, 4.9, 5.8, 6.1, 6.4, 6.9, 7.2, 5.6, 4.9, 6.1, 7.8, 7.3, 4.3, 7.2, 6.4, 6.2, 5.2, 7.7, 6.2, 7.8, 7.0, 5.9, 6.7, 6.3, 6.9, 7.0, 6.7, 7.3, 3.5, 6.5, 4.8, 6.9, 5.9, 6.2, 7.4, 6.0, 6.2, 5.0, 7.0, 7.6, 7.0, 5.3, 7.4, 6.5, 6.8, 5.6, 5.9, 6.3, 7.1, 7.5, 6.6, 8.5, 6.3, 5.9, 6.7, 6.2, 5.5, 6.2, 5.6, 5.3])
max_runtime = runtime_data.max()
min_runtime = runtime_data.min()
print(min_runtime,max_runtime)

# Unequal bin widths: one wide bin [1.9, 3.5), then 0.5-wide bins.
# hist treats each bin as half-open [left, right); only the last bin also includes its right edge
num_bin_list = [1.9,3.5]
i=3.5
while i<=max_runtime:
    i += 0.5
    num_bin_list.append(i)
print(num_bin_list)

# Figure size
plt.figure(figsize=(20,8),dpi=80)
plt.hist(runtime_data,num_bin_list)

# Put the x ticks on the bin edges
plt.xticks(num_bin_list)
plt.savefig('day5-2.png')
plt.show()

[Figure: histogram of movie ratings, saved as day5-2.png]
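The half-open rule in the bin comment decides where a value lying exactly on an edge goes. A sketch with np.histogram on a few hypothetical ratings: 3.5 falls into the second bin, not the first. The edges are built with np.arange as an alternative to the while loop; it covers the maximum rating but may end half a step later than the loop does.

```python
import numpy as np

# Hypothetical ratings, including values that sit exactly on bin edges
ratings = np.array([1.9, 3.4, 3.5, 6.0, 9.0])

# One wide bin [1.9, 3.5), then 0.5-wide bins past the maximum
edges = np.concatenate(([1.9], np.arange(3.5, ratings.max() + 1.0, 0.5)))
counts, _ = np.histogram(ratings, bins=edges)

print(edges)
print(counts)
```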
