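Data dictionary
Date: observation date
AveragePrice: average price
type: type, conventional or organic
year: year
region: city of the observation
Total Volume: total number of avocados sold
Total/Small/Large/XLarge Bags: avocado sales volume for the four bag categories
4046/4225/4770: sales volume for three different kinds of avocado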
Descriptive analysis
import pandas as pd
avo=pd.read_csv("E:/Python/avocado.csv")
avo.head()
   Unnamed: 0        Date  AveragePrice  Total Volume     4046       4225    4770  Total Bags  Small Bags  Large Bags  XLarge Bags          type  year  region
0           0  2015-12-27          1.33      64236.62  1036.74   54454.85   48.16     8696.87     8603.62       93.25          0.0  conventional  2015  Albany
1           1  2015-12-20          1.35      54876.98   674.28   44638.81   58.33     9505.56     9408.07       97.49          0.0  conventional  2015  Albany
2           2  2015-12-13          0.93     118220.22   794.70  109149.67  130.50     8145.35     8042.21      103.14          0.0  conventional  2015  Albany
3           3  2015-12-06          1.08      78992.15  1132.00   71976.41   72.58     5811.16     5677.40      133.76          0.0  conventional  2015  Albany
4           4  2015-11-29          1.28      51039.60   941.48   43838.39   75.78     6183.95     5986.26      197.69          0.0  conventional  2015  Albany
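In the five rows printed by avo.head(), Total Bags looks like the sum of the Small, Large and XLarge Bags columns rather than a fourth independent bag size. A minimal sketch of that check, using only the five printed rows (the frame name `bags` and the 0.01 tolerance are choices made for this illustration, and nothing here confirms the relation holds on all 18249 rows):

```python
import pandas as pd

# Bag columns copied from the five rows of the avo.head() output.
bags = pd.DataFrame({
    "Total Bags": [8696.87, 9505.56, 8145.35, 5811.16, 6183.95],
    "Small Bags": [8603.62, 9408.07, 8042.21, 5677.40, 5986.26],
    "Large Bags": [93.25, 97.49, 103.14, 133.76, 197.69],
    "XLarge Bags": [0.0, 0.0, 0.0, 0.0, 0.0],
})

# Sum the three size columns row by row and compare with Total Bags.
parts = bags[["Small Bags", "Large Bags", "XLarge Bags"]].sum(axis=1)
gap = (bags["Total Bags"] - parts).abs()

# Allow a small tolerance for floating-point rounding.
print((gap < 0.01).all())
```

On the full data the same comparison could be run on `avo` directly by selecting the same four columns.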
avo.describe()
         Unnamed: 0  AveragePrice  Total Volume          4046          4225          4770    Total Bags    Small Bags    Large Bags    XLarge Bags          year
count  18249.000000  18249.000000  1.824900e+04  1.824900e+04  1.824900e+04  1.824900e+04  1.824900e+04  1.824900e+04  1.824900e+04   18249.000000  18249.000000
mean      24.232232      1.405978  8.506440e+05  2.930084e+05  2.951546e+05  2.283974e+04  2.396392e+05  1.821947e+05  5.433809e+04    3106.426507   2016.147899
std       15.481045      0.402677  3.453545e+06  1.264989e+06  1.204120e+06  1.074641e+05  9.862424e+05  7.461785e+05  2.439660e+05   17692.894652      0.939938
min        0.000000      0.440000  8.456000e+01  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00       0.000000   2015.000000
25%       10.000000      1.100000  1.083858e+04  8.540700e+02  3.008780e+03  0.000000e+00  5.088640e+03  2.849420e+03  1.274700e+02       0.000000   2015.000000
50%       24.000000      1.370000  1.073768e+05  8.645300e+03  2.906102e+04  1.849900e+02  3.974383e+04  2.636282e+04  2.647710e+03       0.000000   2016.000000
75%       38.000000      1.660000  4.329623e+05  1.110202e+05  1.502069e+05  6.243420e+03  1.107834e+05  8.333767e+04  2.202925e+04     132.500000   2017.000000
max       52.000000      3.250000  6.250565e+07  2.274362e+07  2.047057e+07  2.546439e+06  1.937313e+07  1.338459e+07  5.719097e+06  551693.650000   2018.000000
avo.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18249 entries, 0 to 18248
Data columns (total 14 columns):
Unnamed: 0 18249 non-null int64
Date 18249 non-null object
AveragePrice 18249 non-null float64
Total Volume 18249 non-null float64
4046 18249 non-null float64
4225 18249 non-null float64
4770 18249 non-null float64
Total Bags 18249 non-null float64
Small Bags 18249 non-null float64
Large Bags 18249 non-null float64
XLarge Bags 18249 non-null float64
type 18249 non-null object
year 18249 non-null int64
region 18249 non-null object
dtypes: float64(9), int64(2), object(3)
memory usage: 1.9+ MB
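The info() output shows Date stored as object (plain strings) and an "Unnamed: 0" column that looks like a leftover index from an earlier export. A minimal cleaning sketch on three rows copied from the head() output (the names `sample` and `cleaned` are hypothetical, and dropping "Unnamed: 0" assumes the column carries no meaning for the analysis):

```python
import pandas as pd

# Three rows copied from the avo.head() output, standing in for the full CSV.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Date": ["2015-12-27", "2015-12-20", "2015-12-13"],
    "AveragePrice": [1.33, 1.35, 0.93],
    "type": ["conventional"] * 3,
    "year": [2015] * 3,
    "region": ["Albany"] * 3,
})

# Drop the leftover index column (assumption: it is not needed downstream).
cleaned = sample.drop(columns=["Unnamed: 0"])

# Parse the date strings into datetime64 values, then sort chronologically.
cleaned["Date"] = pd.to_datetime(cleaned["Date"], format="%Y-%m-%d")
cleaned = cleaned.sort_values("Date").reset_index(drop=True)

print(cleaned.dtypes)
```

The same two steps should carry over to `avo` unchanged, provided every Date value follows the YYYY-MM-DD pattern seen in the first rows.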
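# Correlate numeric columns only; recent pandas raises on string columns. The .T is dropped because a correlation matrix is symmetric.
avo_corr=avo.select_dtypes(include="number").corr()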
avo_corr
              Unnamed: 0  AveragePrice  Total Volume       4046       4225       4770  Total Bags  Small Bags  Large Bags  XLarge Bags       year
Unnamed: 0      1.000000     -0.133008      0.014035   0.017628   0.019829   0.041752   -0.002219    0.000347   -0.009196    -0.011546  -0.171667
AveragePrice   -0.133008      1.000000     -0.192752  -0.208317  -0.172928  -0.179446   -0.177088   -0.174730   -0.172940    -0.117592   0.093197
Total Volume    0.014035     -0.192752      1.000000   0.977863   0.974181   0.872202    0.963047    0.967238    0.880640     0.747157   0.017193
4046            0.017628     -0.208317      0.977863   1.000000   0.926110   0.833389    0.920057    0.925280    0.838645     0.699377   0.003353
4225            0.019829     -0.172928      0.974181   0.926110   1.000000   0.887855    0.905787    0.916031    0.810015     0.688809  -0.009559
4770            0.041752     -0.179446      0.872202   0.833389   0.887855   1.000000    0.792314    0.802733    0.698471     0.679861  -0.036531
Total Bags     -0.002219     -0.177088      0.963047   0.920057   0.905787   0.792314    1.000000    0.994335    0.943009     0.804233   0.071552
Small Bags      0.000347     -0.174730      0.967238   0.925280   0.916031   0.802733    0.994335    1.000000    0.902589     0.806845   0.063915
Large Bags     -0.009196     -0.172940      0.880640   0.838645   0.810015   0.698471    0.943009    0.902589    1.000000     0.710858   0.087891
XLarge Bags    -0.011546     -0.117592      0.747157   0.699377   0.688809   0.679861    0.804233    0.806845    0.710858     1.000000   0.081033
year           -0.171667      0.093197      0.017193   0.003353   -0.009559  -0.036531    0.071552    0.063915    0.087891     0.081033   1.000000
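The AveragePrice row is the one most relevant to price analysis, and it is easier to read once sorted by strength. A small sketch that ranks those coefficients by absolute value, with the numbers copied from the matrix output (the names `price_corr` and `ranked` are hypothetical; the self-correlation of 1.0 is left out):

```python
import pandas as pd

# AveragePrice correlations copied from the printed matrix (self-correlation omitted).
price_corr = pd.Series({
    "Unnamed: 0": -0.133008,
    "Total Volume": -0.192752,
    "4046": -0.208317,
    "4225": -0.172928,
    "4770": -0.179446,
    "Total Bags": -0.177088,
    "Small Bags": -0.174730,
    "Large Bags": -0.172940,
    "XLarge Bags": -0.117592,
    "year": 0.093197,
})

# Order by absolute strength while keeping the original signs.
order = price_corr.abs().sort_values(ascending=False).index
ranked = price_corr.reindex(order)

print(ranked)
```

Every coefficient here is below 0.21 in absolute value, so none of the volume columns is strongly linearly related to price on its own. On the full frame the same ranking could be taken from `avo_corr["AveragePrice"]` after dropping the AveragePrice entry itself.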
import matplotlib.pyplot as plt
import seaborn as sns
plt.subplots(figsize=(15,12))
sns.heatmap(avo_corr,vmax=1,cmap="Blues",square=True)