1. Download the vehicle fuel economy dataset and unzip it
Download page: https://www.fueleconomy.gov/feg/download.shtml
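The download-and-unzip step can also be scripted. The following is a minimal sketch using only the standard library. The direct archive URL in `VEHICLES_ZIP_URL` and the helper names `download_zip` and `unzip_to` are assumptions made for illustration, not part of the original post; check the download page above for the current link.

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: direct link to the zipped CSV. Verify it on the download page.
VEHICLES_ZIP_URL = "https://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip"


def download_zip(url: str, zip_path: Path) -> Path:
    """Download the archive to zip_path; skip the download if the file exists."""
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    if not zip_path.exists():
        urlretrieve(url, str(zip_path))
    return zip_path


def unzip_to(zip_path: Path, data_dir: Path) -> list:
    """Extract every member of the archive into data_dir and return their names."""
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
        return archive.namelist()
```

A typical call would be `unzip_to(download_zip(VEHICLES_ZIP_URL, Path("../data/vehicles.csv.zip")), Path("../data"))`, which should leave `vehicles.csv` in the `../data` directory that the script below reads from.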
vehiclesData.py:
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
from ggplot import *
import matplotlib.pyplot as plt

# Load the unzipped CSV into a DataFrame
vehicles = pd.read_csv("../data/vehicles.csv")
print(vehicles.head())  # first five rows; the footer shows rows x columns
print(len(vehicles))  # number of observations (rows)
Output:
   barrels08  barrelsA08  charge120  charge240  city08  city08U  cityA08  \
0  15.695714         0.0        0.0        0.0      19      0.0        0
1  29.964545         0.0        0.0        0.0       9      0.0        0
2  12.207778         0.0        0.0        0.0      23      0.0        0
3  29.964545         0.0        0.0        0.0      10      0.0        0
4  17.347895         0.0        0.0        0.0      17      0.0        0

   cityA08U  cityCD  cityE  ...  mfrCode  c240Dscr  charge240b  \
0       0.0     0.0    0.0  ...      NaN       NaN         0.0
1       0.0     0.0    0.0  ...      NaN       NaN         0.0
2       0.0     0.0    0.0  ...      NaN       NaN         0.0
3       0.0     0.0    0.0  ...      NaN       NaN         0.0
4       0.0     0.0    0.0  ...      NaN       NaN         0.0

   c240bDscr                     createdOn                    modifiedOn  \
0        NaN  Tue Jan 01 00:00:00 EST 2013  Tue Jan 01 00:00:00 EST 2013
1        NaN  Tue Jan 01 00:00:00 EST 2013  Tue Jan 01 00:00:00 EST 2013
2        NaN  Tue Jan 01 00:00:00 EST 2013  Tue Jan 01 00:00:00 EST 2013
3        NaN  Tue Jan 01 00:00:00 EST 2013  Tue Jan 01 00:00:00 EST 2013
4        NaN  Tue Jan 01 00:00:00 EST 2013  Tue Jan 01 00:00:00 EST 2013

   startStop  phevCity  phevHwy  phevComb
0        NaN         0        0         0
1        NaN         0        0         0
2        NaN         0        0         0
3        NaN         0        0         0
4        NaN         0        0         0

[5 rows x 83 columns]
39101
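Here, head is a convenience method of the pandas DataFrame class that prints the first rows of a data frame (five by default). A fuller summary of the data frame, including the number of non-null values in each column and how many columns there are of each data type, comes from DataFrame.info().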
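To make the difference between these inspection calls concrete, here is a small sketch on a made-up three-row data frame. The values in `sample` are invented for illustration and are not taken from vehicles.csv; only the column names echo the real dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data; the values are invented.
sample = pd.DataFrame({
    "make": ["A", "B", "C"],
    "year": [2001, 2002, 2003],
    "comb08": [21, 11, 27],
    "mfrCode": [None, None, None],  # a column with no non-null values
})

# head(n) returns the first n rows (5 if n is omitted).
first_two = sample.head(2)
print(first_two)

# shape gives (rows, columns) in one call, instead of len() on each axis.
n_rows, n_cols = sample.shape
print(n_rows, n_cols)

# info() writes per-column non-null counts and a dtype summary;
# capturing it in a buffer makes the text available as a string.
buffer = io.StringIO()
sample.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```

On the real data, `vehicles.shape` should report the same row and column counts as the `len(vehicles)` and `len(vehicles.columns)` calls in this post.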
Describe the fuel economy data
print(len(vehicles.columns))  # number of variables (columns)
print(vehicles.columns)  # column names
Output:
83
Index(['barrels08', 'barrelsA08', 'charge120', 'charge240', 'city08',
'city08U', 'cityA08', 'cityA08U', 'cityCD', 'cityE', 'cityUF', 'co2',
'co2A', 'co2TailpipeAGpm', 'co2TailpipeGpm', 'comb08', 'comb08U',
'combA08', 'combA08U', 'combE', 'combinedCD', 'combinedUF', 'cylinders',
'displ', 'drive', 'engId', 'eng_dscr', 'feScore', 'fuelCost08',
'fuelCostA08', 'fuelType', 'fuelType1', 'ghgScore', 'ghgScoreA',
'highway08', 'highway08U', 'highwayA08', 'highwayA08U', 'highwayCD',
'highwayE', 'highwayUF', 'hlv', 'hpv', 'id', 'lv2', 'lv4', 'make',
'model', 'mpgData', 'phevBlended', 'pv2', 'pv4', 'range', 'rangeCity',
'rangeCityA', 'rangeHwy', 'rangeHwyA', 'trany', 'UCity', 'UCityA',
'UHighway', 'UHighwayA', 'VClass', 'year', 'youSaveSpend', 'guzzler',
&