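Project Background
With advances in battery technology and growing commercialization, China's new energy vehicle (NEV) industry has entered a period of rapid growth. Governments at all levels have issued policies in continued support of NEV technology and industry development, and automakers worldwide are enthusiastic about NEV development and application and keep exploring and experimenting. Compared with conventional vehicles, NEVs are more electrified, intelligent, connected, and shared, and they yield richer data that can support multi-faceted, in-depth analysis.
Meanwhile, amid the new wave of information technology change, Internet of Vehicles and big data technologies have become a new engine of growth for NEV data collection, operational analysis, battery management, and related fields.
This project analyzes NEV operating data provided by the Shanghai Electric Vehicle Public Data Collecting, Monitoring and Research Center (上海市新能源汽车公共数据采集与监测研究中心). The goals are to identify the key factors that affect battery state and energy consumption, and to assess usage risk from the driver's behavior.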
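Data Description
The dataset consists of two CSV files:
- SHEVDC_OV6N7709.csv: operating data for a battery electric vehicle
- SHEVDC_0C023H25.csv: operating data for a hybrid vehicle
The fields are defined as follows:
Data is sampled once every 10 s.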
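The stated 10 s sampling rate can be checked against the `time` column once the data is loaded. The following is a minimal sketch on synthetic timestamps (not the real data); the names `sample`, `gap_seconds`, `nominal_interval`, and `n_irregular` are illustrative assumptions.

```python
import pandas as pd

# Synthetic timestamps standing in for the `time` column (assumption, not real data).
sample = pd.DataFrame({
    "time": [
        "2019-01-10 01:12:00",
        "2019-01-10 01:12:10",
        "2019-01-10 01:12:20",
        "2019-01-10 01:13:00",  # a 40 s gap, e.g. dropped records
        "2019-01-10 01:13:10",
    ]
})

# Parse the strings and measure the gap between consecutive rows in seconds.
timestamps = pd.to_datetime(sample["time"])
gap_seconds = timestamps.diff().dropna().dt.total_seconds()

# The most frequent gap is taken as the nominal sampling interval.
nominal_interval = gap_seconds.mode().iloc[0]

# Count gaps that deviate from the nominal interval.
n_irregular = int((gap_seconds != nominal_interval).sum())

print(nominal_interval, n_irregular)
```

Applied to the real files, a large `n_irregular` would suggest that the log is not strictly continuous.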
I. Data Import and Preprocessing
1 Data Import
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Electric vehicle data
data_electric = pd.read_csv('SHEVDC_OV6N7709.csv')
data_electric.head()
| | time | vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | ... | max_cell_volt | min_volt_num | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2019-01-10 01:12:00 | 1 | 4 | 1 | 0.0 | 39938.0 | 397.5 | 0.4 | 100 | 1 | ... | 4.147 | 1 | 75 | 4.137 | 1 | 7 | 6 | 1 | 24 | 5 |
1 | 2019-01-10 01:12:10 | 1 | 4 | 1 | 0.0 | 39938.0 | 397.5 | 0.4 | 100 | 1 | ... | 4.147 | 1 | 75 | 4.137 | 1 | 7 | 6 | 1 | 24 | 5 |
2 | 2019-01-10 01:12:20 | 1 | 4 | 1 | 0.0 | 39938.0 | 397.5 | 0.4 | 100 | 1 | ... | 4.147 | 1 | 75 | 4.137 | 1 | 7 | 6 | 1 | 24 | 5 |
3 | 2019-01-10 01:12:30 | 1 | 4 | 1 | 0.0 | 39938.0 | 397.5 | 0.4 | 100 | 1 | ... | 4.147 | 1 | 75 | 4.137 | 1 | 7 | 6 | 1 | 24 | 5 |
4 | 2019-01-10 01:12:40 | 1 | 4 | 1 | 0.0 | 39938.0 | 397.5 | 0.4 | 100 | 1 | ... | 4.147 | 1 | 75 | 4.137 | 1 | 7 | 6 | 1 | 24 | 5 |
5 rows × 24 columns
# Hybrid vehicle data
data_hybrid = pd.read_csv('SHEVDC_0C023H25.csv')
data_hybrid.head()
| | time | vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | ... | max_cell_volt | min_volt_num | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2019-01-06 15:36:27 | 1 | 3 | 1 | 79.7 | 69788.0 | 361.2 | 10.4 | 73 | 1 | ... | 3.769 | 1 | 96 | 3.761 | 1 | 3 | 25 | 1 | 6 | 23 |
1 | 2019-01-06 15:36:37 | 1 | 3 | 1 | 78.6 | 69789.0 | 360.0 | 13.1 | 72 | 1 | ... | 3.753 | 1 | 96 | 3.743 | 1 | 3 | 25 | 1 | 6 | 23 |
2 | 2019-01-06 15:36:47 | 1 | 3 | 1 | 74.2 | 69789.0 | 361.2 | 9.5 | 72 | 1 | ... | 3.765 | 1 | 96 | 3.757 | 1 | 3 | 25 | 1 | 6 | 23 |
3 | 2019-01-06 15:36:57 | 1 | 3 | 1 | 81.8 | 69789.0 | 350.5 | 63.9 | 72 | 1 | ... | 3.663 | 1 | 96 | 3.645 | 1 | 3 | 25 | 1 | 6 | 23 |
4 | 2019-01-06 15:37:07 | 1 | 3 | 1 | 74.1 | 69789.0 | 361.2 | 3.4 | 71 | 1 | ... | 3.789 | 1 | 96 | 3.782 | 1 | 3 | 25 | 1 | 6 | 23 |
5 rows × 27 columns
2 Data Checks
2.1 Null values
# Electric vehicle
data_electric.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6231 entries, 0 to 6230
Data columns (total 24 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 6231 non-null object
1 vehiclestatus 6231 non-null int64
2 chargestatus 6231 non-null int64
3 runmodel 6231 non-null int64
4 speed 6231 non-null float64
5 summileage 6231 non-null object
6 sumvoltage 6231 non-null float64
7 sumcurrent 6231 non-null float64
8 soc 6231 non-null int64
9 dcdcstatus 6231 non-null int64
10 gearnum 6231 non-null int64
11 insulationresistance 6231 non-null int64
12 max_volt_num 6231 non-null int64
13 max_volt_cell_id 6231 non-null int64
14 max_cell_volt 6231 non-null float64
15 min_volt_num 6231 non-null int64
16 min_volt_cell_id 6231 non-null int64
17 min_cell_volt 6231 non-null float64
18 max_temp_num 6231 non-null int64
19 max_temp_probe_id 6231 non-null int64
20 max_temp 6231 non-null int64
21 min_temp_num 6231 non-null int64
22 min_temp_probe_id 6231 non-null int64
23 min_temp 6231 non-null int64
dtypes: float64(5), int64(17), object(2)
memory usage: 1.1+ MB
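The electric vehicle log contains 6231 records with no null values. However, the summileage field is stored as object, so we convert it to float64 to simplify the analysis that follows.
# Convert summileage to float64; values that cannot be parsed become NaN
data_electric['summileage'] = pd.to_numeric(data_electric['summileage'], errors='coerce')
# Forward-fill each NaN with the last valid reading
data_electric['summileage'] = data_electric['summileage'].ffill()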
data_electric['summileage']
0 39938.0
1 39938.0
2 39938.0
3 39938.0
4 39938.0
...
6226 40152.0
6227 40152.0
6228 40152.0
6229 40152.0
6230 40152.0
Name: summileage, Length: 6231, dtype: float64
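Before coercing, it can help to see which raw entries actually fail to parse, since `errors='coerce'` silently turns them into NaN. A minimal sketch on a synthetic column follows; the `"unknown"` entry is a hypothetical bad value, and the source does not say what the real non-numeric entries look like.

```python
import pandas as pd

# Synthetic stand-in for an object-typed mileage column (assumption, not real data).
raw = pd.Series(["39938.0", "39938.0", "unknown", "39940.0"], name="summileage")

# Coerce to numbers; entries that cannot be parsed become NaN.
numeric = pd.to_numeric(raw, errors="coerce")

# Entries that were present in the raw data but failed to parse.
bad_mask = numeric.isna() & raw.notna()
bad_values = raw[bad_mask].unique().tolist()

# Forward-fill the gaps with the last valid reading.
filled = numeric.ffill()

print(bad_values)
print(filled.tolist())
```

Forward filling suits a cumulative odometer reading because the value changes slowly between samples; note that a bad value in the very first row would remain NaN.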
# Hybrid vehicle
data_hybrid.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3121 entries, 0 to 3120
Data columns (total 27 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 3121 non-null object
1 vehiclestatus 3121 non-null int64
2 chargestatus 3121 non-null int64
3 runmodel 3121 non-null int64
4 speed 3121 non-null float64
5 summileage 3121 non-null float64
6 sumvoltage 3121 non-null float64
7 sumcurrent 3121 non-null float64
8 soc 3121 non-null int64
9 dcdcstatus 3121 non-null int64
10 gearnum 3121 non-null int64
11 insulationresistance 3121 non-null int64
12 enginestatus 1689 non-null float64
13 grankshaftspeed 1689 non-null float64
14 enginefuelconsumptionrate 1689 non-null float64
15 max_volt_num 3121 non-null int64
16 max_volt_cell_id 3121 non-null int64
17 max_cell_volt 3121 non-null float64
18 min_volt_num 3121 non-null int64
19 min_volt_cell_id 3121 non-null int64
20 min_cell_volt 3121 non-null float64
21 max_temp_num 3121 non-null int64
22 max_temp_probe_id 3121 non-null int64
23 max_temp 3121 non-null int64
24 min_temp_num 3121 non-null int64
25 min_temp_probe_id 3121 non-null int64
26 min_temp 3121 non-null int64
dtypes: float64(9), int64(17), object(1)
memory usage: 658.5+ KB
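The hybrid vehicle log contains 3121 records. Three fields, enginestatus, grankshaftspeed, and enginefuelconsumptionrate, are partly null (1689 non-null values each). According to the data description, these fields describe the engine state, so they are empty whenever the hybrid runs in electric mode. This is expected and needs no special handling here.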
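The claim that the engine fields are empty only in electric mode can be checked by comparing the share of missing values across run modes. The sketch below uses synthetic rows; the `runmodel` codes in `demo` are hypothetical, because the field definitions are not reproduced in this section.

```python
import numpy as np
import pandas as pd

engine_cols = ["enginestatus", "grankshaftspeed", "enginefuelconsumptionrate"]

# Synthetic rows (assumption): runmodel 1 is treated here as electric-only driving.
demo = pd.DataFrame({
    "runmodel": [1, 1, 2, 2, 3],
    "enginestatus": [np.nan, np.nan, 1.0, 1.0, 1.0],
    "grankshaftspeed": [np.nan, np.nan, 1500.0, 1800.0, 2000.0],
    "enginefuelconsumptionrate": [np.nan, np.nan, 5.2, 6.1, 7.0],
})

# Share of missing values in each engine field, per run mode.
null_share = demo[engine_cols].isna().groupby(demo["runmodel"]).mean()

# True if, in every row, the three fields are either all missing or all present.
missing_together = demo[engine_cols].isna().nunique(axis=1).eq(1).all()

print(null_share)
print(missing_together)
```

On the real data, a null share near 1.0 for one run mode and near 0.0 for the others would support the explanation above.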
2.2 Data collection period
# Electric vehicle
print("Earliest time:", data_electric['time'].min())
print("Latest time:", data_electric['time'].max())
Earliest time: 2019-01-10 01:12:00
Latest time: 2019-01-11 12:16:18
# Hybrid vehicle
print("Earliest time:", data_hybrid['time'].min())
print("Latest time:", data_hybrid['time'].max())
Earliest time: 2019-01-06 15:36:27
Latest time: 2019-01-07 00:31:28
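The string comparison above works because the timestamps are in ISO order. To get the length of each recording, the endpoints can be parsed as timestamps and subtracted. A small sketch using the four timestamps printed above; the names `spans` and `durations` are illustrative.

```python
import pandas as pd

# First and last timestamps of each log, copied from the printed output.
spans = {
    "electric": ("2019-01-10 01:12:00", "2019-01-11 12:16:18"),
    "hybrid": ("2019-01-06 15:36:27", "2019-01-07 00:31:28"),
}

# Subtracting two Timestamps yields a Timedelta.
durations = {
    name: pd.Timestamp(end) - pd.Timestamp(start)
    for name, (start, end) in spans.items()
}

for name, duration in durations.items():
    print(name, duration)
```

The electric vehicle log spans about 35 hours, whereas 6231 samples at 10 s intervals would cover only about 17.3 hours, so the log presumably contains gaps. The hybrid log (about 8.9 hours for 3121 samples) is much closer to continuous.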
2.3 Descriptive statistics
# Electric vehicle
data_electric.describe()
| | vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | gearnum | ... | max_cell_volt | min_volt_num | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 6231.000000 | 6231.000000 | 6231.0 | 6231.000000 | 6231.000000 | 6231.000000 | 6231.000000 | 6231.000000 | 6231.000000 | 6231.000000 | ... | 6231.00000 | 6231.0 | 6231.000000 | 6231.000000 | 6231.0 | 6231.000000 | 6231.000000 | 6231.0 | 6231.000000 | 6231.000000 |
mean | 1.700690 | 1.563633 | 1.0 | 10.126705 | 40083.661692 | 363.703418 | 0.943861 | 58.357407 | 1.686888 | 4.420960 | ... | 3.79234 | 1.0 | 70.811266 | 3.782679 | 1.0 | 19.564275 | 10.004815 | 1.0 | 21.043332 | 8.783983 |
std | 0.457993 | 0.889976 | 0.0 | 21.666992 | 49.314093 | 17.014157 | 26.379983 | 26.393123 | 0.463797 | 6.556641 | ... | 0.17684 | 0.0 | 15.432097 | 0.176314 | 0.0 | 3.622739 | 1.322351 | 0.0 | 6.613793 | 1.089954 |
min | 1.000000 | 1.000000 | 1.0 | 0.000000 | 39938.000000 | 322.200000 | -113.100000 | 7.000000 | 1.000000 | 0.000000 | ... | 3.38200 | 1.0 | 6.000000 | 3.346000 | 1.0 | 7.000000 | 6.000000 | 1.0 | 2.000000 | 5.000000 |
25% | 1.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 348.500000 | -9.200000 | 35.000000 | 1.000000 | 0.000000 | ... | 3.63200 | 1.0 | 75.000000 | 3.627000 | 1.0 | 17.000000 | 10.000000 | 1.0 | 24.000000 | 9.000000 |
50% | 2.000000 | 1.000000 | 1.0 | 0.000000 | 40091.000000 | 363.000000 | -8.800000 | 64.000000 | 2.000000 | 0.000000 | ... | 3.78600 | 1.0 | 75.000000 | 3.773000 | 1.0 | 19.000000 | 10.000000 | 1.0 | 24.000000 | 9.000000 |
75% | 2.000000 | 3.000000 | 1.0 | 0.000000 | 40091.000000 | 377.700000 | 0.800000 | 80.000000 | 2.000000 | 14.000000 | ... | 3.93800 | 1.0 | 75.000000 | 3.929000 | 1.0 | 23.000000 | 11.000000 | 1.0 | 24.000000 | 9.000000 |
max | 2.000000 | 4.000000 | 1.0 | 100.700000 | 40152.000000 | 397.500000 | 240.300000 | 100.000000 | 2.000000 | 15.000000 | ... | 4.14700 | 1.0 | 96.000000 | 4.137000 | 1.0 | 24.000000 | 13.000000 | 1.0 | 24.000000 | 11.000000 |
8 rows × 23 columns
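From the above:
· The electric vehicle's maximum speed is 100.7 km/h, and its cumulative mileage grows from 39938 km to 40152 km (214 km driven).
· Over the recording, total voltage ranges from 322.2 V to 397.5 V and total current from -113.1 A to 240.3 A.
· SOC (state of charge) ranges from 7% to 100%, with a mean of 58%.
· Cell voltage ranges from 3.35 V to 4.15 V, and battery temperature from 5 °C to 13 °C.
# Hybrid vehicle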
data_hybrid.describe()
| | vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | gearnum | ... | max_cell_volt | min_volt_num | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | 3121.000000 | ... | 3121.000000 | 3121.0 | 3121.000000 | 3121.000000 | 3121.0 | 3121.000000 | 3121.000000 | 3121.0 | 3121.000000 | 3121.000000 |
mean | 1.458827 | 1.947453 | 1.736302 | 13.635854 | 69853.980455 | 361.539250 | -0.641974 | 57.543095 | 1.458827 | 7.719321 | ... | 3.767752 | 1.0 | 61.212752 | 3.760058 | 1.0 | 3.423903 | 27.523871 | 1.0 | 6.462352 | 25.536367 |
std | 0.498382 | 0.928775 | 0.454316 | 23.723204 | 42.582988 | 15.284665 | 17.985884 | 26.051087 | 0.498382 | 7.084430 | ... | 0.158410 | 0.0 | 26.075302 | 0.159284 | 0.0 | 14.340370 | 1.563247 | 0.0 | 14.126442 | 1.505558 |
min | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 69788.000000 | 330.000000 | -108.000000 | 18.000000 | 1.000000 | 0.000000 | ... | 3.450000 | 1.0 | 6.000000 | 3.417000 | 1.0 | 2.000000 | 24.000000 | 1.0 | 5.000000 | 22.000000 |
25% | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 69822.000000 | 348.200000 | -8.300000 | 32.000000 | 1.000000 | 0.000000 | ... | 3.630000 | 1.0 | 39.000000 | 3.621000 | 1.0 | 2.000000 | 27.000000 | 1.0 | 5.000000 | 25.000000 |
50% | 1.000000 | 2.000000 | 2.000000 | 0.000000 | 69833.000000 | 359.500000 | -5.700000 | 60.000000 | 1.000000 | 14.000000 | ... | 3.744000 | 1.0 | 68.000000 | 3.740000 | 1.0 | 2.000000 | 28.000000 | 1.0 | 5.000000 | 26.000000 |
75% | 2.000000 | 3.000000 | 2.000000 | 22.600000 | 69909.000000 | 374.700000 | 1.600000 | 81.000000 | 2.000000 | 14.000000 | ... | 3.903000 | 1.0 | 71.000000 | 3.898000 | 1.0 | 2.000000 | 28.000000 | 1.0 | 7.000000 | 26.000000 |
max | 2.000000 | 3.000000 | 3.000000 | 103.600000 | 69909.000000 | 390.200000 | 105.100000 | 100.000000 | 2.000000 | 15.000000 | ... | 4.065000 | 1.0 | 255.000000 | 4.056000 | 1.0 | 255.000000 | 33.000000 | 1.0 | 255.000000 | 31.000000 |
8 rows × 26 columns
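From the above:
· The hybrid vehicle's maximum speed is 103.6 km/h, and its cumulative mileage grows from 69788 km to 69909 km (121 km driven).
· Over the recording, total voltage ranges from 330.0 V to 390.2 V and total current from -108.0 A to 105.1 A (the maximum current is clearly lower than the electric vehicle's).
· SOC (state of charge) ranges from 18% to 100%, with a mean of 57.5%.
· Cell voltage ranges from 3.42 V to 4.07 V (a narrower range than the electric vehicle's), and battery temperature from 22 °C to 33 °C (clearly higher than the electric vehicle's).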
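The figures in these summaries are read off the `describe()` tables, but they can also be computed directly so that both vehicles are summarized the same way. The helper `summarize` below is a hypothetical sketch, not part of the original notebook; `demo` holds two synthetic rows built from the extremes in the electric vehicle table, not real records.

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> dict:
    """Collect the headline ranges discussed above from one vehicle's log."""
    return {
        "max_speed": df["speed"].max(),
        # Distance driven is the growth in the cumulative odometer reading.
        "distance_km": df["summileage"].max() - df["summileage"].min(),
        "voltage_range": (df["sumvoltage"].min(), df["sumvoltage"].max()),
        "current_range": (df["sumcurrent"].min(), df["sumcurrent"].max()),
        "soc_range": (df["soc"].min(), df["soc"].max()),
        # Lowest minimum-cell reading to highest maximum-cell reading.
        "cell_volt_range": (df["min_cell_volt"].min(), df["max_cell_volt"].max()),
        "temp_range": (df["min_temp"].min(), df["max_temp"].max()),
    }

# Two synthetic rows (assumption) combining the extremes of the electric table.
demo = pd.DataFrame({
    "speed": [0.0, 100.7],
    "summileage": [39938.0, 40152.0],
    "sumvoltage": [397.5, 322.2],
    "sumcurrent": [-113.1, 240.3],
    "soc": [100, 7],
    "max_cell_volt": [4.147, 3.382],
    "min_cell_volt": [4.137, 3.346],
    "max_temp": [6, 13],
    "min_temp": [5, 11],
})

summary = summarize(demo)
print(summary)
```

Calling `summarize(data_electric)` and `summarize(data_hybrid)` would, under these assumptions, give a side-by-side comparison of the two vehicles.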
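3 Data Preprocessing
Because the data is sampled every 10 s, the interval is too fine for the subsequent analysis. We therefore truncate the time field to the hour and average all records within each hour.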
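The same hourly aggregation can also be sketched on parsed timestamps, which keeps the year and a real datetime index. The example below runs on synthetic rows; the `hour` column and the choice to average only `speed` and `soc` are assumptions for illustration.

```python
import pandas as pd

# Synthetic records (assumption, not real data): two samples in each of two hours.
demo = pd.DataFrame({
    "time": [
        "2019-01-10 01:12:00",
        "2019-01-10 01:12:10",
        "2019-01-10 02:00:00",
        "2019-01-10 02:00:10",
    ],
    "speed": [0.0, 10.0, 40.0, 50.0],
    "soc": [100, 100, 98, 96],
})

# Round each timestamp down to the start of its hour.
demo["hour"] = pd.to_datetime(demo["time"]).dt.floor("h")

# Average the selected numeric fields within each hour.
hourly = demo.groupby("hour")[["speed", "soc"]].mean()

print(hourly)
```

Selecting columns explicitly avoids averaging identifier-like fields such as `min_volt_cell_id`, whose hourly mean has no obvious physical meaning.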
# Electric vehicle
def hour(time):
    # Keep 'MM-DD HH' from a 'YYYY-MM-DD HH:MM:SS' string
    return time[5:13]
data_electric['time'] = data_electric['time'].apply(hour)
electric_group = data_electric.groupby('time').mean()
electric_group.head()
time | vehiclestatus | chargestatus | runmodel | speed | summileage | sumvoltage | sumcurrent | soc | dcdcstatus | gearnum | ... | max_cell_volt | min_volt_num | min_volt_cell_id | min_cell_volt | max_temp_num | max_temp_probe_id | max_temp | min_temp_num | min_temp_probe_id | min_temp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01-10 01 | 1.000000 | 3.093190 | 1.0 | 14.955914 | 39940.243728 | 392.696416 | 12.277419 | 97.379928 | 1.00000 | 14.301075 | ... | 4.097606 | 1.0 | 61.620072 | 4.083305 | 1.0 | 18.673835 | 6.000000 | 1.0 | 23.261649 | 5.103943 |
01-10 02 | 1.000000 | 2.864865 | 1.0 | 42.981081 | 39974.094595 | 371.701351 |