Vehicle Sales Data Analysis and Visualization in Practice

1. Introduction

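The "Vehicle Sales and Market Trends Dataset" is a comprehensive collection of vehicle sales transactions. Each record includes the year, make, model, trim, body type, transmission type, vehicle identification number (VIN), state of registration, condition rating, odometer reading, exterior and interior colors, seller information, Manheim Market Report (MMR) value, selling price, and sale date. The data comes from https://www.kaggle.com/datasets/syedanwarafridi/vehicle-sales-data/data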

2. Importing the required packages and loading the dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.metrics import mean_absolute_error,r2_score,mean_absolute_percentage_error
# Read the data
df = pd.read_csv('/车辆销售数据/car_prices.csv')

3. Data exploration

print(df.head(5))
df.info()
"""
   year   make  ... sellingprice                                 saledate
0  2015    Kia  ...      21500.0  Tue Dec 16 2014 12:30:00 GMT-0800 (PST)
1  2015    Kia  ...      21500.0  Tue Dec 16 2014 12:30:00 GMT-0800 (PST)
2  2014    BMW  ...      30000.0  Thu Jan 15 2015 04:30:00 GMT-0800 (PST)
3  2015  Volvo  ...      27750.0  Thu Jan 29 2015 04:30:00 GMT-0800 (PST)
4  2014    BMW  ...      67000.0  Thu Dec 18 2014 12:30:00 GMT-0800 (PST)

[5 rows x 16 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 558837 entries, 0 to 558836
Data columns (total 16 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   year          558837 non-null  int64  
 1   make          548536 non-null  object 
 2   model         548438 non-null  object 
 3   trim          548186 non-null  object 
 4   body          545642 non-null  object 
 5   transmission  493485 non-null  object 
 6   vin           558833 non-null  object 
 7   state         558837 non-null  object 
 8   condition     547017 non-null  float64
 9   odometer      558743 non-null  float64
 10  color         558088 non-null  object 
 11  interior      558088 non-null  object 
 12  seller        558837 non-null  object 
 13  mmr           558799 non-null  float64
 14  sellingprice  558825 non-null  float64
 15  saledate      558825 non-null  object 
dtypes: float64(4), int64(1), object(11)
memory usage: 68.2+ MB
"""
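
The non-null counts above already hint at where data is missing: transmission has 493485 non-null values out of 558837 rows, so about 12% of its entries are absent. A minimal sketch of how the missing share per column, and the share of rows that would survive a blanket dropna(), could be checked. The small `toy` frame and its values are made up for illustration; on the real data the same two expressions would be applied to `df`.

```python
import pandas as pd

# Toy stand-in for the real data; the values are invented for illustration.
toy = pd.DataFrame({
    "make": ["Kia", None, "BMW", "Volvo"],
    "transmission": ["automatic", None, None, "manual"],
    "sellingprice": [21500.0, 21500.0, 30000.0, 27750.0],
})

# Fraction of missing values per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Fraction of rows that have no missing value in any column,
# i.e. the rows a blanket dropna() would keep.
kept_share = len(toy.dropna()) / len(toy)
print(f"rows kept after dropna: {kept_share:.0%}")
```

If a single column accounts for most of the loss, filling or ignoring that column may be worth considering before dropping rows; whether that is appropriate depends on the later analysis.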

4. Data processing
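
The exploration output lists saledate with dtype object, in strings such as "Tue Dec 16 2014 12:30:00 GMT-0800 (PST)". A rough sketch of one way to turn such strings into timestamps is shown here. It assumes every value follows that layout and that the time zone offset can be discarded; both are assumptions, not something the source states. The `raw` series holds two strings copied from the head() output.

```python
import pandas as pd

# Two sample strings copied from the head() output.
raw = pd.Series([
    "Tue Dec 16 2014 12:30:00 GMT-0800 (PST)",
    "Thu Jan 15 2015 04:30:00 GMT-0800 (PST)",
])

# Keep the first 24 characters ("Tue Dec 16 2014 12:30:00") and parse them
# as naive local timestamps. The GMT offset is dropped (an assumption), and
# any string that does not match the format becomes NaT.
parsed = pd.to_datetime(
    raw.str.slice(0, 24),
    format="%a %b %d %Y %H:%M:%S",
    errors="coerce",
)
print(parsed.dt.year.tolist(), parsed.dt.month.tolist())
```

On the full dataset the same call could be applied to `df['saledate']`, after which the `.dt` accessor gives year, month, or weekday columns for trend plots.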

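# Data cleaning: count missing values and distinct values per column
print(df.isna().sum())
print(df.nunique())

# Drop every row that has at least one missing value
df.dropna(axis=0, inplace=True)
df.info()
# Remove duplicates, keeping the first row for each VIN
df = df.drop_duplicates(keep='first', subset=['vin'])
df.info()
print(df.columns.to_list())
# drop() returns a new DataFrame, so assign the result back
df = df.drop(['vin'], axis=1)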
"""
year                0
make            10301
model           10399
trim            10651
body            13195
transmission    65352
vin                 4
state               0
condition       11820
odometer           94
color             749
interior          749
seller              0
mmr                38
sellingprice       12
saledate           12
dtype: int64
year                34
make                96
model              973
trim              1963
body                87
transmission         4
vin             550297
state               64
condition           41
odometer        172278
color               46
interior            17
seller           14263
mmr               1101
sellingprice      1887
saledate          3766
dtype: int64
<class 'pandas.core.frame.DataFrame'>
Index: 472325 entries, 0 to 558836
Data columns (total 15 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   year          472325 non-null  int64  
 1   make          472325 non-null  object 
 2   model         472325 non-null  object 
 3   trim          472325 non-null  object 
 4   body          472325 non-null  object 
 5   transmission  472325 non-null  object 
 6   state         472325 non-null  object 
 7   condition     472325 non-null  float64
 8   odometer      472325 non-null  float64
 9   color         472325 non-null  object 
 10  interior      472325 non-null  object 
 11  seller        472325 non-null  object 
 12  mmr           472325 non-null  float64
 13  sellingprice  472325