二手车交易价格预测：建模调参

最新推荐文章于 2024-04-06 00:31:25 发布

Sundm@lhq

最新推荐文章于 2024-04-06 00:31:25 发布

阅读量1k

点赞数 1

分类专栏：数据挖掘 python 文章标签：数据分析 python 数据挖掘二手车交易价格预测

本文链接：https://blog.csdn.net/sdm12345/article/details/105253728

版权

建模与调参

内容介绍

线性回归模型：

线性回归对于特征的要求；
处理长尾分布；
理解线性回归模型；

模型性能验证：

评价函数与目标函数；
交叉验证方法；
留一验证方法；
针对时间序列问题的验证；
绘制学习率曲线；
绘制验证曲线；

嵌入式特征选择：

Lasso回归；
Ridge回归；
决策树；

模型对比：

常用线性模型；
常用非线性模型；

模型调参：

贪心调参方法；
网格调参方法；
贝叶斯调参方法；

代码示例

import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

reduce_mem_usage 函数通过调整数据类型，帮助我们减少数据在内存中占用的空间

def reduce_mem_usage(df):
    """ iterate through all the columns of a dataframe and modify the data type
        to reduce memory usage.        
    """
    start_mem = df.memory_usage().sum() 
    print('Memory usage of dataframe is {:.2f} MB'.format(start_mem))
    
    for col in df.columns:
        col_type = df[col].dtype
        
        if col_type != object:
            c_min = df[col].min()
            c_max = df[col].max()
            if str(col_type)[:3] == 'int':
                if c_min > np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                    df[col] = df[col].astype(np.int8)
                elif c_min > np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                    df[col] = df[col].astype(np.int16)
                elif c_min > np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                    df[col] = df[col].astype(np.int32)
                elif c_min > np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                    df[col] = df[col].astype(np.int64)  
            else:
                if c_min > np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                    df[col] = df[col].astype(np.float16)
                elif c_min > np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                    df[col] = df[col].astype(np.float32)
                else:
                    df[col] = df[col].astype(np.float64)
        else:
            df[col] = df[col].astype('category')

    end_mem = df.memory_usage().sum() 
    print('Memory usage after optimization is: {:.2f} MB'.format(end_mem))
    print('Decreased by {:.1f}%'.format(100 * (start_mem - end_mem) / start_mem))
    return df

sample_feature = reduce_mem_usage(pd.read_csv('data_for_tree.csv'))

Memory usage of dataframe is 60507328.00 MB

Memory usage after optimization is: 15724107.00 MB

Decreased by 74.0%

continuous_feature_names = [x for x in sample_feature.columns if x not in ['price','brand','model','brand']]

线性回归 & 五折交叉验证 & 模拟真实业务情况

sample_feature = sample_feature.dropna().replace('-', 0).reset_index(drop=True)
sample_feature['notRepairedDamage'] = sample_featur

最低0.47元/天解锁文章

Sundm@lhq

关注

1
点赞
踩
13

收藏

觉得还不错? 一键收藏
0
评论
二手车交易价格预测：建模调参

建模与调参内容介绍线性回归模型：线性回归对于特征的要求；处理长尾分布；理解线性回归模型；模型性能验证：评价函数与目标函数；交叉验证方法；留一验证方法；针对时间序列问题的验证；绘制学习率曲线；绘制验证曲线；嵌入式特征选择：Lasso回归；Ridge回归；决策树；模型对比：常用线性模型；...
复制链接

扫一扫

专栏目录