Practical Machine Learning (hw1/hw4)

纷繁中淡定

Category: Deep Learning (PyTorch). Tags: machine learning, python, artificial intelligence

Article link: https://blog.csdn.net/qq_36170626/article/details/121188423

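This post covers environment setup for the Practical Machine Learning homework, an analysis of the baseline code, and ways to improve the model. By adding more data, engineering features, and cleaning the data, the reported metric is improved step by step from the baseline to 0.0483. Several machine learning models, including LightGBM, are compared.

(Summary generated automatically by CSDN.)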

1. Environment setup

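Install AutoGluon (for example with pip install autogluon; the official installation guide is linked in the baseline code below).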
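
Before running the baseline, it can help to confirm that the package is actually visible to the current Python interpreter. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of AutoGluon.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located.

    The module itself is not imported, although parent packages
    of a dotted name are.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised, for example, when a parent package of a dotted name is missing.
        return False


print("autogluon.tabular available:", is_installed("autogluon.tabular"))
```

If this prints False, the install most likely went into a different environment than the one running the script.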

2. Baseline code analysis
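
The baseline does three things before training: it parses the price strings into log10 values, filters out extreme prices, and splits the rows by sale date. The sketch below replays those steps on a few made-up rows so the effect of each step is visible. The toy DataFrame and its values are assumptions for illustration only; they are not taken from house_sales.ftr.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for house_sales.ftr (values are invented for illustration).
toy = pd.DataFrame({
    "Sold Price": ["$1,250,000", "$500", "$730,000", "$2,000,000"],
    "Sold On": ["2021-01-20", "2021-02-01", "2021-02-20", "not a date"],
})

# Strip "$", "," and "-", convert to a number, then take log10(price + 1).
price = pd.to_numeric(toy["Sold Price"].replace(r"[$,-]", "", regex=True))
toy["Sold Price"] = np.log10(price + 1)

# Keep log10 prices in [4, 8]; the $500 row (log10 of about 2.7) is dropped.
toy = toy[(toy["Sold Price"] >= 4) & (toy["Sold Price"] <= 8)].copy()

# Unparseable dates become NaT and therefore fall into neither split.
toy["Sold On"] = pd.to_datetime(toy["Sold On"], errors="coerce")
test_start, test_end = pd.Timestamp(2021, 2, 15), pd.Timestamp(2021, 3, 1)
train_start = pd.Timestamp(2021, 1, 1)
train = toy[(toy["Sold On"] >= train_start) & (toy["Sold On"] < test_start)]
test = toy[(toy["Sold On"] >= test_start) & (toy["Sold On"] < test_end)]

print(len(train), len(test))
```

Splitting by date rather than at random keeps the test rows strictly later than the training rows, which mirrors how the model would be used on future sales.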

import pandas as pd
import numpy as np
import scipy
# https://auto.gluon.ai/stable/index.html#installation
from autogluon.tabular import TabularPredictor

# Load the data (stray leaderboard row pasted in the original: 4  WeightedEnsemble_L2  -0.265)
data = pd.read_feather('house_sales.ftr')
df = data[['Sold Price', 'Sold On', 'Type', 'Year built', 'Bedrooms', 'Bathrooms']].copy()

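# Following the EDA, convert the sale price to a number and drop prices that are too high or too low
# print(df.head(20))
# print(df.dtypes)
c = 'Sold Price'
# Convert the sale price from object dtype to float (on a log10 scale)
if c in df.select_dtypes('object').columns:
    # Assign the column directly so its dtype becomes float instead of staying object
    df[c] = np.log10(
        pd.to_numeric(df[c].replace(r'[$,-]', '', regex=True)) + 1
    )
# print(df.head(20))
# Keep log10 prices between 4 and 8, i.e. roughly $10,000 to $100,000,000
# (.copy() avoids a SettingWithCopyWarning when columns are assigned later)
df = df[(df['Sold Price'] >= 4) & (df['Sold Price'] <= 8)].copy()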
# test = np.log10(2000001)
# print(test)
# print(df.head(20))
# print(df.dtypes)
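# Use data from 2021-02-15 to 2021-03-01 as the test set
# Training data may come from before 2021; this baseline trains on 2021-01-01 up to 2021-02-15

# Split into training and test sets by sale date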
test_start, test_end = pd.Timestamp(2021, 2, 15), pd.Timestamp(2021, 3, 1)
train_start = pd.Timestamp(2021, 1, 1)
df['Sold On'] = pd.to_datetime(df['Sold On'], errors='coerce')
train = df[(df['Sold On'] >= train_start) & (df['Sold On'] < test_start)]
test = df[(df['Sold On'] >= test_start) & (df['Sold On'] < test_end)]
# train.shape:  (24872, 6) test.shape:  (11510, 6)
print</