The baseline is provided by the organizers; its current online leaderboard score is 494.90.
Two optimization points that can be tuned directly:
# Cross-validation: the first argument sets the number of folds N; each fold serves once as the validation set while the remaining folds are used for training, which makes the evaluation more reliable
# However, raising this parameter alone brings no obvious gain in final prediction quality; it is only useful for short-term leaderboard climbing and tends to overfit the online data
kf = KFold(n_splits=folds, shuffle=True, random_state=seed)
LightGBM parameter tuning
Code walkthrough:
1. Import packages
import os
import gc
import math
import pandas as pd
import numpy as np
import lightgbm as lgb
import xgboost as xgb
from catboost import CatBoostRegressor
from sklearn.linear_model import SGDRegressor, LinearRegression, Ridge
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import StratifiedKFold, KFold
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from tqdm import tqdm
import matplotlib.pyplot as plt
import time
import warnings
warnings.filterwarnings('ignore')
2. Read the data
train = pd.read_csv('train.csv')
test = pd.read_csv('testA.csv')
3. Data preprocessing
def reduce_mem_usage(df):
start_mem = df.memory_usage().sum() / 1024**2
print('Memory usage of dataframe is {:.2f} MB'.format(start_mem))