Datawhale Summer Camp, AI + Economics: task3 notes

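Following the task3 tutorial, I improved my task2 solution in the areas below.

Feature construction

Features for task2:

One-hot encoding

I noticed that the task3 baseline also one-hot encodes the year, month, and day, so I did the same: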

train_data=pd.get_dummies(
    electricity_price,
    columns=["hour", "day", "month", "year", "weekday",'time_of_day'],
    drop_first=True)
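To make the effect of drop_first=True concrete, here is a minimal sketch on a toy frame. The column names mirror the ones above, but the values are made up for illustration and are not the competition data.

```python
import pandas as pd

# Toy data (hypothetical values); only "hour" and "weekday" are encoded here.
toy = pd.DataFrame({
    "hour": [0, 1, 2, 0],
    "weekday": [5, 5, 6, 6],
    "demand": [10.0, 12.0, 11.0, 9.0],
})

encoded = pd.get_dummies(toy, columns=["hour", "weekday"], drop_first=True)

# drop_first=True drops the first level of each encoded column (hour_0 and
# weekday_5), so a column with k categories becomes k-1 indicator columns.
print(list(encoded.columns))
```

Dropping one level per column avoids perfectly collinear indicators, which matters more for the linear model than for the tree model.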

Sliding window

window_sizes = [4, 12, 24]
for window_size in window_sizes:
    # cal_range, increase_num, etc. are custom aggregation helpers that are not defined in this post
    functions = ["mean", "std", "min", "max", cal_range, increase_num,
                 decrease_num, increase_mean, decrease_mean, increase_std, decrease_std]
    for func in functions:
        func_name = func if isinstance(func, str) else func.__name__
        column_name = f"demand_rolling_{window_size}_{func_name}"
        train_data[column_name] = train_data["demand"].rolling(
            window=window_size,
            min_periods=window_size // 2,  # allow partially filled windows at the start
            closed="left",                 # exclude the current row, so only past values are used
        ).agg(func)
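The closed="left" argument is what keeps the current row out of its own window. A small sketch on a made-up series (not the real demand column) shows the difference from the default:

```python
import pandas as pd

# Hypothetical demand values, chosen so the means are easy to check by hand.
demand = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# closed="left": the window for row i covers rows i-2 and i-1 only.
past_only = demand.rolling(window=2, min_periods=1, closed="left").mean()

# Default (closed="right"): the window for row i covers rows i-1 and i.
with_current = demand.rolling(window=2, min_periods=1).mean()

print(pd.DataFrame({"demand": demand,
                    "past_only": past_only,
                    "with_current": with_current}))
```

The first row of past_only is NaN because nothing precedes it, which is why the features are filled with bfill/ffill later on.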

Lag features

train_data["demand_shift_1"] = train_data["demand"].shift(1)       # previous value
train_data["demand_diff_1"] = train_data["demand"].diff(1)         # change from the previous value
train_data["demand_pct_1"] = train_data["demand"].pct_change(1)    # relative change from the previous value
# Rows without a price form the test set; the rest are used for training and validation
mask = train_data["price"].isna()
X_train_val = train_data[~mask].drop(columns=["price"]).bfill().ffill()
X_test = train_data[mask].drop(columns=["price"]).bfill().ffill()
y_train_val = train_data[~mask][["price"]]
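As a quick check of what the three lag features contain, here is a sketch on a three-value series. The numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical demand values.
demand = pd.Series([100.0, 110.0, 99.0])

lagged = demand.shift(1)       # NaN, 100, 110
delta = demand.diff(1)         # NaN, 10, -11
ratio = demand.pct_change(1)   # NaN, about 0.10, about -0.10

# Every lag feature starts with a NaN; bfill() copies the next valid value
# into it, and ffill() would cover any NaN left at the end.
filled = lagged.bfill().ffill()
print(filled.tolist())
```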

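Model ensembling

from lightgbm import LGBMRegressor
from sklearn.linear_model import LinearRegression

def pred(X_train, y_train, X_val):
    lgb_model = LGBMRegressor()
    linear_model = LinearRegression()
    lgb_model.fit(X_train, y_train)                   # tree model on all features
    linear_model.fit(X_train[["demand"]], y_train)    # linear model on demand only
    lgb_pred = lgb_model.predict(X_val)
    linear_pred = linear_model.predict(X_val[["demand"]]).flatten()
    y_pred = (lgb_pred + linear_pred) / 2             # simple average of the two models
    return y_pred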

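Cross-validation

Because this is time-series data, the validation set should not be sampled at random; it should be split off by date (task2, however, sampled it at random).

import numpy as np
from sklearn.metrics import mean_squared_error

# Chronological split: the first 80% of rows for training, the last 20% for validation
X_n = int(X_train_val.shape[0] * 0.8)
X_train = X_train_val[:X_n]
y_train = y_train_val[:X_n]
X_val = X_train_val[X_n:]
y_val = y_train_val[X_n:]
y_pred = pred(X_train, y_train, X_val)
mse = mean_squared_error(y_val, y_pred)
error = (mse + np.sqrt(mse)) / 2
print(error)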
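The error printed above is the average of the MSE and the RMSE. As a sketch, it can be wrapped in a small helper; the name score is my own and is not part of the baseline.

```python
import numpy as np

def score(y_true, y_pred):
    """Average of MSE and RMSE, matching the (mse + sqrt(mse)) / 2 formula above."""
    # ravel() flattens an (n, 1) column such as y_val so it lines up with an
    # (n,) prediction array instead of broadcasting to an (n, n) matrix.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    mse = np.mean((y_true - y_pred) ** 2)
    return (mse + np.sqrt(mse)) / 2

# Errors of +3 and -3 give MSE = 9 and RMSE = 3.
print(score([[0.0], [0.0]], [3.0, -3.0]))
```

Since the MSE term dominates when errors are large, a handful of badly predicted price spikes can move this score a lot.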

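Instead of holding out a single validation set, several can be held out, in the spirit of k-fold cross-validation. For time series, a rolling window keeps every training slice ahead of its validation slice: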

def rolling_window_cv(X, y, window_ratio,predict_ratio):
    errors = []
    model = LGBMRegressor()
    window_size=int(window_ratio*X.shape[0])
    n_predict=int(X.shape[0]*predict_ratio)
    for i in range(len(X) - window_size-n_predict+1):
        X_train = X[i:i+window_size]
        y_train = y[i:i+window_size]
        X_test = X[i+window_size:i+window_size+n_predict]
        y_test = y[i+window_size:i+window_size+n_predict]
        
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        mse = mean_squared_error(y_test, predictions)
        error=(mse+np.sqrt(mse))/2
        errors.append(error)
    
    return np.mean(errors)
rolling_window_cv(X_train_val,y_train_val,0.8,0.199)
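To see which rows each fold uses, the index arithmetic of the loop can be pulled out into a generator. This is only a sketch: rolling_splits and its step parameter are hypothetical additions of mine, and with step=1 it yields the same folds as the loop above.

```python
def rolling_splits(n_samples, train_size, test_size, step=1):
    """Yield (train_rows, test_rows) ranges; training rows always come first."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train_rows = range(start, start + train_size)
        test_rows = range(start + train_size, start + train_size + test_size)
        yield train_rows, test_rows
        start += step  # a larger step means fewer folds and fewer model fits

# 10 rows, 6 for training and 2 for validation per fold.
splits = list(rolling_splits(n_samples=10, train_size=6, test_size=2))
for train_rows, test_rows in splits:
    print(list(train_rows), list(test_rows))
```

The number of folds is n_samples - train_size - test_size + 1 when step=1, so the two ratios passed to rolling_window_cv decide how many times the model is refit.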

The validation score came out at around 16000, while the test-set score dropped from over 20000 previously to a little over 11000.
