Notes on the Kaggle Web Traffic competition

Kaggle Web Traffic competition: https://www.kaggle.com/c/web-traffic-time-series-forecasting/data

1st-place code: https://github.com/Arturus/kaggle-web-traffic

1st-place code walkthrough: https://blog.csdn.net/uwr44uouqcnsuqb60zk2/article/details/78794503

2nd-place code: https://github.com/jfpuget/Kaggle/tree/master/WebTrafficPrediction

2nd-place write-up: https://www.kaggle.com/c/web-traffic-time-series-forecasting/discussion/39395

6th-place code (Python 3): https://github.com/alanshu2018/web-traffic-forecasting

6th-place code (Python 2): https://github.com/sjvasquez/web-traffic-forecasting

Detailed walkthrough of the 2nd-place code:

1.Clone the Kaggle repository

2.Download competition data into the Kaggle/input directory

3.Go to the Kaggle/WebTrafficPrediction directory

4.Run the keras-kf-12-stage2-sept-10.ipynb notebook. This trains the base deep learning model and computes predictions from it. It should produce files in the Kaggle/submissions directory, including:

  • keras_kf_12_stage2_sept_10_train.csv
  • keras_kf_12_stage2_sept_10_test.csv

5.The file keras_kf_12_stage2_sept_10_test.csv is my first submission. It scores 36.91121 and would have got the 4th rank overall.

train_x        median_x_y | max_x_y | median_diff_x_y | median_diff7m_x_y | site | agent
train_y        wx_dy_norm (w0_d4 ~ w8_d3)
train_all_x    median_x_y | max_x_y | median_diff_x_y | median_diff7m_x_y | site | agent
train_all_y    NaN

Conclusion: the neural network model is run first to obtain a prediction, which serves as the first-stage result.
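The median_x_y and max_x_y inputs listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes that x and y are week offsets counted back from the last training day, that the statistics are taken on log(1 + visits), and that missing days are skipped. The function name window_features and the output key names are hypothetical.

```python
import numpy as np

def window_features(visits, periods, days_per_week=7):
    """Median and max of log1p(visits) over windows counted back from the last day.

    visits  : 1-D sequence of daily counts, oldest first; NaN marks a missing day.
    periods : list of (x, y) week offsets, e.g. (0, 1) is the most recent week.
    """
    logv = np.log1p(np.asarray(visits, dtype=float))
    n = len(logv)
    feats = {}
    for x, y in periods:
        start = max(n - y * days_per_week, 0)   # older edge of the window
        stop = max(n - x * days_per_week, 0)    # newer edge of the window
        window = logv[start:stop]
        valid = window[~np.isnan(window)]       # drop missing days
        # Assumption: fall back to 0.0 when the window has no valid data.
        feats[f"median_{x}_{y}"] = float(np.median(valid)) if valid.size else 0.0
        feats[f"max_{x}_{y}"] = float(np.max(valid)) if valid.size else 0.0
    return feats

# Two weeks of data: a quiet week followed by a week with 9 visits per day.
example = window_features([0] * 7 + [9] * 7, [(0, 1), (1, 2), (0, 2)])
```

Working in log space keeps pages with very different traffic levels on a comparable scale, which is probably why the feature notes in this post use log(1 + value) throughout.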

-------------------------------------------------------------------------First version---------------------------------------------------------------------------------

6.Run the Pred_11-stage2-sept-10.ipynb notebook. This creates a median-based model and computes predictions from it. It should produce files in the Kaggle/submissions directory, including:

  • pred_10_stage2_sept_10_train.csv
  • pred_10_stage2_sept_10_test.csv

7.Run the first_stage2.ipynb notebook. It computes the first date at which a page's data is nonzero. It should create a file in the Kaggle/data directory:

  • first.csv
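As a rough illustration of what step 7 computes, the sketch below finds the index of the first day with nonzero visits in one page's series. It is not the notebook's code: treating NaN as zero and returning the series length when a page never has traffic are both assumptions, and the function name first_nonzero_index is hypothetical.

```python
import numpy as np

def first_nonzero_index(visits):
    """Index of the first day with more than zero visits.

    Assumptions: NaN counts as zero, and a page with no traffic at all
    gets len(visits) as a sentinel value.
    """
    v = np.nan_to_num(np.asarray(visits, dtype=float), nan=0.0)
    nonzero = np.flatnonzero(v > 0)
    return int(nonzero[0]) if nonzero.size else len(v)

first = first_nonzero_index([float("nan"), 0, 0, 3, 0, 5])
```

A feature like this tells the model how long a page has actually existed, so a long run of leading zeros is not mistaken for genuinely low traffic.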

8.Run the xgb_23_keras_7_2_stage2-sept-10-2.ipynb notebook. This creates the final model by running xgboost on the residuals of the neural network predictions. It uses the past visits plus the outputs of the two notebooks above as features. It should produce files in the Kaggle/submissions directory, including:

  • xgb_1_2017-09-12-19-14-14_test.csv

periods = [(0,1), (1,2), (2,3), (3,4), (4,5), (5,6), (6,7), (7,8),
           (0,2), (0,4), (0,8), (0,12), (0,16), (0,20)]

features:

['WeekDay','YearDay','Month','WeekEnd',
 'Visits_pred_10',       log(1 + NN prediction) - log(1 + Huber prediction)
 'Visits_keras_kf_3',    log(1 + NN prediction)
 'AllVisits',            median at the site level
 'median_x_y',           median over a period, e.g. the first week (0-1)
 'median_x_y_ratio',     median_x_y - AllVisits
 'median_day_x_y',       median over working days
 'median_day_x_y_ratio', median_day_x_y - AllVisits
 'mean_x_y',             mean over a period, e.g. the first week (0-1)
 'mean_x_y_ratio',       mean_x_y - AllVisits
 'mean_day_x_y',         mean over working days
 'mean_day_x_y_ratio',   mean_day_x_y - AllVisits
 'SiteLabel',            site category label
 'firstval',             day index of the last non-NaN value
 'AllVar',               variance
 'AllMax',               maximum
]
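The idea in step 8 of fitting a second model on the residuals of the neural network can be sketched as below. This is only an illustration under stated assumptions: the residual is taken in log(1 + visits) space (suggested by the feature notes, not confirmed by the source), and the xgboost model itself is left out, with a perfect residual prediction standing in for it. The function names residual_target and apply_residual are hypothetical.

```python
import numpy as np

def residual_target(actual, nn_pred):
    """What the second-stage model would be trained to predict (log1p space)."""
    return np.log1p(actual) - np.log1p(nn_pred)

def apply_residual(nn_pred, predicted_residual):
    """Combine the first-stage prediction with a predicted residual."""
    return np.expm1(np.log1p(nn_pred) + predicted_residual)

actual = np.array([10.0, 200.0, 35.0])    # true visits
nn_pred = np.array([12.0, 150.0, 35.0])   # first-stage (neural network) output

target = residual_target(actual, nn_pred)

# Stand-in for the second stage: pretend the residual model is perfect.
# In the real pipeline this would be the xgboost prediction.
corrected = apply_residual(nn_pred, target)
```

With a perfect residual model the corrected prediction reproduces the true visits exactly; in practice the second stage only recovers part of the residual, which would be consistent with the modest improvement from 36.91121 to 36.78499.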

9.This file is my second submission. It scores 36.78499 and got me the second place.
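The scores quoted in these steps are, to the best of my knowledge, SMAPE values (the competition metric), so lower is better. A minimal sketch of the metric follows; the convention that a term is 0 when both the actual and the forecast value are 0 is the one I understand the competition to have used, and the function name smape is just a label for this example.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, on a 0-200 scale."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = np.abs(actual) + np.abs(forecast)
    diff = 2.0 * np.abs(forecast - actual)
    # Where both values are zero the term is defined as 0 instead of 0/0.
    ratio = np.divide(diff, denom, out=np.zeros_like(diff), where=denom != 0)
    return 100.0 * float(np.mean(ratio))

score = smape([100, 0, 50], [100, 0, 50])
```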

-----------------------------------------------------------------Second version-----------------------------------------------------------------------------------------

10.Kaggle asks for a simpler model that delivers 90% of the performance, if possible. Such a model is provided in the file keras_simple.ipynb. Its feature set is much simpler: essentially the median of visits for each of the last 8 weeks of training data, plus the site (e.g. es.wikipedia.org) and the agent-access method. Its output scores 37.58692 and would have ranked 9th.

keras_simple.ipynb
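The site and agent-access features of the simple model can be derived from the page name. The sketch below assumes the competition's page-name layout of title_site_access_agent; splitting from the right matters because article titles can themselves contain underscores. The function name parse_page and the returned keys are hypothetical, not taken from keras_simple.ipynb.

```python
def parse_page(page):
    """Split a page name of the assumed form title_site_access_agent."""
    # rsplit from the right so underscores inside the title are preserved.
    title, site, access, agent = page.rsplit("_", 3)
    return {"title": title, "site": site, "access": access, "agent": agent}

parsed = parse_page("Some_Article_es.wikipedia.org_all-access_spider")
```

The site and the access/agent pair are categorical values, so they would typically be label-encoded or embedded before being fed to the network.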

------------------------------------------------------------------Suggested simplified version------------------------------------------------------------------------------

Dataset split:

train        2016.3.14 ~ 2016.9.10
test         2016.9.13 ~ 2016.11.14
train_all    2017.3.14 ~ 2017.9.10
test_all     2017.9.13 ~ 2017.11.14
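The four windows can be written down and sanity-checked with the standard library. This small sketch only encodes the dates listed above and assumes both endpoints are inclusive; the names SPLITS and n_days are hypothetical.

```python
from datetime import date

# Date windows as listed in the split above (endpoints assumed inclusive).
SPLITS = {
    "train":     (date(2016, 3, 14), date(2016, 9, 10)),
    "test":      (date(2016, 9, 13), date(2016, 11, 14)),
    "train_all": (date(2017, 3, 14), date(2017, 9, 10)),
    "test_all":  (date(2017, 9, 13), date(2017, 11, 14)),
}

def n_days(name):
    """Number of days in a window, counting both endpoints."""
    start, end = SPLITS[name]
    return (end - start).days + 1

lengths = {name: n_days(name) for name in SPLITS}
```

Under these assumptions each training window covers 181 days and each test window 63 days (9 weeks), and the train_all/test_all windows are the train/test windows shifted forward by exactly 365 days. The validation setup therefore mirrors the final forecasting task at the same time of year.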

 
