Task04:快来一起挖掘幸福感--阿里云天池

赛题相关介绍详见:https://tianchi.aliyun.com/competition/entrance/231702/information
鉴于保密要求,无法提供具体数据下载,敬请报名参加相关赛事。
根据发布的评测标准,当前参数下的RMSE为0.48,后面将会继续调参,争取将RMSE缩减至0.47以下,争取登上前500的排行榜。

import os
import time 
import pandas as pd
import numpy as np
import lightgbm as lgb
import seaborn as sns
import matplotlib.pyplot as plt
import time
from scipy.special import jn
from IPython.display import display, clear_output

from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import Perceptron
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
from sklearn.metrics import mean_squared_error
# 加载数据集 #
train = pd.read_csv("C:/Users/Bai/Python_projects/Ali_Cloud/happiness_train_complete.csv", parse_dates=["survey_time"], encoding='latin-1') 
test = pd.read_csv("C:/Users/Bai/Python_projects/Ali_Cloud/happiness_test_complete.csv", parse_dates=["survey_time"], encoding='latin-1')
train.head()
# 删除训练集中无效的标签对应的数据 #
train = train.loc[train['happiness'] != -8]

print(train.shape)
# 查看各个类别的分布情况 #
f,ax=plt.subplots(1,2,figsize=(18,8))
train['happiness'].value_counts().plot.pie(autopct='%1.1f%%',ax=ax[0],shadow=True)
ax[0].set_title('happiness')
ax[0].set_ylabel('')
train['happiness'].value_counts().plot.bar(ax=ax[1])
ax[1].set_title('happiness')
plt.show()
# 通过相关性选择特征 #
train.corr()['happiness'][abs(train.corr()['happiness'])>0.05]
# 选择相关性大于0.05的作为候选特征 #
features = (train.corr()['happiness'][abs(train.corr()['happiness'])>0.05]).index
features = features.values.tolist()
# 额外加入比较重要的特征 # 
features.extend(['Age', 'work_exper'])
features.remove('happiness')
print(len(features))
# 生成数据和标签 #
target = train['happiness']
train_selected = train[features]
test = test[features]
feature_importance_df = pd.DataFrame()
oof = np.zeros(len(train))
predictions = np.zeros(len(test))
# 模型参数设置 #
params = {'num_leaves': 9,
         'min_data_in_leaf': 40,
         'objective': 'regression',
         'max_depth': 16,
         'learning_rate': 0.01,
         'boosting': 'gbdt',
         'bagging_freq': 5,
         'bagging_fraction': 0.8,      # 每次迭代时用的数据比例 #
         'feature_fraction': 0.8201,   # 每次迭代时随机选择80%的参数 #
         'bagging_seed': 11,
         'reg_alpha': 1.728910519108444,
         'reg_lambda': 4.9847051755586085,
         'random_state': 42,
         'metric': 'rmse',
         'verbosity': -1,
         'subsample': 0.81,
         'min_gain_to_split': 0.01077313523861969,
         'min_child_weight': 19.428902804238373,
         'num_threads': 4}
kfolds = KFold(n_splits=5,shuffle=True,random_state=15)
predictions = np.zeros(len(test))

for fold_n,(trn_index,val_index) in enumerate(kfolds.split(train_selected,target)):
    print("fold_n {}".format(fold_n))
    trn_data = lgb.Dataset(train_selected.iloc[trn_index],label=target.iloc[trn_index])
    val_data = lgb.Dataset(train_selected.iloc[val_index],label=target.iloc[val_index])
    num_round=10000
    clf = lgb.train(params, trn_data, num_round, valid_sets = [trn_data, val_data], verbose_eval=1000, early_stopping_rounds = 100)
    oof[val_index] = clf.predict(train_selected.iloc[val_index], num_iteration=clf.best_iteration)
    predictions += clf.predict(test,num_iteration=clf.best_iteration)/5
    fold_importance_df = pd.DataFrame()
    fold_importance_df["feature"] = features
    fold_importance_df["importance"] = clf.feature_importance()
    fold_importance_df["fold"] = fold_n + 1
    feature_importance_df = pd.concat([feature_importance_df, fold_importance_df], axis=0)
    print("CV score: {:<8.5f}".format(mean_squared_error(target, oof)**0.5))
cols = (feature_importance_df[["feature", "importance"]]
        .groupby("feature")
        .mean()
        .sort_values(by="importance", ascending=False)[:1000].index)
best_features = feature_importance_df.loc[feature_importance_df.feature.isin(cols)]

plt.figure(figsize=(14,26))
sns.barplot(x="importance", y="feature", data=best_features.sort_values(by="importance",ascending=False))
plt.title('LightGBM Features (averaged over folds)')
plt.tight_layout()
# 计算结果 #
submit = pd.read_csv("C:/Users/Bai/Python_projects/Ali_Cloud/happiness_submit.csv")
submision_lgb1  = pd.DataFrame({"id":submit['id'].values})
submision_lgb1["happiness"]=predictions
submision_lgb1.head(5)
# 获取时间 #
time_str = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())
out_dir = "C:/Users/Bai/Python_projects/Ali_Cloud/{}/".format(time_str)
os.makedirs(out_dir)

# 保存模型 #
clf.save_model(out_dir + "model.txt")

# 保存结果 #
submision_lgb1.to_csv(out_dir + "submision_lgbm.csv",index=False)
  • 1
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
Executing tasks: [:app:assembleDebug] in project D:\Users\lenovo\AndroidStudioProjects\Pinduoduo WARNING: The specified Android SDK Build Tools version (27.0.0) is ignored, as it is below the minimum supported version (28.0.3) for Android Gradle Plugin 3.5.2. Android SDK Build Tools 28.0.3 will be used. To suppress this warning, remove "buildToolsVersion '27.0.0'" from your build.gradle file, as each version of the Android Gradle Plugin now has a default version of the build tools. > Task :app:preBuild UP-TO-DATE > Task :app:preDebugBuild UP-TO-DATE > Task :app:checkDebugManifest UP-TO-DATE > Task :app:generateDebugBuildConfig UP-TO-DATE > Task :app:javaPreCompileDebug UP-TO-DATE > Task :app:mainApkListPersistenceDebug UP-TO-DATE > Task :app:generateDebugResValues UP-TO-DATE > Task :app:createDebugCompatibleScreenManifests UP-TO-DATE > Task :app:mergeDebugShaders UP-TO-DATE > Task :app:compileDebugShaders UP-TO-DATE > Task :app:generateDebugAssets UP-TO-DATE > Task :app:compileDebugRenderscript NO-SOURCE > Task :app:compileDebugAidl NO-SOURCE > Task :app:generateDebugResources UP-TO-DATE > Task :app:mergeDebugResources UP-TO-DATE > Task :app:processDebugManifest > Task :app:processDebugResources FAILED AGPBI: {"kind":"error","text":"Android resource linking failed","sources":[{"file":"D:\\Users\\lenovo\\AndroidStudioProjects\\Pinduoduo\\app\\src\\main\\res\\layout\\activity_main.xml","position":{"startLine":34}}],"original":"D:\\Users\\lenovo\\AndroidStudioProjects\\Pinduoduo\\app\\src\\main\\res\\layout\\activity_main.xml:35: AAPT: error: '#875ale' is incompatible with attribute textColor (attr) reference|color.\n ","tool":"AAPT"} FAILURE: Build failed with an exception. * What went wrong: Execution failed for task ':app:processDebugResources'. > A failure occurred while executing com.android.build.gradle.internal.tasks.Workers$ActionFacade > Android resource linking failed D:\Users\lenovo\AndroidStudioProjects\Pinduoduo\app\src\main\res\layout\activity_main.xml:35: AAPT: error: '#875ale' is incompatible with attribute textColor (attr) reference|color. * Try: Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights. * Get more help at https://help.gradle.org BUILD FAILED in 3s 11 actionable tasks: 2 executed, 9 up-to-date
最新发布
06-07

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值