1. Import the required libraries
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier
from xgboost import XGBClassifier
2. Read and process the data
# read the dataset
test = pd.read_csv("/home/tarena/test/web/day11/data_all.csv")
x = test.drop(['status'],axis=1)
y = test["status"]
# 70/30 train/test split, random seed 2018
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=2018)
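For a classification target like status, passing stratify=y to train_test_split keeps the class ratio the same in both splits, which matters when the classes are imbalanced. A minimal sketch (make_classification is used here as synthetic stand-in data, since data_all.csv is not available):

```python
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# synthetic stand-in dataset: 1000 rows with a roughly 80/20 class imbalance
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.8, 0.2], random_state=2018)

# stratify=y preserves the class ratio in both the train and test splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=2018, stratify=y)

# the positive-class fraction is (nearly) identical in both splits
print(Counter(y_tr), Counter(y_te))
```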
3. Build the models and score them
# Random Forest
Rfc = RandomForestClassifier()
Rfc.fit(X_train, y_train)
Rfc_score = Rfc.score(X_train, y_train)  # accuracy on the training set
# GBDT
Gbdt = GradientBoostingClassifier()
Gbdt.fit(X_train, y_train)
Gbdt_score = Gbdt.score(X_train, y_train)  # accuracy on the training set
# XGBoost
Xgb = XGBClassifier()
Xgb.fit(X_train, y_train)
Xgb_score = Xgb.score(X_train, y_train)  # accuracy on the training set
# LightGBM
Lgb = LGBMClassifier()
Lgb.fit(X_train, y_train)
lgb_score = Lgb.score(X_train, y_train)  # accuracy on the training set
4. Print the results
print(Rfc_score, Gbdt_score, Xgb_score, lgb_score)
0.9861737300871656 0.8623384430417794 0.852419597234746 0.9972948602344455
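Note that the numbers above are training-set accuracies, so they mostly measure how closely each model can fit data it has already seen (a random forest with fully grown trees is expected to be near-perfect on its own training data). The held-out test set is what the split was made for. A sketch of the difference, again on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# synthetic stand-in dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=2018)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=2018)

rfc = RandomForestClassifier(random_state=2018)
rfc.fit(X_tr, y_tr)

# training accuracy is near-perfect; test accuracy is the honest
# estimate of how the model generalizes to unseen rows
print("train:", rfc.score(X_tr, y_tr))
print("test: ", rfc.score(X_te, y_te))
```

Scoring each model with `score(X_test, y_test)` instead would make the comparison between the four libraries meaningful.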
This exercise mostly reused the first day's code, swapping in the different libraries found through a Baidu search. Some classmates also standardized and normalized the data; after reading the related material I still did not fully understand the rationale, and when I tried a similar preprocessing step the results did not change, so I removed the preprocessing part. Time is limited, so the goal for now is just to get the pipeline running and digest the details later.
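The observation that standardization did not change the results is actually expected here: all four models are tree-based, trees split on per-feature thresholds, and standardization is a monotone per-feature transform, so the split decisions (and hence the scores) stay essentially the same. A sketch on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# synthetic stand-in dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=2018)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=2018)

# fit the scaler on the training split only, then transform both splits
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

raw = RandomForestClassifier(random_state=2018).fit(X_tr, y_tr)
scaled = RandomForestClassifier(random_state=2018).fit(X_tr_s, y_tr)

# scaling is monotone per feature, so the tree thresholds shift with the
# data and the test accuracy comes out (essentially) unchanged
print(raw.score(X_te, y_te), scaled.score(X_te_s, y_te))
```

Scaling does matter for distance- or gradient-based models such as logistic regression, SVMs, or k-nearest neighbors, which is likely why the classmates applied it.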