[scikit-learn Notes] LightGBM

```python
# LightGBM native interface
import lightgbm as lgb
# scikit-learn-style interface
from lightgbm import LGBMClassifier
from lightgbm import LGBMRegressor

from sklearn.model_selection import train_test_split, KFold, StratifiedKFold
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_absolute_error, mean_squared_error
import pandas as pd
import numpy as np
from sklearn.datasets import load_breast_cancer, load_wine
from sklearn.datasets import fetch_california_housing
import warnings
warnings.simplefilter("ignore")

# Note: load_boston was removed in scikit-learn 1.2; the California housing
# dataset is the recommended regression replacement
cancer = load_breast_cancer()
wine = load_wine()
housing = fetch_california_housing()

# scikit-learn-style interface: fit/score like any other estimator
data_train, data_test, target_train, target_test = train_test_split(
    cancer.data, cancer.target, test_size=0.2, random_state=0)
lgbm = LGBMClassifier()
lgbm.fit(data_train, target_train)
train_score = lgbm.score(data_train, target_train)
print('train score:', train_score)
test_score = lgbm.score(data_test, target_test)
print('test score:', test_score)
```
```python
# Native lightgbm interface, reusing the same train/test split
lgb_train = lgb.Dataset(data_train, target_train)
lgb_test = lgb.Dataset(data_test, target_test)

params = {
    'learning_rate': 0.1,
    'lambda_l1': 0.1,
    'lambda_l2': 0.2,
    'max_depth': 4,
    'objective': 'multiclass',  # objective function
    'num_class': 2,             # breast cancer is binary: 2 classes, not 3
}

lgb_model = lgb.train(params, lgb_train, valid_sets=[lgb_test])

# Predict: each row is a vector of per-class probabilities, so take the argmax
y_pred = lgb_model.predict(data_test)
y_pred = [list(x).index(max(x)) for x in y_pred]
print(y_pred)

# Evaluate
print(accuracy_score(target_test, y_pred))
```
### Integrating and Using LightGBM with scikit-learn

#### Compatibility between LightGBM and scikit-learn

LightGBM was designed from the start with interoperability in mind, so it works well alongside scikit-learn[^1]. This means you can run efficient gradient boosting with LightGBM in your Python environment while still using scikit-learn's rich tooling for preprocessing, cross-validation, and hyperparameter tuning.

#### Installing the required packages

Install both libraries from the command line:

```bash
pip install lightgbm scikit-learn pandas numpy matplotlib seaborn
```

#### Importing the required modules

At the top of the script, load all the needed components:

```python
import lightgbm as lgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np
```

#### Preparing the data

The classic iris dataset serves as the example: it has four features (sepal length and width, petal length and width) plus a target variable encoding the species. The code below loads the data and splits it into training and test sets:

```python
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
target = data.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df.values, target, test_size=0.2, random_state=42)

dtrain = lgb.Dataset(X_train, label=y_train)
dtest = lgb.Dataset(X_test, reference=dtrain)
```

#### Building an LGBMClassifier instance

Create an `lgb.LGBMClassifier()` object with the desired initialization parameters; for multiclass problems LightGBM automatically uses a softmax objective by default.

```python
model = lgb.LGBMClassifier(boosting_type='gbdt',
                           num_leaves=31,
                           max_depth=-1,
                           learning_rate=0.1,
                           n_estimators=100,
                           min_child_weight=0.001,
                           subsample=0.8,
                           colsample_bytree=0.8,
                           reg_alpha=0.,
                           reg_lambda=0.)
```

#### Training

With the estimator configured, call `.fit()` to start the actual learning. Extra behavior such as early stopping is controlled through callbacks: in LightGBM 4.x the `early_stopping_rounds`, `verbose`, and `evals_result` keyword arguments were removed from `fit()`, so use `lgb.early_stopping` and `lgb.log_evaluation` instead (per-iteration results are available afterwards via `model.evals_result_`).

```python
model.fit(
    X_train, y_train,
    eval_set=[(X_test, y_test)],
    eval_metric='multi_logloss',
    callbacks=[lgb.early_stopping(stopping_rounds=10),
               lgb.log_evaluation(period=1)]
)
```

#### Evaluating performance

Once training finishes, the new model should be examined thoroughly. Prediction quality is usually measured from several angles; the most basic is the accuracy score:

```python
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
```