[Machine Learning | XGBoost] Computing each band's contribution to the classification result with the tree model's built-in gain-based importance (feature_importances_)
Code example
import os

import numpy as np
import pandas as pd
import rasterio
import matplotlib.pyplot as plt
from xgboost import XGBClassifier

# Input/output paths
multiband_path = 'multiband.tif'
label_path = 'label.tif'
out_dir = 'feature_importance_xgb'
os.makedirs(out_dir, exist_ok=True)

# 1. Read the multiband image
with rasterio.open(multiband_path) as src:
    data = src.read().astype(np.float32)  # (bands, h, w)
    band_names = [f'B{i}' for i in range(1, src.count + 1)]

# 2. Read the label raster, pad it to the image size, then flatten it
with rasterio.open(label_path) as src:
    label2d = src.read(1)

# This padding assumes the label raster is exactly 2 rows and 1 column
# smaller than the image; adjust pad_width for other data.
# Padded pixels get the value 0 and are therefore treated as class 0 below.
label_padded = np.pad(
    label2d,
    pad_width=((1, 1), (0, 1)),  # +1 row at top and bottom, +1 column on the right
    mode='constant',
    constant_values=0
)

# Fail early if labels and image do not line up pixel for pixel
if label_padded.shape != data.shape[1:]:
    raise ValueError(
        f'Label shape {label_padded.shape} does not match image shape {data.shape[1:]}'
    )
label_flat = label_padded.ravel().astype(int)

# 3. Build X and y
X = data.reshape(data.shape[0], -1).T  # (n_samples, n_bands)
y = label_flat                         # (n_samples,)
mask = np.isin(y, [0, 1])              # keep only pixels labeled 0 or 1
X, y = X[mask], y[mask]

# 4. Train the XGBoost classifier
xgb = XGBClassifier(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
    eval_metric='logloss',
    n_jobs=-1,
    random_state=42
)
xgb.fit(X, y)

# 5. Get the XGBoost feature importances (gain-based for tree boosters,
#    normalized so that they sum to 1)
importances = xgb.feature_importances_

# 6. Sort in descending order and plot
indices = np.argsort(importances)[::-1]
sorted_names = [band_names[i] for i in indices]
sorted_importances = importances[indices]

plt.figure(figsize=(8, 5))
plt.bar(sorted_names, sorted_importances, color='lightcoral', edgecolor='black')
plt.xticks(rotation=45, ha='right')
plt.ylabel('Importance')
plt.title('Feature Importance (XGBoost Gain)')
plt.tight_layout()
out_png = os.path.join(out_dir, 'importance_xgb.png')
plt.savefig(out_png, dpi=300)
plt.close()
print(f'Feature importance plot saved to {out_png}')

# 7. Save as CSV
df_imp = pd.DataFrame({
    'Band': sorted_names,
    'Importance': sorted_importances
})
out_csv = os.path.join(out_dir, 'importance_xgb.csv')
df_imp.to_csv(out_csv, index=False, float_format='%.6f')
print(f'Feature importance table saved to {out_csv}')
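Steps 2 and 3 above only work if row i of X and element i of y describe the same pixel. The following is a minimal sketch of that alignment on a small made-up array; the helper name `flatten_image_and_labels` and the toy values are illustrative assumptions, not part of the original script. It shows that reshaping to (n_samples, n_bands) and calling `ravel()` both walk the pixels in the same row-major order.

```python
import numpy as np


def flatten_image_and_labels(data, label2d):
    """Flatten a (bands, h, w) image and an (h, w) label raster in matching pixel order."""
    if label2d.shape != data.shape[1:]:
        raise ValueError(
            f'label shape {label2d.shape} does not match image shape {data.shape[1:]}'
        )
    X = data.reshape(data.shape[0], -1).T  # (h*w, bands), row-major pixel order
    y = label2d.ravel()                    # (h*w,), same row-major order
    return X, y


# Toy 2-band, 2x3 image: band 0 holds 0..5, band 1 holds 10..15
data = np.stack([np.arange(6).reshape(2, 3), np.arange(10, 16).reshape(2, 3)])
label2d = np.array([[0, 1, 0],
                    [1, 1, 0]])

X, y = flatten_image_and_labels(data, label2d)

# The pixel at row 1, column 0 has flat index 1 * 3 + 0 = 3
row, col = 1, 0
flat = row * label2d.shape[1] + col
print(X[flat], y[flat])  # band values and label of that single pixel
```

If the label raster has a different shape from the image, the helper raises an error instead of silently pairing pixels with the wrong labels.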
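Code notes
- XGBClassifier: trained with n_estimators=100, max_depth=6, and learning_rate=0.1, all commonly used values. The first two match the library defaults; learning_rate=0.1 is lower than the default of 0.3.
- feature_importances_: for tree boosters, XGBoost uses gain as the default importance measure for this attribute and normalizes the values so that they sum to 1. The raw per-feature values are available as a dictionary via xgb.get_booster().get_score(importance_type='gain'). Because the model here is trained on a NumPy array, the dictionary keys are f0, f1, ... rather than band names, and features that were never used in a split are omitted.
- Output: the script exports both a PNG bar chart and a CSV table for later analysis.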
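The dictionary returned by get_score can be mapped back to band names and normalized, which illustrates how it relates to feature_importances_. The sketch below is a hedged illustration: the helper name `gain_dict_to_band_importance` and the gain values are hypothetical. The dictionary is written by hand to mimic the shape of get_score output for a model trained on a NumPy array (keys f0, f1, ..., unused features missing), so the snippet runs without a trained model.

```python
import numpy as np


def gain_dict_to_band_importance(gain_by_feature, band_names):
    """Map {'f0': gain, ...} to a normalized importance array ordered like band_names."""
    raw = np.zeros(len(band_names), dtype=float)
    for key, gain in gain_by_feature.items():
        idx = int(key[1:])  # 'f2' -> 2; assumes the default f<index> feature names
        raw[idx] = gain
    total = raw.sum()
    # Features missing from the dictionary keep an importance of 0
    return raw / total if total > 0 else raw


band_names = ['B1', 'B2', 'B3', 'B4']
# Hypothetical values; B3 (f2) is absent because it was never used in a split
example_gain = {'f0': 6.0, 'f1': 1.5, 'f3': 2.5}

importance = gain_dict_to_band_importance(example_gain, band_names)
for name, value in zip(band_names, importance):
    print(f'{name}: {value:.3f}')
```

With a fitted model, the result of xgb.get_booster().get_score(importance_type='gain') would take the place of `example_gain`, and the normalized array should closely match xgb.feature_importances_.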