4 Ways to Implement Feature Selection for Machine Learning in Python

Latest recommended article published on 2024-08-26 15:35:53

愿码


Article link: https://blog.csdn.net/weixin_43970764/article/details/89249859


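Summary (auto-generated by CSDN): This article introduces four feature selection methods for machine learning in Python: univariate feature selection, recursive feature elimination (RFE), principal component analysis (PCA), and feature importance. It uses the Scikit-learn library to show how each method is implemented and what it produces, and stresses the role of feature selection in improving model performance and simplifying data.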

Source | 愿码 (ChainDesk.CN) content editors


Reading time: 13 min

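In this article, we introduce different ways of selecting features from a dataset, and use the Scikit-learn (sklearn) library to discuss the types of feature selection algorithms and their implementation in Python: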

Univariate feature selection
Recursive feature elimination (RFE)
Principal component analysis (PCA)
Feature importance
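As a rough orientation, the four approaches listed above can be mapped onto scikit-learn classes as in the sketch below. It is not the article's own code: the synthetic dataset from `make_classification`, the choice of `LogisticRegression` and `ExtraTreesClassifier` as estimators, and the numbers of features kept are all assumptions made for illustration.

```python
# Sketch only: synthetic data stands in for a real dataset.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# 200 samples, 8 features, 3 of them informative (hypothetical sizes).
X, y = make_classification(n_samples=200, n_features=8, n_informative=3,
                           n_redundant=2, random_state=0)

# 1. Univariate selection: keep the 4 features with the best ANOVA F-score.
X_kbest = SelectKBest(score_func=f_classif, k=4).fit_transform(X, y)

# 2. RFE: repeatedly fit a model and drop the weakest feature until 3 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)

# 3. PCA: project onto 3 principal components (new axes, not original columns).
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)

# 4. Feature importance: scores from an ensemble of randomized trees.
forest = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)

print(X_kbest.shape)                  # shape after univariate selection
print(rfe.support_)                   # boolean mask of features kept by RFE
print(pca.explained_variance_ratio_)  # variance captured by each component
print(forest.feature_importances_)    # one importance score per feature
```

Note that PCA differs from the other three: it builds new combined features instead of picking a subset of the original columns.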

Univariate feature selection

Statistical tests can be used to select the features that have the strongest relationship with the output variable.

The scikit-learn library provides the SelectKBest class, which can be used with a range of statistical tests to select a specific number of features.
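To illustrate how SelectKBest accepts different statistical tests, the sketch below swaps the `score_func` argument between three scoring functions that ship with scikit-learn. The built-in iris dataset and `k=2` are assumptions chosen so the example runs without downloading anything; they are not part of the original article.

```python
from functools import partial

from sklearn.datasets import load_iris
from sklearn.feature_selection import (SelectKBest, chi2, f_classif,
                                       mutual_info_classif)

# Iris is used only because it is bundled with scikit-learn (assumption).
X, y = load_iris(return_X_y=True)

# Candidate statistical tests; random_state is fixed so that the mutual
# information estimate is repeatable.
score_funcs = {
    "chi2": chi2,
    "f_classif": f_classif,
    "mutual_info_classif": partial(mutual_info_classif, random_state=0),
}

selected = {}
for name, score_func in score_funcs.items():
    selector = SelectKBest(score_func=score_func, k=2).fit(X, y)
    # Column indices of the k features with the highest scores.
    selected[name] = selector.get_support(indices=True)
    print(name, selector.scores_, selected[name])
```

The tests can rank features differently on other datasets, so it may be worth comparing several before settling on one.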

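The following example uses the chi-squared (chi^2) statistical test, which requires non-negative features, to select the four best features from the Pima Indians diabetes dataset: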

# Feature extraction with univariate statistical tests (chi-squared for classification)

# Import pandas to read the CSV file
import pandas
# Import numpy for array-related operations
import numpy
# Import scikit-learn's univariate feature selection class
from sklearn.feature_selection import SelectKBest
# Import chi2 to perform the chi-squared test
from sklearn.feature_selection import chi2

# URL for loading the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"

# Define the attribute names
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

# Create a pandas data frame by loading the data from the URL
dataframe = pandas.read_csv(url, names=names)

# Create an array from the data values
array = dataframe.values

# Split the data into input features and target
X = array[:, 0:8]
Y = array[:, 8]

# Select the four best features using the chi-squared test
test = SelectKBest(score_func=chi2, k=4)

# Fit the selector to score and rank the features
fit = test.fit(X, Y)

# Summarize scores
numpy.set_printoptions(precision=3)
print(fit.scores_)

# Apply the transformation to the dataset
features = fit.transform(X)

# Summarize selected features (first five rows)
print(features[0:5, :])
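A side note on the chi-squared test used in the example: scikit-learn's `chi2` rejects input that contains negative values. The sketch below shows that behavior on hypothetical random data, together with one possible workaround, rescaling with `MinMaxScaler`. The data and the workaround are assumptions for illustration, not the article's method. The chi-squared test is designed for counts or frequencies, so scores on rescaled continuous data should be read with caution.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: standard-normal features, so some values are negative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# chi2 raises a ValueError when any feature value is negative.
try:
    SelectKBest(score_func=chi2, k=2).fit(X, y)
    raised = False
except ValueError as err:
    raised = True
    print("chi2 rejected the input:", err)

# One possible workaround: rescale every feature into [0, 1] first.
X_scaled = MinMaxScaler().fit_transform(X)
selector = SelectKBest(score_func=chi2, k=2).fit(X_scaled, y)
print(selector.get_support(indices=True))
```

The Pima Indians columns are all non-negative measurements, so the example above can apply `chi2` directly.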

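For the Pima Indians example, the output lists the score for each attribute and the four attributes selected (those with the highest scores): plas, test, mass, and age.

Scores for each feature:

[ 111.52  1411.887   17.605   53.108 2175.565  127.669    5.393  181.304]

Selected features:

[[148.    0.   33.6  50. ]
 [ 85.    0.   26.6  31. ]
 [183.    0.   23.3  32. ]
 [ 89.   94.   28.1  21. ]
 [137.  168.   43.1  33. ]]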
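The printed scores can be matched back to column names to confirm which four attributes were kept. The sketch below does this with plain NumPy, using the score values copied from the output above. The variable names `scores`, `top_idx` and `selected` are illustrative.

```python
import numpy as np

# Feature names from the example (the 'class' target column is excluded).
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age']

# Chi-squared scores copied from the printed output.
scores = np.array([111.52, 1411.887, 17.605, 53.108,
                   2175.565, 127.669, 5.393, 181.304])

k = 4
# Indices of the k highest scores, put back into original column order.
top_idx = np.sort(np.argsort(scores)[-k:])
selected = [names[i] for i in top_idx]
print(selected)
```

With a fitted selector, `fit.get_support(indices=True)` returns the selected column indices directly, so the manual sort is only needed when working from printed scores.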
