Scikit-learn section6 for cvpytorch

cvpytorch

Tags: scikit-learn, sklearn, machine learning

Article link: https://blog.csdn.net/m0_59540543/article/details/122380075

The official English tutorial is here:

https://scikit-learn.org/stable/getting_started.html

I have made up my mind: I need to keep studying this properly. From here on, I will try to explain things in more detail.

If this post infringes any rights, please contact me and I will remove it.

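scikit-learn provides a range of built-in machine learning algorithms and models, which are called estimators. For example, open PyCharm, take an algorithm or model from scikit-learn, and assign it to a variable a. When you use a again on another line and type "." after it, in most cases the autocomplete list will offer fit.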

import numpy as np
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(random_state=0)
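# We will discuss the details when we reach the API chapter; for now, here is what the official docs say about this parameter.
# random_state controls two things:
# 1. If bootstrap=True, the randomness of the bootstrap sampling used when building the trees.
# 2. If max_features < n_features, the sampling of the features
#    to consider when looking for the best split at each node.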
X = np.array([[1,2,3],[101,102,103]])
y = np.array([0,1])
fit = classifier.fit(X,y)
predict = classifier.predict(np.array([[0,0,1],[70,80,90]]))
#predict = classifier.predict(np.array([[0,0,1],[10,20,30]]))
print(predict)
#[0 1]
#[0 0]  (result when the commented-out predict line is used instead)

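Tracing the source, we can see that RandomForestClassifier inherits some attributes, including fit, from its ancestor class BaseForest. I do not yet fully understand how RandomForestClassifier works; when we study the relevant API I will go into it in depth. I briefly covered the parameters of fit earlier, so I will not repeat them here. Once an estimator has been fitted, it is able to make predictions.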
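
The inheritance claim can be checked directly. The sketch below is my own illustration, not code from the official tutorial: it walks the class's method resolution order (MRO) and reports which class first defines fit. The helper name defining_class is hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier


def defining_class(cls, method_name):
    """Return the first class in cls's MRO whose own namespace defines method_name."""
    for klass in cls.__mro__:
        if method_name in vars(klass):
            return klass
    return None


# Names of every class RandomForestClassifier inherits from, in lookup order.
mro_names = [klass.__name__ for klass in RandomForestClassifier.__mro__]
print(mro_names)

# The class that actually provides fit (an ancestor, not the class itself).
fit_owner = defining_class(RandomForestClassifier, "fit")
print(fit_owner.__name__)
```

Reading the MRO this way avoids importing anything from scikit-learn's private modules.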

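Preprocessors and transformers follow the same API as estimators, but transformer objects have no predict method; they have a transform method instead.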

import numpy as np
from sklearn.preprocessing import StandardScaler
X = np.array([[1,2],[2,4]])
transform = StandardScaler().fit(X).transform(X)
print(transform)
#[[-1. -1.]
# [ 1.  1.]]

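According to its class description, StandardScaler standardizes features by removing the mean and scaling to unit variance. The details are somewhat involved, so we will come back to them later.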
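
To make "removing the mean and scaling to unit variance" concrete, here is a minimal sketch that does the same computation by hand with NumPy and compares it with StandardScaler. It assumes the same 2x2 input as above; the variable names are mine.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2], [2, 4]], dtype=float)

# Per-column mean and population standard deviation (ddof=0).
mean = X.mean(axis=0)
std = X.std(axis=0)

# Standardize by hand: subtract the mean, divide by the standard deviation.
manual = (X - mean) / std

# The same transformation done by scikit-learn.
scaled = StandardScaler().fit_transform(X)

print(manual)
print(np.allclose(manual, scaled))
```

Each column ends up with mean 0 and standard deviation 1, which is why both columns of the output above contain -1 and 1.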

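Transformers and estimators can be combined into a single unified object: a Pipeline (I was not sure how best to translate the term into Chinese; perhaps 管道, "pipe"?).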

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
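pipe = make_pipeline(StandardScaler(),LogisticRegression())
# Create the pipeline object: the first step is the transformer mentioned above, the second is the estimator
X, y = load_iris(return_X_y=True)
# X is the data (features), y is the target (labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Split arrays or matrices into random train and test subsets, returned in the order shown on the left-hand side
# Each input is split into two parts; since two inputs (X and y) were passed in, we get 2 x 2 = 4 outputs
fit = pipe.fit(X_train, y_train)
# Now pipe can be used like any other estimator
print(accuracy_score(pipe.predict(X_test), y_test))
#0.9736842105263158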

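Although a transformer has no predict method of its own, pipe chains the steps together, so it behaves as if it had the capabilities of all of them, a bit like a union. More precisely, calling pipe.predict passes the data through each transformer's transform and then through the final estimator's predict.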
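
As a rough check of what the pipeline does internally, the sketch below performs the two steps by hand and compares the predictions with those of the pipeline. It reuses the iris split from above; the names scaler, clf, pipe_pred and manual_pred are mine.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pipeline version: scaling and classification in one object.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
pipe_pred = pipe.predict(X_test)

# Manual version: fit the scaler on the training data only,
# then apply the same scaling to the test data before predicting.
scaler = StandardScaler().fit(X_train)
clf = LogisticRegression().fit(scaler.transform(X_train), y_train)
manual_pred = clf.predict(scaler.transform(X_test))

print(hasattr(StandardScaler(), "predict"))  # the transformer alone cannot predict
print(hasattr(pipe, "predict"))              # the pipeline can
print(np.array_equal(pipe_pred, manual_pred))
```

The practical benefit of the pipeline is that the scaler is never fitted on test data by accident, because fit and predict handle the steps in the right order.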

Below is a 5-fold cross-validation procedure:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
import matplotlib.pyplot as plt
X, y = make_regression(n_samples=1000, random_state=0)
lr = LinearRegression()
result = cross_validate(lr, X, y)  # defaults to 5-fold CV
print(result['test_score'])
#[1. 1. 1. 1. 1.]
# r_squared score is high because dataset is easy
plt.subplot(121)
plt.plot(X,'bo')
plt.subplot(122)
plt.plot(y,'mo')
plt.show()

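I will leave this part to those who understand it. I stared at it for a long time without making sense of it, and I hope the original authors can offer some pointers.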
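
For readers in the same position, here is a minimal sketch of what 5-fold cross-validation does, written as an explicit loop and compared with cross_validate. It is my own illustration under the assumption that the folds are plain, unshuffled KFold splits: the data are cut into 5 parts, and each part takes one turn as the test set while the model is trained on the other 4.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate

X, y = make_regression(n_samples=1000, random_state=0)

# Library version: 5-fold cross-validation in one call.
cv = KFold(n_splits=5)
library_scores = cross_validate(LinearRegression(), X, y, cv=cv)["test_score"]

# Manual version: train on 4 folds, score on the held-out fold, repeat 5 times.
manual_scores = []
for train_idx, test_idx in cv.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    manual_scores.append(model.score(X[test_idx], y[test_idx]))  # R^2 on the held-out fold
manual_scores = np.array(manual_scores)

print(manual_scores)
print(np.allclose(manual_scores, library_scores))
```

The scores are all essentially 1 because make_regression adds no noise by default, so the target is an exact linear function of the features and a linear model fits it perfectly.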

scikit-learn provides tools that use cross-validation to find the best parameter combination automatically.

from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import train_test_split
from scipy.stats import randint
X, y = fetch_california_housing(return_X_y=True)
# This is the classic California housing dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# define the parameter space that will be searched over for the best combination
param_distributions = {'n_estimators': randint(1, 100),
                       'max_depth': randint(1, 100)}
# now create a searchCV object and fit it to the data
search = RandomizedSearchCV(estimator=RandomForestRegressor(random_state=0),
                            n_iter=5,
                            param_distributions=param_distributions,
                            random_state=0)
# n_iter is the number of parameter settings that are sampled: here 5 random combinations are drawn and evaluated
# param_distributions is the search space defined above
fit = search.fit(X_train, y_train)
print(search.best_params_)
#{'max_depth': 37, 'n_estimators': 88}
print(search.score(X_test, y_test))
#0.7933383545031552

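Anyone who has studied convolutional neural networks has probably heard that the deeper the network, the better the results (although many researchers have since shown that this is not always true). We can see something similar above: among the five sampled combinations, the best one has max_depth 37, well below the upper end of the search range. Heh, hyperparameter tuning is a skill in itself.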
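
To see what n_iter controls and which combinations were actually tried, the fitted search object can be inspected through cv_results_. The sketch below is an illustration only: it swaps in a small synthetic dataset, a narrower search range and cv=3 (all my assumptions, chosen so it runs quickly and needs no download), so its numbers will differ from those above.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic dataset (an assumption, chosen only so the sketch runs quickly).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

param_distributions = {"n_estimators": randint(1, 20),
                       "max_depth": randint(1, 20)}

search = RandomizedSearchCV(estimator=RandomForestRegressor(random_state=0),
                            n_iter=5,  # try 5 randomly drawn parameter combinations
                            param_distributions=param_distributions,
                            cv=3,
                            random_state=0)
search.fit(X, y)

# One entry per sampled combination, with its mean cross-validated score.
candidates = search.cv_results_["params"]
mean_scores = search.cv_results_["mean_test_score"]
for params, score in zip(candidates, mean_scores):
    print(params, round(float(score), 3))

# The winner is simply the sampled combination with the best mean score.
print(search.best_params_)
```

With only five candidates, the "best" parameters are the best of a small random sample, so conclusions about depth drawn from a single run should be treated with caution.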

That concludes this part. I would welcome guidance from seniors, peers, and juniors alike.
