Generalized Linear Models, and Poisson loss for gradient boosting¶
Long-awaited Generalized Linear Models with non-normal loss functions
are now available. In particular, three new regressors were
implemented: PoissonRegressor, GammaRegressor, and TweedieRegressor (a
brief sketch of the latter two follows the example below). The Poisson
regressor can be used to model non-negative integer counts or relative
frequencies. Read more in the User Guide. Additionally,
HistGradientBoostingRegressor now supports a 'poisson' loss.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import PoissonRegressor
from sklearn.ensemble import HistGradientBoostingRegressor
n_samples, n_features = 1000, 20
rng = np.random.RandomState(0)
X = rng.randn(n_samples, n_features)
# non-negative integer target correlated with X[:, 5], with many zeros:
y = rng.poisson(lam=np.exp(X[:, 5]) / 2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rng)
glm = PoissonRegressor()
gbdt = HistGradientBoostingRegressor(loss="poisson", learning_rate=0.01)
glm.fit(X_train, y_train)
gbdt.fit(X_train, y_train)
# score is D², the fraction of deviance explained, for the GLM,
# and R² for the gradient boosting regressor:
print(glm.score(X_test, y_test))
print(gbdt.score(X_test, y_test))
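GammaRegressor and TweedieRegressor follow the same API. As a minimal
sketch, reusing X and rng from the example above with a hypothetical
strictly positive target: GammaRegressor requires strictly positive
targets, while TweedieRegressor(power=1.5) interpolates between the
Poisson (power=1) and Gamma (power=2) distributions.
from sklearn.linear_model import GammaRegressor, TweedieRegressor
# hypothetical strictly positive target, again driven by X[:, 5]:
y_pos = rng.gamma(shape=2.0, scale=np.exp(X[:, 5]) / 2)
glm_gamma = GammaRegressor().fit(X, y_pos)
glm_tweedie = TweedieRegressor(power=1.5).fit(X, y_pos)
print(glm_gamma.score(X, y_pos))
print(glm_tweedie.score(X, y_pos))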
Scalability and stability improvements to KMeans¶
The KMeans estimator was entirely reworked: it is now significantly
faster and more stable. In addition, the Elkan algorithm is now
compatible with sparse matrices. The estimator uses OpenMP-based
parallelism instead of relying on joblib, so the n_jobs parameter no
longer has any effect. For more details on how to control the number
of threads, please refer to our Parallelism notes (a short sketch
follows the example below).
import scipy.sparse
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import completeness_score
rng = np.random.RandomState(0)
X, y = make_blobs(random_state=rng)
# the Elkan algorithm now works on sparse input:
X = scipy.sparse.csr_matrix(X)
X_train, X_test, _, y_test = train_test_split(X, y, random_state=rng)
kmeans = KMeans(algorithm="elkan").fit(X_train)
# completeness_score expects (labels_true, labels_pred):
print(completeness_score(y_test, kmeans.predict(X_test)))
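Since n_jobs no longer has an effect here, the number of OpenMP threads
can be limited explicitly, for instance with threadpoolctl. A minimal
sketch, assuming the threadpoolctl package is installed:
from threadpoolctl import threadpool_limits
# limit OpenMP-based parallelism (used by KMeans) to 2 threads here:
with threadpool_limits(limits=2, user_api="openmp"):
    KMeans(algorithm="elkan").fit(X_train)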
Improvements to the histogram-based Gradient Boosting estimators¶
Various improvements were made to HistGradientBoostingClassifier and
HistGradientBoostingRegressor. On top of the Poisson loss mentioned
above, these estimators now support sample weights (a short sketch
follows the plot example below). Also, an automatic early-stopping
criterion was added: early stopping is enabled by default when the
number of samples exceeds 10k. Finally, users can now define monotonic
constraints to constrain the predictions based on the variations of
specific features. In the following example, we construct a target
that is generally positively correlated with the first feature, with
some noise. Applying monotonic constraints allows the prediction to
capture the global effect of the first feature, instead of fitting the
noise.
import numpy as np
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.inspection import plot_partial_dependence
from sklearn.ensemble import HistGradientBoostingRegressor
n_samples = 500
rng = np.random.RandomState(0)
X = rng.randn(n_samples, 2)
noise = rng.normal(loc=0.0, scale=0.01, size=n_samples)
# increasing trend in the first feature, plus oscillations and a little noise:
y = 5 * X[:, 0] + np.sin(10 * np.pi * X[:, 0]) - noise
gbdt_no_cst = HistGradientBoostingRegressor().fit(X, y)
# monotonic_cst: 1 imposes an increasing constraint on the first feature,
# 0 leaves the second feature unconstrained
gbdt_cst = HistGradientBoostingRegressor(monotonic_cst=[1, 0]).fit(X, y)
disp = plot_partial_dependence(
gbdt_no_cst,
X,
features=[0],
feature_names=["feature 0"],
line_kw={"linewidth": 4, "label": "unconstrained", "color": "tab:blue"},
)
plot_partial_dependence(
gbdt_cst,
X,
features=[0],
line_kw={"linewidth": 4, "label": "constrained", "color": "tab:orange"},
ax=disp.axes_,
)
disp.axes_[0, 0].plot(
X[:, 0], y, "o", alpha=0.5, zorder=-1, label="samples", color="tab:green"
)
disp.axes_[0, 0].set_ylim(-3, 3)
disp.axes_[0, 0].set_xlim(-1, 1)
plt.legend()
plt.show()
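As noted above, these estimators also honor sample weights. A minimal
sketch with hypothetical weights, reusing X, y and n_samples from the
example above; samples with zero weight are effectively ignored during
fitting:
sample_weight = np.ones(n_samples)
# give the first half of the samples zero weight:
sample_weight[: n_samples // 2] = 0
# early_stopping='auto' (the default) turns early stopping on
# automatically once the number of samples exceeds 10k:
gbdt_sw = HistGradientBoostingRegressor().fit(X, y, sample_weight=sample_weight)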