Machine Learning in Practice (2): Linear Regression Models

1. Linear Regression (LinearRegression)

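This class implements ordinary least squares regression, the most basic form of linear regression.
For the underlying theory, see the earlier post: Machine Learning Notes 2: Linear Regression and Least Squares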

class sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)

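Parameters:
fit_intercept : bool, optional. Whether to compute the intercept of the model. If set to False, no intercept is computed (the data is assumed to be already centered).
normalize : bool, optional, default False. If True, X is normalized before regression. This parameter is ignored when fit_intercept is set to False.
copy_X : bool, optional, default True. If True, X is copied; otherwise X may be overwritten.
n_jobs : int, optional, default 1. The number of jobs to use for the computation. If set to -1, all CPUs are used.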
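
To make the effect of fit_intercept concrete, here is a minimal sketch on made-up data that follows y = 3x + 2 exactly. The data values and variable names are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data lying exactly on the line y = 3*x + 2.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 3.0 * X[:, 0] + 2.0

# With an intercept, the model can recover both the slope and the offset.
with_intercept = LinearRegression(fit_intercept=True).fit(X, y)
# Without an intercept, the fitted line is forced through the origin.
no_intercept = LinearRegression(fit_intercept=False).fit(X, y)

print(with_intercept.coef_, with_intercept.intercept_)  # slope close to 3, intercept close to 2
print(no_intercept.coef_, no_intercept.intercept_)      # a steeper slope, intercept fixed at 0.0
```

Because this data is not centered, dropping the intercept distorts the slope, which is why fit_intercept=False is normally reserved for data that has already been centered.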

Attributes:

coef_ : array, shape (n_features,) or (n_targets, n_features) (see the theory notes for the reason). The estimated coefficients of the linear model.
residues_ : array, shape (n_targets,) or (1,) or empty
Sum of residuals. Squared Euclidean 2-norm for each target passed during the fit. If the linear regression problem is under-determined (the number of linearly independent rows of the training matrix is less than its number of linearly independent columns), this is an empty array. If the target vector passed during the fit is 1-dimensional, this is a (1,) shape array.
New in version 0.18.
intercept_ : array. The intercept term.
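
The two possible shapes of coef_ can be checked directly. The sketch below uses random data (an assumption for illustration) and fits once with a 1-D target and once with a 2-column target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(20, 3)          # 20 samples, 3 features
y_single = rng.randn(20)      # one target per sample
Y_multi = rng.randn(20, 2)    # two targets per sample

single = LinearRegression().fit(X, y_single)
multi = LinearRegression().fit(X, Y_multi)

print(single.coef_.shape)      # (3,)   -> (n_features,)
print(multi.coef_.shape)       # (2, 3) -> (n_targets, n_features)
print(multi.intercept_.shape)  # (2,)   -> one intercept per target
```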

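Methods:

fit(X, y, sample_weight=None)

Fit the linear model. This method appears in many of the other machine learning estimator classes as well.
Parameters:
X : numpy array or sparse matrix, shape [n_samples, n_features]. The training data.
y : numpy array, shape [n_samples, n_targets]. The target values.
sample_weight : numpy array, shape [n_samples]. The weight of each sample.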
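
As a rough illustration of sample_weight, the sketch below gives an outlier a weight of zero so that it no longer influences the fit. The data and the weights are assumptions made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 30.0])        # the last point is an outlier
weights = np.array([1.0, 1.0, 1.0, 0.0])   # zero weight removes its influence

unweighted = LinearRegression().fit(X, y)
weighted = LinearRegression().fit(X, y, sample_weight=weights)

print(unweighted.coef_)                     # slope pulled upward by the outlier
print(weighted.coef_, weighted.intercept_)  # close to slope 1, intercept 0
```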

get_params(deep=True)

Get parameters for this estimator.
Parameters:
deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
params : mapping of string to any
Parameter names mapped to their values.
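
A short sketch of get_params: it returns a plain dict of constructor arguments. The exact set of keys depends on the installed scikit-learn version, so only fit_intercept is inspected here.

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression(fit_intercept=False)
params = model.get_params()

print(type(params))             # a dict mapping parameter names to values
print(params["fit_intercept"])  # False, as passed to the constructor
```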

predict(X)

Predict using the fitted linear model. Returns an array of shape (n_samples,) containing the predicted values.
Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features). The test samples.
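
A minimal predict sketch, assuming training data that lies exactly on y = 2x + 1 (values invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X_train = np.array([[1.0], [2.0], [3.0]])
y_train = np.array([3.0, 5.0, 7.0])   # y = 2*x + 1

model = LinearRegression().fit(X_train, y_train)

X_test = np.array([[4.0], [5.0]])
y_pred = model.predict(X_test)

print(y_pred.shape)  # (2,) -> one prediction per test sample
print(y_pred)        # values close to 9 and 11
```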

score(X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
Parameters:
X : array-like, shape = (n_samples, n_features)
Test samples.
y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True values for X.
sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns:
score : float
R^2 of self.predict(X) wrt. y.
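
The R^2 definition can be reproduced by hand and compared with score. In this sketch u is the residual sum of squares and v is the total sum of squares; the synthetic data and the true coefficients (1.5, -2.0) are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(50, 2)
y = X @ np.array([1.5, -2.0]) + 0.1 * rng.randn(50)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

u = ((y - y_pred) ** 2).sum()      # residual sum of squares
v = ((y - y.mean()) ** 2).sum()    # total sum of squares
manual_r2 = 1.0 - u / v

print(manual_r2, model.score(X, y))  # the two values agree
```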

set_params(**params)

Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.
Returns: self
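
For nested objects, the component name and the parameter name are joined by a double underscore. The sketch below assumes a two-step pipeline whose step names "scale" and "reg" are hypothetical labels chosen for this example.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# "scale" and "reg" are arbitrary step names picked for illustration.
pipe = Pipeline([("scale", StandardScaler()), ("reg", LinearRegression())])

# Update a parameter of each nested component in a single call.
result = pipe.set_params(scale__with_mean=False, reg__fit_intercept=False)

print(result is pipe)                            # True: set_params returns self
print(pipe.get_params()["reg__fit_intercept"])   # False
print(pipe.named_steps["scale"].with_mean)       # False
```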

2. Ridge Regression (Ridge)

class sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, solver='auto', random_state=None)

Ridge regression is linear regression whose loss function is the linear least squares function with an added L2 regularization term.
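
Without an intercept, the objective ||Xw - y||^2 + alpha * ||w||^2 has the closed-form minimizer w = (X^T X + alpha * I)^(-1) X^T y. The sketch below checks this against Ridge on random data; the data and the choice fit_intercept=False are assumptions made so that the comparison is exact.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = rng.randn(30)
alpha = 1.0

# Closed-form ridge solution: solve (X^T X + alpha * I) w = X^T y.
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# The same problem solved by scikit-learn, with the intercept disabled.
model = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)

print(np.allclose(w_closed, model.coef_))  # True
```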

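Parameters:
alpha : {float, array-like}, shape (n_targets). The regularization strength; must be a positive float. Larger values specify stronger regularization.
copy_X : bool, optional, default True. If True, X is copied; otherwise X may be overwritten.
fit_intercept : bool, optional. Whether to compute the intercept of the model. If set to False, no intercept is computed (the data is assumed to be already centered).
max_iter : int, optional. Maximum number of iterations for the conjugate gradient solver. For ‘sparse_cg’ and ‘lsqr’, the default is the one used by scipy.sparse.linalg. For ‘sag’, the default is 1000.
normalize : bool, optional, default False. If True, X is normalized before regression. This parameter is ignored when fit_intercept is set to False.
solver : {‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’}. The solver used in the computation.

‘auto’ chooses the solver automatically based on the type of data.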
‘svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. More stable for singular matrices than ‘cholesky’.
‘cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution.
‘sparse_cg’ uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).
‘lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest but may not be available in old scipy versions. It also uses an iterative procedure.
‘sag’ uses a Stochastic Average Gradient descent. It also uses an iterative procedure, and is often faster than other solvers when both n_samples and n_features are large. Note that ‘sag’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
The last four solvers support both dense and sparse data. However, only ‘sag’ supports sparse input when fit_intercept is True.
New in version 0.17: Stochastic Average Gradient descent solver.

tol : float. Precision of the solution.
random_state : int seed, RandomState instance, or None (default)
The seed of the pseudo random number generator to use when shuffling the data. Used only in ‘sag’ solver.
New in version 0.17: random_state to support Stochastic Average Gradient.
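
To see how alpha and solver behave in practice, the sketch below fits Ridge on synthetic data for a few alpha values and compares two direct solvers. The data, the true coefficients, and the alpha grid are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.1 * rng.randn(30)

# Larger alpha means stronger regularization, so the coefficients shrink.
alphas = [0.01, 1.0, 100.0]
norms = [np.linalg.norm(Ridge(alpha=a).fit(X, y).coef_) for a in alphas]
print(norms)  # decreasing as alpha grows

# Different direct solvers reach the same solution for the same problem.
coef_svd = Ridge(alpha=1.0, solver="svd").fit(X, y).coef_
coef_chol = Ridge(alpha=1.0, solver="cholesky").fit(X, y).coef_
print(np.allclose(coef_svd, coef_chol))  # True
```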

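Attributes:

coef_ : array, shape (n_features,) or (n_targets, n_features) (see the theory notes for the reason). The estimated coefficients of the linear model.
intercept_ : array. The intercept term.
n_iter_ : the actual number of iterations for each target. Only available for the ‘sag’ and ‘lsqr’ solvers; other solvers return None.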
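
A small sketch of n_iter_, comparing a closed-form solver with an iterative one on random data (the data is an assumption for illustration; the exact iteration count depends on the data and on tol):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = rng.randn(40)

direct = Ridge(alpha=1.0, solver="cholesky").fit(X, y)
iterative = Ridge(alpha=1.0, solver="lsqr").fit(X, y)

print(direct.n_iter_)     # None: the closed-form solver does not iterate
print(iterative.n_iter_)  # iteration count reported by the lsqr solver
```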

Methods:

fit(X, y, sample_weight=None)

Fit the ridge regression model.
Parameters:
X : numpy array or sparse matrix, shape [n_samples, n_features]. The training data.
y : numpy array, shape [n_samples, n_targets]. The target values.
sample_weight : numpy array, shape [n_samples]. The weight of each sample.

get_params(deep=True)

Get parameters for this estimator.
Parameters:
deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
params : mapping of string to any
Parameter names mapped to their values.

predict(X)

Predict using the linear model.
Parameters:
X : {array-like, sparse matrix}, shape = (n_samples, n_features)
Samples.
Returns:
C : array, shape = (n_samples,)
Returns predicted values.

score(X, y, sample_weight=None)

Returns the coefficient of determination R^2 of the prediction.
The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
Parameters:
X : array-like, shape = (n_samples, n_features)
Test samples.
y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True values for X.
sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns:
score : float
R^2 of self.predict(X) wrt. y.

set_params(**params)

Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it is possible to update each component of a nested object.
Returns: self

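Example:

from sklearn.linear_model import Ridge
import numpy as np
# 10 samples with 5 features each
n_samples, n_features = 10, 5
# Fix the seed so the random data is reproducible
np.random.seed(0)
y = np.random.randn(n_samples)
X = np.random.randn(n_samples, n_features)
# Ridge regression with regularization strength alpha=1.0
clf = Ridge(alpha=1.0)
clf.fit(X, y)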
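
Once fitted, the same estimator can be inspected and used for prediction. The following sketch repeats the setup of the example so that it runs on its own; predicting on the first two training rows is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

n_samples, n_features = 10, 5
np.random.seed(0)
y = np.random.randn(n_samples)
X = np.random.randn(n_samples, n_features)
clf = Ridge(alpha=1.0).fit(X, y)

print(clf.coef_.shape)            # (5,): one coefficient per feature
print(clf.intercept_)             # a single intercept for the 1-D target
predictions = clf.predict(X[:2])  # predict for the first two samples
print(predictions.shape)          # (2,)
train_r2 = clf.score(X, y)        # R^2 on the training data
print(train_r2)
```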
