Scikit-learn Algorithm Summary

This article summarizes the linear models, k-nearest neighbors, dimensionality reduction, and ensemble learning tools in Scikit-learn. Linear models cover LinearRegression and LogisticRegression; k-nearest neighbors covers KNeighborsClassifier and KNeighborsRegressor; dimensionality reduction covers PCA, IncrementalPCA, and KernelPCA; ensemble learning covers AdaBoostClassifier.

1 Linear Models

  These models live mainly under the sklearn.linear_model package.

Ordinary linear regression:

  • LinearRegression : f(x) = w · x + b

    Prototype: class sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)

from sklearn.linear_model import LinearRegression
from sklearn import datasets, model_selection

# Load the regression data: the diabetes dataset that ships with sklearn
def load_data_regression():
    '''
    :function: load the data; the dataset is sklearn's built-in diabetes dataset
    :return: model_selection.train_test_split splits the dataset into a training set and a test set;
             it returns a tuple    X_train, X_test, y_train, y_test
    '''
    diabetes = datasets.load_diabetes()
    diabetes_X = diabetes.data
    diabetes_y = diabetes.target
    print(diabetes_X)
    print(diabetes_y)
    return model_selection.train_test_split(diabetes_X, diabetes_y, test_size=0.25, random_state=0)

# Ordinary linear regression prediction
def test_LinearRegression(X_train, X_test, y_train, y_test):
    regr = LinearRegression()
    regr.fit(X_train, y_train)
    print('Score:%.2f' %regr.score(X_test, y_test))

X_train, X_test, y_train, y_test = load_data_regression()
test_LinearRegression(X_train, X_test, y_train, y_test)
Ordinary linear regression scores poorly on this dataset: Score:0.36
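To connect the fitted model back to f(x) = w · x + b, here is a small sketch (not part of the original example) that reads the learned weights and intercept from the estimator's coef_ and intercept_ attributes and also reports the test-set mean squared error; it assumes X_train, X_test, y_train, y_test come from the load_data_regression() call above.

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Minimal sketch (assumption: data comes from load_data_regression() above):
# inspect the fitted parameters of f(x) = w · x + b and report test-set errors.
regr = LinearRegression()
regr.fit(X_train, y_train)
print('Weights w:', regr.coef_)               # one weight per feature
print('Intercept b: %.2f' % regr.intercept_)
y_pred = regr.predict(X_test)
print('Test MSE: %.2f' % mean_squared_error(y_test, y_pred))
print('Test R^2: %.2f' % regr.score(X_test, y_test))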

Logistic regression:

  • LogisticRegression : f(x) = 1 / (1 + e^-(w · x + b))

    Prototype: class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, n_jobs=1)

    multi_class: a string that specifies the strategy for multiclass problems,

     - 'ovr': use the one-vs-rest strategy
     - 'multinomial': apply multinomial (softmax) logistic regression directly

    C: the inverse of the regularization strength; the smaller C is, the heavier the regularization term is weighted (a small sweep over C is sketched after the code example below)

from sklearn.linear_model import LogisticRegression
from sklearn import datasets, model_selection

# Load the classification data: the handwritten digits dataset that ships with sklearn
def load_data_classification():
    '''
    :function: load the data; the dataset is sklearn's built-in handwritten digits dataset
    :return: model_selection.train_test_split splits the dataset into a training set and a test set;
             it returns a tuple    X_train, X_test, y_train, y_test
    '''
    digits = datasets.load_digits()
    digits_X = digits.data
    digits_y = digits.target
    return model_selection.train_test_split(digits_X, digits_y, test_size=0.25, random_state=0)

# Logistic regression
def test_LogisticRegression(X_train, X_test, y_train, y_test):
    regr1 = LogisticRegression(multi_class='multinomial', solver='lbfgs')
    regr1.fit(X_train, y_train)
    print('Score:%.2f' %regr1.score(X_test, y_test))

X_train, X_test, y_train, y_test = load_data_classification()
test_LogisticRegression(X_train, X_test, y_train, y_test)
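
To make the role of C described above concrete, here is a small sketch (not from the original article) that sweeps a few values of C and prints the resulting test accuracy; smaller C means a stronger regularization penalty. It reuses the imports and load_data_classification() defined above, the helper name test_LogisticRegression_C is just for illustration, and max_iter is raised only to help the solver converge on this dataset.

# Sketch (assumption: reuses X_train, X_test, y_train, y_test from load_data_classification() above):
# vary the inverse regularization strength C and compare test accuracy.
def test_LogisticRegression_C(X_train, X_test, y_train, y_test):
    for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
        clf = LogisticRegression(C=C, max_iter=1000)  # larger max_iter helps convergence
        clf.fit(X_train, y_train)
        print('C=%.2f  Score:%.2f' % (C, clf.score(X_test, y_test)))

test_LogisticRegression_C(X_train, X_test, y_train, y_test)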