Scikit-learn Algorithm Summary
1 Linear Models
These live mainly in the sklearn.linear_model package.
Ordinary linear regression model:
LinearRegression : f(x) = w · x + b
Prototype: class sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)
from sklearn.linear_model import LinearRegression
from sklearn import datasets, model_selection

# Load regression data: sklearn's built-in diabetes dataset
def load_data_regression():
    '''
    :function: load the data: sklearn's built-in diabetes dataset
    :return: model_selection.train_test_split splits the dataset into a training
        set and a test set; returns a tuple X_train, X_test, y_train, y_test
    '''
    diabetes = datasets.load_diabetes()
    diabetes_X = diabetes.data
    diabetes_y = diabetes.target
    print(diabetes_X)  # peek at the raw feature matrix
    print(diabetes_y)  # peek at the targets
    return model_selection.train_test_split(diabetes_X, diabetes_y,
                                            test_size=0.25, random_state=0)

# Ordinary linear regression prediction
def test_LinearRegression(X_train, X_test, y_train, y_test):
    regr = LinearRegression()
    regr.fit(X_train, y_train)
    print('Score: %.2f' % regr.score(X_test, y_test))

X_train, X_test, y_train, y_test = load_data_regression()
test_LinearRegression(X_train, X_test, y_train, y_test)
Ordinary linear regression predicts poorly here: Score: 0.36
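Since the model is f(x) = w · x + b, the fitted w and b can be read off the estimator after training. A minimal sketch using the standard scikit-learn attributes coef_ and intercept_ (the split mirrors the section's code):

```python
from sklearn.linear_model import LinearRegression
from sklearn import datasets, model_selection

# Same split as above: sklearn's built-in diabetes dataset
X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.25, random_state=0)

regr = LinearRegression()
regr.fit(X_train, y_train)
print('w:', regr.coef_)       # one weight per feature (10 for diabetes)
print('b:', regr.intercept_)  # the learned intercept
```

Inspecting coef_ can hint at which features dominate the fit, though with only a 0.36 score the linear model explains little of the variance.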
Logistic regression model:
LogisticRegression : f(x) = 1 / (1 + e^-(w · x + b))
Prototype: class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, n_jobs=1)
multi_class: a string specifying the strategy for multiclass problems,
- 'ovr': one-vs-rest strategy
- 'multinomial': multinomial (softmax) logistic regression applied directly
C: the inverse of the regularization coefficient; the smaller C is, the stronger the regularization
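The effect of C can be checked directly: a smaller C means a stronger penalty, which shrinks the learned weights. A hedged sketch of that comparison (multi_class is omitted since multinomial behavior is the lbfgs default in recent scikit-learn; max_iter=1000 is only to ensure convergence on digits):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import datasets, model_selection

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.25, random_state=0)

norms = {}
for C in (0.01, 100.0):
    clf = LogisticRegression(C=C, solver='lbfgs', max_iter=1000)
    clf.fit(X_train, y_train)
    # Norm of the weight matrix: stronger penalty (smaller C) -> smaller norm
    norms[C] = float(np.linalg.norm(clf.coef_))
print(norms)
```

The coefficient norm for C=0.01 comes out well below the norm for C=100, matching the "smaller C, heavier regularization" rule above.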
from sklearn.linear_model import LogisticRegression
from sklearn import datasets, model_selection

# Load classification data: sklearn's built-in handwritten digits dataset
def load_data_classification():
    '''
    :function: load the data: sklearn's built-in handwritten digits dataset
    :return: model_selection.train_test_split splits the dataset into a training
        set and a test set; returns a tuple X_train, X_test, y_train, y_test
    '''
    digits = datasets.load_digits()
    digits_X = digits.data
    digits_y = digits.target
    return model_selection.train_test_split(digits_X, digits_y,
                                            test_size=0.25, random_state=0)

# Logistic regression
def test_LogisticRegression(X_train, X_test, y_train, y_test):
    regr1 = LogisticRegression(multi_class='multinomial', solver='lbfgs')
    regr1.fit(X_train, y_train)
    print('Score: %.2f' % regr1.score(X_test, y_test))
X_train, X_test, y_train, y_test = load_data_classification()
test_LogisticRegression(X_train, X_test, y_train, y_test)
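Beyond the aggregate score, the fitted classifier can label individual samples with predict. A small sketch under the same setup (multi_class is omitted since multinomial behavior is the lbfgs default in recent scikit-learn; max_iter=1000 avoids convergence warnings on digits):

```python
from sklearn.linear_model import LogisticRegression
from sklearn import datasets, model_selection

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(solver='lbfgs', max_iter=1000)
clf.fit(X_train, y_train)
print(clf.predict(X_test[:5]))  # predicted digit labels for 5 test samples
print(y_test[:5])               # true labels for comparison
```

predict_proba gives the per-class softmax probabilities instead of hard labels, which is often more useful when the predictions feed a downstream decision.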