Basic requirements
- Python (>=2.6 or >=3.3)
- NumPy (>=1.6.1)
pip3 install numpy
- SciPy (>=0.9)
pip3 install scipy
Install scikit-learn
pip3 install -U scikit-learn
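After installing, a quick way to confirm that the three packages import correctly is to print their versions. This is a minimal sketch. It relies only on the standard `__version__` attribute each package exposes.

```python
# Sanity check: import the three packages and print their installed versions
import numpy
import scipy
import sklearn

for name, module in [("numpy", numpy), ("scipy", scipy), ("scikit-learn", sklearn)]:
    print("%s: %s" % (name, module.__version__))
```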
Sklearn flowchart
General learning pattern
Learning with the iris dataset
The code is as follows:
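from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
iris = datasets.load_iris()
iris_X = iris.data    # features: 150 samples x 4 measurements
iris_y = iris.target  # labels: 0, 1, 2 (three iris species)
#print(iris_X[:2,:])
#print(iris_y)
# hold out 30% of the samples as a test set
X_train,X_test,y_train,y_test=train_test_split(iris_X,iris_y,test_size=0.3)
#print(y_train)
knn=KNeighborsClassifier()  # k-nearest neighbours, k=5 by default
knn.fit(X_train,y_train)    # train on the training set
print(knn.predict(X_test))  # predicted labels for the test set
print(y_test)               # true labels, for comparison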
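Comparing the two printed arrays by eye works for a small test set. The estimator's `score` method gives the same information as a single number (mean accuracy for classifiers). This sketch repeats the iris example and checks the result against a hand-computed accuracy. The `random_state=0` argument is an addition so the split is reproducible.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
# random_state=0 is an assumption added only to make the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

knn = KNeighborsClassifier()  # default n_neighbors=5
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)
manual_acc = np.mean(y_pred == y_test)  # fraction of correctly predicted labels
score_acc = knn.score(X_test, y_test)   # same metric through the estimator API
print(manual_acc, score_acc)
```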
Built-in datasets (sklearn.datasets)
Classic datasets
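The classic datasets are loaded with `load_*` functions. Each returns a dict-like `Bunch` object with at least `.data` (features) and `.target` (labels). The sketch below loads three of the datasets bundled with scikit-learn and prints their shapes. It also shows `return_X_y=True`, which returns the two arrays directly.

```python
from sklearn import datasets

# Each loader returns a Bunch with .data and .target arrays
for loader in (datasets.load_iris, datasets.load_digits, datasets.load_diabetes):
    bunch = loader()
    print(loader.__name__, bunch.data.shape, bunch.target.shape)

iris = datasets.load_iris()
print(iris.feature_names)
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']

# return_X_y=True skips the Bunch and returns (data, target)
X, y = datasets.load_iris(return_X_y=True)
print(X.shape, y.shape)    # (150, 4) (150,)
```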
# make_regression example
from sklearn import datasets
import matplotlib.pyplot as plt
# 1000 samples, 1 feature, 1 target, Gaussian noise with standard deviation 1
X,y = datasets.make_regression(n_samples=1000,n_features=1,n_targets=1,noise=1)
plt.scatter(X,y)
plt.show()
[Figure: scatter plot of the generated data, X against y]
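`make_regression` can also return the ground-truth coefficient it used to generate `y` (pass `coef=True`). That makes it easy to check that a straight-line fit recovers the slope. This is a sketch. The `random_state=0` argument and the use of `np.polyfit` for the line fit are additions for illustration.

```python
import numpy as np
from sklearn import datasets

X, y, true_coef = datasets.make_regression(
    n_samples=1000, n_features=1, n_targets=1, noise=1,
    coef=True, random_state=0)  # random_state added for reproducibility

print(X.shape, y.shape)  # (1000, 1) (1000,)

# With bias=0 (the default), y is roughly true_coef * x plus noise of std 1,
# so a least-squares line should have slope close to true_coef and intercept close to 0
slope, intercept = np.polyfit(X[:, 0], y, deg=1)
print(float(true_coef), slope, intercept)
```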
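Model attributes and methods
model.fit(): trains the model on the data
model.predict(): makes predictions with the trained model
model.coef_: the fitted coefficients, one per independent variable (feature)
model.intercept_: the fitted intercept (bias term)
model.get_params(): returns the model's hyperparameters (the arguments it was constructed with). To measure prediction accuracy, use model.score(X, y) instead. It returns R² for regressors and mean accuracy for classifiers.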
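Here is a short sketch of these attributes on a `LinearRegression` model. The data come from `make_regression` with an explicit `bias=5.0`, so the fitted `intercept_` has a known target. The dataset parameters and `random_state=0` are illustrative choices, not from the original.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# 200 samples, 3 features, true intercept 5.0 (illustrative values)
X, y = datasets.make_regression(n_samples=200, n_features=3, noise=1,
                                bias=5.0, random_state=0)

model = LinearRegression()
model.fit(X, y)                  # train
print(model.predict(X[:3]))      # predictions for the first three samples
print(model.coef_)               # one coefficient per feature, shape (3,)
print(model.intercept_)          # fitted bias term, close to 5.0 here
print(model.get_params())        # hyperparameters such as fit_intercept
print(model.score(X, y))         # R^2 on the training data
```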
Standardizing data
First, import preprocessing:
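from sklearn import preprocessing
import numpy as np
# a 3x3 matrix whose columns have very different ranges
a=np.array([[10,2.7,3.6],
[-100,5,-2],
[120,20,40]],dtype=np.float64)
print(a)
# standardize each column to zero mean and unit variance
print(preprocessing.scale(a))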
Output:
[[ 10. 2.7 3.6]
[-100. 5. -2. ]
[ 120. 20. 40. ]]
[[ 0. -0.85170713 -0.55138018]
[-1.22474487 -0.55187146 -0.852133 ]
[ 1.22474487 1.40357859 1.40351318]]
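After standardization (each column shifted to mean 0 and scaled to unit variance), the columns take values in roughly the same range.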
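`preprocessing.scale` applies the z-score z = (x - mean) / std to every column, using the column mean and the population standard deviation (ddof=0). The sketch below computes that formula by hand on the same matrix and checks that it matches.

```python
import numpy as np
from sklearn import preprocessing

a = np.array([[10, 2.7, 3.6],
              [-100, 5, -2],
              [120, 20, 40]], dtype=np.float64)

# z = (x - column mean) / column std, with the population std (ddof=0)
manual = (a - a.mean(axis=0)) / a.std(axis=0)
scaled = preprocessing.scale(a)

print(np.allclose(manual, scaled))  # True
print(scaled.mean(axis=0))          # approximately [0, 0, 0]
print(scaled.std(axis=0))           # approximately [1, 1, 1]
```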
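Standardizing the feature ranges can also give more accurate predictions, especially for models that are sensitive to feature scale.
Verification: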
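from sklearn import preprocessing
from sklearn.model_selection import train_test_split
# sklearn.datasets.samples_generator was removed in scikit-learn 0.24; import from sklearn.datasets
from sklearn.datasets import make_classification
from sklearn.svm import SVC
# import matplotlib.pyplot as plt  # needed only for the optional plot below
# 300 samples, 2 informative features; scale=100 multiplies every feature by 100
X,y=make_classification(n_samples=300,n_features=2,n_redundant=0,n_informative=2,random_state=22,n_clusters_per_class=1,scale=100)
# plt.scatter(X[:,0],X[:,1],c=y)
# plt.show()
# uncomment the next line to standardize X (the "after" run)
# X=preprocessing.scale(X)
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=.3)
clf = SVC()
clf.fit(X_train,y_train)
print(clf.score(X_test,y_test))  # mean accuracy on the test set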
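Score before standardization:
0.4666666666666667
Score after standardization:
0.9444444444444444
On this data, standardization made the SVM noticeably more accurate. These scores come from one random train/test split, so your numbers will vary between runs. Also, since scikit-learn 0.22, SVC defaults to gamma='scale', which makes the unscaled model less sensitive to feature scale, so the gap may be smaller on newer versions.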
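The verification above standardizes all of X before splitting, so the test set influences the mean and variance used for scaling. A more careful variant puts `StandardScaler` in a pipeline, so the scaling statistics are learned from the training set only. This sketch compares an unscaled SVC with the pipeline. It uses `gamma='auto'` (1 / n_features, which ignores feature scale) to make the effect of scaling visible, and adds `random_state=0` on the split. Both are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           n_informative=2, random_state=22,
                           n_clusters_per_class=1, scale=100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)  # random_state added for repeatability

# gamma='auto' does not adapt to feature scale, unlike the current default 'scale'
raw = SVC(gamma='auto').fit(X_train, y_train)

# The pipeline learns mean/std on X_train only and reuses them on X_test
scaled = make_pipeline(StandardScaler(), SVC(gamma='auto')).fit(X_train, y_train)

raw_score = raw.score(X_test, y_test)
scaled_score = scaled.score(X_test, y_test)
print("without scaling:", raw_score)
print("with StandardScaler:", scaled_score)
```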
A later post will cover cross-validation in sklearn.