Many demos are available on the official website. The flowchart below is especially useful: based on the characteristics of your dataset, it guides you to a suitable method.
2. A small example of using sklearn
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed; use model_selection
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
iris_X = iris.data
iris_y = iris.target
print(iris_X[:2, :])  # print the first 2 rows of the data
print(iris_y)

# split the data into a training set and a test set; the test set takes 30%
X_train, X_test, y_train, y_test = train_test_split(iris_X, iris_y, test_size=0.3)
print(y_train)

knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
print(knn.predict(X_test))
print(y_test)
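A natural next step after predicting is to quantify accuracy on the held-out data with the classifier's score method. A minimal sketch (random_state is fixed here only to make the split reproducible; the exact score still depends on the split):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

knn = KNeighborsClassifier()
knn.fit(X_train, y_train)

# score() returns the mean accuracy on the given test data
score = knn.score(X_test, y_test)
print(round(score, 2))
```

KNN separates the iris classes well, so the accuracy is typically well above 0.9.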
3. sklearn datasets
sklearn ships with built-in datasets; take the housing dataset as an example:
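Older tutorials load the Boston housing data with load_boston, but that loader was removed in scikit-learn 1.2. Every built-in loader shares the same Bunch interface, sketched here with the bundled diabetes dataset as a stand-in:

```python
from sklearn.datasets import load_diabetes

# every built-in loader returns a Bunch with .data, .target, .feature_names, etc.
dataset = load_diabetes()
print(dataset.data.shape)    # feature matrix: (442, 10)
print(dataset.target.shape)  # regression targets: (442,)
print(dataset.feature_names)
```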
sklearn can also generate synthetic datasets; take make_regression, which generates data for regression problems, as an example:
Parameters:

n_samples : int, optional (default=100)
    The number of samples.
n_features : int, optional (default=100)
    The number of features.
n_informative : int, optional (default=10)
    The number of informative features, i.e., the number of features used to build the linear model used to generate the output.
n_targets : int, optional (default=1)
    The number of regression targets, i.e., the dimension of the y output vector associated with a sample. By default, the output is a scalar.
bias : float, optional (default=0.0)
    The bias term in the underlying linear model.
effective_rank : int or None, optional (default=None)
    If not None: the approximate number of singular vectors required to explain most of the input data by linear combinations. Using this kind of singular spectrum in the input allows the generator to reproduce the correlations often observed in practice.
    If None: the input set is well conditioned, centered and Gaussian with unit variance.
tail_strength : float between 0.0 and 1.0, optional (default=0.5)
    The relative importance of the fat noisy tail of the singular values profile if effective_rank is not None.
noise : float, optional (default=0.0)
    The standard deviation of the gaussian noise applied to the output.
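Putting the parameters above together, a minimal make_regression sketch (the specific values are illustrative, not defaults):

```python
from sklearn.datasets import make_regression

# 100 samples, 1 informative feature, Gaussian noise with standard deviation 10
X, y = make_regression(n_samples=100, n_features=1, n_informative=1,
                       n_targets=1, bias=0.0, noise=10.0)
print(X.shape)  # (100, 1)
print(y.shape)  # (100,)
```

Because n_targets=1, y comes back as a 1-D vector rather than a column matrix.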