Python Scikit-learn Tutorial

cunchi4221

Original link: https://www.journaldev.com/18341/python-scikit-learn-tutorial

Scikit-learn

Scikit-learn is a machine learning library for Python. It features several regression, classification and clustering algorithms, including SVMs, gradient boosting, k-means, random forests and DBSCAN. It is designed to work with NumPy and SciPy.

The scikit-learn project started as a Google Summer of Code (GSoC) project by David Cournapeau under the name scikits.learn. The name comes from "SciKit" (SciPy Toolkit), a separately developed third-party extension to SciPy.

Python Scikit-learn

Scikit-learn is written mostly in Python, with some of its core algorithms written in Cython for better performance.

Scikit-learn is meant for building models. It is not recommended for reading, manipulating and summarizing data, because other frameworks, such as pandas, are better suited to that purpose.

It is open source and released under the BSD license.

Install Scikit-learn

When this tutorial was written, scikit-learn assumed a working Python 2.7 or later installation with the NumPy (1.8.2 and above) and SciPy (0.13.3 and above) packages. Current releases require Python 3 and newer versions of both packages, so check the installation documentation for the exact minimums. Once these packages are installed, we can proceed with the installation.

For a pip installation, run the following command in the terminal:

pip install scikit-learn

If you prefer conda, install the package with the following command instead:

conda install scikit-learn

Using Scikit-learn

Once the installation is done, you can use scikit-learn in your Python code by importing it:

import sklearn
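
A quick way to confirm that the import works is to print the installed version. This is a minimal sketch; the variable name `version` is only illustrative, and the value printed depends on what pip or conda installed in your environment.

```python
import sklearn

# The version string depends on the release installed in this environment
version = sklearn.__version__
print(version)
```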

Loading a Dataset with Scikit-learn

Let's start by loading a dataset to play with. We will use a simple dataset named Iris. It describes iris flowers and contains 150 observations, each recording four measurements of a flower. Let's see how to load the dataset using scikit-learn.

# Import the datasets module from scikit-learn
from sklearn import datasets
# Load the data
iris = datasets.load_iris()
# Print the shape of the data to confirm it is loaded
print(iris.data.shape)

We print only the shape of the data for brevity; you can also print the whole dataset if you wish. Running the code prints (150, 4): 150 observations with 4 measurements each.
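
Beyond the raw measurements, the loaded object also carries the column names and the class labels. The following sketch shows a few of those attributes; it assumes nothing beyond the `load_iris` call used above.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Names of the four measured features (the columns of iris.data)
print(iris.feature_names)
# Names of the three species, encoded as 0, 1 and 2 in iris.target
print(iris.target_names)
# One class label per observation
print(iris.target.shape)
```

The `target` array is what the supervised examples later in this tutorial pass to `fit` as the labels.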

Scikit-learn SVM: Learning and Predicting

Now that we have loaded the data, let's learn from it and predict on new data. To do this, we create an estimator and then call its fit method.

from sklearn import svm
from sklearn import datasets
# Load dataset
iris = datasets.load_iris()
# Create the estimator (a linear support vector classifier)
clf = svm.LinearSVC()
# Learn from the data
clf.fit(iris.data, iris.target)
# Predict for unseen data
print(clf.predict([[5.0, 3.6, 1.3, 0.25]]))
# Attributes ending with an underscore hold the parameters learned during fit
print(clf.coef_)

Running this script prints the learned coefficient matrix, which has one row per class and one column per feature (3 rows by 4 columns for Iris).
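
To make the fit-then-predict pattern concrete, the sketch below inspects the shapes of the learned attributes and of the prediction. The `max_iter=10000` setting is an assumption added here to give the solver more room to converge; it is not part of the original example.

```python
from sklearn import datasets, svm

iris = datasets.load_iris()
# max_iter is raised above its default as a precaution (an assumption, not from the tutorial)
clf = svm.LinearSVC(max_iter=10000)
clf.fit(iris.data, iris.target)

# One row of weights per class, one column per feature
print(clf.coef_.shape)
# One intercept per class
print(clf.intercept_.shape)
# predict returns an array with one label per input row
prediction = clf.predict([[5.0, 3.6, 1.3, 0.25]])
print(prediction)
```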

Scikit-learn Linear Regression

Creating various models is rather simple with scikit-learn. Let's start with a simple regression example.

# Import the model
from sklearn import linear_model
reg = linear_model.LinearRegression()
# Fit it to the data
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
# Look at the fitted coefficients
print(reg.coef_)

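Running the model prints the fitted coefficients, [0.5 0.5]. The two features are identical in this data, so the weight is split evenly between them, and the fitted model reproduces the line on which the training points lie.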
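
Once fitted, the same model can predict the target for new inputs. This is a small sketch using the training data from the example; the query point `[3, 3]` and the names `new_point` and `predicted` are illustrative choices, not part of the original tutorial.

```python
from sklearn import linear_model

reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])

# Predict the target for a new point that continues the same line
new_point = [[3, 3]]
predicted = reg.predict(new_point)
print(predicted)
# The intercept is stored separately from the coefficients
print(reg.intercept_)
```

Because the training targets lie exactly on a line through the origin, the prediction for `[3, 3]` is 3 and the intercept is 0, up to floating-point rounding.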

k-Nearest Neighbour Classifier

Let's try a simple classification algorithm. This classifier can index the training samples with a tree structure such as a ball tree; by default, scikit-learn chooses the search algorithm automatically.

from sklearn import datasets
from sklearn import neighbors
# Load dataset
iris = datasets.load_iris()
# Create and fit a nearest-neighbor classifier
knn = neighbors.KNeighborsClassifier()
knn.fit(iris.data, iris.target)
# Predict and print the result
result = knn.predict([[0.1, 0.2, 0.3, 0.4]])
print(result)

Running the classifier prints [0]: the training samples nearest to this point all belong to class 0.
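
The sketch below spells out two constructor parameters and looks at the neighbours behind the prediction. Setting `n_neighbors=5` matches the library default, and `algorithm="ball_tree"` forces the ball-tree index rather than letting the library choose; the `query` variable is illustrative.

```python
from sklearn import datasets, neighbors

iris = datasets.load_iris()
# n_neighbors=5 is the default; algorithm="ball_tree" forces the ball-tree index
knn = neighbors.KNeighborsClassifier(n_neighbors=5, algorithm="ball_tree")
knn.fit(iris.data, iris.target)

query = [[0.1, 0.2, 0.3, 0.4]]
result = knn.predict(query)
# Distances to, and row indices of, the five nearest training samples
distances, indices = knn.kneighbors(query)
print(result)
# Classes of those five neighbours; the prediction is their majority vote
print(iris.target[indices[0]])
```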

K-means Clustering

K-means is one of the simplest clustering algorithms. The dataset is divided into k clusters, with each observation assigned to the cluster whose center is nearest. The centers are then recomputed from their assigned observations, and the two steps repeat until the clusters converge.

We will create one such clustering model in the following program:

from sklearn import cluster, datasets
# Load data
iris = datasets.load_iris()
# Create clusters for k=3
k = 3
k_means = cluster.KMeans(n_clusters=k)
# Fit data
k_means.fit(iris.data)
# Print the cluster label and the true class of every tenth sample
print(k_means.labels_[::10])
print(iris.target[::10])

Running the program prints two lists: the cluster assigned to every tenth sample, and the true class of the same samples. The cluster numbers are arbitrary, so they need not match the class numbers even when the grouping is the same.
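
The iterative assign-and-update procedure described at the start of this section can be sketched directly in NumPy. This is a teaching sketch under stated assumptions, not scikit-learn's implementation: the function name `lloyd_kmeans`, its parameters, and the random choice of starting centers are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn import datasets


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means loop (illustrative only, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every sample
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each center to the mean of its assigned samples
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so the clusters have converged
        centers = new_centers
    return labels, centers


iris = datasets.load_iris()
labels, centers = lloyd_kmeans(iris.data, k=3)
print(labels[::10])
print(centers.shape)
```

The library's `KMeans` adds refinements such as smarter initialization, so its labels may differ from those of this sketch.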

Conclusion

In this tutorial, we have seen that scikit-learn makes it easy to work with several machine learning algorithms. We have seen examples of regression, classification and clustering.

Scikit-learn is still under active development, maintained largely by volunteers, and is very popular in the community. Go and try your own examples.