The Three Musketeers of ML: An Introduction to NumPy, Pandas, and Sklearn

weixin_26726011

Tags: python, numpy, machine learning, artificial intelligence

Original article: https://medium.com/godatascience/introduction-to-numpy-pandas-and-sklearn-the-three-musketeers-of-ml-33f5141d7e6a

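This article introduces the basics of three important machine learning libraries: NumPy, Pandas, and Scikit-learn (Sklearn), the core Python tools for data analysis and machine learning. NumPy provides efficient multi-dimensional array operations, Pandas provides a powerful DataFrame for data cleaning and preprocessing, and Sklearn is a key library for building and evaluating machine learning models.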

Who is the target audience?

This tutorial is for anyone who wants to learn the basics of NumPy, Pandas, and Sklearn. It is especially useful for algorithm developers and for anyone curious about Machine Learning, whether you want in-depth knowledge of ML or just need to brush up on a few concepts. After completing it, you will have a basic level of expertise from which you can move on to more advanced material.

Why would this tutorial prove useful?

Since libraries are an integral part of data preprocessing, understanding them is of utmost importance. Knowing the functions these libraries provide can make your coding tasks much simpler and save you precious time and energy.

To explore any path, we need to brush up on the skills that lay the foundation and ease our journey to the final destination.

In-depth knowledge of Python libraries lays this strong foundation for mastering Machine Learning, which proves essential in the long run.

NumPy, Pandas, and Scikit-learn are some of these important libraries. They can make machine learning a whole lot easier and save time, and they are the pillars on which a strong model can be designed.

What are Python libraries?

A Python library is a reusable chunk of code that you may want to include in your programs or projects. A library typically bundles many useful modules that you can import for everyday programming.

With technology reaching astonishing heights, Data Science, Artificial Intelligence, and Machine Learning are buzzwords we hear frequently. They have transformed the way we live, and the technology has proved to be a wonder in itself. So what's all the fuss about?

What is Machine Learning?

Here is what you need to know to begin your journey toward excelling in machine learning.

Machine Learning (ML) is an application of Artificial Intelligence (AI) that gives a system the ability to learn and improve from experience without being explicitly programmed. The formal definition of ML is:

A computer program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.

Okay, now that we are clear on what Machine Learning is, let us understand why we should invest time in mastering it.

Goals of studying Machine Learning:

To make computers smarter. The more direct objective here is to develop systems for specific practical learning tasks in an application domain.
To develop computational models of the human learning process and perform computer simulations.
To explore new learning methods and develop general learning algorithms that are independent of any one application.

Now, let us dive right into ML and start our journey to master it. Let us first get acquainted with some Python libraries required for Machine Learning.

(Note: The scope of Python libraries is too vast to cover fully, so this article covers only the basics needed to get you going with ease.)

If you are thinking about a career in Machine Learning or Data Science, the very first thing you will need to do is study some libraries.

Why are libraries important in Machine Learning?

Machine Learning is largely based on mathematics. Designing an ML model involves complex mathematical calculations. Python libraries enable us to do these calculations without writing numerous lines of code.
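As a rough illustration of that point, the sketch below computes the same dot product twice: once with a hand-written loop and once with NumPy's `np.dot`. The helper name `dot_product_loop` and the sample values are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np


def dot_product_loop(a, b):
    """Dot product written by hand (hypothetical helper, for comparison only)."""
    total = 0
    for x, y in zip(a, b):
        total += x * y
    return total


a = [1, 2, 3]
b = [4, 5, 6]

# Explicit Python loop
loop_result = dot_product_loop(a, b)

# The same calculation as a single library call
numpy_result = np.dot(np.array(a), np.array(b))

print(loop_result, numpy_result)
```

Both approaches give the same number; the library call simply hides the loop.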

Basic Study of the NumPy Library:

NumPy forms the foundation of the machine learning stack. NumPy (Numerical Python) is a Python package consisting of multi-dimensional array objects and a collection of routines for processing them.

In this article, we will cover NumPy operations that are frequently used in ML.

First, we need to import the NumPy library using the following code:

import numpy as np

Once we import the NumPy library, we can use the routines that come with it to perform array operations with ease. These include:

1. Creating a Vector:

A 1-D array is known as a vector. A vector can be created using NumPy as follows:

# Load library
import numpy as np

# Create a vector as a row
vector_row = np.array([11, 21, 31])

# Create a vector as a column
vector_column = np.array([[15], [25], [35]])
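To see how the row and column forms differ, it may help to inspect their `shape` attribute. The sketch below reuses the two vectors from the listing above; the name `as_column` is an illustrative assumption.

```python
import numpy as np

vector_row = np.array([11, 21, 31])
vector_column = np.array([[15], [25], [35]])

# A 1-D array has a single axis
print(vector_row.shape)     # (3,)

# A column vector is a 2-D array with 3 rows and 1 column
print(vector_column.shape)  # (3, 1)

# reshape(-1, 1) turns the 1-D vector into a column; -1 lets NumPy infer the row count
as_column = vector_row.reshape(-1, 1)
print(as_column.shape)      # (3, 1)
```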

2. Creating a Matrix: A 2-D array is known as a matrix. It can be created using NumPy as follows:

# Load library
import numpy as np

# Create a matrix
matrix = np.array([[1, 2, 3], [41, 52, 63]])
print(matrix)
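Once a matrix exists, a few attributes describe its layout. The following sketch, using the same matrix as above, prints its shape, number of dimensions, element count, and the shape of its transpose.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [41, 52, 63]])

print(matrix.shape)    # (2, 3): 2 rows, 3 columns
print(matrix.ndim)     # 2: number of axes
print(matrix.size)     # 6: total number of elements
print(matrix.T.shape)  # (3, 2): the transpose swaps rows and columns
```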

3. Selecting Elements: One or more elements can be selected from a vector or matrix using NumPy indexing as follows:

# Load library
import numpy as np

# Create a vector as a row
vector_row = np.array([1, 2, 3, 4, 5, 6])

# Create a matrix
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix)

# Select the 3rd element of the vector
print(vector_row[2])

# Select the element in the 2nd row, 2nd column
print(matrix[1, 1])

# Select all elements of the vector
print(vector_row[:])

# Select everything up to and including the 3rd element
print(vector_row[:3])

# Select everything after the 3rd element
print(vector_row[3:])

# Select the last element
print(vector_row[-1])

# Select the first 2 rows and all the columns of the matrix
print(matrix[:2, :])

# Select all rows and the 2nd column of the matrix (kept as a 2-D column)
print(matrix[:, 1:2])
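One detail worth noting when selecting a column: an integer index drops that axis, while a slice keeps it. The sketch below contrasts the two on the same 3x3 matrix; the names `col_1d` and `col_2d` are illustrative assumptions.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Integer index: the column comes back as a 1-D array
col_1d = matrix[:, 1]

# Slice: the column comes back as a 2-D array with one column
col_2d = matrix[:, 1:2]

print(col_1d)        # [2 5 8]
print(col_1d.shape)  # (3,)
print(col_2d.shape)  # (3, 1)
```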

Basic Study of the Pandas Library:

Pandas, whose name is derived from "panel data", has so many uses that it might save time to point out the things it cannot do instead of what it can! Just as humans have basic needs, Pandas is the basic need for your data. Pandas helps in analyzing, cleaning, and transforming your data.
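As a small sketch of what analyzing, cleaning, and transforming can look like in practice, the example below works on a tiny made-up table. The column names and values are assumptions chosen for illustration, and Pandas is imported under its conventional alias `pd`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing temperature
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune"],
    "temp_c": [31.0, np.nan, 29.0],
})

# Analyze: count missing values per column
missing = df.isna().sum()

# Clean: fill the missing temperature with the column mean
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())

# Transform: derive a new column in Fahrenheit
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

print(missing)
print(df)
```

Filling with the mean is only one possible cleaning strategy; dropping the row with `dropna()` would be another.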

We will now look at some essential information about Pandas and its use.

Pandas is usually imported under a shorter name (pd), since this alias is easy to type and widely used.

import pandas as pd

The two primary components of Pandas are the Series and the DataFrame.

A Series is essentially a column, and a DataFrame is a two-dimensional table made up of a collection of Series.
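The relationship between the two can be sketched as follows: two Series are combined column-wise into a DataFrame, and selecting one column gives a Series back. The Series names and values here are assumptions for illustration.

```python
import pandas as pd

# Two Series, each acting as one named column
pears = pd.Series([3, 7, 0, 11], name="pears")
oranges = pd.Series([0, 9, 5, 2], name="oranges")

# Combine the Series side by side (axis=1) into a DataFrame
fruit = pd.concat([pears, oranges], axis=1)

print(fruit.shape)           # (4, 2): 4 rows, 2 columns
print(type(fruit["pears"]))  # selecting a single column returns a Series
```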

There are many ways to create a DataFrame. One of the simplest is to build a dictionary and pass it to the DataFrame constructor.

1. Creating a DataFrame and locating values:

a. Create a dictionary

data = {'Pears': [3, 7, 0, 11], 'oranges': [0, 9, 5, 2]}

b. Pass it to the DataFrame constructor

orders = pd.DataFrame(data)

A dictionary in Python is a collection of key-value pairs. Here, each key becomes a column name and each list of values becomes that column's data.

By default the rows are numbered 0, 1, 2, and so on. Let's give each row a label by passing an index.

orders = pd.DataFrame(data, index=['Jonas', 'Dan', 'Serena', 'Emily'])

c. Locate values

orders.loc['Serena']
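Label-based lookup with `.loc` can be contrasted with position-based lookup using `.iloc`. The sketch below rebuilds the same table and fetches Serena's row both ways, plus a single cell; the variable names `by_label`, `by_position`, and `single_value` are illustrative assumptions.

```python
import pandas as pd

data = {"Pears": [3, 7, 0, 11], "oranges": [0, 9, 5, 2]}
orders = pd.DataFrame(data, index=["Jonas", "Dan", "Serena", "Emily"])

# Row selected by its index label
by_label = orders.loc["Serena"]

# The same row selected by integer position (third row, so position 2)
by_position = orders.iloc[2]

# A single cell: row label first, then column name
single_value = orders.loc["Dan", "oranges"]

print(by_label)
print(single_value)
```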

2. Reading values from a CSV file:

With CSV files, all you need is a single line to load the data:

df = pd.read_csv('Address where your csv is stored')
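To try `pd.read_csv` without a file on disk, one option is to read from an in-memory text buffer. The CSV text below is made up for illustration, and using the `name` column as the index via `index_col` is an assumption about how you might want the rows labeled.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would live in a file
csv_text = """name,Pears,oranges
Jonas,3,0
Dan,7,9
"""

# read_csv accepts a file path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text), index_col="name")

print(df.head())
```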

Basic Study of the Scikit-learn Library:

If you are looking for a robust library that you can use to bring your machine learning models into production, Scikit-learn is often a preferred option.

Scikit-learn supports many of the tasks involved in machine learning, such as classification, regression, clustering, and model selection.
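As a minimal sketch of one of those tasks, the example below trains a classifier on the Iris dataset that ships with Scikit-learn. The choice of `LogisticRegression`, the 25% test split, and the fixed `random_state` are assumptions made for illustration, not recommendations from the original article.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load features (X) and class labels (y) from the bundled Iris dataset
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a classifier on the training split
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict on unseen data and measure accuracy
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Most Scikit-learn estimators follow the same `fit` then `predict` pattern, so swapping in a different model usually changes only the import and the constructor line.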

You name it, and Scikit-learn likely has a module for it.

These are the basic prerequisites to get you started with some basic ML models. The deeper you dive, the more libraries you will explore.

(Image Source: Internet)
