How to Extract Data from Existing Series and DataFrame in Pandas


weixin_26705651


Original article: https://towardsdatascience.com/how-to-extract-data-from-existing-series-and-dataframe-in-pandas-8d6882814ab7


A key operation in preparing datasets is extracting information from existing data. Pandas provides two functions designed for this job: map() and apply(). Many existing tutorials introduce how to use these functions without setting up a proper context, so many beginners remain unsure which one to use in their own projects. This article addresses the problem by setting up concrete use scenarios for these functions, in the hope that you can translate them directly to your own business needs.

For the demonstrations, we'll use the following Series and DataFrame objects where applicable.

Sample Data
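The embedded sample-data snippet is not reproduced on this page, so here is a minimal stand-in. The variable names series0 and df0 and all of the values are assumptions made for illustration; only the 'col 0' style of column name comes from the text of the article.

```python
import pandas as pd

# Hypothetical sample data (names and values are assumptions, not the
# original article's data).
series0 = pd.Series([1, 2, 3, 4], name="series 0")

df0 = pd.DataFrame(
    {"col 0": [1, 2, 3], "col 1": [4, 5, 6], "col 2": [7, 8, 9]},
    index=["row 0", "row 1", "row 2"],
)

print(series0)
print(df0)
```

The later sketches rebuild small objects like these so that each one can be run independently.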

Scenario 1. Create a Series from an existing Series

Suppose that you start with a Series object and want to create another Series whose individual values are based on the corresponding existing values. The simplest approach is to call the map() function on the Series object. There are multiple ways to use map(), all shown in the following code snippet.

Mapping of Existing Series
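A sketch of what the three usages could look like, assuming they are a dictionary, a lambda function, and a regular function (the kinds of mapper the surrounding text mentions). The data and the helper name double are illustrative assumptions.

```python
import pandas as pd

series0 = pd.Series([1, 2, 3])  # illustrative data

# 1. A dictionary that maps each existing value to a new value
by_dict = series0.map({1: 2, 2: 4, 3: 6})

# 2. A lambda function called on each value
by_lambda = series0.map(lambda x: x * 2)

# 3. A regular named function called on each value
def double(x):
    return x * 2

by_function = series0.map(double)

print(by_dict.tolist(), by_lambda.tolist(), by_function.tolist())
```

One difference worth keeping in mind: with a dictionary, any value that is missing from the keys becomes NaN in the result, whereas a function is simply called on every value.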

The three usages above produce Series objects with the same values.
There is an additional argument that you can use, na_action, which specifies how the mapping handles NaN values in the existing Series. With 'ignore', NaN values are propagated to the new Series without being passed to the mapping; with None (the default), NaN values are passed to the mapping dictionary or function. The pandas documentation for Series.map() has more details.
Because the rows and columns of a DataFrame object are Series objects, this use case also covers creating a new column of a DataFrame from an existing column (it works for rows too). The DataFrame Map New Column example shows both usages: a new row and a new column created by mapping.
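Before moving on to the DataFrame example, the effect of na_action described in the notes above can be seen in a small sketch. The data and the function name describe are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

series_with_nan = pd.Series([1.0, np.nan, 3.0])  # illustrative data

def describe(x):
    # Hypothetical mapping function that turns any input into text
    return f"value: {x}"

# Default (na_action=None): NaN is passed to the function like any other value
passed = series_with_nan.map(describe)

# na_action='ignore': NaN skips the function and stays NaN in the result
ignored = series_with_nan.map(describe, na_action="ignore")

print(passed.tolist())
print(ignored.tolist())
```

Using 'ignore' is a convenient way to avoid writing NaN handling inside the mapping function itself.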

DataFrame Map New Column
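A minimal sketch of creating a new column and a new row by mapping, since the original snippet is not reproduced here. The data, the labels 'col 2' and 'row 3', and the lambdas are illustrative assumptions.

```python
import pandas as pd

df0 = pd.DataFrame(
    {"col 0": [1, 2, 3], "col 1": [4, 5, 6]},
    index=["row 0", "row 1", "row 2"],
)

# New column: map over an existing column, which is a Series
df0["col 2"] = df0["col 0"].map(lambda x: x * 10)

# New row: map over an existing row (also a Series) and assign it with .loc
df0.loc["row 3"] = df0.loc["row 0"].map(lambda x: x + 100)

print(df0)
```

The row is created after the column, so the mapped row already contains a value for 'col 2'.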

Scenario 2. Create a Series from multiple Series in a DataFrame

Sometimes you need to extract data from multiple rows or columns. To keep this tutorial simple, let's suppose that we need to create a column from other columns. In this case, we can use the apply() function on the DataFrame object. Just as with map(), you can pass either a lambda function or a regular function to apply().

DataFrame Create Column
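A sketch of this scenario under assumed data: one new column computed with a lambda, and another with a regular function that takes an extra argument n through args. The function name scaled_sum and the column labels 'col 2' and 'col 3' are hypothetical.

```python
import pandas as pd

df0 = pd.DataFrame({"col 0": [1, 2, 3], "col 1": [4, 5, 6]})  # illustrative data

# With axis=1 the lambda receives each row as a Series
df0["col 2"] = df0.apply(lambda row: row["col 0"] * row["col 1"], axis=1)

# A regular function with an extra parameter n, supplied through args
def scaled_sum(row, n):
    return (row["col 0"] + row["col 1"]) * n

df0["col 3"] = df0.apply(scaled_sum, args=(5,), axis=1)

print(df0)
```

For simple arithmetic like this, a vectorized expression such as df0["col 0"] * df0["col 1"] is usually faster; apply() becomes useful when the per-row logic cannot be expressed that way.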

The axis argument is set to 1, which means that the function is applied to each row and the results form a new column.
When axis=1, the first argument passed to the lambda or regular function is the row as a Series, whose values can be accessed by key (e.g., ['col 0']).
When you need to pass additional arguments, set the args argument; these values are sent to the mapping function after the row data. In the example, we pass 5 as n, the mapping function's extra argument.

Scenario 3. Create multiple Series from an existing Series

Suppose that you need to create multiple Series from an existing Series. You can use the apply() method on the Series object. Note that although it has the same name as the apply() function from the previous section, this one is a method of a Series object, while the previous one is a DataFrame method. They share a name because they serve similar purposes. The following code shows a trivial example of such usage.

Create Multiple Series (i.e., DataFrame) From a Series
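A trivial sketch of this usage with assumed data. The labels 'original' and 'doubled' are illustrative; they become the column names of the result.

```python
import pandas as pd

series0 = pd.Series([1, 2, 3])  # illustrative data

# The lambda returns a two-element Series for each value, so apply()
# assembles the results into a DataFrame with two columns.
expanded = series0.apply(
    lambda x: pd.Series([x, x * 2], index=["original", "doubled"])
)

print(expanded)
```

Each column of the resulting DataFrame is itself a Series, which is why this counts as creating multiple Series from one.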

As mentioned previously, because a DataFrame's columns are Series objects, we can create multiple columns from a single DataFrame column. The following code shows how it works.

New Columns From a Column
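A sketch of creating two columns from one, assuming a reasonably recent pandas version in which a list of new column names can be assigned from a DataFrame. The data and the labels 'col 0 x2' and 'col 0 x3' are illustrative assumptions.

```python
import pandas as pd

df0 = pd.DataFrame({"col 0": [1, 2, 3], "col 1": [4, 5, 6]})  # illustrative data

# The lambda returns a Series of two values per element, so apply() yields a
# two-column DataFrame; its columns are assigned in order to the two new
# column names listed on the left side.
df0[["col 0 x2", "col 0 x3"]] = df0["col 0"].apply(
    lambda x: pd.Series([x * 2, x * 3])
)

print(df0)
```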

As in the previous code example, the function passed to the apply() method on the Series object has to return a Series object (notice that the lambda function creates one). Here, each returned Series has two values, because we're creating two columns.
You have to specify the columns that you're creating on the left side of the assignment, so that the newly created columns know where to go.

Scenario 4. Create Multiple Series From Multiple Series (i.e., DataFrame)

In Pandas, a DataFrame object can be thought of as having multiple Series along both axes. Thus, the scenario in this section's title essentially means creating new columns from existing columns, or new rows from existing rows. The best way to do this is to use the apply() method on the DataFrame object. For simplicity, the example below only considers creating new columns from existing columns, and is followed by some clarifications.

Multiple Columns From Multiple Columns
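A sketch of this scenario with assumed data: two new columns, here called 'sum' and 'product' purely for illustration, are computed from two existing columns in a single apply() call.

```python
import pandas as pd

df0 = pd.DataFrame({"col 0": [1, 2, 3], "col 1": [4, 5, 6]})  # illustrative data

# The lambda returns a two-item list per row; result_type='expand' turns
# each list into two columns, which are assigned to the two new names.
df0[["sum", "product"]] = df0.apply(
    lambda x: [x["col 0"] + x["col 1"], x["col 0"] * x["col 1"]],
    axis=1,
    result_type="expand",
)

print(df0)
```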

The lambda function (you can use a regularly defined function too) returns a list of two items, because we're creating two columns.
The x in the lambda function represents the row data, and we can access individual values using the column names.
The important thing here is to set the result_type argument to 'expand', which expands the result of the lambda function into new columns. In other words, the two-item list returned by the lambda function is expanded into two columns.
Notably, there are a few other options for the result_type argument. The default is None, which lets the return type depend on what the function returns: a list-like result is not expanded and stays as a single value in the returned Series, whereas a returned Series is expanded to columns. Another option is 'reduce', which does the opposite of 'expand' by returning a Series where possible. The pandas documentation for DataFrame.apply() describes these options.
One thing not covered in the example is that, just as with the apply() method in Scenario 2, we can supply additional positional and keyword arguments to be used in the mapping function.
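The last two notes can be illustrated together in a small sketch: the same function is applied once with the default result_type and once with 'expand', the second time with an extra keyword argument. The function name sum_and_scaled and its parameter n are hypothetical.

```python
import pandas as pd

df0 = pd.DataFrame({"col 0": [1, 2, 3], "col 1": [4, 5, 6]})  # illustrative data

def sum_and_scaled(row, n=1):
    # Hypothetical mapping function returning a two-item list per row
    total = row["col 0"] + row["col 1"]
    return [total, total * n]

# Default result_type=None: each list is kept whole, giving a Series of lists
kept = df0.apply(sum_and_scaled, axis=1)

# result_type='expand' plus a keyword argument forwarded to the function
expanded = df0.apply(sum_and_scaled, axis=1, result_type="expand", n=5)

print(kept)
print(expanded)
```

Keyword arguments that apply() does not recognize as its own are forwarded to the function, so n=5 reaches sum_and_scaled unchanged.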

Conclusions

In this article, we covered how to create new Series objects (e.g., rows and columns) from existing data in the form of Series and DataFrame objects. We identified the four most common scenarios that you may encounter in your projects, so that you can adapt the code here to your own needs.

Translated from: https://towardsdatascience.com/how-to-extract-data-from-existing-series-and-dataframe-in-pandas-8d6882814ab7
