Pandas Idioms

吴雄辉

Published 2020-10-09 03:15:40

Original article: https://medium.com/python-in-plain-english/pandas-idioms-23bfc93451e7

Python programmers will often suggest that there are many ways the language can be used to solve a particular problem, but that some are more appropriate than others. The best solutions are celebrated as idiomatic Python, and there are lots of great examples of this on Stack Overflow and other websites.

An idiomatic solution is often one which has both high performance and high readability, though this isn't necessarily true. As a sort of sub-language within Python, pandas has its own set of idioms. We've alluded to some of these already, such as using vectorization whenever possible and not using iterative loops if you don't need to. Several developers and users within the pandas community have used the term pandorable for these idioms. I think it's a great term. So, I wanted to share with you a couple of key features of how you can make your code pandorable.

The first of these is called method chaining. We saw previously that you can chain pandas calls together when you're querying DataFrames. For instance, if you wanted to select rows based on an index value like a county name, and then project only certain columns like the total population, you could write a query like df.loc["Washtenaw"]["Total Population"].

This is a form of chaining called chained indexing, and it's generally a bad practice, because pandas could be returning either a copy or a view of the DataFrame depending upon the underlying NumPy library. In his descriptions of idiomatic pandas patterns, developer Tom Augspurger described a rule of thumb for this: if you see back-to-back square brackets, then you should think carefully about whether you want to be doing chained indexing. I think this is great as a sort of code smell or anti-pattern.
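
To make the rule of thumb concrete, here is a minimal sketch that contrasts the back-to-back brackets with a single .loc call. The DataFrame, its column names, and its numbers are made up for illustration and are not the census data used in the lecture.

```python
import pandas as pd

# Hypothetical county-level data; the values are illustrative only.
df = pd.DataFrame(
    {"Total Population": [100, 200], "Area": [10, 20]},
    index=["Washtenaw", "Wayne"],
)

# Chained indexing: two separate indexing operations, back to back.
chained = df.loc["Washtenaw"]["Total Population"]

# A single .loc call selects the row and the column in one operation.
single = df.loc["Washtenaw", "Total Population"]
```

For reading a value, both forms return the same result. The difference matters most for assignment, where the chained form may write to a temporary copy instead of the original DataFrame.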

Method chaining, though, is a little bit different. The general idea behind method chaining is that every method on an object returns a reference to that object (or, as is typical in pandas, a new object of the same type). The beauty of this is that you can condense many different operations on a DataFrame, for instance, into one line or at least one statement of code. Here's an example of two pieces of code in pandas using our census data.

The first is the pandorable way to write the code with method chaining. In this code, there's no inplace flag being used, and you can see that we first run a where query, then a dropna, then a set_index, and then a rename. You might wonder why the whole statement is enclosed in parentheses; that's just to make the statement more readable. In Python, if you begin with an open parenthesis, you can span a statement over multiple lines, and things read a little bit nicer.
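
The original listing is not reproduced in this text, so the following is only a sketch of what such a chain might look like, following the where, dropna, set_index, rename sequence described above. The tiny DataFrame and its column names (SUMLEV, STNAME, CTYNAME, ESTIMATESBASE2010) are assumptions standing in for the census data.

```python
import pandas as pd

# Hypothetical stand-in for the census data; names and values are assumptions.
df = pd.DataFrame({
    "SUMLEV": [40, 50, 50],
    "STNAME": ["Michigan", "Michigan", "Michigan"],
    "CTYNAME": ["Michigan", "Washtenaw County", "Wayne County"],
    "ESTIMATESBASE2010": [1000, 300, 700],
})

result = (
    df.where(df["SUMLEV"] == 50)           # rows failing the condition become NaN
      .dropna()                            # drop the rows that were blanked out
      .set_index(["STNAME", "CTYNAME"])    # index by state and county name
      .rename(columns={"ESTIMATESBASE2010": "Estimates Base 2010"})
)
```

Each call returns a DataFrame, so the next method can be applied directly to the result, and the original df is left unchanged.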

The second example is a more traditional way of writing code.
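
Again as a sketch only, since the original listing is not included here, a step-by-step version might look like the following. It assumes the same hypothetical columns (SUMLEV, STNAME, CTYNAME, ESTIMATESBASE2010) and made-up values.

```python
import pandas as pd

# Hypothetical stand-in for the census data; names and values are assumptions.
df = pd.DataFrame({
    "SUMLEV": [40, 50, 50],
    "STNAME": ["Michigan", "Michigan", "Michigan"],
    "CTYNAME": ["Michigan", "Washtenaw County", "Wayne County"],
    "ESTIMATESBASE2010": [1000, 300, 700],
})

# One statement per step, modifying df as we go.
df = df[df["SUMLEV"] == 50]                         # boolean mask keeps county rows
df.set_index(["STNAME", "CTYNAME"], inplace=True)   # inplace flag instead of chaining
df.rename(columns={"ESTIMATESBASE2010": "Estimates Base 2010"}, inplace=True)
```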

There's nothing wrong with this code in the functional sense; you might even be able to understand it better as someone new to the language. It's just not as pandorable as the first example. Now, the key with any good idiom is to understand when it isn't helping you. In this case, you can actually time both methods and see that the latter method is faster. So, this is a particular example of a classic time versus readability trade-off.

You'll see lots of examples on Stack Overflow and in documentation of people using method chaining in their pandas code. And so, I think being able to read and understand the syntax is really worth your time.

Here's another pandas idiom. Python has a wonderful function called map, which is sort of a basis for functional programming in the language. When you want to use map in Python, you pass it some function you want called and some iterable, like a list, that you want the function to be applied to. The function is called against each item in the list, and the result is the collection of all of the evaluations of that function.

pandas has a similar function called applymap. In applymap, you provide some function which should operate on each cell of a DataFrame, and the return value is itself a DataFrame. Now, I think applymap is fine, but I actually rarely use it. Instead, I find myself often wanting to map across all of the rows in a DataFrame, and pandas has a function that I use heavily there, called apply. Let's look at an example.

Let's take our census DataFrame. In this DataFrame, we have five columns for population estimates, each column corresponding to one year of estimates. It's quite reasonable to want to create some new columns for minimum or maximum values, and the apply function is an easy way to do this.
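
Before turning to the census data, a small sketch may help connect the two ideas: built-in map calls a function on each item of a list, while DataFrame.apply with axis=1 calls a function on each row. The function name total and the numbers are hypothetical.

```python
import pandas as pd

def total(values):
    """Add up the items of one list or one row (hypothetical helper)."""
    return sum(values)

rows = [[1, 2, 3], [4, 5, 6]]

# Built-in map: call total on each inner list; list() collects the results.
mapped = list(map(total, rows))

# DataFrame.apply with axis=1: call total on each row of the DataFrame.
df = pd.DataFrame(rows, columns=["a", "b", "c"])
applied = df.apply(total, axis=1)
```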

First, we need to write a function which takes in a particular row of data, finds the minimum and maximum values, and returns a new row of data. We'll call this function min_max; it's pretty straightforward. We can create a small slice of a row by projecting the population columns, then use the NumPy min and max functions, and create a new Series whose labels and values represent the new values we want to apply.

Then we just need to call apply on the DataFrame. apply takes the function and the axis on which to operate as parameters. Now, we have to be a bit careful: we've talked about axis zero being the rows of the DataFrame in the past, but this parameter really names the index along which the function is applied. With axis equal to one, each row is passed to the function as a Series indexed by the column labels. So, to apply the function across all rows, you pass axis equal to one.
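
A minimal sketch of min_max and the apply call described above might look like this. The column names (POPESTIMATE2010 through POPESTIMATE2014) and the values are assumptions chosen to give five population-estimate columns.

```python
import numpy as np
import pandas as pd

# Five hypothetical population-estimate columns, one per year.
POP_COLS = [f"POPESTIMATE{year}" for year in range(2010, 2015)]

df = pd.DataFrame(
    [[10, 12, 11, 15, 14], [50, 40, 45, 42, 48]],
    columns=POP_COLS,
    index=["County A", "County B"],
)

def min_max(row):
    # Project just the population columns out of the row.
    data = row[POP_COLS]
    # Return a new Series whose labels become the columns of the result.
    return pd.Series({"min": np.min(data), "max": np.max(data)})

# axis=1 passes each row to min_max.
summary = df.apply(min_max, axis=1)
```

Because min_max returns a Series, the result of apply is a new DataFrame with one row per county and the columns min and max.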

Of course, there's no need to limit yourself to returning a new Series object. If you're doing this as part of data cleaning, you're likely to find yourself wanting to add new data to the existing DataFrame. In that case, you just take the row values and add in new columns indicating the maximum and minimum values. This is a regular part of my workflow when bringing in data and building summary or descriptive statistics, and it is often used heavily with the merging of DataFrames.
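
As a sketch of this variation, the function can return the row itself with the new values added. The column names and numbers are assumptions, and the copy of the row is a defensive choice made here rather than something the lecture prescribes.

```python
import numpy as np
import pandas as pd

# Five hypothetical population-estimate columns, one per year.
POP_COLS = [f"POPESTIMATE{year}" for year in range(2010, 2015)]

df = pd.DataFrame(
    [[10, 12, 11, 15, 14], [50, 40, 45, 42, 48]],
    columns=POP_COLS,
    index=["County A", "County B"],
)

def min_max(row):
    data = row[POP_COLS]
    row = row.copy()            # avoid mutating the object pandas passed in
    row["max"] = np.max(data)   # new label, so the row grows by one entry
    row["min"] = np.min(data)
    return row

# The result keeps the original columns and gains max and min.
df_with_stats = df.apply(min_max, axis=1)
```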

Okay, this is all great, and apply is an extremely important tool in your toolkit. But this lecture wasn't really supposed to be about the new features of the API, but about making pandorable code. The reason I introduced apply here is that you rarely see it used with large function definitions as we did. Instead, you typically see it used with lambdas.
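
A lambda version might look like the following sketch, which computes the row maximum in a single statement. As before, the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Five hypothetical population-estimate columns, one per year.
POP_COLS = [f"POPESTIMATE{year}" for year in range(2010, 2015)]

df = pd.DataFrame(
    [[10, 12, 11, 15, 14], [50, 40, 45, 42, 48]],
    columns=POP_COLS,
    index=["County A", "County B"],
)

# The lambda receives one row at a time because axis=1.
df["max"] = df.apply(lambda row: np.max(row[POP_COLS]), axis=1)
```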

https://www.coursera.org/learn/python-data-analysis/lecture/Ln156/pandas-idioms