Querying a DataFrame

weixin_26729763

Original article: https://medium.com/python-in-plain-english/querying-a-dataframe-821fb7649a26

Before we talk about how to query data frames, we need to talk about Boolean masking. Boolean masking is the heart of fast and efficient querying in NumPy. It’s somewhat analogous to masking used in other computational areas. A Boolean mask is an array, which can be one-dimensional like a series or two-dimensional like a data frame, where each value in the array is either true or false. This array is essentially overlaid on top of the data structure that we’re querying. Any cell aligned with a true value will be admitted into our final result, and any cell aligned with a false value will not.
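
To make the overlay idea concrete, here is a minimal NumPy sketch. The array values and the mask are made up purely for illustration; they are not taken from the Olympics data.

```python
import numpy as np

# Toy values, purely illustrative
values = np.array([3, 0, 7, 0, 1])

# A Boolean mask with the same shape as `values`
mask = np.array([True, False, True, False, True])

# Indexing with the mask keeps only the cells aligned with True
selected = values[mask]
print(selected)  # [3 7 1]
```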

Boolean masking is powerful conceptually and is the cornerstone of efficient NumPy and pandas querying. This technique is widely used in other areas of computer science, for instance, in graphics. But it doesn’t really have an analogue in traditional relational databases, so I think it’s worth pointing out here. Boolean masks are created by applying operators directly to pandas Series or DataFrame objects.

For instance, in our Olympics data set, you might be interested in seeing only those countries who have won a gold medal at the summer Olympics. To build a Boolean mask for this query, we project the gold column using the indexing operator and apply the greater-than operator with a comparison value of zero. This essentially broadcasts the greater-than comparison across the column, with the results returned as a Boolean series. The index of the resulting series is the country name, and the value of each cell is either true or false depending on whether that country has won at least one gold medal.
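
As a minimal sketch of this step, assume a toy data frame standing in for the Olympics data. The column names `Gold` and `Winter_Gold`, the country labels, and the counts are all made up for illustration and are not the real data set.

```python
import pandas as pd

# Toy stand-in for the Olympics data; names and numbers are hypothetical
df = pd.DataFrame(
    {"Gold": [5, 0, 2, 0], "Winter_Gold": [1, 1, 0, 0]},
    index=["Country A", "Country B", "Country C", "Country D"],
)

# Comparing a column with a scalar broadcasts the comparison to every row
gold_mask = df["Gold"] > 0
print(gold_mask)
```

The result is a Boolean series with the same index as the data frame, one true or false value per country.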

So this builds us the Boolean mask, which is half the battle. What we want to do next is overlay that mask on the data frame. We can do this using the where function. The where function takes a Boolean mask as a condition, applies it to the data frame or series, and returns a new data frame or series of the same shape. Let’s apply this Boolean mask to our Olympics data and create a data frame of only those countries who have won a gold at a summer games.
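
The overlay step might look roughly like this. The data frame is again a made-up stand-in (hypothetical column names `Gold` and `Winter_Gold`, placeholder countries), not the real Olympics data.

```python
import pandas as pd

# Toy stand-in for the Olympics data; names and numbers are hypothetical
df = pd.DataFrame(
    {"Gold": [5, 0, 2, 0], "Winter_Gold": [1, 1, 0, 0]},
    index=["Country A", "Country B", "Country C", "Country D"],
)

# where() keeps rows where the mask is True and fills the rest with NaN
only_gold = df.where(df["Gold"] > 0)
print(only_gold)
```

Note that the shape is unchanged: rows that fail the condition are still present, just filled with NaN.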

We see that the resulting data frame keeps the original index values, and only data from countries that met the condition is retained. All of the countries which did not meet the condition have NaN data instead. This is okay. Most statistical functions built into the data frame object ignore NaN values. For instance, if we call count on the gold-only data frame, we see that there are 100 countries which have had gold medals awarded at the summer games, while if we call count on the original data frame, we see that there are 147 countries in total.
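
A small sketch of how count treats NaN, using a made-up four-country data frame (hypothetical names and numbers, so the counts here are 2 and 4 rather than the 100 and 147 from the real data):

```python
import pandas as pd

# Toy stand-in for the Olympics data; names and numbers are hypothetical
df = pd.DataFrame(
    {"Gold": [5, 0, 2, 0], "Winter_Gold": [1, 1, 0, 0]},
    index=["Country A", "Country B", "Country C", "Country D"],
)
only_gold = df.where(df["Gold"] > 0)

# count() skips NaN, so masked-out rows are not counted
gold_only_count = only_gold["Gold"].count()
total_count = df["Gold"].count()
print(gold_only_count, total_count)  # 2 4
```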

Often we want to drop those rows which have no data. To do this, we can use the dropna function. You can optionally tell dropna which axis it should consider. Remember that the axis is just an indicator for the columns or rows, and that the default is zero, which means rows.
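
Dropping the empty rows could be sketched as follows, again on a made-up data frame with hypothetical column names and placeholder countries:

```python
import pandas as pd

# Toy stand-in for the Olympics data; names and numbers are hypothetical
df = pd.DataFrame(
    {"Gold": [5, 0, 2, 0], "Winter_Gold": [1, 1, 0, 0]},
    index=["Country A", "Country B", "Country C", "Country D"],
)
only_gold = df.where(df["Gold"] > 0)

# axis=0 (the default) drops rows that contain missing values
cleaned = only_gold.dropna(axis=0)
print(cleaned)
```

Only the rows that passed the mask remain, so the result is shorter than the original data frame.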

When you find yourself talking about pandas and saying phrases like “often I want to,” it’s quite likely the developers have included a shortcut for this common operation.

For instance, in this example, we don’t actually have to use the where function explicitly. The pandas developers allow the indexing operator to take a Boolean mask as a value instead of just a list of column names. The syntax might look a little messy, especially if you’re not used to programming languages with overloaded operators, but the result is that you’re able to filter and reduce data frames relatively quickly.
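
Here is a minimal sketch of passing the mask straight to the indexing operator. The data frame is a made-up stand-in with hypothetical column names, not the real Olympics data.

```python
import pandas as pd

# Toy stand-in for the Olympics data; names and numbers are hypothetical
df = pd.DataFrame(
    {"Gold": [5, 0, 2, 0], "Winter_Gold": [1, 1, 0, 0]},
    index=["Country A", "Country B", "Country C", "Country D"],
)

# The indexing operator accepts a Boolean mask and returns only matching rows
only_gold = df[df["Gold"] > 0]
print(only_gold)
```

Unlike the where approach, the rows that fail the condition are gone rather than filled with NaN.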

Here’s a more concise example of how we could query this data frame. You’ll notice that there are no NaNs when you query the data frame in this manner. pandas automatically filters out the rows with no values.

One more thing to keep in mind if you’re not used to Boolean or bit masking for data reduction: the output of two Boolean masks combined with logical operators is another Boolean mask. This means that you can chain together a bunch of and/or statements in order to create more complex queries, and the result is a single Boolean mask.

For instance, we could create a mask for all of those countries who have received a gold in the summer Olympics and logically OR that with all of those countries who have received a gold in the winter Olympics. If we apply this to the data frame and use the length function to see how many rows there are, we see that there are 101 countries which have won a gold medal at some time.
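
Combining two masks with OR might look like this. The data frame, the `Gold` and `Winter_Gold` column names, and the counts are assumptions for illustration, so the length here is 3 rather than the 101 from the real data.

```python
import pandas as pd

# Toy stand-in for the Olympics data; names and numbers are hypothetical
df = pd.DataFrame(
    {"Gold": [5, 0, 2, 0], "Winter_Gold": [1, 1, 0, 0]},
    index=["Country A", "Country B", "Country C", "Country D"],
)

# `|` combines two Boolean masks element-wise into a single mask
any_gold = df[(df["Gold"] > 0) | (df["Winter_Gold"] > 0)]
print(len(any_gold))  # 3
```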

Another example for fun. Have there been any countries who have only won a gold in the winter Olympics and never in the summer Olympics? Here’s one way to answer that.
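
One possible way to write that query is sketched below on a made-up data frame. The column names and the placeholder countries are hypothetical, so the toy result is not the real answer.

```python
import pandas as pd

# Toy stand-in for the Olympics data; names and numbers are hypothetical
df = pd.DataFrame(
    {"Gold": [5, 0, 2, 0], "Winter_Gold": [1, 1, 0, 0]},
    index=["Country A", "Country B", "Country C", "Country D"],
)

# `&` requires both conditions: a winter gold and no summer gold
winter_only = df[(df["Winter_Gold"] > 0) & (df["Gold"] == 0)]
print(winter_only)
```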

Poor Liechtenstein. Thankfully the Olympics come every four years. I know who I’ll be cheering for in 2020 to win their first summer gold.

Extremely important, and often an issue for new users, is to remember that each Boolean mask needs to be encased in parentheses because of the order of operations. This can cause no end of frustration if you’re not used to it, so be careful.
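
To illustrate why the parentheses matter, here is a small sketch on a made-up data frame (hypothetical column names and placeholder countries). In Python, `|` binds more tightly than `>`, so the unparenthesized form is parsed as a chained comparison and fails.

```python
import pandas as pd

# Toy stand-in for the Olympics data; names and numbers are hypothetical
df = pd.DataFrame(
    {"Gold": [5, 0, 2, 0], "Winter_Gold": [1, 1, 0, 0]},
    index=["Country A", "Country B", "Country C", "Country D"],
)

raised = False
try:
    # Parsed as: df["Gold"] > (0 | df["Winter_Gold"]) > 0
    bad = df[df["Gold"] > 0 | df["Winter_Gold"] > 0]
except ValueError as err:
    raised = True
    print("ValueError:", err)

# With each mask in its own parentheses the query works as intended
good = df[(df["Gold"] > 0) | (df["Winter_Gold"] > 0)]
print(len(good))  # 3
```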

https://www.coursera.org/learn/python-data-analysis/lecture/QnPL7/querying-a-dataframe