SQL select only rows with max value on a column [duplicate]

xfxf996

Tags: mysql sql aggregate-functions greatest-n-per-group groupwise-maximum

Original link: https://oldbug.net/q/WUzB/SQL-select-only-rows-with-max-value-on-a-column-duplicate

Translated from: SQL select only rows with max value on a column [duplicate]

This question already has answers here:

Retrieving the last record in each group - MySQL (25 answers)

Closed 8 months ago.

I have this table for documents (simplified version here):

+------+-------+--------------------------------------+
| id   | rev   | content                              |
+------+-------+--------------------------------------+
| 1    | 1     | ...                                  |
| 2    | 1     | ...                                  |
| 1    | 2     | ...                                  |
| 1    | 3     | ...                                  |
+------+-------+--------------------------------------+

How do I select one row per id and only the greatest rev?
With the above data, the result should contain two rows: [1, 3, ...] and [2, 1, ...]. I'm using MySQL.

Currently I use checks in the while loop to detect and overwrite old revs from the result set. But is this the only method to achieve the result? Isn't there a SQL solution?
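
For comparison with the SQL answers, the client-side loop described above might look roughly like the sketch below. It is written in Python rather than the asker's actual language, and the rows, the content values, and the helper name latest_per_id are assumptions made for illustration only.

```python
# Rows as they might come back from "SELECT id, rev, content FROM docs"
# (hypothetical content values; arrival order is not assumed to be sorted).
rows = [
    (1, 1, "content1"),
    (2, 1, "content2"),
    (1, 2, "content3"),
    (1, 3, "content4"),
]


def latest_per_id(rows):
    """Keep, for each id, only the row with the greatest rev seen so far."""
    latest = {}
    for doc_id, rev, content in rows:
        kept = latest.get(doc_id)
        # Overwrite the stored row when this one has a newer revision.
        if kept is None or rev > kept[1]:
            latest[doc_id] = (doc_id, rev, content)
    return sorted(latest.values())


result = latest_per_id(rows)
print(result)
```

This works, but it transfers every revision to the client before discarding most of them, which is what a SQL solution avoids.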

Update
As the answers suggest, there is a SQL solution, and here is a sqlfiddle demo.

Update 2
I noticed that after adding the above sqlfiddle, the rate at which the question is upvoted has surpassed the upvote rate of the answers. That has not been the intention! The fiddle is based on the answers, especially the accepted answer.

Post #1

Reference: https://stackoom.com/question/WUzB/SQL仅选择列上具有最大值的行-重复

Post #2

I can't vouch for the performance, but here's a trick inspired by the limitations of Microsoft Excel. It has some good features:

GOOD STUFF

It should force return of only one "max record" even if there is a tie (sometimes useful)
It doesn't require a join

APPROACH

It is a little bit ugly and requires that you know something about the range of valid values of the rev column. Let us assume that we know the rev column is a number between 0.00 and 999 including decimals, but that there will only ever be two digits to the right of the decimal point (e.g. 34.17 would be a valid value).

The gist of the thing is that you create a single synthetic column by string concatenating/packing the primary comparison field along with the data you want. In this way, you can force SQL's MAX() aggregate function to return all of the data (because it has been packed into a single column). Then you have to unpack the data.

Here's how it looks with the above example, written in SQL:

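SELECT id,
       -- characters 2-7 of the packed value hold rev (e.g. '003.20' from '1003.201')
       CAST(SUBSTRING(max(packed_col) FROM 2 FOR 6) AS float) AS max_rev,
       -- content starts at character 12: 8 characters of packed rev plus the 3-character '---'
       SUBSTRING(max(packed_col) FROM 12) AS content_for_max_rev
FROM  (SELECT id,
       -- || is standard SQL concatenation; MySQL needs CONCAT() unless PIPES_AS_CONCAT is enabled
       CAST(1000 + rev + .001 AS CHAR) || '---' || CAST(content AS CHAR) AS packed_col
       FROM yourtable
      ) AS packed  -- MySQL requires an alias on a derived table
GROUP BY id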

The packing begins by forcing the rev column to be a number of known character length regardless of the value of rev, so that for example:

3.2 becomes 1003.201
57 becomes 1057.001
923.88 becomes 1923.881

If you do it right, string comparison of two numbers should yield the same "max" as numeric comparison of the two numbers, and it's easy to convert back to the original number using the substring function (which is available in one form or another pretty much everywhere).
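
To try the pack/unpack idea end to end, here is a rough sketch that runs it on an in-memory SQLite database from Python. It is not the MySQL form of the query: it assumes SQLite's printf() and substr() functions, uses printf('%08.3f', ...) to guarantee the 8-character width, and the content values are made up. With an 8-character number and a 3-character separator, the content starts at character 12.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, rev REAL, content TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1, "content1"), (2, 1, "content2"), (1, 2, "content3"), (1, 3, "content4")],
)

packed_max_sql = """
SELECT id,
       -- characters 2-7 hold the original rev, e.g. '003.00'
       CAST(substr(max(packed_col), 2, 6) AS REAL) AS max_rev,
       -- 8 characters of number + 3 of separator, so content starts at 12
       substr(max(packed_col), 12) AS content_for_max_rev
FROM (SELECT id,
             -- fixed-width number so string order matches numeric order
             printf('%08.3f', 1000 + rev + 0.001) || '---' || content AS packed_col
      FROM yourtable) AS packed
GROUP BY id
ORDER BY id
"""

packed_rows = conn.execute(packed_max_sql).fetchall()
print(packed_rows)
```

The trick only holds while every packed number has the same width, which is why the range of rev has to be known in advance.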

Post #3

How about this:

SELECT all_fields.*
FROM (SELECT id, MAX(rev) AS max_rev FROM yourtable GROUP BY id) AS max_recs
LEFT OUTER JOIN yourtable AS all_fields
ON max_recs.id = all_fields.id
AND max_recs.max_rev = all_fields.rev  -- without this, every rev of each id is returned
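
A join against the per-id maximum has to match on both id and the maximum rev, otherwise every revision of each id comes back. The sketch below checks this on an in-memory SQLite database from Python; the sample content values and the max_rev alias are assumptions for illustration, and it says nothing about MySQL performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, 1, "content1"), (2, 1, "content2"), (1, 2, "content3"), (1, 3, "content4")],
)

# Join on id AND the per-id maximum rev: one row per id.
join_sql = """
SELECT all_fields.id, all_fields.rev, all_fields.content
FROM (SELECT id, MAX(rev) AS max_rev FROM yourtable GROUP BY id) AS max_recs
JOIN yourtable AS all_fields
  ON all_fields.id = max_recs.id
 AND all_fields.rev = max_recs.max_rev
ORDER BY all_fields.id
"""
joined = conn.execute(join_sql).fetchall()

# Join on id alone: every revision of every id is matched.
id_only_sql = """
SELECT COUNT(*)
FROM (SELECT id, MAX(rev) AS max_rev FROM yourtable GROUP BY id) AS max_recs
JOIN yourtable AS all_fields
  ON all_fields.id = max_recs.id
"""
id_only_count = conn.execute(id_only_sql).fetchone()[0]

print(joined)
print(id_only_count)
```

Note that if two rows share the same id and the same maximum rev, this join returns both of them.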

Post #4

Yet another solution is to use a correlated subquery:

select yt.id, yt.rev, yt.content
    from YourTable yt
    where yt.rev =
        (select max(st.rev) from YourTable st where yt.id=st.id)

Having an index on (id,rev) renders the subquery almost as a simple lookup...
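
As a rough way to experiment with this, the sketch below builds the (id, rev) index and runs the correlated subquery on an in-memory SQLite database from Python. The index name idx_id_rev and the content values are hypothetical, and because this is SQLite rather than MySQL/InnoDB, the printed query plan only hints at how an index can serve the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
# Composite index so MAX(rev) for a given id can be answered from the index.
conn.execute("CREATE INDEX idx_id_rev ON YourTable (id, rev)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "content1"), (2, 1, "content2"), (1, 2, "content3"), (1, 3, "content4")],
)

correlated_sql = """
SELECT yt.id, yt.rev, yt.content
FROM YourTable AS yt
WHERE yt.rev = (SELECT MAX(st.rev) FROM YourTable AS st WHERE st.id = yt.id)
ORDER BY yt.id
"""
correlated_rows = conn.execute(correlated_sql).fetchall()
print(correlated_rows)

# The plan text differs between SQLite versions, so it is printed, not asserted.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + correlated_sql):
    print(plan_row)
```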

Following are comparisons to the solutions in @AdrianCarneiro's answer (subquery, leftjoin), based on MySQL measurements with an InnoDB table of ~1 million records, group size being 1-3.

While for full table scans the subquery/leftjoin/correlated timings relate to each other as 6/8/9, when it comes to direct lookups or batches (id in (1,2,3)), subquery is much slower than the others (due to rerunning the subquery). However, I couldn't differentiate between the leftjoin and correlated solutions in speed.

One final note: as leftjoin creates n*(n+1)/2 joins in groups, its performance can be heavily affected by the size of groups...

Post #5

This solution makes only one selection from YourTable, therefore it's faster. It works only for MySQL and SQLite (for SQLite remove DESC) according to tests on sqlfiddle.com. Maybe it can be tweaked to work on other SQL dialects which I am not familiar with.

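-- Relies on GROUP BY keeping the first row of each pre-sorted group, which
-- MySQL does not guarantee; the query is rejected when ONLY_FULL_GROUP_BY
-- is enabled (the default since MySQL 5.7.5).
SELECT *
FROM ( SELECT *
       FROM ( SELECT 1 as id, 1 as rev, 'content1' as content
              UNION
              SELECT 2, 1, 'content2'
              UNION
              SELECT 1, 2, 'content3'
              UNION
              SELECT 1, 3, 'content4'
            ) as YourTable
       ORDER BY id, rev DESC
   ) as YourTable
GROUP BY id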

Post #6

Not MySQL, but for other people finding this question and using SQL, another way to resolve the greatest-n-per-group problem is using CROSS APPLY in MS SQL:

WITH DocIds AS (SELECT DISTINCT id FROM docs)
SELECT d2.id, d2.rev, d2.content
FROM DocIds d1
CROSS APPLY (
  SELECT TOP 1 * FROM docs d
  WHERE d.id = d1.id
  ORDER BY rev DESC
) d2
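
For readers without SQL Server at hand, the logic of the query above (for each distinct id, take the single row with the highest rev) can be imitated with the sketch below. It is only an emulation in Python against an in-memory SQLite database, using LIMIT 1 in place of TOP 1 and a loop in place of CROSS APPLY; the content values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [(1, 1, "content1"), (2, 1, "content2"), (1, 2, "content3"), (1, 3, "content4")],
)

# Plays the role of the DocIds CTE: one entry per distinct id.
doc_ids = [row[0] for row in conn.execute("SELECT DISTINCT id FROM docs ORDER BY id")]

top_rows = []
for doc_id in doc_ids:
    # Plays the role of the applied subquery: top 1 row per id by rev descending.
    row = conn.execute(
        "SELECT id, rev, content FROM docs WHERE id = ? ORDER BY rev DESC LIMIT 1",
        (doc_id,),
    ).fetchone()
    if row is not None:  # like CROSS APPLY, ids with no matching row are dropped
        top_rows.append(row)

print(top_rows)
```

Unlike the real CROSS APPLY, this issues one query per id, so it illustrates the semantics rather than the performance.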