Get top 1 row of each group

Translated from: Get top 1 row of each group

I have a table from which I want to get the latest entry for each group. Here's the table:

DocumentStatusLogs table

|ID| DocumentID | Status | DateCreated |
| 2| 1          | S1     | 7/29/2011   |
| 3| 1          | S2     | 7/30/2011   |
| 6| 1          | S1     | 8/02/2011   |
| 1| 2          | S1     | 7/28/2011   |
| 4| 2          | S2     | 7/30/2011   |
| 5| 2          | S3     | 8/01/2011   |
| 6| 3          | S1     | 8/02/2011   |

The table will be grouped by DocumentID and sorted by DateCreated in descending order. For each DocumentID, I want to get the latest status.

My preferred output:

| DocumentID | Status | DateCreated |
| 1          | S1     | 8/02/2011   |
| 2          | S3     | 8/01/2011   |
| 3          | S1     | 8/02/2011   |

  • Is there any aggregate function to get only the top row from each group? See the pseudo-code GetOnlyTheTop below:

     SELECT DocumentID, GetOnlyTheTop(Status), GetOnlyTheTop(DateCreated)
     FROM DocumentStatusLogs
     GROUP BY DocumentID
     ORDER BY DateCreated DESC

  • If such a function doesn't exist, is there any way I can achieve the output I want?

  • Or, in the first place, could this be caused by an unnormalized database? Since what I'm looking for is just one row, should that status also be located in the parent table?

Please see the parent table for more information:

Current Documents table

| DocumentID | Title  | Content  | DateCreated |
| 1          | TitleA | ...      | ...         |
| 2          | TitleB | ...      | ...         |
| 3          | TitleC | ...      | ...         |

Should the parent table be like this so that I can easily access its status?

| DocumentID | Title  | Content  | DateCreated | CurrentStatus |
| 1          | TitleA | ...      | ...         | S1            |
| 2          | TitleB | ...      | ...         | S3            |
| 3          | TitleC | ...      | ...         | S1            |

UPDATE: I just learned how to use "apply", which makes it easier to address such problems.


Reply #1

Reference: https://stackoom.com/question/ShoT/获取每个组的前-行


Reply #2

I just learned how to use cross apply. Here's how to use it in this scenario:

select d.DocumentID, ds.Status, ds.DateCreated
from Documents as d
cross apply
    (select top 1 Status, DateCreated
     from DocumentStatusLogs
     where DocumentID = d.DocumentID
     order by DateCreated desc) as ds

Reply #3

In scenarios where you want to avoid using ROW_NUMBER(), you can also use a left join:

select ds.DocumentID, ds.Status, ds.DateCreated 
from DocumentStatusLogs ds
left join DocumentStatusLogs filter 
    ON ds.DocumentID = filter.DocumentID
    -- Match any row that has another row that was created after it.
    AND ds.DateCreated < filter.DateCreated
-- then filter out any rows that matched 
where filter.DocumentID is null 

For the example schema, you could also use a NOT IN subquery, which generally compiles to the same output as the left join:

select ds.DocumentID, ds.Status, ds.DateCreated 
from DocumentStatusLogs ds
WHERE ds.ID NOT IN (
    SELECT filter.ID 
    FROM DocumentStatusLogs filter
    WHERE ds.DocumentID = filter.DocumentID
        AND ds.DateCreated < filter.DateCreated)

Note that the subquery pattern wouldn't work if the table didn't have at least one single-column unique key, constraint, or index; in this case that is the primary key "ID".

Both of these queries tend to be more "expensive" than the ROW_NUMBER() query (as measured by Query Analyzer). However, you might encounter scenarios where they return results faster or enable other optimizations.
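The ROW_NUMBER() query used as the baseline in this comparison is not shown in the thread. A minimal sketch of what such a query could look like follows, wrapped in Python's sqlite3 module so it can be run end to end. The ISO date strings, the helper name latest_status_row_number, and the use of SQLite (3.25 or newer, for window functions) in place of SQL Server are all assumptions made for illustration.

```python
import sqlite3

# Sample rows from the question. Dates are rewritten as ISO strings
# (an assumption) so that text ordering matches date ordering.
# ID is not declared unique because the sample data repeats ID 6.
ROWS = [
    (2, 1, "S1", "2011-07-29"),
    (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"),
    (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"),
    (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]

# Number the rows inside each DocumentID group, newest first,
# then keep only the first row of every group.
ROW_NUMBER_QUERY = """
WITH ranked AS (
    SELECT DocumentID, Status, DateCreated,
           ROW_NUMBER() OVER (
               PARTITION BY DocumentID
               ORDER BY DateCreated DESC
           ) AS rn
    FROM DocumentStatusLogs
)
SELECT DocumentID, Status, DateCreated
FROM ranked
WHERE rn = 1
ORDER BY DocumentID
"""


def latest_status_row_number():
    """Hypothetical helper: load the sample rows and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DocumentStatusLogs ("
        "ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)"
    )
    conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(ROW_NUMBER_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_status_row_number():
        print(row)
```

Unlike the left join and NOT IN patterns, this form returns exactly one row per DocumentID even when two rows share the latest DateCreated; which of the tied rows is kept is arbitrary unless a tie-breaker is added to the ORDER BY.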


Reply #4

My code to select the top 1 from each group:

select a.*
from #DocumentStatusLogs a
where a.DateCreated in (
    select top 1 b.DateCreated
    from #DocumentStatusLogs b
    where a.DocumentID = b.DocumentID
    order by b.DateCreated desc
)

Reply #5

If you're worried about performance, you can also do this with MAX():

SELECT *
FROM DocumentStatusLogs D
-- Correlate on DocumentID (the group), not on the per-row ID
WHERE DateCreated = (SELECT MAX(DateCreated) FROM DocumentStatusLogs WHERE DocumentID = D.DocumentID)

ROW_NUMBER() requires sorting all the rows in your SELECT statement, whereas MAX() does not, so this can speed up your query considerably.
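For the MAX() pattern to return one row per document, the subquery has to be correlated on the grouping column, DocumentID. The sketch below runs that form against the sample data using Python's sqlite3 module. The ISO date strings, the table aliases, and the helper name latest_status_max are assumptions made for illustration, and SQLite stands in for SQL Server here.

```python
import sqlite3

# Sample rows from the question, with dates as ISO strings (an assumption)
# so that MAX() on text picks the latest date.
ROWS = [
    (2, 1, "S1", "2011-07-29"),
    (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"),
    (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"),
    (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]

# Keep a row only if its DateCreated equals the newest DateCreated
# among all rows with the same DocumentID.
MAX_QUERY = """
SELECT D.DocumentID, D.Status, D.DateCreated
FROM DocumentStatusLogs AS D
WHERE D.DateCreated = (
    SELECT MAX(L.DateCreated)
    FROM DocumentStatusLogs AS L
    WHERE L.DocumentID = D.DocumentID
)
ORDER BY D.DocumentID, D.Status
"""


def latest_status_max(rows=ROWS):
    """Hypothetical helper: load the given rows and run the MAX() query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DocumentStatusLogs ("
        "ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)"
    )
    conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(MAX_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_status_max():
        print(row)
```

One caveat with this pattern: if two rows in the same group share the maximum DateCreated, both are returned, so a tie-breaker is needed when exactly one row per group is required.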


Reply #6

I checked that in SQLite you can use the following simple query with GROUP BY:

SELECT MAX(DateCreated), *
FROM DocumentStatusLogs
GROUP BY DocumentID

Here MAX helps to get the maximum DateCreated from each group.

But it seems that MySQL doesn't associate the * columns with the value of the max DateCreated :(
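The SQLite behavior described here can be checked with a short script. SQLite documents that when a query uses a single MIN() or MAX() aggregate, the non-aggregated ("bare") columns take their values from the row that holds that minimum or maximum. The sketch below lists the columns explicitly instead of using *, stores dates as ISO strings, and uses a helper named latest_status_group_by; all three are assumptions made for illustration. This is SQLite-specific: other engines may reject the query or return values from an arbitrary row of the group.

```python
import sqlite3

# Sample rows from the question, with dates as ISO strings (an assumption)
# so that MAX() on text picks the latest date.
ROWS = [
    (2, 1, "S1", "2011-07-29"),
    (3, 1, "S2", "2011-07-30"),
    (6, 1, "S1", "2011-08-02"),
    (1, 2, "S1", "2011-07-28"),
    (4, 2, "S2", "2011-07-30"),
    (5, 2, "S3", "2011-08-01"),
    (6, 3, "S1", "2011-08-02"),
]

# Status is a bare column: in SQLite it is read from the same row
# that supplies MAX(DateCreated) for the group.
GROUP_BY_QUERY = """
SELECT DocumentID, Status, MAX(DateCreated) AS DateCreated
FROM DocumentStatusLogs
GROUP BY DocumentID
ORDER BY DocumentID
"""


def latest_status_group_by():
    """Hypothetical helper: load the sample rows and run the GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DocumentStatusLogs ("
        "ID INTEGER, DocumentID INTEGER, Status TEXT, DateCreated TEXT)"
    )
    conn.executemany("INSERT INTO DocumentStatusLogs VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(GROUP_BY_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_status_group_by():
        print(row)
```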
