cte公用表表达式_SQL Server中的CTE; 查询公用表表达式

cte公用表表达式

Common table expressions (CTEs) in SQL Server provide us with a tool that allows us to design and organize queries in ways that may allow faster development, troubleshooting, and improve performance. In the first part of this series, we’ll look at querying against these with a practice data set. From examples of wrapped query checks to organization of data to multiple structured queries, we’ll see how many options we have with this tool and where it may be useful when we query data.

SQL Server中的通用表表达式(CTE)为我们提供了一个工具,使我们能够以更快的开发,故障排除和提高性能的方式设计和组织查询。 在本系列的第一部分中,我们将研究使用实践数据集对这些查询。 从包装查询检查的示例到数据的组织再到多个结构化查询,我们将看到此工具有多少个选项,以及在查询数据时可能在什么地方有用。

测试数据集 (Test Data Set)

The data set in these examples I use queries 362 values of a date range between December 1988 to January 2019 of almond values measured by a unit of output. The “AlmondValue” is not important to this tip, while the dates are for the examples. Any monthly stored values will work for these exercises with table name and column name substitutions. For readers who may want to copy queries directly, I’ve provided the below loop which will create these values for you to run these examples – the AlmondValue and Base10AlmondValue will differ in the output when you run these query examples, but the logical design of each query in the examples won’t.

这些示例中的数据集我使用362个值来查询1988年12月至2019年1月之间的日期范围的杏仁值,该值以输出单位来度量。 “ AlmondValue”对本技巧并不重要,而日期仅用于示例。 任何月度存储值都可以用表名和列名替换来进行这些练习。 对于可能要直接复制查询的读者,我提供了以下循环,该循环将为您创建这些值以运行这些示例–运行这些查询示例时,AlmondValue和Base10AlmondValue的输出将有所不同,但是逻辑上的设计示例中的每个查询都不会。

CREATE TABLE tbAlmondData (AlmondDate DATE, AlmondValue DECIMAL(13,2))
 
DECLARE @startdate DATE = '1988-12-01', @value DECIMAL(13,2)
 
 
WHILE @startdate <= '2019-01-01'
BEGIN
  SET @value = ((RAND()*100))
 
  INSERT INTO tbAlmondData VALUES (@startdate,@value)
 
  SET @startdate = DATEADD(MM,1,@startdate)
END
 
SELECT MAX(AlmondDate) LatestDate, MIN(AlmondDate) EarliestDate, COUNT(AlmondDate) CountofValues
FROM tbAlmondData

CTE和Select语句的基础 (The Basics of CTE and Select Statements)

The basic part of a common table expression is that we can wrap a query and run CRUD operations with the wrapped query. Let’s look at a simple example of wrapping a query and changing column names, then selecting from these new columns. In the below image, we’ll see the highlighted CTE in SQL Server, even though we ran the full query and this shows a quick debugging option that we have – we can select the wrapped query to do a quick check before running the query and this is very useful with CRUD operations that require validation, or where we may be trying to debug a transaction and need to run the first part of a statement that our CRUD operation will run again. This is an important part a CTE in SQL Server: what we see in the wrapped query is what our operation is applied against; in this case, a select statement.

通用表表达式的基本部分是我们可以包装查询并使用包装的查询运行CRUD操作。 让我们看一个简单的示例,该示例包装查询并更改列名,然后从这些新列中进行选择。 在下图中,即使我们运行了完整的查询,也将在SQL Server中看到突出显示的CTE,这显示了我们拥有的快速调试选项–我们可以选择包装的查询来快速检查运行的查询,然后这对于需要验证的CRUD操作非常有用,或者在我们可能尝试调试事务并需要运行CRUD操作将再次运行的语句的第一部分时非常有用。 这是SQL Server中CTE的重要组成部分:在包装查询中看到的是应用操作的内容; 在这种情况下,将使用一条select语句。

;WITH GroupAlmondDates AS(
  SELECT 
    YEAR(AlmondDate) AlmondYear
    , AlmondDate
    , AlmondValue
    , (AlmondValue*10) Base10AlmondValue
  FROM tbAlmondData
)
SELECT *
FROM GroupAlmondDates

One important point when creating these is that we will get an error if we don’t explicitly name our columns in the wrapped query. Let’s take the same above example and see the error when we remove the column name AlmondYear in the wrapped query:

创建这些列时,重要的一点是,如果未在包装查询中显式命名列,则会出现错误。 让我们以上面的示例为例,在包装查询中删除列名AlmondYear时看到错误:

;WITH GroupAlmondDates AS(
  SELECT 
    YEAR(AlmondDate)
    , AlmondDate
    , AlmondValue
    , (AlmondValue*10) Base10AlmondValue
  FROM tbAlmondData
)
SELECT *
FROM GroupAlmondDates

This is a derived column, meaning that it comes from the values within another column, but as we see, a name is required. We can alternatively create the names explicitly and ignore naming the columns in the wrapped query and the explicit names will override any column name in the query – though I prefer to name in the wrapped query for faster troubleshooting purposes. Notice what happens to the explicit name of Base10 versus Base10AlmondValue.

这是一个派生的列,这意味着它来自另一列中的值,但是正如我们所看到的,名称是必需的。 我们也可以选择显式创建名称,而忽略命名包装查询中的列,而显式名称将覆盖查询中的任何列名称-尽管我更愿意在包装查询中命名,以便更快地进行故障排除。 请注意,Base10与Base10AlmondValue的显式名称发生了什么。

;WITH GroupAlmondDates (AlmondYear,AlmondDate,AlmondValue,Base10) AS(
  SELECT 
    YEAR(AlmondDate)
    , AlmondDate
    , AlmondValue
    , (AlmondValue*10) Base10AlmondValue
  FROM tbAlmondData
)
SELECT *
FROM GroupAlmondDates

We see the Base10 in the output, so if we have a mixture of explicit column names that differ from the column names within the wrapped query, we should remember to use the explicit column names in references.

我们在输出中看到了Base10,因此,如果混合使用显式列名与包装查询中的列名不同,则应记住在引用中使用显式列名。

Another technique we can use with common table expressions is creating multiple CTEs together, similar to creating multiple subqueries that reference other subqueries (a common query output from Entity Framework). Although we can do this, this does not mean these are the most optimal solution, relative to what we’re trying to query. In the below query, we use a combination of these to query our data like we’ve queried above this, and in the second CTE, we look at the average of the AlmondValue for the year. From there, we join the GroupAlmondDates with GetAverageByYear on the AlmondYear and compare the AlmondValue with the average of the year. Our join here allows us to avoid creating another independent common table expression and save to a table, or use a temp table through creating a table, saving data, then removing it later.

我们可以与公用表表达式一起使用的另一种技术是一起创建多个CTE,类似于创建引用其他子查询的多个子查询(实体框架的常见查询输出)。 尽管我们可以做到,但这并不意味着相对于我们要查询的内容,这是最佳的解决方案。 在下面的查询中,我们结合使用这些查询来查询我们的数据,就像在上面查询的那样,在第二个CTE中,我们查看了当年AlmondValue的平均值。 从那里开始,我们将GroupAlmondDates连同在AlmondYear上的GetAverageByYear进行比较,并将AlmondValue与一年中的平均值进行比较。 通过此处的联接,我们可以避免创建另一个独立的公用表表达式并保存到表中,或者通过创建表,保存数据然后在以后删除它来使用临时表。

;WITH GroupAlmondDates AS(
  SELECT 
    YEAR(AlmondDate) AlmondYear
    , AlmondDate
    , AlmondValue
    , (AlmondValue*10) Base10AlmondValue
  FROM tbAlmondData
), GetAverageByYear AS(
  SELECT 
    AlmondYear
    , AVG(AlmondValue) AvgAlmondValueForYear
  FROM GroupAlmondDates
  GROUP BY AlmondYear
)
SELECT t.AlmondDate
  , (t.AlmondValue - tt.AvgAlmondValueForYear) ValueDiff
FROM GroupAlmondDates t
  INNER JOIN GetAverageByYear tt ON t.AlmondYear = tt.AlmondYear

This can be a useful technique provided that we consider we can’t troubleshoot the GetAverageByYear in the way that we can with GroupAlmondDates – meaning that we can’t highlight the wrapped query in the second CTE since it requires the first to exist. Should we use combinations? Sometimes, provided that debugging is either not required (intuitive design) or it’s the only option we have that performs the best.

如果我们认为无法像使用GroupAlmondDates一样对GetAverageByYear进行故障排除,这将是一种有用的技术-这意味着我们无法在第二个CTE中突出显示已包装的查询,因为它需要第一个存在。 我们应该结合使用吗? 有时,前提是不需要调试(直观的设计),或者这是我们执行效果最好的唯一选择。

In our next few queries, we’ll apply two SQL Server functions – ROW_NUMBER() and DENSE_RANK() – to view how we can extend analysis in our select statement. Both of these functions involve ordering data, one which orders rows of data by the row number of the data demarcated with the order by clause and the other which ranks data delineated with the order by clause. In the below code, I order five date combinations using descending and ascending with some of these – the row number of the date, the row number of the date specified by the year of the date, and the rank of the year.

在接下来的几个查询中,我们将应用两个SQL Server函数ROW_NUMBER()和DENSE_RANK()来查看如何在select语句中扩展分析。 这两个功能都涉及对数据进行排序,一个功能通过按order by子句划定的数据的行号对数据行进行排序,另一个功能对用order by子句划定的数据进行排名。 在下面的代码中,我使用降序和升序对五个日期组合进行排序-日期的行号,日期的年指定的日期的行号以及年份的等级。

;WITH GroupAlmondDates AS(
  SELECT 
    ROW_NUMBER() OVER (ORDER BY AlmondDate DESC) DateDescId
    , ROW_NUMBER() OVER (ORDER BY AlmondDate ASC) DateAscId
    , ROW_NUMBER() OVER (PARTITION BY YEAR(AlmondDate) ORDER BY YEAR(AlmondDate) ASC) YearAscId
    , DENSE_RANK() OVER (ORDER BY YEAR(AlmondDate) DESC) YearOrderedDescId
    , DENSE_RANK() OVER (ORDER BY YEAR(AlmondDate) ASC) YearOrderedAscId
    , YEAR(AlmondDate) AlmondYear
    , AlmondDate
    , AlmondValue
    , (AlmondValue*10) Base10AlmondValue
  FROM tbAlmondData
)
SELECT 
  AlmondYear
  , AlmondDate
  , DateDescId
  , DateAscId
  , YearAscId
  , YearOrderedDescId
  , YearOrderedAscId
FROM GroupAlmondDates

An overview of what we’re seeing in the above output:

我们在上面的输出中看到的概述:

  • DateDescId/DataAscId: we see that we have 362 records because we’re getting the row number by the AlmondDate column. The DateAscId orders the date from 1 to 362, while the DateDescId orders the date from 362 to 1 DateDescId / DataAscId :我们看到有362条记录,因为我们通过AlmondDate列获取行号。 DateAscId将日期从1到362排序,而DateDescId将日期从362到1排序。
  • YearAscId: these fields are ordering the rows by date of the year. Because these are monthly values, starting in 1989, we see the order matching the months for the. Since there is only one month in 1988, we see the YearAscId of 1 before it starts over at the beginning of 1989
  • YearAscId :这些字段按年份的日期行进行排序。 由于这些是从1989年开始的月度值,因此我们看到与月的匹配的订单。 由于1988年只有一个月,因此我们看到YearAscId为1,然后在1989年初才开始
  • YearOrderedDescId/YearOrderedAscId: this orders the years by their group – for an example, 1988 and 1989 are in different groups (ranks) – rank 32 and 31. We can see that we have a total number of 32 years
  • YearOrderedDescId / YearOrderedAscId :这按年份对年份进行排序 -例如,1988和1989在不同的组(等级)中-排名32和31。我们可以看到总数为32年

Ordering and partitioning values like this can provide us with tools to organize our data to apply mathematical functions, get values of data from sets within sets, etc. Using the same data set, suppose that we wanted to know what the median almond value of the first four years 1988, 1989, 1990, 1991 was. Since we know that we have 1 value in 1988 and 12 values in each of the other years, we’d need to find the 19th value. Using a derivative of the above common table expression (removing unnecessary columns and adding the value), we can get this value:

像这样对值进行排序和划分可以为我们提供工具,以组织数据以应用数学函数,从集合内的集合中获取数据的值,等等。使用相同的数据集,假设我们想知道什么是杏仁的中值前四年是1988年,1989年,1990年,1991年。 由于我们知道1988年的价值为1,其他年份的价值为12,因此我们需要找到第19 价值。 使用上述公共表表达式的派生类(删除不必要的列并添加值),我们可以获得以下值:

;WITH GroupAlmondDates AS(
  SELECT 
    ROW_NUMBER() OVER (ORDER BY AlmondDate ASC) DateAscId
    , YEAR(AlmondDate) AlmondYear
    , AlmondDate
    , AlmondValue
    , (AlmondValue*10) Base10AlmondValue
  FROM tbAlmondData
)
SELECT *
FROM GroupAlmondDates
WHERE DateAscId = 19

Using the order of our years from the previous query, we could get the average almond value of the median year from our years. In the below query, we filter out the 1988 date and get the median of the other 31 values, seeing that the year 2004 returns.

使用上一个查询的年份顺序,我们可以从我们的年份中获取中位数年份的平均杏仁值。 在下面的查询中,我们看到1988年返回,因此过滤掉1988年的日期并获得其他31个值的中值。

;WITH GroupAlmondDates AS(
  SELECT 
    DENSE_RANK() OVER (ORDER BY YEAR(AlmondDate) ASC) YearOrderedAscId
    , YEAR(AlmondDate) AlmondYear
    , AlmondDate
    , AlmondValue
    , (AlmondValue*10) Base10AlmondValue
  FROM tbAlmondData
  WHERE AlmondDate > '1988-12-31'
)
SELECT 
  YearOrderedAscId
  , AlmondYear
  , AlmondDate
  --AVG(AlmondValue) AvgAlmondValue
FROM GroupAlmondDates
WHERE YearOrderedAscId = 16

I’ve left the average value commented out so that we see the data set that we would be running our average value. We can use this same logic for other aggregates where we may want an aggregate for a specific year based out of the data set, like median is the middle of a data set.

我已将平均值留在注释中,以便我们看到将运行平均值的数据集。 我们可以对其他汇总使用相同的逻辑,在这些汇总中,我们可能希望根据数据集来特定年份的汇总,例如中位数是数据集的中间值。

结论 (Conclusion)

As we see, common table expressions (CTE in SQL Server) can provide us with a convenient way to query data similar to using tools like temp tables and subqueries. We have quick troubleshooting options with these where we can run select statements inside the wrapping of the CTE and we can create multiple statements to join together. Are these faster? In some cases, they may be, or they may be faster to troubleshoot – which could be important. Still, we should always test our queries to ensure that the way we’re running our query is the most optimal route.

如我们所见,通用表表达式(SQL Server中的CTE)可以为我们提供一种方便的方法来查询数据,类似于使用临时表和子查询之类的工具。 我们提供了快速的故障排除选项,可以在CTE包装中运行select语句,并且可以创建多个语句以将它们连接在一起。 这些更快吗? 在某些情况下,它们可能是或者可能更快地进行故障排除-这可能很重要。 尽管如此,我们仍应始终测试查询以确保我们运行查询的方式是最佳途径。

目录 (Table of contents)

CTEs in SQL Server; Querying Common Table Expressions
Inserts and Updates with CTEs in SQL Server (Common Table Expressions)
CTE SQL Deletes; Considerations when Deleting Data with Common Table Expressions in SQL Server
CTEs in SQL Server; Using Common Table Expressions To Solve Rebasing an Identifier Column
SQL Server中的CTE; 查询公用表表达式
在SQL Server中使用CTE进行插入和更新(公用表表达式)
CTE SQL删除; 在SQL Server中删除具有公用表表达式的数据时的注意事项
SQL Server中的CTE; 使用公用表表达式解决重新编制标识符列的问题

翻译自: https://www.sqlshack.com/ctes-in-sql-server-querying-common-table-expressions/

cte公用表表达式

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值