提高sql性能资源(1)

最新推荐文章于 2022-03-07 08:53:58 发布

zb0618

最新推荐文章于 2022-03-07 08:53:58 发布

阅读量234

点赞数

分类专栏： Database 文章标签： sql insert join table sql server constants

Database 专栏收录该内容

2 篇文章 0 订阅

订阅专栏

五种提高 SQL 性能的方法(转)

从 INSERT 返回 IDENTITY
我决定从遇到许多问题的内容入手：如何在执行 SQL INSERT 后检索 IDENTITY 值。通常，问题不在于如何编写检索值的查询，而在于在哪里以及何时进行检索。在 SQL Server 中，下面的语句可用于检索由最新在活动数据库连接上运行的 SQL 语句所创建的 IDENTITY 值：

SELECT @@IDENTITY这个 SQL 语句并不复杂，但需要记住的一点是：如果这个最新的 SQL 语句不是 INSERT，或者您针对非 INSERT SQL 的其他连接运行了此 SQL，则不会获得期望的值。您必须运行下列代码才能检索紧跟在 INSERT SQL 之后且位于同一连接上的 IDENTITY，如下所示：

INSERT INTO Products (ProductName) VALUES ('Chalk')
SELECT @@IDENTITY在一个连接上针对 Northwind 数据库运行这些查询将返回一个名称为 Chalk 的新产品的 IDENTITY 值。所以，在使用 ADO 的 Visual Basic? 应用程序中，可以运行以下语句：

Set oRs = oCn.Execute("SET NOCOUNT ON;INSERT INTO Products _
(ProductName) VALUES ('Chalk');SELECT @@IDENTITY"
lProductID = oRs(0)此代码告诉 SQL Server 不要返回查询的行计数，然后执行 INSERT 语句，并返回刚刚为这个新行创建的 IDENTITY 值。SET NOCOUNT ON 语句表示返回的记录集有一行和一列，其中包含了这个新的 IDENTITY 值。如果没有此语句，则会首先返回一个空的记录集（因为 INSERT 语句不返回任何数据），然后会返回第二个记录集，第二个记录集中包含 IDENTITY 值。这可能有些令人困惑，尤其是因为您从来就没有希望过 INSERT 会返回记录集。之所以会发生此情况，是因为 SQL Server 看到了这个行计数（即一行受到影响）并将其解释为表示一个记录集。因此，真正的数据被推回到了第二个记录集。当然您可以使用 ADO 中的 NextRecordset 方法获取此第二个记录集，但如果总能够首先返回该记录集且只返回该记录集，则会更方便，也更有效率。

此方法虽然有效，但需要在 SQL 语句中额外添加一些代码。获得相同结果的另一方法是在 INSERT 之前使用 SET NOCOUNT ON 语句，并将 SELECT @@IDENTITY 语句放在表中的 FOR INSERT 触发器中，如下面的代码片段所示。这样，任何进入该表的 INSERT 语句都将自动返回 IDENTITY 值。

CREATE TRIGGER trProducts_Insert ON Products FOR INSERT AS
SELECT @@IDENTITY
GO
触发器只在 Products 表上发生 INSERT 时启动，所以它总是会在成功 INSERT 之后返回一个 IDENTITY。使用此技术，您可以始终以相同的方式在应用程序中检索 IDENTITY 值。

返回页首
内嵌视图与临时表
某些时候，查询需要将数据与其他一些可能只能通过执行 GROUP BY 然后执行标准查询才能收集的数据进行联接。例如，如果要查询最新五个定单的有关信息，您首先需要知道是哪些定单。这可以使用返回定单 ID 的 SQL 查询来检索。此数据就会存储在临时表（这是一个常用技术）中，然后与 Products 表进行联接，以返回这些定单售出的产品数量：

CREATE TABLE #Temp1 (OrderID INT NOT NULL, _
                     OrderDate DATETIME NOT NULL)
INSERT INTO #Temp1 (OrderID, OrderDate)
SELECT     TOP 5 o.OrderID, o.OrderDate
FROM Orders o ORDER BY o.OrderDate DESC
SELECT     p.ProductName, SUM(od.Quantity) AS ProductQuantity
FROM     #Temp1 t
    INNER JOIN [Order Details] od ON t.OrderID = od.OrderID
    INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY p.ProductName
ORDER BY p.ProductName
DROP TABLE #Temp1这些 SQL 语句会创建一个临时表，将数据插入该表中，将其他数据与该表进行联接，然后除去该临时表。这会导致此查询进行大量 I/O 操作，因此，可以重新编写查询，使用内嵌视图取代临时表。内嵌视图只是一个可以联接到 FROM 子句中的查询。所以，您不用在 tempdb 中的临时表上耗费大量 I/O 和磁盘访问，而可以使用内嵌视图得到同样的结果：

SELECT p.ProductName,
    SUM(od.Quantity) AS ProductQuantity
FROM     (
    SELECT TOP 5 o.OrderID, o.OrderDate
    FROM     Orders o
    ORDER BY o.OrderDate DESC
    ) t
    INNER JOIN [Order Details] od ON t.OrderID = od.OrderID
    INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY
    p.ProductName
ORDER BY
    p.ProductName此查询不仅比前面的查询效率更高，而且长度更短。临时表会消耗大量资源。如果只需要将数据联接到其他查询，则可以试试使用内嵌视图，以节省资源。

返回页首
避免 LEFT JOIN 和 NULL
当然，有很多时候您需要执行 LEFT JOIN 和使用 NULL 值。但是，它们并不适用于所有情况。改变 SQL 查询的构建方式可能会产生将一个花几分钟运行的报告缩短到只花几秒钟这样的天壤之别的效果。有时，必须在查询中调整数据的形态，使之适应应用程序所要求的显示方式。虽然 TABLE 数据类型会减少大量占用资源的情况，但在查询中还有许多区域可以进行优化。SQL 的一个有价值的常用功能是 LEFT JOIN。它可以用于检索第一个表中的所有行、第二个表中所有匹配的行、以及第二个表中与第一个表不匹配的所有行。例如，如果希望返回每个客户及其定单，使用 LEFT JOIN 则可以显示有定单和没有定单的客户。

此工具可能会被过度使用。LEFT JOIN 消耗的资源非常之多，因为它们包含与 NULL（不存在）数据匹配的数据。在某些情况下，这是不可避免的，但是代价可能非常高。LEFT JOIN 比 INNER JOIN 消耗资源更多，所以如果您可以重新编写查询以使得该查询不使用任何 LEFT JOIN，则会得到非常可观的回报（请参阅图 1 中的图）。

图 1 查询

加快使用 LEFT JOIN 的查询速度的一项技术涉及创建一个 TABLE 数据类型，插入第一个表（LEFT JOIN 左侧的表）中的所有行，然后使用第二个表中的值更新 TABLE 数据类型。此技术是一个两步的过程，但与标准的 LEFT JOIN 相比，可以节省大量时间。一个很好的规则是尝试各种不同的技术并记录每种技术所需的时间，直到获得用于您的应用程序的执行性能最佳的查询。

测试查询的速度时，有必要多次运行此查询，然后取一个平均值。因为查询（或存储过程）可能会存储在 SQL Server 内存中的过程缓存中，因此第一次尝试耗费的时间好像稍长一些，而所有后续尝试耗费的时间都较短。另外，运行您的查询时，可能正在针对相同的表运行其他查询。当其他查询锁定和解锁这些表时，可能会导致您的查询要排队等待。例如，如果您进行查询时某人正在更新此表中的数据，则在更新提交时您的查询可能需要耗费更长时间来执行。

避免使用 LEFT JOIN 时速度降低的最简单方法是尽可能多地围绕它们设计数据库。例如，假设某一产品可能具有类别也可能没有类别。如果 Products 表存储了其类别的 ID，而没有用于某个特定产品的类别，则您可以在字段中存储 NULL 值。然后您必须执行 LEFT JOIN 来获取所有产品及其类别。您可以创建一个值为“No Category”的类别，从而指定外键关系不允许 NULL 值。通过执行上述操作，现在您就可以使用 INNER JOIN 检索所有产品及其类别了。虽然这看起来好像是一个带有多余数据的变通方法，但可能是一个很有价值的技术，因为它可以消除 SQL 批处理语句中消耗资源较多的 LEFT JOIN。在数据库中全部使用此概念可以为您节省大量的处理时间。请记住，对于您的用户而言，即使几秒钟的时间也非常重要，因为当您有许多用户正在访问同一个联机数据库应用程序时，这几秒钟实际上的意义会非常重大。

返回页首
灵活使用笛卡尔乘积
对于此技巧，我将进行非常详细的介绍，并提倡在某些情况下使用笛卡尔乘积。出于某些原因，笛卡尔乘积 (CROSS JOIN) 遭到了很多谴责，开发人员通常会被警告根本就不要使用它们。在许多情况下，它们消耗的资源太多，从而无法高效使用。但是像 SQL 中的任何工具一样，如果正确使用，它们也会很有价值。例如，如果您想运行一个返回每月数据（即使某一特定月份客户没有定单也要返回）的查询，您就可以很方便地使用笛卡尔乘积。图 2 中的 SQL 就执行了上述操作。

虽然这看起来好像没什么神奇的，但是请考虑一下，如果您从客户到定单（这些定单按月份进行分组并对销售额进行小计）进行了标准的 INNER JOIN，则只会获得客户有定单的月份。因此，对于客户未订购任何产品的月份，您不会获得 0 值。如果您想为每个客户都绘制一个图，以显示每个月和该月销售额，则可能希望此图包括月销售额为 0 的月份，以便直观标识出这些月份。如果使用图 2 中的 SQL，数据则会跳过销售额为 0 美元的月份，因为在定单表中对于零销售额不会包含任何行（假设您只存储发生的事件）。

图 3 中的代码虽然较长，但是可以达到获取所有销售数据（甚至包括没有销售额的月份）的目标。首先，它会提取去年所有月份的列表，然后将它们放入第一个 TABLE 数据类型表 (@tblMonths) 中。下一步，此代码会获取在该时间段内有销售额的所有客户公司的名称列表，然后将它们放入另一个 TABLE 数据类型表 (@tblCus-tomers) 中。这两个表存储了创建结果集所必需的所有基本数据，但实际销售数量除外。第一个表中列出了所有月份（12 行），第二个表中列出了这个时间段内有销售额的所有客户（对于我是 81 个）。并非每个客户在过去 12 个月中的每个月都购买了产品，所以，执行 INNER JOIN 或 LEFT JOIN 不会返回每个月的每个客户。这些操作只会返回购买产品的客户和月份。

笛卡尔乘积则可以返回所有月份的所有客户。笛卡尔乘积基本上是将第一个表与第二个表相乘，生成一个行集合，其中包含第一个表中的行数与第二个表中的行数相乘的结果。因此，笛卡尔乘积会向表 @tblFinal 返回 972 行。最后的步骤是使用此日期范围内每个客户的月销售额总计更新 @tblFinal 表，以及选择最终的行集。

如果由于笛卡尔乘积占用的资源可能会很多，而不需要真正的笛卡尔乘积，则可以谨慎地使用 CROSS JOIN。例如，如果对产品和类别执行了 CROSS JOIN，然后使用 WHERE 子句、DISTINCT 或 GROUP BY 来筛选出大多数行，那么使用 INNER JOIN 会获得同样的结果，而且效率高得多。如果需要为所有的可能性都返回数据（例如在您希望使用每月销售日期填充一个图表时），则笛卡尔乘积可能会非常有帮助。但是，您不应该将它们用于其他用途，因为在大多数方案中 INNER JOIN 的效率要高得多。

返回页首
拾遗补零
这里介绍其他一些可帮助提高 SQL 查询效率的常用技术。假设您将按区域对所有销售人员进行分组并将他们的销售额进行小计，但是您只想要那些数据库中标记为处于活动状态的销售人员。您可以按区域对销售人员分组，并使用 HAVING 子句消除那些未处于活动状态的销售人员，也可以在 WHERE 子句中执行此操作。在 WHERE 子句中执行此操作会减少需要分组的行数，所以比在 HAVING 子句中执行此操作效率更高。HAVING 子句中基于行的条件的筛选会强制查询对那些在 WHERE 子句中会被去除的数据进行分组。

另一个提高效率的技巧是使用 DISTINCT 关键字查找数据行的单独报表，来代替使用 GROUP BY 子句。在这种情况下，使用 DISTINCT 关键字的 SQL 效率更高。请在需要计算聚合函数（SUM、COUNT、MAX 等）的情况下再使用 GROUP BY。另外，如果您的查询总是自己返回一个唯一的行，则不要使用 DISTINCT 关键字。在这种情况下，DISTINCT 关键字只会增加系统开销。

Tips for Writing
Efficient SQL Queries
From:
Vigyan Kaushik
2
In order to improve overall application performance, it’s very important to construct SQL queries
in the most efficient way. There can be many different ways to write a SQL query. Here are few
tips that can help you in writing efficient and reusable SQL queries. All examples given below
are based on Oracle default demo tables EMP and DEPT. You can create these tables in your
local schema from the following directory in windows environment.
%ORACLE_HOME%/sqlplus/demo/demobld.sql
Order of the tables in Joins: If you specify 2 or more tables in the FROM clause of a SELECT
statement, then Oracle parser will process the tables from right to left, so the table name you
specify last will be processed first. In this case you have to choose one table as driving table.
Always choose the table with less number of records as the driving table.
Name the Columns in a Query: There are three good reasons why it is better to name the
columns in a query rather than to use "select * from ...".
1. Network traffic is reduced. This can have a significant impact on performance if the table has
a large number of columns, or the table has a long or long raw column (both of which can be up
to 2 GB in length). These types of columns will take a long time to transfer over the network and
so they should not be fetched from the database unless they are specifically required.
2. The code is easier to understand.
3. It could save the need for changes in the future. If any columns is added to or removed from
the base table/view, then “select * “statement can produce wrong results set and statement may
fail.
Use table alias: Always use table alias and prefix all column names with the aliases when you
are using more than one table.
Never compare NULL to anything else: All expressions return NULL if one of the operands is
NULL. This is applicable for all operators except Concatenation operator (||).
Use Bind Variables: It is also better to use bind variables in queries. That way the query
becomes generic and therefore re-usable. For example, instead of writing a query like -
SELECT ename, sal
FROM emp
WHERE deptno = 20;
Change it to -
SELECT ename, sal
FROM emp
WHERE deptno = :deptno;
The first query can be re-used for deptno number 20 only, whereas the second query can be reused
for any other deptno also.
3
SQL Writing Convention: It is a good practice to use a standard syntax for wiring SQL queries.
I will recommend following standards to use while writing SQL queries.
Write all standard SQL TEXT in upper case:
For example:
SELECT ename, sal
FROM emp
WHERE deptno = 20;
Write all non standard SQL TEXT (Table name, Column name etc) in lower case:
For example:
SELECT ename, sal
FROM emp
WHERE deptno = 20;
Formatter Plus in Toad can be use to format SQL statements in this format. Select the complete
SQL statement and right click on the “Format Code” menu.
Note: It is important to write similar SQL statement in same case.
For example: Oracle will reparse following queries as they are not written in the same case
Select * from EMP;
Select * from emp;
Use EXISTS instead of DISTINCT: Use EXISTS in place of DISTINCT if you want the result
set to contain distinct values while joining tables.
For example:
SELECT DISTINCT d.deptno, d.dname
FROM dept d, emp e
WHERE d.deptno = e.deptno;
The following SQL statement is a better alternative.
SELECT d.deptno, d.dname
FROM dept d
WHERE EXISTS (SELECT e.deptno
FROM emp e
WHERE d.deptno = e.deptno);
Use of expressions and indexes: The optimizer fully evaluates expressions whenever
possible and translates certain syntactic constructs into equivalent constructs. This is done
either because Oracle can more quickly evaluate the resulting expression than the original
4
expression or because the original expression is merely a syntactic equivalent of the resulting
expression.
Any computation of constants is performed only once when the statement is optimized rather
than each time the statement is executed. Consider these conditions that test for salaries
greater than $2000.
sal > 24000/12
sal > 2000
sal*12 > 24000
If a SQL statement contains the first condition, the optimizer simplifies it into the second
condition.
Please note that optimizer does not simplify expressions across comparison operators. The
optimizer does not simplify the third expression into the second. For this reason, we should
write conditions that compare columns with constants whenever possible, rather than conditions
with expressions involving columns.
The Optimizer does not use index for the following statement:
SELECT *
FROM emp
WHERE sal*12 > 24000 ;
Instead of this use the following statement:
SELECT *
FROM emp
WHERE sal > 24000/12 ;
Use of NOT operator on indexed columns: Never use NOT operator on an indexed column.
Whenever Oracle encounters a NOT on an index column, it will perform full-table scan.
For Example:
SELECT *
FROM emp
WHERE NOT deptno = 0;
Instead use the following:
SELECT *
FROM emp
WHERE deptno > 0;
Function or Calculation on indexed columns: Never use a function or calculation on an
indexed column. If there is any function is used on an index column, optimizer will not use
index.
For Example:
5
Do not use until need exactly match string:
SELECT *
FROM emp
WHERE SUBSTR (ename, 1, 3) = 'MIL';
Use following instead:
SELECT *
FROM emp
WHERE ename LIKE 'MIL%';
Do not use the following as || is the concatenate function. Like other functions and it disables
index.
SELECT *
FROM emp
WHERE ename || job = 'MILLERCLERK';
Use the following instead
SELECT *
FROM emp
WHERE ename = 'MILLER' AND job = 'CLERK';.
Avoid Transformed Columns in the WHERE Clause: Use untransformed column values.
For example, use:
WHERE a.order_no = b.order_no
Rather than
WHERE TO_NUMBER (SUBSTR(a.order_no, INSTR(b.order_no, '.') - 1))
= TO_NUMBER (SUBSTR(a.order_no, INSTR(b.order_no, '.') - 1))
Combine Multiples Scans with CASE Statements: Often, it is necessary to calculate
different aggregates on various sets of tables. Usually, this is done with multiple scans on the
table, but it is easy to calculate all the aggregates with one single scan. Eliminating n-1 scans
can greatly improve performance.
Combining multiple scans into one scan can be done by moving the WHERE condition of each
scan into a CASE statement, which filters the data for the aggregation. For each aggregation,
there could be another column that retrieves the data.
The following example has count of all employees who earn less then 2000, between 2000 and
4000, and more than 4000 each month. This can be done with three separate queries.
SELECT COUNT (*)
6
FROM emp
WHERE sal < 2000;
SELECT COUNT (*)
FROM emp
WHERE sal BETWEEN 2000 AND 4000;
SELECT COUNT (*)
FROM emp
WHERE sal>4000;
However, it is more efficient to run the entire query in a single statement. Each number is
calculated as one column. The count uses a filter with the CASE statement to count only the
rows where the condition is valid. For example:
SELECT COUNT (CASE WHEN sal < 2000
THEN 1 ELSE null END) count1,
COUNT (CASE WHEN sal BETWEEN 2001 AND 4000
THEN 1 ELSE null END) count2,
COUNT (CASE WHEN sal > 4000
THEN 1 ELSE null END) count3
FROM emp;
Please feel free to write your questions/comments at vkaushik@dbapool.com

zb0618

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
提高sql性能资源(1)

五种提高 SQL 性能的方法(转)从 INSERT 返回 IDENTITY 我决定从遇到许多问题的内容入手：如何在执行 SQL INSERT 后检索 IDENTITY 值。通常，问题不在于如何编写检索值的查询，而在于在哪里以及何时进行检索。在 SQL Server 中，下面的语句可用于检索由最新在活动数据库连接上运行的 SQL 语句所创建的 IDENTITY 值：SELECT @@IDEN
复制链接

扫一扫

专栏目录