错误SQL Server统计信息– SQL查询性能杀手–更新SQL Server统计信息

previous part of this series, inaccurate statistics can degrade SQL Server performance. We described how to work with SQL Server statistics using SQL Server Management Studio options and T-SQL. In this article, we will show how to update SQL Server statistics, what are the updating costs, and when updating is recommended. 前一部分 ,不正确的统计信息可能会降低SQL Server性能。 我们描述了如何使用SQL Server Management Studio选项和T-SQL处理SQL Server统计信息。 在本文中,我们将展示如何更新SQL Server统计信息,更新成本是多少以及何时建议更新。

如何更新SQL Server统计信息? (How to update SQL Server statistics?)

The DBCC SHOW_STATISTICS statement shows statistics for a specific table.

DBCC SHOW_STATISTICS语句显示特定表的统计信息。

 
DBCC SHOW_STATISTICS ("Person.Address", PK_Address_AddressID)
    

Dialog showing statistics for a specific table using DBCC SHOW_STATISTICS statement

The statistics shown by this command are created using the default sampling rate. Note that the number of sampled rows is the same as the total number of table rows.

该命令显示的统计信息是使用默认采样率创建的。 请注意,采样的行数与表行的总数相同。

Query Optimizer updates statistics whenever it determines it’s needed. In some situations, statistics are not automatically updated and optimal performance is not provided. That’s when SQL Server statistics should be manually updated.

Query Optimizer会在需要时更新统计信息。 在某些情况下,统计信息不会自动更新,也无法提供最佳性能。 那时应该手动更新SQL Server统计信息。

Although updated SQL Server statistics provide better execution plans, keep in mind that updating requires time and query recompilation also, which can slow down SQL Server performance. Therefore, frequent statistics updates should be avoided.

尽管更新SQL Server统计信息提供了更好的执行计划,但是请记住,更新还需要时间和查询重新编译,这可能会降低SQL Server的性能。 因此,应避免频繁更新统计信息。

Another method to update statistics is to use the sp_updatestats stored procedure, but this is recommended only for advanced users, as the procedure updates statistics for all tables and indexed views in the database, which can significantly downgrade SQL Server performance.

更新统计信息的另一种方法是使用sp_updatestats存储过程,但这仅建议高级用户使用,因为该过程将更新数据库中所有表和索引视图的统计信息,这会大大降低SQL Server的性能。

 
EXEC sp_updatestats
    

To update statistics on a specific table or change the sampling rate used to create statistics, use the UPDATE STATISTICS statement.

要更新特定表上的统计信息或更改用于创建统计信息的采样率,请使用UPDATE STATISTICS语句。

 
UPDATE STATISTICS <table_name>
    

If there were no data changes on the table, UPDATE STATISTICS will have no effect. Even ten day old statistics will be accurate. On a table with frequent data changes, statistics old an hour can be obsolete and inaccurate.

如果表上没有数据更改,则UPDATE STATISTICS将无效。 即使是十天的统计数据也将是准确的。 在数据频繁更改的表上,一个小时以前的统计信息可能过时且不准确。

In this example, we will use the AdventureWorks database, the Person.Address table. The table contains 19,614 rows and we will insert additional 10,000 records.

在此示例中,我们将使用AdventureWorks数据库Person.Address表。 该表包含19,614行,我们将插入10,000条记录。

 
DECLARE @i int
SET @i = 0
WHILE @i < 10000
    BEGIN
        INSERT INTO Person.Address( AddressLine1, 
                                    AddressLine2, 
                                    City, 
                                    StateProvinceID, 
                                    PostalCode, 
                                    rowguid, 
                                    ModifiedDate )
        VALUES( 'Adr1', 
                'AddressLine2', 
                'New York', 
                78, 
                98011, 
                NEWID(), 
                GETDATE())
        SET @i = @i + 1
    END
    

Then, we will view the table statistics by executing:

然后,我们将通过执行以下命令查看表统计信息:

 
DBCC SHOW_STATISTICS ("Person.Address", PK_Address_AddressID); 
    

The statistics shown are old and inaccurate. Instead of 29,614 rows, there are only 19,614. All other parameters shown are also obsolete.

显示的统计数据过旧且不准确。 而不是29,614行,只有19,614行。 显示的所有其他参数也已过时。

Dialog showing old and inaccurate SQL Server statistics

Although the statistics are inaccurate, the estimated query execution plan shows the correct number of records. However, this is the case only for the tables with a small number of records, such as this one.

尽管统计信息不准确,但是估计的查询执行计划显示了正确的记录数。 但是,仅对于具有少量记录的表(例如该表)才是这种情况。

Dialog showing the estimated query execution plan

After updating table statistics, the correct values are shown.

更新表统计信息后,将显示正确的值。

 
UPDATE STATISTICS Person.Address
    

Updating table statistics - the correct values are shown

Note the All density value in both cases. The All density value is calculated as 1/total number of distinct rows. In the first case it’s
1/19,614 = 0.00005098099 = 5.098399102681758e-5

在两种情况下,请注意“ 所有密度”值。 所有密度值计算为不同行的总数的1 /。 在第一种情况下
1 / 19,614 = 0.00005098099 = 5.098399102681758e-5

In the second, it’s:
1/29,614 = 0.00003376781 = 3.376781252110488e-5

在第二个中,它是:
1 / 29,614 = 0.00003376781 = 3.376781252110488e-5

The values are of the same order of magnitude and the difference can be neglected. Even if the obsolete statistics are used, there will be almost no performance degradation. However, this is valid only for tables with a small number of records.

这些值具有相同的数量级,并且可以忽略差异。 即使使用了过时的统计信息,性能也几乎不会降低。 但是,这仅对具有少量记录的表有效。

To update statistics just for a specific table index, use the following syntax:

要仅针对特定表索引更新统计信息,请使用以下语法:

 
UPDATE STATISTICS <table_name> <index_name>
    

UPDATE STATISTICS参数 (UPDATE STATISTICS parameters)

The UPDATE STATISTICS statement has parameters that define table sampling rate.

UPDATE STATISTICS语句具有定义表采样率的参数。

FULLSCAN – new statistics are created by scanning all table/view rows and the number of Rows Sampled is equal to the number of the table/view rows. For tables with a small number of rows, even when this parameter is not specified, all table/view rows are sampled.

FULLSCAN –通过扫描所有表/视图行创建新的统计信息,并且“采样行数”等于表/视图行的数目。 对于行数少的表,即使未指定此参数,也会对所有表/视图行进行采样。

 
UPDATE STATISTICS Person.Address WITH FULLSCAN
    

SAMPLE – the new statistics are created by sampling a specific number of table/view rows.

样本–通过对特定数量的表/视图行进行采样来创建新的统计信息。

Using SAMPLE 100 PERCENT gives the same results as using the FULLSCAN parameter. SAMPLE and FULLSCAN cannot be used in the same UPDATE STATISTICS statement.

使用SAMPLE 100 PERCENT可获得与使用FULLSCAN参数相同的结果。 SAMPLE和FULLSCAN不能在同一UPDATE STATISTICS语句中使用。

 
UPDATE STATISTICS Person.Address WITH SAMPLE 10 PERCENT
    

SQL Server statistics are created by sampling all table rows

Don’t be surprised that even though you specified the exact percentage of the rows to be scanned, the statistics are created by sampling all table rows. This is what SQL Server does for tables with a small number of rows. This behavior provides accurate statistics for small tables, as updating statistics for small tables cost is less than the inaccurate statistics cost.

即使您指定了要扫描的行的确切百分比也不要感到惊讶,但统计信息是通过对所有表行进行采样而创建的。 这就是SQL Server对行数少的表的作用。 此行为为小型表提供了准确的统计信息,因为针对小型表的更新统计信息的费用小于不准确的统计信息的费用。

When SQL Server statistics on large tables are updated with the SAMPLE parameter, sampling percent isn’t ignored, and the number of sampled rows in lower than the number of table rows. The percentage specified is taken as the minimal number of rows that will be sampled. It’s usually higher.

当使用SAMPLE参数更新大表上SQL Server统计信息时,不会忽略采样百分比,并且采样行数低于表行数。 指定的百分比被视为要采样的最小行数。 通常更高。

Updating statistics on large tables takes much more time, and taking the sample percentage into account significantly reduces the time needed to update statistics.

在大型表上更新统计信息会花费更多时间,并且考虑到样本百分比会大大减少更新统计信息所需的时间。

We measured the time needed to update SQL Server statistics on the Person.Address table before the records were added, when the table had 19,614 rows. It took less than a second.

当表有19,614行时,我们测量了在添加记录之前更新Person.Address表上SQL Server统计信息所需的时间。 花了不到一秒钟的时间。

 
SET STATISTICS TIME ON 
UPDATE STATISTICS Person.Address WITH 
SET STATISTICS TIME OFF
    

SQL Server Execution Times:    CPU time = 203 ms, elapsed time = 552 ms.

SQL Server执行时间:CPU时间= 203毫秒,经过的时间= 552毫秒。

After adding more than a million rows to the Person.Address table, we updated its statistics using a ten percent sampling rate and measured the time needed to complete.

在Person.Address表中添加了超过一百万行之后,我们使用百分之十的采样率更新了其统计信息,并测量了完成所需的时间。

 
SET STATISTICS TIME ON 
UPDATE STATISTICS Person.Address WITH SAMPLE 10 PERCENT
SET STATISTICS TIME OFF
    

The time needed was more than 6 minutes.

所需时间超过6分钟。

SQL Server Execution Times:    CPU time = 2438 ms, elapsed time = 385063 ms.

SQL Server执行时间:CPU时间= 2438毫秒,经过的时间= 385063毫秒。

An example presenting the situation when number of the sampled rows is slightly higher than 10% of the total row number

The number of the sampled rows is slightly higher than 10% of the total row number. This example shows how expensive statistics updates on large tables are and why the SAMPLE option should be used whenever possible.

采样的行数略高于总行数的10%。 本示例说明了在大表上进行统计信息更新的代价如何,以及为什么应尽可能使用SAMPLE选项。

NORECOMPUTE – after the statistics are updated, the AUTO_UPDATE_STATISTICS option is set to off, and no further auto-updates of the statistics are possible, unless the option is set back to on. The statistics can be updated only by executing the UPDATE STATISTICS statement or sp_updatestats stored procedure.

NORECOMPUTE –更新统计信息后,AUTO_UPDATE_STATISTICS选项设置为off,除非该选项重新设置为on,否则无法进行统计信息的进一步自动更新。 只能通过执行UPDATE STATISTICS语句或sp_updatestats存储过程来更新统计信息。

 
UPDATE STATISTICS Person.Address WITH NORECOMPUTE
    

默认采样率是否足够好? (Is the default sampling rate good enough?)

In the example above, we showed how time-consuming updating statistics can be for a large number of table rows, and how the SAMPLE parameter can help. The next question is whether you should use the default SQL Server sampling rate, or specify your custom sampling rate using the SAMPLE parameter. There is no out-of-the-box answer, as it depends on your database structure, usage, data change frequency, performance requirements, etc.

在上面的示例中,我们展示了大量表行的更新统计信息如何耗时,以及SAMPLE参数如何提供帮助。 下一个问题是您应该使用默认SQL Server采样率,还是使用SAMPLE参数指定自定义采样率。 没有开箱即用的答案,因为这取决于您的数据库结构,使用情况,数据更改频率,性能要求等。

Here are some guidelines that can help you determine the right strategy.

以下是一些指南,可以帮助您确定正确的策略。

Update the statistics on a table by running UPDATE STATISTICS using the SQL Server default sampling rate (no sampling parameters added). Then, execute DBCC SHOW_STATISTICS to view the statistics and create a screenshot, you will need it for later. Run UPDATE STATISTICS on the same table again, but this time, add the FULLSCAN parameter. View the updated statistics and create a new screenshot.

通过使用SQL Server默认采样率(未添加采样参数)运行UPDATE STATISTICS,更新表上的统计信息。 然后,执行DBCC SHOW_STATISTICS以查看统计信息并创建一个屏幕截图,稍后将需要它。 再次在同一表上运行UPDATE STATISTICS,但是这次,添加FULLSCAN参数。 查看更新的统计信息并创建新的屏幕截图。

Compare the All density values for default and custom percentage sampling rate. If the values are different by an order of magnitude, the SQL Server default statistics sampling rate shouldn’t be used, as it provides statistics that are not accurate enough.

比较默认和自定义百分比采样率的所有密度值。 如果值相差一个数量级,则不应使用SQL Server默认统计信息采样率,因为它提供的统计信息不够准确。

Another useful method is to compare the Actual and Estimated number of rows in a query execution plan. If they are different by an order of magnitude, the SQL Server statistics for the queried table were inaccurate, so investigate it for later executions.

另一个有用的方法是比较查询执行计划中的实际行数和估计行数。 如果它们相差一个数量级,则查询表SQL Server统计信息不准确,因此请对其进行调查以供以后执行。

Looking at the Actual and Estimated number of rows in a SQL query execution plan

In this article, we showed how to manually update SQL Server statistics, how data is sampled in small and large tables, and gave recommendations how to determine whether manual statistics updates with a custom sampling rate are needed. Keep in mind a performance tradeoff between SQL Server statistics updating and optimal query execution plans.

在本文中,我们展示了如何手动更新SQL Server统计信息,如何在大小表中对数据进行采样,并提供了有关如何确定是否需要使用自定义采样率进行手动统计信息更新的建议。 请记住,在SQL Server统计信息更新和最佳查询执行计划之间进行性能折衷。

翻译自: https://www.sqlshack.com/inaccurate-sql-server-statistics-sql-query-performance-killer-update-sql-server-statistics/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值