如何使用SQL Server数据压缩来节省空间

匹诺丶老曹

已于 2022-09-06 14:20:47 修改

阅读量2.3k

点赞数

分类专栏：数据库节省空间 SqlServer 文章标签：数据库 sql sqlserver

于 2022-09-02 17:10:18 首次发布

本文链接：https://blog.csdn.net/weixin_43628744/article/details/126667165

版权

数据库同时被 3 个专栏收录

3 篇文章 0 订阅

订阅专栏

节省空间

1 篇文章 0 订阅

订阅专栏

SqlServer

1 篇文章 0 订阅

订阅专栏

本文介绍了如何使用SQL Server的内置函数和动态管理视图来跟踪数据库空间使用情况，重点关注数据压缩以节省存储空间。文章展示了如何找出未压缩的表和索引，以及如何估算压缩可能带来的空间节省。通过压缩数据，可以显著减少磁盘占用，同时减少I/O操作，提高系统性能。文章还提供了查询示例以检查文件系统空间并确定是否需要收缩文件。

摘要由CSDN通过智能技术生成

In the previous article, on tracking database file storage usage How to track SQL Server database space usage with built-in functions and DMVs, I introduced some ideas and calculations for figuring out when your database is going to run out of headroom on the file system – hopefully in time to get an order in the works to increase that space!

在上一篇文章中，有关跟踪数据库文件存储使用情况如何使用内置函数和DMV跟踪SQL Server数据库空间使用情况，我介绍了一些思路和计算方法，用于弄清数据库何时将耗尽文件系统上的余量。 –希望能及时得到订单，以扩大该空间！

In the meantime, though, you know it will be smart to do all you can to minimize the storage footprint of your database. SQL Server data compression, available in Enterprise versions, is one such approach. The idea is simple: Find repetitive sections of data in a table and store unique information just per row or page or column store index. Modern compression algorithms can reduce the on-disk footprint by 40-60% or even more, depending on the type of data.

但是，与此同时，您知道尽一切努力将数据库的存储空间降至最低是明智的。企业版中提供SQL Server数据压缩就是这样一种方法。这个想法很简单：在表中查找数据的重复部分，并仅按行，页面或列的存储索引存储唯一信息。根据数据类型的不同，现代压缩算法可以减少40-60％甚至更多的磁盘占用空间。

Compressing your SQL Server data is a simple way to cram more into your limited disk space. In this article, I’ll start off by showing you how you can find out what tables and indexes are not compressed. This will give you vital insight into what to prioritize when you get to work on those space hogs!

压缩SQL Server数据是一种将更多数据填充到有限磁盘空间中的简单方法。在本文中，我将向您展示如何找出未压缩的表和索引。这将为您提供重要的见解，使您可以了解在处理这些太空猪时应优先考虑的事项！

Before we get started, let me answer a question you may be asking about performance. Compressed tables use extra CPU cycles to compress/decompress their contents. However, you will need less I/O to read and write those tables, since they now occupy (much) less space on disk. Unless you are severely CPU-bound – in which case you have bigger problems – compression may well enhance your system throughput by reducing I/O.

在开始之前，让我回答一个您可能会问到的性能问题。压缩表使用额外的CPU周期来压缩/解压缩其内容。但是，您将需要更少的I / O来读取和写入这些表，因为它们现在在磁盘上占用的空间要少得多。除非您的CPU受到严重限制（在这种情况下您会遇到更大的问题），否则压缩可以通过减少I / O来很好地提高系统吞吐量。

哪些表和索引未压缩？ (What tables and indexes are not compressed?)
Two handy DMVs will give us this information. Here is a query that will extract a neat overview of compression opportunities:

两个方便的DMV将为我们提供此信息。这是一个查询，它将提取压缩机会的完整概述：

SELECT DISTINCT
    s.name,
    t.name,
    i.name,
    i.type,
    i.index_id,
    p.partition_number,
    p.rows
FROM sys.tables t
LEFT JOIN sys.indexes i
ON t.object_id = i.object_id
JOIN sys.schemas s
ON t.schema_id = s.schema_id
LEFT JOIN sys.partitions p
ON i.index_id = p.index_id
    AND t.object_id = p.object_id
WHERE t.type = 'U' 
  AND p.data_compression_desc = 'NONE'
ORDER BY p.rows desc

Running this against the AdventureWorks database, you might get a result set like this:

在AdventureWorks数据库上运行此命令，可能会得到如下结果集：
在这里插入图片描述

Without even considering the contents of these tables, you can see that the SalesOrderDetail table is the largest. It may be a good candidate for compression. By the way, the type column indicates the type of the index. 1 is for clustered indexes. 2 means a non-clustered index. 0 (not shown in this example) is a heap. Recall that a table is either a heap or a clustered index.

甚至不用考虑这些表的内容，就可以看到SalesOrderDetail表是最大的。它可能是压缩的不错选择。顺便说一句，类型列指示索引的类型。 1用于聚集索引。 2表示非聚集索引。 0（在此示例中未显示）是堆。回想一下表是堆还是聚集索引。

我将节省多少空间？ (How much space will I save?)
SQL Server provides an easy-to-use stored procedure to estimate the savings compression can give you. Since there are two types of compression, ROW and PAGE, let’s try both:

SQL Server提供了一个易于使用的存储过程，以估算压缩可以为您节省的费用。由于压缩有两种类型，即ROW和PAGE，因此我们尝试两种方式：

EXEC sp_estimate_data_compression_savings 
    @schema_name = 'Sales', 
    @object_name = 'SalesOrderDetail', 
    @index_id = NULL, 
    @partition_number = NULL, 
    @data_compression = 'ROW'
 
EXEC sp_estimate_data_compression_savings 
    @schema_name = 'Sales', 
    @object_name = 'SalesOrderDetail', 
    @index_id = NULL, 
    @partition_number = NULL, 
    @data_compression = 'PAGE'

These calls yielded the results:

这些调用产生了结果：
在这里插入图片描述

I highlighted the columns showing the current compression level (none) and the requested compression (ROW first, then PAGE). Clearly, PAGE compression is the best option for index ids 1 and 3. Index id 2 does not seem to benefit so much. Referring to the earlier results, you can see that indexes 1 and 3 are the primary key and a non-clustered index. Index two, as the name implies, is an “Alternate Key” on a row GUID. GUIDs are not very compressible (nor should they be! If they were, that would imply lots of repeated characters, which would be very bad GUIDs indeed!) so it’s not worth trying.

我突出显示了显示当前压缩级别（无）和请求的压缩（首先是ROW，然后是PAGE）的列。显然，PAGE压缩是索引ID 1和3的最佳选择。索引ID 2似乎没有太大好处。参考早期的结果，您可以看到索引1和3是主键和非聚集索引。顾名思义，索引二是行GUID上的“备用键”。 GUID不是非常可压缩的（也不应该是可压缩的！如果是的话，则意味着很多重复的字符，这实际上是非常糟糕的GUID！），因此不值得尝试。

好！让我们压缩一下！ (OK! Let’s get things compressed!)
Once you know what to compress, the actual action is easy:

一旦你知道压缩，实际的动作是十分容易的：

ALTER INDEX PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID 
        ON Sales.SalesOrderDetail 
        REBUILD PARTITION = ALL 
        WITH (DATA_COMPRESSION = PAGE);
 
ALTER INDEX IX_SalesOrderDetail_ProductID
        ON Sales.SalesOrderDetail 
        REBUILD PARTITION = ALL 
        WITH (DATA_COMPRESSION = PAGE);

We’ll compress the two candidates we found. The actual run time of the compression operation is dependent on the size of the tables and indexes, of course. For a large table, expect to wait a bit! Also, note that I specified “PARTITION = ALL”. For AdventureWorks, all tables are in a single partition, so this is fine, though I could have written “PARTITION = 1” with the same result. Imagine, though, that you have very large tables that are partitioned. In such a case, it may be prudent to compress one partition at a time, rather than hogging the system while you compress all partitions at once.

我们将压缩找到的两个候选对象。当然，压缩操作的实际运行时间取决于表和索引的大小。对于大桌子，请稍等！另外，请注意，我指定了“ PARTITION = ALL”。对于AdventureWorks，所有表都在单个分区中，所以这很好，尽管我可以用相同的结果编写“ PARTITION = 1”。但是，请想象一下，您有非常大的分区表。在这种情况下，最好一次压缩一个分区，而不是在一次压缩所有分区时占用系统空间。

我的表是压缩的，那么文件系统空间呢？ (My tables are compressed, so what about the file system space?)
Say you’ve gone through the exercise of compressing everything that looks worth the effort. That may mean that your partitions (or the PRIMARY partition if your tables are not partitioned) are now only half-full. Harking back to the previous article, you may wonder if you can now get file system space back. The answer, as always, is – “It depends.”

假设您已经完成了压缩所有看起来值得努力的工作。这可能意味着您的分区（如果您的表未分区，则为PRIMARY分区）现在仅占一半。回到上一篇文章，您可能想知道是否现在可以重新获得文件系统空间。与往常一样，答案是–“取决于情况。”

If your tables have simply been growing at the end of their clustered indexes (or if they are heaps), then I’d raise “It depends” to “quite possible”. If your tables have seen lots of inserts in the middle, updates, or deletes, then maybe not. When you compress a number of tables in a partition, SQL will not defragment the partition. That may mean that the partitions have no space to give up. If that occurs, the only remedy is to copy the partition to another filegroup, table by table, then delete the original file.

如果您的表只是在它们的聚集索引的末尾（或者它们是堆）增长，那么我将“取决于”变为“相当可能”。如果您的表在中间有很多插入，更新或删除，则可能没有。当您压缩分区中的许多表时，SQL不会对分区进行碎片整理。这可能意味着分区没有空间可以放弃。如果发生这种情况，唯一的解决方法是将分区逐表复制到另一个文件组，然后删除原始文件。

You can see if there is any space to give back with a fairly simple query:

您可以通过一个相当简单的查询来查看是否有空间可以回馈：

SELECT name,
    s.used / 128.0                  AS SpaceUsedInMB,
    size / 128.0 - s.used / 128.0   AS AvailableSpaceInMB
FROM sys.database_files
CROSS APPLY 
    (SELECT CAST(FILEPROPERTY(name, 'SpaceUsed') AS INT)) 
s(used)
WHERE FILEPROPERTY(name, 'SpaceUsed') IS NOT NULL;

After compressing the two indexes above, I get the result:

压缩上面的两个索引后，我得到结果：
在这里插入图片描述

So, there is some space available for growth. If I wanted to reclaim some of it, I can use the command:

因此，存在一些可用于增长的空间。如果要回收其中的一些内容，可以使用以下命令：

DBCC SHRINKFILE (N’AdventureWorks2014_Data’, 202)

This will shrink the data file for the database, giving back some storage to the file system. I’ve created a little stored procedure that does a bit more. It looks at all the files for a given database, then shrinks those that:

这将收缩数据库的数据文件，将一些存储空间还给文件系统。我创建了一个小的存储过程，它做了更多的工作。它查看给定数据库的所有文件，然后收缩那些文件：

Are large enough to bother with
足够大，可以打扰
Have more than a given amount of free space
拥有超过给定数量的可用空间
Are on a specific drive (for databases spread across multiple drives)
在特定驱动器上（用于跨多个驱动器分布的数据库）
It first builds a table of candidate files, then processes the files, issuing a DBCC command to shrink those that pass the tests. Consider this proc to be a work in progress. Use it, borrow from it, enhance it or ignore it! I’ve included it at the end of this article

它首先建立一个候选文件表，然后处理这些文件，发出DBCC命令以缩小通过测试的文件。将此过程视为正在进行中的工作。使用，借鉴，增强或忽略它！我已经在本文结尾处添加了它

摘要 (Summary)
Compression is a great way to get more mileage out of your database. It’s not hard to find candidate tables and even easier to actually compress them. Plus, as a bonus, your system will the do

压缩是从数据库中获取更多收益的好方法。找到候选表并不难，甚至可以更容易地对其进行压缩。另外，作为奖励，您的系统会做

the same work with less I/O!

只需更少的I / O就可以完成相同的工作！

Previous articles in this series:

本系列以前的文章：

How to track SQL Server database space usage with built-in functions and DMVs如何使用内置函数和DMV跟踪SQL Server数据库空间使用情况
Discovering database specific information using built-in functions and dynamic management views (DMVs)使用内置功能和动态管理视图（DMV）发现特定于数据库的信息
Discovering SQL server instance information using system views使用系统视图发现SQL Server实例信息
Discovering more SQL Server information using the built-in dynamic management views (DMVs)使用内置的动态管理视图（DMV）发现更多SQL Server信息
看更多 (See more)
Consider these free tools for SQL Server that improve database developer productivity.

考虑使用这些免费SQL Server工具来提高数据库开发人员的生产力。

参考资料 (References)
Data Compression资料压缩
sp_estimate_data_compression_savingssp_estimate_data_compression_savings
FILEPROPERTY文件属性
DBCC SHRINKFILEDBCC缩略文件
缩小文件存储过程 (Shrink Files Stored Procedure)

IF object_id(N'tempdb..#shrink', N'P') IS NOT NULL 
    DROP PROC #shrink
go
CREATE PROC #shrink
	@freepct float = 10.0,
	@drive char(1) = 'I',
	@minsize int = 10,
	@debug bit = 1
 
AS BEGIN
     SET NOCOUNT ON;
 
     DECLARE @files TABLE (fn sysname primary key, used float, free float);
     INSERT INTO @files (fn, used, free)
        SELECT name,
            s.used/128.0                    AS SpaceUsedInMB,
            (size/128.0) - (s.used/128.0)   AS AvailableSpaceInMB
        FROM sys.database_files
        CROSS APPLY 
            (SELECT CAST(FILEPROPERTY(name, 'SpaceUsed') AS INT)) 
        s(used)
        WHERE LEFT(physical_name, 1) = @drive
          AND FILEPROPERTY(name, 'SpaceUsed') IS NOT NULL;
 
    DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FORWARD_ONLY FOR
	   SELECT fn, used, free 
	   FROM @files
	   WHERE free + used &gt; @minsize	
	   AND free/used &gt;= @freepct/100
 
    DECLARE @fn sysname, @used float, @free float;
 
    OPEN cur
    WHILE 1 = 1 BEGIN 
	   FETCH NEXT FROM cur INTO @fn, @used, @free
	   IF @@FETCH_STATUS &lt;&gt; 0 BREAK
 
   DECLARE @newsize int = @used + @used*@freepct/100
	   DECLARE @stmt nvarchar(255) = N'
		 DBCC SHRINKFILE (N''' 
		 + @fn + ''', ' 
		 + cast(@newsize AS varchar(10))
		 + ')
		  '
	   RAISERROR(@stmt, 0, 0) WITH NOWAIT;;
	   IF @debug = 0 EXEC sp_executesql @stmt = @stmt
    END
 
    CLOSE cur
    DEALLOCATE cur
END

翻译自: https://www.sqlshack.com/use-sql-server-data-compression-save-space/