清理SQL Server中的旧代码和未使用的对象

最新推荐文章于 2024-01-26 09:55:00 发布

culuo4781

最新推荐文章于 2024-01-26 09:55:00 发布

阅读量331

点赞数

文章标签：数据库 python mysql java linux

原文链接：https://www.sqlshack.com/cleaning-up-old-code-and-unused-objects-in-sql-server/

版权

In this article, we are going to talk about discovering and archiving SQL references to invalid procedures.

在本文中，我们将讨论发现和归档对无效过程SQL引用。

We have a significant amount of database code within our database procedures we no longer use, as we’ve changed objects referenced in procedures. As we’ve moved to source control for all of our databases, we’re seeing the costs of keeping old code in our environment that we no longer use. The trouble is that we don’t know how to identify what we use since we have many teams who develop objects in our databases and use them for the features they create. What are some techniques for cleaning old database code on the database level as well as preventing this problem in the future?

由于我们已更改了过程中引用的对象，因此我们不再使用的数据库过程中有大量数据库代码。随着我们转向所有数据库的源代码控制，我们看到了将旧代码保留在我们不再使用的环境中的成本。问题在于我们不知道如何识别使用的东西，因为我们有许多团队在数据库中开发对象并将其用于其创建的功能。有什么技术可以在数据库级别清除旧数据库代码以及将来防止此问题？

总览 (Overview)

Outside of the storage costs and repository costs where we must sync with many objects, old database code in the form of stored procedures that we no longer use raises security risks, as procedures provide access to data through CRUD operations. If we don’t use database code, we should practice eliminating it immediately. It is a major security risk to have stored procedures that are no longer used. As for functions, this depends on the function in question, but I recommend the practice of only keeping what’s required. We can apply the same lessons to views, users, roles, etc as well – the more database objects we use, the more we need to restrict or eliminate as we change our environment.

除了必须与许多对象同步的存储成本和存储库成本外，不再使用的存储过程形式的旧数据库代码还会带来安全风险，因为过程可通过CRUD操作提供对数据的访问。如果我们不使用数据库代码，则应立即练习消除它。具有不再使用的存储过程是主要的安全风险。至于功能，这取决于所讨论的功能，但是我建议仅保留所需内容的做法。我们也可以将相同的课程应用于视图，用户，角色等–我们使用的数据库对象越多，我们在更改环境时就需要限制或消除的对象就越多。

For the best practice to reduce effort, if we replace a database object with another object (such as replacing an old procedure with a new procedure), we should remove the old object immediately. We can archive the old object for SQL reference in an archive source control location, but we should never keep the object live. This best practice will prevent a database from possessing objects that are no longer used but could possibly reveal inappropriate information to a malicious actor.

为了减少工作量的最佳实践，如果我们用另一个对象替换数据库对象（例如用新过程替换旧过程），则应立即删除旧对象。我们可以在存档源代码控制位置中存档旧对象以供SQL参考，但是我们永远不要保持该对象处于活动状态。这种最佳做法将防止数据库拥有不再使用但可能会向恶意行为者泄露不适当信息的对象。

演示存储过程 (Demonstration with A Stored Procedure)

In the below code, we create a table that we’ll use for an experiment. Once we create the table, we create a procedure that inserts a record from a parameter the procedure accepts into our newly created table. We then test the procedure a few times, query the table, then drop the table. Now, when we execute the stored procedure, we get an error – the procedure fails. As we see in the below code, we can remove a table that’s referenced in a procedure.

在下面的代码中，我们创建一个表用于实验。创建表后，我们将创建一个过程，该过程将从该过程接受的参数中插入一条记录到新创建的表中。然后，我们对该过程进行几次测试，查询表，然后删除表。现在，当我们执行存储过程时，我们会收到一个错误–该过程失败。如下面的代码所示，我们可以删除过程中引用的表。

--- Window 1
CREATE TABLE DataEntry(
	TextId TINYINT IDENTITY(1,1),
	TextEntry VARCHAR(9)
)
 
--- Window 2
CREATE PROCEDURE addDataEntry
@string VARCHAR(9)
AS
BEGIN
	INSERT INTO Dataentry (TextEntry) VALUES (@string)
END
 
EXEC addDataEntry 'ref-1'
EXEC addDataEntry 'ref-2'
EXEC addDataEntry 'ref-3'
EXEC addDataEntry 'ref-4'
EXEC addDataEntry 'ref-5'
 
SELECT *
FROM DataEntry
 
DROP TABLE DataEntry
 
EXEC addDataEntry 'ref-6'

Our query shows the results of the data we added.

我们的查询显示了我们添加的数据的结果。

We can remove the table, but now when we call the procedure, the procedure fails since the table doesn’t exist.

我们可以删除该表，但是现在当我们调用该过程时，该过程将失败，因为该表不存在。

This may be one of the complexities that we’re solving for: an invalid SQL reference where an object refers to another object that no longer exists. This scenario could also be a procedure calling another procedure with the second procedure not existing.

这可能是我们要解决的复杂问题之一：无效SQL引用，其中一个对象引用另一个不再存在的对象。此方案也可以是一个过程，该过程在不存在第二个过程的情况下调用另一个过程。

删除之前识别SQL引用 (Identifying SQL References Before a Drop)

One clear solution here is to identify all references prior to dropping the referenced object and the same applies if we rename the object. Using the above example of removing a table, we could look for all the references to the table we’re dropping in our procedures, views, etc. This code becomes invalid if they reference the removed object and removing them with it reduces possible errors. There may be cases where we want to update the reference if we’ve renamed the object or created a new object to replace the old one. The flow of removing an object should be to identify references first and include these references as objects to be removed if we need to remove an object. The below diagram shows this.

一种明确的解决方案是在删除引用的对象之前先识别所有引用，如果我们重命名该对象，则同样适用。使用上面的删除表的示例，我们可以在过程，视图等中查找要删除的表的所有引用。如果这些代码引用了删除的对象，并使用它们删除它们可以减少可能的错误，则此代码将无效。在某些情况下，如果我们已重命名对象或创建了一个新对象来替换旧对象，则我们想更新引用。删除对象的流程应该是首先标识引用，如果需要删除对象，则将这些引用包括为要删除的对象。下图显示了这一点。

Using the above image as an example, if we remove the referenced table, notice the impact it would cause on all the other references to the table – they would be invalid.

以上面的图像为例，如果我们删除引用的表，请注意它将对表的所有其他引用造成影响–它们将是无效的。

The same logic applies to other changes from renaming objects to changing data types to changing primary keys (which should be extremely rare, but unfortunately happen). These changes can have significant effects on code that reference these. We must account for all dependent code which will be affected prior to making the change.

相同的逻辑适用于从重命名对象到更改数据类型再到更改主键的其他更改（这应该非常少见，但不幸的是发生了）。这些更改可能会对引用这些代码的代码产生重大影响。在进行更改之前，我们必须考虑所有将受影响的相关代码。

Assuming that we use source control for our database – from database objects to application code – we can iterate over our files using PowerShell and obtain SQL references. We’ll want to use searches that reflect how we name objects; notice the nuances in the below select statements for one table – all of which return the same data set, but query the table differently:

假设我们对数据库使用源代码管理–从数据库对象到应用程序代码–我们可以使用PowerShell遍历文件并获取SQL引用。我们将要使用反映我们如何命名对象的搜索；请注意下面的select语句中的一张表的细微差别-所有这些语句都返回相同的数据集，但对表的查询不同：

SELECT *
FROM etlMidLoader
 
SELECT *
FROM dbo.etlMidLoader
 
SELECT *
FROM [dbo].etlMidLoader
 
SELECT *
FROM [dbo].[etlMidLoader]
 
SELECT etlMidLoader.*
FROM [dbo].[etlMidLoader]
 
SELECT [etlMidLoader].*
FROM [dbo].[etlMidLoader]

Developers sometimes name tables the same name as column names, so searching for a table name may return results you don’t want to see, such as returning a column from a table you don’t want to remove or update. I tend to develop with the approach of restricting most object names (tables, views, procedures, etc) with a starting identifier either indicating the object type or purpose (tb or etl) or using the action it performs (helpful for views or procedures) and the reason is that searches through source control become fast as I never get a column and table clash. One of the best DBAs I worked with enforced this rule by removing any object that didn’t follow appropriate naming convention immediately without warning – it made developers create objects with the right name from the beginning and saved time when tracking down dependencies (this also saved a lot of time for other tracking purposes). However, this assumption would be inappropriate in many development environments because many environments don’t have strict practices like this.

开发商有时名称表相同的名称，列名，因此寻找一个表名可能返回你不希望看到的，如你不想删除或更新表返回一列结果。我倾向于以限制大多数对象名称（表，视图，过程等）的方法来开发，该方法使用起始标识符来表明对象类型或目的（tb或etl）或使用其执行的操作（有助于视图或过程）原因是通过源代码管理的搜索变得很快，因为我从未遇到列和表冲突。我与之合作的最好的DBA之一通过删除所有没有立即遵循适当命名约定的对象而没有发出警告就实施了此规则-它使开发人员从一开始就使用正确的名称创建对象，并在跟踪依赖项时节省了时间（这也节省了时间）很多时间用于其他跟踪目的）。但是，此假设在许多开发环境中将是不合适的，因为许多环境没有这样的严格实践。

In the below code, we look for the object etload in our source control by filtering on SQL files exclusively (our assumption here being that we’re using SQL for references) and we return the file name plus the line of the file we found it along with the line of text. This script is not procedure-biased either unless we only point it at a directory that holds procedures (relative to design); it will search all SQL files for references. Even comments should be reviewed, as it’s a common practice by developers to use multi-line comments. Relative to how specific we want our filter, we can edit the if statement in the below code to be very specific with regular expressions.

在下面的代码中，我们通过专门过滤SQL文件在源代码控制中查找对象etload （这里的假设是我们使用SQL作为引用），然后返回文件名加上找到的文件行以及文字行。除非我们仅将其指向保存过程的目录（相对于设计而言），否则该脚本不会受到过程的偏见。它将在所有SQL文件中搜索参考。甚至应评论评论，因为开发人员通常使用多行评论。相对于我们想要过滤器的具体程度，我们可以将以下代码中的if语句编辑为对于正则表达式非常具体。

Function Read-ForSpecificText { 
    Param(
        [Parameter(Mandatory=$true)][string]$file
        , [Parameter(Mandatory=$true)][string]$objectpattern
    )
    Process
    {
        $readfile = New-Object System.IO.StreamReader($file)
        $identifylineno = 0
 
        while (($fileline = $readfile.ReadLine()) -ne $null)
        {
            if ($fileline -match “$objectpattern”)
            {
                Write-Output (‘”’ + $file + ‘” ‘ + $identifylineno.ToString() + “: “ + $fileline)
            }
            $identifylineno++
        }
 
        $readfile.Close()
        $readfile.Dispose()
    }
}
 
$sourcecodesql = Get-ChildItem “C:\Src\Repo One\Reposins\” -Filter *.sql -Recurse
 
foreach ($sqlfile in $sourcecodesql)
{
    Read-ForSpecificText -file $sqlfile.FullName -objectpattern “etlload”
}

We see three objects reference our table with the filenames and the line numbers of the files.

我们看到三个对象用文件名和文件的行号引用了我们的表。

What we do in cases like this may be to alter the objects and remove the SQL reference, or archive the objects first – if we want to keep these even if the etlload table has been removed – and then alter the objects.

在这种情况下，我们可能要做的是更改对象并删除SQL引用，或者先存档对象（如果即使在etlload表已删除的情况下我们也要保留这些对象），然后更改对象。

Because some environments will write SQL code directly in their middleware or frontend code, this script can expand to those file extensions, such as .cs, etc.

由于某些环境会直接在其中间件或前端代码中编写SQL代码，因此该脚本可以扩展为这些文件扩展名，例如.cs等。

刷新存储过程 (Refreshing the Stored Procedure)

While removing referenced objects along with their references is the best practice as we can immediately prevent this issue from the beginning, if we’re in a situation where we need to remove objects that may be invalid, we can use T-SQL’s built-in sql_dependencies, which can assist us in some situations. If we use the above example for our stored procedure that references a table we’ll see the effects of the results when we query this when the table exists and when it doesn’t:

虽然删除引用的对象及其引用是最佳做法，因为我们可以从一开始就立即防止出现此问题，但是如果我们需要删除可能无效的对象，则可以使用T-SQL的内置函数sql_dependencies，在某些情况下可以为我们提供帮助。如果我们在引用表的存储过程中使用上面的示例，则当表存在和不存在时查询结果时，都会看到结果的效果：

--- Before drop
SELECT 
	t2.[name] ProcedureName
	, t3.[name] ReferenceName
FROM sys.sql_dependencies t1
	INNER JOIN sys.procedures t2 ON t1.object_id = t2.object_id
	INNER JOIN sys.objects t3 ON t1.referenced_major_id = t3.object_id
WHERE t2.[name] = 'addDataEntry'
 
--- After drop
SELECT 
	t2.[name] ProcedureName
	, t3.[name] ReferenceName
FROM sys.sql_dependencies t1
	INNER JOIN sys.procedures t2 ON t1.object_id = t2.object_id
	INNER JOIN sys.objects t3 ON t1.referenced_major_id = t3.object_id
WHERE t2.[name] = 'addDataEntry'

The results before we drop the DataEntry table.

删除DataEntry表之前的结果。

The results after we drop the DataEntry table

删除DataEntry表后的结果

We may be tempted to write a query from the above tables where we find no references, but in some cases, stored procedures may not reference any tables and this may be valid. In the same manner, a stored procedure may call dynamic code inside it, which would make our assumptions invalid (whereas the PowerShell script would catch this). For an example, let’s create a procedure with dynamic SQL that references our table:

我们可能会尝试从上面的表中编写查询，但找不到引用，但是在某些情况下，存储过程可能不会引用任何表，这可能是有效的。以同样的方式，存储过程可能会在其中调用动态代码，这会使我们的假设无效（而PowerShell脚本会捕获这一点）。例如，让我们使用动态SQL创建一个引用表的过程：

CREATE PROCEDURE addDataEntryDyn
AS
BEGIN
 
	EXEC('INSERT INTO Dataentry (TextEntry) VALUES (''String'')')
 
END
 
--- Recreate the table:
CREATE TABLE DataEntry(
	TextId TINYINT IDENTITY(1,1),
	TextEntry VARCHAR(9)
)
 
--- Call our new procedure
EXEC addDataEntryDyn
 
--- Query our table
SELECT * FROM DataEntry
 
--- No references?
SELECT 
	t2.[name] ProcedureName
	, t3.[name] ReferenceName
FROM sys.sql_dependencies t1
	INNER JOIN sys.procedures t2 ON t1.object_id = t2.object_id
	INNER JOIN sys.objects t3 ON t1.referenced_major_id = t3.object_id
WHERE t2.[name] = 'addDataEntryDyn'

No references found.

找不到参考。

Developers will sometimes call CRUD operations in this way, so we have to be careful in assuming that we’ll catch everything related with this query.

开发人员有时会以这种方式调用CRUD操作，因此在假设我们将捕获与此查询相关的所有内容时必须格外小心。

结论 (Conclusion)

Part of creating software involves working with dependent code where an object, like a stored procedure or view, references another object, such as a table. As we fix bugs and update our software, we may remove objects that cause failures or invalid SQL references. We also may lose code that’s valuable for future reference (thus my recommendation of archiving it some way). These invalid references, errors, or calls can introduce possible bugs along with possible security risks. The best practice is to catch references as early as possible, as it’s the easiest time to remove or update references if an object must be removed.

创建软件的一部分涉及使用相关代码，在该代码中，对象（如存储过程或视图）引用另一个对象（如表）。在修复错误和更新软件时，我们可能会删除导致失败或无效SQL参考的对象。我们还可能会丢失对以后的参考有价值的代码（因此，我建议以某种方式对其进行归档）。这些无效的引用，错误或调用会引入可能的错误以及可能的安全风险。最佳实践是尽早捕获引用，因为如果必须删除对象，这是删除或更新引用的最简单时间。