How to quickly generate a large number of dimension tables for reporting applications

culuo4781

Original article: https://www.sqlshack.com/quickly-generate-large-number-dimension-tables-reporting-applications/

Description

When building reporting structures, we typically need to build fact and dimension tables to support the apps that will consume this data. Sometimes we need to generate large numbers of dimension tables to support application needs, such as in Tableau, Entity Framework, or Power BI.

Creating this schema by hand is time-consuming and error-prone. Automating it improves predictability and maintainability, and saves a ton of time in the process!

Introduction

Consider a database in which we have a number of tables. These tables may be fact tables in a data warehouse or OLTP-style tables with a variety of columns that contain dimension-type data. This is fairly typical in nearly any database we design. Important columns describe a job title, country of origin, pay grade, security clearance, or any other type of descriptive metadata.

Many applications require or prefer these dimensions to be spelled out for them with literal lookup tables, rather than referencing the fact data directly. We can design, test, and implement each table as needed, and then add code to populate and maintain them. This solution is viable but time-consuming. The alternative that I’d like to present here is a solution that can generate dimension tables using a tiny amount of metadata and code as the backbone of the process.

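For reference, here is roughly what one hand-built dimension table looks like. It uses the Production.Product.Color column that appears in the metadata later in this article. This is a minimal sketch that assumes AdventureWorks, where Color is NVARCHAR(15), and SQL Server 2016 or later for DROP TABLE IF EXISTS. The names follow the same pattern the generator produces. Writing and maintaining dozens of these by hand is exactly the work we want to automate.

```sql
-- Hand-built dimension table for one source column (assumes AdventureWorks,
-- where Production.Product.Color is NVARCHAR(15)).
DROP TABLE IF EXISTS Production.Dim_ProductColor;  -- SQL Server 2016+

CREATE TABLE Production.Dim_ProductColor
(	Dim_ProductColor_Id INT NOT NULL IDENTITY(1,1)
		CONSTRAINT PK_Dim_ProductColor PRIMARY KEY CLUSTERED,
	Color NVARCHAR(15) NOT NULL
);

-- One row per distinct, non-NULL value in the source column.
INSERT INTO Production.Dim_ProductColor (Color)
SELECT DISTINCT Product.Color
FROM Production.Product
WHERE Product.Color IS NOT NULL;
```

The generator built in this article drops and re-creates a table with this same name, so trying this first does no harm.
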
We can model a process that would generate dimension tables like this:

The area in the dotted lines represents what we will be creating in this article!

Setting up our schema

Our solution relies on a table of metadata that briefly describes what tables & columns we would like to process and where to place the resulting dimension data. The goal is to keep everything as absolutely minimalist and simple as possible.

A metadata table can be replaced with direct queries against system views if your schema is exceedingly well-organized. We could also hard-code these table names into the stored procedure, though I tend to prefer metadata, as it’s easier to modify. In addition, metadata updates are less risky and easier to maintain when larger and more complex applications may also be involved.

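The query below illustrates the system-view alternative by deriving metadata rows straight from the catalog. It relies on a hypothetical naming convention: every string column whose name ends in "Type" deserves a dimension. It also uses a hypothetical Dim_<Table><Column> target name. Both would have to match your own schema. The output has the same four columns as the metadata table defined below, so it could feed an INSERT ... SELECT.

```sql
-- Hypothetical convention: string columns named like '%Type' become dimensions.
SELECT
	'Dim_' + tables.name + columns.name AS Target_Dimension_Table_Name,
	schemas.name AS Source_Fact_Schema_Name,
	tables.name AS Source_Fact_Table_Name,
	columns.name AS Source_Fact_Column_Name
FROM sys.tables
INNER JOIN sys.schemas
ON schemas.schema_id = tables.schema_id
INNER JOIN sys.columns
ON columns.object_id = tables.object_id
INNER JOIN sys.types
ON types.user_type_id = columns.user_type_id
WHERE types.name IN ('char', 'nchar', 'nvarchar', 'varchar')
AND columns.name LIKE '%Type'
AND tables.name NOT LIKE 'Dim[_]%'   -- skip dimension tables generated earlier
ORDER BY Source_Fact_Schema_Name, Source_Fact_Table_Name;
```
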
In order to transform fact data into dimension data, we need to enumerate the following details of that data:

source schema
source table
source column
target dimension table name

With this information, we can scan a column’s contents, create a dimension table with the new name, and deposit that data into it. Here is a table definition that meets our needs:

CREATE TABLE dbo.Dimension_Table_Metadata
	(	Dimension_ID SMALLINT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Dimension_Table_Metadata PRIMARY KEY CLUSTERED,
		Target_Dimension_Table_Name VARCHAR(50) NOT NULL,
		Source_Fact_Schema_Name VARCHAR(128) NOT NULL,
		Source_Fact_Table_Name VARCHAR(128) NOT NULL,
		Source_Fact_Column_Name VARCHAR(50) NOT NULL
	);

We could use the combination of source schema, table, and column as a natural clustered primary key, but I’ve chosen a surrogate key instead. This allows for the possibility that you might need to copy the contents of a given column into multiple places. If so, the additional metadata row won’t violate a primary key.

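If you keep the surrogate key, a unique constraint on the target side still stops two rows from generating the same dimension table. One source column can still feed several differently named targets. The constraint name below is hypothetical. Because the generated tables land in the source schema, the schema and target name together identify each output table.

```sql
-- Hypothetical constraint: each schema/target-name pair may appear only once,
-- but the same source column may still be listed under several targets.
ALTER TABLE dbo.Dimension_Table_Metadata
	ADD CONSTRAINT UQ_Dimension_Table_Metadata_Target
	UNIQUE (Source_Fact_Schema_Name, Target_Dimension_Table_Name);
```
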
For all demos in this article, we will use data available in AdventureWorks. This can easily be replicated with any string columns in any table in any source database. With this table created, let’s populate it with some test data:

INSERT INTO dbo.Dimension_Table_Metadata
	(Target_Dimension_Table_Name, Source_Fact_Schema_Name, Source_Fact_Table_Name, Source_Fact_Column_Name)
VALUES
	('Dim_Department_GroupName', 'HumanResources', 'Department', 'GroupName'),
	('Dim_JobTitle', 'HumanResources', 'Employee', 'JobTitle'),
	('Dim_PersonType', 'Person', 'Person', 'PersonType'),
	('Dim_ProductColor', 'Production', 'Product', 'Color'),
	('Dim_ProductInventoryShelf', 'Production', 'ProductInventory', 'Shelf'),
	('Dim_TransactionType', 'Production', 'TransactionHistory', 'TransactionType')

We can check and see how this looks by selecting all data from our new table:

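A plain SELECT is enough for this check. Ordering by the surrogate key keeps the output stable from run to run.

```sql
-- Review the metadata that will drive the generator.
SELECT
	Dimension_ID,
	Target_Dimension_Table_Name,
	Source_Fact_Schema_Name,
	Source_Fact_Table_Name,
	Source_Fact_Column_Name
FROM dbo.Dimension_Table_Metadata
ORDER BY Dimension_ID;
```
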
We can see that we have chosen 6 source columns, providing the schema name, table name, and column name for each so that they point directly at the data we are interested in. We have also chosen a target dimension table name for each. These names are arbitrary, though I’ve tried to make them descriptive. You may choose whatever names fit your database’s naming conventions, as the names have no bearing on performance or results.

With these test columns ready, we may proceed to building a solution that will enumerate the contents of these columns into dimension data that could later be fed into a reporting tool, analytical engine, or some other destination.

The solution

We have constructed a problem and outlined what data we wish to transform. Our goal is to combine system views with our metadata, using as little code as possible, to generate some simple yet complete dimension tables.

In order to do this, we will use a handful of system views in order to join and validate the metadata we have provided:

sys.tables: a list of all tables in a given database, along with a wide variety of attributes pertaining to them
sys.schemas: a list of schemas in a given database, along with some basic information about them
sys.columns: a list of all columns in a database, including the table they belong to and their data types
sys.types: a list of all available data types in a database

Each of these views can be joined together as follows to get a complete view of all tables/columns in a database:

SELECT
	schemas.name AS SchemaName,
	tables.name AS TableName,
	columns.name AS ColumnName,
	CASE
		WHEN columns.max_length = -1 THEN 'MAX'
		WHEN types.name IN ('char', 'varchar') THEN CAST(columns.max_length AS VARCHAR(MAX))
		ELSE CAST(columns.max_length / 2 AS VARCHAR(MAX))
	END AS ColumnSize,
	types.name AS TypeName
FROM sys.tables
INNER JOIN sys.columns
ON tables.object_id = columns.object_id
INNER JOIN sys.schemas
ON schemas.schema_id = tables.schema_id
INNER JOIN sys.types
ON columns.user_type_id = types.user_type_id
WHERE types.name IN ('char', 'nchar', 'nvarchar', 'varchar')
ORDER BY schemas.name, tables.name, columns.name;

The result is a big list of everything, but one that tells us the data types and lengths of the strings we are considering:

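Note that max_length in sys.columns is measured in bytes, and NCHAR and NVARCHAR store two bytes per character. An NVARCHAR(50) column therefore reports a max_length of 100, twice that of a VARCHAR(50) column. That is why we divide max_length by 2 for these types to get the size you would see in the column definition in SQL Server. A max_length of -1 indicates a MAX column.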

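A quick way to see the byte-based max_length for yourself is to create a throwaway temporary table and read its definition back from tempdb's catalog views. This is a small self-contained sketch, and the table and column names are arbitrary.

```sql
-- Temporary table with one column of each kind; names are arbitrary.
CREATE TABLE #LengthDemo (Col_Varchar VARCHAR(10), Col_Nvarchar NVARCHAR(10), Col_NvarcharMax NVARCHAR(MAX));

-- max_length is reported in bytes (-1 for MAX types).
SELECT c.name AS ColumnName, ty.name AS TypeName, c.max_length
FROM tempdb.sys.columns AS c
INNER JOIN tempdb.sys.types AS ty
ON ty.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('tempdb..#LengthDemo')
ORDER BY c.column_id;

DROP TABLE #LengthDemo;
```

The expected max_length values are 10 for the VARCHAR(10) column, 20 for the NVARCHAR(10) column, and -1 for the NVARCHAR(MAX) column.
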
We can now take the metadata we created earlier, combine it with this system view data, and generate useful dimension tables using it. Let’s create a stored procedure that will consume each row of metadata in our table and generate a single dimension table for each. We’ll walk through the process step-by-step in order to explain each query:

IF EXISTS (SELECT * FROM sys.procedures WHERE procedures.name = 'Generate_Dimension_Tables')
BEGIN
	DROP PROCEDURE dbo.Generate_Dimension_Tables
END
GO
 
CREATE PROCEDURE dbo.Generate_Dimension_Tables
AS
BEGIN
	SET NOCOUNT ON;
 
	DECLARE @Sql_Command NVARCHAR(MAX) = '';

This is the housekeeping for our stored procedure. If the procedure already exists, drop it. Then create the new procedure, set NOCOUNT ON, and declare a string, @Sql_Command, to hold the dynamic SQL needed to create the new tables. Dynamic SQL is a huge convenience here, allowing us to insert schema, table, and column names, as well as the column size, into the CREATE TABLE statement.

While it’s possible to write this stored procedure without dynamic SQL, the result would be significantly longer and more complex. In theory, we could create a dummy table, rename it, rename its columns, and resize the string column to achieve the same results. I was aiming for short and sweet here, and that approach would be neither.

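On SQL Server 2016 and later, the existence checks in this procedure can be shortened with DROP ... IF EXISTS, both in the housekeeping and in the generated SQL. CREATE OR ALTER PROCEDURE, available from SQL Server 2016 SP1, is another option. This is an optional simplification and not what the original script uses. The dimension table name in the comment is only an example.

```sql
-- SQL Server 2016+: replaces the IF EXISTS (SELECT * FROM sys.procedures ...) block.
DROP PROCEDURE IF EXISTS dbo.Generate_Dimension_Tables;
GO

-- Inside the dynamic SQL, each generated IF EXISTS ... DROP TABLE block could
-- likewise become a single statement, for example:
-- DROP TABLE IF EXISTS [Production].[Dim_ProductColor];
```
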
Next, we need to generate our dynamic SQL from the metadata table. Since the table can contain any number of rows, I opted to build the command string with a single set-based SELECT that appends one block of SQL per metadata row, rather than using a CURSOR or WHILE loop:

SELECT @Sql_Command = @Sql_Command + '
		USE AdventureWorks2016;
 
		IF EXISTS (SELECT * FROM sys.tables INNER JOIN sys.schemas ON schemas.schema_id = tables.schema_id WHERE tables.name = ''' + Dimension_Table_Metadata.Target_Dimension_Table_Name + ''' AND schemas.name = ''' + Dimension_Table_Metadata.Source_Fact_Schema_Name + ''')
		BEGIN
			DROP TABLE [' + Dimension_Table_Metadata.Source_Fact_Schema_Name + '].[' + Dimension_Table_Metadata.Target_Dimension_Table_Name + '];
		END
 
		CREATE TABLE [' + Dimension_Table_Metadata.Source_Fact_Schema_Name + '].[' + Dimension_Table_Metadata.Target_Dimension_Table_Name + ']
		(	[' + Dimension_Table_Metadata.Target_Dimension_Table_Name + '_Id] INT NOT NULL IDENTITY(1,1) CONSTRAINT [PK_' + Dimension_Table_Metadata.Target_Dimension_Table_Name + '] PRIMARY KEY CLUSTERED,
			[' + Dimension_Table_Metadata.Source_Fact_Column_Name + '] ' + types.name  + '(' + 
			CASE WHEN columns.max_length = -1 THEN 'MAX'
					WHEN types.name IN ('char', 'varchar') THEN CAST(columns.max_length AS VARCHAR(MAX))
					ELSE CAST(columns.max_length / 2 AS VARCHAR(MAX))
			END	+ ') NOT NULL);
 
		INSERT INTO [' + Dimension_Table_Metadata.Source_Fact_Schema_Name + '].[' + Dimension_Table_Metadata.Target_Dimension_Table_Name + ']
			([' + Dimension_Table_Metadata.Source_Fact_Column_Name + '])
		SELECT DISTINCT
			[' + Dimension_Table_Metadata.Source_Fact_Table_Name + '].[' + Dimension_Table_Metadata.Source_Fact_Column_Name + ']
		FROM [' + Dimension_Table_Metadata.Source_Fact_Schema_Name + '].[' + Dimension_Table_Metadata.Source_Fact_Table_Name + ']
		WHERE [' + Dimension_Table_Metadata.Source_Fact_Table_Name + '].[' + Dimension_Table_Metadata.Source_Fact_Column_Name + '] IS NOT NULL;'
	FROM dbo.Dimension_Table_Metadata
	INNER JOIN sys.tables
	ON tables.name = Dimension_Table_Metadata.Source_Fact_Table_Name
	INNER JOIN sys.columns
	ON tables.object_id = columns.object_id
	AND columns.name COLLATE database_default = Dimension_Table_Metadata.Source_Fact_Column_Name
	INNER JOIN sys.schemas
	ON schemas.schema_id = tables.schema_id
	AND schemas.name = Dimension_Table_Metadata.Source_Fact_Schema_Name
	INNER JOIN sys.types
	ON columns.user_type_id = types.user_type_id
	WHERE types.name IN ('char', 'nchar', 'nvarchar', 'varchar');

This dynamic SQL accomplishes a few tasks for us:

Does the dimension table already exist? If so, drop it. We do this in case the column length or type has changed, which ensures the resulting table has the correct type and size. Re-creating it takes little effort.
Create the dimension table. We’ll place it in the same schema as the source table and customize the table and column names to reflect where the data is being pulled from. We’ll retain the same data type, as well as the column size.
Populate the new table with all distinct values from the source table. On a very large table, you’ll want this column to be indexed, or performance might be a bit slow. If it’s a reporting table with no OLTP workload against it, then you’ll have more flexibility available for a warehouse-style data load process such as this.
We only insert non-NULL values. For the sake of this proof-of-concept, I saw little value in including NULL as a distinct dimension “value” and omitted it. If it’s important to your data model, feel free to include it. As always, use caution when comparing against any NULLable column. With ANSI_NULLS set ON (the default and the ANSI standard behavior), equality and inequality comparisons against NULL evaluate to UNKNOWN, so rows containing NULL silently drop out of the results.

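The NULL caution in the last point is easy to demonstrate with a table variable. This self-contained sketch assumes ANSI_NULLS ON, which is the default for most connections.

```sql
SET ANSI_NULLS ON;

DECLARE @Demo TABLE (Color NVARCHAR(15) NULL);
INSERT INTO @Demo (Color) VALUES (N'Red'), (N'Blue'), (NULL);

-- Comparisons with NULL evaluate to UNKNOWN, so these filters skip the NULL row.
SELECT COUNT(*) AS Equals_Null FROM @Demo WHERE Color = NULL;     -- 0
SELECT COUNT(*) AS Not_Red FROM @Demo WHERE Color <> N'Red';      -- 1 (Blue only)

-- IS NULL / IS NOT NULL are the reliable tests, as used in the generated INSERT.
SELECT COUNT(*) AS Is_Null FROM @Demo WHERE Color IS NULL;        -- 1
SELECT COUNT(DISTINCT Color) AS Distinct_Values FROM @Demo;       -- 2 (NULL ignored)

-- SELECT DISTINCT, unlike COUNT(DISTINCT), returns NULL as a row of its own,
-- which is why the generator filters with IS NOT NULL.
SELECT DISTINCT Color FROM @Demo;                                 -- 3 rows, one of them NULL
```
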
Lastly, we execute our dynamic SQL and wrap up our stored procedure:

EXEC sp_executesql @Sql_Command;
END
GO

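There is one caveat about the set-based concatenation used above. Microsoft's documentation warns against relying on SELECT @variable = @variable + ... to aggregate values across rows, because the result is not guaranteed. On SQL Server 2017 or later, STRING_AGG does the same job with a defined order. To keep the pattern visible, the sketch below builds a deliberately simplified command (a row count per source table) rather than the full DDL.

```sql
-- Requires SQL Server 2017 or later. The per-row command is a simplified stand-in.
DECLARE @Sql_Command NVARCHAR(MAX);

SELECT @Sql_Command = STRING_AGG(
		-- Starting from an NVARCHAR(MAX) literal keeps the aggregate from being
		-- limited to 8,000 bytes.
		CAST(N'SELECT COUNT(*) AS Row_Count FROM ' AS NVARCHAR(MAX))
			+ QUOTENAME(Source_Fact_Schema_Name) + N'.' + QUOTENAME(Source_Fact_Table_Name) + N';',
		CHAR(10))
	WITHIN GROUP (ORDER BY Dimension_ID)
FROM dbo.Dimension_Table_Metadata;

-- Inspect the generated batch before running it.
PRINT @Sql_Command;
-- EXEC sp_executesql @Sql_Command;
```

In the procedure itself, the same idea would mean wrapping the existing concatenated expression in STRING_AGG instead of appending to @Sql_Command row by row.
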
Now, let’s take the new proc for a spin:

EXEC dbo.Generate_Dimension_Tables;

The proc runs in under a second, with no results or other indication that anything happened. If desired, feel free to add logging to the process to provide some level of feedback or error trapping, so that you are notified of unexpected results.

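A minimal way to add that feedback is to wrap the call in TRY...CATCH. This sketch only reports the error and re-raises it. Where you log it (a table, job step history, an alert) is up to you.

```sql
BEGIN TRY
	EXEC dbo.Generate_Dimension_Tables;
	PRINT 'Dimension tables generated.';
END TRY
BEGIN CATCH
	-- Report what failed, then re-raise so callers (e.g. a SQL Agent job) see the failure.
	SELECT
		ERROR_NUMBER() AS Error_Number,
		ERROR_LINE() AS Error_Line,
		ERROR_MESSAGE() AS Error_Message;
	THROW;
END CATCH;
```
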
When we expand and refresh the tables list for the database, we can see the following new dimension tables added:

HumanResources.Dim_JobTitle
Person.Dim_PersonType
Production.Dim_ProductColor
Production.Dim_ProductInventoryShelf
Production.Dim_TransactionType

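Instead of browsing Object Explorer, we can ask the catalog which metadata rows did not produce a table. This sketch checks each row against sys.tables in the target schema. The COLLATE clauses mirror the one used in the procedure. After running the first version of the procedure against AdventureWorks, this would be expected to list only the Dim_Department_GroupName row.

```sql
-- Metadata rows whose dimension table does not exist (yet).
SELECT
	Dimension_Table_Metadata.Dimension_ID,
	Dimension_Table_Metadata.Source_Fact_Schema_Name,
	Dimension_Table_Metadata.Target_Dimension_Table_Name
FROM dbo.Dimension_Table_Metadata
WHERE NOT EXISTS
(	SELECT 1
	FROM sys.tables
	INNER JOIN sys.schemas
	ON schemas.schema_id = tables.schema_id
	WHERE tables.name COLLATE database_default = Dimension_Table_Metadata.Target_Dimension_Table_Name
	AND schemas.name COLLATE database_default = Dimension_Table_Metadata.Source_Fact_Schema_Name
);
```
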
Note that HumanResources.Department.GroupName did not get processed by our stored procedure. If we look at the table, we can see why:

The column GroupName is defined with the user-defined data type “Name”, an alias for NVARCHAR(50). Its user_type_id in sys.columns points to Name rather than nvarchar, so our filter on string type names excludes it. We can work around this if we wish by making an additional join to sys.types in order to find the system data type that the custom data type is based on:

SELECT
		schemas.name AS SchemaName,
		tables.name AS TableName,
		columns.name AS ColumnName,
		USERDATATYPE.name AS UserDataType,
		SYSTEMDATATYPE.name AS SystemDataType
	FROM sys.tables
	INNER JOIN sys.columns
	ON tables.object_id = columns.object_id
	INNER JOIN sys.schemas
	ON schemas.schema_id = tables.schema_id
	INNER JOIN sys.types USERDATATYPE
	ON columns.user_type_id = USERDATATYPE.user_type_id
	INNER JOIN sys.types SYSTEMDATATYPE
	ON SYSTEMDATATYPE.user_type_id = USERDATATYPE.system_type_id
	WHERE schemas.name = 'HumanResources'
	AND tables.name = 'department'
	AND columns.name = 'GroupName';

The result of this query shows that the actual data type is NVARCHAR:

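To see every alias type at once, rather than one column at a time, sys.types can be queried on its own. This sketch lists user-defined alias types with their base system type and declared size. Table types are excluded because they have no single base type.

```sql
-- User-defined alias types with their underlying system type.
SELECT
	SCHEMA_NAME(types.schema_id) AS Type_Schema,
	types.name AS User_Type_Name,
	TYPE_NAME(types.system_type_id) AS Base_Type_Name,
	-- Characters for string types, bytes for everything else.
	CASE
		WHEN types.max_length = -1 THEN 'MAX'
		WHEN TYPE_NAME(types.system_type_id) IN ('nchar', 'nvarchar') THEN CAST(types.max_length / 2 AS VARCHAR(10))
		ELSE CAST(types.max_length AS VARCHAR(10))
	END AS Type_Size,
	types.is_nullable
FROM sys.types
WHERE types.is_user_defined = 1
AND types.is_table_type = 0
ORDER BY Type_Schema, User_Type_Name;
```
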
If we want to support user-defined data types, this additional join could provide insight into the original data type. Alternatively, we could simply pass in the data type as a literal and check both system and user-defined data types for the string types we are looking for. To achieve this, we would adjust our dynamic SQL as follows:

SELECT @Sql_Command = @Sql_Command + '
		USE AdventureWorks2016;
 
		IF EXISTS (SELECT * FROM sys.tables INNER JOIN sys.schemas ON schemas.schema_id = tables.schema_id WHERE tables.name = ''' + Dimension_Table_Metadata.Target_Dimension_Table_Name + ''' AND schemas.name = ''' + Dimension_Table_Metadata.Source_Fact_Schema_Name + ''')
		BEGIN
			DROP TABLE [' + Dimension_Table_Metadata.Source_Fact_Schema_Name + '].[' + Dimension_Table_Metadata.Target_Dimension_Table_Name + '];
		END
 
		CREATE TABLE [' + Dimension_Table_Metadata.Source_Fact_Schema_Name + '].[' + Dimension_Table_Metadata.Target_Dimension_Table_Name + ']
		(	[' + Dimension_Table_Metadata.Target_Dimension_Table_Name + '_Id] INT NOT NULL IDENTITY(1,1) CONSTRAINT [PK_' + Dimension_Table_Metadata.Target_Dimension_Table_Name + '] PRIMARY KEY CLUSTERED,
			[' + Dimension_Table_Metadata.Source_Fact_Column_Name + '] ' + USERDATATYPE.name +
			CASE WHEN USERDATATYPE.name IN ('char', 'nchar', 'nvarchar', 'varchar') THEN 
				'(' + 
					CASE WHEN columns.max_length = -1 THEN 'MAX'
							WHEN USERDATATYPE.name IN ('char', 'varchar') THEN CAST(columns.max_length AS VARCHAR(MAX))
							ELSE CAST(columns.max_length / 2 AS VARCHAR(MAX))
					END	+ ')'
			ELSE '' END + ' NOT NULL);
 
		INSERT INTO [' + Dimension_Table_Metadata.Source_Fact_Schema_Name + '].[' + Dimension_Table_Metadata.Target_Dimension_Table_Name + ']
			([' + Dimension_Table_Metadata.Source_Fact_Column_Name + '])
		SELECT DISTINCT
			[' + Dimension_Table_Metadata.Source_Fact_Table_Name + '].[' + Dimension_Table_Metadata.Source_Fact_Column_Name + ']
		FROM [' + Dimension_Table_Metadata.Source_Fact_Schema_Name + '].[' + Dimension_Table_Metadata.Source_Fact_Table_Name + ']
		WHERE [' + Dimension_Table_Metadata.Source_Fact_Table_Name + '].[' + Dimension_Table_Metadata.Source_Fact_Column_Name + '] IS NOT NULL;'
	FROM dbo.Dimension_Table_Metadata
	INNER JOIN sys.tables
	ON tables.name = Dimension_Table_Metadata.Source_Fact_Table_Name
	INNER JOIN sys.columns
	ON tables.object_id = columns.object_id
	AND columns.name COLLATE database_default = Dimension_Table_Metadata.Source_Fact_Column_Name
	INNER JOIN sys.schemas
	ON schemas.schema_id = tables.schema_id
	AND schemas.name = Dimension_Table_Metadata.Source_Fact_Schema_Name
	INNER JOIN sys.types USERDATATYPE
	ON columns.user_type_id = USERDATATYPE.user_type_id
	INNER JOIN sys.types SYSTEMDATATYPE
	ON SYSTEMDATATYPE.user_type_id = USERDATATYPE.system_type_id
	WHERE (	  USERDATATYPE.name IN ('char', 'nchar', 'nvarchar', 'varchar')
		   OR SYSTEMDATATYPE.name IN ('char', 'nchar', 'nvarchar', 'varchar'));

The result will now perform the following logic for the table creation:

If the data type is NCHAR, CHAR, NVARCHAR, or VARCHAR, then use it and add the size in parentheses.
If the type is user-defined and the basis of the user-defined type is NCHAR, CHAR, NVARCHAR, or VARCHAR, then use the user-defined type name. For this scenario, leave out the size, as it is implicit in the custom data type.

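Tracing the revised expression by hand for the GroupName row shows what the procedure now emits. The CREATE TABLE below is that generated statement, apart from whitespace. The DROP TABLE IF EXISTS (SQL Server 2016+) is added only so the excerpt can be run on its own against AdventureWorks.

```sql
-- Added for standalone use; the procedure generates an IF EXISTS ... DROP TABLE block instead.
DROP TABLE IF EXISTS [HumanResources].[Dim_Department_GroupName];

-- Generated for the GroupName row: the alias type "Name" is used as-is, with no size suffix.
CREATE TABLE [HumanResources].[Dim_Department_GroupName]
(	[Dim_Department_GroupName_Id] INT NOT NULL IDENTITY(1,1) CONSTRAINT [PK_Dim_Department_GroupName] PRIMARY KEY CLUSTERED,
	[GroupName] Name NOT NULL);

-- For comparison, a system string type keeps its size, e.g. for Production.Product.Color:
--   [Color] nvarchar(15) NOT NULL);
```
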
Refreshing the table list, we now see a new dimension table for our missing column. It’s been properly defined as type “Name”, which resolves to NVARCHAR(50).

Our other tables all contain the correct data types as well, for example:

If we SELECT * from this new table, we can inspect the results and confirm that things look as they should:

Other data types

I limited our discussion to strings, as they are the most common data types used in lookup tables and normalized dimensions. There is no reason we could not expand this to include bits, numeric data types, or others that might be useful to you. The only additional work would be to:

add those data types to the sys.types filters that we have
ensure that the column length does not get included unless needed
cast non-string columns as strings when needed in order to avoid a type clash in our dynamic SQL string between string and non-string data types

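The three steps above could be sketched as a single expression that builds a type declaration for more than just strings. This is an assumption-laden sketch rather than a complete mapping. It handles character, binary, and decimal/numeric types and emits no size for anything else, so datetime2, time, and datetimeoffset would fall back to their default precision. It runs against a throwaway temporary table so the output can be checked.

```sql
-- Throwaway table with a few representative column types.
CREATE TABLE #TypeDemo (Col_Varchar VARCHAR(20), Col_NvarcharMax NVARCHAR(MAX), Col_Decimal DECIMAL(9, 2), Col_Bit BIT);

SELECT
	c.name AS ColumnName,
	ty.name + CASE
		-- Byte-length types: max_length is already the declared size.
		WHEN ty.name IN ('char', 'varchar', 'binary', 'varbinary')
			THEN '(' + CASE WHEN c.max_length = -1 THEN 'MAX' ELSE CAST(c.max_length AS VARCHAR(10)) END + ')'
		-- Unicode types: two bytes per character.
		WHEN ty.name IN ('nchar', 'nvarchar')
			THEN '(' + CASE WHEN c.max_length = -1 THEN 'MAX' ELSE CAST(c.max_length / 2 AS VARCHAR(10)) END + ')'
		-- precision and scale are TINYINT, so they are cast to strings before concatenation.
		WHEN ty.name IN ('decimal', 'numeric')
			THEN '(' + CAST(c.[precision] AS VARCHAR(3)) + ', ' + CAST(c.scale AS VARCHAR(3)) + ')'
		-- Everything else (bit, int, date, ...) gets no size suffix in this sketch.
		ELSE ''
	END AS Type_Declaration
FROM tempdb.sys.columns AS c
INNER JOIN tempdb.sys.types AS ty
ON ty.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('tempdb..#TypeDemo')
ORDER BY c.column_id;

DROP TABLE #TypeDemo;
```

The expected declarations are varchar(20), nvarchar(MAX), decimal(9, 2), and bit.
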
Testing is quite easy for this. An undesirable result will typically be that either the resulting table is never created, or a horrific error is thrown. Both of these are fairly easy to uncover, troubleshoot, and fix.

When in doubt, change the EXEC sp_executesql statement to a PRINT statement and read through the resulting TSQL to make sure it is valid and does what you expect.

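Watch one thing when swapping in PRINT: it truncates a Unicode string at 4,000 characters, and the generated batch for several tables can easily be longer than that. A simple workaround, sketched below with a stand-in command string, is to print the text in 4,000-character chunks. Chunk boundaries can fall mid-statement, so treat the output as something to read rather than something to paste and run unedited.

```sql
-- Stand-in for a long generated command: 1,000 copies of a 10-character line.
DECLARE @Sql_Command NVARCHAR(MAX) = REPLICATE(CAST(N'SELECT 1;' AS NVARCHAR(MAX)) + NCHAR(10), 1000);
DECLARE @Offset BIGINT = 1;
DECLARE @Chunk_Size INT = 4000;

-- DATALENGTH / 2 counts every character, including trailing spaces that LEN() ignores.
WHILE @Offset <= DATALENGTH(@Sql_Command) / 2
BEGIN
	PRINT SUBSTRING(@Sql_Command, @Offset, @Chunk_Size);
	SET @Offset += @Chunk_Size;
END;
```
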
Security notes

Utilities like this are typically internal processes that are not exposed to the internet, end users, or others who might seek to take advantage of SQL injection.

That being said, whenever working with dynamic SQL and the creation/deletion of the schema, always ensure that:

Security on the stored procedure, metadata table, and related jobs/tasks is limited to only those who should be executing them. Since this stored procedure takes no parameters, the only point of entry is the metadata table. Ensure that it is locked down appropriately, and you’ll greatly reduce the chances of accidents happening.
Include brackets around schema, table, and column names, or use QUOTENAME(), which also escapes any closing bracket inside a name. This makes injection more difficult, and makes a malformed name far more likely to raise an error than to produce a logical flaw that can be exploited.
Use sp_executesql rather than EXEC. It allows values to be passed as parameters instead of being concatenated into the command string, which reduces the chances that the string can be tampered with.
Maintain this outside of the internet at large, and outside of the company at large. Maintenance scripts should be limited to the experts who administer them, whether DBAs, developers, or other technicians. They are not meant to be developed by anyone who is not familiar enough with the scripts to understand their function.
Feel free to add additional data consistency checks to the stored procedure in order to validate metadata from our table. TRY…CATCH can also be used to trap errors. The extent to which you add layers of security to this process should be based on the level of control (or lack thereof) you have over it.

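Two small additions along these lines are sketched below. QUOTENAME is a built-in function that brackets a name and doubles any closing bracket inside it. It returns NULL for input longer than 128 characters. The CHECK constraint is a hypothetical guard that limits target names to letters, digits, and underscores. Neither replaces proper permissions on the metadata table.

```sql
-- QUOTENAME escapes embedded ] characters, so a crafted name cannot end the identifier early.
SELECT
	QUOTENAME(N'Dim_ProductColor') AS Plain_Name,        -- [Dim_ProductColor]
	QUOTENAME(N'x]; DROP TABLE y; --') AS Hostile_Name;  -- [x]]; DROP TABLE y; --]

-- Hypothetical constraint: reject target names containing anything other than
-- letters, digits, or underscores before they ever reach the dynamic SQL.
ALTER TABLE dbo.Dimension_Table_Metadata
	ADD CONSTRAINT CK_Dimension_Table_Metadata_Target_Name
	CHECK (Target_Dimension_Table_Name NOT LIKE '%[^A-Za-z0-9_]%');
```
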
Conclusion

Using 50 lines of TSQL, we were able to process a metadata table and generate any number of dimension tables based on its contents. We leveraged dynamic SQL and system views in order to gain an understanding of the data types involved and recreate the schema in a new and useful form.

This general approach can be applied in a myriad of ways in order to automate data or schema load processes. In addition, we improve the maintainability of processes by removing the need to hard-code schema, table, column, and data type names all over the place. The only hard-coded literal in our stored procedure is the database name, which could very easily be removed. Adding a new table is a matter of inserting a single row into dbo.Dimension_Table_Metadata.

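For instance, a stored procedure runs in the context of the database it belongs to. The hard-coded USE AdventureWorks2016; line could therefore be built from DB_NAME() instead. It could also simply be dropped, since sp_executesql runs the batch in that same database. Below is a minimal sketch of the first option. The variable name is hypothetical.

```sql
-- Build the USE statement from the current database instead of a literal.
DECLARE @Use_Statement NVARCHAR(300) = N'USE ' + QUOTENAME(DB_NAME()) + N';';
PRINT @Use_Statement;  -- for example: USE [AdventureWorks2016];
```
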
This demo can be a fun proof-of-concept for anyone looking to create, alter, or drop schema on-the-fly. It can also show how we can generate a large quantity of dynamic SQL without using loops or cursors. Lastly, it can demonstrate how we can pull data from system views in order to better understand table structures and data types. As a bonus, we can quickly analyze user-defined data types and explore how they were created.

Enjoy!

Downloads

Generate the metadata table and stored procedure (script)
