sql azure 语法_Azure SQL数据仓库简介

最新推荐文章于 2022-07-24 11:11:34 发布

culuo4781

最新推荐文章于 2022-07-24 11:11:34 发布

阅读量669

点赞数

文章标签：数据仓库数据库分布式 python java

原文链接：https://www.sqlshack.com/introduction-azure-sql-data-warehouse/

版权

sql azure 语法

介绍 (Introduction)

Azure SQL Data Warehouse is a new addition to the Azure Data Platform. When I first heard about it I wasn’t quite sure about what exactly it would be. As it turns out it is relational database for large amounts of database and really big queries as a service. This is essentially the equivalent of the APS (Analytics Platform System) in the cloud.

Azure SQL数据仓库是Azure数据平台的新增功能。当我第一次听说它的时候，我不确定它到底是什么。事实证明，它是用于大量数据库和非常大的查询即服务的关系数据库。从本质上讲，这等效于云中的APS（分析平台系统）。

In this article, I will explore the Azure SQL DW and look at some of its key features to determine what the best use cases would be.

在本文中，我将探索Azure SQL DW并查看其一些关键功能，以确定最佳用例。

基础 (The Basics)

Provisioning an Azure SQL Data warehouse is simple enough. Once logged into Azure, go to New ->
Databases -> SQL Data Warehouse.

设置Azure SQL数据仓库非常简单。登录到Azure后，转到新建->
数据库-> SQL数据仓库。

In the SQL Data Warehouse blade enter the following fields:

在“ SQL数据仓库”刀片中，输入以下字段：

No	Item	Description
1	Database Name	Select a name for your DW. This name must be unique for the selected server.
2	Subscription	Choose which of your Azure subscriptions you’d like to use if you have more than one.
3	Resource Group	Select an existing resource group or create a new one. If you are doing experiments, it always good to put all the resources in the same resource group. That way when you are done, you can simply delete the resource group, and it will delete everything.
4	Select Source	One of 3 available options.
5	Server	If you do not have an existing server, you will be able to create one here. This can be the same server you may have used previously for a SQL DB
6	Collation	Just like SQL Server, you must pick the collation. Choose carefully as it cannot be changed after you created the database.
7	Performance Level	This slider is used to scale up or down the number of Data Warehouse Units you’d like to use. A DWU is a measurement used to calculate the compute power of a data warehouse.
8	Pricing	When you have selected your DWU’s it will show an estimated cost of running your data warehouse per hour.
9	Create	Click on create to provision your data warehouse. This takes a couple of minutes.

没有	项目	描述
1个	数据库名称	为您的DW选择一个名称。该名称对于所选服务器必须是唯一的。
2	订阅	如果您有多个Azure订阅，请选择要使用的订阅。
3	资源组	选择一个现有的资源组或创建一个新的资源组。如果您要进行实验，最好将所有资源放在同一个资源组中。这样，完成后，您只需删除资源组，它将删除所有内容。
4	选择来源	3个可用选项之一。
5	服务器	如果您没有现有服务器，则可以在此处创建一个。这可以是您以前用于SQL DB的同一服务器
6	校对	就像SQL Server一样，您必须选择排序规则。请谨慎选择，因为创建数据库后无法更改。
7	表现水平	此滑块用于按比例缩放您要使用的数据仓库单元的数量。 DWU是用于计算数据仓库的计算能力的度量。
8	价钱	选择DWU后，它将显示每小时运行数据仓库的估计成本。
9	创造	单击创建以配置您的数据仓库。这需要几分钟。

Once the DW has been provisioned you can connect to it using SSMS, remember that you have to configure the server firewall to allow access from your client.

设置好DW后，您可以使用SSMS连接到它，请记住，您必须配置服务器防火墙以允许来自客户端的访问。

In SSMS you will see that the icon for your DW looks different to that of a regular SQL DB.

在SSMS中，您将看到DW的图标看起来与常规SQL DB的图标不同。

The icon looks like a bunch of databases together, which is quite apt if we look at the architecture…

该图标看起来像是一堆数据库，如果我们看一下体系结构，这非常合适……

建筑 (Architecture)

Azure SQL Data Warehouse uses distributed data and a massively parallel processing (MPP) design. The storage is de-coupled from the compute and control nodes, and as such, it can be scaled independently.

Azure SQL数据仓库使用分布式数据和大规模并行处理（MPP）设计。存储与计算节点和控制节点分离，因此可以独立扩展。

SQL DW data is distributed into 60 distributions, but it can have 1 or more compute node, depending on the number of DWUs that you select.

SQL DW数据分布在60个分布中，但是它可以具有1个或多个计算节点，具体取决于您选择的DWU的数量。

In my SQL DW created above I selected 400 DTU. Let’s have a look at what that gives me.

在上面创建SQL DW中，我选择了400 DTU。让我们看一下能给我带来什么。

 
SELECT distinct pdw_node_id, MIN(distribution_id) [min_distributions_id], MAX(distribution_id) [max_distributions_id]
FROM SYS.pdw_distributions
GROUP BY pdw_node_id
ORDER BY 2

I can see here that I have 4 compute nodes, and that each node has 15 distributions. You can play around with this, but essentially as the number of compute nodes change the number of distributions will only be re-arranged to be equally distributed between the compute nodes. The distributions will always add up to 60. If you chose DWU 6000 you will essentially get a 1 to 1 ratio of compute to storage.

我在这里看到有4个计算节点，每个节点有15个分布。您可以尝试一下，但是实际上随着计算节点数量的变化，分布的数量只会重新排列以在计算节点之间平均分配。分配的总和为60。如果您选择DWU 6000，则计算和存储的比例实际上为1：1。

Azure SQL DW has two different types of distributions that can be used. The type of distribution is specified when a table is created.

Azure SQL DW具有可以使用的两种不同类型的分发。创建表时指定分发类型。

Round Robin distribution

循环分配

With this distribution, data is randomly assigned to each distribution. It assigns the data pretty evenly across all 60 distributions. Round-robin is the default distribution. In some cases, this can result in poorer performance than the hash distribution, because when assigning the rows it does not take the row content into account.

使用此分布，数据被随机分配给每个分布。它在所有60个分布中平均分配数据。轮询是默认分发。在某些情况下，这可能导致性能比哈希分布更差，因为在分配行时，它不会考虑行内容。
Hash distribution

散列分布

This distribution allows you to pick a column to use as a hashing key. Selecting the wrong column to be used for the hashing function can result in unevenly distributed data (data skew). So be sure to select a column which has a lot of distinct values ideally 60 or more, since the data will be distributed amongst 60 distributions.

通过此分布，您可以选择一列用作哈希键。选择用于哈希函数的错误列可能会导致数据分布不均（数据偏斜）。因此，请确保选择一个具有很多不同值（理想情况下为60或更多）的列，因为数据将分布在60个分布中。

用例 (Use Cases)

Azure SQL DW is best used for analytical workloads that makes use of large volumes of data and needs to consolidate disparate data into a single location.

Azure SQL DW最适合用于分析工作负载，该工作负载使用大量数据，并且需要将分散的数据合并到一个位置。

Azure SQL DW has been specifically designed to deal with very large volumes of data. In fact, if there is too little data it may perform poorly because the data is distributed. You can imagine that if you had only 10 rows per distribution, the cost of consolidating the data will be way more than the benefit gained by distributing it.

Azure SQL DW专门设计用于处理大量数据。实际上，如果数据太少，由于数据是分布式的，因此性能可能会很差。您可以想象，如果每个分布只有10行，那么整合数据的成本将远远超过通过分布获得的收益。

SQL DW is a good place to consolidate disparate data, transform, shape and aggregate it, and then perform analysis on it. It is ideal for running burst workloads, such as month end financial reporting etc.

SQL DW是合并异构数据，对其进行转换，整形和聚合然后对其进行分析的好地方。它是运行突发工作负载（例如月末财务报告等）的理想选择。

Azure SQL DW should not be used when small row by row updates are expected as in OLTP workloads. It should only be used for large scale batch operations.

当预期在OLTP工作负载中进行逐行小的更新时，不应使用Azure SQL DW。它仅应用于大规模批处理操作。

加载数据中 (Loading data)

One of the key features of Azure Data Warehouse is the ability to load data from practically anywhere using a variety of tools.

Azure数据仓库的主要功能之一是能够使用各种工具从几乎任何地方加载数据。

Since PolyBase is built in, it can be used to load data parallelly from Azure blob storage. You can also use Azure Data Factory to facilitate the load from Azure blob storage with PolyBase.

由于PolyBase是内置的，因此可用于从Azure blob存储并行加载数据。您还可以使用Azure数据工厂通过PolyBase方便从Azure Blob存储中进行加载。

Additionally, SQL Server Integration Services (SSIS), AZCopy, BCP, Import/ Export can be used.

此外，可以使用SQL Server集成服务（SSIS），AZCopy，BCP，导入/导出。

比例计算 (Scaling Compute)

Because storage and compute is decoupled in Azure Data Warehouse, it can be scaled independently.

由于存储和计算在Azure数据仓库中是分离的，因此可以独立扩展。

Compute is measured in DWUs (Data warehouse units), your DWUs determines how many compute nodes you will have and the ratio of distributions to compute nodes. To scale compute you need to change the DWU setting. Scaling happens within minutes, so you can play around with it to find the optimal configuration.

计算以DWU（数据仓库单位）衡量，DWU确定您将拥有多少个计算节点以及分布与计算节点的比率。要扩展计算，您需要更改DWU设置。扩展会在几分钟内完成，因此您可以试用它来找到最佳配置。

Another great thing about Azure SQL DW compute is that it can be paused. While it is paused, you won’t incur any costs for DWUs. This is really handy if you only need your DW some of the like, like for month end reporting etc.

Azure SQL DW计算的另一个好处是可以暂停它。暂停期间，DWU不会产生任何费用。如果您只需要DW之类的东西（例如用于月末报告等），这真的很方便。

结论 (Conclusion)

Azure SQL Data Warehouse, is the ideal solution for when you need massively parallel processing. Unlike the on-premises equivalent (APS), Azure SQL DW is easily accessible to anyone with a workload using the familiar T-SQL language.

Azure SQL数据仓库是需要大量并行处理的理想解决方案。与本地等效（APS）不同，使用熟悉的T-SQL语言的工作量很大的任何人都可以轻松访问Azure SQL DW。