Data Warehouse

rm -r dp203 -f

git clone https://github.com/MicrosoftLearning/Dp-203-azure-data-engineer dp203

cd dp203/Allfiles/labs/08

./setup.ps1

 Expand the dbo.FactInternetSales table and its Columns folder to see the columns in this table. Note that many of the columns are keys that reference rows in the dimension tables. Others are numeric values (measures) for analysis.

The keys are used to relate a fact table to one or more dimension tables, often in a star schema; in which the fact table is directly related to each dimension table (forming a multi-pointed "star" with the fact table at the center).

 

View the columns for the dbo.DimPromotion table, and note that it has a unique PromotionKey that uniquely identifies each row in the table. It also has an AlternateKey.

Usually, data in a data warehouse has been imported from one or more transactional sources. The alternate key reflects the business identifier for the instance of this entity in the source, but a unique numeric surrogate key is usually generated to uniquely identify each row in the data warehouse dimension table. One of the benefits of this approach is that it enables the data warehouse to contain multiple instances of the same entity at different points in time (for example, records for the same customer reflecting their address at the time an order was placed).

 

View the columns for the dbo.DimProduct, and note that it contains a ProductSubcategoryKey column, which references the dbo.DimProductSubcategory table, which in turn contains a ProductCategoryKey column that references the dbo.DimProductCategory table.

In some cases, dimensions are partially normalized into multiple related tables to allow for different levels of granularity - such as products that can be grouped into subcategories and categories. This results in a simple star being extended to a snowflake schema, in which the central fact table is related to a dimension table, which is turn related to further dimension tables.

 

View the columns for the dbo.DimDate table, and note that it contains multiple columns that reflect different temporal attributes of a date - including the day of week, day of month, month, year, day name, month name, and so on.

Time dimensions in a data warehouse are usually implemented as a dimension table containing a row for each of the smallest temporal units of granularity (often called the grain of the dimension) by which you want to aggregate the measures in the fact tables. In this case, the lowest grain at which measures can be aggregated is an individual date, and the table contains a row for each date from the first to the last date referenced in the data. The attributes in the DimDate table enable analysts to aggregate measures based on any date key in the fact table, using a consistent set of temporal attributes (for example, viewing orders by month based on the order date). The FactInternetSales table contains three keys that relate to the DimDate table: OrderDateKeyDueDateKey, and ShipDateKey.

SELECT  d.CalendarYear AS Year,
        SUM(i.SalesAmount) AS InternetSalesAmount
FROM FactInternetSales AS i
JOIN DimDate AS d ON i.OrderDateKey = d.DateKey
GROUP BY d.CalendarYear
ORDER BY Year;

SELECT  d.CalendarYear AS Year,
        d.MonthNumberOfYear AS Month,
        SUM(i.SalesAmount) AS InternetSalesAmount
FROM FactInternetSales AS i
JOIN DimDate AS d ON i.OrderDateKey = d.DateKey
GROUP BY d.CalendarYear, d.MonthNumberOfYear
ORDER BY Year, Month;

SELECT  d.CalendarYear AS Year,
        g.EnglishCountryRegionName AS Region,
        SUM(i.SalesAmount) AS InternetSalesAmount
FROM FactInternetSales AS i
JOIN DimDate AS d ON i.OrderDateKey = d.DateKey
JOIN DimCustomer AS c ON i.CustomerKey = c.CustomerKey
JOIN DimGeography AS g ON c.GeographyKey = g.GeographyKey
GROUP BY d.CalendarYear, g.EnglishCountryRegionName
ORDER BY Year, Region;

SELECT  g.EnglishCountryRegionName AS Region,
        ROW_NUMBER() OVER(PARTITION BY g.EnglishCountryRegionName
                          ORDER BY i.SalesAmount ASC) AS RowNumber,
        i.SalesOrderNumber AS OrderNo,
        i.SalesOrderLineNumber AS LineItem,
        i.SalesAmount AS SalesAmount,
        SUM(i.SalesAmount) OVER(PARTITION BY g.EnglishCountryRegionName) AS RegionTotal,
        AVG(i.SalesAmount) OVER(PARTITION BY g.EnglishCountryRegionName) AS RegionAverage
FROM FactInternetSales AS i
JOIN DimDate AS d ON i.OrderDateKey = d.DateKey
JOIN DimCustomer AS c ON i.CustomerKey = c.CustomerKey
JOIN DimGeography AS g ON c.GeographyKey = g.GeographyKey
WHERE d.CalendarYear = 2022
ORDER BY Region;

SELECT  g.EnglishCountryRegionName AS Region,
        g.City,
        SUM(i.SalesAmount) AS CityTotal,
        SUM(SUM(i.SalesAmount)) OVER(PARTITION BY g.EnglishCountryRegionName) AS RegionTotal,
        RANK() OVER(PARTITION BY g.EnglishCountryRegionName
                    ORDER BY SUM(i.SalesAmount) DESC) AS RegionalRank
FROM FactInternetSales AS i
JOIN DimDate AS d ON i.OrderDateKey = d.DateKey
JOIN DimCustomer AS c ON i.CustomerKey = c.CustomerKey
JOIN DimGeography AS g ON c.GeographyKey = g.GeographyKey
GROUP BY g.EnglishCountryRegionName, g.City
ORDER BY Region;

 

SELECT d.CalendarYear AS CalendarYear,
    COUNT(DISTINCT i.SalesOrderNumber) AS Orders
FROM FactInternetSales AS i
JOIN DimDate AS d ON i.OrderDateKey = d.DateKey
GROUP BY d.CalendarYear
ORDER BY CalendarYear;

 

SELECT d.CalendarYear AS CalendarYear,
    APPROX_COUNT_DISTINCT(i.SalesOrderNumber) AS Orders
FROM FactInternetSales AS i
JOIN DimDate AS d ON i.OrderDateKey = d.DateKey
GROUP BY d.CalendarYear
ORDER BY CalendarYear;

  • 9
    点赞
  • 14
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值