mysql维表的代理键字段_mysql多维数据仓库指南--第三篇第12章(2)

宾夕法尼亚州地区客户维

在本节我将用宾夕法尼亚州地区客户的子集维度来解释第二种维度子集的类型。我也将向你说明如何测试该子集维度。

相对的,一个向上钻取的维包含了它基础维的所有更高级别的数据。而一个特定子集维度则选择了它基础维的某个特定的数据集合。列表12-3所示的脚本产生并加载了宾夕法尼亚州(PA)地区客户子集维。

注意到,有两个事情是宾夕法尼亚州地区客户子集维区别于月份子集维的地方。

npa_customer_dim表和customer_dim表的字段结构一样,而month_dim表没有 date_dim表中的日期字段。

npa_customer_dim表的代理键是客户表的代理键。Month_dim表的代理键只属于month_dim表,并不是来自日期维表。

列表12-3 PA的客户:

/**********************************************************************/

/**/

/* pa_customer.sql*/

/**/

/**********************************************************************/

USE dw;

CREATE TABLE pa_customer_dim

( customer_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY

, customer_number INT

, customer_name CHAR (50)

, customer_street_address CHAR (50)

, customer_zip_code INT (5)

, customer_city CHAR (30)

, customer_state CHAR (2)

, shipping_address CHAR (50)

, shipping_zip_code INT (5)

, shipping_city CHAR (30)

, shipping_state CHAR (2)

, effective_date DATE

, expiry_date DATE )

;

INSERT INTO pa_customer_dim

SELECT

customer_sk

, customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state

, effective_date

, expiry_date

FROM customer_dim

WHERE customer_state = 'PA'

;

/* end of script*/

为了测试PA子维脚本,你需要先用列表12-4的脚本来增加三个居住于俄亥俄州的客户。

列表12-4:非PA客户:

/**********************************************************************/

/**/

/* non_pa_customer.sql*/

/**/

/**********************************************************************/

/* default to dw*/

USE dw;

INSERT INTO customer_dim

( customer_sk

, customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state

, effective_date

, expiry_date )

VALUES

(NULL, 10, 'Bigger Customers', '7777 Ridge Rd.', '44102',

'Cleveland', 'OH', '7777 Ridge Rd.', '44102', 'Cleveland',

'OH', CURRENT_DATE, '9999-12-31')

, (NULL, 11, 'Smaller Stores', '8888 Jennings Fwy.', '44102',

'Cleveland', 'OH', '8888 Jennings Fwy.', '44102',

'Cleveland', 'OH', CURRENT_DATE, '9999-12-31')

, (NULL, 12, 'Small-Medium Retailers', '9999 Memphis Ave.', '44102',

'Cleveland', 'OH', '9999 Memphis Ave.', '44102', 'Cleveland',

'OH', CURRENT_DATE, '9999-12-31')

;

/* end of script*/

用以下命令运行列表12-4中的脚本。

mysql> \. c:\mysql\scripts\non_pa_customer.sql

在你的控制台上将得到下面的响应信息:

Database changed

Query OK, 3 rows affected (0.86 sec)

Records: 3Duplicates: 0Warnings: 0

现在你已经准备好运行列表12-3中的pa_customer.sql脚本,在你做这些之前,确定你的Msql数据库的日期仍然是2007-03-02。

你可以用如下命令方式运行pa_customer.sql脚本。

mysql> \. c:\mysql\scripts\pa_customer.sql

你将在控制台上看到:

Database changed

Query OK, 0 rows affected (0.20 sec)

Query OK, 18 rows affected (0.08 sec)

Records: 18 Duplicates: 0 Warnings: 0

为了确保三个OH客户已经成功的载入,查询customer_dim表:

mysql> select customer_name, customer_state, effective_date from customer_dim;

控制台上将看到:

+---------------------------------+----------------+---------------+

| customer_name| customer_state | effective_date|

+---------------------------------+----------------+---------------+

| Really Large Customers| PA| 2005-03-01|

| Small Stores| PA| 2005-03-01|

| Medium Retailers| PA| 2005-03-01|

| Good Companies| PA| 2005-03-01|

| Wonderful Shops| PA| 2005-03-01|

| Extremely Loyal Clients| PA| 2005-03-01|

| Distinguished Agencies| PA| 2005-03-01|

| Extremely Loyal Clients| PA| 2007-03-01|

| Subsidiaries| PA| 2007-03-01|

| Really Large Customers| PA| 2007-03-02|

| Small Stores| PA| 2007-03-02|

| Medium Retailers| PA| 2007-03-02|

| Good Companies| PA| 2007-03-02|

| Wonderful Shops| PA| 2007-03-02|

| Extremely Loyal Clients| PA| 2007-03-02|

| Distinguished Agencies| PA| 2007-03-02|

| Subsidiaries| PA| 2007-03-02|

| Online Distributors| PA| 2007-03-02|

| Bigger Customers| OH| 2007-03-02|

| Smaller Stores| OH| 2007-03-02|

| Small-Medium Retailers| OH| 2007-03-02|

+---------------------------------+----------------+---------------+

21 rows in set (0.00 sec)

现在查询pa_customer_dim表来确定只有PA的客户在PA客户维表中。

mysql> select customer_name, customer_state, effective_date from pa_customer_dim;

结果如下:

+---------------------------------+----------------+---------------+

| customer_name| customer_state | effective_date|

+---------------------------------+----------------+---------------+

| Really Large Customers| PA| 2004-01-01|

| Small Stores| PA| 2004-01-01|

| Medium Retailers| PA| 2004-01-01|

| Good Companies| PA| 2004-01-01|

| Wonderful Shops| PA| 2004-01-01|

| Extremely Loyal Clients| PA| 2004-01-01|

| Distinguished Agencies| PA| 2004-01-01|

| Extremely Loyal Clients| PA| 2005-11-01|

| Subsidiaries| PA| 2005-11-01|

| Really Large Customers| PA| 2005-11-03|

| Small Stores| PA| 2005-11-03|

| Medium Retailers| PA| 2005-11-03|

| Good Companies| PA| 2005-11-03|

| Wonderful Shops| PA| 2005-11-03|

| Extremely Loyal Clients| PA| 2005-11-03|

| Distinguished Agencies| PA | 2005-11-03|

| Subsidiaries| PA| 2005-11-03|

| Online Distributors| PA| 2005-11-03|

+---------------------------------+----------------+---------------+

18 rows in set (0.00 sec)

正如你所看到的,只有PA客户进入到该表中,之前加入的OH客户并没有在其中。

修改定期装载

当一个新的PA客户资料加入到客户维中时,为了能够同时加入PA客户维表,你需要把PA客户子集维的装载合并到数据仓库的定期装载过程中。修改后的定期装载脚本将在列表12-5中列出。这个改变(新增)用粗体显示。注意到,当每次你运行每日的定期装载脚本时,该脚本重建(截断,然后加入所有的PA客户)了PA客户子集维。

列表12-5:修正后的每日DW定期装载

/******************************************************************/

/**/

/* dw_regular_12.sql*/

/**/

/******************************************************************/

USE dw;

/* CUSTOMER_DIM POPULATION*/

TRUNCATE customer_stg;

LOAD DATA INFILE 'customer.csv'

INTO TABLE customer_stg

FIELDS TERMINATED BY ' , '

OPTIONALLY ENCLOSED BY '"'

LINES TERMINATED BY '\r\n'

IGNORE 1 LINES

( customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state )

;

/* SCD 2 ON ADDRESSES*/

UPDATE

customer_dim a

, customer_stg b

SET

a.expiry_date = SUBDATE (CURRENT_DATE, 1)

WHERE

a.customer_number = b.customer_number

AND (a.customer_street_address <> b.customer_street_address

OR a.customer_city <> b.customer_city

OR a.customer_zip_code <> b.customer_zip_code

OR a.customer_state <> b.customer_state

OR a.shipping_address <> b.shipping_address

OR a.shipping_city <> b.shipping_city

OR a.shipping_zip_code <> b.shipping_zip_code

OR a.shipping_state <> b.shipping_state

OR a.shipping_address IS NULL

OR a.shipping_city IS NULL

OR a.shipping_zip_code IS NULL

OR a.shipping_state IS NULL)

AND expiry_date = '9999-12-31'

;

INSERT INTO customer_dim

SELECT

NULL

, b.customer_number

, b.customer_name

, b.customer_street_address

, b.customer_zip_code

, b.customer_city

, b.customer_state

, b.shipping_address

, b.shipping_zip_code

, b.shipping_city

, b.shipping_state

, CURRENT_DATE

, '9999-12-31'

FROM

customer_dim a

, customer_stg b

WHERE

a.customer_number = b.customer_number

AND (a.customer_street_address <> b.customer_street_address

OR a.customer_city <> b.customer_city

OR a.customer_zip_code <> b.customer_zip_code

OR a.customer_state <> b.customer_state

OR a.shipping_address <> b.shipping_address

OR a.shipping_city <> b.shipping_city

OR a.shipping_zip_code <> b.shipping_zip_code

OR a.shipping_state <> b.shipping_state

OR a.shipping_address IS NULL

OR a.shipping_city IS NULL

OR a.shipping_zip_code IS NULL

OR a.shipping_state IS NULL)

AND EXISTS (

SELECT *

FROM customer_dim x

WHERE b.customer_number = x.customer_number

AND a.expiry_date - SUBDATE (CURRENT_DATE, 1))

AND NOT EXISTS (

SELECT *

FROM customer_dim y

WHEREb.customer_number = y.customer_number

AND y.expiry_date - '9999-12-31')

;

/* END OF SCD 2*/

/* SCD 1 ON NAME*/

UPDATE customer_dim a, customer_stg b

SET a.customer_name = b.customer_name

WHEREa.customer_number = b.customer_number

AND a.expiry_date - '9999-12-31'

AND a.customer_name <> b.customer_name

;

/* ADD NEW CUSTOMER*/

INSERT INTO customer_dim

SELECT

NULL

, customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state

, CURRENT_DATE

, '9999-12-31'

FROM customer_stg

WHERE customer_number NOT IN(

SELECT a.customer_number

FROM

customer_dim a

, customer_stg b

WHERE b.customer_number = a.customer_number )

/* RE-BUILD PA CUSTOMER DIMENSION*/

TRUNCATE pa_customer_dim;

INSERT INTO pa_customer_dim

SELECT

customer_sk

, customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state

, effective_date

, expiry_date

FROM customer_dim

WHERE customer_state = 'PA'

;

/* END OF CUSTOMER_DIM POPULATION*/

/* product dimension loading*/

TRUNCATE product_stg

;

LOAD DATA INFILE 'product.txt'

INTO TABLE product_stg

FIELDS TERMINATED BY ''

OPTIONALLY ENCLOSED BY ''

LINES TERMINATED BY '\r\n'

IGNORE 1 LINES

( product_code

, product_name

, product_category )

;

/* PRODUCT_DIM POPULATION*/

/* SCD2 ON PRODUCT NAME AND GROUP*/

UPDATE

product_dim a

, product_stg b

SET

expiry_date = SUBDATE (CURRENT_DATE, 1)

WHERE

a.product_code = b.product_code

AND (a.product_name <> b.product_name

OR a.product_category <> b.product_category )

AND expiry_date = '9999-12-31'

;

INSERT INTO product_dim

SELECT

NULL

, b.product_code

, b.product_name

, b.product_category

, CURRENT_DATE

, '9999-12-31'

FROM

product_dim a

, product_stg b

WHERE

a.product_code = b.product_code

AND (a.product_name <> b.product_name

OR a.product_category <> b.product_category )

AND EXISTS (

SELECT *

FROM product_dim x

WHERE b.product_code = x.product_code

AND a.expiry_date = SUBDATE (CURRENT_DATE, 1) )

AND NOT EXISTS (

SELECT *

FROM product_dim y

WHERE b.product_code = y.product_code

AND y.expiry_date = '9999-12-31')

;

/* END OF SCD 2*/

/* ADD NEW PRODUCT*/

INSERT INTO product_dim

SELECT

NULL

, product_code

, product_name

, product_category

, CURRENT_DATE

, '9999-12-31'

FROM product_stg

WHERE product_code NOT IN(

SELECT y.product_code

FROM product_dim x, product_stg y

WHERE x.product_code = y.product_code

;

/* END OF PRODUCT_DIM POPULATION*/

/* ORDER_DIM POPULATION*/

INSERT INTO order_dim (

order_sk

, order_number

, effective_date

, expiry_date

)

SELECT

NULL

, order_number

, order_date

, '9999-12-31'

FROM source.sales_order

WHERE entry_date = CURRENT_DATE

;

/* END OF ORDER_DIM POPULATION*/

/* SALES_ORDER_FACT POPULATION*/

INSERT INTO sales_order_fact

SELECT

order_sk

, customer_sk

, product_sk

, date_sk

, order_amount

, order_quantity

FROM

source.sales_order a

, order_dim b

, customer_dim c

, product_dim d

, date_dim e

WHERE

a.order_number = b.order_number

AND a.customer_number = c.customer_number

AND a.order_date >= c.effective_date

AND a.order_date <= c.expiry_date

AND a.product_code = d.product_code

AND a.order_date >= d.effective_date

AND a.order_date <= d.expiry_date

AND a.order_date = e.date

AND a.entry_date = CURRENT_DATE

;

/* end of script*/

测试修正后的定期装载

现在你可以测试列表12-5的脚本。在你实施之前,先增加一些客户数据,通过运行列表12-6的脚本增加一个PA客户和一个OH客户到客户维中。

列表12-6:增加两个客户

/******************************************************************/

/**/

/* two_more_customers.sql*/

/**/

/******************************************************************/

/* default to dw*/

USE dw;

INSERT INTO customer_dim

( customer_sk

, customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state

, effective_date

, expiry_date )

VALUES

(NULL, 13, 'PA Customer', '1111 Louise Dr.', '17050',

'Mechanicsburg', 'PA', '1111 Louise Dr.', '17050',

'Mechanicsburg', 'PA', CURRENT_DATE, '9999-12-31')

, (NULL, 14, 'OH Customer', '6666 Ridge Rd.', '44102',

'Cleveland', 'OH', '6666 Ridge Rd.', '44102',

'Cleveland', 'OH', CURRENT_DATE, '9999-12-31')

;

/* end of script*/

现在运行列表12-6中的脚本:

mysql> \. c:\mysql\scripts\two_more_customers.sql

Mysql将显示有两个记录生效。

Database changed

Query OK, 2 rows affected (0.06 sec)

Records: 2Duplicates: 0Warnings: 0

现在更改你的Msql数据库的日期为2007-03-03以保证老的数据不会重新载入,然后运行dw_regular_12.sql脚本。

mysql> \. c:\mysql\scripts\dw_regular_12.sql

你将看到你的控制台有如下显示:

Database changed

Query OK, 9 rows affected (0.15 sec)

Query OK, 9 rows affected (0.14 sec)

Records: 9Deleted: 0Skipped: 0Warnings: 0

Query OK, 0 rows affected (0.05 sec)

Rows matched: 0Changed: 0Warnings: 0

Query OK, 0 rows affected (0.01 sec)

Records: 0Duplicates: 0Warnings: 0

Query OK, 0 rows affected (0.00 sec)

Rows matched: 0Changed: 0Warnings: 0

Query OK, 0 rows affected (0.00 sec)

Records: 0Duplicates: 0Warnings: 0

Query OK, 18 rows affected (0.04 sec)

Query OK, 19 rows affected (0.06 sec)

Records: 19Duplicates: 0Warnings: 0

Query OK, 4 rows affected (0.09 sec)

Query OK, 4 rows affected (0.07 sec)

Records: 4Deleted: 0Skipped: 0Warnings: 0

Query OK, 0 rows affected (0.06 sec)

Rows matched: 0Changed: 0Warnings: 0

Query OK, 0 rows affected (0.00 sec)

Records: 0Duplicates: 0Warnings: 0

Query OK, 0 rows affected (0.00 sec)

Records: 0Duplicates: 0Warnings: 0

Query OK, 0 rows affected (0.15 sec)

Records: 0 Duplicates: 0Warnings: 0

Query OK, 0 rows affected (0.17 sec)

Records: 0Duplicates: 0Warnings: 0

现在用该语句查询pa_customer_dim表。你将看到新的PA客户被插入到表中。

mysql> select customer_name, customer_state, effective_date from pa_customer_dim;

这是结果:

+-------------------------+----------------+---------------+

| customer_name| customer_state | effective_date|

+-------------------------+----------------+---------------+

| Really Large Customers| PA| 2005-03-01|

| Small Stores| PA| 2005-03-01|

| Medium Retailers| PA| 2005-03-01|

| Good Companies| PA| 2005-03-01|

| Wonderful Shops| PA| 2005-03-01|

| Extremely Loyal Clients | PA| 2005-03-01|

| Distinguished Agencies| PA| 2005-03-01|

| Extremely Loyal Clients | PA| 2007-03-01|

| Subsidiaries| PA| 2007-03-01|

| Really Large Customers| PA| 2007-03-02|

| Small Stores| PA| 2007-03-02|

| Medium Retailers| PA| 2007-03-02|

| Good Companies| PA| 2007-03-02|

| Wonderful Shops| PA| 2007-03-02|

| Extremely Loyal Clients | PA| 2007-03-02|

| Distinguished Agencies| PA| 2007-03-02|

| Subsidiaries| PA| 2007-03-02|

| Online Distributors| PA| 2007-03-02|

| PA Customer| PA| 2007-03-03|

+-------------------------+----------------+---------------+

19 rows in set (0.00 sec)

小结

在这章,你学习了两种类型的子集维。月份子集维是向上钻取维的一个例子,一个装载自比它更为详细的基础维的具有更高级的维。PA客户维是一个特定的子集维;只从它的基础维选择PA客户数据进行装载。

下一章,你将学习另外一个复用现有维的技术,叫角色扮演维度。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值