一、实验材料:
数据库为db2 v11.5 社区版 docker搭建, 数据库客户端工具使用razorSQL, python版本3.8.5
二、实验目标:
按照etl_date中的日期数据进行创建分区表,验证数据转移流程及遇到问题处理。
三、实验流程:
1.docker搭建db2数据库
docker run --name db2_control_node1 --privileged=true -e LICENSE=accept -e DB2INST1_PASSWORD=qwe123 -e DBNAME=testdb -p 50000:50000 -d ibmcom/db2
2.创建模拟表
模拟表表名 my_table2, 此SQL在razorSQL工具上执行
CREATE TABLE db2inst1.my_table2 (
id INT,
name VARCHAR(50),
ETL_DATE date
);
执行结果:执行成功..
3.查看当前表是否存在分区
db2 describe data partitions for table my_table2;
此命令无法在razorsql执行,进入db2命令行中执行
连接用户流程
docker exec -it db2_control_node1 /bin/bash -c "su - db2inst1"
db2 connect to testdb
查询结果
PartitionId Inclusive (y/n) Inclusive (y/n)
Low Value High Value
----------- - ------------------------------- - -------------------------------
0 Y Y
1 record(s) selected.
4.创建空的表空间
CREATE TABLESPACE my_tablespace2
AUTORESIZE yes
5.根据表结构创建空的分区表尝试SQL执行成功
CREATE TABLE MY_TABLE_TMP LIKE MY_TABLE2
DISTRIBUTE BY HASH("ID")
PARTITION BY RANGE ("ETL_DATE")
(
PARTITION my_table_1993_01_01 STARTING ('1993-01-01') ENDING ('1993-12-31') IN MY_TABLESPACE,
PARTITION my_table_1994_01_01 STARTING ('1994-01-01') ENDING ('1994-12-31') IN MY_TABLESPACE
)
6.验证alter语句:
ALTER TABLE MY_TABLE_TMP ADD PARTITION my_table_1995_01_01 STARTING ('1995-01-01') ENDING ('1995-12-31') in MY_TABLESPACE;
ALTER TABLE MY_TABLE_TMP ADD PARTITION my_table_1996_01_01 STARTING ('1996-01-01') ENDING ('1996-12-31') in MY_TABLESPACE;
7.创建模拟数据:
模拟数据日期范围1993-01-01至1996-01-01
使用python脚本执行,代码如下
import random
from datetime import datetime, timedelta
# 生成百万级别的数据
num_rows = 1000000
# 打开文件以写入数据
with open('data_file.txt', 'w') as file:
for _ in range(num_rows):
id_value = random.randint(1, 1000000)
name_value = 'Name' + str(id_value)
etl_date_value = datetime(1993, 1, 1) + timedelta(days=random.randint(0, 1095))
etl_date_str = etl_date_value.strftime('%Y-%m-%d')
row = f"{id_value},{name_value},{etl_date_str}\n"
file.write(row)
8.导入模拟数据
# 先使用复制模拟数据到docker容器中
docker cp /home/work/upload/data_file.txt db2_control_node1:/database/config/db2inst1/
# 然后执行load命令加载到表中
docker exec -it db2_control_node1 /bin/bash -c "su - db2inst1"
db2 connect to testdb
db2
LOAD FROM 'data_file.txt' OF DEL INSERT INTO db2inst1.my_table2 (id, name, ETL_DATE)
9.尝试数据从普通表中迁移到分区表中
--此语句在db2中不支持,
DECLARE C1 CURSOR FOR SELECT * FROM my_table2
LOAD FROM C1 of CURSOR INSERT INTO my_table_tmp ;
--改为这个
INSERT INTO db2inst1.my_table_tmp SELECT * FROM db2inst1.my_table2
--执行不成功,原因:虚拟内存不够,爆了
最终方案:
--使用load方式
db2 export to 1993.del of del select * from MY_TABLE2 where TO_CHAR(ETL_DATE, 'YYYY')='1993' with ur
db2 load from 1993.del of del insert into MY_TABLE_TMP
db2 export to 1994.del of del select * from MY_TABLE2 where TO_CHAR(ETL_DATE, 'YYYY')='1994' with ur
db2 load from 1994.del of del insert into MY_TABLE_TMP
db2 export to 1995.del of del select * from MY_TABLE2 where TO_CHAR(ETL_DATE, 'YYYY')='1995' with ur
db2 load from 1995.del of del insert into MY_TABLE_TMP
db2 export to 1996.del of del select * from MY_TABLE2 where TO_CHAR(ETL_DATE, 'YYYY')='1996' with ur
db2 load from 1996.del of del insert into MY_TABLE_TMP
对比分区表与原表数据
select count(1) from my_table2
select count(1) from my_table_tmp