目录
一、命名规范
1.业务系统简称规范
命名规则:3-6个英文字母,数字组成
举例:
No | 英文简称 | 英文名称 | 说明 |
1 | WMS | Warehouse Management System | 仓库管理系统 |
2 | MES | Manufacturing Execution System | 生产管理系统 |
3 | SCM | Supply Chain Management | 供应链管理 |
4 | CRM | Customer Relationship Management | 客户管理系统 |
2.数据源简称规范
命名规则:3-6个英文字母,数字组成
举例:
No | 简称 | 说明 |
1 | WMSDB | 仓库管理数据库 |
3.数仓分层规范
No | 简称 | 说明 |
1 | STG | 贴源层 |
2 | ODS | 操作层 |
3 | DIM | 公共维度层 |
4 | DWD | 明细层 |
5 | DWS | 指标层 |
6 | ADS | 应用层 |
4.业务板块简称规范
命名规则:3-6个英文字母,数字组成
举例:
No | 简称 | 说明 |
1 | SAL | 销售板块 |
2 | FMS | 财务板块 |
4.数据域简称规范
命名规则:3-6个英文字母,数字组成
举例:
No | 简称 | 说明 |
1 | CUST | 客户域 |
2 | TRAD | 交易域 |
5.数据域简称规范
命名规则:3-6个英文字母,数字组成
举例:
No | 简称 | 说明 |
1 | CUST | 客户域 |
2 | TRAD | 交易域 |
6.业务过程简称规范
命名规则:3-6个英文字母,数字组成
举例:
No | 简称 | 说明 |
1 | ORD | 下单 |
3 | PAY | 支付 |
7.表名规范
通用规范:
- 增量全量标识:i-增量,f-全量
- 刷新周期标识:ss-实时,mm-分钟,hh-小时,d-日,w-周,m-月,q-季,y-年
- 刷新周期范围缩写:nd-多少天,td-历史至今,sd-年初至今,自然周-cw
No | 分层名称 | 命名规范 | 说明 |
1 | STG | STG_{源数据库简称}_{源表名}_{刷新周期标识}{增量全量标识} | stg_ord_t_brand_di |
2 | ODS | ODS_{源数据库简称}_{源表名}_{刷新周期标识}{增量全量标识} | ods_ord_t_brand_df |
3 | DIM | DIM_{业务板块/pub}_{自定义表名}_{刷新周期标识}{增量全量标识} | dim_mes_factory_df |
4 | DWD | DWD_{业务板块}_{数据域}_{业务过程}_{自定义表名}_{刷新周期标识}{增量全量标识} | dwd_sal_trad_ord_pos_order_df |
5 | DWS | DWS_{业务板块}_{数据域}_{统计日期}_{自定义表名}_{统计周期范围}{刷新周期标识} | dws_sal_trad_confirm_date_shop_retn_order_1d |
6 | ADS | ADS_{业务板块}_{数据域}_{自定义表名}_{统计周期范围}{刷新周期标识} | ads_sal_trad_c_order_analysis_1d |
7 | TMP | TMP_{表名} | tmp_dim_mes_factory_df |
二、字段规范
1.Hive表字段类型规范
No | 类型名称 | 字段类型 | 长度 |
1 | 字符串 | string | |
2 | 长整型 | bigint | |
3 | 短整型 | tinyint | |
4 | 日期 | string | |
5 | 浮点数 | decimal | (18,4),(18,8) |
2.CK表字段类型规范
No | 类型名称 | 字段类型 | 长度 |
1 | 字符串 | String | |
2 | 长整型 | Int64 | |
3 | 短整型 | Int8 | |
4 | 日期(年月日) | Date | |
5 | 日期(年月日时分秒) | DateTime | |
6 | 浮点数 | decimal | (18,4),(18,8) |
3.字段默认值规范
No | 默认值类型 | 默认值 |
1 | 指标数值 | 0 |
2 | 维度数值 | -99 |
3 | 时间 | 1970-01-01 00:00:00 |
日期 | 1970-01-01 | |
5 | 字符串 | None |
4.审计列规范
分层名称 | 字段名称 | 注释 | 类型 |
STG | source_db | 源数据库名称 | string |
imported_time | 程序导入时间 | string | |
inserted_time | 目标初次插入时间 | string | |
updated_time | 目标最后更新时间 | string | |
ODS | source_db | 源数据库名称 | string |
is_deleted | 是否(物理)删除 | tinyint | |
inserted_time | 目标初次插入时间 | string | |
updated_time | 目标最后更新时间 | string | |
DIM | is_deleted | 是否(物理)删除 | tinyint |
Is_current(拉链表) | 是否当前版本 | tinyint | |
start_time(拉链表) | 开始使用日期 | string | |
end_time(拉链表) | 结束日期 | string | |
inserted_time | 目标初次插入时间 | string | |
updated_time | 目标最后更新时间 | string | |
inserted_time | 目标初次插入时间 | string | |
updated_time | 目标最后更新时间 | string | |
DWD | is_deleted | 是否(物理)删除 | tinyint |
inserted_time | 目标初次插入时间 | string | |
updated_time | 目标最后更新时间 | string | |
DWS | inserted_time | 目标初次插入时间 | string |
updated_time | 目标最后更新时间 | string | |
ADS | inserted_time | 目标初次插入时间 | string |
updated_time | 目标最后更新时间 | string |
三、指标规范
指标分为:
原子指标:事实表中的度量,原子指标=业务过程+度量
派生指标:根据业务需求加工的度量,派生指标=修饰词+原子指标+统计周期范围+刷新周期标识
衍生指标:一个或多各派生指标通过数学公式加工的度量。衍生指标=派生指标+公式
命名规则:{修饰词}_{指标名称}_{统计周期范围}{刷新周期标识}_{统计时间序号}{可变标识}
统计时间序号:代表不同的统计时间,例如预约时间、下单时间。
取值范围{a-z}
可变标识:根据统计时间判断是否已经完成或正在发生的事实
i-不可变;v-可变
事例:pay_sale_amt_1d_ai
说明:按照订单支付时间统计最近1天已经支付的订单金额
四、建模规范
通用规范:
- 表和每个字段必须有注释
- 字段类型按照规范的数据类型
- 键类型:
- 主键(PK):表的主键
- 外键(FK):表的外键
- 业务主键(BUSK):从业务的角度来确定数据的粒度
- 变化时间戳(CDCK):增量更新字段
- 审计键(AUK):表的审计字段
- 分区键(PTK):表的分区字段
- 4.除了审计和分区字段必须有源表字段
1.STG层建模规范
- 表存储格式textfile
- 字段顺序和源库保持一致
- 表添加分区字段dt
2.ODS层建模规范
- 表存储格式orc
- 数据压缩是snappy
- 写入方式:truncate,merge
- 字段顺序和源库保持一致
- 表添加分区字段dt
3.DIM层建模规范
- 表存储格式orc
- 数据压缩是snappy
- 写入方式:truncate,linked
- 字段顺序按照主键组、外键组、字典组、属性组、指标组、审计组、分区组进行排序
- 外键组和字典组把相关的编码和名称冗余存储
4.DWD层建模规范
- 表存储格式orc
- 数据压缩是snappy
- 写入方式:merge
- 字段顺序按照主键组、外键组、字典组、属性组、指标组、审计组、分区组进行排序
- 外键组只存储ID,保证数据粒度一致,添加关联维度表信息
- 字典组把相关的编码和名称冗余存储
5.DWS层建模规范
- 表存储格式orc
- 数据压缩是snappy
- 字段顺序按照主键组、外键组、字典组、属性组、指标组、审计组、分区组进行排序
- 外键组只存储ID,保证数据粒度一致,添加关联维度表信息
- 字典组把相关的编码和名称冗余存储
- 表添加分区字段dt
6.ADS层建模规范
- 表存储格式orc
- 数据压缩是snappy
- 字段顺序按照业务需求进行排序
- 字典组把相关的编码和名称冗余存储
- 表添加分区字段dt
五、建表规范
通用规范:
- 表名前加上数据库名
- 数据库名、表名、字段名必须带上反引号
- 数据库关键字大写
1.STG层建表规范
举例:
DROP TABLE IF EXISTS `stg`.`stg_ord_t_brand_df`;
CREATE TABLE IF NOT EXISTS `stg`.`stg_ord_t_brand_df`(
`id` bigint COMMENT '主键'
,`brand_china_name` string COMMENT '品牌名称(中文)'
,`brand_english_name` string COMMENT '品牌名称(英文)'
,`brand_desc` string COMMENT '品牌描述'
,`band_logo` string COMMENT '品牌LOGO'
,`status` tinyint COMMENT '有效=1,无效=0'
,`create_time` string COMMENT '创建时间'
,`update_time` string COMMENT '修改时间'
,`create_id` bigint COMMENT '创建人ID'
,`source_db` string COMMENT '源数据库名'
,`imported_time` string COMMENT '目标导入时间'
,`inserted_time` string COMMENT '目标初次插入时间'
,`updated_time` string COMMENT '目标最后更新时间'
) COMMENT '品牌表'
PARTITIONED BY (`dt` string COMMENT '分区')-- 日分区:yyyy-mm-dd
STORED AS textfile;
2.ODS层建表规范
举例:
DROP TABLE IF EXISTS `ods`.`ods_ord_t_brand_df`;
CREATE TABLE IF NOT EXISTS `ods`.`ods_ord_t_brand_df`(
`id` bigint COMMENT '主键'
,`brand_china_name` string COMMENT '品牌名称(中文)'
,`brand_english_name` string COMMENT '品牌名称(英文)'
,`brand_desc` string COMMENT '品牌描述'
,`band_logo` string COMMENT '品牌LOGO'
,`status` tinyint COMMENT '有效=1,无效=0'
,`create_time` string COMMENT '创建时间'
,`update_time` string COMMENT '修改时间'
,`create_id` bigint COMMENT '创建人ID'
,`source_db` string COMMENT '源数据库名'
,`is_deleted` tinyint COMMENT '是否(物理)删除'
,`inserted_time` string COMMENT '目标初次插入时间'
,`updated_time` string COMMENT '目标最后更新时间'
) COMMENT '品牌表'
PARTITIONED BY (`dt` string COMMENT '分区')-- 日分区:yyyy-mm-dd
STORED AS orc
TBLPROPERTIES ("orc.compression"="snappy");
3.DIM层建表规范
- 如果是拉链表插入一条默认记录
举例:
DROP TABLE IF EXISTS `dim`.`dim_sal_commodity_df`;
CREATE TABLE IF NOT EXISTS `dim`.`dim_sal_commodity_df`(
`commodity_id` bigint COMMENT '主键'
,`brand_id` bigint COMMENT '品牌ID'
,`brand_cn_name` string COMMENT '品牌中文名称'
,`commodity_no` string COMMENT '产品代码'
,`commodity_name` string COMMENT '产品名称'
,`commodity_desc` string COMMENT '产品描述'
,`commodity_color` string COMMENT '颜色'
,`commodity_length` decimal(18,4) COMMENT '长'
,`commodity_width` decimal(18,4) COMMENT '宽'
,`commodity_height` decimal(18,4) COMMENT '高'
,`commodity_weight` decimal(18,4) COMMENT '重量'
,`retail_price` decimal(18,4) COMMENT '零售价'
,`cost_price` decimal(18,4) COMMENT '成本价'
,`launch_date` string COMMENT '上市日期'
,`delisting_date` string COMMENT '下士日期'
,`origin` string COMMENT '产地'
,`create_id` bigint COMMENT '操作人ID'
,`update_id` bigint COMMENT '更新人ID'
,`create_time` string COMMENT '创建时间'
,`update_time` string COMMENT '更新时间'
,`is_deleted` tinyint COMMENT '是否(物理)删除'
,`is_current` tinyint COMMENT '是否当前版本'
,`start_date` string COMMENT '开始使用日期'
,`end_date` string COMMENT '结束日期'
,`inserted_time` string COMMENT '目标初次插入时间'
,`updated_time` string COMMENT '目标最后更新时间'
) COMMENT '商品维表'
STORED AS orc
TBLPROPERTIES ("orc.compression"="snappy");
INSERT INTO `dim`.`dim_sal_commodity_df` (
`commodity_id`
,`brand_id`
,`brand_cn_name`
,`commodity_no`
,`commodity_name`
,`commodity_desc`
,`commodity_color`
,`commodity_length`
,`commodity_width`
,`commodity_height`
,`commodity_weight`
,`retail_price`
,`cost_price`
,`launch_date`
,`delisting_date`
,`origin`
,`create_id`
,`update_id`
,`create_time`
,`update_time`
,`is_deleted`
,`is_current`
,`start_date`
,`end_date`
,`inserted_time`
,`updated_time`
) VALUES
(
-99
,-99
,'None'
,'None'
,'None'
,'None'
,'None'
,0
,0
,0
,0
,0
,0
,'1970-01-01'
,'1970-01-01'
,'None'
,-99
,-99
,'1970-01-01 00:00:00'
,'1970-01-01 00:00:00'
,0
,1
,'1970-01-01'
,'9999-12-31'
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss')
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss')
);
4.DWD层建表规范
举例:
DROP TABLE IF EXISTS `dwd`.`dwd_sal_trd_ord_order_df`;
CREATE TABLE IF NOT EXISTS `dwd`.`dwd_sal_trd_ord_order_df`(
`order_id` bigint COMMENT '主键id'
,`commodity_id` bigint COMMENT '商品id'
,`user_id` bigint COMMENT 'C端用户ID'
,`order_status_code` bigint COMMENT '订单状态(0=已取消,1=待付款,2=待配送,3=配送中,4=配送成功,5=配送失败)'
,`order_status_name` string COMMENT '订单状态(0=已取消,1=待付款,2=待配送,3=配送中,4=配送成功,5=配送失败)'
,`pay_type_code` bigint COMMENT '支付方式:1=支付宝,2=微信,3=刷卡(线上),4=线上支付/积分支付,5=云闪付支付'
,`pay_type_name` string COMMENT '支付方式:1=支付宝,2=微信,3=刷卡(线上),4=线上支付/积分支付,5=云闪付支付'
,`pay_status_code` bigint COMMENT '支付状态(1=待付款,2=已付款)'
,`pay_status_name` string COMMENT '支付状态(1=待付款,2=已付款)'
,`total_amt` decimal(18,4) COMMENT '订单总金额(明细分摊之后的汇总)'
,`total_qty` decimal(18,4) COMMENT '商品总件数'
,`create_id` bigint COMMENT '创建者'
,`create_time` string COMMENT '创建日期'
,`update_id` bigint COMMENT '更新者'
,`update_time` string COMMENT '更新时间'
,`is_deleted` tinyint COMMENT '是否(物理)删除'
,`inserted_time` string COMMENT '目标初次插入时间'
,`updated_time` string COMMENT '目标最后更新时间'
) COMMENT '订单表'
STORED AS orc
TBLPROPERTIES ("orc.compression"="snappy");
5.DWS层建表规范
举例:
DROP TABLE IF EXISTS `px_dws`.`dws_sal_trd_complete_date_xj_retn_order_1d`;
CREATE TABLE IF NOT EXISTS `px_dws`.`dws_sal_trd_complete_date_xj_retn_order_1d`(
`shop_id` bigint COMMENT '门店ID'
,`member_id` bigint COMMENT '会员ID'
,`commodity_id` bigint COMMENT '商品ID'
,`source_type_code` tinyint COMMENT '订单来源(1线上,2线下)'
,`source_type_name` string COMMENT '订单来源:1-APP,2-POS,3-小程序,4-饿了么,5-鲜食APP,6-团购,7-京东到家'
,`order_id` bigint COMMENT '源订单ID'
,`return_order_id` bigint COMMENT '退货订单ID'
,`return_order_no` string COMMENT '退货订单编号'
,`delivery_retn_amt_1d_ai` decimal(18,4) COMMENT '最近1天配送成功退货总金额'
,`delivery_retn_cmd_cnt_1d_ai` decimal(18,4) COMMENT '最近1天配送成功退货商品数量'
,`origin_sale_amt_1d_ai` decimal(18,4) COMMENT '最近1天原价金额'
,`trade_amt_1d_ai` decimal(18,4) COMMENT '最近1天实收总金额'
,`real_delivery_amt_1d_ai` decimal(18,4) COMMENT '最近1天实发总金额'
,`inserted_time` string COMMENT '目标初次插入时间'
,`updated_time` string COMMENT '目标最后更新时间'
) COMMENT '最近一天按退货完成日期线上退货订单汇总报表'
PARTITIONED BY (`dt` string COMMENT '分区')-- 日分区:yyyy-mm-dd
STORED AS orc
TBLPROPERTIES ("orc.compression"="snappy");
6.ADS层建表规范
举例:
DROP TABLE IF EXISTS `ads`.`ads_sal_trd_pay_date_order_sum_1d`;
CREATE TABLE IF NOT EXISTS `ads`.`ads_sal_trd_pay_date_order_sum_1d`(
`report_date` string COMMENT '统计日期'
,`commodity_id` bigint COMMENT '商品id'
,`user_id` bigint COMMENT 'C端用户ID'
,`order_status_code` tinyint COMMENT '订单状态(0=已取消,1=待付款,2=待配送,3=配送中,4=配送成功,5=配送失败)'
,`order_status_name` string COMMENT '订单状态(0=已取消,1=待付款,2=待配送,3=配送中,4=配送成功,5=配送失败)'
,`pay_type_code` tinyint COMMENT '支付方式:1=支付宝,2=微信,3=刷卡(线上),4=线上支付/积分支付,5=云闪付支付'
,`pay_type_name` string COMMENT '支付方式:1=支付宝,2=微信,3=刷卡(线上),4=线上支付/积分支付,5=云闪付支付'
,`pay_status_code` tinyint COMMENT '支付状态(1=待付款,2=已付款)'
,`pay_status_name` string COMMENT '支付状态(1=待付款,2=已付款)'
,`total_amt_1d_ai` decimal(18,4) COMMENT '订单总金额(明细分摊之后的汇总)'
,`total_qty_1d_ai` decimal(18,4) COMMENT '商品总件数'
,`inserted_time` string COMMENT '目标初次插入时间'
,`updated_time` string COMMENT '目标最后更新时间'
) COMMENT '最近1天按支付日期订单汇总报表'
PARTITIONED BY (`dt` string COMMENT '分区')-- 日分区:yyyy-mm-dd
STORED AS orc
TBLPROPERTIES ("orc.compression"="snappy");
7.DAS层建表规范
- 表名和ADS层表名保持一致
- 字段顺序和ADS层保持一致
- 排序字段按照业务主键进行设置
- 数据生命周期按照业务需求设置
举例:
DROP TABLE IF EXISTS `das`.`ads_sal_trd_pay_date_order_sum_1d`;
CREATE TABLE IF NOT EXISTS `das`.`ads_sal_trd_pay_date_order_sum_1d`(
`report_date` Date COMMENT '统计日期'
,`commodity_id` Int64 COMMENT '商品id'
,`user_id` Int64 COMMENT 'C端用户ID'
,`order_status_code` Int8 COMMENT '订单状态(0=已取消,1=待付款,2=待配送,3=配送中,4=配送成功,5=配送失败)'
,`order_status_name` String COMMENT '订单状态(0=已取消,1=待付款,2=待配送,3=配送中,4=配送成功,5=配送失败)'
,`pay_type_code` Int8 COMMENT '支付方式:1=支付宝,2=微信,3=刷卡(线上),4=线上支付/积分支付,5=云闪付支付'
,`pay_type_name` String COMMENT '支付方式:1=支付宝,2=微信,3=刷卡(线上),4=线上支付/积分支付,5=云闪付支付'
,`pay_status_code` Int8 COMMENT '支付状态(1=待付款,2=已付款)'
,`pay_status_name` String COMMENT '支付状态(1=待付款,2=已付款)'
,`total_amt_1d_ai` Decimal(18,4) COMMENT '订单总金额(明细分摊之后的汇总)'
,`total_qty_1d_ai` Decimal(18,4) COMMENT '商品总件数'
,`inserted_time` DateTime COMMENT '目标初次插入时间'
,`updated_time` DateTime COMMENT '目标最后更新时间'
)
ENGINE = MergeTree()
ORDER BY (`report_date`,`commodity_id`,`user_id`)
PARTITION BY toYYYYMMDD(`report_date`)
TTL report_date + INTERVAL 765 DAY;
六、代码规范
通用规范:
- 保证代码可重复执行,而不影响结果
- 每层的源表只能是上一层的表,不能跨层查询。
- 表名前加上数据库名
- 数据库名、表名、字段名必须带上反引号
- 数据库关键字大写
- 分区名以参数的形式配置
- 指定队列名称
- 添加以下注释:作者、创建日期、功能描述、目标表、源表、修改日志信息(修改日期、修改人、修改内容)
1.STG层代码规范
- 源数据库和目标数据库以参数的形式配置
- STG层采用二种同步方式:
增量同步:如果表的数据量比较大,并且有更新时间戳
全量同步:如果表的数据量比较小或者缺少更新时间戳
举例:
增量脚本:
{
"job": {
"setting": {
"speed": {
"channel":4
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "$USER_NAME",
"password": "$PASSWORD",
"connection": [
{
"jdbcUrl": ["$JDBCURL"],
"querySql": [
"
select
`id`
,`commodity_code`
,`commodity_name`
,`commodity_describe`
,`brand_id`
,`retail_price`
,`cost_price`
,`launch_date`
,`delisting_date`
,`commodity_color`
,`create_time`
,`update_time`
,`origin`
,`length`
,`width`
,`height`
,`weight`
,`create_id`
,`update_id`
,`status`
,'order'
,now()
,now()
,now()
from t_commodity
WHERE update_time BETWEEN
str_to_date( '$START_TIME', '%Y-%m-%d %H:%i:%s' ) AND str_to_date( '$END_TIME', '%Y-%m-%d %H:%i:%s' )
"
]
}
]
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{
"name": "id",
"type": "bigint"
},
{
"name": "commodity_code",
"type": "string"
},
{
"name": "commodity_name",
"type": "string"
},
{
"name": "commodity_describe",
"type": "string"
},
{
"name": "brand_id",
"type": "bigint"
},
{
"name": "retail_price",
"type": "double"
},
{
"name": "cost_price",
"type": "double"
},
{
"name": "launch_date",
"type": "string"
},
{
"name": "delisting_date",
"type": "string"
},
{
"name": "commodity_color",
"type": "string"
},
{
"name": "create_time",
"type": "string"
},
{
"name": "update_time",
"type": "string"
},
{
"name": "origin",
"type": "string"
},
{
"name": "length",
"type": "double"
},
{
"name": "width",
"type": "double"
},
{
"name": "height",
"type": "double"
},
{
"name": "weight",
"type": "double"
},
{
"name": "create_id",
"type": "bigint"
},
{
"name": "update_id",
"type": "bigint"
},
{
"name": "status",
"type": "tinyint"
},
{
"name": "source_db",
"type": "string"
},
{
"name": "imported_time",
"type": "string"
},
{
"name": "inserted_time",
"type": "string"
},
{
"name": "updated_time",
"type": "string"
}
],
"hadoopConfig":{
"dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"dfs.ha.namenodes.mycluster": "nn1,nn2",
"dfs.namenode.rpc-address.mycluster.nn1": "node101:8020",
"dfs.namenode.rpc-address.mycluster.nn2": "node102:8020",
"dfs.nameservices": "mycluster"
},
"compress": "",
"defaultFS": "$HDFS_URL",
"fieldDelimiter": "\u0001",
"fileName": "$TRGT_TABLE_NAME",
"fileType": "text",
"path": "${HIVE_DB_DIR}stg.db/$TRGT_TABLE_NAME/dt=$PARTITION",
"writeMode": "truncate"
}
}
}
],
}
}
全量脚本:
{
"job": {
"setting": {
"speed": {
"channel":4
}
},
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "$USER_NAME",
"password": "$PASSWORD",
"splitPk": "id",
"column": [
"id"
,"brand_china_name"
,"brand_english_name"
,"brand_desc"
,"band_logo"
,"status"
,"create_time"
,"update_time"
,"create_id"
,"'order'"
,'now()'
,'now()'
,'now()'
],
"connection": [
{
"jdbcUrl": ["$JDBCURL"],
"table": ["t_brand"]
}
]
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{
"name": "id",
"type": "bigint"
},
{
"name": "brand_china_name",
"type": "string"
},
{
"name": "brand_english_name",
"type": "string"
},
{
"name": "brand_desc",
"type": "string"
},
{
"name": "band_logo",
"type": "string"
},
{
"name": "status",
"type": "tinyint"
},
{
"name": "create_time",
"type": "string"
},
{
"name": "update_time",
"type": "string"
},
{
"name": "create_id",
"type": "bigint"
},
{
"name": "source_db",
"type": "string"
},
{
"name": "imported_time",
"type": "string"
},
{
"name": "inserted_time",
"type": "string"
},
{
"name": "updated_time",
"type": "string"
}
],
"hadoopConfig":{
"dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"dfs.ha.namenodes.mycluster": "nn1,nn2",
"dfs.namenode.rpc-address.mycluster.nn1": "node101:8020",
"dfs.namenode.rpc-address.mycluster.nn2": "node102:8020",
"dfs.nameservices": "mycluster"
},
"defaultFS": "$HDFS_URL",
"fieldDelimiter": "\u0001",
"fileName": "$TRGT_TABLE_NAME",
"fileType": "text",
"path": "${HIVE_DB_DIR}stg.db/$TRGT_TABLE_NAME/dt=$PARTITION",
"writeMode": "truncate"
}
}
}
],
}
}
2.ODS层代码规范
ODS层采用二种同步方式:
- 全量覆盖:如果数据删除比较频繁,且只用当前的数据,如果数据合并会造成大量重复无用的数据
- 数据合并:一般情况下用数据合并
举例:
全量覆盖:
-- ================================================
-- 作者: Administrator
-- 创建日期: 2024-06-11
-- 功能描述: 同步ods层品牌表
-- 目标表: ods_ord_t_brand_df
-- 源表: stg_ord_t_brand_df
-- 修改日志:
-- 修改日期: 修改人: 修改内容:
-- ================================================
set mapreduce.job.queuename=hive;
INSERT OVERWRITE TABLE `ods`.`ods_ord_t_brand_df` PARTITION (`dt`='${PARTITITON}')
SELECT
t1.`id` AS `id`
,t1.`brand_china_name` AS `brand_china_name`
,t1.`brand_english_name` AS `brand_english_name`
,t1.`brand_desc` AS `brand_desc`
,t1.`band_logo` AS `band_logo`
,t1.`status` AS `status`
,t1.`create_time` AS `create_time`
,t1.`update_time` AS `update_time`
,t1.`create_id` AS `create_id`
,t1.`source_db` AS `is_deleted`
,0 AS `is_deleted`
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss') AS `inserted_time`
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss') AS `updated_time`
FROM `stg`.`stg_ord_t_brand_df` t1 WHERE `dt`='${PARTITITON}';
数据合并:
-- ================================================
-- 作者: Administrator
-- 创建日期: 2024-06-11
-- 功能描述: 同步ods层品牌表
-- 目标表: ods_ord_t_commodity_df
-- 源表: stg_ord_t_commodity_di
-- 修改日志:
-- 修改日期: 修改人: 修改内容:
-- ================================================
set mapreduce.job.queuename=hive;
INSERT OVERWRITE TABLE `ods`.`ods_ord_t_commodity_df` PARTITION (`dt`='${PARTITITON}')
SELECT
COALESCE(t2.`id`,t1.`id`) AS `id`
,COALESCE(t2.`commodity_code`,t1.`commodity_code`) AS `commodity_code`
,COALESCE(t2.`commodity_name`,t1.`commodity_name`) AS `commodity_name`
,COALESCE(t2.`commodity_describe`,t1.`commodity_describe`) AS `commodity_describe`
,COALESCE(t2.`brand_id`,t1.`brand_id`) AS `brand_id`
,COALESCE(t2.`retail_price`,t1.`retail_price`) AS `retail_price`
,COALESCE(t2.`cost_price`,t1.`cost_price`) AS `cost_price`
,COALESCE(t2.`launch_date`,t1.`launch_date`) AS `launch_date`
,COALESCE(t2.`delisting_date`,t1.`delisting_date`) AS `delisting_date`
,COALESCE(t2.`commodity_color`,t1.`commodity_color`) AS `commodity_color`
,COALESCE(t2.`create_time`,t1.`create_time`) AS `create_time`
,COALESCE(t2.`update_time`,t1.`update_time`) AS `update_time`
,COALESCE(t2.`origin`,t1.`origin`) AS `origin`
,COALESCE(t2.`length`,t1.`length`) AS `length`
,COALESCE(t2.`width`,t1.`width`) AS `width`
,COALESCE(t2.`height`,t1.`height`) AS `height`
,COALESCE(t2.`weight`,t1.`weight`) AS `weight`
,COALESCE(t2.`create_id`,t1.`create_id`) AS `create_id`
,COALESCE(t2.`update_id`,t1.`update_id`) AS `update_id`
,COALESCE(t2.`status`,t1.`status`) AS `status`
,COALESCE(t2.`source_db`,t1.`source_db`) AS `source_db`
,COALESCE(t1.`is_deleted`, 0) AS `is_deleted`
,COALESCE(t1.`inserted_time`,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss')) AS `inserted_time`
,CASE WHEN t2.`update_time` >t1.`update_time`
THEN DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss')
ELSE COALESCE(t1.`updated_time`,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss'))
END AS `updated_time`
FROM
(SELECT * FROM `ods`.`ods_ord_t_commodity_df` WHERE `dt`='${PRE_PARTITITON}') t1
FULL JOIN (SELECT * FROM `stg`.`stg_ord_t_commodity_di` WHERE `dt`='${PARTITITON}') t2
ON t1.`id`=t2.`id`;
3.DIM层代码规范
- 保证每个字段没有空值,如果有按默认值填充
- DIM层采用二种同步方式:
- 全量覆盖:如果只用当前的数据,历史数据无效
- 拉链表:一般情况下用拉链表
举例:
全量覆盖:
-- ================================================
-- 作者: Administrator
-- 创建日期: 2024-06-11
-- 功能描述: 同步dim层品牌维表
-- 目标表: dim_sal_brand_df
-- 源表: ods_ord_t_brand_df
-- 修改日志:
-- 修改日期: 修改人: 修改内容:
-- ================================================
set mapreduce.job.queuename=hive;
INSERT OVERWRITE TABLE `dim`.`dim_sal_brand_df`
SELECT
defaultv(t1.`id`,-99) AS `brand_id`
,defaultv(t1.`brand_china_name`,'None') AS `brand_cn_name`
,defaultv(t1.`brand_english_name`,'None') AS `brand_en_name`
,defaultv(t1.`brand_desc`,'None') AS `brand_desc`
,defaultv(t1.`band_logo`,'None') AS `band_logo`
,defaultv(t1.`status`,-99) AS `band_status`
,defaultv(t1.`create_time`,'1970-01-01 00:00:00') AS `create_time`
,defaultv(t1.`update_time`,'1970-01-01 00:00:00') AS `update_time`
,defaultv(t1.`create_id`,-99) AS `create_id`
,0 AS `is_deleted`
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss') AS `inserted_time`
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss') AS `updated_time`
FROM `ods`.`ods_ord_t_brand_df` t1
WHERE t1.`dt` = '${PARTITITON}';
拉链表:
-- ================================================
-- 作者: Administrator
-- 创建日期: 2024-06-11
-- 功能描述: 同步dim层商品维表
-- 目标表: dim_sal_commodity_df
-- 源表: ods_ord_t_commodity_df,ods_ord_t_brand_df
-- 修改日志:
-- 修改日期: 修改人: 修改内容:
-- ================================================
set mapreduce.job.queuename=hive;
WITH `tmp_dim_sal_commodity_df` AS
(SELECT
COALESCE(t2.`commodity_id`,t1.`commodity_id`) AS `commodity_id`
,COALESCE(t2.`brand_id`,t1.`brand_id`) AS `brand_id`
,COALESCE(t2.`brand_cn_name`,t1.`brand_cn_name`) AS `brand_cn_name`
,COALESCE(t2.`commodity_no`,t1.`commodity_no`) AS `commodity_no`
,COALESCE(t2.`commodity_name`,t1.`commodity_name`) AS `commodity_name`
,COALESCE(t2.`commodity_desc`,t1.`commodity_desc`) AS `commodity_desc`
,COALESCE(t2.`commodity_color`,t1.`commodity_color`) AS `commodity_color`
,COALESCE(t2.`commodity_length`,t1.`commodity_length`) AS `commodity_length`
,COALESCE(t2.`commodity_width`,t1.`commodity_width`) AS `commodity_width`
,COALESCE(t2.`commodity_height`,t1.`commodity_height`) AS `commodity_height`
,COALESCE(t2.`commodity_weight`,t1.`commodity_weight`) AS `commodity_weight`
,COALESCE(t2.`retail_price`,t1.`retail_price`) AS `retail_price`
,COALESCE(t2.`cost_price`,t1.`cost_price`) AS `cost_price`
,COALESCE(t2.`launch_date`,t1.`launch_date`) AS `launch_date`
,COALESCE(t2.`delisting_date`,t1.`delisting_date`) AS `delisting_date`
,COALESCE(t2.`origin`,t1.`origin`) AS `origin`
,COALESCE(t2.`create_id`,t1.`create_id`) AS `create_id`
,COALESCE(t2.`update_id`,t1.`update_id`) AS `update_id`
,COALESCE(t2.`create_time`,t1.`create_time`) AS `create_time`
,COALESCE(t2.`update_time`,t1.`update_time`) AS `update_time`
,COALESCE(t2.`is_deleted`, t1.`is_deleted`) AS `is_deleted`
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss') AS `inserted_time`
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss') AS `updated_time`
,CASE
WHEN t1.`commodity_id` IS NULL THEN 'insert'
ELSE 'update'
END AS `flag`
FROM
(SELECT * FROM `dim`.`dim_sal_commodity_df` WHERE `is_current` =1) t1
FULL JOIN (SELECT
defaultv(t1.`id`,-99) AS `commodity_id`
,defaultv(t1.`brand_id`,-99) AS `brand_id`
,defaultv(t2.`brand_china_name`,'None') AS `brand_cn_name`
,defaultv(t1.`commodity_code`,'None') AS `commodity_no`
,defaultv(t1.`commodity_name`,'None') AS `commodity_name`
,defaultv(t1.`commodity_describe`,'None') AS `commodity_desc`
,defaultv(t1.`commodity_color`,'None') AS `commodity_color`
,defaultv(t1.`length`,0) AS `commodity_length`
,defaultv(t1.`width`,0) AS `commodity_width`
,defaultv(t1.`height`,0) AS `commodity_height`
,defaultv(t1.`weight`,0) AS `commodity_weight`
,defaultv(t1.`retail_price`,0) AS `retail_price`
,defaultv(t1.`cost_price`,0) AS `cost_price`
,defaultv(t1.`launch_date`,'1970-01-01') AS `launch_date`
,defaultv(t1.`delisting_date`,'1970-01-01') AS `delisting_date`
,defaultv(t1.`origin`,'None') AS `origin`
,defaultv(t1.`create_id`,-99) AS `create_id`
,defaultv(t1.`update_id`,-99) AS `update_id`
,defaultv(t1.`create_time`,'1970-01-01 00:00:00') AS `create_time`
,defaultv(t1.`update_time`,'1970-01-01 00:00:00') AS `update_time`
,t1.`is_deleted`
,t1.`inserted_time`
,t1.`updated_time`
FROM `ods`.`ods_ord_t_commodity_df` t1
LEFT JOIN `ods`.`ods_ord_t_brand_df` t2 ON t1.`brand_id`=t2.`id` AND t2.`dt` ='${PARTITITON}'
WHERE t1.`dt`='${PARTITITON}'
) t2
ON t1.`commodity_id`=t2.`commodity_id`
WHERE MD5(CONCAT(
t2.`brand_id`
,t2.`brand_cn_name`
,t2.`commodity_no`
,t2.`commodity_name`
,t2.`commodity_desc`
,t2.`commodity_color`
,t2.`commodity_length`
,t2.`commodity_width`
,t2.`commodity_height`
,t2.`commodity_weight`
,t2.`retail_price`
,t2.`cost_price`
,t2.`launch_date`
,t2.`delisting_date`
,t2.`origin`
,t2.`create_id`
,t2.`update_id`
,t2.`create_time`
,t2.`update_time`
,t2.`is_deleted`
)) <> MD5(CONCAT(
t1.`brand_id`
,t1.`brand_cn_name`
,t1.`commodity_no`
,t1.`commodity_name`
,t1.`commodity_desc`
,t1.`commodity_color`
,t1.`commodity_length`
,t1.`commodity_width`
,t1.`commodity_height`
,t1.`commodity_weight`
,t1.`retail_price`
,t1.`cost_price`
,t1.`launch_date`
,t1.`delisting_date`
,t1.`origin`
,t1.`create_id`
,t1.`update_id`
,t1.`create_time`
,t1.`update_time`
,t1.`is_deleted`
)) OR t1.`commodity_id` IS NULL)
INSERT OVERWRITE TABLE `dim`.`dim_sal_commodity_df`
SELECT * FROM
(
SELECT
t1.`commodity_id`
,t1.`brand_id`
,t1.`brand_cn_name`
,t1.`commodity_no`
,t1.`commodity_name`
,t1.`commodity_desc`
,t1.`commodity_color`
,t1.`commodity_length`
,t1.`commodity_width`
,t1.`commodity_height`
,t1.`commodity_weight`
,t1.`retail_price`
,t1.`cost_price`
,t1.`launch_date`
,t1.`delisting_date`
,t1.`origin`
,t1.`create_id`
,t1.`update_id`
,t1.`create_time`
,t1.`update_time`
,t1.`is_deleted`
,CASE
WHEN t1.`end_date` = '9999-12-31' AND t2.`commodity_id` IS NOT NULL THEN 0
ELSE t1.`is_current`
END AS `is_current`
,t1.`start_date`
,CASE
WHEN t1.`end_date` = '9999-12-31' AND t2.`commodity_id` IS NOT NULL THEN DATE_FORMAT(DATE_ADD(CURRENT_TIMESTAMP, -1), 'yyyy-MM-dd')
ELSE t1.`end_date`
END AS `end_date`
,t1.`inserted_time`
,t1.`updated_time`
FROM `dim`.`dim_sal_commodity_df` t1
LEFT JOIN (SELECT `commodity_id` FROM `tmp_dim_sal_commodity_df` WHERE `flag`='update') t2
ON t1.`commodity_id` = t2.`commodity_id`
UNION ALL
SELECT
t1.`commodity_id`
,t1.`brand_id`
,t1.`brand_cn_name`
,t1.`commodity_no`
,t1.`commodity_name`
,t1.`commodity_desc`
,t1.`commodity_color`
,t1.`commodity_length`
,t1.`commodity_width`
,t1.`commodity_height`
,t1.`commodity_weight`
,t1.`retail_price`
,t1.`cost_price`
,t1.`launch_date`
,t1.`delisting_date`
,t1.`origin`
,t1.`create_id`
,t1.`update_id`
,t1.`create_time`
,t1.`update_time`
,0 `is_deleted`
,1 `is_current`
,CASE WHEN `flag`='insert' THEN '1970-01-01' ELSE DATE_FORMAT(DATE_ADD(CURRENT_TIMESTAMP, -1), 'yyyy-MM-dd') END AS `start_date`
,'9999-12-31' AS `end_date`
,`inserted_time`
,`updated_time`
FROM `tmp_dim_sal_commodity_df` t1
) a;
4.DWD层代码规范
- 更新时间范围要以参数的形式配置
- 保证每个字段没有空值,如果有按默认值填充
- 关联大表时要加上时间范围条件而不能全表扫描
举例:
-- ================================================
-- 作者: Administrator
-- 创建日期: 2024-06-11
-- 功能描述: 同步dwd层订单表
-- 目标表: dwd_sal_trd_ord_order_df
-- 源表: ods_ord_t_order_df.,dim_pub_dictionary
-- 修改日志:
-- 修改日期: 修改人: 修改内容:
-- ================================================
set mapreduce.job.queuename=hive;
INSERT OVERWRITE TABLE `dwd`.`dwd_sal_trd_ord_order_df`
SELECT
COALESCE(t2.`order_id`,t1.`order_id`) AS `order_id`
,COALESCE(t2.`commodity_id`,t1.`commodity_id`) AS `commodity_id`
,COALESCE(t2.`user_id`,t1.`user_id`) AS `user_id`
,COALESCE(t2.`order_status_code`,t1.`order_status_code`) AS `order_status_code`
,COALESCE(t2.`order_status_name`,t1.`order_status_name`) AS `order_status_name`
,COALESCE(t2.`pay_type_code`,t1.`pay_type_code`) AS `pay_type_code`
,COALESCE(t2.`pay_type_name`,t1.`pay_type_name`) AS `pay_type_name`
,COALESCE(t2.`pay_status_code`,t1.`pay_status_code`) AS `pay_status_code`
,COALESCE(t2.`pay_status_name`,t1.`pay_status_name`) AS `pay_status_name`
,COALESCE(t2.`total_amt`,t1.`total_amt`) AS `total_amt`
,COALESCE(t2.`total_qty`,t1.`total_qty`) AS `total_qty`
,COALESCE(t2.`create_id`,t1.`create_id`) AS `create_id`
,COALESCE(t2.`create_time`,t1.`create_time`) AS `create_time`
,COALESCE(t2.`update_id`,t1.`update_id`) AS `update_id`
,COALESCE(t2.`update_time`,t1.`update_time`) AS `update_time`
,COALESCE(t2.`is_deleted`, t1.`is_deleted`) AS `is_deleted`
,COALESCE(t1.`inserted_time`, t2.`inserted_time`) AS `inserted_time`
,COALESCE(t2.`updated_time`, t1.`updated_time`) AS `updated_time`
FROM
(SELECT * FROM `dwd`.`dwd_sal_trd_ord_order_df`) t1
FULL JOIN (SELECT
defaultv(t1.`id`,-99) AS `order_id`
,defaultv(t1.`commodity_id`,-99) AS `commodity_id`
,defaultv(t1.`user_id`,-99) AS `user_id`
,defaultv(t1.`order_status`,-99) AS `order_status_code`
,defaultv(t2.`dic_name`,'None') AS `order_status_name`
,defaultv(t1.`pay_type`,-99) AS `pay_type_code`
,defaultv(t2.`dic_name`,'None') AS `pay_type_name`
,defaultv(t1.`pay_status`,-99) AS `pay_status_code`
,defaultv(t1.`pay_status`,'None') AS `pay_status_name`
,defaultv(t1.`total_amount`,0) AS `total_amt`
,defaultv(t1.`total_quantity`,0) AS `total_qty`
,defaultv(t1.`create_id`,-99) AS `create_id`
,defaultv(t1.`create_time`,'1970-01-01 00:00:00') AS `create_time`
,defaultv(t1.`update_id`,-99) AS `update_id`
,defaultv(t1.`update_time`,'1970-01-01 00:00:00') AS `update_time`
,t1.`is_deleted`
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss') AS `inserted_time`
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss') AS `updated_time`
FROM `ods`.`ods_ord_t_order_df.` t1
LEFT JOIN `dim`.`dim_pub_dictionary` t2 ON AND t2.`dt` ='${PARTITITON}'
WHERE t1.`dt`='${PARTITITON}'
AND t1.`update_time` >= DATE_FORMAT(${START_TIME}, 'yyyy-MM-dd') AND t1.`update_time` < DATE_FORMAT(${END_TIME}, 'yyyy-MM-dd')
) t2
ON t1.`order_id`=t2.`order_id`;
5.DWS层代码规范
- 更新时间范围要以参数的形式配置
- 保证每个字段没有空值,如果有按默认值填充
- 可变指标和不可变指标不能放在同一个表里
举例:
-- ================================================
-- 作者: Administrator
-- 创建日期: 2024-03-02
-- 功能描述: 最近一天按退货完成日期线上退货订单汇总报表
-- 目标表: dws_sal_trd_complete_date_xj_retn_order_1d
-- 源表: dwd_sal_trd_retn_ord_xj_order_df
-- 修改日志:
-- 修改日期: 修改人: 修改内容:
-- ================================================
set mapreduce.job.queuename=hive;
INSERT OVERWRITE TABLE `px_dws`.`dws_sal_trd_complete_date_xj_retn_order_1d`
SELECT
t1.`shop_id`
,t1.`member_id`
,t1.`commodity_id`
,t1.`source_type_code`
,t1.`source_type_name`
,t1.`return_time` AS `return_order_date`
,t1.`create_time` AS `return_order_time`
,t1.`update_time` AS `return_complete_date`
,t1.`order_id`
,t1.`return_order_id`
,t1.`return_order_no`
,SUM(t1.`commodity_price` * t1.`commodity_quantity`) AS `origin_retn_sale_amt_1d_ai`
,SUM(t1.`commodity_sale_amt`) AS `trade_retn_amt_1d_ai`
,SUM(t1.`commodity_real_amt`) AS `delivery_retn_amt_1d_ai`
,SUM(t1.`commodity_real_quantity` ) AS `delivery_retn_cmd_cnt_1d_ai`
,SUM(CASE WHEN t1.`order_status_code` = 6 THEN t1.`commodity_real_amt` ELSE 0 END) AS `commodity_real_amt_1d_ai`
,SUM(CASE WHEN t1.`order_status_code` = 6 THEN t1.`commodity_real_quantity` ELSE 0 END) AS `commodity_real_quantity_1d_ai`
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss') AS `inserted_time`
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss') AS `updated_time`
,DATE_FORMAT(t1.`update_time` , 'yyyy-MM-dd') AS `dt`
FROM `px_dwd`.`dwd_sal_trd_retn_ord_xj_order_df` t1
WHERE
t1.`dt` ='9999-12-31'
AND t1.`is_deleted`=0
AND t1.`return_order_status_code`=7
AND t1.`update_time` >= DATE_FORMAT(${START_TIME}, 'yyyy-MM-dd')
AND t1.`update_time`< DATE_FORMAT(${END_TIME}, 'yyyy-MM-dd')
GROUP BY
t1.`shop_id`
,t1.`member_id`
,t1.`commodity_id`
,t1.`return_time`
,t1.`create_time`
,t1.`update_time`
,t2.`source_type_code`
,t2.`source_type_name`
,t1.`order_id`
,t1.`return_order_id`
,t1.`return_order_no`
,DATE_FORMAT(t1.`update_time` , 'yyyy-MM-dd');
6.ADS层代码规范
- 更新时间范围要以参数的形式配置
- 保证每个字段没有空值,如果有按默认值填充
- 维度表字段根据需求取快照值或当前值
举例:
-- ================================================
-- 作者: Administrator
-- 创建日期: 2024-04-16
-- 功能描述: 最近1天线上门店商品销售分析报表
-- 目标表: ads_sal_trd_pay_retn_order_shop_item_1d
-- 源表: dws_sal_trd_complete_date_xj_retn_order_1d,dim_sal_shop,dim_pub_commodity
-- 修改日志:
-- 修改日期: 修改人: 修改内容:
-- ================================================
set mapreduce.job.queuename=hive;
INSERT OVERWRITE TABLE `px_ads`.`ads_sal_trd_pay_retn_order_shop_item_1d`
SELECT
t1.`dt` AS `report_date`
,t1.`shop_id`
,t2.`shop_no`
,t2.`shop_name`
,t1.`commodity_id`
,t3.`commodity_no`
,t3.`commodity_name`
,t1.`source_type_code`
,t1.`source_type_name`
,SUM(t2.`trade_amt_1d_ai`) AS `trade_amt_1d_ai`
,SUM(t2.`gross_cmd_cnt_1d_ai`) AS `gross_cmd_cnt_1d_ai`
,SUM(t2.`pay_sale_amt_1d_ai`) AS `pay_sale_amt_1d_ai`
,SUM(t2.`pay_cmd_cnt_1d_ai`) AS `pay_cmd_cnt_1d_ai`
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss') AS `inserted_time`
,DATE_FORMAT(CURRENT_TIMESTAMP, 'yyyy-MM-dd HH:mm:ss') AS `updated_time`
,t1.`dt`
FROM `px_dws`.`dws_sal_trd_complete_date_xj_retn_order_1d` t1
LEFT JOIN `px_dim`.`dim_sal_shop` t2 ON t1.`shop_id` = t2.`shop_id` AND t1.`dt`>= DATE_FORMAT(t2.`start_time`, 'yyyy-MM-dd' ) AND t1.`dt`< DATE_FORMAT(t2.`end_time`, 'yyyy-MM-dd' )
LEFT JOIN `px_dim`.`dim_pub_commodity` t3 ON t1.`commodity_id` = t3.`commodity_id` AND t3.`is_current` =1
WHERE t1.`dt` >= DATE_FORMAT(${START_TIME}, 'yyyy-MM-dd') AND t1.`dt` < DATE_FORMAT(${END_TIME}, 'yyyy-MM-dd')
GROUP BY
t1.`shop_id`
,t2.`shop_no`
,t2.`shop_name`
,t1.`commodity_id`
,t3.`commodity_no`
,t3.`commodity_name`
,t1.`source_type_code`
,t1.`source_type_name`
,t1.`dt`
7.DAS表代码规范
- 源数据库和目标数据库以参数的形式配置
举例:
{
"job": {
"setting": {
"speed": {
"channel":4
}
},
"content": [
{
"reader": {
"name": "hdfsreader",
"parameter": {
"path": "${HIVE_DB_DIR}ads.db/$TRGT_TABLE_NAME/dt=$PARTITION",
"hadoopConfig":{
"dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"dfs.ha.namenodes.mycluster": "nn1,nn2",
"dfs.namenode.rpc-address.mycluster.nn1": "node101:8020",
"dfs.namenode.rpc-address.mycluster.nn2": "node102:8020",
"dfs.nameservices": "mycluster"
},
"defaultFS": "$HDFS_URL",
"column": [
{
"index": "0",
"type": "string"
},
{
"index": "1",
"type": "long"
},
{
"index": "2",
"type": "long"
},
{
"index": "3",
"type": "long"
},
{
"index": "4",
"type": "string"
},
{
"index": "5",
"type": "long"
},
{
"index": "6",
"type": "string"
},
{
"index": "7",
"type": "long"
},
{
"index": "8",
"type": "string"
},
{
"index": "9",
"type": "double"
},
{
"index": "10",
"type": "double"
},
{
"index": "11",
"type": "string"
},
{
"index": "12",
"type": "string"
}
],
"fileType": "orc",
"encoding": "UTF-8",
"fieldDelimiter": "\u0001"
}
},
"writer": {
"name": "clickhousewriter",
"parameter": {
"username": "$USER_NAME",
"password": "$PASSWORD",
"column": [
"report_date"
,"commodity_id"
,"user_id"
,"order_status_code"
,"order_status_name"
,"pay_type_code"
,"pay_type_name"
,"pay_status_code"
,"pay_status_name"
,"total_amt"
,"total_qty"
,"inserted_time"
,"updated_time"
],
"preSql": ["ALTER TABLE $TRGT_TABLE_NAME DROP PARTITION '$CK_PARTITION'"],
"connection": [
{
"jdbcUrl":"$JDBCURL",
"table": ["$TRGT_TABLE_NAME"]
}
]
}
}
}
],
}
}