大数据-案例-离线数仓-电商:【MySQL(业务)-ETL(Kettle)】+【前端JS埋点->日志->Flume->HDFS->ETL(SparkRDD)】->Hive数仓->MySQL->可视化

一、电商行业简介

1、电商行业分析

近年来,中国的电子商务快速发展,交易额连创新高,电子商务在各领域的应用不断拓展和深化、相关服务业蓬勃发展、支撑体系不断健全完善、创新的动力和能力 不断增强。电子商务正在与实体经济深度融合,进入规模性发展阶段,对经济社会生活的影响不断增大,正成为我国经济发展的新引擎。

中国电子商务研究中心数据显示,截止到 2012 年底,中国电子商务市场交易规模达 7.85万亿人民币,同比增长 30.83%。其中,B2B 电子商务交易额 达 6.25 万亿,同比增长 27%。而 2011 年全年,中国电子商务市场交易额达 6 万亿人民币,同比增长 33%,占 GDP 比重上升到 13%;2012 年,电子商务占 GDP 的比重已经高达 15%。

电商行业技术特点

  • 技术新

  • 技术范围广

  • 分布式

  • 高并发、集群、负载均衡(Nginx)、高可用(备机)

  • 海量数据

  • 业务复杂

  • 系统安全

2、电商业务系统简介

品优购网上商城是一个综合性的 B2B2C 平台,类似京东商城、天猫商城。网站采用商家入驻的模式,商家入驻平台提交申请,有平台进行资质审核,审核通过后,商家拥有独立的管理后台录入商品信息。商品经过平台审核后即可发布。

品优购网上商城主要分为

  • 网站前台
  • 运营商后台
  • 商家管理后台

2.1 网站前台

主要包括

  • 网站首页
  • 商家首页
  • 商品详细页
  • 搜索页
  • 会员中心
  • 订单与支付相关页面
  • 秒杀频道等。
前台
在这里插入图片描述

2.2 运营商后台

是运营商的运营人员的管理后台。 主要包括商家审核、品牌管理、规格管理、模板管理、商品分类管理、商品审核、广告类型管理、广告管理、订单查询、商家结算等。

后台
在这里插入图片描述

2.3 商家管理后台

入驻的商家进行管理的后台,主要功能是对商品的管理以及订单查询统计、资金结算等功能。

商家管理后台
在这里插入图片描述

二、大数据数仓项目简介

1、业务流程

本次数仓业务流程主要分为两类,

  • 一类是用户下单、提交订单、支付、退款这一条线,
  • 另一类是我们收集用户的页面行为数据:用户搜索商品、添加购物车 、提交订单、支付订单 的日志数据,分析电商网站常见的PV,UV,GMV,

GMV (Gross Merchandise Volume):主要是指网站的成交金额,而这里的成交金额包括:付款金额和未付款。

千亿级数仓模仿阿里巴巴双十一的大屏显示功能实现的互联网电商指标的离线分析,同时也模仿了阿里巴巴大数据平台上面数据仓库的设计思想和理念。通过这个项目,能够掌握以下三个核心技能:

1、数据仓库的概念和建设过程

2、离线数据仓库的功能、使用场景和常用的技术栈

2、大数据离线数仓项目架构

离线项目架构图
在这里插入图片描述

3、项目具体技术简介

  • Kettle
  • 缓慢变化维(拉链表):时间维度,脚本生成,时间维度生成之后不会变化,SCD问题我们使用拉链表来解决;
  • Hive
  • kettle:导出数据的工具
  • Spark SQL:计算引擎
  • Kylin:计算引擎,进行预计算之后的多维统计分析可以达到亚秒级别。

4、项目环境介绍

  • 业务数据量

    • 用户数:300W
    • 每日订单量:10W
    • 每日交易额:700W
    • 商家数:5W
    • 商品数:45W
    • PV:500W
    • UV:50W
  • 数据在hdfs中平均每天 40G左右的速度增长,存储3份,每天增长大概120G,存储hive表时

    会说过parquet格式+snappy压缩

  • 硬件资源

    • 数量:30台

    • CPU资源:24核

    • 内存:128G

    • 硬盘:4T

5、业务系统表结构介绍

在这里插入图片描述
在这里插入图片描述

订单表itcast_orders
字段名称 数据类型 字段说明
orderId bigint(11) 订单id
orderNo varchar(20) 订单编号
userId bigint(11) 用户id
orderStatus tinyint(4) 订单状态,-3:用户拒收;-2:未付款的订单;-1:用户取消;0:待发货;1:配送中;2:用户确认收货
goodsMoney decimal(11,2) 商品金额
deliverType tinyint(4) 收货方式
deliverMoney decimal(11,2) 运费
totalMoney decimal(11,2) 订单金额(包括运费)
realTotalMoney decimal(11,2) 实际订单金额(折扣后金额)
payType tinyint(4) 支付方式,0:未知;1:支付宝,2:微信;3、现金;4、其他
payFrom varchar(20) 支付来源
isPay tinyint(4) 是否支付
areaId int(11) 区域最低一级
areaIdPath varchar(255) 区域idpath
userName varchar(20) 收件人姓名
userAddressId int(11) 收件人地址ID
userAddress varchar(255) 收件人地址
userPhone char(20) 收件人电话
orderScore int(11) 订单所得积分
isInvoice tinyint(4) 是否开发票,1:需要;0:不需要
invoiceClient varchar(255) 发票抬头
orderRemarks varchar(255) 订单备注
orderSrc tinyint(4) 订单来源,0:商城;1:微信;2:手机版;3:安卓App4:苹果App;5订餐设备
needPay decimal(11,2) 需缴费用
payRand int(11) 货币单位
orderType int(11) 订单类型
isRefund tinyint(4) 是否退款
isAppraise tinyint(4) 是否点评
cancelReason int(11) 取消原因ID
rejectReason int(11) 用户拒绝原因ID
rejectOtherReason varchar(255) 拒收原因
isClosed tinyint(4) 是否订单已完结
orderunique varchar(50) 订单流水号
isFromCart tinyint(1) 是否来自购物车 0:直接下单 1:购物车
receiveTime varchar(25) 收货时间
deliveryTime varchar(25) 发货时间
tradeNo varchar(100) 在线支付交易流水
dataFlag tinyint(4) 订单有效标志
createTime varchar(25) 下单时间
settlementId int(11) 是否结算,大于0的话则是结算ID
commissionFee decimal(11,2) 订单应收佣金
scoreMoney decimal(11,2) 积分抵扣金额
useScore int(11) 花费积分
extraJson text 额外信息
noticeDeliver tinyint(3) 提醒发货,0:未提醒;1:已提醒
invoiceJson text 发票信息
lockCashMoney decimal(11,2) 锁定提现金额
payTime varchar(25) 支付时间
isBatch tinyint(4) 是否拼单
totalPayFee int(11) 总支付金额
modifiedTime timestamp 更新时间
订单明细表 itcast_order_goods
字段名 类型 说明
ogId bigint(11) 订单明细(商品)id
orderId bigint(11) 订单id
goodsId bigint(11) 商品id
goodsNum bigint(11) 商品数量
goodsPrice decimal(11,2) 商品价格
goodsSpecId int(11) 商品规格id
goodsSpecNames varchar(500) 商品规格列表
goodsName varchar(200) 商品名称
goodsImg varchar(150) 商品图片
extraJson text 额外信息
goodsType tinyint(4) 商品类型
commissionRate decimal(11,2) 商品佣金比率
goodsCode varchar(20) 商品编码
promotionJson text 促销信息
createTime varchar(20) 创建时间
商品信息表 itcast_goods
goodsId bigint(11) 商品id
goodsSn varchar(20) 商品编号
productNo varchar(20) 商品货号
goodsName varchar(200) 商品名称
goodsImg varchar(150) 商品图片
shopId bigint(11) 门店ID
goodsType tinyint(4) 货物类型
marketPrice decimal(11,2) 市场价
shopPrice decimal(11,2) 门店价
warnStock bigint(11) 预警库存
goodsStock bigint(11) 商品总库存
goodsUnit char(10) 单位
goodsTips text 促销信息
isSale tinyint(4) 是否上架 0:不上架 1:上架
isBest tinyint(4) 是否精品 0:否 1:是
isHot tinyint(4) 是否热销产品 0:否 1:是
isNew tinyint(4) 是否新品 0:否 1:是
isRecom tinyint(4) 是否推荐 0:否 1:是
goodsCatIdPath varchar(255) 商品分类ID路径catId1_catId2_catId3
goodsCatId int(11) 最后一级商品分类ID
shopCatId1 int(11) 门店商品分类第一级ID
shopCatId2 int(11) 门店商品第二级分类ID
brandId int(11) 品牌ID
goodsDesc text 商品描述
goodsStatus tinyint(4) 商品状态 -1:违规 0:未审核 1:已审核
saleNum int(11) 总销售量
saleTime varchar(25) 上架时间
visitNum int(11) 访问数
appraiseNum int(11) 评价书
isSpec tinyint(4) 是否有规格 0:没有 1:有
gallery text 商品相册
goodsSeoKeywords varchar(200) 商品SEO关键字
illegalRemarks varchar(255) 状态说明 一般用于说明拒绝原因
dataFlag tinyint(4) 删除标志 -1:删除 1:有效
createTime varchar(25) 创建时间
isFreeShipping tinyint(4)
goodsSerachKeywords text 商品搜索关键字
店铺表 itcast_shops
字段名 字段类型 字段说明
shopId int(11) 商铺ID,自增
shopSn varchar(20)
userId int(11) 商铺联系人ID,
areaIdPath varchar(255)
areaId int(11)
isSelf tinyint(4)
shopName varchar(100) 商铺名称,
shopkeeper varchar(50)
telephone varchar(20) 联系人电话,
shopCompany varchar(255) 商家实体名称,
shopImg varchar(150) logo图片,
shopTel varchar(40) 商家联系电话,
shopQQ varchar(50) 联系人QQ,
shopWangWang varchar(50)
shopAddress varchar(255) 商家地址,
bankId int(11)
bankNo varchar(20)
bankUserName varchar(50)
isInvoice tinyint(4)
invoiceRemarks varchar(255)
serviceStartTime bigint(20) 服务开始时间,
serviceEndTime bigint(20) 服务结束时间,
freight int(11)
shopAtive tinyint(4)
shopStatus tinyint(4) 商铺状态,
statusDesc varchar(255)
dataFlag tinyint(4)
createTime date
shopMoney decimal(11,2)
lockMoney decimal(11,2)
noSettledOrderNum int(11)
noSettledOrderFee decimal(11,2)
paymentMoney decimal(11,2)
bankAreaId int(11)
bankAreaIdPath varchar(100)
applyStatus tinyint(4)
applyDesc varchar(255)
applyTime datetime
applyStep tinyint(4)
shopNotice varchar(300) 店铺公告,
rechargeMoney decimal(11,2) 充值金额,
longitude decimal(10,7)
latitude decimal(10,7)
mapLevel int(11)
BDcode varchar(16) 公司管理人员code,
商品分类表 itcast_goods_cats
字段名 字段说明
catId 品类ID
parentId 父ID
catName 分类名称
isShow 是否显示
isFloor 是否显示楼层
catSort 排序号
dataFlag 删除标志
createTime 建立时间
commissionRate 商品佣金比例
catImg
subTitle 楼层副标题
simpleName 简写名称
seoTitle 分类SEO标题
seoKeywords 分类SEO关键字
seoDes 分类SEO描述
catListTheme 商品分类列表风格
detailTheme 商品详情风格
mobileCatListTheme 移动端商品分类列表风格
mobileDetailTheme 移动端商品详情风格
wechatCatListTheme 微信端商品分类列表风格
wechatDetailTheme 微信端商品详情风格
cat_level 分类级别,共3级
组织结构表 itcast_org
字段名 字段说明
orgId 组织ID
parentId 父ID
orgName 组织名称
orgLevel 组织级别1;总部及大区级部门;2:总部下属的各个部门及基部门;3:具体工作部门
managerCode 主管工号
isdelete 删除标志,1:删除;0:有效
createTime 创建时间
updateTime 最后修改时间
isShow 是否显示,0:是;1:否
orgType 组织类型,0:总裁办;1:研发;2:销售;3:运营;4:产品
订单退货表 itcast_order_refunds
id int(11) 自增ID
orderId int(11) 订单id
goodsId int(11) 商品id
refundTo int(11) 接收退款用户
refundReson int(11) 用户申请退款原因ID
refundOtherReson varchar(255) 用户申请退款原因
backMoney decimal(11,2) 退款金额
refundTradeNo varchar(100) 退款流水号
refundRemark varchar(500) 退款备注
refundTime varchar(25) 退款时间
shopRejectReason varchar(255) 店铺不同意退款原因
refundStatus tinyint(4) 退款状态
createTime varchar(25) 用户申请退款时间
用户表 itcast_users
userId int(11) 用户id
loginName varchar(20) 登录名
loginSecret int(11) 登录凭证
loginPwd varchar(50) 登录密码
userType tinyint(4) 用户类型
userSex tinyint(4) 用户性别
userName varchar(100) 用户名
trueName varchar(100) 真实姓名
brithday date 生日
userPhoto varchar(200) 用户头像
userQQ varchar(20) 用户QQ
userPhone char(11) 用户手机号
userEmail varchar(50) 邮箱
userScore int(11) 积分
userTotalScore int(11) 总积分
lastIP varchar(16) 最后一次登录IP
lastTime datetime 最后一次登录时间
userFrom tinyint(4) 注册渠道
userMoney decimal(11,2) 用户余额
lockMoney decimal(11,2) 锁定余额
userStatus tinyint(4) 用户状态
dataFlag tinyint(4) 数据状态
createTime datetime 创建时间
payPwd varchar(100) 支付密码
rechargeMoney decimal(11,2) 重置金额
isInform tinyint(4) 是否接收通知
用户收货地址表 itcast_user_address
addressId int(11) 地址id
userId int(11) 用户id
userName varchar(50) 用户名
otherName varchar(50) 地址类型
userPhone varchar(20) 用户联系方式
areaIdPath varchar(255) 地址id路径
areaId int(11) 区域ID
userAddress varchar(255) 用户地址
isDefault tinyint(4) 是否默认地址
dataFlag tinyint(4) 数据状态
createTime datetime 创建时间
支付方式表 itcast_payments
id int(11) 唯一id
payCode varchar(20) 支付类型码
payName varchar(255) 支付类型名称
payDesc text 描述
payOrder int(11) 显示顺序
payConfig text 配置
enabled tinyint(4) 是否启用
isOnline tinyint(4) 是否在线
payFor varchar(100)

6、项目环境初始化

三、业务数据采集与处理

1、导入mysql模拟数据

将资料\mysql建表语句\10tables.sql文件上传到linux,登录mysql使用source命令执行该sql文件创建数据库和表

mysql -uroot -p
source /root/sql/10tables.sql;

2、Hive分层说明

  • 分库存放

    • ods层
    • dw层
    • ads层
  • 命名规则

    • ods层表与原始数据库表名称相同

    • dw层表

      • fact_前缀表示事实表

      • dim_前缀表示维度表

创建分层数据库:

create database itcast_ods;
create database itcast_dw;
create database itcast_ads;

3、创建ods层数据表

  • hive 分为外部表与内部表,为便于管理,该部分均使用内部表
  • 执行资料\hiveods层建表语句\ods_create_table.sql
    -- 创建ods层订单表
    drop table if exists `itcast_ods`.`itcast_orders`;
    create EXTERNAL table `itcast_ods`.`itcast_orders`(
        orderId            bigint,
        orderNo            string,
        shopId             bigint,
        userId             bigint,
        orderStatus        bigint,
        goodsMoney         double,
        deliverType        bigint,
        deliverMoney       double,
        totalMoney         double,
        realTotalMoney     double,
        payType            bigint,
        isPay              bigint,
        areaId             bigint,
        userAddressId      bigint,
        areaIdPath         string,
        userName           string,
        userAddress        string,
        userPhone          string,
        orderScore         bigint,
        isInvoice          bigint,
        invoiceClient      string,
        orderRemarks       string,
        orderSrc           bigint,
        needPay            double,
        payRand            bigint,
        orderType          bigint,
        isRefund           bigint,
        isAppraise         bigint,
        cancelReason       bigint,
        rejectReason       bigint,
        rejectOtherReason  string,
        isClosed           bigint,
        goodsSearchKeys    string,
        orderunique        string,
        receiveTime        string,
        deliveryTime       string,
        tradeNo            string,
        dataFlag           bigint,
        createTime         string,
        settlementId       bigint,
        commissionFee      double,
        scoreMoney         double,
        useScore           bigint,
        orderCode          string,
        extraJson          string,
        orderCodeTargetId  bigint,
        noticeDeliver      bigint,
        invoiceJson        string,
        lockCashMoney      double,
        payTime            string,
        isBatch            bigint,
        totalPayFee        bigint,
        modifiedTime        string
    )
    partitioned by (dt string)
    STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
    
    -- 创建ods层订单明细表
    drop table if exists `itcast_ods`.`itcast_order_goods`;
    create EXTERNAL table `itcast_ods`.`itcast_order_goods`(
        ogId            bigint,
        orderId         bigint,
        goodsId         bigint,
        goodsNum        bigint,
        goodsPrice      double,
        payPrice        double,
        goodsSpecId     bigint,
        goodsSpecNames  string,
        goodsName       string,
        goodsImg        string,
        extraJson       string,
        goodsType       bigint,
        commissionRate  double,
        goodsCode       string,
        promotionJson   string,
        createtime      string
    )
    partitioned by (dt string)
    STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
    
    -- 创建ods层店铺表
    drop table if exists `itcast_ods`.`itcast_shops`;
    create EXTERNAL table `itcast_ods`.`itcast_shops`(
        shopId             bigint,
        shopSn             string,
        userId             bigint,
        areaIdPath         string,
        areaId             bigint,
        isSelf             bigint,
        shopName           string,
        shopkeeper         string,
        telephone          string,
        shopCompany        string,
        shopImg            string,
        shopTel            string,
        shopQQ             string,
        shopWangWang       string,
        shopAddress        string,
        bankId             bigint,
        bankNo             string,
        bankUserName       string,
        isInvoice          bigint,
        invoiceRemarks     string,
        serviceStartTime   bigint,
        serviceEndTime     bigint,
        freight            bigint,
        shopAtive          bigint,
        shopStatus         bigint,
        statusDesc         string,
        dataFlag           bigint,
        createTime         string,
        shopMoney          double,
        lockMoney          double,
        noSettledOrderNum  bigint,
        noSettledOrderFee  double,
        paymentMoney       double,
        bankAreaId         bigint,
        bankAreaIdPath     string,
        applyStatus        bigint,
        applyDesc          string,
        applyTime          string,
        applyStep          bigint,
        shopNotice         string,
        rechargeMoney      double,
        longitude          double,
        latitude           double,
        mapLevel           bigint,
        BDcode             string,
        modifyTime         string
    )
    partitioned by (dt string)
    STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
    
    -- 创建ods层商品表
    drop table if exists `itcast_ods`.`itcast_goods`;
    create EXTERNAL table `itcast_ods`.`itcast_goods`(
        goodsId              bigint,
        goodsSn              string,
        productNo            string,
        goodsName            string,
        goodsImg             string,
        shopId               bigint,
        goodsType            bigint,
        marketPrice          double,
        shopPrice            double,
        warnStock            bigint,
        goodsStock           bigint,
        goodsUnit            string,
        goodsTips            string,
        isSale               bigint,
        isBest               bigint,
        isHot                bigint,
        isNew                bigint,
        isRecom              bigint,
        goodsCatIdPath       string,
        goodsCatId           bigint,
        shopCatId1           bigint,
        shopCatId2           bigint,
        brandId              bigint,
        goodsDesc            string,
        goodsStatus          bigint,
        saleNum              bigint,
        saleTime             string,
        visitNum             bigint,
        appraiseNum          bigint,
        isSpec               bigint,
        gallery              string,
        goodsSeoKeywords     string,
        illegalRemarks       string,
        dataFlag             bigint,
        createTime           string,
        isFreeShipping       bigint,
        goodsSerachKeywords  string,
        modifyTime           string
    )
    partitioned by (dt string)
    STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
    
    -- 创建ods层组织机构表
    drop table `itcast_ods`.`itcast_org`;
    create EXTERNAL table `itcast_ods`.`itcast_org`(
        orgId        bigint,
        parentId     bigint,
        orgName      string,
        orgLevel     bigint,
        managerCode  string,
        isdelete     bigint,
        createTime   string,
        updateTime   string,
        isShow       bigint,
        orgType      bigint
    )
    partitioned by (dt string)
    STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
    
    -- 创建ods层商品分类表
    drop table if exists `itcast_ods`.`itcast_goods_cats`;
    create EXTERNAL table `itcast_ods`.`itcast_goods_cats`(
        catId               bigint,
        parentId            bigint,
        catName             string,
        isShow              bigint,
        isFloor             bigint,
        catSort             bigint,
        dataFlag            bigint,
        createTime          string,
        commissionRate      double,
        catImg              string,
        subTitle            string,
        simpleName          string,
        seoTitle            string,
        seoKeywords         string,
        seoDes              string,
        catListTheme        string,
        detailTheme         string,
        mobileCatListTheme  string,
        mobileDetailTheme   string,
        wechatCatListTheme  string,
        wechatDetailTheme   string,
        cat_level           bigint
    )
    partitioned by (dt string)
    STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
    
    -- 创建ods层用户表
    drop table if exists `itcast_ods`.`itcast_users`;
    create EXTERNAL table `itcast_ods`.`itcast_users`(
        userId          bigint,
        loginName       string,
        loginSecret     bigint,
        loginPwd        string,
        userType        bigint,
        userSex         bigint,
        userName        string,
        trueName        string,
        brithday        string,
        userPhoto       string,
        userQQ          string,
        userPhone       string,
        userEmail       string,
        userScore       bigint,
        userTotalScore  bigint,
        lastIP          string,
        lastTime        string,
        userFrom        bigint,
        userMoney       double,
        lockMoney       double,
        userStatus      bigint,
        dataFlag        bigint,
        createTime      string,
        payPwd          string,
        rechargeMoney   double,
        isInform        bigint
    )
    partitioned by (dt string)
    STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
    
    -- 创建ods层退货表
    drop table if exists `itcast_ods`.`itcast_order_refunds`;
    create EXTERNAL table `itcast_ods`.`itcast_order_refunds`(
        id                bigint,
        orderId           bigint,
        goodsId           bigint,
        refundTo          bigint,
        refundReson       bigint,
        refundOtherReson  string,
        backMoney         double,
        refundTradeNo     string,
        refundRemark      string,
        refundTime        string,
        shopRejectReason  string,
        refundStatus      bigint,
        createTime        string,
        modifiedTime        string
    )
    partitioned by (dt string)
    STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
    
    -- 创建ods层地址表
    drop table if exists `itcast_ods`.`itcast_user_address`;
    create EXTERNAL table `itcast_ods`.`itcast_user_address`(
        addressId    bigint,
        userId       bigint,
        userName     string,
        otherName    string,
        userPhone    string,
        areaIdPath   string,
        areaId       bigint,
        userAddress  string,
        isDefault    bigint,
        dataFlag     bigint,
        createTime   string
    )
    partitioned by (dt string)
    STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
    
    -- 创建ods层支付方式表
    drop table if exists `itcast_ods`.`itcast_payments`;
    create EXTERNAL table `itcast_ods`.`itcast_payments`(
        id         bigint,
        payCode    string,
        payName    string,
        payDesc    string,
        payOrder   bigint,
        payConfig  string,
        enabled    bigint,
        isOnline   bigint,
        payFor     string
    )
    partitioned by (dt string)
    STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
    

4、数据采集

4.1 ods层全量数据抽取

步骤一:拖拽组件构建Kettle作业结构图
全量采集配置图
在这里插入图片描述
步骤二:转换结构图–》配置命名参数
配置转换命名参数
在这里插入图片描述
步骤三:配置Hive SQL脚本
msck repair table itcast_ods.itcast_orders;
msck repair table itcast_ods.itcast_goods;
msck repair table itcast_ods.itcast_order_goods;
msck repair table itcast_ods.itcast_shops;
msck repair table itcast_ods.itcast_goods_cats;
msck repair table itcast_ods.itcast_org;
msck repair table itcast_ods.itcast_order_refunds;
msck repair table itcast_ods.itcast_users;
msck repair table itcast_ods.itcast_user_address;
msck repair table itcast_ods.itcast_payments;
修复分区
在这里插入图片描述
步骤四:配置表输入
SELECT
*
FROM itcast_orders
WHERE DATE_FORMAT(createtime, '%Y%m%d') <= '${dt}';
组件图
在这里插入图片描述
步骤五:配置字段选择指定日期格式,配置parquet格式并设置snappy压缩输出
字段选择
在这里插入图片描述
文件位置
在这里插入图片描述
在这里插入图片描述
parquet output配置
在这里插入图片描述
测试

测试数据是否都正确被加载。

select * from itcast_ods.itcast_orders limit 2;
select * from itcast_ods.itcast_goods limit 2;
select * from itcast_ods.itcast_order_goods limit 2;
select * from itcast_ods.itcast_shops limit 2;
select * from itcast_ods.itcast_goods_cats limit 2;
select * from itcast_ods.itcast_org limit 2;
select * from itcast_ods.itcast_order_refunds limit 2;
select * from itcast_ods.itcast_users limit 2;
select * from itcast_ods.itcast_user_address limit 2;
select * from itcast_ods.itcast_payments limit 2;

注意:

  • 其中itcast_orders,itcast_order_goods,itcast_order_refunds表是根据时间抽取,其余表进行全量抽取!!
  • 注意使用字段选择组件时要注意修改日期格式为UTF8!!,parquet中fields中date类型改为UTF8类型!!

4.2 商品维度数据加载(使用拉链表解决SCD问题)

4.2.1 dw层建表
-- dw层建表
DROP TABLE IF EXISTS `itcast_dw`.`dim_goods`;
CREATE TABLE `itcast_dw`.`dim_goods`(
    goodsId bigint,
    goodsSn string,
    productNo string,
    goodsName string,
    goodsImg string,
    shopId bigint,
    goodsType bigint,
    marketPrice double,
    shopPrice double,
    warnStock bigint,
    goodsStock bigint,
    goodsUnit string,
    goodsTips string,
    isSale bigint,
    isBest bigint,
    isHot bigint,
    isNew bigint,
    isRecom bigint,
    goodsCatIdPath string,
    goodsCatId bigint,
    shopCatId1 bigint,
    shopCatId2 bigint,
    brandId bigint,
    goodsDesc string,
    goodsStatus bigint,
    saleNum bigint,
    saleTime string,
    visitNum bigint,
    appraiseNum bigint,
    isSpec bigint,
    gallery string,
    goodsSeoKeywords string,
    illegalRemarks string,
    dataFlag bigint,
    createTime string,
    isFreeShipping bigint,
    goodsSerachKeywords string,
    modifyTime string,
    dw_start_date string,
    dw_end_date string
)
STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
4.2.2 “拉链表”方案导入数据

1 第一次全量导入

  • 所有的ODS数据全部导入到拉链历史记录表中

2 增量导入(某天,举例:2019-09-10)

  • 增量导入某天的数据到ODS分区
  • 合并历史数据
  • 通过连接查询方式更新
4.2.2.1 全量导入
  • 将所有 2019年09月09日以前创建的商品以及修改的数据全部导入到拉链历史记录表中

操作步骤:

  1. 使用Kettle将20190909以前的数据抽取到ods

    SELECT *
    FROM itcast_ods.itcast_goods
    WHERE  DATE_FORMAT(modifyTime, '%Y%m%d') <= '20190909';
    
  2. 使用spark sql将全量数据导入到dw层维度表

    set spark.sql.shuffle.partitions=1; --shuffle时的分区数,默认是200个
    -- 使用spark sql将全量数据导入到dw层维度表
    insert overwrite table `itcast_dw`.`dim_goods`
    select
        goodsId,
        goodsSn,
        productNo,
        goodsName,
        goodsImg,
        shopId,
        goodsType,
        marketPrice,
        shopPrice,
        warnStock,
        goodsStock,
        goodsUnit,
        goodsTips,
        isSale,
        isBest,
        isHot,
        isNew,
        isRecom,
        goodsCatIdPath,
        goodsCatId,
        shopCatId1,
        shopCatId2,
        brandId,
        goodsDesc,
        goodsStatus,
        saleNum,
        saleTime,
        visitNum,
        appraiseNum,
        isSpec,
        gallery,
        goodsSeoKeywords,
        illegalRemarks,
        dataFlag,
        createTime,
        isFreeShipping,
        goodsSerachKeywords,
        modifyTime,
         case when modifyTime is not null
            then from_unixtime(unix_timestamp(modifyTime, 'yyyy-MM-dd HH:mm:ss'),'yyyy-MM-dd')
            else from_unixtime(unix_timestamp(createTime, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd') 
            end as dw_start_date,
        '9999-12-31' as dw_end_date
    from
        `itcast_ods`.`itcast_goods` t
    where dt='20190909';
    
4.2.2.2 增量导入
  • 将2019年09月10日创建的 修改的数据全部导入到历史拉链表中

操作步骤:

  1. 使用Kettle将20190910创建的 或者修改的数据抽取到ods

    SELECT *
    FROM itcast_goods
    WHERE  DATE_FORMAT(modifyTime, '%Y%m%d') = '${dt}';
    
  2. 编写spark-sql更新历史数据

    -- 更新历史数据
    select
        dw.goodsId,
        dw.goodsSn,
        dw.productNo,
        dw.goodsName,
        dw.goodsImg,
        dw.shopId,
        dw.goodsType,
        dw.marketPrice,
        dw.shopPrice,
        dw.warnStock,
        dw.goodsStock,
        dw.goodsUnit,
        dw.goodsTips,
        dw.isSale,
        dw.isBest,
        dw.isHot,
        dw.isNew,
        dw.isRecom,
        dw.goodsCatIdPath,
        dw.goodsCatId,
        dw.shopCatId1,
        dw.shopCatId2,
        dw.brandId,
        dw.goodsDesc,
        dw.goodsStatus,
        dw.saleNum,
        dw.saleTime,
        dw.visitNum,
        dw.appraiseNum,
        dw.isSpec,
        dw.gallery,
        dw.goodsSeoKeywords,
        dw.illegalRemarks,
        dw.dataFlag,
        dw.createTime,
        dw.isFreeShipping,
        dw.goodsSerachKeywords,
        dw.modifyTime,
        dw.dw_start_date,
        case when dw.dw_end_date = '9999-12-31' and ods.goodsId is not null
            then '2019-09-09'
            else dw.dw_end_date
            end as dw_end_date
    from
        `itcast_dw`.`dim_goods` dw
        left join 
        (select * from `itcast_ods`.`itcast_goods` where dt='20190910') ods
        on dw.goodsId = ods.goodsId ;
    
  3. 编写spark-sql获取当日数据

    -- 今日数据
    select
        goodsId,
        goodsSn,
        productNo,
        goodsName,
        goodsImg,
        shopId,
        goodsType,
        marketPrice,
        shopPrice,
        warnStock,
        goodsStock,
        goodsUnit,
        goodsTips,
        isSale,
        isBest,
        isHot,
        isNew,
        isRecom,
        goodsCatIdPath,
        goodsCatId,
        shopCatId1,
        shopCatId2,
        brandId,
        goodsDesc,
        goodsStatus,
        saleNum,
        saleTime,
        visitNum,
        appraiseNum,
        isSpec,
        gallery,
        goodsSeoKeywords,
        illegalRemarks,
        dataFlag,
        createTime,
        isFreeShipping,
        goodsSerachKeywords,
        modifyTime,
        case when modifyTime is not null
            then from_unixtime(unix_timestamp(modifyTime, 'yyyy-MM-dd HH:mm:ss'),'yyyy-MM-dd')
            else from_unixtime(unix_timestamp(createTime, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd') 
            end as dw_start_date,
        '9999-12-31' as dw_end_date
    from
        `itcast_ods`.`itcast_goods`
    where dt = '20190910';
    
  4. 将历史数据 当日数据合并加载到临时表

    -- 将历史数据 当日数据合并加载到临时表
    drop table if exists `itcast_dw`.`tmp_dim_goods_history`;
    create table `itcast_dw`.`tmp_dim_goods_history`
    as
    select
        dw.goodsId,
        dw.goodsSn,
        dw.productNo,
        dw.goodsName,
        dw.goodsImg,
        dw.shopId,
        dw.goodsType,
        dw.marketPrice,
        dw.shopPrice,
        dw.warnStock,
        dw.goodsStock,
        dw.goodsUnit,
        dw.goodsTips,
        dw.isSale,
        dw.isBest,
        dw.isHot,
        dw.isNew,
        dw.isRecom,
        dw.goodsCatIdPath,
        dw.goodsCatId,
        dw.shopCatId1,
        dw.shopCatId2,
        dw.brandId,
        dw.goodsDesc,
        dw.goodsStatus,
        dw.saleNum,
        dw.saleTime,
        dw.visitNum,
        dw.appraiseNum,
        dw.isSpec,
        dw.gallery,
        dw.goodsSeoKeywords,
        dw.illegalRemarks,
        dw.dataFlag,
        dw.createTime,
        dw.isFreeShipping,
        dw.goodsSerachKeywords,
        dw.modifyTime,
        dw.dw_start_date,
        case when dw.dw_end_date >= '9999-12-31' and ods.goodsId is not null
            then '2019-09-09'
            else dw.dw_end_date
            end as dw_end_date
    from
        `itcast_dw`.`dim_goods` dw
        left join 
        (select * from `itcast_ods`.`itcast_goods` where dt='20190910') ods
        on dw.goodsId = ods.goodsId
    union all
    select
        goodsId,
        goodsSn,
        productNo,
        goodsName,
        goodsImg,
        shopId,
        goodsType,
        marketPrice,
        shopPrice,
        warnStock,
        goodsStock,
        goodsUnit,
        goodsTips,
        isSale,
        isBest,
        isHot,
        isNew,
        isRecom,
        goodsCatIdPath,
        goodsCatId,
        shopCatId1,
        shopCatId2,
        brandId,
        goodsDesc,
        goodsStatus,
        saleNum,
        saleTime,
        visitNum,
        appraiseNum,
        isSpec,
        gallery,
        goodsSeoKeywords,
        illegalRemarks,
        dataFlag,
        createTime,
        isFreeShipping,
        goodsSerachKeywords,
        modifyTime,
        case when modifyTime is not null
            then from_unixtime(unix_timestamp(modifyTime, 'yyyy-MM-dd HH:mm:ss'),'yyyy-MM-dd')
            else from_unixtime(unix_timestamp(createTime, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd') 
            end as dw_start_date,
        '9999-12-31' as dw_end_date
    from
        `itcast_ods`.`itcast_goods`
    where dt = '20190910';
    
  5. 将历史数据 当日数据导入到历史拉链表

    -- 将历史数据 当日数据导入到历史拉链表
    insert overwrite table `itcast_dw`.`dim_goods`
    select * from `itcast_dw`.`tmp_dim_goods_history`;
    
    -- 获取2019-09-09日的商品数据
    select * from `itcast_dw`.`dim_goods` where dw_start_date <= '2019-09-09' and dw_end_date >= '2019-09-09' limit 10;
    --查看对应商品id的历史拉链数据
    select * from `itcast_dw`.`dim_goods` where goodsId = 100134;
    

4.3 周期性事实表(使用拉链表解决SCD问题)

因为订单表和订单退款表都有状态变化的特点,所以他们作为周期性事实表在进行同步操作也就是采集数据到数仓中时需要我们能记录下订单的状态变化。因次依然使用拉链表来解决这类周期性事实表的同步需求。订单明细表并不会随着时间而变化,所以不需要使用拉链表进行同步。

订单表、订单退款表 拉链表具体实现步骤:

4.3.1 创建dw层订单拉链表、订单退款拉链表
-- 创建dw层订单事实表--带有分区字段
DROP TABLE IF EXISTS itcast_dw.fact_orders;
create  table itcast_dw.fact_orders(
    orderId            bigint,
    orderNo            string,
    shopId             bigint,
    userId             bigint,
    orderStatus        bigint,
    goodsMoney         double,
    deliverType        bigint,
    deliverMoney       double,
    totalMoney         double,
    realTotalMoney     double,
    payType            bigint,
    isPay              bigint,
    areaId             bigint,
    userAddressId      bigint,
    areaIdPath         string,
    userName           string,
    userAddress        string,
    userPhone          string,
    orderScore         bigint,
    isInvoice          bigint,
    invoiceClient      string,
    orderRemarks       string,
    orderSrc           bigint,
    needPay            double,
    payRand            bigint,
    orderType          bigint,
    isRefund           bigint,
    isAppraise         bigint,
    cancelReason       bigint,
    rejectReason       bigint,
    rejectOtherReason  string,
    isClosed           bigint,
    goodsSearchKeys    string,
    orderunique        string,
    receiveTime        string,
    deliveryTime       string,
    tradeNo            string,
    dataFlag           bigint,
    createTime         string,
    settlementId       bigint,
    commissionFee      double,
    scoreMoney         double,
    useScore           bigint,
    orderCode          string,
    extraJson          string,
    orderCodeTargetId  bigint,
    noticeDeliver      bigint,
    invoiceJson        string,
    lockCashMoney      double,
    payTime            string,
    isBatch            bigint,
    totalPayFee        bigint,
    modifiedTime        string,
    dw_start_date       string,
    dw_end_date         string
)
partitioned by (dt string) --按照天分区
STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
--临时订单表
DROP TABLE IF EXISTS itcast_dw.tmp_fact_orders;
create  table itcast_dw.tmp_fact_orders(
    orderId            bigint,
    orderNo            string,
    shopId             bigint,
    userId             bigint,
    orderStatus        bigint,
    goodsMoney         double,
    deliverType        bigint,
    deliverMoney       double,
    totalMoney         double,
    realTotalMoney     double,
    payType            bigint,
    isPay              bigint,
    areaId             bigint,
    userAddressId      bigint,
    areaIdPath         string,
    userName           string,
    userAddress        string,
    userPhone          string,
    orderScore         bigint,
    isInvoice          bigint,
    invoiceClient      string,
    orderRemarks       string,
    orderSrc           bigint,
    needPay            double,
    payRand            bigint,
    orderType          bigint,
    isRefund           bigint,
    isAppraise         bigint,
    cancelReason       bigint,
    rejectReason       bigint,
    rejectOtherReason  string,
    isClosed           bigint,
    goodsSearchKeys    string,
    orderunique        string,
    receiveTime        string,
    deliveryTime       string,
    tradeNo            string,
    dataFlag           bigint,
    createTime         string,
    settlementId       bigint,
    commissionFee      double,
    scoreMoney         double,
    useScore           bigint,
    orderCode          string,
    extraJson          string,
    orderCodeTargetId  bigint,
    noticeDeliver      bigint,
    invoiceJson        string,
    lockCashMoney      double,
    payTime            string,
    isBatch            bigint,
    totalPayFee        bigint,
    modifiedTime        string,
    dw_start_date       string,
    dw_end_date         string
)
partitioned by (dt string)
STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');

--创建订单退款表--带有分区字段
drop table if exists `itcast_dw`.`fact_order_refunds`;
create  table `itcast_dw`.`fact_order_refunds`(
    id                bigint,
    orderId           bigint,
    goodsId           bigint,
    refundTo          bigint,
    refundReson       bigint,
    refundOtherReson  string,
    backMoney         double,
    refundTradeNo     string,
    refundRemark      string,
    refundTime        string,
    shopRejectReason  string,
    refundStatus      bigint,
    createTime        string,
    modifiedTime        string,
    dw_start_date string,
  dw_end_date string
)
partitioned by (dt string) --按照天分区
STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
--临时表
drop table if exists `itcast_dw`.`tmp_fact_order_refunds`;
create  table `itcast_dw`.`tmp_fact_order_refunds`(
    id                bigint,
    orderId           bigint,
    goodsId           bigint,
    refundTo          bigint,
    refundReson       bigint,
    refundOtherReson  string,
    backMoney         double,
    refundTradeNo     string,
    refundRemark      string,
    refundTime        string,
    shopRejectReason  string,
    refundStatus      bigint,
    createTime        string,
    modifiedTime        string,
    dw_start_date string,
  dw_end_date string
)
partitioned by (dt string)
STORED AS PARQUET TBLPROPERTIES('parquet.compression'='SNAPPY');
4.3.2 第一次全量数据导入拉链表

我们开启hive的动态分区,并根据数据的createtime字段进行分区划分,同一天创建的订单放在同一分区!!

#开启动态分区,默认是false
#开启允许所有分区都是动态的,否则必须要有静态分区才能使用
set hive.exec.dynamici.partition=true;  
set hive.exec.dynamic.partition.mode=nonstrict; 

订单表数据:ods层导入dw层

insert overwrite table itcast_dw.fact_orders 
select
orderId            ,
orderNo            ,
shopId             ,
userId             ,
orderStatus        ,
goodsMoney         ,
deliverType        ,
deliverMoney       ,
totalMoney         ,
realTotalMoney     ,
payType            ,
isPay              ,
areaId             ,
userAddressId      ,
areaIdPath         ,
userName           ,
userAddress        ,
userPhone          ,
orderScore         ,
isInvoice          ,
invoiceClient      ,
orderRemarks       ,
orderSrc           ,
needPay            ,
payRand            ,
orderType          ,
isRefund           ,
isAppraise         ,
cancelReason       ,
rejectReason       ,
rejectOtherReason  ,
isClosed           ,
goodsSearchKeys    ,
orderunique        ,
receiveTime        ,
deliveryTime       ,
tradeNo            ,
dataFlag           ,
createTime         ,
settlementId       ,
commissionFee      ,
scoreMoney         ,
useScore           ,
orderCode          ,
extraJson          ,
orderCodeTargetId  ,
noticeDeliver      ,
invoiceJson        ,
lockCashMoney      ,
payTime            ,
isBatch            ,
totalPayFee        ,
modifiedTime       ,
--增加开始时间
date_format(modifiedTime,'yyyy-MM-dd') as dw_start_date,
--增加结束时间
'9999-12-31' as dw_end_date,
--指定动态分区使用的字段,动态分区的用法:就是查询字段的最后一个字段hive表进行解析然后存入指定分区
--此次数据分区按照订单的创建时间
date_format(createtime,'yyyyMMdd') 
from itcast_ods.itcast_orders where dt="20190909";

订单退款表:ods层导入dw层

insert overwrite table itcast_dw.fact_order_refunds 
select
id,
orderId,
goodsId,
refundTo,
refundReson,
refundOtherReson,
backMoney,
refundTradeNo,
refundRemark,
refundTime,
shopRejectReason,
refundStatus,
createTime,
modifiedTime,
date_format(modifiedTime,'yyyy-MM-dd') as dw_start_date,
'9999-12-31' as dw_end_date,
--此次数据分区按照订单退款的创建时间
date_format(createTime,'yyyyMMdd')
from itcast_ods.itcast_order_refunds where dt="20190909"; 
4.3.3 增量数据导入拉链表
4.3.3.1 kettle抽取增量数据导入ods层
抽取20190910这一天的数据,查询条件为modifiedTime等于20190910这天的订单数据和订单退款数据!!
在这里插入图片描述

表输入组件sql语句:

SELECT *
FROM itcast_orders

WHERE DATE_FORMAT(modifiedTime, '%Y%m%d') = '${dt}';
字段选择组件
在这里插入图片描述
parquet output组件
在这里插入图片描述
4.3.3.2 ods层数据合并到dw层拉链表中
insert overwrite table itcast_dw.tmp_fact_orders  
select
dw.orderId            ,
dw.orderNo            ,
dw.shopId             ,
dw.userId             ,
dw.orderStatus        ,
dw.goodsMoney         ,
dw.deliverType        ,
dw.deliverMoney       ,
dw.totalMoney         ,
dw.realTotalMoney     ,
dw.payType            ,
dw.isPay              ,
dw.areaId             ,
dw.userAddressId      ,
dw.areaIdPath         ,
dw.userName           ,
dw.userAddress        ,
dw.userPhone          ,
dw.orderScore         ,
dw.isInvoice          ,
dw.invoiceClient      ,
dw.orderRemarks       ,
dw.orderSrc           ,
dw.needPay            ,
dw.payRand            ,
dw.orderType          ,
dw.isRefund           ,
dw.isAppraise         ,
dw.cancelReason       ,
dw.rejectReason       ,
dw.rejectOtherReason  ,
dw.isClosed           ,
dw.goodsSearchKeys    ,
dw.orderunique        ,
dw.receiveTime        ,
dw.deliveryTime       ,
dw.tradeNo            ,
dw.dataFlag           ,
dw.createTime         ,
dw.settlementId       ,
dw.commissionFee      ,
dw.scoreMoney         ,
dw.useScore           ,
dw.orderCode          ,
dw.extraJson          ,
dw.orderCodeTargetId  ,
dw.noticeDeliver      ,
dw.invoiceJson        ,
dw.lockCashMoney      ,
dw.payTime            ,
dw.isBatch            ,
dw.totalPayFee        ,
dw.modifiedTime ,
dw.dw_start_date,
--修改end_date
case when ods.orderid is not null and dw.dw_end_date ='9999-12-31'
then '2019-09-09'
else dw.dw_end_date
end as dw_end_date,
--动态分区需要的字段
dw.dt
from 
itcast_dw.fact_orders  dw 
left join 
(select * from itcast_ods.itcast_orders where dt ='20190910') ods
on dw.orderid=ods.orderid 
union all
--今天新增数据的插入动作
select
orderId            ,
orderNo            ,
shopId             ,
userId             ,
orderStatus        ,
goodsMoney         ,
deliverType        ,
deliverMoney       ,
totalMoney         ,
realTotalMoney     ,
payType            ,
isPay              ,
areaId             ,
userAddressId      ,
areaIdPath         ,
userName           ,
userAddress        ,
userPhone          ,
orderScore         ,
isInvoice          ,
invoiceClient      ,
orderRemarks       ,
orderSrc           ,
needPay            ,
payRand            ,
orderType          ,
isRefund           ,
isAppraise         ,
cancelReason       ,
rejectReason       ,
rejectOtherReason  ,
isClosed           ,
goodsSearchKeys    ,
orderunique        ,
receiveTime        ,
deliveryTime       ,
tradeNo            ,
dataFlag           ,
createTime         ,
settlementId       ,
commissionFee      ,
scoreMoney         ,
useScore           ,
orderCode          ,
extraJson          ,
orderCodeTargetId  ,
noticeDeliver      ,
invoiceJson        ,
lockCashMoney      ,
payTime            ,
isBatch            ,
totalPayFee        ,
modifiedTime       ,
--增加开始时间
date_format(modifiedTime,'yyyy-MM-dd') as dw_start_date,
--增加结束时间
'9999-12-31' as dw_end_date,
--指定动态分区使用的字段,动态分区的用法:就是查询字段的最后一个字段hive表进行解析然后存入指定分区
--此次数据分区按照订单的创建时间
date_format(createtime,'yyyyMMdd') 
from itcast_ods.itcast_orders where dt="20190910";
--从临时表再插入itcast_dw.fact_orders
insert overwrite table itcast_dw.fact_orders
select
* from
itcast_dw.tmp_fact_orders;
--验证数据查询拉链表数据
select * from itcast_dw.fact_orders limit 5;


--订单退款表增量数据与历史数据合并覆盖插入dw层临时拉链表中
insert overwrite table itcast_dw.tmp_fact_order_refunds
select
dw.id,
dw.orderId,
dw.goodsId,
dw.refundTo,
dw.refundReson,
dw.refundOtherReson,
dw.backMoney,
dw.refundTradeNo,
dw.refundRemark,
dw.refundTime,
dw.shopRejectReason,
dw.refundStatus,
dw.createTime,
dw.modifiedTime,
dw.dw_start_date,
case when ods.id is not null and dw.dw_end_date ='9999-12-31'
then '2019-09-09'
else dw.dw_end_date
end as dw_end_date,
dw.dt
from  itcast_dw.fact_order_refunds  dw
left join (select * from itcast_ods.itcast_order_refunds where dt="20190910") ods
on dw.id =ods.id 
union all 
select
id,
orderId,
goodsId,
refundTo,
refundReson,
refundOtherReson,
backMoney,
refundTradeNo,
refundRemark,
refundTime,
shopRejectReason,
refundStatus,
createTime,
modifiedTime,
date_format(modifiedTime,'yyyy-MM-dd') as dw_start_date,
'9999-12-31' as dw_end_date,
date_format(createTime,'yyyyMMdd')
from itcast_ods.itcast_order_refunds where dt="20190910";
--合并数据插入临时表
insert overwrite table itcast_dw.fact_order_refunds
select * from itcast_dw.tmp_fact_order_refunds;
--验证数据
select * from itcast_dw.fact_order_refunds limit 5;
4.3.3.3 拉链表分区意义
--合并11号数据
insert overwrite table itcast_dw.tmp_fact_orders  
select
dw.orderId            ,
dw.orderNo            ,
dw.shopId             ,
dw.userId             ,
dw.orderStatus        ,
dw.goodsMoney         ,
dw.deliverType        ,
dw.deliverMoney       ,
dw.totalMoney         ,
dw.realTotalMoney     ,
dw.payType            ,
dw.isPay              ,
dw.areaId             ,
dw.userAddressId      ,
dw.areaIdPath         ,
dw.userName           ,
dw.userAddress        ,
dw.userPhone          ,
dw.orderScore         ,
dw.isInvoice          ,
dw.invoiceClient      ,
dw.orderRemarks       ,
dw.orderSrc           ,
dw.needPay            ,
dw.payRand            ,
dw.orderType          ,
dw.isRefund           ,
dw.isAppraise         ,
dw.cancelReason       ,
dw.rejectReason       ,
dw.rejectOtherReason  ,
dw.isClosed           ,
dw.goodsSearchKeys    ,
dw.orderunique        ,
dw.receiveTime        ,
dw.deliveryTime       ,
dw.tradeNo            ,
dw.dataFlag           ,
dw.createTime         ,
dw.settlementId       ,
dw.commissionFee      ,
dw.scoreMoney         ,
dw.useScore           ,
dw.orderCode          ,
dw.extraJson          ,
dw.orderCodeTargetId  ,
dw.noticeDeliver      ,
dw.invoiceJson        ,
dw.lockCashMoney      ,
dw.payTime            ,
dw.isBatch            ,
dw.totalPayFee        ,
dw.modifiedTime ,
  • 0
    点赞
  • 6
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值