数据仓库-离线数仓(基于物流数仓)

1、数据仓库概述

1.1、数据仓库概念

        数据仓库是一个为数据分析而设计的企业级数据管理系统。数据仓库可集中、整合多个信息源的大量数据,借助数据仓库的分析能力,企业可从数据中获得宝贵的信息进而改进决策。同时,随着时间的推移,数据仓库中积累的大量历史数据对于数据科学家和业务分析师也是十分宝贵的。

1.2、数据仓库核心架构

2、数据仓库建模概述

2.1、数据仓库建模的意义

        如果把数据看作图书馆里的书,我们希望看到它们在书架上分门别类地放置;如果把数据看作城市的建筑,我们希望城市规划布局合理;如果把数据看作电脑文件和文件夹,我们希望按照自己的习惯有很好的文件夹组织方式,而不是糟糕混乱的桌面,经常为找一个文件而不知所措。

        数据模型就是数据组织和存储方法,它强调从业务、数据存取和使用角度合理存储数据。只有将数据有序的组织和存储起来之后,数据才能得到高性能、低成本、高效率、高质量的使用。

高性能:良好的数据模型能够帮助我们快速查询所需要的数据。

低成本:良好的数据模型能减少重复计算,实现计算结果的复用,降低计算成本。

高效率:良好的数据模型能极大的改善用户使用数据的体验,提高使用数据的效率。

高质量:良好的数据模型能改善数据统计口径的混乱,减少计算错误的可能性。

2.2、数据仓库建模方法论

2.2.1、ER模型

        数据仓库之父Bill Inmon提出的建模方法是从全企业的高度,用实体关系(Entity Relationship,ER)模型来描述企业业务,并用规范化的方式表示出来,在范式理论上符合3NF。

1)实体关系模型

        实体关系模型将复杂的数据抽象为两个概念——实体和关系。实体表示一个对象,例如学生、班级,关系是指两个实体之间的关系,例如学生和班级之间的从属关系。

2)数据库规范化

        数据库规范化是使用一系列范式设计数据库(通常是关系型数据库)的过程,其目的是减少数据冗余,增强数据的一致性。

        这一系列范式就是指在设计关系型数据库时,需要遵从的不同的规范。关系型数据库的范式一共有六种,分别是第一范式(1NF)、第二范式(2NF)、第三范式(3NF)、巴斯-科德范式(BCNF)、第四范式(4NF)和第五范式(5NF)。遵循的范式级别越高,数据冗余性就越低。

3)三范式

(1)函数依赖

(2)第一范式

(3)第二范式

4)第三范式

        下图为一个采用Bill Inmon倡导的建模方法构建的模型,从图中可以看出,较为松散、零碎,物理表数量多。

这种建模方法的出发点是整合数据,其目的是将整个企业的数据进行组合和合并,并进行规范处理,减少数据冗余性,保证数据的一致性。这种模型并不适合直接用于分析统计。

2.2.2、维度模型

        数据仓库领域的令一位大师——Ralph Kimball倡导的建模方法为维度建模。维度模型将复杂的业务通过事实维度两个概念进行呈现。事实通常对应业务过程,而维度通常对应业务过程发生时所处的环境。

:业务过程可以概括为一个个不可拆分的行为事件,例如在线教育交易中的下单,付款,加购等,都是业务过程。

        下图为一个典型的维度模型,其中位于中心的SalesOrder为事实表,其中保存的是下单这个业务过程的所有记录。位于周围每张表都是维度表,包括Date(日期),Customer(顾客),Product(产品),Location(地区)等,这些维度表就组成了每个订单发生时所处的环境,即何人、何时、在何地下单了何种产品。从图中可以看出,模型相对清晰、简洁。

维度建模以数据分析作为出发点,为数据分析服务,因此它关注的重点的用户如何更快的完成需求分析以及如何实现较好的大规模复杂查询的响应性能。

3、维度建模理论之事实表

3.1、事实表概述

        事实表作为数据仓库维度建模的核心,紧紧围绕着业务过程来设计。其包含与该业务过程有关的维度引用(维度表外键)以及该业务过程的度量(通常是可累加的数字类型字段)。

3.1.1、事实表特点

        事实表通常比较“细长”,即列较少,但行较多,且行的增速快。

3.1.2、事实表分类

        事实表有三种类型:分别是事务事实表、周期快照事实表和累积快照事实表,每种事实表都具有不同的特点和适用场景,下面逐个介绍。

3.2、事务型事实表

3.2.1、概述

        事务事实表用来记录各业务过程,它保存的是各业务过程的原子操作事件,即最细粒度的操作事件。粒度是指事实表中一行数据所表达的业务细节程度。

        事务型事实表可用于分析与各业务过程相关的各项统计指标,由于其保存了最细粒度的记录,可以提供最大限度的灵活性,可以支持无法预期的各种细节层次的统计需求。

3.2.2、设计流程

设计事务事实表时一般可遵循以下四个步骤:

选择业务过程→声明粒度→确认维度→确认事实

1)选择业务过程

        在业务系统中,挑选我们感兴趣的业务过程,业务过程可以概括为一个个不可拆分的行为事件,例如在线教育交易中的下单,付款,加购等,都是业务过程。通常情况下,一个业务过程对应一张事务型事实表。

2)声明粒度

        业务过程确定后,需要为每个业务过程声明粒度。即精确定义每张事务型事实表的每行数据表示什么,应该尽可能选择最细粒度,以此来应对各种细节程度的需求。

典型的粒度声明如下:

        订单事实表中一行数据表示的是一个订单中的一门课程。

3)确定维度

        确定维度具体是指,确定与每张事务型事实表相关的维度有哪些。

        确定维度时应尽量多的选择与业务过程相关的环境信息。因为维度的丰富程度就决定了维度模型能够支持的指标丰富程度。

4)确定事实

        此处的“事实”一词,指的是每个业务过程的度量值(通常是可累加的数字类型的值,例如:次数、个数、件数、金额等)

        经过上述四个步骤,事务型事实表就基本设计完成了。第一步选择业务过程可以确定有哪些事务型事实表,第二步可以确定每张事务型事实表的每行数据是什么,第三步可以确定每张事务型事实表的维度外键,第四步可以确定每张事务型事实表的度量值字段。

3.2.3、不足

        事务型事实表可以保存所有业务过程的最细粒度的操作事件,故理论上其可以支撑与各业务过程相关的各种统计粒度的需求。但对于某些特定类型的需求,其逻辑可能会比较复杂,或者效率会比较低下。例如:

1)存量型指标

        例如购物车存量,账户余额等。此处以在线教育中的加购业务为例,加购业务包含的业务过程主要包括加购物车和减购物车,两个业务过程各自对应一张事务型事实表,一张存储所有加购物车的原子操作事件,另一张存储所有减购物车的原子操作事件。

        假定现有一个需求,要求统计截至当日的各用户各科目的购物车存量。由于加购物车和减购物车操作均会影响到购物车存量,故需要对两张事务型事实表进行聚合,且需要区分两者对购物车存量的影响(加或减),另外需要对两张表的全表数据聚合才能得到统计结果。

        可以看到,不论是从逻辑上还是效率上考虑,这都不是一个好的方案。

2)多事务关联统计

        例如,现需要统计最近30天,用户下单到支付的时间间隔的平均值。统计思路应该是找到下单事务事实表和支付事务事实表,过滤出最近30天的记录,然后按照订单id对两张事实表进行关联,之后用支付时间减去下单时间,然后再求平均值。

        逻辑上虽然并不复杂,但是其效率较低,因为下单事务事实表和支付事务事实表均为大表,大表join大表的操作应尽量避免。

        可以看到,在上述两种场景下事务型事实表的表现并不理想。下面要介绍的另外两种类型的事实表就是为了弥补事务型事实表的不足的。

3.3、周期型快照事实表

3.3.1、概述

        周期快照事实表以具有规律性的、可预见的时间间隔来记录事实,主要用于分析一些存量型(例如购物车存量,账户余额)或者状态型(空气温度,行驶速度)指标。

        对于购物车存量、账户余额这些存量型指标,业务系统中通常就会计算并保存最新结果,所以定期同步一份全量数据到数据仓库,构建周期型快照事实表,就能轻松应对此类统计需求,而无需再对事务型事实表中大量的历史记录进行聚合了。

        对于空气温度、行驶速度这些状态型指标,由于它们的值往往是连续的,我们无法捕获其变动的原子事务操作,所以无法使用事务型事实表统计此类需求。而只能定期对其进行采样,构建周期型快照事实表。

3.3.2、设计流程

1)确定粒度

        周期型快照事实表的粒度可由采样周期和维度描述,故确定采样周期和维度后即可确定粒度。

        采样周期通常选择每日。

        维度可根据统计指标决定,例如指标为统计每个用户每个科目的购物车存量,则可确定维度为用户和科目。

        确定完采样周期和维度后,即可确定该表粒度为每日-用户-科目。

2)确认事实

        事实也可根据统计指标决定,例如指标为统计每个用户每个科目的购物车存量,则事实为购物车存量。

3.3.3、事实类型

        此处的事实类型是指度量值的类型,而非事实表的类型。事实(度量值)共分为三类,分别是可加事实,半可加事实和不可加事实。

1)可加事实

        可加事实是指可以按照与事实表相关的所有维度进行累加,例如事务型事实表中的事实。

2)半可加事实

        半可加事实是指只能按照与事实表相关的一部分维度进行累加,例如周期型快照事实表中的事实。以上述各仓库中各用户购物车存量每天快照事实表为例,这张表中的购物车存量事实可以按照用户或者科目维度进行累加,但是不能按照时间维度进行累加,因为将每天的购物车存量累加起来是没有任何意义的。

3)不可加事实

        不可加事实是指完全不具备可加性,例如比率型事实。不可加事实通常需要转化为可加事实,例如比率可转化为分子和分母。

3.4、累积型快照事实表

3.4.1、概述

        累积型快照事实表是基于一个业务流程中的多个关键业务过程联合处理而构建的事实表,如交易流程中的试听、下单、支付等业务过程。

        累积型快照事实表通常具有多个日期字段,每个日期对应业务流程中的一个关键业务过程(里程碑)。

课程id

用户id

试听日期

下单日期

支付日期

订单分摊金额

支付分摊金额

1001

1234

2022-02-19

2022-02-20

2022-02-21

1000

1000

        累积型快照事实表主要用于分析业务过程(里程碑)之间的时间间隔等需求。例如前文提到的用户下单到支付的平均时间间隔,使用累积型快照事实表进行统计,就能避免两个事务事实表的关联操作,从而变得十分简单高效。

3.4.2、设计流程

        累积型快照事实表的设计流程同事务型事实表类似,也可采用以下四个步骤,下面重点描述与事务型事实表的不同之处。

        选择业务过程→声明粒度→确认维度→确认事实。

1)选择业务过程

        选择一个业务流程中需要关联分析的多个关键业务过程,多个业务过程对应一张累积型快照事实表。

2)声明粒度

        精确定义每行数据表示的是什么,尽量选择最小粒度。

3)确认维度

        选择与各业务过程相关的维度,需要注意的是,每各业务过程均需要一个日期维度。

4)确认事实

        选择各业务过程的度量值。

4、维度建模理论之维度表

4.1、维度表概述

        维度表是维度建模的基础和灵魂。前文提到,事实表紧紧围绕业务过程进行设计,而维度表则围绕业务过程所处的环境进行设计。维度表主要包含一个主键和各种维度字段,维度字段称为维度属性。

4.2、维度表设计步骤

1)确定维度(表)

        在设计事实表时,已经确定了与每个事实表相关的维度,理论上每个相关维度均需对应一张维度表。需要注意到,可能存在多个事实表与同一个维度都相关的情况,这种情况需保证维度的唯一性,即只创建一张维度表。另外,如果某些维度表的维度属性很少,例如只有一个**名称,则可不创建该维度表,而把该表的维度属性直接增加到与之相关的事实表中,这个操作称为维度退化

2)确定主维表和相关维表

        此处的主维表和相关维表均指业务系统中与某维度相关的表。例如业务系统中与课程相关的表有course_info,chapter_info,base_subject_info,base_category_info,video_info等,其中course_info就称为课程维度的主维表,其余表称为课程维度的相关维表。维度表的粒度通常与主维表相同。

3)确定维度属性

        确定维度属性即确定维度表字段。维度属性主要来自于业务系统中与该维度对应的主维表和相关维表。维度属性可直接从主维表或相关维表中选择,也可通过进一步加工得到。

        确定维度属性时,需要遵循以下要求:

(1)尽可能生成丰富的维度属性

        维度属性是后续做分析统计时的查询约束条件、分组字段的基本来源,是数据易用性的关键。维度属性的丰富程度直接影响到数据模型能够支持的指标的丰富程度。

(2)尽量不使用编码,而使用明确的文字说明,一般可以编码和文字共存。

(3)尽量沉淀出通用的维度属性

        有些维度属性的获取需要进行比较复杂的逻辑处理,例如需要通过多个字段拼接得到。为避免后续每次使用时的重复处理,可将这些维度属性沉淀到维度表中。

4.3、维度设计要点

4.3.1、规范化与反规范化

        规范化是指使用一系列范式设计数据库的过程,其目的是减少数据冗余,增强数据的一致性。通常情况下,规范化之后,一张表的字段会拆分到多张表。

        反规范化是指将多张表的数据冗余到一张表,其目的是减少join操作,提高查询性能。

        在设计维度表时,如果对其进行规范化,得到的维度模型称为雪花模型,如果对其进行反规范化,得到的模型称为星型模型。

        数据仓库系统的主要目的是用于数据分析和统计,所以是否方便用户进行统计分析决定了模型的优劣。采用雪花模型,用户在统计分析的过程中需要大量的关联操作,使用复杂度高,同时查询性能很差,而采用星型模型,则方便、易用且性能好。所以出于易用性和性能的考虑,维度表一般是很不规范化的。

4.3.2、维度变化

        维度属性通常不是静态的,而是会随时间变化的,数据仓库的一个重要特点就是反映历史的变化,所以如何保存维度的历史状态是维度设计的重要工作之一。保存维度数据的历史状态,通常有以下两种做法,分别是全量快照表和拉链表。

1)全量快照表

        离线数据仓库的计算周期通常为每天一次,所以可以每天保存一份全量的维度数据。这种方式的优点和缺点都很明显。

        优点是简单而有效,开发和维护成本低,且方便理解和使用。

        缺点是浪费存储空间,尤其是当数据的变化比例比较低时。

2)拉链表

        拉链表的意义就在于能够更加高效的保存维度信息的历史状态。

(1)什么是拉链表

        拉链表,记录每条信息的生命周期,一旦一条记录的生命周期结束,就重新开始一条新的记录,并把当前日期放入生效开始日期。
        如果当前信息至今有效,在生效结束日期中填入一个极大值(如9999-12-31 )

 (2)为什么要做拉链表

        拉链表适合于:数据会发生变化,但是变化频率并不高的维度(即: 缓慢变化维)。

        比如:用户信息会发生变化,但是每天变化的比例不高。如果数据量有一定规模,按照每日全量的方式保存效率很低。 比如: 1亿用户365天,每天一份用户信息。(做每日全量效率低)

(3)如何使用拉链表

4.3.3、多值维度

        如果事实表中一条记录在某个维度表中有多条记录与之对应,称为多值维度。例如,下单事实表中的一条记录为一个订单,一个订单可能包含多个课程,所以课程维度表中就可能有多条数据与之对应。

        针对这种情况,通常采用以下两种方案解决。

        第一种:降低事实表的粒度,例如将订单事实表的粒度由一个订单降低为一个订单中的一门课程。

        第二种:在事实表中采用多字段保存多个维度值,每个字段保存一个维度id。这种方案只适用于多值维度个数固定的情况。

        建议尽量采用第一种方案解决多值维度问题。

4.3.4、多值属性

        维表中的某个属性同时有多个值,称之为“多值属性”,例如课程维度的课程类别,每个课程均有多个属性值。

        针对这种情况,通常有可以采用以下两种方案。

        第一种:将多值属性放到一个字段,该字段内容为key1:value1,key2:value2或者 value1,value2的形式。如课程类别属性,一个课程可以属于编程技术、后端开发等多个类别。

        第二种:将多值属性放到多个字段,每个字段对应一个属性。这种方案只适用于多值属性个数固定的情况。

5、数据仓库设计

5.1、数据仓库分层规划

        优秀可靠的数仓体系,需要良好的数据分层结构。合理的分层,能够使数据体系更加清晰,使复杂问题得以简化。以下是该项目的分层规划。

5.2、数据仓库构建流程

以下是构建数据仓库的完整流程。

5.2.1、数据调研

        数据调研重点要做两项工作,分别是业务调研和需求分析。这两项工作做的是否充分,直接影响着数据仓库的质量。

1)业务调研

        业务调研的主要目标是熟悉业务流程、熟悉业务数据

        熟悉业务流程要求做到,明确每个业务的具体流程,需要将该业务所包含的每个业务过程一一列举出来。

        熟悉业务数据要求做到,将数据(包括埋点日志和业务数据表)与业务过程对应起来,明确每个业务过程会对哪些表的数据产生影响,以及产生什么影响。产生的影响,需要具体到,是新增一条数据,还是修改一条数据,并且需要明确新增的内容或者是修改的逻辑。

        下面以在线教育中的交易业务为例进行演示,交易业务涉及到的业务过程有用户试听、用户下单、用户支付,具体流程如下图。

2)需求分析

        典型的需求指标如,最近一天各省份 Java 学科订单总额。

        分析需求时,需要明确需求所需的业务过程维度,例如该需求所需的业务过程就是用户下单,所需的维度有日期,省份,科目。

3)总结

        做完业务分析和需求分析之后,要保证每个需求都能找到与之对应的业务过程及维度。若现有数据无法满足需求,则需要和业务方进行沟通,例如某个页面需要新增某个行为的埋点。

5.2.2、明确数据域

        数据仓库模型设计除横向的分层外,通常也需要根据业务情况进行纵向划分数据域。划分数据域的意义是便于数据的管理和应用

        通常可以根据业务过程或者部门进行划分,本项目根据业务过程进行划分,需要注意的是一个业务过程只能属于一个数据域。

        下面是本数仓项目所需的所有业务过程及数据域划分详情。

数据域

业务过程

交易域

下单、支付成功、取消运单

物流域

揽收(接单)、发单、转运完成、派送成功、签收、运输

中转域

入库、分拣、出库

5.2.3、构建业务总线矩阵

        业务总线矩阵中包含维度模型所需的所有事实(业务过程)以及维度,以及各业务过程与各维度的关系。矩阵的行是一个个业务过程,矩阵的列是一个个的维度,行列的交点表示业务过程与维度的关系。

        一个业务过程对应维度模型中一张事务型事实表,一个维度则对应维度模型中的一张维度表。所以构建业务总线矩阵的过程就是设计维度模型的过程。但是需要注意的是,总线矩阵中通常只包含事务型事实表,另外两种类型的事实表需单独设计。

        按照事务型事实表的设计流程,选择业务过程->声明粒度->确认维度->确认事实,得到的最终的业务总线矩阵见以下表格。

        后续的DWD层以及DIM层的搭建需参考业务总线矩阵。

5.2.4、明确统计指标

        明确统计指标具体的工作是,深入分析需求,构建指标体系。构建指标体系的主要意义就是指标定义标准化。所有指标的定义,都必须遵循同一套标准,这样能有效的避免指标定义存在歧义,指标定义重复等问题。

1)指标体系相关概念

(1)原子指标

        原子指标基于某一业务过程度量值,是业务定义中不可再拆解的指标,原子指标的核心功能就是对指标的聚合逻辑进行了定义。我们可以得出结论,原子指标包含三要素,分别是业务过程、度量值和聚合逻辑。

        例如订单总额就是一个典型的原子指标,其中的业务过程为用户下单、度量值为订单金额,聚合逻辑为sum()求和。需要注意的是原子指标只是用来辅助定义指标一个概念,通常不会对应有实际统计需求与之对应。

(2)派生指标

        派生指标基于原子指标,其与原子指标的关系如下图所示。

        与原子指标不同,派生指标通常会对应实际的统计需求。请从图中的例子中,体会指标定义标准化的含义。

(3)衍生指标

        衍生指标是在一个或多个派生指标的基础上,通过各种逻辑运算复合而成的。例如比率、比例等类型的指标。衍生指标也会对应实际的统计需求。

2)指标体系对于数仓建模的意义

        通过上述两个具体的案例可以看出,绝大多数的统计需求,都可以使用原子指标、派生指标以及衍生指标这套标准去定义。同时能够发现这些统计需求都直接的或间接的对应一个或者是多个派生指标。

        当统计需求足够多时,必然会出现部分统计需求对应的派生指标相同的情况。这种情况下,我们就可以考虑将这些公共的派生指标保存下来,这样做的主要目的就是减少重复计算,提高数据的复用性。

        这些公共的派生指标统一保存在数据仓库的DWS层。因此DWS层设计,就可以参考我们根据现有的统计需求整理出的派生指标。

        按照上述标准整理出的指标体系如下(部分):

从上述指标体系中抽取出来的所有派生指标见如下表格(部分)。

5.2.4、维度模型设计

        维度模型的设计参照上述得到的业务总线矩阵即可。事实表存储在DWD层,维度表存储在DIM层。

5.2.5、汇总模型设计

        汇总模型的设计参考上述整理出的指标体系(主要是派生指标)即可。汇总表与派生指标的对应关系是,一张汇总表通常包含业务过程相同、统计周期相同、统计粒度相同的多个派生指标。请思考:汇总表与事实表的对应关系是?

6、数据仓库环境准备

6.1、数据仓库运行环境

数据仓库运行环境-HiveOnSpark配置-CSDN博客

6.2、数据仓库开发环境

        数仓开发工具可选用DBeaver或者DataGrip。两者都需要用到JDBC协议连接到Hive,故需要启动HiveServer2。

1)启动HiveServer2

[kingjw@hadoop102 hive]$ hiveserver2

2)配置DataGrip连接

(1)创建连接

(2)配置连接属性

        所有属性配置,和Hive的beeline客户端配置一致即可。初次使用,配置过程会提示缺少JDBC驱动,按照提示下载即可。

如果驱动下载失败,参考

DataGrip下载驱动失败,手动配置http代理-CSDN博客

6.3、模拟数据准备

        通常企业在开始搭建数仓时,业务系统中会存在历史数据,一般业务数据库存在历史数据,而用户行为日志无历史数据。假定数仓上线的日期为2023-01-10,为模拟真实场景,需准备以下数据。

        注:在执行以下操作之前,先将HDFS上/origin_data路径下之前的测试使用的数据删除。

6.3.1、用户行为日志

本项目没有日志数据。

6.3.2、业务数据

业务数据一般存在历史数据,此处需准备2023-01-05至2023-01-10的数据

1)增量表同步

(1)将application.yml的mock.reset-all-dim和mock.clear.busi置为1,mock.date修改为2023-01-05,如下。

#业务日期
mock.date: "2023-01-05"

# 清空所有维度数据
mock.reset-all-dim: 1
# 清空所有业务事实数据
mock.clear.busi: 1

(2)生成数据

[kingjw@hadoop102 tms]$ java -jar tms-mock-2023-01-06.jar

(3)将application.yml中的mock.reset-all-dim和mock.clear.busi修改为0,mock.date修改为2023-01-06,如下。

mock.date: "2023-01-06"

(4)重新生成数据

[kingjw@hadoop102 tms]$ java -jar tms-mock-2023-01-06.jar

(5)依次修改mock.date为2023-01-07,2023-01-08,2023-01-09,2023-01-10,生成对应日期的数据,不再赘述。

(6)执行flink-cdc.sh脚本,如下。

[kingjw@hadoop102 tms]$ flink-cdc.sh initial 2023-01-10

(7)查看HDFS对应目录,如下

2)全量表同步

(1)执行全量表同步脚本

[kingjw@hadoop102 bin]$ mysql_to_hdfs_full.sh all 2023-01-10

(2)查看HDFS对应目录,如下。

6.4、Hive 常见问题及解决方式

6.4.1、DataGrip 中注释乱码问题

        注释属于元数据的一部分,同样存储在mysql的metastore库中,如果metastore库的字符集不支持中文,就会导致中文显示乱码。

        不建议修改Hive元数据库的编码,此处我们在metastore中找存储注释的表,找到表中存储注释的字段,只改对应表对应字段的编码。

        如下两步修改,缺一不可

(1)修改mysql元数据库

        我们用到的注释有两种:字段注释和整张表的注释。

        COLUMNS_V2 表中的 COMMENT 字段存储了 Hive 表所有字段的注释,TABLE_PARAMS 表中的 PARAM_VALUE 字段存储了所有表的注释。我们可以通过命令修改字段编码,也可以用 DataGrip 或 Navicat 等工具修改,此处仅对 Navicat 进行展示。

① 命令修改

alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE mediumtext character set utf8;

② 使用工具

以 COLUMNS_V2 表中COMMENT 字段的修改为例

i)右键点击表名,选择设计表

ii)在右侧页面中选中表的字段

iii)在页面下方下拉列表中将字符集改为 utf8

        修改字符集之后,已存在的中文注释能否正确显示?不能。为何?

        数据库中的字符都是通过编码存储的,写入时编码,读取时解码。修改字段编码并不会改变此前数据的编码方式,依然为默认的latin1,此时读取之前的中文注释会用utf8解码,编解码方式不一致,依然乱码。

(2)修改url连接

修改hive-site.xml在末尾添加

&useUnicode=true&characterEncoding=UTF-8

xml 文件中 & 符是有特殊含义的,我们必须使用转义的方式 & 对 & 进行替换

修改结果如下

<property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false&useUnicode=true&characterEncoding=UTF-8</value>
</property>

只要修改了 hive-site.xml,就必须重启 hiveserver2。

6.4.2、DataGrip 刷新连接时 hiveserver2 后台报错

(1)报错信息如下

FAILED: ParseException line 1:5 cannot recognize input near 'show' 'indexes' 'on' in ddl statement

原因:我们使用的是Hive-3.1.3,早期版本的Hive有索引功能,当前版本已移除, DataGrip刷新连接时会扫描索引,而 Hive 没有,就会报错。

(2)报错信息如下

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Current user : kingjw is not allowed to list roles. User has to belong to ADMIN role and have it as current role, for this action.

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Current user : kingjw is not allowed get principals in a role. User has to belong to ADMIN role and have it as current role, for this action. Otherwise, grantor need to have ADMIN OPTION on role being granted and have it as a current role for this action.

DataGrip 连接 hiveserver2 时会做权限认证,但本项目中我们没有对 Hive 的权限管理进行配置,因而报错。

上述两个问题都是 DataGrip 导致的,并非 Hive 环境的问题,不影响使用

6.4.3、OOM 报错

        Hive 默认堆内存只有 256M,如果 hiveserver2 后台频繁出现 OutOfMemoryError,可以调大堆内存。

        在 Hive 家目录的 conf 目录下复制一份模板文件hive-env.sh.template

[kingjw@hadoop102 conf]$ cd $HIVE_HOME/conf
[kingjw@hadoop102 conf]$ cp hive-env.sh.template hive-env.sh

        修改 hive-env.sh,将 Hive 堆内存改为 1024M,如下

export HADOOP_HEAPSIZE=1024

        可根据实际使用情况适当调整堆内存。

6.4.4、DataGrip ODS 层部分表字段显示异常

        建表字段中有如下语句的表字段无法显示

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'

        上述语句指定了 Hive 表的序列化器和反序列化器 SERDE(serialization 和 deserialization 的合并缩写),用于解析 JSON 格式的文件。上述 SERDE 是由第三方提供的,在 hive-site.xml 中添加如下配置即可解决

<property>
    <name>metastore.storage.schema.reader.impl</name>
    <value>org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader</value>
</property>

7、数仓开发之ODS层

        Flink-cdc采集到的增量同步数据是以JSON的格式保存的,所以读取JSON数据也需要对应的序列化方法。

JSON格式的序列化和反序列化

LanguageManual DDL - Apache Hive - Apache Software Foundation

复杂数据类型

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-OperatorsonComplexTypes

ODS层的设计要点如下:

(1)ODS层的表结构设计依托于从业务系统同步过来的数据结构。

(2)ODS层要保存全部历史数据,故其压缩格式应选择压缩比较高的,此处选择gzip。

(3)ODS层表名的命名规范为:ods_表名_单分区增量全量标识(inc/full)。

7.1、运单表(增量表)

drop table if exists ods_order_info_inc;
create external table ods_order_info_inc(
  `op` string comment '操作类型',
  `after` struct<`id`:bigint,`order_no`:string,`status`:string,`collect_type`:string,`user_id`:bigint,`receiver_complex_id`:bigint,`receiver_province_id`:string,`receiver_city_id`:string,`receiver_district_id`:string,`receiver_address`:string,`receiver_name`:string,`sender_complex_id`:bigint,`sender_province_id`:string,`sender_city_id`:string,`sender_district_id`:string,`sender_name`:string,`payment_type`:string,`cargo_num`:bigint,`amount`:decimal(16,2),`estimate_arrive_time`:string,`distance`:decimal(16,2),`create_time`:string,`update_time`:string,`is_deleted`:string> comment '修改或插入后的数据',
  `before` struct<`id`:bigint,`order_no`:string,`status`:string,`collect_type`:string,`user_id`:bigint,`receiver_complex_id`:bigint,`receiver_province_id`:string,`receiver_city_id`:string,`receiver_district_id`:string,`receiver_address`:string,`receiver_name`:string,`sender_complex_id`:bigint,`sender_province_id`:string,`sender_city_id`:string,`sender_district_id`:string,`sender_name`:string,`payment_type`:string,`cargo_num`:bigint,`amount`:decimal(16,2),`estimate_arrive_time`:string,`distance`:decimal(16,2),`create_time`:string,`update_time`:string,`is_deleted`:string> comment '修改前的数据',
  `ts` bigint comment '时间戳'
) comment '运单表'
	partitioned by (`dt` string comment '统计日期')
	ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
	location '/warehouse/tms/ods/ods_order_info_inc';

7.2、小区表(全量表)

drop table if exists ods_base_complex_full;
create external table ods_base_complex_full(
	`id` bigint comment '小区ID',
	`complex_name` string comment '小区名称',
	`province_id` bigint comment '省份ID',
	`city_id` bigint comment '城市ID',
	`district_id` bigint comment '区(县)ID',
	`district_name` string comment '区(县)名称',
	`create_time` string comment '创建时间',
	`update_time` string comment '更新时间',
	`is_deleted` string comment '是否删除'
) comment '小区表'
	partitioned by (`dt` string comment '统计日期')
	row format delimited fields terminated by '\t'
	null defined as ''
	location '/warehouse/tms/ods/ods_base_complex_full';

7.3、数据装载脚本

1)在hadoop102的/home/kingjw/bin目录下创建hdfs_to_ods.sh

[kingjw@hadoop102 bin]$ vim hdfs_to_ods.sh

2)编写如下内容

#!/bin/bash

APP='tms'

# 如果有第二个参数
if [ -n "$2" ] ;then
   do_date=$2
else 
	# 获得上一天的时间,可以通过data -help 查看帮助
   do_date=`date -d '-1 day' +%F`
fi

load_data(){
    sql=""
    for i in $*; do
        #判断路径是否存在,存在则返回0
        hadoop fs -test -e /origin_data/tms/${i:4}/$do_date
        #路径存在方可装载数据;$? 为获取上一个的输出
        if [[ $? = 0 ]]; then
            sql=$sql"load data inpath '/origin_data/tms/${i:4}/$do_date' OVERWRITE into table ${APP}.${i} partition(dt='$do_date');"
        fi
    done
    hive -e "$sql"
}

case $1 in
    ods_order_info_inc | ods_order_cargo_inc | ods_transport_task_inc | ods_order_org_bound_inc | ods_user_info_inc | ods_user_address_inc | ods_base_complex_full | ods_base_dic_full | ods_base_region_info_full | ods_base_organ_full | ods_express_courier_full | ods_express_courier_complex_full | ods_employee_info_full | ods_line_base_shift_full | ods_line_base_info_full | ods_truck_driver_full | ods_truck_info_full | ods_truck_model_full | ods_truck_team_full)
        load_data $1
    ;;
    "all")
        load_data ods_order_info_inc ods_order_cargo_inc ods_transport_task_inc ods_order_org_bound_inc ods_user_info_inc ods_user_address_inc ods_base_complex_full ods_base_dic_full ods_base_region_info_full ods_base_organ_full ods_express_courier_full ods_express_courier_complex_full ods_employee_info_full ods_line_base_shift_full ods_line_base_info_full ods_truck_driver_full ods_truck_info_full ods_truck_model_full ods_truck_team_full
    ;;
esac

3)赋予脚本执行权限

[kingjw@hadoop102 bin]$ chmod +x hdfs_to_ods.sh

4)脚本用法

[kingjw@hadoop102 bin]$ hdfs_to_ods.sh all 2023-01-10

8、数仓开发之DIM层

DIM层设计要点:

(1)DIM层的设计依据是维度建模理论,该层存储维度模型的维度表。

(2)DIM层的数据存储格式为orc列式存储+snappy压缩。

(3)DIM层表名的命名规范为dim_表名_全量表或者拉链表标识(full/zip)

8.1、小区维度表

8.1.1、主维表和相关维表

主维表

相关维表1

相关维表2

8.1.2、建表语句

drop table if exists dim_complex_full;
create external table dim_complex_full(
  `id` bigint comment '小区ID',
  `complex_name` string comment '小区名称',
  `courier_emp_ids` array<string> comment '负责快递员IDS',
  `province_id` bigint comment '省份ID',
  `province_name` string comment '省份名称',
  `city_id` bigint comment '城市ID',
  `city_name` string comment '城市名称',
  `district_id` bigint comment '区(县)ID',
  `district_name` string comment '区(县)名称'
) comment '小区维度表'
  partitioned by (`dt` string comment '统计日期')
  stored as orc
  location '/warehouse/tms/dim/dim_complex_full'
  tblproperties('orc.compress'='snappy');

8.1.3、数据装载

insert overwrite table tms.dim_complex_full
    partition (dt = '2023-01-10')
select complex_info.id   id,
       complex_name,
       courier_emp_ids,
       province_id,
       dic_for_prov.name province_name,
       city_id,
       dic_for_city.name city_name,
       district_id,
       district_name
from (select id,
             complex_name,
             province_id,
             city_id,
             district_id,
             district_name
      from ods_base_complex_full
      where dt = '2023-01-10'
        and is_deleted = '0') complex_info
         join
     (select id,
             name
      from ods_base_region_info_full
      where dt = '2023-01-10'
        and is_deleted = '0') dic_for_prov
     on complex_info.province_id = dic_for_prov.id
         join
     (select id,
             name
      from ods_base_region_info_full
      where dt = '2023-01-10'
        and is_deleted = '0') dic_for_city
     on complex_info.city_id = dic_for_city.id
         left join
     (select
       collect_set(cast(courier_emp_id as string)) courier_emp_ids,
       complex_id
from ods_express_courier_complex_full where dt='2023-01-10' and is_deleted='0'
group by complex_id
) complex_courier
     on complex_info.id = complex_courier.complex_id;

另外的写法,可读性更好

--全量快照维度表:每个分区保存一份全量的维度数据
with cx as (
    select id,
           complex_name,
           province_id,
           city_id,
           district_id,
           district_name
    from ods_base_complex_full
    where dt='2023-01-10' and is_deleted='0'
),pv as (
    select id,
           name
    from ods_base_region_info_full
    where dt='2023-01-10' and is_deleted='0'
),cy as (
    select
        id,
        name
    from ods_base_region_info_full
),ex as (
    select
        collect_set((cast(courier_emp_id as string))) courier_emp_ids,
        complex_id
    from ods_express_courier_complex_full
    where dt='2023-01-10' and is_deleted='0'
    group by complex_id
)
insert overwrite table dim_complex_full partition (dt = '2023-01-10')
select
    cx.id,
    complex_name,
    courier_emp_ids,
    province_id,
    pv.name,
    city_id,
    cy.name,
    district_id,
    district_name
from cx left join pv
on cx.province_id = pv.id
left join cy
on cx.city_id = cy.id
left join ex
on cx.id = ex.complex_id;

结果

8.2、用户维度表(拉链表)

8.2.1、建表语句

drop table if exists dim_user_zip;
create external table dim_user_zip(
  `id` bigint COMMENT '用户地址信息ID',
  `login_name` string COMMENT '用户名称',
  `nick_name` string COMMENT '用户昵称',
  `passwd` string COMMENT '用户密码',
  `real_name` string COMMENT '用户姓名',
  `phone_num` string COMMENT '手机号',
  `email` string COMMENT '邮箱',
  `user_level` string COMMENT '用户级别',
  `birthday` string COMMENT '用户生日',
  `gender` string COMMENT '性别 M男,F女',
  `start_date` string COMMENT '起始日期',
  `end_date` string COMMENT '结束日期'
) comment '用户拉链表'
    partitioned by (`dt` string comment '统计日期')
    stored as orc
    location '/warehouse/tms/dim/dim_user_zip'
    tblproperties('orc.compress'='snappy');

8.2.2、分区规划

8.2.3、数据装载

1)数据装载过程

2)数据流向

首日装载

每日装载

3)首日装载

insert overwrite table dim_user_zip
    partition (dt = '9999-12-31')
select after.id,
       after.login_name,
       after.nick_name,
       md5(after.passwd)                                                                                    passwd,
       md5(after.real_name)                                                                                 realname,
       md5(if(after.phone_num regexp '^(13[0-9]|14[01456879]|15[0-35-9]|16[2567]|17[0-8]|18[0-9]|19[0-35-9])\\d{8}$',
              after.phone_num, null))                                                                       phone_num,
       md5(if(after.email regexp '^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\\.[a-zA-Z0-9_-]+)+$', after.email, null)) email,
       after.user_level,
       date_add('1970-01-01', cast(after.birthday as int))                                                  birthday,
       after.gender,
       date_format(from_utc_timestamp(
                           cast(after.create_time as bigint), 'UTC'),
                   'yyyy-MM-dd')                                                                            start_date,
       '9999-12-31'                                                                                         end_date
from ods_user_info_inc
where dt = '2023-01-10'
  and after.is_deleted = '0';

4)每日装载

(1)装载思路

(2)装载语句

set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dim_user_zip
    partition (dt)
select id,
       login_name,
       nick_name,
       passwd,
       real_name,
       phone_num,
       email,
       user_level,
       birthday,
       gender,
       start_date,
       if(rk = 1, end_date, date_add('2023-01-11', -1)) end_date,
       if(rk = 1, end_date, date_add('2023-01-11', -1)) dt
from (select id,
             login_name,
             nick_name,
             passwd,
             real_name,
             phone_num,
             email,
             user_level,
             birthday,
             gender,
             start_date,
             end_date,
             row_number() over (partition by id order by start_date desc) rk
      from (select id,
                   login_name,
                   nick_name,
                   passwd,
                   real_name,
                   phone_num,
                   email,
                   user_level,
                   birthday,
                   gender,
                   start_date,
                   end_date
            from dim_user_zip
            where dt = '9999-12-31'
            union
            select id,
                   login_name,
                   nick_name,
                   md5(passwd)                                                                              passwd,
                   md5(real_name)                                                                           realname,
                   md5(if(phone_num regexp
                          '^(13[0-9]|14[01456879]|15[0-35-9]|16[2567]|17[0-8]|18[0-9]|19[0-35-9])\\d{8}$',
                          phone_num, null))                                                                 phone_num,
                   md5(if(email regexp '^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\\.[a-zA-Z0-9_-]+)+$', email, null)) email,
                   user_level,
                   cast(date_add('1970-01-01', cast(birthday as int)) as string)                            birthday,
                   gender,
                   '2023-01-11'                                                                             start_date,
                   '9999-12-31'                                                                             end_date
            from (select after.id,
                         after.login_name,
                         after.nick_name,
                         after.passwd,
                         after.real_name,
                         after.phone_num,
                         after.email,
                         after.user_level,
                         after.birthday,
                         after.gender,
                         row_number() over (partition by after.id order by ts desc) rn
                  from ods_user_info_inc
                  where dt = '2023-01-11'
                    and after.is_deleted = '0'
                 ) inc
            where rn = 1) full_info) final_info;

8.3、数据装载脚本

8.3.1、首日装载脚本

(1)在hadoop102的/home/kingjw/bin目录下创建ods_to_dim_init.sh

[kingjw@hadoop102 bin]$ vim ods_to_dim_init.sh 

(2)编写如下内容

#!/bin/bash

APP=tms
#1、判断参数是否传入
if [ $# -lt 2 ]
then
	echo "必须传入all/表名与数仓上线日期..."
	exit
fi

#2、根据表名匹配加载首日数据

dim_complex_full_sql="
insert overwrite table tms.dim_complex_full
    partition (dt = '$2')
select complex_info.id   id,
       complex_name,
       courier_emp_ids,
       province_id,
       dic_for_prov.name province_name,
       city_id,
       dic_for_city.name city_name,
       district_id,
       district_name
from (select id,
             complex_name,
             province_id,
             city_id,
             district_id,
             district_name
      from ods_base_complex_full
      where dt = '$2'
        and is_deleted = '0') complex_info
         join
     (select id,
             name
      from ods_base_region_info_full
      where dt = '$2'
        and is_deleted = '0') dic_for_prov
     on complex_info.province_id = dic_for_prov.id
         join
     (select id,
             name
      from ods_base_region_info_full
      where dt = '$2'
        and is_deleted = '0') dic_for_city
     on complex_info.city_id = dic_for_city.id
         left join
     (select collect_set(cast(courier_emp_id as string)) courier_emp_ids,
             complex_id
      from ods_express_courier_complex_full
      where dt = '$2'
        and is_deleted = '0'
group by complex_id
) complex_courier
     on complex_info.id = complex_courier.complex_id;
"

dim_organ_full_sql="
insert overwrite table tms.dim_organ_full
    partition (dt = '$2')
select organ_info.id,
       organ_info.org_name,
       org_level,
       region_id,
       region_info.name        region_name,
       region_info.dict_code   region_code,
       org_parent_id,
       org_for_parent.org_name org_parent_name
from (select id,
             org_name,
             org_level,
             region_id,
             org_parent_id
      from ods_base_organ_full
      where dt = '$2'
        and is_deleted = '0') organ_info
         left join (
    select id,
           name,
           dict_code
    from ods_base_region_info_full
    where dt = '$2'
      and is_deleted = '0'
) region_info
                   on organ_info.region_id = region_info.id
         left join (
    select id,
           org_name
    from ods_base_organ_full
    where dt = '$2'
      and is_deleted = '0'
) org_for_parent
                   on organ_info.org_parent_id = org_for_parent.id;
"

dim_region_full_sql="
insert overwrite table dim_region_full
    partition (dt = '$2')
select id,
       parent_id,
       name,
       dict_code,
       short_name
from ods_base_region_info_full
where dt = '$2'
  and is_deleted = '0';
"

dim_express_courier_full_sql="
insert overwrite table tms.dim_express_courier_full
    partition (dt = '$2')
select express_cor_info.id,
       emp_id,
       org_id,
       org_name,
       working_phone,
       express_type,
       dic_info.name express_type_name
from (select id,
             emp_id,
             org_id,
             md5(working_phone) working_phone,
             express_type
      from ods_express_courier_full
      where dt = '$2'
        and is_deleted = '0') express_cor_info
         join (
    select id,
           org_name
    from ods_base_organ_full
    where dt = '$2'
      and is_deleted = '0'
) organ_info
              on express_cor_info.org_id = organ_info.id
         join (
    select id,
           name
    from ods_base_dic_full
    where dt = '$2'
      and is_deleted = '0'
) dic_info
              on express_type = dic_info.id;
"

dim_shift_full_sql="
insert overwrite table tms.dim_shift_full
    partition (dt = '$2')
select shift_info.id,
       line_id,
       line_info.name line_name,
       line_no,
       line_level,
       org_id,
       transport_line_type_id,
       dic_info.name  transport_line_type_name,
       start_org_id,
       start_org_name,
       end_org_id,
       end_org_name,
       pair_line_id,
       distance,
       cost,
       estimated_time,
       start_time,
       driver1_emp_id,
       driver2_emp_id,
       truck_id,
       pair_shift_id
from (select id,
             line_id,
             start_time,
             driver1_emp_id,
             driver2_emp_id,
             truck_id,
             pair_shift_id
      from ods_line_base_shift_full
      where dt = '$2'
        and is_deleted = '0') shift_info
         join
     (select id,
             name,
             line_no,
             line_level,
             org_id,
             transport_line_type_id,
             start_org_id,
             start_org_name,
             end_org_id,
             end_org_name,
             pair_line_id,
             distance,
             cost,
             estimated_time
      from ods_line_base_info_full
      where dt = '$2'
        and is_deleted = '0') line_info
     on shift_info.line_id = line_info.id
         join (
    select id,
           name
    from ods_base_dic_full
    where dt = '$2'
      and is_deleted = '0'
) dic_info on line_info.transport_line_type_id = dic_info.id;
"

dim_truck_driver_full_sql="
insert overwrite table tms.dim_truck_driver_full
    partition (dt = '$2')
select driver_info.id,
       emp_id,
       org_id,
       organ_info.org_name,
       team_id,
       team_info.name team_name,
       license_type,
       init_license_date,
       expire_date,
       license_no,
       is_enabled
from (select id,
             emp_id,
             org_id,
             team_id,
             license_type,
             init_license_date,
             expire_date,
             license_no,
             is_enabled
      from ods_truck_driver_full
      where dt = '$2'
        and is_deleted = '0') driver_info
         join (
    select id,
           org_name
    from ods_base_organ_full
    where dt = '$2'
      and is_deleted = '0'
) organ_info
              on driver_info.org_id = organ_info.id
         join (
    select id,
           name
    from ods_truck_team_full
    where dt = '$2'
      and is_deleted = '0'
) team_info
              on driver_info.team_id = team_info.id;
"

dim_truck_full_sql="
insert overwrite table tms.dim_truck_full
partition (dt = '$2')
select truck_info.id,
       team_id,
       team_info.name     team_name,
       team_no,
       org_id,
       org_name,
       manager_emp_id,
       truck_no,
       truck_model_id,
       model_name         truck_model_name,
       model_type         truck_model_type,
       dic_for_type.name  truck_model_type_name,
       model_no           truck_model_no,
       brand              truck_brand,
       dic_for_brand.name truck_brand_name,
       truck_weight,
       load_weight,
       total_weight,
       eev,
       boxcar_len,
       boxcar_wd,
       boxcar_hg,
       max_speed,
       oil_vol,
       device_gps_id,
       engine_no,
       license_registration_date,
       license_last_check_date,
       license_expire_date,
       is_enabled
from (select id,
             team_id,

             md5(truck_no) truck_no,
             truck_model_id,

             device_gps_id,
             engine_no,
             license_registration_date,
             license_last_check_date,
             license_expire_date,
             is_enabled
      from ods_truck_info_full
      where dt = '$2'
        and is_deleted = '0') truck_info
         join
     (select id,
             name,
             team_no,
             org_id,

             manager_emp_id
      from ods_truck_team_full
      where dt = '$2'
        and is_deleted = '0') team_info
     on truck_info.team_id = team_info.id
         join
     (select id,
             model_name,
             model_type,

             model_no,
             brand,

             truck_weight,
             load_weight,
             total_weight,
             eev,
             boxcar_len,
             boxcar_wd,
             boxcar_hg,
             max_speed,
             oil_vol
      from ods_truck_model_full
      where dt = '$2'
        and is_deleted = '0') model_info
     on truck_info.truck_model_id = model_info.id
         join
     (select id,
             org_name
      from ods_base_organ_full
      where dt = '$2'
        and is_deleted = '0'
     ) organ_info
     on org_id = organ_info.id
         join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_type
     on model_info.model_type = dic_for_type.id
         join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_brand
     on model_info.brand = dic_for_brand.id;
"

dim_user_zip_sql="
insert overwrite table dim_user_zip
    partition (dt = '9999-12-31')
select after.id,
       after.login_name,
       after.nick_name,
       md5(after.passwd)                                                                                    passwd,
       md5(after.real_name)                                                                                 realname,
       md5(if(after.phone_num regexp '^(13[0-9]|14[01456879]|15[0-35-9]|16[2567]|17[0-8]|18[0-9]|19[0-35-9])\\d{8}$',
              after.phone_num, null))                                                                       phone_num,
       md5(if(after.email regexp '^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\\.[a-zA-Z0-9_-]+)+$', after.email, null)) email,
       after.user_level,
       date_add('1970-01-01', cast(after.birthday as int))                                                  birthday,
       after.gender,
       date_format(from_utc_timestamp(
                           cast(after.create_time as bigint), 'UTC'),
                   'yyyy-MM-dd')                                                                            start_date,
       '9999-12-31'                                                                                         end_date
from ods_user_info_inc
where dt = '$2'
  and after.is_deleted = '0';
"

dim_user_address_zip_sql="
insert overwrite table dim_user_address_zip
    partition (dt = '9999-12-31')
select after.id,
       after.user_id,
       md5(if(after.phone regexp
              '^(13[0-9]|14[01456879]|15[0-35-9]|16[2567]|17[0-8]|18[0-9]|19[0-35-9])\\d{8}$',
              after.phone, null))               phone,
       after.province_id,
       after.city_id,
       after.district_id,
       after.complex_id,
       after.address,
       after.is_default,
       concat(substr(after.create_time, 1, 10), ' ',
              substr(after.create_time, 12, 8)) start_date,
       '9999-12-31'                             end_date
from ods_user_address_inc
where dt = '$2'
  and after.is_deleted = '0';
"

case $1 in
"all")
	/opt/module/hive/bin/hive -e "use tms;${dim_complex_full_sql}${dim_express_courier_full_sql}${dim_organ_full_sql}${dim_region_full_sql}${dim_shift_full_sql}${dim_truck_driver_full_sql}${dim_truck_full_sql}${dim_user_address_zip_sql}${dim_user_zip_sql}"
;;
"dim_complex_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_complex_full_sql}"
;;
"dim_express_courier_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_express_courier_full_sql}"
;;
"dim_organ_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_organ_full_sql}"
;;
"dim_region_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_region_full_sql}"
;;
"dim_shift_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_shift_full_sql}"
;;
"dim_truck_driver_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_truck_driver_full_sql}"
;;
"dim_truck_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_truck_full_sql}"
;;
"dim_user_address_zip")
    /opt/module/hive/bin/hive -e "use tms;${dim_user_address_zip_sql}"
;;
"dim_user_zip")
    /opt/module/hive/bin/hive -e "use tms;${dim_user_zip_sql}"
;;
*)
	echo "表名输入错误..."
;;
esac

(3)增加脚本执行权限

[kingjw@hadoop102 bin]$ chmod +x ods_to_dim_init.sh 

(4)脚本用法

[kingjw@hadoop102 bin]$ ods_to_dim_init.sh all 2023-01-10

8.3.2、每日装载脚本

(1)在hadoop102的/home/kingjw/bin目录下创建ods_to_dim.sh

[kingjw@hadoop102 bin]$ vim ods_to_dim.sh 

(2)编写如下内容

#!/bin/bash

APP=tms
#1、判断参数是否传入
if [ $# -lt 1 ]
then
	echo "必须传入all/表名..."
	exit
fi

#2、判断日期是否传入,传入则加载指定日期数据,没传则加载前一天日期数据
[ "$2" ] && datestr=$2 || datestr=$(date -d '-1 day' +%F)

#3、根据表名匹配加载首日数据

dim_complex_full_sql="
insert overwrite table tms.dim_complex_full
    partition (dt = '${datestr}')
select complex_info.id   id,
       complex_name,
       courier_emp_ids,
       province_id,
       dic_for_prov.name province_name,
       city_id,
       dic_for_city.name city_name,
       district_id,
       district_name
from (select id,
             complex_name,
             province_id,
             city_id,
             district_id,
             district_name
      from ods_base_complex_full
      where dt = '${datestr}'
        and is_deleted = '0') complex_info
         join
     (select id,
             name
      from ods_base_region_info_full
      where dt = '${datestr}'
        and is_deleted = '0') dic_for_prov
     on complex_info.province_id = dic_for_prov.id
         join
     (select id,
             name
      from ods_base_region_info_full
      where dt = '${datestr}'
        and is_deleted = '0') dic_for_city
     on complex_info.city_id = dic_for_city.id
         left join
     (select collect_set(cast(courier_emp_id as string)) courier_emp_ids,
             complex_id
      from ods_express_courier_complex_full
      where dt = '${datestr}'
        and is_deleted = '0'
      group by complex_id
) complex_courier
     on complex_info.id = complex_courier.complex_id;
"

dim_organ_full_sql="
insert overwrite table tms.dim_organ_full
    partition (dt = '${datestr}')
select organ_info.id,
       organ_info.org_name,
       org_level,
       region_id,
       region_info.name        region_name,
       region_info.dict_code   region_code,
       org_parent_id,
       org_for_parent.org_name org_parent_name
from (select id,
             org_name,
             org_level,
             region_id,
             org_parent_id
      from ods_base_organ_full
      where dt = '${datestr}'
        and is_deleted = '0') organ_info
         left join (
    select id,
           name,
           dict_code
    from ods_base_region_info_full
    where dt = '${datestr}'
      and is_deleted = '0'
) region_info
                   on organ_info.region_id = region_info.id
         left join (
    select id,
           org_name
    from ods_base_organ_full
    where dt = '${datestr}'
      and is_deleted = '0'
) org_for_parent
                   on organ_info.org_parent_id = org_for_parent.id;
"

dim_region_full_sql="
insert overwrite table dim_region_full
    partition (dt = '${datestr}')
select id,
       parent_id,
       name,
       dict_code,
       short_name
from ods_base_region_info_full
where dt = '${datestr}'
  and is_deleted = '0';
"

dim_express_courier_full_sql="
insert overwrite table tms.dim_express_courier_full
    partition (dt = '${datestr}')
select express_cor_info.id,
       emp_id,
       org_id,
       org_name,
       working_phone,
       express_type,
       dic_info.name express_type_name
from (select id,
             emp_id,
             org_id,
             md5(working_phone) working_phone,
             express_type
      from ods_express_courier_full
      where dt = '${datestr}'
        and is_deleted = '0') express_cor_info
         join (
    select id,
           org_name
    from ods_base_organ_full
    where dt = '${datestr}'
      and is_deleted = '0'
) organ_info
              on express_cor_info.org_id = organ_info.id
         join (
    select id,
           name
    from ods_base_dic_full
    where dt = '${datestr}'
      and is_deleted = '0'
) dic_info
              on express_type = dic_info.id;
"

dim_shift_full_sql="
insert overwrite table tms.dim_shift_full
    partition (dt = '${datestr}')
select shift_info.id,
       line_id,
       line_info.name line_name,
       line_no,
       line_level,
       org_id,
       transport_line_type_id,
       dic_info.name  transport_line_type_name,
       start_org_id,
       start_org_name,
       end_org_id,
       end_org_name,
       pair_line_id,
       distance,
       cost,
       estimated_time,
       start_time,
       driver1_emp_id,
       driver2_emp_id,
       truck_id,
       pair_shift_id
from (select id,
             line_id,
             start_time,
             driver1_emp_id,
             driver2_emp_id,
             truck_id,
             pair_shift_id
      from ods_line_base_shift_full
      where dt = '${datestr}'
        and is_deleted = '0') shift_info
         join
     (select id,
             name,
             line_no,
             line_level,
             org_id,
             transport_line_type_id,
             start_org_id,
             start_org_name,
             end_org_id,
             end_org_name,
             pair_line_id,
             distance,
             cost,
             estimated_time
      from ods_line_base_info_full
      where dt = '${datestr}'
        and is_deleted = '0') line_info
     on shift_info.line_id = line_info.id
         join (
    select id,
           name
    from ods_base_dic_full
    where dt = '${datestr}'
      and is_deleted = '0'
) dic_info on line_info.transport_line_type_id = dic_info.id;
"

dim_truck_driver_full_sql="
insert overwrite table tms.dim_truck_driver_full
    partition (dt = '${datestr}')
select driver_info.id,
       emp_id,
       org_id,
       organ_info.org_name,
       team_id,
       team_info.name team_name,
       license_type,
       init_license_date,
       expire_date,
       license_no,
       is_enabled
from (select id,
             emp_id,
             org_id,
             team_id,
             license_type,
             init_license_date,
             expire_date,
             license_no,
             is_enabled
      from ods_truck_driver_full
      where dt = '${datestr}'
        and is_deleted = '0') driver_info
         join (
    select id,
           org_name
    from ods_base_organ_full
    where dt = '${datestr}'
      and is_deleted = '0'
) organ_info
              on driver_info.org_id = organ_info.id
         join (
    select id,
           name
    from ods_truck_team_full
    where dt = '${datestr}'
      and is_deleted = '0'
) team_info
              on driver_info.team_id = team_info.id;
"

dim_truck_full_sql="
insert overwrite table tms.dim_truck_full
partition (dt = '${datestr}')
select truck_info.id,
       team_id,
       team_info.name     team_name,
       team_no,
       org_id,
       org_name,
       manager_emp_id,
       truck_no,
       truck_model_id,
       model_name         truck_model_name,
       model_type         truck_model_type,
       dic_for_type.name  truck_model_type_name,
       model_no           truck_model_no,
       brand              truck_brand,
       dic_for_brand.name truck_brand_name,
       truck_weight,
       load_weight,
       total_weight,
       eev,
       boxcar_len,
       boxcar_wd,
       boxcar_hg,
       max_speed,
       oil_vol,
       device_gps_id,
       engine_no,
       license_registration_date,
       license_last_check_date,
       license_expire_date,
       is_enabled
from (select id,
             team_id,

             md5(truck_no) truck_no,
             truck_model_id,

             device_gps_id,
             engine_no,
             license_registration_date,
             license_last_check_date,
             license_expire_date,
             is_enabled
      from ods_truck_info_full
      where dt = '${datestr}'
        and is_deleted = '0') truck_info
         join
     (select id,
             name,
             team_no,
             org_id,

             manager_emp_id
      from ods_truck_team_full
      where dt = '${datestr}'
        and is_deleted = '0') team_info
     on truck_info.team_id = team_info.id
         join
     (select id,
             model_name,
             model_type,

             model_no,
             brand,

             truck_weight,
             load_weight,
             total_weight,
             eev,
             boxcar_len,
             boxcar_wd,
             boxcar_hg,
             max_speed,
             oil_vol
      from ods_truck_model_full
      where dt = '${datestr}'
        and is_deleted = '0') model_info
     on truck_info.truck_model_id = model_info.id
         join
     (select id,
             org_name
      from ods_base_organ_full
      where dt = '${datestr}'
        and is_deleted = '0'
     ) organ_info
     on org_id = organ_info.id
         join
     (select id,
             name
      from ods_base_dic_full
      where dt = '${datestr}'
        and is_deleted = '0') dic_for_type
     on model_info.model_type = dic_for_type.id
         join
     (select id,
             name
      from ods_base_dic_full
      where dt = '${datestr}'
        and is_deleted = '0') dic_for_brand
     on model_info.brand = dic_for_brand.id;
"

dim_user_zip_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dim_user_zip
    partition (dt)
select id,
       login_name,
       nick_name,
       passwd,
       real_name,
       phone_num,
       email,
       user_level,
       birthday,
       gender,
       start_date,
       if(rk = 1, end_date, date_add('${datestr}', -1)) end_date,
       if(rk = 1, end_date, date_add('${datestr}', -1)) dt
from (select id,
             login_name,
             nick_name,
             passwd,
             real_name,
             phone_num,
             email,
             user_level,
             birthday,
             gender,
             start_date,
             end_date,
             row_number() over (partition by id order by start_date desc) rk
      from (select id,
                   login_name,
                   nick_name,
                   passwd,
                   real_name,
                   phone_num,
                   email,
                   user_level,
                   birthday,
                   gender,
                   start_date,
                   end_date
            from dim_user_zip
            where dt = '9999-12-31'
            union
            select id,
                   login_name,
                   nick_name,
                   md5(passwd)                                                                              passwd,
                   md5(real_name)                                                                           realname,
                   md5(if(phone_num regexp
                          '^(13[0-9]|14[01456879]|15[0-35-9]|16[2567]|17[0-8]|18[0-9]|19[0-35-9])\\d{8}$',
                          phone_num, null))                                                                 phone_num,
                   md5(if(email regexp '^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\\.[a-zA-Z0-9_-]+)+$', email, null)) email,
                   user_level,
                   cast(date_add('1970-01-01', cast(birthday as int)) as string)                            birthday,
                   gender,
                   '${datestr}'                                                                             start_date,
                   '9999-12-31'                                                                             end_date
            from (select after.id,
                         after.login_name,
                         after.nick_name,
                         after.passwd,
                         after.real_name,
                         after.phone_num,
                         after.email,
                         after.user_level,
                         after.birthday,
                         after.gender,
                         row_number() over (partition by after.id order by ts desc) rn
                  from ods_user_info_inc
                  where dt = '${datestr}'
                    and after.is_deleted = '0'
                 ) inc
            where rn = 1) full_info) final_info;
"

dim_user_address_zip_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dim_user_address_zip
    partition (dt)
select id,
       user_id,
       phone,
       province_id,
       city_id,
       district_id,
       complex_id,
       address,
       is_default,
       start_date,
       if(rk = 1, end_date, date_add('${datestr}', -1)) end_date,
       if(rk = 1, end_date, date_add('${datestr}', -1)) dt
from (select id,
             user_id,
             phone,
             province_id,
             city_id,
             district_id,
             complex_id,
             address,
             is_default,
             start_date,
             end_date,
             row_number() over (partition by id order by start_date desc) rk
      from (select id,
                   user_id,
                   phone,
                   province_id,
                   city_id,
                   district_id,
                   complex_id,
                   address,
                   is_default,
                   start_date,
                   end_date
            from dim_user_address_zip
            where dt = '9999-12-31'
            union
            select id,
                   user_id,
                   phone,
                   province_id,
                   city_id,
                   district_id,
                   complex_id,
                   address,
                   is_default,
                   '${datestr}' start_date,
                   '9999-12-31' end_date
            from (select after.id,
                         after.user_id,
                         after.phone,
                         after.province_id,
                         after.city_id,
                         after.district_id,
                         after.complex_id,
                         after.address,
                         cast(after.is_default as tinyint)                          is_default,
                         row_number() over (partition by after.id order by ts desc) rn
                  from ods_user_address_inc
                  where dt = '${datestr}'
                    and after.is_deleted = '0') inc
            where rn = 1
           ) union_info
     ) with_rk;
"

case $1 in
"all")
	/opt/module/hive/bin/hive -e "use tms;${dim_complex_full_sql}${dim_express_courier_full_sql}${dim_organ_full_sql}${dim_region_full_sql}${dim_shift_full_sql}${dim_truck_driver_full_sql}${dim_truck_full_sql}${dim_user_address_zip_sql}${dim_user_zip_sql}"
;;
"dim_complex_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_complex_full_sql}"
;;
"dim_express_courier_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_express_courier_full_sql}"
;;
"dim_organ_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_organ_full_sql}"
;;
"dim_region_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_region_full_sql}"
;;
"dim_shift_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_shift_full_sql}"
;;
"dim_truck_driver_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_truck_driver_full_sql}"
;;
"dim_truck_full")
    /opt/module/hive/bin/hive -e "use tms;${dim_truck_full_sql}"
;;
"dim_user_address_zip")
    /opt/module/hive/bin/hive -e "use tms;${dim_user_address_zip_sql}"
;;
"dim_user_zip")
    /opt/module/hive/bin/hive -e "use tms;${dim_user_zip_sql}"
;;
*)
	echo "表名输入错误..."
;;
esac

(3)增加脚本执行权限

[kingjw@hadoop102 bin]$ chmod +x ods_to_dim.sh 

(4)脚本用法

[kingjw@hadoop102 bin]$ ods_to_dim.sh all 2023-01-11

9、数仓开发之DWD层

DWD层设计要点:

(1)DWD层的设计依据是维度建模理论,该层存储维度模型的事实表。

(2)DWD层的数据存储格式为orc列式存储+snappy压缩。

(3)DWD层表名的命名规范为dwd_数据域_表名_单分区增量全量标识(inc/full/acc)

9.1、交易域下单事务事实表

9.1.1、建表语句

drop table if exists dwd_trade_order_detail_inc;
create external table dwd_trade_order_detail_inc(
  `id` bigint comment '运单明细ID',
  `order_id` string COMMENT '运单ID',
  `cargo_type` string COMMENT '货物类型ID',
  `cargo_type_name` string COMMENT '货物类型名称',
  `volumn_length` bigint COMMENT '长cm',
  `volumn_width` bigint COMMENT '宽cm',
  `volumn_height` bigint COMMENT '高cm',
  `weight` decimal(16,2) COMMENT '重量 kg',
  `order_time` string COMMENT '下单时间',
  `order_no` string COMMENT '运单号',
  `status` string COMMENT '运单状态',
  `status_name` string COMMENT '运单状态名称',
  `collect_type` string COMMENT '取件类型,1为网点自寄,2为上门取件',
  `collect_type_name` string COMMENT '取件类型名称',
  `user_id` bigint COMMENT '用户ID',
  `receiver_complex_id` bigint COMMENT '收件人小区id',
  `receiver_province_id` string COMMENT '收件人省份id',
  `receiver_city_id` string COMMENT '收件人城市id',
  `receiver_district_id` string COMMENT '收件人区县id',
  `receiver_name` string COMMENT '收件人姓名',
  `sender_complex_id` bigint COMMENT '发件人小区id',
  `sender_province_id` string COMMENT '发件人省份id',
  `sender_city_id` string COMMENT '发件人城市id',
  `sender_district_id` string COMMENT '发件人区县id',
  `sender_name` string COMMENT '发件人姓名',
  `cargo_num` bigint COMMENT '货物个数',
  `amount` decimal(16,2) COMMENT '金额',
  `estimate_arrive_time` string COMMENT '预计到达时间',
  `distance` decimal(16,2) COMMENT '距离,单位:公里',
  `ts` bigint COMMENT '时间戳'
) comment '交易域订单明细事务事实表'
    partitioned by (`dt` string comment '统计日期')
    stored as orc
    location '/warehouse/tms/dwd/dwd_trade_order_detail_inc'
    tblproperties('orc.compress' = 'snappy');

9.1.2、分区规划

9.1.3、数据装载

1)数据流向

2)首日装载

set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tms.dwd_trade_order_detail_inc
    partition (dt)
select cargo.id,
       order_id,
       cargo_type,
       dic_for_cargo_type.name               cargo_type_name,
       volumn_length,
       volumn_width,
       volumn_height,
       weight,
       order_time,
       order_no,
       status,
       dic_for_status.name                   status_name,
       collect_type,
       dic_for_collect_type.name             collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts,
       date_format(order_time, 'yyyy-MM-dd') dt
from (select after.id,
             after.order_id,
             after.cargo_type,
             after.volumn_length,
             after.volumn_width,
             after.volumn_height,
             after.weight,
             concat(substr(after.create_time, 1, 10), ' ', substr(after.create_time, 12, 8)) order_time,
             ts
      from ods_order_cargo_inc
      where dt = '2023-01-10'
        and after.is_deleted = '0') cargo
         join
     (select after.id,
             after.order_no,
             after.status,
             after.collect_type,
             after.user_id,
             after.receiver_complex_id,
             after.receiver_province_id,
             after.receiver_city_id,
             after.receiver_district_id,
             concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
             after.sender_complex_id,
             after.sender_province_id,
             after.sender_city_id,
             after.sender_district_id,
             concat(substr(after.sender_name, 1, 1), '*')   sender_name,
             after.cargo_num,
             after.amount,
             date_format(from_utc_timestamp(
                                 cast(after.estimate_arrive_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
             after.distance
      from ods_order_info_inc
      where dt = '2023-01-10'
        and after.is_deleted = '0') info
     on cargo.order_id = info.id
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '2023-01-10'
        and is_deleted = '0') dic_for_cargo_type
     on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '2023-01-10'
        and is_deleted = '0') dic_for_status
     on info.status = cast(dic_for_status.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '2023-01-10'
        and is_deleted = '0') dic_for_collect_type
     on info.collect_type = cast(dic_for_cargo_type.id as string);

3)每日装载

insert overwrite table tms.dwd_trade_order_detail_inc
    partition (dt = '2023-01-11')
select cargo.id,
       order_id,
       cargo_type,
       dic_for_cargo_type.name   cargo_type_name,
       volumn_length,
       volumn_width,
       volumn_height,
       weight,
       order_time,
       order_no,
       status,
       dic_for_status.name       status_name,
       collect_type,
       dic_for_collect_type.name collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts
from (select after.id,
             after.order_id,
             after.cargo_type,
             after.volumn_length,
             after.volumn_width,
             after.volumn_height,
             after.weight,
             date_format(
                     from_utc_timestamp(
                                 to_unix_timestamp(concat(substr(after.create_time, 1, 10), ' ',
                                                          substr(after.create_time, 12, 8))) * 1000,
                                 'GMT+8'), 'yyyy-MM-dd HH:mm:ss') order_time,
             ts
      from ods_order_cargo_inc
      where dt = '2023-01-11'
        and op = 'c') cargo
         join
     (select after.id,
             after.order_no,
             after.status,
             after.collect_type,
             after.user_id,
             after.receiver_complex_id,
             after.receiver_province_id,
             after.receiver_city_id,
             after.receiver_district_id,
             concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
             after.sender_complex_id,
             after.sender_province_id,
             after.sender_city_id,
             after.sender_district_id,
             concat(substr(after.sender_name, 1, 1), '*')   sender_name,
             after.cargo_num,
             after.amount,
             date_format(from_utc_timestamp(
                                 cast(after.estimate_arrive_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
             after.distance
      from ods_order_info_inc
      where dt = '2023-01-11'
        and op = 'c') info
     on cargo.order_id = info.id
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '2023-01-11'
        and is_deleted = '0') dic_for_cargo_type
     on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '2023-01-11'
        and is_deleted = '0') dic_for_status
     on info.status = cast(dic_for_status.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '2023-01-11'
        and is_deleted = '0') dic_for_collect_type
     on info.collect_type = cast(dic_for_cargo_type.id as string);

9.2、数据装载脚本

9.2.1、首日装载脚本

1)在hadoop102的/home/kingjw/bin目录下创建ods_to_dwd_init.sh

[kingjw@hadoop102 bin]$ vim ods_to_dwd_init.sh 

2)编写如下内容

#!/bin/bash
#1、判断表名是否传入
if [ $# -lt 2 ]
then
	echo "必须传入all/表名与上线日期..."
	exit
fi

dwd_trade_order_detail_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tms.dwd_trade_order_detail_inc
    partition (dt)
select cargo.id,
       order_id,
       cargo_type,
       dic_for_cargo_type.name               cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       order_time,
       order_no,
       status,
       dic_for_status.name                   status_name,
       collect_type,
       dic_for_collect_type.name             collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts,
       date_format(order_time, 'yyyy-MM-dd') dt
from (select after.id,
             after.order_id,
             after.cargo_type,
             after.volume_length,
             after.volume_width,
             after.volume_height,
             after.weight,
             concat(substr(after.create_time, 1, 10), ' ', substr(after.create_time, 12, 8)) order_time,
             ts
      from ods_order_cargo_inc
      where dt = '$2'
        and after.is_deleted = '0') cargo
         join
     (select after.id,
             after.order_no,
             after.status,
             after.collect_type,
             after.user_id,
             after.receiver_complex_id,
             after.receiver_province_id,
             after.receiver_city_id,
             after.receiver_district_id,
             concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
             after.sender_complex_id,
             after.sender_province_id,
             after.sender_city_id,
             after.sender_district_id,
             concat(substr(after.sender_name, 1, 1), '*')   sender_name,
             after.cargo_num,
             after.amount,
             date_format(from_utc_timestamp(
                                 cast(after.estimate_arrive_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
             after.distance
      from ods_order_info_inc
      where dt = '$2'
        and after.is_deleted = '0') info
     on cargo.order_id = info.id
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_cargo_type
     on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_status
     on info.status = cast(dic_for_status.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_collect_type
     on info.collect_type = cast(dic_for_cargo_type.id as string);
"

dwd_trade_pay_suc_detail_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tms.dwd_trade_pay_suc_detail_inc
    partition (dt)
select cargo.id,
       order_id,
       cargo_type,
       dic_for_cargo_type.name                 cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       payment_time,
       order_no,
       status,
       dic_for_status.name                     status_name,
       collect_type,
       dic_for_collect_type.name               collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       payment_type,
       dic_for_payment_type.name               payment_type_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts,
       date_format(payment_time, 'yyyy-MM-dd') dt
from (select after.id,
             after.order_id,
             after.cargo_type,
             after.volume_length,
             after.volume_width,
             after.volume_height,
             after.weight,
             ts
      from ods_order_cargo_inc
      where dt = '$2'
        and after.is_deleted = '0') cargo
         join
     (select after.id,
             after.order_no,
             after.status,
             after.collect_type,
             after.user_id,
             after.receiver_complex_id,
             after.receiver_province_id,
             after.receiver_city_id,
             after.receiver_district_id,
             concat(substr(after.receiver_name, 1, 1), '*')                                  receiver_name,
             after.sender_complex_id,
             after.sender_province_id,
             after.sender_city_id,
             after.sender_district_id,
             concat(substr(after.sender_name, 1, 1), '*')                                    sender_name,
             after.payment_type,
             after.cargo_num,
             after.amount,
             date_format(from_utc_timestamp(
                                 cast(after.estimate_arrive_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')                                              estimate_arrive_time,
             after.distance,
             concat(substr(after.update_time, 1, 10), ' ', substr(after.update_time, 12, 8)) payment_time
      from ods_order_info_inc
      where dt = '$2'
        and after.is_deleted = '0'
        and after.status <> '60010'
        and after.status <> '60999') info
     on cargo.order_id = info.id
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_cargo_type
     on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_status
     on info.status = cast(dic_for_status.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_collect_type
     on info.collect_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_payment_type
     on info.payment_type = cast(dic_for_payment_type.id as string);
"

dwd_trade_order_cancel_detail_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tms.dwd_trade_order_cancel_detail_inc
    partition (dt)
select cargo.id,
       order_id,
       cargo_type,
       dic_for_cargo_type.name                 cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       cancel_time,
       order_no,
       status,
       dic_for_status.name                     status_name,
       collect_type,
       dic_for_collect_type.name               collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts,
       date_format(cancel_time, 'yyyy-MM-dd') dt
from (select after.id,
             after.order_id,
             after.cargo_type,
             after.volume_length,
             after.volume_width,
             after.volume_height,
             after.weight,
             ts
      from ods_order_cargo_inc
      where dt = '$2'
        and after.is_deleted = '0') cargo
         join
     (select after.id,
             after.order_no,
             after.status,
             after.collect_type,
             after.user_id,
             after.receiver_complex_id,
             after.receiver_province_id,
             after.receiver_city_id,
             after.receiver_district_id,
             concat(substr(after.receiver_name, 1, 1), '*')                                  receiver_name,
             after.sender_complex_id,
             after.sender_province_id,
             after.sender_city_id,
             after.sender_district_id,
             concat(substr(after.sender_name, 1, 1), '*')                                    sender_name,
             after.cargo_num,
             after.amount,
             date_format(from_utc_timestamp(
                                 cast(after.estimate_arrive_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')                                              estimate_arrive_time,
             after.distance,
             concat(substr(after.update_time, 1, 10), ' ', substr(after.update_time, 12, 8)) cancel_time
      from ods_order_info_inc
      where dt = '$2'
        and after.is_deleted = '0'
        and after.status = '60999') info
     on cargo.order_id = info.id
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_cargo_type
     on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_status
     on info.status = cast(dic_for_status.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_collect_type
     on info.collect_type = cast(dic_for_cargo_type.id as string);
"

dwd_trans_receive_detail_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tms.dwd_trans_receive_detail_inc
    partition (dt)
select cargo.id,
       order_id,
       cargo_type,
       dic_for_cargo_type.name                 cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       receive_time,
       order_no,
       status,
       dic_for_status.name                     status_name,
       collect_type,
       dic_for_collect_type.name               collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       payment_type,
       dic_for_payment_type.name               payment_type_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts,
       date_format(receive_time, 'yyyy-MM-dd') dt
from (select after.id,
             after.order_id,
             after.cargo_type,
             after.volume_length,
             after.volume_width,
             after.volume_height,
             after.weight,
             ts
      from ods_order_cargo_inc
      where dt = '$2'
        and after.is_deleted = '0') cargo
         join
     (select after.id,
             after.order_no,
             after.status,
             after.collect_type,
             after.user_id,
             after.receiver_complex_id,
             after.receiver_province_id,
             after.receiver_city_id,
             after.receiver_district_id,
             concat(substr(after.receiver_name, 1, 1), '*')                                  receiver_name,
             after.sender_complex_id,
             after.sender_province_id,
             after.sender_city_id,
             after.sender_district_id,
             concat(substr(after.sender_name, 1, 1), '*')                                    sender_name,
             after.payment_type,
             after.cargo_num,
             after.amount,
             date_format(from_utc_timestamp(
                                 cast(after.estimate_arrive_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')                                              estimate_arrive_time,
             after.distance,
             concat(substr(after.update_time, 1, 10), ' ', substr(after.update_time, 12, 8)) receive_time
      from ods_order_info_inc
      where dt = '$2'
        and after.is_deleted = '0'
        and after.status <> '60010'
        and after.status <> '60020'
        and after.status <> '60999') info
     on cargo.order_id = info.id
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_cargo_type
     on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_status
     on info.status = cast(dic_for_status.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_collect_type
     on info.collect_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_payment_type
     on info.payment_type = cast(dic_for_payment_type.id as string);
"

dwd_trans_dispatch_detail_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tms.dwd_trans_dispatch_detail_inc
    partition (dt)
select cargo.id,
       order_id,
       cargo_type,
       dic_for_cargo_type.name                 cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       dispatch_time,
       order_no,
       status,
       dic_for_status.name                     status_name,
       collect_type,
       dic_for_collect_type.name               collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       payment_type,
       dic_for_payment_type.name               payment_type_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts,
       date_format(dispatch_time, 'yyyy-MM-dd') dt
from (select after.id,
             after.order_id,
             after.cargo_type,
             after.volume_length,
             after.volume_width,
             after.volume_height,
             after.weight,
             ts
      from ods_order_cargo_inc
      where dt = '$2'
        and after.is_deleted = '0') cargo
         join
     (select after.id,
             after.order_no,
             after.status,
             after.collect_type,
             after.user_id,
             after.receiver_complex_id,
             after.receiver_province_id,
             after.receiver_city_id,
             after.receiver_district_id,
             concat(substr(after.receiver_name, 1, 1), '*')                                  receiver_name,
             after.sender_complex_id,
             after.sender_province_id,
             after.sender_city_id,
             after.sender_district_id,
             concat(substr(after.sender_name, 1, 1), '*')                                    sender_name,
             after.payment_type,
             after.cargo_num,
             after.amount,
             date_format(from_utc_timestamp(
                                 cast(after.estimate_arrive_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')                                              estimate_arrive_time,
             after.distance,
             concat(substr(after.update_time, 1, 10), ' ', substr(after.update_time, 12, 8)) dispatch_time
      from ods_order_info_inc
      where dt = '$2'
        and after.is_deleted = '0'
        and after.status <> '60010'
        and after.status <> '60020'
        and after.status <> '60030'
        and after.status <> '60040'
        and after.status <> '60999') info
     on cargo.order_id = info.id
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_cargo_type
     on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_status
     on info.status = cast(dic_for_status.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_collect_type
     on info.collect_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_payment_type
     on info.payment_type = cast(dic_for_payment_type.id as string);
"

dwd_trans_bound_finish_detail_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tms.dwd_trans_bound_finish_detail_inc
    partition (dt)
select cargo.id,
       order_id,
       cargo_type,
       dic_for_cargo_type.name                 cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       bound_finish_time,
       order_no,
       status,
       dic_for_status.name                     status_name,
       collect_type,
       dic_for_collect_type.name               collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       payment_type,
       dic_for_payment_type.name               payment_type_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts,
       date_format(bound_finish_time, 'yyyy-MM-dd') dt
from (select after.id,
             after.order_id,
             after.cargo_type,
             after.volume_length,
             after.volume_width,
             after.volume_height,
             after.weight,
             ts
      from ods_order_cargo_inc
      where dt = '$2'
        and after.is_deleted = '0') cargo
         join
     (select after.id,
             after.order_no,
             after.status,
             after.collect_type,
             after.user_id,
             after.receiver_complex_id,
             after.receiver_province_id,
             after.receiver_city_id,
             after.receiver_district_id,
             concat(substr(after.receiver_name, 1, 1), '*')                                  receiver_name,
             after.sender_complex_id,
             after.sender_province_id,
             after.sender_city_id,
             after.sender_district_id,
             concat(substr(after.sender_name, 1, 1), '*')                                    sender_name,
             after.payment_type,
             after.cargo_num,
             after.amount,
             date_format(from_utc_timestamp(
                                 cast(after.estimate_arrive_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')                                              estimate_arrive_time,
             after.distance,
             concat(substr(after.update_time, 1, 10), ' ', substr(after.update_time, 12, 8)) bound_finish_time
      from ods_order_info_inc
      where dt = '$2'
        and after.is_deleted = '0'
        and after.status <> '60010'
        and after.status <> '60020'
        and after.status <> '60030'
        and after.status <> '60040'
        and after.status <> '60050'
        and after.status <> '60999') info
     on cargo.order_id = info.id
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_cargo_type
     on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_status
     on info.status = cast(dic_for_status.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_collect_type
     on info.collect_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_payment_type
     on info.payment_type = cast(dic_for_payment_type.id as string);
"

dwd_trans_deliver_suc_detail_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tms.dwd_trans_deliver_suc_detail_inc
    partition (dt)
select cargo.id,
       order_id,
       cargo_type,
       dic_for_cargo_type.name                 cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       deliver_suc_time,
       order_no,
       status,
       dic_for_status.name                     status_name,
       collect_type,
       dic_for_collect_type.name               collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       payment_type,
       dic_for_payment_type.name               payment_type_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts,
       date_format(deliver_suc_time, 'yyyy-MM-dd') dt
from (select after.id,
             after.order_id,
             after.cargo_type,
             after.volume_length,
             after.volume_width,
             after.volume_height,
             after.weight,
             ts
      from ods_order_cargo_inc
      where dt = '$2'
        and after.is_deleted = '0') cargo
         join
     (select after.id,
             after.order_no,
             after.status,
             after.collect_type,
             after.user_id,
             after.receiver_complex_id,
             after.receiver_province_id,
             after.receiver_city_id,
             after.receiver_district_id,
             concat(substr(after.receiver_name, 1, 1), '*')                                  receiver_name,
             after.sender_complex_id,
             after.sender_province_id,
             after.sender_city_id,
             after.sender_district_id,
             concat(substr(after.sender_name, 1, 1), '*')                                    sender_name,
             after.payment_type,
             after.cargo_num,
             after.amount,
             date_format(from_utc_timestamp(
                                 cast(after.estimate_arrive_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')                                              estimate_arrive_time,
             after.distance,
             concat(substr(after.update_time, 1, 10), ' ', substr(after.update_time, 12, 8)) deliver_suc_time
      from ods_order_info_inc
      where dt = '$2'
        and after.is_deleted = '0'
        and after.status <> '60010'
        and after.status <> '60020'
        and after.status <> '60030'
        and after.status <> '60040'
        and after.status <> '60050'
        and after.status <> '60060'
        and after.status <> '60999') info
     on cargo.order_id = info.id
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_cargo_type
     on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_status
     on info.status = cast(dic_for_status.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_collect_type
     on info.collect_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_payment_type
     on info.payment_type = cast(dic_for_payment_type.id as string);
"

dwd_trans_sign_detail_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tms.dwd_trans_sign_detail_inc
    partition (dt)
select cargo.id,
       order_id,
       cargo_type,
       dic_for_cargo_type.name                 cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       sign_time,
       order_no,
       status,
       dic_for_status.name                     status_name,
       collect_type,
       dic_for_collect_type.name               collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       payment_type,
       dic_for_payment_type.name               payment_type_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts,
       date_format(sign_time, 'yyyy-MM-dd') dt
from (select after.id,
             after.order_id,
             after.cargo_type,
             after.volume_length,
             after.volume_width,
             after.volume_height,
             after.weight,
             ts
      from ods_order_cargo_inc
      where dt = '$2'
        and after.is_deleted = '0') cargo
         join
     (select after.id,
             after.order_no,
             after.status,
             after.collect_type,
             after.user_id,
             after.receiver_complex_id,
             after.receiver_province_id,
             after.receiver_city_id,
             after.receiver_district_id,
             concat(substr(after.receiver_name, 1, 1), '*')                                  receiver_name,
             after.sender_complex_id,
             after.sender_province_id,
             after.sender_city_id,
             after.sender_district_id,
             concat(substr(after.sender_name, 1, 1), '*')                                    sender_name,
             after.payment_type,
             after.cargo_num,
             after.amount,
             date_format(from_utc_timestamp(
                                 cast(after.estimate_arrive_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')                                              estimate_arrive_time,
             after.distance,
             concat(substr(after.update_time, 1, 10), ' ', substr(after.update_time, 12, 8)) sign_time
      from ods_order_info_inc
      where dt = '$2'
        and after.is_deleted = '0'
        and after.status <> '60010'
        and after.status <> '60020'
        and after.status <> '60030'
        and after.status <> '60040'
        and after.status <> '60050'
        and after.status <> '60060'
        and after.status <> '60070'
        and after.status <> '60999') info
     on cargo.order_id = info.id
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_cargo_type
     on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_status
     on info.status = cast(dic_for_status.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_collect_type
     on info.collect_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_payment_type
     on info.payment_type = cast(dic_for_payment_type.id as string);
"

dwd_trade_order_process_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tms.dwd_trade_order_process_inc
    partition (dt)
select cargo.id,
       order_id,
       cargo_type,
       dic_for_cargo_type.name               cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       order_time,
       order_no,
       status,
       dic_for_status.name                   status_name,
       collect_type,
       dic_for_collect_type.name             collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       payment_type,
       dic_for_payment_type.name             payment_type_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts,
       date_format(order_time, 'yyyy-MM-dd') start_date,
       end_date,
       end_date                              dt
from (select after.id,
             after.order_id,
             after.cargo_type,
             after.volume_length,
             after.volume_width,
             after.volume_height,
             after.weight,
             concat(substr(after.create_time, 1, 10), ' ', substr(after.create_time, 12, 8)) order_time,
             ts
      from ods_order_cargo_inc
      where dt = '$2'
        and after.is_deleted = '0') cargo
         join
     (select after.id,
             after.order_no,
             after.status,
             after.collect_type,
             after.user_id,
             after.receiver_complex_id,
             after.receiver_province_id,
             after.receiver_city_id,
             after.receiver_district_id,
             concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
             after.sender_complex_id,
             after.sender_province_id,
             after.sender_city_id,
             after.sender_district_id,
             concat(substr(after.sender_name, 1, 1), '*')   sender_name,
             after.payment_type,
             after.cargo_num,
             after.amount,
             date_format(from_utc_timestamp(
                                 cast(after.estimate_arrive_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
             after.distance,
             if(after.status = '60080' or
                after.status = '60999',
                concat(substr(after.update_time, 1, 10)),
                '9999-12-31')                               end_date
      from ods_order_info_inc
      where dt = '$2'
        and after.is_deleted = '0') info
     on cargo.order_id = info.id
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_cargo_type
     on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_status
     on info.status = cast(dic_for_status.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_collect_type
     on info.collect_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '$2'
        and is_deleted = '0') dic_for_payment_type
     on info.payment_type = cast(dic_for_payment_type.id as string);
"

dwd_trans_trans_finish_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_trans_trans_finish_inc
    partition (dt)
select info.id,
       shift_id,
       line_id,
       start_org_id,
       start_org_name,
       end_org_id,
       end_org_name,
       order_num,
       driver1_emp_id,
       driver1_name,
       driver2_emp_id,
       driver2_name,
       truck_id,
       truck_no,
       actual_start_time,
       actual_end_time,
       from_unixtime( (to_unix_timestamp(actual_start_time) + estimated_time*60)) estimate_end_time,
       actual_distance,
       finish_dur_sec,
       ts,
       dt
from (select after.id,
             after.shift_id,
             after.line_id,
             after.start_org_id,
             after.start_org_name,
             after.end_org_id,
             after.end_org_name,
             after.order_num,
             after.driver1_emp_id,
             concat(substr(after.driver1_name, 1, 1), '*')                                            driver1_name,
             after.driver2_emp_id,
             concat(substr(after.driver2_name, 1, 1), '*')                                            driver2_name,
             after.truck_id,
             md5(after.truck_no)                                                                      truck_no,
             date_format(from_utc_timestamp(
                                 cast(after.actual_start_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')                                                       actual_start_time,
             date_format(from_utc_timestamp(
                                 cast(after.actual_end_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')                                                       actual_end_time,

             after.actual_distance,
             (cast(after.actual_end_time as bigint) - cast(after.actual_start_time as bigint)) / 1000 finish_dur_sec,
             ts,
             date_format(from_utc_timestamp(
                                 cast(after.actual_end_time as bigint), 'UTC'),
                         'yyyy-MM-dd')                                                                dt
      from ods_transport_task_inc
      where dt = '$2'
        and after.is_deleted = '0'
        and after.actual_end_time is not null) info
         left join
     (select id,
             estimated_time
      from dim_shift_full
      where dt = '$2') dim_tb
     on info.shift_id = dim_tb.id;
"

dwd_bound_inbound_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_bound_inbound_inc
    partition (dt)
select after.id,
       after.order_id,
       after.org_id,
       date_format(from_utc_timestamp(
                           cast(after.inbound_time as bigint), 'UTC'),
                   'yyyy-MM-dd HH:mm:ss') inbound_time,
       after.inbound_emp_id,
       date_format(from_utc_timestamp(
                           cast(after.inbound_time as bigint), 'UTC'),
                   'yyyy-MM-dd')          dt
from ods_order_org_bound_inc
where dt = '$2';
"

dwd_bound_sort_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_bound_sort_inc
    partition (dt)
select after.id,
       after.order_id,
       after.org_id,
       date_format(from_utc_timestamp(
                           cast(after.sort_time as bigint), 'UTC'),
                   'yyyy-MM-dd HH:mm:ss') sort_time,
       after.sorter_emp_id,
       date_format(from_utc_timestamp(
                           cast(after.sort_time as bigint), 'UTC'),
                   'yyyy-MM-dd')          dt
from ods_order_org_bound_inc
where dt = '$2'
  and after.sort_time is not null;
"

dwd_bound_outbound_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dwd_bound_outbound_inc
    partition (dt)
select after.id,
       after.order_id,
       after.org_id,
       date_format(from_utc_timestamp(
                           cast(after.outbound_time as bigint), 'UTC'),
                   'yyyy-MM-dd HH:mm:ss') outbound_time,
       after.outbound_emp_id,
       date_format(from_utc_timestamp(
                           cast(after.outbound_time as bigint), 'UTC'),
                   'yyyy-MM-dd')          dt
from ods_order_org_bound_inc
where dt = '$2'
  and after.outbound_time is not null;
"
#2、根据表名匹配加载数据
case $1 in
"all")
	/opt/module/hive/bin/hive -e "use tms;${dwd_bound_inbound_inc_sql}${dwd_bound_outbound_inc_sql}${dwd_bound_sort_inc_sql}${dwd_trade_order_cancel_detail_inc_sql}${dwd_trade_order_detail_inc_sql}${dwd_trade_order_process_inc_sql}${dwd_trade_pay_suc_detail_inc_sql}${dwd_trans_bound_finish_detail_inc_sql}${dwd_trans_deliver_suc_detail_inc_sql}${dwd_trans_dispatch_detail_inc_sql}${dwd_trans_receive_detail_inc_sql}${dwd_trans_sign_detail_inc_sql}${dwd_trans_trans_finish_inc_sql}
"
;;
"dwd_bound_inbound_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_bound_inbound_inc_sql}"
;;
"dwd_bound_outbound_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_bound_outbound_inc_sql}"
;;
"dwd_bound_sort_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_bound_sort_inc_sql}"
;;
"dwd_trade_order_cancel_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trade_order_cancel_detail_inc_sql}"
;;
"dwd_trade_order_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trade_order_detail_inc_sql}"
;;
"dwd_trade_order_process_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trade_order_process_inc_sql}"
;;
"dwd_trade_pay_suc_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trade_pay_suc_detail_inc_sql}"
;;
"dwd_trans_bound_finish_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trans_bound_finish_detail_inc_sql}"
;;
"dwd_trans_deliver_suc_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trans_deliver_suc_detail_inc_sql}"
;;
"dwd_trans_dispatch_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trans_dispatch_detail_inc_sql}"
;;
"dwd_trans_receive_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trans_receive_detail_inc_sql}"
;;
"dwd_trans_sign_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trans_sign_detail_inc_sql}"
;;
"dwd_trans_trans_finish_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trans_trans_finish_inc_sql}"
;;
*)
	echo "表名输入错误..."
;;
esac

3)增加脚本执行权限

[kingjw@hadoop102 bin]$ chmod +x ods_to_dwd_init.sh

4)脚本用法

[kingjw@hadoop102 bin]$ ods_to_dwd_init.sh all 2023-01-10

9.2.2、每日装载脚本

1)在hadoop102的/home/kingjw/bin目录下创建ods_to_dwd.sh

[kingjw@hadoop102 bin]$ vim ods_to_dwd.sh 

2)编写如下内容

#!/bin/bash
#1、判断表名是否传入
if [ $# -lt 1 ]
then
	echo "必须传入all/表名..."
	exit
fi

#2、判断日期是否传入,传入则记载指定日期数据,否则加载前一天数据
[ "$2" ] && datestr=$2 || datestr=$(date -d '-1 day' +%F)

dwd_trade_order_detail_inc_sql="
insert overwrite table tms.dwd_trade_order_detail_inc
    partition (dt = '${datestr}')
select cargo.id,
       order_id,
       cargo_type,
       dic_for_cargo_type.name   cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       order_time,
       order_no,
       status,
       dic_for_status.name       status_name,
       collect_type,
       dic_for_collect_type.name collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts
from (select after.id,
             after.order_id,
             after.cargo_type,
             after.volume_length,
             after.volume_width,
             after.volume_height,
             after.weight,
             date_format(
                     from_utc_timestamp(
                                 to_unix_timestamp(concat(substr(after.create_time, 1, 10), ' ',
                                                          substr(after.create_time, 12, 8))) * 1000,
                                 'GMT+8'), 'yyyy-MM-dd HH:mm:ss') order_time,
             ts
      from ods_order_cargo_inc
      where dt = '${datestr}'
        and op = 'c') cargo
         join
     (select after.id,
             after.order_no,
             after.status,
             after.collect_type,
             after.user_id,
             after.receiver_complex_id,
             after.receiver_province_id,
             after.receiver_city_id,
             after.receiver_district_id,
             concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
             after.sender_complex_id,
             after.sender_province_id,
             after.sender_city_id,
             after.sender_district_id,
             concat(substr(after.sender_name, 1, 1), '*')   sender_name,
             after.cargo_num,
             after.amount,
             date_format(from_utc_timestamp(
                                 cast(after.estimate_arrive_time as bigint), 'UTC'),
                         'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
             after.distance
      from ods_order_info_inc
      where dt = '${datestr}'
        and op = 'c') info
     on cargo.order_id = info.id
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '${datestr}'
        and is_deleted = '0') dic_for_cargo_type
     on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '${datestr}'
        and is_deleted = '0') dic_for_status
     on info.status = cast(dic_for_status.id as string)
         left join
     (select id,
             name
      from ods_base_dic_full
      where dt = '${datestr}'
        and is_deleted = '0') dic_for_collect_type
     on info.collect_type = cast(dic_for_cargo_type.id as string);
"

dwd_trade_pay_suc_detail_inc_sql="
with pay_info
         as
         (select without_status.id,
                 order_no,
                 status,
                 dic_for_status.name status_name,
                 collect_type,
                 user_id,
                 receiver_complex_id,
                 receiver_province_id,
                 receiver_city_id,
                 receiver_district_id,
                 receiver_name,
                 sender_complex_id,
                 sender_province_id,
                 sender_city_id,
                 sender_district_id,
                 sender_name,
                 payment_type,
                 dic_type_name.name  payment_type_name,
                 cargo_num,
                 amount,
                 estimate_arrive_time,
                 distance,
                 payment_time,
                 ts
          from (select after.id,
                       after.order_no,
                       after.status,
                       after.collect_type,
                       after.user_id,
                       after.receiver_complex_id,
                       after.receiver_province_id,
                       after.receiver_city_id,
                       after.receiver_district_id,
                       concat(substr(after.receiver_name, 1, 1), '*')       receiver_name,
                       after.sender_complex_id,
                       after.sender_province_id,
                       after.sender_city_id,
                       after.sender_district_id,
                       concat(substr(after.sender_name, 1, 1), '*')         sender_name,
                       after.payment_type,
                       after.cargo_num,
                       after.amount,
                       date_format(from_utc_timestamp(
                                           cast(after.estimate_arrive_time as bigint), 'UTC'),
                                   'yyyy-MM-dd HH:mm:ss')                   estimate_arrive_time,
                       after.distance,
                       date_format(
                               from_utc_timestamp(
                                           to_unix_timestamp(concat(substr(after.update_time, 1, 10), ' ',
                                                                    substr(after.update_time, 12, 8))) * 1000,
                                           'GMT+8'), 'yyyy-MM-dd HH:mm:ss') payment_time,
                       ts
                from ods_order_info_inc
                where dt = '${datestr}'
                  and op = 'u'
                  and before.status = '60010'
                  and after.status = '60020'
                  and after.is_deleted = '0') without_status
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_for_status
               on without_status.status = cast(dic_for_status.id as string)
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_type_name
               on without_status.payment_type = cast(dic_type_name.id as string)),
     order_info
         as (
         select id,
                order_id,
                cargo_type,
                cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                status_name,
                collect_type,
                collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                payment_type,
                payment_type_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from dwd_trade_order_process_inc
         where dt = '9999-12-31'
           and status = '60010'
         union
         select cargo.id,
                order_id,
                cargo_type,
                dic_for_cargo_type.name   cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                dic_for_status.name       status_name,
                collect_type,
                dic_for_collect_type.name collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                ''                        payment_type,
                ''                        payment_type_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from (select after.id,
                      after.order_id,
                      after.cargo_type,
                      after.volume_length,
                      after.volume_width,
                      after.volume_height,
                      after.weight,
                      date_format(
                              from_utc_timestamp(
                                          to_unix_timestamp(concat(substr(after.create_time, 1, 10), ' ',
                                                                   substr(after.create_time, 12, 8))) * 1000,
                                          'GMT+8'), 'yyyy-MM-dd HH:mm:ss') order_time,
                      ts
               from ods_order_cargo_inc
               where dt = '${datestr}'
                 and op = 'c') cargo
                  join
              (select after.id,
                      after.order_no,
                      after.status,
                      after.collect_type,
                      after.user_id,
                      after.receiver_complex_id,
                      after.receiver_province_id,
                      after.receiver_city_id,
                      after.receiver_district_id,
                      concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
                      after.sender_complex_id,
                      after.sender_province_id,
                      after.sender_city_id,
                      after.sender_district_id,
                      concat(substr(after.sender_name, 1, 1), '*')   sender_name,
                      after.cargo_num,
                      after.amount,
                      date_format(from_utc_timestamp(
                                          cast(after.estimate_arrive_time as bigint), 'UTC'),
                                  'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
                      after.distance
               from ods_order_info_inc
               where dt = '${datestr}'
                 and op = 'c') info
              on cargo.order_id = info.id
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_cargo_type
              on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_status
              on info.status = cast(dic_for_status.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_collect_type
              on info.collect_type = cast(dic_for_cargo_type.id as string))
insert overwrite table tms.dwd_trade_pay_suc_detail_inc
    partition(dt = '${datestr}')
select order_info.id,
       order_id,
       cargo_type,
       cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       pay_info.payment_time,
       order_info.order_no,
       pay_info.status,
       pay_info.status_name,
       order_info.collect_type,
       collect_type_name,
       order_info.user_id,
       order_info.receiver_complex_id,
       order_info.receiver_province_id,
       order_info.receiver_city_id,
       order_info.receiver_district_id,
       order_info.receiver_name,
       order_info.sender_complex_id,
       order_info.sender_province_id,
       order_info.sender_city_id,
       order_info.sender_district_id,
       order_info.sender_name,
       pay_info.payment_type,
       pay_info.payment_type_name,
       order_info.cargo_num,
       order_info.amount,
       order_info.estimate_arrive_time,
       order_info.distance,
       pay_info.ts
from pay_info
         join order_info
              on pay_info.id = order_info.order_id;
"

dwd_trade_order_cancel_detail_inc_sql="
with cancel_info
         as
         (select without_status.id,
                 order_no,
                 status,
                 dic_for_status.name status_name,
                 collect_type,
                 user_id,
                 receiver_complex_id,
                 receiver_province_id,
                 receiver_city_id,
                 receiver_district_id,
                 receiver_name,
                 sender_complex_id,
                 sender_province_id,
                 sender_city_id,
                 sender_district_id,
                 sender_name,
                 cargo_num,
                 amount,
                 estimate_arrive_time,
                 distance,
                 cancel_time,
                 ts
          from (select after.id,
                       after.order_no,
                       after.status,
                       after.collect_type,
                       after.user_id,
                       after.receiver_complex_id,
                       after.receiver_province_id,
                       after.receiver_city_id,
                       after.receiver_district_id,
                       concat(substr(after.receiver_name, 1, 1), '*')       receiver_name,
                       after.sender_complex_id,
                       after.sender_province_id,
                       after.sender_city_id,
                       after.sender_district_id,
                       concat(substr(after.sender_name, 1, 1), '*')         sender_name,
                       after.payment_type,
                       after.cargo_num,
                       after.amount,
                       date_format(from_utc_timestamp(
                                           cast(after.estimate_arrive_time as bigint), 'UTC'),
                                   'yyyy-MM-dd HH:mm:ss')                   estimate_arrive_time,
                       after.distance,
                       date_format(
                               from_utc_timestamp(
                                           to_unix_timestamp(concat(substr(after.update_time, 1, 10), ' ',
                                                                    substr(after.update_time, 12, 8))) * 1000,
                                           'GMT+8'), 'yyyy-MM-dd HH:mm:ss') cancel_time,
                       ts
                from ods_order_info_inc
                where dt = '${datestr}'
                  and op = 'u'
                  and after.status = '60999'
                  and after.is_deleted = '0') without_status
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_for_status
               on without_status.status = cast(dic_for_status.id as string)),
     order_info
         as (
         select id,
                order_id,
                cargo_type,
                cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                status_name,
                collect_type,
                collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from dwd_trade_order_process_inc
         where dt = '9999-12-31'
         union
         select cargo.id,
                order_id,
                cargo_type,
                dic_for_cargo_type.name   cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                dic_for_status.name       status_name,
                collect_type,
                dic_for_collect_type.name collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from (select after.id,
                      after.order_id,
                      after.cargo_type,
                      after.volume_length,
                      after.volume_width,
                      after.volume_height,
                      after.weight,
                      date_format(
                              from_utc_timestamp(
                                          to_unix_timestamp(concat(substr(after.create_time, 1, 10), ' ',
                                                                   substr(after.create_time, 12, 8))) * 1000,
                                          'GMT+8'), 'yyyy-MM-dd HH:mm:ss') order_time,
                      ts
               from ods_order_cargo_inc
               where dt = '${datestr}'
                 and op = 'c') cargo
                  join
              (select after.id,
                      after.order_no,
                      after.status,
                      after.collect_type,
                      after.user_id,
                      after.receiver_complex_id,
                      after.receiver_province_id,
                      after.receiver_city_id,
                      after.receiver_district_id,
                      concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
                      after.sender_complex_id,
                      after.sender_province_id,
                      after.sender_city_id,
                      after.sender_district_id,
                      concat(substr(after.sender_name, 1, 1), '*')   sender_name,
                      after.cargo_num,
                      after.amount,
                      date_format(from_utc_timestamp(
                                          cast(after.estimate_arrive_time as bigint), 'UTC'),
                                  'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
                      after.distance
               from ods_order_info_inc
               where dt = '${datestr}'
                 and op = 'c') info
              on cargo.order_id = info.id
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_cargo_type
              on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_status
              on info.status = cast(dic_for_status.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_collect_type
              on info.collect_type = cast(dic_for_cargo_type.id as string))
insert overwrite table tms.dwd_trade_order_cancel_detail_inc
    partition(dt = '${datestr}')
select order_info.id,
       order_id,
       cargo_type,
       cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       cancel_info.cancel_time,
       order_info.order_no,
       cancel_info.status,
       cancel_info.status_name,
       order_info.collect_type,
       collect_type_name,
       order_info.user_id,
       order_info.receiver_complex_id,
       order_info.receiver_province_id,
       order_info.receiver_city_id,
       order_info.receiver_district_id,
       order_info.receiver_name,
       order_info.sender_complex_id,
       order_info.sender_province_id,
       order_info.sender_city_id,
       order_info.sender_district_id,
       order_info.sender_name,
       order_info.cargo_num,
       order_info.amount,
       order_info.estimate_arrive_time,
       order_info.distance,
       cancel_info.ts
from cancel_info
         join order_info
              on cancel_info.id = order_info.order_id;
"

dwd_trans_receive_detail_inc_sql="
with receive_info
         as
         (select without_status.id,
                 order_no,
                 status,
                 dic_for_status.name status_name,
                 collect_type,
                 user_id,
                 receiver_complex_id,
                 receiver_province_id,
                 receiver_city_id,
                 receiver_district_id,
                 receiver_name,
                 sender_complex_id,
                 sender_province_id,
                 sender_city_id,
                 sender_district_id,
                 sender_name,
                 payment_type,
                 dic_type_name.name  payment_type_name,
                 cargo_num,
                 amount,
                 estimate_arrive_time,
                 distance,
                 receive_time,
                 ts
          from (select after.id,
                       after.order_no,
                       after.status,
                       after.collect_type,
                       after.user_id,
                       after.receiver_complex_id,
                       after.receiver_province_id,
                       after.receiver_city_id,
                       after.receiver_district_id,
                       concat(substr(after.receiver_name, 1, 1), '*')       receiver_name,
                       after.sender_complex_id,
                       after.sender_province_id,
                       after.sender_city_id,
                       after.sender_district_id,
                       concat(substr(after.sender_name, 1, 1), '*')         sender_name,
                       after.payment_type,
                       after.cargo_num,
                       after.amount,
                       date_format(from_utc_timestamp(
                                           cast(after.estimate_arrive_time as bigint), 'UTC'),
                                   'yyyy-MM-dd HH:mm:ss')                   estimate_arrive_time,
                       after.distance,
                       date_format(
                               from_utc_timestamp(
                                           to_unix_timestamp(concat(substr(after.update_time, 1, 10), ' ',
                                                                    substr(after.update_time, 12, 8))) * 1000,
                                           'GMT+8'), 'yyyy-MM-dd HH:mm:ss') receive_time,
                       ts
                from ods_order_info_inc
                where dt = '${datestr}'
                  and op = 'u'
                  and before.status = '60020'
                  and after.status = '60030'
                  and after.is_deleted = '0') without_status
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_for_status
               on without_status.status = cast(dic_for_status.id as string)
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_type_name
               on without_status.payment_type = cast(dic_type_name.id as string)),
     order_info
         as (
         select id,
                order_id,
                cargo_type,
                cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                status_name,
                collect_type,
                collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                payment_type,
                payment_type_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from dwd_trade_order_process_inc
         where dt = '9999-12-31'
           and (status = '60010' or
                status = '60020')
         union
         select cargo.id,
                order_id,
                cargo_type,
                dic_for_cargo_type.name   cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                dic_for_status.name       status_name,
                collect_type,
                dic_for_collect_type.name collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                ''                        payment_type,
                ''                        payment_type_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from (select after.id,
                      after.order_id,
                      after.cargo_type,
                      after.volume_length,
                      after.volume_width,
                      after.volume_height,
                      after.weight,
                      date_format(
                              from_utc_timestamp(
                                          to_unix_timestamp(concat(substr(after.create_time, 1, 10), ' ',
                                                                   substr(after.create_time, 12, 8))) * 1000,
                                          'GMT+8'), 'yyyy-MM-dd HH:mm:ss') order_time,
                      ts
               from ods_order_cargo_inc
               where dt = '${datestr}'
                 and op = 'c') cargo
                  join
              (select after.id,
                      after.order_no,
                      after.status,
                      after.collect_type,
                      after.user_id,
                      after.receiver_complex_id,
                      after.receiver_province_id,
                      after.receiver_city_id,
                      after.receiver_district_id,
                      concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
                      after.sender_complex_id,
                      after.sender_province_id,
                      after.sender_city_id,
                      after.sender_district_id,
                      concat(substr(after.sender_name, 1, 1), '*')   sender_name,
                      after.cargo_num,
                      after.amount,
                      date_format(from_utc_timestamp(
                                          cast(after.estimate_arrive_time as bigint), 'UTC'),
                                  'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
                      after.distance
               from ods_order_info_inc
               where dt = '${datestr}'
                 and op = 'c') info
              on cargo.order_id = info.id
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_cargo_type
              on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_status
              on info.status = cast(dic_for_status.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_collect_type
              on info.collect_type = cast(dic_for_cargo_type.id as string))
insert overwrite table tms.dwd_trans_receive_detail_inc
    partition(dt = '${datestr}')
select order_info.id,
       order_id,
       cargo_type,
       cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       receive_info.receive_time,
       order_info.order_no,
       receive_info.status,
       receive_info.status_name,
       order_info.collect_type,
       collect_type_name,
       order_info.user_id,
       order_info.receiver_complex_id,
       order_info.receiver_province_id,
       order_info.receiver_city_id,
       order_info.receiver_district_id,
       order_info.receiver_name,
       order_info.sender_complex_id,
       order_info.sender_province_id,
       order_info.sender_city_id,
       order_info.sender_district_id,
       order_info.sender_name,
       receive_info.payment_type,
       receive_info.payment_type_name,
       order_info.cargo_num,
       order_info.amount,
       order_info.estimate_arrive_time,
       order_info.distance,
       receive_info.ts
from receive_info
         join order_info
              on receive_info.id = order_info.order_id;
"

dwd_trans_dispatch_detail_inc_sql="
with dispatch_info
         as
         (select without_status.id,
                 order_no,
                 status,
                 dic_for_status.name status_name,
                 collect_type,
                 user_id,
                 receiver_complex_id,
                 receiver_province_id,
                 receiver_city_id,
                 receiver_district_id,
                 receiver_name,
                 sender_complex_id,
                 sender_province_id,
                 sender_city_id,
                 sender_district_id,
                 sender_name,
                 payment_type,
                 dic_type_name.name  payment_type_name,
                 cargo_num,
                 amount,
                 estimate_arrive_time,
                 distance,
                 dispatch_time,
                 ts
          from (select after.id,
                       after.order_no,
                       after.status,
                       after.collect_type,
                       after.user_id,
                       after.receiver_complex_id,
                       after.receiver_province_id,
                       after.receiver_city_id,
                       after.receiver_district_id,
                       concat(substr(after.receiver_name, 1, 1), '*')       receiver_name,
                       after.sender_complex_id,
                       after.sender_province_id,
                       after.sender_city_id,
                       after.sender_district_id,
                       concat(substr(after.sender_name, 1, 1), '*')         sender_name,
                       after.payment_type,
                       after.cargo_num,
                       after.amount,
                       date_format(from_utc_timestamp(
                                           cast(after.estimate_arrive_time as bigint), 'UTC'),
                                   'yyyy-MM-dd HH:mm:ss')                   estimate_arrive_time,
                       after.distance,
                       date_format(
                               from_utc_timestamp(
                                           to_unix_timestamp(concat(substr(after.update_time, 1, 10), ' ',
                                                                    substr(after.update_time, 12, 8))) * 1000,
                                           'GMT+8'), 'yyyy-MM-dd HH:mm:ss') dispatch_time,
                       ts
                from ods_order_info_inc
                where dt = '${datestr}'
                  and op = 'u'
                  and before.status = '60040'
                  and after.status = '60050'
                  and after.is_deleted = '0') without_status
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_for_status
               on without_status.status = cast(dic_for_status.id as string)
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_type_name
               on without_status.payment_type = cast(dic_type_name.id as string)),
     order_info
         as (
         select id,
                order_id,
                cargo_type,
                cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                status_name,
                collect_type,
                collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                payment_type,
                payment_type_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from dwd_trade_order_process_inc
         where dt = '9999-12-31'
           and (status = '60010' or
                status = '60020' or
                status = '60030' or
                status = '60040')
         union
         select cargo.id,
                order_id,
                cargo_type,
                dic_for_cargo_type.name   cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                dic_for_status.name       status_name,
                collect_type,
                dic_for_collect_type.name collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                ''                        payment_type,
                ''                        payment_type_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from (select after.id,
                      after.order_id,
                      after.cargo_type,
                      after.volume_length,
                      after.volume_width,
                      after.volume_height,
                      after.weight,
                      date_format(
                              from_utc_timestamp(
                                          to_unix_timestamp(concat(substr(after.create_time, 1, 10), ' ',
                                                                   substr(after.create_time, 12, 8))) * 1000,
                                          'GMT+8'), 'yyyy-MM-dd HH:mm:ss') order_time,
                      ts
               from ods_order_cargo_inc
               where dt = '${datestr}'
                 and op = 'c') cargo
                  join
              (select after.id,
                      after.order_no,
                      after.status,
                      after.collect_type,
                      after.user_id,
                      after.receiver_complex_id,
                      after.receiver_province_id,
                      after.receiver_city_id,
                      after.receiver_district_id,
                      concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
                      after.sender_complex_id,
                      after.sender_province_id,
                      after.sender_city_id,
                      after.sender_district_id,
                      concat(substr(after.sender_name, 1, 1), '*')   sender_name,
                      after.cargo_num,
                      after.amount,
                      date_format(from_utc_timestamp(
                                          cast(after.estimate_arrive_time as bigint), 'UTC'),
                                  'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
                      after.distance
               from ods_order_info_inc
               where dt = '${datestr}'
                 and op = 'c') info
              on cargo.order_id = info.id
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_cargo_type
              on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_status
              on info.status = cast(dic_for_status.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_collect_type
              on info.collect_type = cast(dic_for_cargo_type.id as string))
insert overwrite table tms.dwd_trans_dispatch_detail_inc
    partition(dt = '${datestr}')
select order_info.id,
       order_id,
       cargo_type,
       cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       dispatch_info.dispatch_time,
       order_info.order_no,
       dispatch_info.status,
       dispatch_info.status_name,
       order_info.collect_type,
       collect_type_name,
       order_info.user_id,
       order_info.receiver_complex_id,
       order_info.receiver_province_id,
       order_info.receiver_city_id,
       order_info.receiver_district_id,
       order_info.receiver_name,
       order_info.sender_complex_id,
       order_info.sender_province_id,
       order_info.sender_city_id,
       order_info.sender_district_id,
       order_info.sender_name,
       dispatch_info.payment_type,
       dispatch_info.payment_type_name,
       order_info.cargo_num,
       order_info.amount,
       order_info.estimate_arrive_time,
       order_info.distance,
       dispatch_info.ts
from dispatch_info
         join order_info
              on dispatch_info.id = order_info.order_id;
"

dwd_trans_bound_finish_detail_inc_sql="
with bound_finish_info
         as
         (select without_status.id,
                 order_no,
                 status,
                 dic_for_status.name status_name,
                 collect_type,
                 user_id,
                 receiver_complex_id,
                 receiver_province_id,
                 receiver_city_id,
                 receiver_district_id,
                 receiver_name,
                 sender_complex_id,
                 sender_province_id,
                 sender_city_id,
                 sender_district_id,
                 sender_name,
                 payment_type,
                 dic_type_name.name  payment_type_name,
                 cargo_num,
                 amount,
                 estimate_arrive_time,
                 distance,
                 bound_finish_time,
                 ts
          from (select after.id,
                       after.order_no,
                       after.status,
                       after.collect_type,
                       after.user_id,
                       after.receiver_complex_id,
                       after.receiver_province_id,
                       after.receiver_city_id,
                       after.receiver_district_id,
                       concat(substr(after.receiver_name, 1, 1), '*')       receiver_name,
                       after.sender_complex_id,
                       after.sender_province_id,
                       after.sender_city_id,
                       after.sender_district_id,
                       concat(substr(after.sender_name, 1, 1), '*')         sender_name,
                       after.payment_type,
                       after.cargo_num,
                       after.amount,
                       date_format(from_utc_timestamp(
                                           cast(after.estimate_arrive_time as bigint), 'UTC'),
                                   'yyyy-MM-dd HH:mm:ss')                   estimate_arrive_time,
                       after.distance,
                       date_format(
                               from_utc_timestamp(
                                           to_unix_timestamp(concat(substr(after.update_time, 1, 10), ' ',
                                                                    substr(after.update_time, 12, 8))) * 1000,
                                           'GMT+8'), 'yyyy-MM-dd HH:mm:ss') bound_finish_time,
                       ts
                from ods_order_info_inc
                where dt = '${datestr}'
                  and op = 'u'
                  and before.status = '60050'
                  and after.status = '60060'
                  and after.is_deleted = '0') without_status
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_for_status
               on without_status.status = cast(dic_for_status.id as string)
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_type_name
               on without_status.payment_type = cast(dic_type_name.id as string)),
     order_info
         as (
         select id,
                order_id,
                cargo_type,
                cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                status_name,
                collect_type,
                collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                payment_type,
                payment_type_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from dwd_trade_order_process_inc
         where dt = '9999-12-31'
           and (status = '60010' or
                status = '60020' or
                status = '60030' or
                status = '60040' or
                status = '60050')
         union
         select cargo.id,
                order_id,
                cargo_type,
                dic_for_cargo_type.name   cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                dic_for_status.name       status_name,
                collect_type,
                dic_for_collect_type.name collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                ''                        payment_type,
                ''                        payment_type_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from (select after.id,
                      after.order_id,
                      after.cargo_type,
                      after.volume_length,
                      after.volume_width,
                      after.volume_height,
                      after.weight,
                      date_format(
                              from_utc_timestamp(
                                          to_unix_timestamp(concat(substr(after.create_time, 1, 10), ' ',
                                                                   substr(after.create_time, 12, 8))) * 1000,
                                          'GMT+8'), 'yyyy-MM-dd HH:mm:ss') order_time,
                      ts
               from ods_order_cargo_inc
               where dt = '${datestr}'
                 and op = 'c') cargo
                  join
              (select after.id,
                      after.order_no,
                      after.status,
                      after.collect_type,
                      after.user_id,
                      after.receiver_complex_id,
                      after.receiver_province_id,
                      after.receiver_city_id,
                      after.receiver_district_id,
                      concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
                      after.sender_complex_id,
                      after.sender_province_id,
                      after.sender_city_id,
                      after.sender_district_id,
                      concat(substr(after.sender_name, 1, 1), '*')   sender_name,
                      after.cargo_num,
                      after.amount,
                      date_format(from_utc_timestamp(
                                          cast(after.estimate_arrive_time as bigint), 'UTC'),
                                  'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
                      after.distance
               from ods_order_info_inc
               where dt = '${datestr}'
                 and op = 'c') info
              on cargo.order_id = info.id
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_cargo_type
              on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_status
              on info.status = cast(dic_for_status.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_collect_type
              on info.collect_type = cast(dic_for_cargo_type.id as string))
insert overwrite table tms.dwd_trans_bound_finish_detail_inc
    partition(dt = '${datestr}')
select order_info.id,
       order_id,
       cargo_type,
       cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       bound_finish_info.bound_finish_time,
       order_info.order_no,
       bound_finish_info.status,
       bound_finish_info.status_name,
       order_info.collect_type,
       collect_type_name,
       order_info.user_id,
       order_info.receiver_complex_id,
       order_info.receiver_province_id,
       order_info.receiver_city_id,
       order_info.receiver_district_id,
       order_info.receiver_name,
       order_info.sender_complex_id,
       order_info.sender_province_id,
       order_info.sender_city_id,
       order_info.sender_district_id,
       order_info.sender_name,
       bound_finish_info.payment_type,
       bound_finish_info.payment_type_name,
       order_info.cargo_num,
       order_info.amount,
       order_info.estimate_arrive_time,
       order_info.distance,
       bound_finish_info.ts
from bound_finish_info
         join order_info
              on bound_finish_info.id = order_info.order_id;
"

dwd_trans_deliver_suc_detail_inc_sql="
with deliver_suc_info
         as
         (select without_status.id,
                 order_no,
                 status,
                 dic_for_status.name status_name,
                 collect_type,
                 user_id,
                 receiver_complex_id,
                 receiver_province_id,
                 receiver_city_id,
                 receiver_district_id,
                 receiver_name,
                 sender_complex_id,
                 sender_province_id,
                 sender_city_id,
                 sender_district_id,
                 sender_name,
                 payment_type,
                 dic_type_name.name  payment_type_name,
                 cargo_num,
                 amount,
                 estimate_arrive_time,
                 distance,
                 deliver_suc_time,
                 ts
          from (select after.id,
                       after.order_no,
                       after.status,
                       after.collect_type,
                       after.user_id,
                       after.receiver_complex_id,
                       after.receiver_province_id,
                       after.receiver_city_id,
                       after.receiver_district_id,
                       concat(substr(after.receiver_name, 1, 1), '*')       receiver_name,
                       after.sender_complex_id,
                       after.sender_province_id,
                       after.sender_city_id,
                       after.sender_district_id,
                       concat(substr(after.sender_name, 1, 1), '*')         sender_name,
                       after.payment_type,
                       after.cargo_num,
                       after.amount,
                       date_format(from_utc_timestamp(
                                           cast(after.estimate_arrive_time as bigint), 'UTC'),
                                   'yyyy-MM-dd HH:mm:ss')                   estimate_arrive_time,
                       after.distance,
                       date_format(
                               from_utc_timestamp(
                                           to_unix_timestamp(concat(substr(after.update_time, 1, 10), ' ',
                                                                    substr(after.update_time, 12, 8))) * 1000,
                                           'GMT+8'), 'yyyy-MM-dd HH:mm:ss') deliver_suc_time,
                       ts
                from ods_order_info_inc
                where dt = '${datestr}'
                  and op = 'u'
                  and before.status = '60060'
                  and after.status = '60070'
                  and after.is_deleted = '0') without_status
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_for_status
               on without_status.status = cast(dic_for_status.id as string)
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_type_name
               on without_status.payment_type = cast(dic_type_name.id as string)),
     order_info
         as (
         select id,
                order_id,
                cargo_type,
                cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                status_name,
                collect_type,
                collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                payment_type,
                payment_type_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from dwd_trade_order_process_inc
         where dt = '9999-12-31'
           and (status = '60010' or
                status = '60020' or
                status = '60030' or
                status = '60040' or
                status = '60050' or
                status = '60060')
         union
         select cargo.id,
                order_id,
                cargo_type,
                dic_for_cargo_type.name   cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                dic_for_status.name       status_name,
                collect_type,
                dic_for_collect_type.name collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                ''                        payment_type,
                ''                        payment_type_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from (select after.id,
                      after.order_id,
                      after.cargo_type,
                      after.volume_length,
                      after.volume_width,
                      after.volume_height,
                      after.weight,
                      date_format(
                              from_utc_timestamp(
                                          to_unix_timestamp(concat(substr(after.create_time, 1, 10), ' ',
                                                                   substr(after.create_time, 12, 8))) * 1000,
                                          'GMT+8'), 'yyyy-MM-dd HH:mm:ss') order_time,
                      ts
               from ods_order_cargo_inc
               where dt = '${datestr}'
                 and op = 'c') cargo
                  join
              (select after.id,
                      after.order_no,
                      after.status,
                      after.collect_type,
                      after.user_id,
                      after.receiver_complex_id,
                      after.receiver_province_id,
                      after.receiver_city_id,
                      after.receiver_district_id,
                      concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
                      after.sender_complex_id,
                      after.sender_province_id,
                      after.sender_city_id,
                      after.sender_district_id,
                      concat(substr(after.sender_name, 1, 1), '*')   sender_name,
                      after.cargo_num,
                      after.amount,
                      date_format(from_utc_timestamp(
                                          cast(after.estimate_arrive_time as bigint), 'UTC'),
                                  'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
                      after.distance
               from ods_order_info_inc
               where dt = '${datestr}'
                 and op = 'c') info
              on cargo.order_id = info.id
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_cargo_type
              on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_status
              on info.status = cast(dic_for_status.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_collect_type
              on info.collect_type = cast(dic_for_cargo_type.id as string))
insert overwrite table tms.dwd_trans_deliver_suc_detail_inc
    partition(dt = '${datestr}')
select order_info.id,
       order_id,
       cargo_type,
       cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       deliver_suc_info.deliver_suc_time,
       order_info.order_no,
       deliver_suc_info.status,
       deliver_suc_info.status_name,
       order_info.collect_type,
       collect_type_name,
       order_info.user_id,
       order_info.receiver_complex_id,
       order_info.receiver_province_id,
       order_info.receiver_city_id,
       order_info.receiver_district_id,
       order_info.receiver_name,
       order_info.sender_complex_id,
       order_info.sender_province_id,
       order_info.sender_city_id,
       order_info.sender_district_id,
       order_info.sender_name,
       deliver_suc_info.payment_type,
       deliver_suc_info.payment_type_name,
       order_info.cargo_num,
       order_info.amount,
       order_info.estimate_arrive_time,
       order_info.distance,
       deliver_suc_info.ts
from deliver_suc_info
         join order_info
              on deliver_suc_info.id = order_info.order_id;
"

dwd_trans_sign_detail_inc_sql="
with sign_info
         as
         (select without_status.id,
                 order_no,
                 status,
                 dic_for_status.name status_name,
                 collect_type,
                 user_id,
                 receiver_complex_id,
                 receiver_province_id,
                 receiver_city_id,
                 receiver_district_id,
                 receiver_name,
                 sender_complex_id,
                 sender_province_id,
                 sender_city_id,
                 sender_district_id,
                 sender_name,
                 payment_type,
                 dic_type_name.name  payment_type_name,
                 cargo_num,
                 amount,
                 estimate_arrive_time,
                 distance,
                 sign_time,
                 ts
          from (select after.id,
                       after.order_no,
                       after.status,
                       after.collect_type,
                       after.user_id,
                       after.receiver_complex_id,
                       after.receiver_province_id,
                       after.receiver_city_id,
                       after.receiver_district_id,
                       concat(substr(after.receiver_name, 1, 1), '*')       receiver_name,
                       after.sender_complex_id,
                       after.sender_province_id,
                       after.sender_city_id,
                       after.sender_district_id,
                       concat(substr(after.sender_name, 1, 1), '*')         sender_name,
                       after.payment_type,
                       after.cargo_num,
                       after.amount,
                       date_format(from_utc_timestamp(
                                           cast(after.estimate_arrive_time as bigint), 'UTC'),
                                   'yyyy-MM-dd HH:mm:ss')                   estimate_arrive_time,
                       after.distance,
                       date_format(
                               from_utc_timestamp(
                                           to_unix_timestamp(concat(substr(after.update_time, 1, 10), ' ',
                                                                    substr(after.update_time, 12, 8))) * 1000,
                                           'GMT+8'), 'yyyy-MM-dd HH:mm:ss') sign_time,
                       ts
                from ods_order_info_inc
                where dt = '${datestr}'
                  and op = 'u'
                  and before.status = '60070'
                  and after.status = '60080'
                  and after.is_deleted = '0') without_status
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_for_status
               on without_status.status = cast(dic_for_status.id as string)
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_type_name
               on without_status.payment_type = cast(dic_type_name.id as string)),
     order_info
         as (
         select id,
                order_id,
                cargo_type,
                cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                status_name,
                collect_type,
                collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                payment_type,
                payment_type_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from dwd_trade_order_process_inc
         where dt = '9999-12-31'
         union
         select cargo.id,
                order_id,
                cargo_type,
                dic_for_cargo_type.name   cargo_type_name,
                volume_length,
                volume_width,
                volume_height,
                weight,
                order_time,
                order_no,
                status,
                dic_for_status.name       status_name,
                collect_type,
                dic_for_collect_type.name collect_type_name,
                user_id,
                receiver_complex_id,
                receiver_province_id,
                receiver_city_id,
                receiver_district_id,
                receiver_name,
                sender_complex_id,
                sender_province_id,
                sender_city_id,
                sender_district_id,
                sender_name,
                ''                        payment_type,
                ''                        payment_type_name,
                cargo_num,
                amount,
                estimate_arrive_time,
                distance
         from (select after.id,
                      after.order_id,
                      after.cargo_type,
                      after.volume_length,
                      after.volume_width,
                      after.volume_height,
                      after.weight,
                      date_format(
                              from_utc_timestamp(
                                          to_unix_timestamp(concat(substr(after.create_time, 1, 10), ' ',
                                                                   substr(after.create_time, 12, 8))) * 1000,
                                          'GMT+8'), 'yyyy-MM-dd HH:mm:ss') order_time,
                      ts
               from ods_order_cargo_inc
               where dt = '${datestr}'
                 and op = 'c') cargo
                  join
              (select after.id,
                      after.order_no,
                      after.status,
                      after.collect_type,
                      after.user_id,
                      after.receiver_complex_id,
                      after.receiver_province_id,
                      after.receiver_city_id,
                      after.receiver_district_id,
                      concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
                      after.sender_complex_id,
                      after.sender_province_id,
                      after.sender_city_id,
                      after.sender_district_id,
                      concat(substr(after.sender_name, 1, 1), '*')   sender_name,
                      after.cargo_num,
                      after.amount,
                      date_format(from_utc_timestamp(
                                          cast(after.estimate_arrive_time as bigint), 'UTC'),
                                  'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
                      after.distance
               from ods_order_info_inc
               where dt = '${datestr}'
                 and op = 'c') info
              on cargo.order_id = info.id
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_cargo_type
              on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_status
              on info.status = cast(dic_for_status.id as string)
                  left join
              (select id,
                      name
               from ods_base_dic_full
               where dt = '${datestr}'
                 and is_deleted = '0') dic_for_collect_type
              on info.collect_type = cast(dic_for_cargo_type.id as string))
insert overwrite table tms.dwd_trans_sign_detail_inc
    partition(dt = '${datestr}')
select order_info.id,
       order_id,
       cargo_type,
       cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       sign_info.sign_time,
       order_info.order_no,
       sign_info.status,
       sign_info.status_name,
       order_info.collect_type,
       collect_type_name,
       order_info.user_id,
       order_info.receiver_complex_id,
       order_info.receiver_province_id,
       order_info.receiver_city_id,
       order_info.receiver_district_id,
       order_info.receiver_name,
       order_info.sender_complex_id,
       order_info.sender_province_id,
       order_info.sender_city_id,
       order_info.sender_district_id,
       order_info.sender_name,
       sign_info.payment_type,
       sign_info.payment_type_name,
       order_info.cargo_num,
       order_info.amount,
       order_info.estimate_arrive_time,
       order_info.distance,
       sign_info.ts
from sign_info
         join order_info
              on sign_info.id = order_info.order_id;
"

dwd_trade_order_process_inc_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
with tmp
         as
         (select id,
                 order_id,
                 cargo_type,
                 cargo_type_name,
                 volume_length,
                 volume_width,
                 volume_height,
                 weight,
                 order_time,
                 order_no,
                 status,
                 status_name,
                 collect_type,
                 collect_type_name,
                 user_id,
                 receiver_complex_id,
                 receiver_province_id,
                 receiver_city_id,
                 receiver_district_id,
                 receiver_name,
                 sender_complex_id,
                 sender_province_id,
                 sender_city_id,
                 sender_district_id,
                 sender_name,
                 payment_type,
                 payment_type_name,
                 cargo_num,
                 amount,
                 estimate_arrive_time,
                 distance,
                 ts,
                 start_date,
                 end_date
          from dwd_trade_order_process_inc
          where dt = '9999-12-31'
          union
          select cargo.id,
                 order_id,
                 cargo_type,
                 dic_for_cargo_type.name               cargo_type_name,
                 volume_length,
                 volume_width,
                 volume_height,
                 weight,
                 order_time,
                 order_no,
                 status,
                 dic_for_status.name                   status_name,
                 collect_type,
                 dic_for_collect_type.name             collect_type_name,
                 user_id,
                 receiver_complex_id,
                 receiver_province_id,
                 receiver_city_id,
                 receiver_district_id,
                 receiver_name,
                 sender_complex_id,
                 sender_province_id,
                 sender_city_id,
                 sender_district_id,
                 sender_name,
                 payment_type,
                 dic_for_payment_type.name             payment_type_name,
                 cargo_num,
                 amount,
                 estimate_arrive_time,
                 distance,
                 ts,
                 date_format(order_time, 'yyyy-MM-dd') start_date,
                 '9999-12-31'                          end_date
          from (select after.id,
                       after.order_id,
                       after.cargo_type,
                       after.volume_length,
                       after.volume_width,
                       after.volume_height,
                       after.weight,
                       date_format(
                               from_utc_timestamp(
                                           to_unix_timestamp(concat(substr(after.create_time, 1, 10), ' ',
                                                                    substr(after.create_time, 12, 8))) * 1000,
                                           'GMT+8'), 'yyyy-MM-dd HH:mm:ss') order_time,
                       ts
                from ods_order_cargo_inc
                where dt = '${datestr}'
                  and op = 'c') cargo
                   join
               (select after.id,
                       after.order_no,
                       after.status,
                       after.collect_type,
                       after.user_id,
                       after.receiver_complex_id,
                       after.receiver_province_id,
                       after.receiver_city_id,
                       after.receiver_district_id,
                       concat(substr(after.receiver_name, 1, 1), '*') receiver_name,
                       after.sender_complex_id,
                       after.sender_province_id,
                       after.sender_city_id,
                       after.sender_district_id,
                       concat(substr(after.sender_name, 1, 1), '*')   sender_name,
                       after.payment_type,
                       after.cargo_num,
                       after.amount,
                       date_format(from_utc_timestamp(
                                           cast(after.estimate_arrive_time as bigint), 'UTC'),
                                   'yyyy-MM-dd HH:mm:ss')             estimate_arrive_time,
                       after.distance
                from ods_order_info_inc
                where dt = '${datestr}'
                  and op = 'c') info
               on cargo.order_id = info.id
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_for_cargo_type
               on cargo.cargo_type = cast(dic_for_cargo_type.id as string)
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_for_status
               on info.status = cast(dic_for_status.id as string)
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_for_collect_type
               on info.collect_type = cast(dic_for_cargo_type.id as string)
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_for_payment_type
               on info.payment_type = cast(dic_for_payment_type.id as string)),
     inc
         as
         (select without_type_name.id,
                 status,
                 payment_type,
                 dic_for_payment_type.name payment_type_name
          from (select id,
                       status,
                       payment_type
                from (select after.id,
                             after.status,
                             after.payment_type,
                             row_number() over (partition by after.id order by ts desc) rn
                      from ods_order_info_inc
                      where dt = '${datestr}'
                        and op = 'u'
                        and after.is_deleted = '0'
                     ) inc_origin
                where rn = 1) without_type_name
                   left join
               (select id,
                       name
                from ods_base_dic_full
                where dt = '${datestr}'
                  and is_deleted = '0') dic_for_payment_type
               on without_type_name.payment_type = cast(dic_for_payment_type.id as string)
         )
insert overwrite table dwd_trade_order_process_inc
    partition(dt)
select tmp.id,
       order_id,
       cargo_type,
       cargo_type_name,
       volume_length,
       volume_width,
       volume_height,
       weight,
       order_time,
       order_no,
       inc.status,
       status_name,
       collect_type,
       collect_type_name,
       user_id,
       receiver_complex_id,
       receiver_province_id,
       receiver_city_id,
       receiver_district_id,
       receiver_name,
       sender_complex_id,
       sender_province_id,
       sender_city_id,
       sender_district_id,
       sender_name,
       inc.payment_type,
       inc.payment_type_name,
       cargo_num,
       amount,
       estimate_arrive_time,
       distance,
       ts,
       start_date,
       if(inc.status = '60080' or
          inc.status = '60999',
          '${datestr}', tmp.end_date) end_date,
       if(inc.status = '60080' or
          inc.status = '60999',
          '${datestr}', tmp.end_date) dt
from tmp
         left join inc
                   on tmp.order_id = inc.id;
"

dwd_trans_trans_finish_inc_sql="
insert overwrite table dwd_trans_trans_finish_inc
    partition (dt = '${datestr}')
select info.id,
       shift_id,
       line_id,
       start_org_id,
       start_org_name,
       end_org_id,
       end_org_name,
       order_num,
       driver1_emp_id,
       driver1_name,
       driver2_emp_id,
       driver2_name,
       truck_id,
       truck_no,
       actual_start_time,
       actual_end_time,
       from_unixtime( (to_unix_timestamp(actual_start_time) + estimated_time*60)) estimate_end_time,
       actual_distance,
       finish_dur_sec,
       ts
from (select after.id,
       after.shift_id,
       after.line_id,
       after.start_org_id,
       after.start_org_name,
       after.end_org_id,
       after.end_org_name,
       after.order_num,
       after.driver1_emp_id,
       concat(substr(after.driver1_name, 1, 1), '*')                                            driver1_name,
       after.driver2_emp_id,
       concat(substr(after.driver2_name, 1, 1), '*')                                            driver2_name,
       after.truck_id,
       md5(after.truck_no)                                                                      truck_no,
       date_format(from_utc_timestamp(
                           cast(after.actual_start_time as bigint), 'UTC'),
                   'yyyy-MM-dd HH:mm:ss')                                                       actual_start_time,
       date_format(from_utc_timestamp(
                           cast(after.actual_end_time as bigint), 'UTC'),
                   'yyyy-MM-dd HH:mm:ss')                                                       actual_end_time,
       after.actual_distance,
       (cast(after.actual_end_time as bigint) - cast(after.actual_start_time as bigint)) / 1000 finish_dur_sec,
       ts                                                                                       ts
from ods_transport_task_inc
where dt = '${datestr}'
  and op = 'u'
  and before.actual_end_time is null
  and after.actual_end_time is not null
  and after.is_deleted = '0') info
         left join
     (select id,
             estimated_time
      from dim_shift_full
      where dt = '${datestr}') dim_tb
     on info.shift_id = dim_tb.id;
"

dwd_bound_inbound_inc_sql="
insert overwrite table dwd_bound_inbound_inc
    partition (dt = '${datestr}')
select after.id,
       after.order_id,
       after.org_id,
       date_format(from_utc_timestamp(
                           cast(after.inbound_time as bigint), 'UTC'),
                   'yyyy-MM-dd HH:mm:ss') inbound_time,
       after.inbound_emp_id
from ods_order_org_bound_inc
where dt = '${datestr}'
  and op = 'c';
"

dwd_bound_sort_inc_sql="
insert overwrite table dwd_bound_sort_inc
    partition (dt = '${datestr}')
select after.id,
       after.order_id,
       after.org_id,
       date_format(from_utc_timestamp(
                           cast(after.sort_time as bigint), 'UTC'),
                   'yyyy-MM-dd HH:mm:ss') sort_time,
       after.sorter_emp_id
from ods_order_org_bound_inc
where dt = '${datestr}'
  and op = 'u'
  and before.sort_time is null
  and after.sort_time is not null
  and after.is_deleted = '0';
"

dwd_bound_outbound_inc_sql="
insert overwrite table dwd_bound_outbound_inc
    partition (dt = '${datestr}')
select after.id,
       after.order_id,
       after.org_id,
       date_format(from_utc_timestamp(
                           cast(after.outbound_time as bigint), 'UTC'),
                   'yyyy-MM-dd HH:mm:ss') outbound_time,
       after.outbound_emp_id
from ods_order_org_bound_inc
where dt = '${datestr}'
  and op = 'u'
  and before.outbound_time is null
  and after.outbound_time is not null
  and after.is_deleted = '0';
"
#3、根据表名匹配加载数据
case $1 in
"all")
	/opt/module/hive/bin/hive -e "use tms;${dwd_trade_order_process_inc_sql}${dwd_bound_inbound_inc_sql}${dwd_bound_outbound_inc_sql}${dwd_bound_sort_inc_sql}${dwd_trade_order_cancel_detail_inc_sql}${dwd_trade_order_detail_inc_sql}${dwd_trade_pay_suc_detail_inc_sql}${dwd_trans_bound_finish_detail_inc_sql}${dwd_trans_deliver_suc_detail_inc_sql}${dwd_trans_dispatch_detail_inc_sql}${dwd_trans_receive_detail_inc_sql}${dwd_trans_sign_detail_inc_sql}${dwd_trans_trans_finish_inc_sql}
"
;;
"dwd_bound_inbound_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_bound_inbound_inc_sql}"
;;
"dwd_bound_outbound_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_bound_outbound_inc_sql}"
;;
"dwd_bound_sort_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_bound_sort_inc_sql}"
;;
"dwd_trade_order_cancel_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trade_order_cancel_detail_inc_sql}"
;;
"dwd_trade_order_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trade_order_detail_inc_sql}"
;;
"dwd_trade_order_process_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trade_order_process_inc_sql}"
;;
"dwd_trade_pay_suc_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trade_pay_suc_detail_inc_sql}"
;;
"dwd_trans_bound_finish_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trans_bound_finish_detail_inc_sql}"
;;
"dwd_trans_deliver_suc_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trans_deliver_suc_detail_inc_sql}"
;;
"dwd_trans_dispatch_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trans_dispatch_detail_inc_sql}"
;;
"dwd_trans_receive_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trans_receive_detail_inc_sql}"
;;
"dwd_trans_sign_detail_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trans_sign_detail_inc_sql}"
;;
"dwd_trans_trans_finish_inc")
    /opt/module/hive/bin/hive -e "use tms;${dwd_trans_trans_finish_inc_sql}"
;;
*)
	echo "表名输入错误..."
;;
esac

3)增加脚本执行权限

[kingjw@hadoop102 bin]$ chmod +x ods_to_dwd.sh 

4)脚本用法

[kingjw@hadoop102 bin]$ ods_to_dwd.sh all 2023-01-11

10、数仓开发之DWS层

DWS层的表的结构和ads层的需求相关,提取相同的需求

10.1、最近 1 日汇总表

10.1.1、交易域机构货物类型粒度下单 1 日汇总表

1)建表语句

drop table if exists dws_trade_org_cargo_type_order_1d;
create external table dws_trade_org_cargo_type_order_1d(
	`org_id` bigint comment '机构ID',
	`org_name` string comment '转运站名称',
	`city_id` bigint comment '城市ID',
	`city_name` string comment '城市名称',
	`cargo_type` string comment '货物类型',
	`cargo_type_name` string comment '货物类型名称',
	`order_count` bigint comment '下单数',
	`order_amount` decimal(16,2) comment '下单金额'
) comment '交易域机构货物类型粒度下单 1 日汇总表'
	partitioned by(`dt` string comment '统计日期')
	stored as orc
	location '/warehouse/tms/dws/dws_trade_org_cargo_type_order_1d'
	tblproperties('orc.compress' = 'snappy');

2)数据装载

(1)首日装载

set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trade_org_cargo_type_order_1d
    partition (dt)
select org_id,
       org_name,
       city_id,
       region.name city_name,
       cargo_type,
       cargo_type_name,
       order_count,
       order_amount,
       dt
from (select org_id,
             org_name,
             sender_city_id  city_id,
             cargo_type,
             cargo_type_name,
             count(order_id) order_count,
             sum(amount)     order_amount,
             dt
      from (select order_id,
                   cargo_type,
                   cargo_type_name,
                   sender_district_id,
                   sender_city_id,
                   max(amount) amount,
                   dt
            from (select order_id,
                         cargo_type,
                         cargo_type_name,
                         sender_district_id,
                         sender_city_id,
                         amount,
                         dt
                  from dwd_trade_order_detail_inc) detail
            group by order_id,
                     cargo_type,
                     cargo_type_name,
                     sender_district_id,
                     sender_city_id,
                     dt) distinct_detail
               left join
           (select id org_id,
                   org_name,
                   region_id
            from dim_organ_full
            where dt = '2023-01-10') org
           on distinct_detail.sender_district_id = org.region_id
      group by org_id,
               org_name,
               cargo_type,
               cargo_type_name,
               sender_city_id,
               dt) agg
         left join (
    select id,
           name
    from dim_region_full
    where dt = '2023-01-10'
) region on city_id = region.id;

(2)每日装载

insert overwrite table dws_trade_org_cargo_type_order_1d
    partition (dt = '2023-01-11')
select org_id,
       org_name,
       city_id,
       region.name city_name,
       cargo_type,
       cargo_type_name,
       order_count,
       order_amount
from (select org_id,
             org_name,
             city_id,
             cargo_type,
             cargo_type_name,
             count(order_id) order_count,
             sum(amount)     order_amount
      from (select order_id,
                   cargo_type,
                   cargo_type_name,
                   sender_district_id,
                   sender_city_id city_id,
                   sum(amount)    amount
            from (select order_id,
                         cargo_type,
                         cargo_type_name,
                         sender_district_id,
                         sender_city_id,
                         amount
                  from dwd_trade_order_detail_inc
                  where dt = '2023-01-11') detail
            group by order_id,
                     cargo_type,
                     cargo_type_name,
                     sender_district_id,
                     sender_city_id) distinct_detail
               left join
           (select id org_id,
                   org_name,
                   region_id
            from dim_organ_full
            where dt = '2023-01-11') org
           on distinct_detail.sender_district_id = org.region_id
      group by org_id,
               org_name,
               city_id,
               cargo_type,
               cargo_type_name) agg
         left join (
    select id,
           name
    from dim_region_full
    where dt = '2023-01-11'
) region on city_id = region.id;

10.1.2、数据装载脚本

1首日数据装载脚本

(1)在hadoop102的/home/kingjw/bin目录下创建dwd_to_dws_1d_init.sh

[kingjw@hadoop102 bin]$ vim dwd_to_dws_1d_init.sh 

(2)编写如下内容

#! /bin/bash
#1、判断参数是否传入
if [ $# -lt 2 ]
then
	echo "必须传入all/表名与上线日期..."
	exit
fi

dws_trade_org_cargo_type_order_1d_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trade_org_cargo_type_order_1d
    partition (dt)
select org_id,
       org_name,
       city_id,
       region.name city_name,
       cargo_type,
       cargo_type_name,
       order_count,
       order_amount,
       dt
from (select org_id,
             org_name,
             sender_city_id  city_id,
             cargo_type,
             cargo_type_name,
             count(order_id) order_count,
             sum(amount)     order_amount,
             dt
      from (select order_id,
                   cargo_type,
                   cargo_type_name,
                   sender_district_id,
                   sender_city_id,
                   max(amount) amount,
                   dt
            from (select order_id,
                         cargo_type,
                         cargo_type_name,
                         sender_district_id,
                         sender_city_id,
                         amount,
                         dt
                  from dwd_trade_order_detail_inc) detail
            group by order_id,
                     cargo_type,
                     cargo_type_name,
                     sender_district_id,
                     sender_city_id,
                     dt) distinct_detail
               left join
           (select id org_id,
                   org_name,
                   region_id
            from dim_organ_full
            where dt = '$2') org
           on distinct_detail.sender_district_id = org.region_id
      group by org_id,
               org_name,
               cargo_type,
               cargo_type_name,
               sender_city_id,
               dt) agg
         left join (
    select id,
           name
    from dim_region_full
    where dt = '$2'
) region on city_id = region.id;
"

dws_trans_org_receive_1d_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trans_org_receive_1d
    partition (dt)
select org_id,
       org_name,
       city_id,
       city_name,
       province_id,
       province_name,
       count(order_id)      order_count,
       sum(distinct_amount) order_amount,
       dt
from (select order_id,
             org_id,
             org_name,
             city_id,
             city_name,
             province_id,
             province_name,
             max(amount) distinct_amount,
             dt
      from (select order_id,
                   amount,
                   sender_district_id,
                   dt
            from dwd_trans_receive_detail_inc) detail
               left join
           (select id org_id,
                   org_name,
                   region_id
            from dim_organ_full
            where dt = '$2') organ
           on detail.sender_district_id = organ.region_id
               left join
           (select id,
                   parent_id
            from dim_region_full
            where dt = '$2') district
           on region_id = district.id
               left join
           (select id   city_id,
                   name city_name,
                   parent_id
            from dim_region_full
            where dt = '$2') city
           on district.parent_id = city_id
               left join
           (select id   province_id,
                   name province_name,
                   parent_id
            from dim_region_full
            where dt = '$2') province
           on city.parent_id = province_id
      group by order_id,
               org_id,
               org_name,
               city_id,
               city_name,
               province_id,
               province_name,
               dt) distinct_tb
group by org_id,
         org_name,
         city_id,
         city_name,
         province_id,
         province_name,
         dt;
"

dws_trans_dispatch_1d_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trans_dispatch_1d
    partition (dt)
select count(order_id)      order_count,
       sum(distinct_amount) order_amount,
       dt
from (select order_id,
             dt,
             max(amount) distinct_amount
      from dwd_trans_dispatch_detail_inc
      group by order_id,
               dt) distinct_info
group by dt;
"

dws_trans_org_truck_model_type_trans_finish_1d_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trans_org_truck_model_type_trans_finish_1d
    partition (dt)
select org_id,
       org_name,
       truck_model_type,
       truck_model_type_name,
       count(trans_finish.id) truck_finish_count,
       sum(actual_distance)   trans_finish_distance,
       sum(finish_dur_sec)    finish_dur_sec,
       dt
from (select id,
             start_org_id   org_id,
             start_org_name org_name,
             truck_id,
             actual_distance,
             finish_dur_sec,
             dt
      from dwd_trans_trans_finish_inc) trans_finish
         left join
     (select id,
             truck_model_type,
             truck_model_type_name
      from dim_truck_full
      where dt = '$2') truck_info
     on trans_finish.truck_id = truck_info.id
group by org_id,
         org_name,
         truck_model_type,
         truck_model_type_name,
         dt;
"

dws_trans_org_deliver_suc_1d_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trans_org_deliver_suc_1d
    partition (dt)
select org_id,
       org_name,
       city_id,
       city.name       city_name,
       province_id,
       province.name   province_name,
       count(order_id) order_count,
       dt
from (select order_id,
             receiver_district_id,
             dt
      from dwd_trans_deliver_suc_detail_inc
      group by order_id, receiver_district_id, dt) detail
         left join
     (select id        org_id,
             org_name,
             region_id district_id
      from dim_organ_full
      where dt = '$2') organ
     on detail.receiver_district_id = organ.district_id
         left join
     (select id,
             parent_id city_id
      from dim_region_full
      where dt = '$2') district
     on district_id = district.id
         left join
     (select id,
             name,
             parent_id province_id
      from dim_region_full
      where dt = '$2') city
     on city_id = city.id
         left join
     (select id,
             name
      from dim_region_full
      where dt = '$2') province
     on province_id = province.id
group by org_id,
         org_name,
         city_id,
         city.name,
         province_id,
         province.name,
         dt;
"

dws_trans_org_sort_1d_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trans_org_sort_1d
    partition (dt)
select org_id,
       org_name,
       if(org_level = 1, city_for_level1.id, province_for_level1.id)         city_id,
       if(org_level = 1, city_for_level1.name, province_for_level1.name)     city_name,
       if(org_level = 1, province_for_level1.id, province_for_level2.id)     province_id,
       if(org_level = 1, province_for_level1.name, province_for_level2.name) province_name,
       sort_count,
       dt
from (select org_id,
             count(*) sort_count,
             dt
      from dwd_bound_sort_inc
      group by org_id, dt) agg
         left join
     (select id,
             org_name,
             org_level,
             region_id
      from dim_organ_full
      where dt = '$2') org
     on org_id = org.id
         left join
     (select id,
             name,
             parent_id
      from dim_region_full
      where dt = '$2') city_for_level1
     on region_id = city_for_level1.id
         left join
     (select id,
             name,
             parent_id
      from dim_region_full
      where dt = '$2') province_for_level1
     on city_for_level1.parent_id = province_for_level1.id
         left join
     (select id,
             name,
             parent_id
      from dim_region_full
      where dt = '$2') province_for_level2
     on province_for_level1.parent_id = province_for_level2.id;
"
#2、匹配表名加载数据
case $1 in
"all")
	/opt/module/hive/bin/hive -e "use tms;set hive.exec.dynamic.partition.mode=nonstrict;${dws_trade_org_cargo_type_order_1d_sql}${dws_trans_dispatch_1d_sql}${dws_trans_org_deliver_suc_1d_sql}${dws_trans_org_receive_1d_sql}${dws_trans_org_sort_1d_sql}${dws_trans_org_truck_model_type_trans_finish_1d_sql}"
;;
"dws_trade_org_cargo_type_order_1d")
    /opt/module/hive/bin/hive -e "use tms;set hive.exec.dynamic.partition.mode=nonstrict;${dws_trade_org_cargo_type_order_1d_sql}"
;;
"dws_trans_dispatch_1d")
    /opt/module/hive/bin/hive -e "use tms;set hive.exec.dynamic.partition.mode=nonstrict;${dws_trans_dispatch_1d_sql}"
;;
"dws_trans_org_deliver_suc_1d")
    /opt/module/hive/bin/hive -e "use tms;set hive.exec.dynamic.partition.mode=nonstrict;${dws_trans_org_deliver_suc_1d_sql}"
;;
"dws_trans_org_receive_1d")
    /opt/module/hive/bin/hive -e "use tms;set hive.exec.dynamic.partition.mode=nonstrict;${dws_trans_org_receive_1d_sql}"
;;
"dws_trans_org_sort_1d")
    /opt/module/hive/bin/hive -e "use tms;set hive.exec.dynamic.partition.mode=nonstrict;${dws_trans_org_sort_1d_sql}"
;;
"dws_trans_org_truck_model_type_trans_finish_1d")
    /opt/module/hive/bin/hive -e "use tms;set hive.exec.dynamic.partition.mode=nonstrict;${dws_trans_org_truck_model_type_trans_finish_1d_sql}"
;;
*)
	echo "表名输入错误..."
;;
esac

(3)增加脚本执行权限

[kingjw@hadoop102 bin]$ chmod +x dwd_to_dws_1d_init.sh 

(4)脚本用法

[kingjw@hadoop102 bin]$ dwd_to_dws_1d_init.sh all 2023-01-10

2)每日数据装载脚本

(1)在hadoop102的/home/kingjw/bin目录下创建dwd_to_dws_1d.sh

[kingjw@hadoop102 bin]$ vim dwd_to_dws_1d.sh 

(2)编写如下内容

#! /bin/bash
#1、判断参数是否传入
if [ $# -lt 1 ]
then
	echo "必须传入all/表名..."
	exit
fi

#2、判断日期是否传入
[ "$2" ] && datestr=$2 || datestr=$(date -d '-1 day' +%F)

dws_trade_org_cargo_type_order_1d_sql="
insert overwrite table dws_trade_org_cargo_type_order_1d
    partition (dt = '${datestr}')
select org_id,
       org_name,
       city_id,
       region.name city_name,
       cargo_type,
       cargo_type_name,
       order_count,
       order_amount
from (select org_id,
             org_name,
             city_id,
             cargo_type,
             cargo_type_name,
             count(order_id) order_count,
             sum(amount)     order_amount
      from (select order_id,
                   cargo_type,
                   cargo_type_name,
                   sender_district_id,
                   sender_city_id city_id,
                   sum(amount)    amount
            from (select order_id,
                         cargo_type,
                         cargo_type_name,
                         sender_district_id,
                         sender_city_id,
                         amount
                  from dwd_trade_order_detail_inc
                  where dt = '${datestr}') detail
            group by order_id,
                     cargo_type,
                     cargo_type_name,
                     sender_district_id,
                     sender_city_id) distinct_detail
               left join
           (select id org_id,
                   org_name,
                   region_id
            from dim_organ_full
            where dt = '${datestr}') org
           on distinct_detail.sender_district_id = org.region_id
      group by org_id,
               org_name,
               city_id,
               cargo_type,
               cargo_type_name) agg
         left join (
    select id,
           name
    from dim_region_full
    where dt = '${datestr}'
) region on city_id = region.id;
"

dws_trans_org_receive_1d_sql="
insert overwrite table dws_trans_org_receive_1d
    partition (dt = '${datestr}')
select org_id,
       org_name,
       city_id,
       city_name,
       province_id,
       province_name,
       count(order_id)      order_count,
       sum(distinct_amount) order_amount
from (select order_id,
             org_id,
             org_name,
             city_id,
             city_name,
             province_id,
             province_name,
             max(amount) distinct_amount
      from (select order_id,
                   amount,
                   sender_district_id
            from dwd_trans_receive_detail_inc
            where dt = '${datestr}') detail
               left join
           (select id org_id,
                   org_name,
                   region_id
            from dim_organ_full
            where dt = '${datestr}') organ
           on detail.sender_district_id = organ.region_id
               left join
           (select id,
                   parent_id
            from dim_region_full
            where dt = '${datestr}') district
           on region_id = district.id
               left join
           (select id   city_id,
                   name city_name,
                   parent_id
            from dim_region_full
            where dt = '${datestr}') city
           on district.parent_id = city_id
               left join
           (select id   province_id,
                   name province_name,
                   parent_id
            from dim_region_full
            where dt = '${datestr}') province
           on city.parent_id = province_id
      group by order_id,
               org_id,
               org_name,
               city_id,
               city_name,
               province_id,
               province_name) distinct_tb
group by org_id,
         org_name,
         city_id,
         city_name,
         province_id,
         province_name;
"

dws_trans_dispatch_1d_sql="
insert overwrite table dws_trans_dispatch_1d
    partition (dt = '${datestr}')
select count(order_id)      order_count,
       sum(distinct_amount) order_amount
from (select order_id,
             max(amount) distinct_amount
      from dwd_trans_dispatch_detail_inc
      where dt = '${datestr}'
      group by order_id) distinct_info;
"

dws_trans_org_truck_model_type_trans_finish_1d_sql="
insert overwrite table dws_trans_org_truck_model_type_trans_finish_1d
    partition (dt = '${datestr}')
select org_id,
       org_name,
       truck_model_type,
       truck_model_type_name,
       count(trans_finish.id) truck_finish_count,
       sum(actual_distance)   trans_finish_distance,
       sum(finish_dur_sec)    finish_dur_sec
from (select id,
             start_org_id   org_id,
             start_org_name org_name,
             truck_id,
             actual_distance,
             finish_dur_sec
      from dwd_trans_trans_finish_inc
      where dt = '${datestr}') trans_finish
         left join
     (select id,
             truck_model_type,
             truck_model_type_name
      from dim_truck_full
      where dt = '${datestr}') truck_info
     on trans_finish.truck_id = truck_info.id
group by org_id,
         org_name,
         truck_model_type,
         truck_model_type_name;
"

dws_trans_org_deliver_suc_1d_sql="
insert overwrite table dws_trans_org_deliver_suc_1d
    partition (dt = '${datestr}')
select org_id,
       org_name,
       city_id,
       city.name       city_name,
       province_id,
       province.name   province_name,
       count(order_id) order_count
from (select order_id,
             sender_district_id
      from dwd_trans_deliver_suc_detail_inc
      where dt = '${datestr}'
      group by order_id, sender_district_id) detail
         left join
     (select id        org_id,
             org_name,
             region_id district_id
      from dim_organ_full
      where dt = '${datestr}') organ
     on detail.sender_district_id = organ.district_id
         left join
     (select id,
             parent_id city_id
      from dim_region_full
      where dt = '${datestr}') district
     on district_id = district.id
         left join
     (select id,
             name,
             parent_id province_id
      from dim_region_full
      where dt = '${datestr}') city
     on city_id = city.id
         left join
     (select id,
             name
      from dim_region_full
      where dt = '${datestr}') province
     on province_id = province.id
group by org_id,
         org_name,
         city_id,
         city.name,
         province_id,
         province.name;
"

dws_trans_org_sort_1d_sql="
insert overwrite table dws_trans_org_sort_1d
    partition (dt = '${datestr}')
select org_id,
       org_name,
       if(org_level = 1, city_for_level1.id, province_for_level1.id)         city_id,
       if(org_level = 1, city_for_level1.name, province_for_level1.name)     city_name,
       if(org_level = 1, province_for_level1.id, province_for_level2.id)     province_id,
       if(org_level = 1, province_for_level1.name, province_for_level2.name) province_name,
       sort_count
from (select org_id,
             count(*) sort_count
      from dwd_bound_sort_inc
      where dt = '${datestr}'
      group by org_id) agg
         left join
     (select id,
             org_name,
             org_level,
             region_id
      from dim_organ_full
      where dt = '${datestr}') org
     on org_id = org.id
         left join
     (select id,
             name,
             parent_id
      from dim_region_full
      where dt = '${datestr}') city_for_level1
     on region_id = city_for_level1.id
         left join
     (select id,
             name,
             parent_id
      from dim_region_full
      where dt = '${datestr}') province_for_level1
     on city_for_level1.parent_id = province_for_level1.id
         left join
     (select id,
             name,
             parent_id
      from dim_region_full
      where dt = '${datestr}') province_for_level2
     on province_for_level1.parent_id = province_for_level2.id;
"
#2、匹配表名加载数据
case $1 in
"all")
	/opt/module/hive/bin/hive -e "use tms;${dws_trade_org_cargo_type_order_1d_sql}${dws_trans_dispatch_1d_sql}${dws_trans_org_deliver_suc_1d_sql}${dws_trans_org_receive_1d_sql}${dws_trans_org_sort_1d_sql}${dws_trans_org_truck_model_type_trans_finish_1d_sql}"
;;
"dws_trade_org_cargo_type_order_1d")
    /opt/module/hive/bin/hive -e "use tms;${dws_trade_org_cargo_type_order_1d_sql}"
;;
"dws_trans_dispatch_1d")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_dispatch_1d_sql}"
;;
"dws_trans_org_deliver_suc_1d")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_org_deliver_suc_1d_sql}"
;;
"dws_trans_org_receive_1d")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_org_receive_1d_sql}"
;;
"dws_trans_org_sort_1d")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_org_sort_1d_sql}"
;;
"dws_trans_org_truck_model_type_trans_finish_1d")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_org_truck_model_type_trans_finish_1d_sql}"
;;
*)
	echo "表名输入错误..."
;;
esac

(3)增加脚本执行权限

[kingjw@hadoop102 bin]$ chmod +x dwd_to_dws_1d.sh 

(4)脚本用法

[kingjw@hadoop102 bin]$ dwd_to_dws_1d.sh all 2023-01-11

10.2、最近 n 日汇总表

10.2.1、交易域机构货物类型粒度下单 n 日汇总表

1)建表语句

drop table if exists dws_trade_org_cargo_type_order_nd;
create external table dws_trade_org_cargo_type_order_nd(
	`org_id` bigint comment '机构ID',
	`org_name` string comment '转运站名称',
	`city_id` bigint comment '城市ID',
	`city_name` string comment '城市名称',
	`cargo_type` string comment '货物类型',
	`cargo_type_name` string comment '货物类型名称',
	`recent_days` tinyint comment '最近天数',
	`order_count` bigint comment '下单数',
	`order_amount` decimal(16,2) comment '下单金额'
) comment '交易域机构货物类型粒度下单 n 日汇总表'
	partitioned by(`dt` string comment '统计日期')
	stored as orc
	location '/warehouse/tms/dws/dws_trade_org_cargo_type_order_nd'
	tblproperties('orc.compress' = 'snappy');

2)数据装载

insert overwrite table dws_trade_org_cargo_type_order_nd
    partition (dt = '2023-01-10')
select org_id,
       org_name,
       city_id,
       city_name,
       cargo_type,
       cargo_type_name,
       recent_days,
       sum(order_count)  order_count,
       sum(order_amount) order_amount
from dws_trade_org_cargo_type_order_1d lateral view
    explode(array(7, 30)) tmp as recent_days
where dt >= date_add('2023-01-10', -recent_days + 1)
group by org_id,
         org_name,
         city_id,
         city_name,
         cargo_type,
         cargo_type_name,
         recent_days;

10.2.2、数据装载脚本

1)每日数据装载脚本

(1)在hadoop102的/home/kingjw/bin目录下创建dws_1d_to_dws_nd.sh

[kingjw@hadoop102 bin]$ vim dws_1d_to_dws_nd.sh 

(2)编写如下内容

#! /bin/bash
#1、判断参数是否传入
if [ $# -lt 1 ]
then
	echo "必须传入all/表名..."
	exit
fi

#2、判断日期是否传入
[ "$2" ] && datestr=$2 || datestr=$(date -d '-1 day' +%F)

dws_trade_org_cargo_type_order_nd_sql="
insert overwrite table dws_trade_org_cargo_type_order_nd
    partition (dt = '${datestr}')
select org_id,
       org_name,
       city_id,
       city_name,
       cargo_type,
       cargo_type_name,
       recent_days,
       sum(order_count)  order_count,
       sum(order_amount) order_amount
from dws_trade_org_cargo_type_order_1d lateral view
    explode(array(7, 30)) tmp as recent_days
where dt >= date_add('${datestr}', -recent_days + 1)
group by org_id,
         org_name,
         city_id,
         city_name,
         cargo_type,
         cargo_type_name,
         recent_days;
"
dws_trans_org_receive_nd_sql="
insert overwrite table dws_trans_org_receive_nd
    partition (dt = '${datestr}')
select org_id,
       org_name,
       city_id,
       city_name,
       province_id,
       province_name,
       recent_days,
       sum(order_count)  order_count,
       sum(order_amount) order_amount
from dws_trans_org_receive_1d
         lateral view explode(array(7, 30)) tmp as recent_days
where dt >= date_add('${datestr}', -recent_days + 1)
group by org_id,
         org_name,
         city_id,
         city_name,
         province_id,
         province_name,
         recent_days;
"

dws_trans_dispatch_nd_sql="
insert overwrite table dws_trans_dispatch_nd
    partition (dt = '${datestr}')
select recent_days,
       sum(order_count)  order_count,
       sum(order_amount) order_amount
from dws_trans_dispatch_1d lateral view
    explode(array(7, 30)) tmp as recent_days
where dt >= date_add('${datestr}', -recent_days + 1)
group by recent_days;
"

dws_trans_shift_trans_finish_nd_sql="
insert overwrite table dws_trans_shift_trans_finish_nd
    partition (dt = '${datestr}')
select shift_id,
       if(org_level = 1, first.region_id, city.id)     city_id,
       if(org_level = 1, first.region_name, city.name) city_name,
       org_id,
       org_name,
       line_id,
       line_name,
       driver1_emp_id,
       driver1_name,
       driver2_emp_id,
       driver2_name,
       truck_model_type,
       truck_model_type_name,
       recent_days,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       trans_finish_order_count,
       trans_finish_delay_count
from (select recent_days,
             shift_id,
             line_id,
             truck_id,
             start_org_id                                       org_id,
             start_org_name                                     org_name,
             driver1_emp_id,
             driver1_name,
             driver2_emp_id,
             driver2_name,
             count(id)                                          trans_finish_count,
             sum(actual_distance)                               trans_finish_distance,
             sum(finish_dur_sec)                                trans_finish_dur_sec,
             sum(order_num)                                     trans_finish_order_count,
             sum(if(actual_end_time > estimate_end_time, 1, 0)) trans_finish_delay_count
      from dwd_trans_trans_finish_inc lateral view
          explode(array(7, 30)) tmp as recent_days
      where dt >= date_add('${datestr}', -recent_days + 1)
      group by recent_days,
               shift_id,
               line_id,
               start_org_id,
               start_org_name,
               driver1_emp_id,
               driver1_name,
               driver2_emp_id,
               driver2_name,
               truck_id) aggregated
         left join
     (select id,
             org_level,
             region_id,
             region_name
      from dim_organ_full
      where dt = '${datestr}'
     ) first
     on aggregated.org_id = first.id
         left join
     (select id,
             parent_id
      from dim_region_full
      where dt = '${datestr}'
     ) parent
     on first.region_id = parent.id
         left join
     (select id,
             name
      from dim_region_full
      where dt = '${datestr}'
     ) city
     on parent.parent_id = city.id
         left join
     (select id,
             line_name
      from dim_shift_full
      where dt = '${datestr}') for_line_name
     on shift_id = for_line_name.id
         left join (
    select id,
           truck_model_type,
           truck_model_type_name
    from dim_truck_full
    where dt = '${datestr}'
) truck_info on truck_id = truck_info.id;
"

dws_trans_org_deliver_suc_nd_sql="
insert overwrite table dws_trans_org_deliver_suc_nd
    partition (dt = '${datestr}')
select org_id,
       org_name,
       city_id,
       city_name,
       province_id,
       province_name,
       recent_days,
       sum(order_count) order_count
from dws_trans_org_deliver_suc_1d lateral view
    explode(array(7, 30)) tmp as recent_days
where dt >= date_add('${datestr}', -recent_days + 1)
group by org_id,
         org_name,
         city_id,
         city_name,
         province_id,
         province_name,
         recent_days;
"

dws_trans_org_sort_nd_sql="
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dws_trans_org_sort_nd
    partition (dt = '${datestr}')
select org_id,
       org_name,
       city_id,
       city_name,
       province_id,
       province_name,
       recent_days,
       sum(sort_count) sort_count
from dws_trans_org_sort_1d lateral view
    explode(array(7, 30)) tmp as recent_days
where dt >= date_add('${datestr}', -recent_days + 1)
group by org_id,
         org_name,
         city_id,
         city_name,
         province_id,
         province_name,
         recent_days;
"
#3、匹配表名加载数据
case $1 in
"all")
	/opt/module/hive/bin/hive -e "use tms;${dws_trans_shift_trans_finish_nd_sql}${dws_trade_org_cargo_type_order_nd_sql}${dws_trans_org_sort_nd_sql}${dws_trans_dispatch_nd_sql}${dws_trans_org_receive_nd_sql}${dws_trans_org_deliver_suc_nd_sql}"
;;
"dws_trade_org_cargo_type_order_nd")
    /opt/module/hive/bin/hive -e "use tms;${dws_trade_org_cargo_type_order_nd_sql}"
;;
"dws_trans_dispatch_nd")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_dispatch_nd_sql}"
;;
"dws_trans_org_deliver_suc_nd")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_org_deliver_suc_nd_sql}"
;;
"dws_trans_org_receive_nd")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_org_receive_nd_sql}"
;;
"dws_trans_org_sort_nd")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_org_sort_nd_sql}"
;;
"dws_trans_shift_trans_finish_nd")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_shift_trans_finish_nd_sql}"
;;
*)
	echo "表名输入错误..."
;;
esac

(3)增加脚本执行权限

[kingjw@hadoop102 bin]$ chmod +x dws_1d_to_dws_nd.sh 

(4)脚本用法

[kingjw@hadoop102 bin]$ dws_1d_to_dws_nd.sh all 2023-01-10

10.3、历史至今汇总表

10.3.1、物流域发单历史至今汇总表

1)建表语句

drop table if exists dws_trans_dispatch_td;
create external table dws_trans_dispatch_td(
	`order_count` bigint comment '发单数',
	`order_amount` decimal(16,2) comment '发单金额'
) comment '物流域发单历史至今汇总表'
	partitioned by (`dt` string comment '统计日期')
	stored as orc
	location '/warehouse/tms/dws/dws_trans_dispatch_td'
	tblproperties('orc.compress'='snappy');

2)数据装载

(1)首日装载

insert overwrite table dws_trans_dispatch_td
    partition (dt = '2023-01-10')
select sum(order_count)  order_count,
       sum(order_amount) order_amount
from dws_trans_dispatch_1d;

(2)每日装载

insert overwrite table dws_trans_dispatch_td
    partition (dt = '2023-01-11')
select sum(order_count)  order_count,
       sum(order_amount) order_amount
from (select order_count,
             order_amount
      from dws_trans_dispatch_td
      where dt = date_add('2023-01-11', -1)
      union
      select order_count,
             order_amount
      from dws_trans_dispatch_1d
      where dt = '2023-01-11') all_data;

10.3.2、数据装载脚本

1)首日数据装载脚本

(1)在hadoop102的/home/kingjw/bin目录下创建dws_1d_to_dws_td_init.sh

[kingjw@hadoop102 bin]$ vim dws_1d_to_dws_td_init.sh

(2)编写如下内容

#! /bin/bash
#1、判断参数是否传入
if [ $# -lt 2 ]
then
	echo "必须传入all/表名与数仓上线日期..."
	exit
fi

dws_trans_dispatch_td_sql="
insert overwrite table dws_trans_dispatch_td
    partition (dt = '$2')
select sum(order_count)  order_count,
       sum(order_amount) order_amount
from dws_trans_dispatch_1d;
"

dws_trans_bound_finish_td_sql="
insert overwrite table dws_trans_bound_finish_td
    partition (dt = '$2')
select count(order_id)   order_count,
       sum(order_amount) order_amount
from (select order_id,
             max(amount) order_amount
      from dwd_trans_bound_finish_detail_inc
      group by order_id) distinct_info;
"
#2、匹配表名加载数据
case $1 in
"all")
	/opt/module/hive/bin/hive -e "use tms;${dws_trans_bound_finish_td_sql}${dws_trans_dispatch_td_sql}"
;;
"dws_trans_bound_finish_td")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_bound_finish_td_sql}"
;;
"dws_trans_dispatch_td")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_dispatch_td_sql}"
;;
*)
	echo "表名输入错误..."
;;
esac

(3)增加脚本执行权限

[kingjw@hadoop102 bin]$ chmod +x dws_1d_to_dws_td_init.sh 

(4)脚本用法

[kingjw@hadoop102 bin]$ dws_1d_to_dws_td_init.sh all 2023-01-10

2)每日数据装载脚本

(1)在hadoop102的/home/kingjw/bin目录下创建dws_1d_to_dws_td.sh

[kingjw@hadoop102 bin]$ vim dws_1d_to_dws_td.sh 

(2)编写如下内容

#! /bin/bash
#1、判断参数是否传入
if [ $# -lt 1 ]
then
	echo "必须传入all/表名..."
	exit
fi

#2、判断日期是否传入
[ "$2" ] && datestr=$2 || datestr=$(date -d '-1 day' +%F)

dws_trans_dispatch_td_sql="
insert overwrite table dws_trans_dispatch_td
    partition (dt = '${datestr}')
select sum(order_count)  order_count,
       sum(order_amount) order_amount
from (select order_count,
             order_amount
      from dws_trans_dispatch_td
      where dt = date_add('${datestr}', -1)
      union
      select order_count,
             order_amount
      from dws_trans_dispatch_1d
      where dt = '${datestr}') all_data;
"

dws_trans_bound_finish_td_sql="
insert overwrite table dws_trans_bound_finish_td
    partition (dt = '${datestr}')
select sum(order_count)  order_count,
       sum(order_amount) order_amount
from (select order_count,
             order_amount
      from dws_trans_bound_finish_td
      where dt = date_add('${datestr}', -1)
      union
      select count(order_id)   order_count,
             sum(order_amount) order_amount
      from (select order_id,
                   max(amount) order_amount
            from dwd_trans_bound_finish_detail_inc
            where dt = '${datestr}'
            group by order_id) distinct_tb) all_data;
"
#3、匹配表名加载数据
case $1 in
"all")
	/opt/module/hive/bin/hive -e "use tms;${dws_trans_bound_finish_td_sql}${dws_trans_dispatch_td_sql}"
;;
"dws_trans_bound_finish_td")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_bound_finish_td_sql}"
;;
"dws_trans_dispatch_td")
    /opt/module/hive/bin/hive -e "use tms;${dws_trans_dispatch_td_sql}"
;;
*)
	echo "表名输入错误..."
;;
esac

(3)增加脚本执行权限

[kingjw@hadoop102 bin]$ chmod +x dws_1d_to_dws_td.sh

(4)脚本用法

[kingjw@hadoop102 bin]$ dws_1d_to_dws_td.sh all 2023-01-11

11、数仓开发之ADS层

Ads层是数据本身数据量很少,所以不需要分区。

在计算完数据之后将数据用insert加在表后面,如果hql逻辑有问题,修改后重复运行会有重复数据。而用overwrite覆盖写入,会覆盖掉历史数据。可以将上一天之前的数据全部取出,unit all 上新计算的数据,再overwrite写入就可以了。

11.1、物流主题

11.1.1、运单相关统计

需求说明如下。

统计周期

统计粒度

指标

说明

最近 1/7/30 日

--

接单总数

最近 1/7/30 日

--

接单金额

最近 1/7/30 日

--

发单总数

最近 1/7/30 日

--

发单金额

1)建表语句

drop table if exists ads_trans_order_stats;
create external table ads_trans_order_stats(
  `dt` string COMMENT '统计日期',
  `recent_days` tinyint COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
  `receive_order_count` bigint COMMENT '接单总数',
  `receive_order_amount` decimal(16,2) COMMENT '接单金额',
  `dispatch_order_count` bigint COMMENT '发单总数',
  `dispatch_order_amount` decimal(16,2) COMMENT '发单金额'
) comment '运单相关统计'
    row format delimited fields terminated by '\t'
    location '/warehouse/tms/ads/ads_trans_order_stats';

2)数据装载

insert overwrite table ads_trans_order_stats
select dt,
       recent_days,
       receive_order_count,
       receive_order_amount,
       dispatch_order_count,
       dispatch_order_amount
from ads_trans_order_stats
where dt != '2023-01-10' -- 防止一次运行报错,修改后二次运行导致数据重复
union
select '2023-01-10'                                         dt,
       nvl(receive_1d.recent_days, dispatch_1d.recent_days) recent_days,
       receive_order_count,
       receive_order_amount,
       dispatch_order_count,
       dispatch_order_amount
from (select 1                 recent_days,
             sum(order_count)  receive_order_count,
             sum(order_amount) receive_order_amount
      from dws_trans_org_receive_1d
      where dt = '2023-01-10') receive_1d
         full outer join
     (select 1            recent_days,
             order_count  dispatch_order_count,
             order_amount dispatch_order_amount
      from dws_trans_dispatch_1d
      where dt = '2023-01-10') dispatch_1d
     on receive_1d.recent_days = dispatch_1d.recent_days
union
select '2023-01-10'                                         dt,
       nvl(receive_nd.recent_days, dispatch_nd.recent_days) recent_days,
       receive_order_count,
       receive_order_amount,
       dispatch_order_count,
       dispatch_order_amount
from (select recent_days,
             sum(order_count)  receive_order_count,
             sum(order_amount) receive_order_amount
      from dws_trans_org_receive_nd
      where dt = '2023-01-10'
      group by recent_days) receive_nd
         full outer join
     (select recent_days,
             order_count  dispatch_order_count,
             order_amount dispatch_order_amount
      from dws_trans_dispatch_nd
      where dt = '2023-01-10') dispatch_nd
     on receive_nd.recent_days = dispatch_nd.recent_days;

11.2、数据装载脚本

1)每日数据装载脚本

(1)在hadoop102的/home/kingjw/bin目录下创建dws_to_ads.sh

[kingjw@hadoop102 bin]$ vim dws_to_ads.sh 

(2)编写如下内容

#! /bin/bash
#1、判断参数是否传入
if [ $# -lt 1 ]
then
	echo "必须传入all/表名..."
	exit
fi

#2、判断日期是否传入
[ "$2" ] && datestr=$2 || datestr=$(date -d '-1 day' +%F)

ads_trans_order_stats_sql="
insert overwrite table ads_trans_order_stats
select dt,
       recent_days,
       receive_order_count,
       receive_order_amount,
       dispatch_order_count,
       dispatch_order_amount
from ads_trans_order_stats
where dt != '${datestr}'
union
select '${datestr}'                                         dt,
       nvl(receive_1d.recent_days, dispatch_1d.recent_days) recent_days,
       receive_order_count,
       receive_order_amount,
       dispatch_order_count,
       dispatch_order_amount
from (select 1                 recent_days,
             sum(order_count)  receive_order_count,
             sum(order_amount) receive_order_amount
      from dws_trans_org_receive_1d
      where dt = '${datestr}') receive_1d
         full outer join
     (select 1            recent_days,
             order_count  dispatch_order_count,
             order_amount dispatch_order_amount
      from dws_trans_dispatch_1d
      where dt = '${datestr}') dispatch_1d
     on receive_1d.recent_days = dispatch_1d.recent_days
union
select '${datestr}'                                         dt,
       nvl(receive_nd.recent_days, dispatch_nd.recent_days) recent_days,
       receive_order_count,
       receive_order_amount,
       dispatch_order_count,
       dispatch_order_amount
from (select recent_days,
             sum(order_count)  receive_order_count,
             sum(order_amount) receive_order_amount
      from dws_trans_org_receive_nd
      where dt = '${datestr}'
      group by recent_days) receive_nd
         full outer join
     (select recent_days,
             order_count  dispatch_order_count,
             order_amount dispatch_order_amount
      from dws_trans_dispatch_nd
      where dt = '${datestr}') dispatch_nd
     on receive_nd.recent_days = dispatch_nd.recent_days;
"

ads_trans_stats_sql="
insert overwrite table ads_trans_stats
select dt,
       recent_days,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec
from ads_trans_stats
where dt != '${datestr}'
union
select '${datestr}'               dt,
       1                          recent_days,
       sum(trans_finish_count)    trans_finish_count,
       sum(trans_finish_distance) trans_finish_distance,
       sum(trans_finish_dur_sec)  trans_finish_dur_sec
from dws_trans_org_truck_model_type_trans_finish_1d
where dt = '${datestr}'
union
select '${datestr}'               dt,
       recent_days,
       sum(trans_finish_count)    trans_finish_count,
       sum(trans_finish_distance) trans_finish_distance,
       sum(trans_finish_dur_sec)  trans_finish_dur_sec
from dws_trans_shift_trans_finish_nd
where dt = '${datestr}'
group by recent_days;
"
ads_trans_order_stats_td_sql="
insert overwrite table ads_trans_order_stats_td
select dt,
       bounding_order_count,
       bounding_order_amount
from ads_trans_order_stats_td
where dt != '${datestr}'
union
select dt,
       sum(order_count)  bounding_order_count,
       sum(order_amount) bounding_order_amount
from (select dt,
             order_count,
             order_amount
      from dws_trans_dispatch_td
      where dt = '${datestr}'
      union
      select dt,
             order_count * (-1),
             order_amount * (-1)
      from dws_trans_bound_finish_td
      where dt = '${datestr}') new
group by dt;
"

ads_order_stats_sql="
insert overwrite table ads_order_stats
select dt,
       recent_days,
       order_count,
       order_amount
from ads_order_stats
where dt != '${datestr}'
union
select '${datestr}'      dt,
       1                 recent_days,
       sum(order_count)  order_count,
       sum(order_amount) order_amount
from dws_trade_org_cargo_type_order_1d
where dt = '${datestr}'
union
select '${datestr}'      dt,
       recent_days,
       sum(order_count)  order_count,
       sum(order_amount) order_amount
from dws_trade_org_cargo_type_order_nd
where dt = '${datestr}'
group by recent_days;
"

ads_order_cargo_type_stats_sql="
insert overwrite table ads_order_cargo_type_stats
select dt,
       recent_days,
       cargo_type,
       cargo_type_name,
       order_count,
       order_amount
from ads_order_cargo_type_stats
where dt != '${datestr}'
union
select '${datestr}'      dt,
       1                 recent_days,
       cargo_type,
       cargo_type_name,
       sum(order_count)  order_count,
       sum(order_amount) order_amount
from dws_trade_org_cargo_type_order_1d
where dt = '${datestr}'
group by cargo_type,
         cargo_type_name
union
select '${datestr}'      dt,
       recent_days,
       cargo_type,
       cargo_type_name,
       sum(order_count)  order_count,
       sum(order_amount) order_amount
from dws_trade_org_cargo_type_order_nd
where dt = '${datestr}'
group by cargo_type,
         cargo_type_name,
         recent_days;
"

ads_city_stats_sql="
insert overwrite table ads_city_stats
select dt,
       recent_days,
       city_id,
       city_name,
       order_count,
       order_amount,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       avg_trans_finish_distance,
       avg_trans_finish_dur_sec
from ads_city_stats
where dt != '${datestr}'
union
select nvl(city_order_1d.dt, city_trans_1d.dt)                   dt,
       nvl(city_order_1d.recent_days, city_trans_1d.recent_days) recent_days,
       nvl(city_order_1d.city_id, city_trans_1d.city_id)         city_id,
       nvl(city_order_1d.city_name, city_trans_1d.city_name)     city_name,
       order_count,
       order_amount,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       avg_trans_finish_distance,
       avg_trans_finish_dur_sec
from (select '${datestr}'      dt,
             1                 recent_days,
             city_id,
             city_name,
             sum(order_count)  order_count,
             sum(order_amount) order_amount
      from dws_trade_org_cargo_type_order_1d
      where dt = '${datestr}'
      group by city_id,
               city_name) city_order_1d
         full outer join
     (select '${datestr}'                                         dt,
             1                                                    recent_days,
             city_id,
             city_name,
             sum(trans_finish_count)                              trans_finish_count,
             sum(trans_finish_distance)                           trans_finish_distance,
             sum(trans_finish_dur_sec)                            trans_finish_dur_sec,
             sum(trans_finish_distance) / sum(trans_finish_count) avg_trans_finish_distance,
             sum(trans_finish_dur_sec) / sum(trans_finish_count)  avg_trans_finish_dur_sec
      from (select if(org_level = 1, city_for_level1.id, city_for_level2.id)     city_id,
                   if(org_level = 1, city_for_level1.name, city_for_level2.name) city_name,
                   trans_finish_count,
                   trans_finish_distance,
                   trans_finish_dur_sec
            from (select org_id,
                         trans_finish_count,
                         trans_finish_distance,
                         trans_finish_dur_sec
                  from dws_trans_org_truck_model_type_trans_finish_1d
                  where dt = '${datestr}') trans_origin
                     left join
                 (select id,
                         org_level,
                         region_id
                  from dim_organ_full
                  where dt = '${datestr}') organ
                 on org_id = organ.id
                     left join
                 (select id,
                         name,
                         parent_id
                  from dim_region_full
                  where dt = '${datestr}') city_for_level1
                 on region_id = city_for_level1.id
                     left join
                 (select id,
                         name
                  from dim_region_full
                  where dt = '${datestr}') city_for_level2
                 on city_for_level1.parent_id = city_for_level2.id) trans_1d
      group by city_id,
               city_name) city_trans_1d
     on city_order_1d.dt = city_trans_1d.dt
         and city_order_1d.recent_days = city_trans_1d.recent_days
         and city_order_1d.city_id = city_trans_1d.city_id
         and city_order_1d.city_name = city_trans_1d.city_name
union
select nvl(city_order_nd.dt, city_trans_nd.dt)                   dt,
       nvl(city_order_nd.recent_days, city_trans_nd.recent_days) recent_days,
       nvl(city_order_nd.city_id, city_trans_nd.city_id)         city_id,
       nvl(city_order_nd.city_name, city_trans_nd.city_name)     city_name,
       order_count,
       order_amount,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       avg_trans_finish_distance,
       avg_trans_finish_dur_sec
from (select '${datestr}'      dt,
             recent_days,
             city_id,
             city_name,
             sum(order_count)  order_count,
             sum(order_amount) order_amount
      from dws_trade_org_cargo_type_order_nd
      where dt = '${datestr}'
      group by city_id,
               city_name,
               recent_days) city_order_nd
         full outer join
     (select '${datestr}'                                         dt,
             city_id,
             city_name,
             recent_days,
             sum(trans_finish_count)                              trans_finish_count,
             sum(trans_finish_distance)                           trans_finish_distance,
             sum(trans_finish_dur_sec)                            trans_finish_dur_sec,
             sum(trans_finish_distance) / sum(trans_finish_count) avg_trans_finish_distance,
             sum(trans_finish_dur_sec) / sum(trans_finish_count)  avg_trans_finish_dur_sec
      from dws_trans_shift_trans_finish_nd
      where dt = '${datestr}'
      group by city_id,
               city_name,
               recent_days
     ) city_trans_nd
     on city_order_nd.dt = city_trans_nd.dt
         and city_order_nd.recent_days = city_trans_nd.recent_days
         and city_order_nd.city_id = city_trans_nd.city_id
         and city_order_nd.city_name = city_trans_nd.city_name;
"

ads_org_stats_sql="
insert overwrite table ads_org_stats
select dt,
       recent_days,
       org_id,
       org_name,
       order_count,
       order_amount,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       avg_trans_finish_distance,
       avg_trans_finish_dur_sec
from ads_org_stats
where dt != '${datestr}'
union
select nvl(org_order_1d.dt, org_trans_1d.dt)                   dt,
       nvl(org_order_1d.recent_days, org_trans_1d.recent_days) recent_days,
       nvl(org_order_1d.org_id, org_trans_1d.org_id)           org_id,
       nvl(org_order_1d.org_name, org_trans_1d.org_name)       org_name,
       order_count,
       order_amount,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       avg_trans_finish_distance,
       avg_trans_finish_dur_sec
from (select '${datestr}'      dt,
             1                 recent_days,
             org_id,
             org_name,
             sum(order_count)  order_count,
             sum(order_amount) order_amount
      from dws_trade_org_cargo_type_order_1d
      where dt = '${datestr}'
      group by org_id,
               org_name) org_order_1d
         full outer join
     (select '${datestr}'                                         dt,
             org_id,
             org_name,
             1                                                    recent_days,
             sum(trans_finish_count)                              trans_finish_count,
             sum(trans_finish_distance)                           trans_finish_distance,
             sum(trans_finish_dur_sec)                            trans_finish_dur_sec,
             sum(trans_finish_distance) / sum(trans_finish_count) avg_trans_finish_distance,
             sum(trans_finish_dur_sec) / sum(trans_finish_count)  avg_trans_finish_dur_sec
      from dws_trans_org_truck_model_type_trans_finish_1d
      where dt = '${datestr}'
      group by org_id,
               org_name
     ) org_trans_1d
     on org_order_1d.dt = org_trans_1d.dt
         and org_order_1d.recent_days = org_trans_1d.recent_days
         and org_order_1d.org_id = org_trans_1d.org_id
         and org_order_1d.org_name = org_trans_1d.org_name
union
select org_order_nd.dt,
       org_order_nd.recent_days,
       org_order_nd.org_id,
       org_order_nd.org_name,
       order_count,
       order_amount,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       avg_trans_finish_distance,
       avg_trans_finish_dur_sec
from (select '${datestr}'      dt,
             recent_days,
             org_id,
             org_name,
             sum(order_count)  order_count,
             sum(order_amount) order_amount
      from dws_trade_org_cargo_type_order_nd
      where dt = '${datestr}'
      group by org_id,
               org_name,
               recent_days) org_order_nd
         join
     (select '${datestr}'                                         dt,
             recent_days,
             org_id,
             org_name,
             sum(trans_finish_count)                              trans_finish_count,
             sum(trans_finish_distance)                           trans_finish_distance,
             sum(trans_finish_dur_sec)                            trans_finish_dur_sec,
             sum(trans_finish_distance) / sum(trans_finish_count) avg_trans_finish_distance,
             sum(trans_finish_dur_sec) / sum(trans_finish_count)  avg_trans_finish_dur_sec
      from dws_trans_shift_trans_finish_nd
      where dt = '${datestr}'
      group by org_id,
               org_name,
               recent_days
     ) org_trans_nd
     on org_order_nd.dt = org_trans_nd.dt
         and org_order_nd.recent_days = org_trans_nd.recent_days
         and org_order_nd.org_id = org_trans_nd.org_id
         and org_order_nd.org_name = org_trans_nd.org_name;
"

ads_shift_stats_sql="
insert overwrite table ads_shift_stats
select dt,
       recent_days,
       shift_id,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       trans_finish_order_count
from ads_shift_stats
where dt != '${datestr}'
union
select '${datestr}' dt,
       recent_days,
       shift_id,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       trans_finish_order_count
from dws_trans_shift_trans_finish_nd
where dt = '${datestr}';
"

ads_line_stats_sql="
insert overwrite table ads_line_stats
select dt,
       recent_days,
       line_id,
       line_name,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       trans_finish_order_count
from ads_line_stats
where dt != '${datestr}'
union
select '${datestr}'                  dt,
       recent_days,
       line_id,
       line_name,
       sum(trans_finish_count)       trans_finish_count,
       sum(trans_finish_distance)    trans_finish_distance,
       sum(trans_finish_dur_sec)     trans_finish_dur_sec,
       sum(trans_finish_order_count) trans_finish_order_count
from dws_trans_shift_trans_finish_nd
where dt = '${datestr}'
group by line_id,
         line_name,
         recent_days;
"

ads_driver_stats_sql="
insert overwrite table ads_driver_stats
select dt,
       recent_days,
       driver_emp_id,
       driver_name,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       avg_trans_finish_distance,
       avg_trans_finish_dur_sec,
       trans_finish_late_count
from ads_driver_stats
where dt != '${datestr}'
union
select '${datestr}'                                         dt,
       recent_days,
       driver_id,
       driver_name,
       sum(trans_finish_count)                              trans_finish_count,
       sum(trans_finish_distance)                           trans_finish_distance,
       sum(trans_finish_dur_sec)                            trans_finish_dur_sec,
       sum(trans_finish_distance) / sum(trans_finish_count) avg_trans_finish_distance,
       sum(trans_finish_dur_sec) / sum(trans_finish_count)  avg_trans_finish_dur_sec,
       sum(trans_finish_delay_count)                        trans_finish_delay_count
from (select recent_days,
             driver1_emp_id driver_id,
             driver1_name   driver_name,
             trans_finish_count,
             trans_finish_distance,
             trans_finish_dur_sec,
             trans_finish_delay_count
      from dws_trans_shift_trans_finish_nd
      where dt = '${datestr}'
        and driver2_emp_id is null
      union
      select recent_days,
             cast(driver_info[0] as bigint) driver_id,
             driver_info[1] driver_name,
             trans_finish_count,
             trans_finish_distance,
             trans_finish_dur_sec,
             trans_finish_delay_count
      from (select recent_days,
                   array(array(driver1_emp_id, driver1_name),
                         array(driver2_emp_id, driver2_name)) driver_arr,
                   trans_finish_count,
                   trans_finish_distance / 2                  trans_finish_distance,
                   trans_finish_dur_sec / 2                   trans_finish_dur_sec,
                   trans_finish_delay_count
            from dws_trans_shift_trans_finish_nd
            where dt = '${datestr}'
              and driver2_emp_id is not null) t1
               lateral view explode(driver_arr) tmp as driver_info) t2
group by driver_id,
         driver_name,
         recent_days;
"

ads_truck_stats_sql="
insert overwrite table ads_truck_stats
select dt,
       recent_days,
       truck_model_type,
       truck_model_type_name,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       avg_trans_finish_distance,
       avg_trans_finish_dur_sec
from ads_truck_stats
where dt != '${datestr}'
union
select '${datestr}'                                         dt,
       recent_days,
       truck_model_type,
       truck_model_type_name,
       sum(trans_finish_count)                              trans_finish_count,
       sum(trans_finish_distance)                           trans_finish_distance,
       sum(trans_finish_dur_sec)                            trans_finish_dur_sec,
       sum(trans_finish_distance) / sum(trans_finish_count) avg_trans_finish_distance,
       sum(trans_finish_dur_sec) / sum(trans_finish_count)  avg_trans_finish_dur_sec
from dws_trans_shift_trans_finish_nd
where dt = '${datestr}'
group by truck_model_type,
         truck_model_type_name,
         recent_days;
"

ads_express_stats_sql="
insert overwrite table ads_truck_stats
select dt,
       recent_days,
       truck_model_type,
       truck_model_type_name,
       trans_finish_count,
       trans_finish_distance,
       trans_finish_dur_sec,
       avg_trans_finish_distance,
       avg_trans_finish_dur_sec
from ads_truck_stats
where dt != '${datestr}'
union
select '${datestr}'                                         dt,
       recent_days,
       truck_model_type,
       truck_model_type_name,
       sum(trans_finish_count)                              trans_finish_count,
       sum(trans_finish_distance)                           trans_finish_distance,
       sum(trans_finish_dur_sec)                            trans_finish_dur_sec,
       sum(trans_finish_distance) / sum(trans_finish_count) avg_trans_finish_distance,
       sum(trans_finish_dur_sec) / sum(trans_finish_count)  avg_trans_finish_dur_sec
from dws_trans_shift_trans_finish_nd
where dt = '${datestr}'
group by truck_model_type,
         truck_model_type_name,
         recent_days;insert overwrite table ads_express_stats
select dt,
       recent_days,
       deliver_suc_count,
       sort_count
from ads_express_stats
union
select nvl(deliver_1d.dt, sort_1d.dt)                   dt,
       nvl(deliver_1d.recent_days, sort_1d.recent_days) recent_days,
       deliver_suc_count,
       sort_count
from (select '${datestr}'     dt,
             1                recent_days,
             sum(order_count) deliver_suc_count
      from dws_trans_org_deliver_suc_1d
      where dt = '${datestr}') deliver_1d
         full outer join
     (select '${datestr}'    dt,
             1               recent_days,
             sum(sort_count) sort_count
      from dws_trans_org_sort_1d
      where dt = '${datestr}') sort_1d
     on deliver_1d.dt = sort_1d.dt
         and deliver_1d.recent_days = sort_1d.recent_days
union
select nvl(deliver_nd.dt, sort_nd.dt)                   dt,
       nvl(deliver_nd.recent_days, sort_nd.recent_days) recent_days,
       deliver_suc_count,
       sort_count
from (select '${datestr}'     dt,
             recent_days,
             sum(order_count) deliver_suc_count
      from dws_trans_org_deliver_suc_nd
      where dt = '${datestr}'
      group by recent_days) deliver_nd
         full outer join
     (select '${datestr}'    dt,
             recent_days,
             sum(sort_count) sort_count
      from dws_trans_org_sort_nd
      where dt = '${datestr}'
      group by recent_days) sort_nd
     on deliver_nd.dt = sort_nd.dt
         and deliver_nd.recent_days = sort_nd.recent_days;
"

ads_express_province_stats_sql="
insert overwrite table ads_express_province_stats
select dt,
       recent_days,
       province_id,
       province_name,
       receive_order_count,
       receive_order_amount,
       deliver_suc_count,
       sort_count
from ads_express_province_stats
where dt != '${datestr}'
union
select nvl(nvl(province_deliver_1d.dt, province_sort_1d.dt), province_receive_1d.dt) dt,
       nvl(nvl(province_deliver_1d.recent_days, province_sort_1d.recent_days),
           province_receive_1d.recent_days)                                          recent_days,
       nvl(nvl(province_deliver_1d.province_id, province_sort_1d.province_id),
           province_receive_1d.province_id)                                          province_id,
       nvl(nvl(province_deliver_1d.province_name, province_sort_1d.province_name),
           province_receive_1d.province_name)                                        province_name,
       receive_order_count,
       receive_order_amount,
       deliver_suc_count,
       sort_count
from (select '${datestr}'     dt,
             1                recent_days,
             province_id,
             province_name,
             sum(order_count) deliver_suc_count
      from dws_trans_org_deliver_suc_1d
      where dt = '${datestr}'
      group by province_id,
               province_name) province_deliver_1d
         full outer join
     (select '${datestr}'    dt,
             1               recent_days,
             province_id,
             province_name,
             sum(sort_count) sort_count
      from dws_trans_org_sort_1d
      where dt = '${datestr}'
      group by province_id,
               province_name) province_sort_1d
     on province_deliver_1d.dt = province_sort_1d.dt
         and province_deliver_1d.recent_days = province_sort_1d.recent_days
         and province_deliver_1d.province_id = province_sort_1d.province_id
         and province_deliver_1d.province_name = province_sort_1d.province_name
         full outer join
     (select '${datestr}'      dt,
             1                 recent_days,
             province_id,
             province_name,
             sum(order_count)  receive_order_count,
             sum(order_amount) receive_order_amount
      from dws_trans_org_receive_1d
      where dt = '${datestr}'
      group by province_id,
               province_name) province_receive_1d
     on province_deliver_1d.dt = province_receive_1d.dt
         and province_deliver_1d.recent_days = province_receive_1d.recent_days
         and province_deliver_1d.province_id = province_receive_1d.province_id
         and province_deliver_1d.province_name = province_receive_1d.province_name
union
select nvl(nvl(province_deliver_nd.dt, province_sort_nd.dt), province_receive_nd.dt) dt,
       nvl(nvl(province_deliver_nd.recent_days, province_sort_nd.recent_days),
           province_receive_nd.recent_days)                                          recent_days,
       nvl(nvl(province_deliver_nd.province_id, province_sort_nd.province_id),
           province_receive_nd.province_id)                                          province_id,
       nvl(nvl(province_deliver_nd.province_name, province_sort_nd.province_name),
           province_receive_nd.province_name)                                        province_name,
       receive_order_count,
       receive_order_amount,
       deliver_suc_count,
       sort_count
from (select '${datestr}'     dt,
             recent_days,
             province_id,
             province_name,
             sum(order_count) deliver_suc_count
      from dws_trans_org_deliver_suc_nd
      where dt = '${datestr}'
      group by recent_days,
               province_id,
               province_name) province_deliver_nd
         full outer join
     (select '${datestr}'    dt,
             recent_days,
             province_id,
             province_name,
             sum(sort_count) sort_count
      from dws_trans_org_sort_nd
      where dt = '${datestr}'
      group by recent_days,
               province_id,
               province_name) province_sort_nd
     on province_deliver_nd.dt = province_sort_nd.dt
         and province_deliver_nd.recent_days = province_sort_nd.recent_days
         and province_deliver_nd.province_id = province_sort_nd.province_id
         and province_deliver_nd.province_name = province_sort_nd.province_name
         full outer join
     (select '${datestr}'      dt,
             recent_days,
             province_id,
             province_name,
             sum(order_count)  receive_order_count,
             sum(order_amount) receive_order_amount
      from dws_trans_org_receive_nd
      where dt = '${datestr}'
      group by recent_days,
               province_id,
               province_name) province_receive_nd
     on province_deliver_nd.dt = province_receive_nd.dt
         and province_deliver_nd.recent_days = province_receive_nd.recent_days
         and province_deliver_nd.province_id = province_receive_nd.province_id
         and province_deliver_nd.province_name = province_receive_nd.province_name;
"

ads_express_city_stats_sql="
insert overwrite table ads_express_city_stats
select dt,
       recent_days,
       city_id,
       city_name,
       receive_order_count,
       receive_order_amount,
       deliver_suc_count,
       sort_count
from ads_express_city_stats
where dt != '${datestr}'
union
select nvl(nvl(city_deliver_1d.dt, city_sort_1d.dt), city_receive_1d.dt) dt,
       nvl(nvl(city_deliver_1d.recent_days, city_sort_1d.recent_days),
           city_receive_1d.recent_days)                                  recent_days,
       nvl(nvl(city_deliver_1d.city_id, city_sort_1d.city_id),
           city_receive_1d.city_id)                                      city_id,
       nvl(nvl(city_deliver_1d.city_name, city_sort_1d.city_name),
           city_receive_1d.city_name)                                    city_name,
       receive_order_count,
       receive_order_amount,
       deliver_suc_count,
       sort_count
from (select '${datestr}'     dt,
             1                recent_days,
             city_id,
             city_name,
             sum(order_count) deliver_suc_count
      from dws_trans_org_deliver_suc_1d
      where dt = '${datestr}'
      group by city_id,
               city_name) city_deliver_1d
         full outer join
     (select '${datestr}'    dt,
             1               recent_days,
             city_id,
             city_name,
             sum(sort_count) sort_count
      from dws_trans_org_sort_1d
      where dt = '${datestr}'
      group by city_id,
               city_name) city_sort_1d
     on city_deliver_1d.dt = city_sort_1d.dt
         and city_deliver_1d.recent_days = city_sort_1d.recent_days
         and city_deliver_1d.city_id = city_sort_1d.city_id
         and city_deliver_1d.city_name = city_sort_1d.city_name
         full outer join
     (select '${datestr}'      dt,
             1                 recent_days,
             city_id,
             city_name,
             sum(order_count)  receive_order_count,
             sum(order_amount) receive_order_amount
      from dws_trans_org_receive_1d
      where dt = '${datestr}'
      group by city_id,
               city_name) city_receive_1d
     on city_deliver_1d.dt = city_receive_1d.dt
         and city_deliver_1d.recent_days = city_receive_1d.recent_days
         and city_deliver_1d.city_id = city_receive_1d.city_id
         and city_deliver_1d.city_name = city_receive_1d.city_name
union
select nvl(nvl(city_deliver_nd.dt, city_sort_nd.dt), city_receive_nd.dt) dt,
       nvl(nvl(city_deliver_nd.recent_days, city_sort_nd.recent_days),
           city_receive_nd.recent_days)                                  recent_days,
       nvl(nvl(city_deliver_nd.city_id, city_sort_nd.city_id),
           city_receive_nd.city_id)                                      city_id,
       nvl(nvl(city_deliver_nd.city_name, city_sort_nd.city_name),
           city_receive_nd.city_name)                                    city_name,
       receive_order_count,
       receive_order_amount,
       deliver_suc_count,
       sort_count
from (select '${datestr}'     dt,
             recent_days,
             city_id,
             city_name,
             sum(order_count) deliver_suc_count
      from dws_trans_org_deliver_suc_nd
      where dt = '${datestr}'
      group by recent_days,
               city_id,
               city_name) city_deliver_nd
         full outer join
     (select '${datestr}'    dt,
             recent_days,
             city_id,
             city_name,
             sum(sort_count) sort_count
      from dws_trans_org_sort_nd
      where dt = '${datestr}'
      group by recent_days,
               city_id,
               city_name) city_sort_nd
     on city_deliver_nd.dt = city_sort_nd.dt
         and city_deliver_nd.recent_days = city_sort_nd.recent_days
         and city_deliver_nd.city_id = city_sort_nd.city_id
         and city_deliver_nd.city_name = city_sort_nd.city_name
         full outer join
     (select '${datestr}'      dt,
             recent_days,
             city_id,
             city_name,
             sum(order_count)  receive_order_count,
             sum(order_amount) receive_order_amount
      from dws_trans_org_receive_nd
      where dt = '${datestr}'
      group by recent_days,
               city_id,
               city_name) city_receive_nd
     on city_deliver_nd.dt = city_receive_nd.dt
         and city_deliver_nd.recent_days = city_receive_nd.recent_days
         and city_deliver_nd.city_id = city_receive_nd.city_id
         and city_deliver_nd.city_name = city_receive_nd.city_name;
"

ads_express_org_stats_sql="
insert overwrite table ads_express_org_stats
select dt,
       recent_days,
       org_id,
       org_name,
       receive_order_count,
       receive_order_amount,
       deliver_suc_count,
       sort_count
from ads_express_org_stats
where dt != '${datestr}'
union
select nvl(nvl(org_deliver_1d.dt, org_sort_1d.dt), org_receive_1d.dt) dt,
       nvl(nvl(org_deliver_1d.recent_days, org_sort_1d.recent_days),
           org_receive_1d.recent_days)                                recent_days,
       nvl(nvl(org_deliver_1d.org_id, org_sort_1d.org_id),
           org_receive_1d.org_id)                                     org_id,
       nvl(nvl(org_deliver_1d.org_name, org_sort_1d.org_name),
           org_receive_1d.org_name)                                   org_name,
       receive_order_count,
       receive_order_amount,
       deliver_suc_count,
       sort_count
from (select '${datestr}'     dt,
             1                recent_days,
             org_id,
             org_name,
             sum(order_count) deliver_suc_count
      from dws_trans_org_deliver_suc_1d
      where dt = '${datestr}'
      group by org_id,
               org_name) org_deliver_1d
         full outer join
     (select '${datestr}'    dt,
             1               recent_days,
             org_id,
             org_name,
             sum(sort_count) sort_count
      from dws_trans_org_sort_1d
      where dt = '${datestr}'
      group by org_id,
               org_name) org_sort_1d
     on org_deliver_1d.dt = org_sort_1d.dt
         and org_deliver_1d.recent_days = org_sort_1d.recent_days
         and org_deliver_1d.org_id = org_sort_1d.org_id
         and org_deliver_1d.org_name = org_sort_1d.org_name
         full outer join
     (select '${datestr}'      dt,
             1                 recent_days,
             org_id,
             org_name,
             sum(order_count)  receive_order_count,
             sum(order_amount) receive_order_amount
      from dws_trans_org_receive_1d
      where dt = '${datestr}'
      group by org_id,
               org_name) org_receive_1d
     on org_deliver_1d.dt = org_receive_1d.dt
         and org_deliver_1d.recent_days = org_receive_1d.recent_days
         and org_deliver_1d.org_id = org_receive_1d.org_id
         and org_deliver_1d.org_name = org_receive_1d.org_name
union
select nvl(nvl(org_deliver_nd.dt, org_sort_nd.dt), org_receive_nd.dt) dt,
       nvl(nvl(org_deliver_nd.recent_days, org_sort_nd.recent_days),
           org_receive_nd.recent_days)                                recent_days,
       nvl(nvl(org_deliver_nd.org_id, org_sort_nd.org_id),
           org_receive_nd.org_id)                                     org_id,
       nvl(nvl(org_deliver_nd.org_name, org_sort_nd.org_name),
           org_receive_nd.org_name)                                   org_name,
       receive_order_count,
       receive_order_amount,
       deliver_suc_count,
       sort_count
from (select '${datestr}'     dt,
             recent_days,
             org_id,
             org_name,
             sum(order_count) deliver_suc_count
      from dws_trans_org_deliver_suc_nd
      where dt = '${datestr}'
      group by recent_days,
               org_id,
               org_name) org_deliver_nd
         full outer join
     (select '${datestr}'    dt,
             recent_days,
             org_id,
             org_name,
             sum(sort_count) sort_count
      from dws_trans_org_sort_nd
      where dt = '${datestr}'
      group by recent_days,
               org_id,
               org_name) org_sort_nd
     on org_deliver_nd.dt = org_sort_nd.dt
         and org_deliver_nd.recent_days = org_sort_nd.recent_days
         and org_deliver_nd.org_id = org_sort_nd.org_id
         and org_deliver_nd.org_name = org_sort_nd.org_name
         full outer join
     (select '${datestr}'      dt,
             recent_days,
             org_id,
             org_name,
             sum(order_count)  receive_order_count,
             sum(order_amount) receive_order_amount
      from dws_trans_org_receive_nd
      where dt = '${datestr}'
      group by recent_days,
               org_id,
               org_name) org_receive_nd
     on org_deliver_nd.dt = org_receive_nd.dt
         and org_deliver_nd.recent_days = org_receive_nd.recent_days
         and org_deliver_nd.org_id = org_receive_nd.org_id
         and org_deliver_nd.org_name = org_receive_nd.org_name;
"
#3、匹配表名加载数据
case $1 in
"all")
	/opt/module/hive/bin/hive -e "use tms;${ads_city_stats_sql}${ads_driver_stats_sql}${ads_express_city_stats_sql}${ads_express_org_stats_sql}${ads_express_province_stats_sql}${ads_express_stats_sql}${ads_line_stats_sql}${ads_order_cargo_type_stats_sql}${ads_order_stats_sql}${ads_org_stats_sql}${ads_shift_stats_sql}${ads_trans_order_stats_sql}${ads_trans_order_stats_td_sql}${ads_trans_stats_sql}${ads_truck_stats_sql}"
;;
"ads_city_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_city_stats_sql}"
;;
"ads_driver_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_driver_stats_sql}"
;;
"ads_express_city_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_express_city_stats_sql}"
;;
"ads_express_org_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_express_org_stats_sql}"
;;
"ads_express_province_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_express_province_stats_sql}"
;;
"ads_express_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_express_stats_sql}"
;;
"ads_line_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_line_stats_sql}"
;;
"ads_order_cargo_type_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_order_cargo_type_stats_sql}"
;;
"ads_order_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_order_stats_sql}"
;;
"ads_org_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_org_stats_sql}"
;;
"ads_shift_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_shift_stats_sql}"
;;
"ads_trans_order_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_trans_order_stats_sql}"
;;
"ads_trans_order_stats_td")
    /opt/module/hive/bin/hive -e "use tms;${ads_trans_order_stats_td_sql}"
;;
"ads_trans_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_trans_stats_sql}"
;;
"ads_truck_stats")
    /opt/module/hive/bin/hive -e "use tms;${ads_truck_stats_sql}"
;;
*)
	echo "表名输入错误..."
;;
esac

(3)增加脚本执行权限

[kingjw@hadoop102 bin]$ chmod +x dws_to_ads.sh 

(4)脚本用法

[kingjw@hadoop102 bin]$ dws_to_ads.sh all 2023-01-10

12、报表数据导出

为方便报表应用使用数据,需将ads各指标的统计结果导出到MySQL数据库中。

12.1、MySQL建库建表

12.1.1、创建数据库

CREATE DATABASE IF NOT EXISTS tms_report DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

12.1.2、创建表(结构和ads层表一样)

1)运单相关统计

drop table if exists ads_trans_order_stats;
create table ads_trans_order_stats(
  `dt` date NOT NULL COMMENT '统计日期',
  `recent_days` tinyint NOT NULL COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天',
  `receive_order_count` bigint COMMENT '接单总数',
  `receive_order_amount` decimal(16,2) COMMENT '接单金额',
  `dispatch_order_count` bigint COMMENT '发单总数',
  `dispatch_order_amount` decimal(16,2) COMMENT '发单金额',
  PRIMARY KEY (`dt`, `recent_days`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci comment '运单相关统计'
    ROW_FORMAT = DYNAMIC;

12.2、数据导出

数据导出工具选用DataX,选用HDFSReader和MySQLWriter。

12.2.1、编写DataX配置文件

我们需要为每个张表编写一个DataX配置文件,此处以ads_trans_order_stats为例,配置文件内容如下:

{
    "job":{
        "content":[
            {
                "writer":{
                    "parameter":{
                        "username":"root",
                        "writeMode":"replace",
                        "connection":[
                            {
                                "table":[
                                    "ads_trans_order_stats"
                                ],
                                "jdbcUrl":"jdbc:mysql://hadoop102:3306/tms_report?useUnicode=true&characterEncoding=utf-8"
                            }
                        ],
                        "password":"000000",
                        "column":[
                            "dt",
                            "recent_days",
                            "receive_order_count",
                            "receive_order_amount",
                            "dispatch_order_count",
                            "dispatch_order_amount"
                        ]
                    },
                    "name":"mysqlwriter"
                },
                "reader":{
                    "parameter":{
                        "nullFormat":"\\N",
                        "fieldDelimiter":"\t",
                        "encoding":"UTF-8",
                        "column":[
                            "*"
                        ],
                        "path":"${exportdir}",
                        "fileType":"text",
                        "defaultFS":"hdfs://mycluster",
                        "hadoopConfig":{
                            "dfs.nameservices":"mycluster",
                            "dfs.namenode.rpc-address.mycluster.nn2":"hadoop103:8020",
                            "dfs.namenode.rpc-address.mycluster.nn1":"hadoop102:8020",
                            "dfs.client.failover.proxy.provider.mycluster":"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                            "dfs.ha.namenodes.mycluster":"nn1,nn2"
                        }
                    },
                    "name":"hdfsreader"
                }
            }
        ],
        "setting":{
            "speed":{
                "channel":3
            },
            "errorLimit":{
                "record":0,
                "percentage":0.02
            }
        }
    }
}

注:导出路径path参数并未写死,需在提交任务时通过参数动态传入,参数名称为exportdir。

12.2.2、编写每日导出脚本

1 通用dataxjson配置文件

[kingjw@hadoop102 bin]$ vim /opt/module/datax/job/hdfs_to_mysql.json

写入

{
    "job":{
        "content":[
            {
                "writer":{
                    "parameter":{
                        "username":"root",
                        "writeMode":"update",
                        "connection":[
                            {
                                "table":[
                                    "${tableName}"
                                ],
                                "jdbcUrl":"jdbc:mysql://hadoop102:3306/tms_report?useUnicode=true&characterEncoding=utf-8"
                            }
                        ],
                        "password":"123456",
                        "column":["*"]
                    },
                    "name":"mysqlwriter"
                },
                "reader":{
                    "parameter":{
                        "nullFormat":"\\N",
                        "fieldDelimiter":"\t",
                        "encoding":"UTF-8",
                        "column":[
                            "*"
                        ],
                        "path":"/warehouse/tms/ads/${tableName}/*",
                        "fileType":"text",
                        "defaultFS":"hdfs://mycluster",
                        "hadoopConfig":{
                            "dfs.nameservices":"mycluster",
                            "dfs.namenode.rpc-address.mycluster.nn2":"hadoop103:8020",
                            "dfs.namenode.rpc-address.mycluster.nn1":"hadoop102:8020",
                            "dfs.client.failover.proxy.provider.mycluster":"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                            "dfs.ha.namenodes.mycluster":"nn1,nn2"
                        }
                    },
                    "name":"hdfsreader"
                }
            }
        ],
        "setting":{
            "speed":{
                "channel":3
            },
            "errorLimit":{
                "record":0,
                "percentage":0.02
            }
        }
    }
}

2)数据从adsmysql的脚本

[kingjw@hadoop102 bin]$ vim ads_to_mysql.sh

3)编写如下内容

#! /bin/bash
# ads_to_mysql.sh 表名/all

# 1、判断参数是否传入
if [ $# -lt 1 ]
then
	echo "必须输入all/表名..."
	exit
fi

# 2、根据表名导数据
case $1 in
"all")
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_city_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_driver_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_express_city_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_express_org_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_express_province_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_express_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_line_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_order_cargo_type_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_order_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_org_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_shift_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_trans_order_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_trans_order_stats_td" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_trans_stats" /opt/module/datax/job/hdfs_to_mysql.json
	python /opt/module/datax/bin/datax.py -p"-DtableName=ads_truck_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_city_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_city_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_driver_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_driver_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_express_city_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_express_city_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_express_org_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_express_org_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_express_province_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_express_province_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_express_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_express_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_line_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_line_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_order_cargo_type_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_order_cargo_type_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_order_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_order_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_org_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_org_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_shift_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_shift_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_trans_order_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_trans_order_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_trans_order_stats_td")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_trans_order_stats_td" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_trans_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_trans_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
"ads_truck_stats")
    python /opt/module/datax/bin/datax.py -p"-DtableName=ads_truck_stats" /opt/module/datax/job/hdfs_to_mysql.json
;;
*)
	echo "表名输入错误..."
;;
esac

4)增加脚本执行权限

[kingjw@hadoop102 bin]$ chmod +x ads_to_mysql.sh 

5)脚本用法

[kingjw@hadoop102 bin]$ ads_to_mysql.sh all

13、数据仓库工作流调度

13.1、调度工具部署

13.2、新数据生成

1)启动采集通道,包括Kafka、Flume等

(1)启动Zookeeper

[kingjw@hadoop102 ~]$ zk.sh start

(2)启动Kafka

[kingjw@hadoop102 ~]$ kf.sh start

(3)启动Flume

[kingjw@hadoop102 ~]$ tms-f1.sh start

2)修改日志模拟器配置文件

修改hadoop102的/opt/module/tms/application.yml文件,修改mock.date参数如下

mock.date: "2023-01-11"

3)生成数据

[kingjw@hadoop102 tms]$ java -jar tms-mock-2023-01-06.jar

4) 执行flink-cdc.sh脚本,从上次的检查点开始启动[检查点根据自己实际情况配置]采集增量数据,如下

[kingjw@hadoop102 tms]$ flink-cdc.sh start hdfs://hadoop102:8020/tms/sp/ods_app/savepoint-a6f47c-e430fae35e6e 2023-01-11

5) 启动datax采集全量数据

[kingjw@hadoop102 bin]$ mysql_to_hdfs_full.sh all 2023-01-11

6)观察HDFS上是否有2023-01-11的日志数据和增量表数据生成

13.3、工作流调度实操

        由于DolphinScheduler集群模式启动进程较多,对虚拟机内存要求较高。故下面提供两种方式,可根据虚拟机内存情况进行选择。

13.3.1、DolphinScheduler集群模式

1)启动DolphinScheduler

[kingjw@hadoop102 dolphinscheduler]$ bin/start-all.sh 

2)使用普通用户登录

3)向DolphinScheduler资源中心上传工作流所需脚本

(1)创建文件夹

(2)上传工作流所需脚本

将工作流所需的所有脚本上传到资源中心scripts路径下,结果如下。

4)向DolphinScheduler的WorkerServer节点分发脚本依赖的组件

由于工作流要执行的脚本需要调用Hive、DataX等组件,故在DolphinScheduler的集群模式下,需要确保每个WorkerServer节点都有脚本所依赖的组件。

(这里可以创建work时候选择一台拥有全部组件的节点然后用这个work进行调度就行实际生产会有专门的节点这些组件专门用于调度)

[kingjw@hadoop102 ~]$ xsync /opt/module/hive/
[kingjw@hadoop102 ~]$ xsync /opt/module/spark/
[kingjw@hadoop102 ~]$ xsync /opt/module/datax/

5)创建所需环境

(1)切换到admin用户

(2)点击环境管理/创建环境 

3)配置环境

注:具体环境变量如下

export HADOOP_HOME=/opt/module/hadoop
export HADOOP_CONF_DIR=/opt/module/hadoop/etc/hadoop
export SPARK_HOME=/opt/module/spark
export JAVA_HOME=/opt/module/java
export HIVE_HOME=/opt/module/hive
export DATAX_HOME=/opt/module/datax

export PATH=$PATH:$HADOOP_HOME/bin:$SPARK_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$DATAX_HOME/bin

6)创建工作流

(0)切换到普通用户

1)在tms项目下创建工作流

2)各任务节点配置如下

①mysql_to_hdfs_full

其它类似

3)各节点依赖关系如下

4)保存工作流

注:定时调度时,全局参数值应设置为$[yyyy-MM-dd-1]或者空值。

7)上线工作流

8)执行工作流(生产环境是定时调度)

13.3.2、DolphinScheduler单机模式

1)启动DolphinScheduler

[kingjw@hadoop102 dolphinscheduler]$ bin/dolphinscheduler-daemon.sh start standalone-server

2)其余操作

其余操作与集群模式基本一致,其中分发Hive、Spark、DataX这一步可以省略。

14、Superset报表可视化展示

数据可视化-Superset安装和使用-CSDN博客

15、总结

        这个物流数仓的离线数仓模块是尚硅谷的教学视频。

        在B站的视频:尚硅谷大数据项目《物流项目之离线数仓》_哔哩哔哩_bilibili

        这里介绍了物流数仓的离线数仓的架构和开发流程,数据仓库的设计原理和建模理论及实现。

        离线数仓是大数据的核心模块,需要整个流程和使用到的相关组件全部掌握。

        有帮助的话帮忙点个赞吧!

  • 0
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
### 回答1: 如果要设计一个基于 Hive 的网站访问数据的离线数仓,需要以下步骤: 1. 收集网站的访问数据,并存储在 Hadoop 分布式文件系统(HDFS)上。 2. 利用 Hive 建立数据仓库,并对访问数据进行结构化。 3. 在 Hive 中创建数据表,并导入数据。 4. 使用 Hive 查询语言(HQL)进行数据分析,生成统计报告和图表。 5. 将分析结果导出到外部存储,便于进一步使用。 6. 定期对数据仓库进行维护和更新,保证数据的最新性。 这些步骤可以帮助你设计一个稳健的、高效的离线数仓,用于分析网站的访问数据。 ### 回答2: 基于Hive的网站访问数据的离线数仓设计需要以下步骤: 1. 数据采集和存储:通过日志收集器收集网站访问日志,将这些日志数据传送到Hive数据仓库中进行存储。可以使用Flume或Kafka等工具来实现数据采集和传输。 2. 数据清洗和转换:使用Hive的ETL功能对原始数据进行清洗和转换。这包括去除无效数据、处理数据格式、合并数据等。可以使用Hive的查询语言(例如HQL)来实现这些操作。 3. 数据建模和分区:根据网站访问数据的需求,设计合适的数据模型,并进行分区以提高查询性能。可以使用Hive的表分区功能根据时间、地域、用户等维度进行数据分区。 4. 数据加载和索引:使用Hive的加载命令将清洗和转换后的数据加载到数据仓库中,并根据查询需求创建索引以加快查询速度。可以使用Hive的分区索引或bitmap索引等技术来实现数据加载和索引。 5. 数据查询和分析:通过Hive的查询语言对数据仓库中的网站访问数据进行查询和分析,并生成相应的报表和可视化结果。可以使用Hive的聚合函数、分组和排序等功能来实现数据查询和分析。 6. 数据备份和恢复:定期对数据仓库进行备份,以防止数据丢失或损坏。可以使用Hive的导出和导入功能将数据仓库中的数据备份到其他存储系统,如HDFS或云存储。 通过以上步骤设计的基于Hive的网站访问数据的离线数仓可以实现高效的数据存储、查询和分析,为网站运营和决策提供可靠的数据支持。 ### 回答3: 基于Hive的网站访问数据的离线数仓设计如下: 1. 数据源收集:首先,需要收集网站访问数据源,包括网站服务器日志、用户行为数据、广告点击等相关数据。这些数据可以通过Flume等数据采集工具实时收集到Hadoop集群上。 2. 数据预处理:将收集到的原始数据进行预处理,包括数据清洗、字段解析、数据格式转换等。可以使用Hive进行数据清洗和转换操作。 3. 数据存储:将预处理后的数据存储到Hive或者HBase中,便于后续的查询分析。Hive提供了数据仓库的功能,并可以通过Hive SQL进行数据查询和分析操作。 4. 数据分区:根据业务需求将数据进行分区,可以按照时间、地区、用户等维度进行分区,提高查询效率和性能。 5. 数据建模:设计合适的数据模型,根据业务需求创建表结构,并建立关联关系。可以使用Hive的表和分区来组织数据,并使用Hive内置的函数和脚本来处理数据。 6. 查询分析:根据业务需求,使用Hive SQL语句进行查询和分析操作,例如统计网站的访问量、独立访客数、页面流量等指标。可以使用Hive的MapReduce和Tez来进行大规模数据处理和计算。 7. 数据可视化:将查询结果通过可视化工具(如Tableau、Superset等)进行展示,生成直观的数据报表和可视化图表,供业务人员和决策者进行数据分析和决策。 8. 定期更新:根据实际情况,定期将新的网站访问数据导入数据仓库进行更新,保持数据的实时性和准确性。 通过以上步骤,可以设计一个基于Hive的网站访问数据的离线数仓,实现对网站访问数据的离线分析和查询,为业务决策提供数据支持。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值