Enterprise Data Warehouse Project I (Big Data, Week 12, Day 1)

A Data Warehouse for the Gaming Industry: the User-Behavior Warehouse
1. Basic Concepts of the Data Warehouse

1. What is a data warehouse?
In *Building the Data Warehouse*, W. H. Inmon defines the data warehouse as a collection of data that is:
subject-oriented,
integrated,
non-volatile, and
time-variant,
in support of management's decision-making process.
Subject-oriented
An operational database organizes its data around transaction-processing tasks, and each business system stands on its own; a data warehouse organizes its data by subject area.
A subject is an abstraction: a standard for classifying data, covering the aspects users care about when they use the warehouse to make decisions. A single subject usually relates to several operational systems, and each subject corresponds roughly to one macro-level area of analysis.
For example, a subject in a bank's data warehouse: the customer.
Customer data is assembled by extracting from the bank's savings, credit-card, and loan databases, among others. The customer information in these systems may or may not be consistent, so it must be consolidated before it presents the customer completely.
Integrated
Transaction-oriented operational databases are usually tied to particular applications, independent of each other, and often heterogeneous. Warehouse data, by contrast, is produced by extracting and cleaning the scattered source data and then systematically processing, summarizing, and consolidating it. Inconsistencies among the sources must be eliminated so that the warehouse holds consistent, enterprise-wide information.
Concretely:
1. After data enters the warehouse, and before it is used, it must be processed and integrated.
2. Data structures and encodings are unified across sources, resolving every contradiction in the raw data: identical field names with different meanings, different names with the same meaning, inconsistent units, inconsistent field lengths, and so on (see the sketch below).
3. The structure of the raw data undergoes a major shift from application-oriented to subject-oriented.
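As a minimal sketch of point 2 (all table and column names here are invented for illustration), an ETL statement might reconcile two source systems that encode gender differently before the rows land in the warehouse:

-- source A stores gender as 'M'/'F'; source B stores 1/0;
-- the warehouse standardizes on '1' = male, '0' = female
INSERT INTO dw_customer (customer_id, gender)
SELECT customer_id,
       CASE gender WHEN 'M' THEN '1' WHEN 'F' THEN '0' END
FROM src_a_customer
UNION ALL
SELECT cust_no, CAST(sex AS CHAR)
FROM src_b_customer;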

Non-volatile, i.e., relatively stable
Operational data is updated in real time, changing as the business requires. Warehouse data mainly serves decision analysis, so the dominant operation is the query: once a record enters the warehouse, it is normally retained for a long time. A warehouse sees a great many queries but few modifications or deletions; usually it only needs periodic loading and refreshing.
A data warehouse contains a large amount of historical data.
Once data has been integrated into the warehouse, it is rarely, if ever, updated.
Time-variant, i.e., reflecting historical change
An operational database is concerned mainly with the data of the current period, whereas a warehouse carries historical information: it records the enterprise from some point in the past (say, the point the warehouse went live) through every stage up to the present. With that information one can quantitatively analyze and predict how the business has developed and where it is heading. Building an enterprise warehouse therefore rests on the existing business systems and the large volume of business data they have accumulated.
A data warehouse is not a static notion. Information creates value only when it reaches the people who need it in time for them to improve their business decisions; organizing, summarizing, and restructuring information and delivering it promptly to decision makers is the warehouse's fundamental task. Seen from industry practice, then, building a data warehouse is an engineering effort, an ongoing process.
Data in a warehouse is typically kept for 5-10 years or more, sometimes never deleted. The key of every record contains a time element marking the data's historical period, which makes time-trend analysis convenient.
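Because every key carries a time element, trend analysis reduces to grouping on that time column. A hypothetical example (dw_customer_snapshot is an invented table holding one row per customer per snapshot date):

SELECT snapshot_date, COUNT(*) AS active_customers
FROM dw_customer_snapshot
WHERE snapshot_date >= '2016-01-01'
GROUP BY snapshot_date
ORDER BY snapshot_date;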

2. The Evolution of the Data Warehouse
(figure: the evolution of data warehouse technology)

3. Databases versus Data Warehouses
(figure: comparison of databases and data warehouses)
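Summarizing the contrasts described above: an operational database is organized around transaction processing, holds the current data of a single application, and is updated continuously in small increments; a data warehouse is organized around subjects, holds integrated, consistent, historical data spanning years, and is loaded periodically in bulk and then mostly queried.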

4. What a Data Warehouse Is Mainly For
1. Supporting data extraction
Data extraction serves the data needs of every business unit in the enterprise.
What used to be different departments raising data requests against different business systems becomes a single flow in which all requests are raised, uniformly, against the data warehouse.

2. Supporting reporting systems
Built on the enterprise data warehouse, reporting serves the statistical-report needs of every department and helps support day-to-day operating decisions.

3. Supporting data analysis (BI)
BI (business intelligence) is a class of solution:
useful data is extracted from the many different business systems and cleaned to guarantee its correctness, then extracted, transformed, and loaded (the ETL process) into an enterprise-level data warehouse, yielding one global view of the enterprise's data;
on that foundation, appropriate query and analysis tools, data-mining tools, and OLAP tools analyze and process the data (information now becomes knowledge that aids decisions);
finally the knowledge is presented to managers, supporting their decision-making process.

4. Supporting data mining
Data mining, also known as knowledge discovery in databases (KDD), applies advanced computational intelligence to large volumes of data so that a computer, with or without human guidance, can discover latent, useful patterns (knowledge) in them.
In *Data Mining: Concepts and Techniques*, Jiawei Han defines data mining as the process of discovering interesting patterns and knowledge from large amounts of data, where the data sources can include databases, data warehouses, the Web, other information repositories, or data streaming into the system dynamically.

5. Supporting data applications
Telecom: tourist passenger-flow analysis and crowd profiling from location data
Telecom: crowd monitoring and early warning from location data
Banking: financial profiles of customers from their transaction data
E-commerce: user tag systems and recommendation engines from browsing and purchase behavior
Credit bureaus: credit scoring from users' credit histories


5. Data Warehouse Architecture
After many years of development, data warehouse technology and architecture have become very mature and stable; both the overall warehouse structure and the usual technology choices are well settled across the industry. The architecture below is one such proven design.

(figure: a typical, proven data warehouse architecture)

2. Background: the Gaming Industry
2.1 Types of game company
Game companies on the market today fall into two broad categories:
game development companies and game operations companies.
A development company builds games itself: it has game designers; an art team, with concept artists and visual designers who create the scenes and characters; 3D modelers; rendering artists; and game-development engineers.


For example, 《泡泡堂》 (BnB) was developed by the Korean company NEXON and operated in China by Shanda; Shanda later funded the Qihang studio, which developed 《龙之谷手游》 (Dragon Nest Mobile). Tencent's TiMi studio developed 《QQ飞车》 (QQ Speed) and 《逆战》, both operated by Tencent.
Development teams usually exist as studios rather than limited-liability companies. Having no operations or customer-service staff of their own, they typically sell the game to a company that specializes in operations, which then runs the game, handles customer service and player support, and plans in-game events. The two sides split the gross flow (流水), i.e., the money players pay in top-ups.

That flow is not split between the developer and the operator alone: third-party distribution channels take a cut as well. If a game is downloaded through Apple's App Store, Apple takes a share; if through Tencent's MyApp (应用宝), Tencent does.

In gaming-industry data analysis and warehouse design, the channel is therefore a very important dimension.
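As a rough sketch of what channel-level analysis looks like (fact_payment and dim_channel are invented names, and the 30% cut is only an illustrative figure), a revenue-share report might run:

SELECT d.channel_name,
       SUM(f.pay_money)       AS gross_flow,
       SUM(f.pay_money) * 0.3 AS channel_share  -- illustrative 30% channel cut
FROM fact_payment f
JOIN dim_channel d ON f.channel_id = d.channel_id
GROUP BY d.channel_name;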

2.2 System architecture: how site activity produces logs
(figure: log generation and collection architecture)

Here we collect the user tracking (event) logs together with the data in the business database, and build a single, unified warehouse over both.
3. Generating the Business Data
To simulate realistic data we use a Java program to generate records and save them into the database. Below, we first walk through the business tables involved.
1. Prepare the database tables
1 Create the MySQL database
CREATE DATABASE /*!32312 IF NOT EXISTS*/ jiuyou_game /*!40100 DEFAULT CHARACTER SET utf8 */;
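The /*!NNNNN ... */ markers are MySQL version-conditional comments: a server whose version is at least the embedded number (32312 = 3.23.12, 40100 = 4.1.0) executes the enclosed text, while older servers and other SQL dialects treat it as an ordinary comment.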

USE jiuyou_game;
2 Article detail table
DROP TABLE IF EXISTS article_detail;

CREATE TABLE article_detail (
article_id int(64) NOT NULL AUTO_INCREMENT,
article_name varchar(256) DEFAULT NULL COMMENT 'article title',
article_little_content text COMMENT 'article summary',
article_content text COMMENT 'article body',
article_time datetime DEFAULT NULL COMMENT 'publication time',
jiuyou_show_time datetime DEFAULT NULL COMMENT 'time shown on the jiuyou site',
article_author_name varchar(128) DEFAULT NULL COMMENT 'author name',
article_user_id int(32) DEFAULT NULL COMMENT 'author user id',
article_status varchar(16) DEFAULT NULL COMMENT 'article status 1: visible 2: taken down 3: deleted',
article_create_time datetime DEFAULT NULL COMMENT 'row creation time',
article_update_time datetime DEFAULT NULL COMMENT 'row update time',
first_class varchar(256) DEFAULT NULL COMMENT 'first-level category',
first_class_url varchar(256) DEFAULT NULL COMMENT 'first-level category url',
second_class varchar(256) DEFAULT NULL COMMENT 'second-level category',
second_class_url varchar(256) DEFAULT NULL COMMENT 'second-level category url',
third_class varchar(256) DEFAULT NULL COMMENT 'third-level category',
third_class_url varchar(256) DEFAULT NULL COMMENT 'third-level category url',
fourth_class varchar(256) DEFAULT NULL COMMENT 'fourth-level category',
fourth_class_url varchar(256) DEFAULT NULL COMMENT 'fourth-level category url',
article_url varchar(256) DEFAULT NULL COMMENT 'article request url',
PRIMARY KEY (article_id)
) ENGINE=InnoDB AUTO_INCREMENT=2538691 DEFAULT CHARSET=utf8;
3 Common dictionary table
DROP TABLE IF EXISTS common_base;

CREATE TABLE common_base (
id int(16) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
type_name varchar(32) DEFAULT NULL COMMENT 'category name',
type_value varchar(32) DEFAULT NULL COMMENT 'category value',
type_createtime datetime DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
type_status varchar(8) DEFAULT NULL COMMENT 'category status 0: disabled 1: visible 2: deleted',
type_updatetime timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
type_flag varchar(16) DEFAULT NULL,
PRIMARY KEY (id),
KEY type_flag (type_flag),
FULLTEXT KEY type_name (type_name)
) ENGINE=InnoDB AUTO_INCREMENT=35006 DEFAULT CHARSET=utf8;
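common_base is a generic key-value dictionary, with type_flag saying which dictionary a row belongs to. The generator code later in this section uses flag 1 for game categories, 2 for download channels, 3 for OS versions, 4 for search keywords, 5 for user-to-MAC bindings, and 6 for user-to-IP bindings; for flags 5 and 6, type_name holds the user id. For example, the MAC addresses bound to user 42 (42 is an arbitrary example id):

SELECT type_value AS mac_address
FROM common_base
WHERE type_flag = '5'
  AND type_name = '42';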

4 Game circle (community) table
DROP TABLE IF EXISTS game_cycle;

CREATE TABLE game_cycle (
cycle_id int(32) NOT NULL AUTO_INCREMENT,
cycle_name varchar(128) DEFAULT NULL COMMENT 'circle name',
cycle_person_num varchar(64) DEFAULT NULL COMMENT 'number of members',
cycle_type varchar(16) DEFAULT NULL COMMENT 'join or leave the circle: 0 leave, 1 join',
cycle_comment varchar(32) DEFAULT NULL COMMENT 'number of comments in the circle',
PRIMARY KEY (cycle_id)
) ENGINE=InnoDB AUTO_INCREMENT=284 DEFAULT CHARSET=utf8;

5 Game product order table
DROP TABLE IF EXISTS game_order;

CREATE TABLE game_order (
order_id int(32) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
order_num varchar(64) DEFAULT NULL COMMENT 'order number',
buyer_user_id int(32) DEFAULT NULL COMMENT 'buyer user id',
order_status varchar(16) DEFAULT NULL COMMENT 'order status 0: in cart 1: submitted 2: paid 3: goods received, completed 4: in dispute',
pay_type varchar(16) DEFAULT NULL COMMENT 'payment method 0: WeChat Pay 1: Alipay 2: UnionPay bank card',
pay_money decimal(8,2) DEFAULT NULL COMMENT 'amount paid',
order_time datetime DEFAULT NULL COMMENT 'order placement time',
pay_time datetime DEFAULT NULL COMMENT 'payment time',
tax_percent varchar(32) DEFAULT NULL COMMENT 'platform commission rate',
order_deleted varchar(16) DEFAULT NULL COMMENT 'order deleted? 0: deleted 1: not deleted',
PRIMARY KEY (order_id)
) ENGINE=InnoDB AUTO_INCREMENT=4001 DEFAULT CHARSET=utf8;
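Given the status codes above, daily gross flow over paid orders, for example, is a single aggregation:

SELECT DATE(pay_time) AS pay_date,
       SUM(pay_money) AS gross_flow
FROM game_order
WHERE order_status = '2'   -- 2 = paid
GROUP BY DATE(pay_time);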

6 Order-to-product relation table
DROP TABLE IF EXISTS game_order_product;

CREATE TABLE game_order_product (
game_product_id int(32) NOT NULL AUTO_INCREMENT COMMENT 'primary key of the relation table',
game_order_id varchar(128) DEFAULT NULL COMMENT 'references the order table primary key',
product_id int(32) DEFAULT NULL COMMENT 'references the product table',
product_num int(8) DEFAULT NULL COMMENT 'quantity purchased',
product_name varchar(512) DEFAULT NULL COMMENT 'product name',
product_url varchar(128) DEFAULT NULL COMMENT 'product url',
seller_id int(32) DEFAULT NULL COMMENT 'seller user id',
product_price decimal(8,2) DEFAULT NULL COMMENT 'product price',
PRIMARY KEY (game_product_id)
) ENGINE=InnoDB AUTO_INCREMENT=9894 DEFAULT CHARSET=utf8;
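Each order joins to its line items through game_order_id; note that this column is declared varchar(128) while game_order.order_id is an int, so MySQL compares the two through an implicit cast:

SELECT o.order_num, p.product_name, p.product_num, p.product_price
FROM game_order o
JOIN game_order_product p ON p.game_order_id = o.order_id;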

7 Game product table
DROP TABLE IF EXISTS game_product;

CREATE TABLE game_product (
game_id int(32) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
game_title varchar(128) DEFAULT NULL COMMENT 'product name',
game_pic varchar(128) DEFAULT NULL COMMENT 'product picture',
game_detail_url varchar(256) DEFAULT NULL COMMENT 'product detail url',
game_title_second varchar(128) DEFAULT NULL COMMENT 'product subtitle',
game_qufu varchar(128) DEFAULT NULL COMMENT 'game server/realm',
game_credit varchar(128) DEFAULT NULL COMMENT 'seller reputation level',
game_price varchar(64) DEFAULT NULL COMMENT 'product price',
game_stock varchar(16) DEFAULT NULL COMMENT 'stock on hand',
game_protection varchar(256) DEFAULT NULL COMMENT 'service guarantee',
game_user int(64) DEFAULT NULL COMMENT 'user who listed the product',
game_type varchar(512) DEFAULT NULL COMMENT 'product type: account, in-game currency, materials/equipment, etc.',
game_create_date datetime DEFAULT NULL COMMENT 'listing date',
game_check_date datetime DEFAULT NULL COMMENT 'audit date',
game_check_user int(32) DEFAULT NULL COMMENT 'auditor',
game_check_reson varchar(64) DEFAULT NULL COMMENT 'audit result notes',
game_check_status varchar(16) DEFAULT NULL COMMENT 'audit result 0: rejected 1: approved',
PRIMARY KEY (game_id)
) ENGINE=InnoDB AUTO_INCREMENT=31863 DEFAULT CHARSET=utf8;
8 Province/city/district dictionary table
DROP TABLE IF EXISTS province_code;
create table province_code (
Id int (11),
Name varchar (120),
Pid int (11)
);
Open the accompanying text file of region codes, then insert the province/city/district rows.
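The table is a self-referencing hierarchy: each row's Pid points at its parent, so districts hang off cities and cities off provinces; the generator code later walks it exactly this way (getProvince, getCityByCode, getRegionByCode). For example:

-- provinces: the rows whose parent id is 0
SELECT Id, Name FROM province_code WHERE Pid = 0 AND Id != 0;

-- the cities of one province are its direct children (110000 is an illustrative id)
SELECT Id, Name FROM province_code WHERE Pid = 110000;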
9 User base table
DROP TABLE IF EXISTS user_base;

CREATE TABLE user_base (
user_id int(32) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
user_account varchar(64) DEFAULT NULL COMMENT 'account name',
user_email varchar(64) DEFAULT NULL COMMENT 'email',
user_name varchar(64) DEFAULT NULL COMMENT 'user name',
user_phone varchar(64) DEFAULT NULL COMMENT 'phone number',
user_origin_password varchar(64) DEFAULT NULL COMMENT 'original (plain-text) password, kept for data generation',
user_password varchar(64) DEFAULT NULL COMMENT 'password, md5-hashed',
user_sex char(2) DEFAULT NULL COMMENT '0 female, 1 male',
user_card varchar(64) DEFAULT NULL COMMENT 'national id number',
user_registertime datetime DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
user_updatetime datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
is_deleted varchar(16) DEFAULT NULL COMMENT 'account closed (deleted)?',
user_province int(16) DEFAULT NULL COMMENT 'province code',
user_city int(16) DEFAULT NULL COMMENT 'city code',
user_area int(16) DEFAULT NULL COMMENT 'district code',
user_address_detail text,
register_ip varbinary(32) DEFAULT NULL COMMENT 'registration ip',
user_pic varchar(128) DEFAULT NULL COMMENT 'avatar url',
PRIMARY KEY (user_id)
) ENGINE=InnoDB AUTO_INCREMENT=10639 DEFAULT CHARSET=utf8;
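A typical profiling query over this table joins the region dictionary defined above; for example, counting registered users per province:

SELECT pc.Name AS province, COUNT(*) AS registered_users
FROM user_base ub
JOIN province_code pc ON ub.user_province = pc.Id
GROUP BY pc.Name;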

10 User login log table
DROP TABLE IF EXISTS user_login_record;

CREATE TABLE user_login_record (
login_id int(32) NOT NULL AUTO_INCREMENT,
login_user_id int(32) DEFAULT NULL COMMENT 'user id',
login_time datetime DEFAULT NULL COMMENT 'login time',
login_mac varchar(32) DEFAULT NULL COMMENT 'mac address at login',
login_ip varchar(32) DEFAULT NULL COMMENT 'ip address at login',
login_message varchar(128) DEFAULT NULL COMMENT 'message returned to the client',
login_status varchar(8) DEFAULT NULL COMMENT 'login status: 0 failed, 1 succeeded',
login_fail_reson varchar(8) DEFAULT NULL COMMENT 'failure reason 0: user not found 1: wrong password 2: login succeeded',
PRIMARY KEY (login_id)
) ENGINE=InnoDB AUTO_INCREMENT=31727 DEFAULT CHARSET=utf8;

11 Video detail table
DROP TABLE IF EXISTS video_detail;

CREATE TABLE video_detail (
video_id int(16) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
video_name varchar(128) DEFAULT NULL COMMENT 'video name',
video_detail text COMMENT 'video description',
video_type varchar(16) DEFAULT NULL COMMENT 'video category',
video_url varchar(256) DEFAULT NULL COMMENT 'playback url',
video_time_long int(16) DEFAULT NULL COMMENT 'duration',
video_comment_total int(16) DEFAULT NULL COMMENT 'comment count',
video_like_total int(16) DEFAULT NULL COMMENT 'like count',
video_upload_time datetime DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
video_check_time datetime DEFAULT NULL COMMENT 'audit time',
video_update_time timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
video_upload_user int(11) DEFAULT NULL COMMENT 'uploader user id',
video_status varchar(8) DEFAULT NULL COMMENT 'video status 0: uploaded, pending audit 1: approved, visible 2: rejected, needs revision 3: deleted 4: banned for violation',
video_comment text COMMENT 'audit approval/rejection notes',
video_watch_time int(8) DEFAULT NULL COMMENT 'view count',
video_age int(8) DEFAULT NULL COMMENT 'video age: days since upload',
PRIMARY KEY (video_id)
) ENGINE=InnoDB AUTO_INCREMENT=8334 DEFAULT CHARSET=utf8;
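Since video_age is a derived value (days since upload, per the column comment), one way to keep it current is a periodic refresh such as:

UPDATE video_detail
SET video_age = DATEDIFF(NOW(), video_upload_time);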

2. Use Java code to generate the business data
1 Create the Maven project data_generate

2 Add the dependency coordinates

<properties>
    <slf4j.version>1.7.20</slf4j.version>
    <logback.version>1.2.3</logback.version>
</properties>

<dependencies>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.62</version>
    </dependency>
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-core</artifactId>
        <version>${logback.version}</version>
    </dependency>
    <dependency>
        <groupId>ch.qos.logback</groupId>
        <artifactId>logback-classic</artifactId>
        <version>${logback.version}</version>
    </dependency>
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.6</version>
</dependency>
<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.6</version>
</dependency>
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.47</version>
</dependency>
<dependency>
    <groupId>org.mybatis</groupId>
    <artifactId>mybatis</artifactId>
    <version>3.5.5</version>
</dependency>


<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.15</version>
</dependency>

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.3</version>
</dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

3 Define the JavaBeans for the database tables
Create the package com.datas.beans.dbbeans in the Maven project; it holds the JavaBean classes.
1 Article detail table JavaBean
import java.util.Date;
public class ArticleDetail {
    private int article_id;
    private String article_name;
    private String article_little_content;
    private String article_content;
    private Date article_time;
    private Date jiuyou_show_time;
    private String article_author_name;
    private int article_user_id;
    private String article_status;
    private Date article_create_time;
    private Date article_update_time;

    private String first_class;
    private String first_class_url;
    private String second_class;
    private String second_class_url;
    private String third_class;
    private String third_class_url;

    private String fourth_class;
    private String fourth_class_url;

    private String article_url;

public String getArticle_url() {
    return article_url;
}

public void setArticle_url(String article_url) {
    this.article_url = article_url;
}

public String getFirst_class_url() {
    return first_class_url;
}

public void setFirst_class_url(String first_class_url) {
    this.first_class_url = first_class_url;
}

public String getSecond_class_url() {
    return second_class_url;
}

public void setSecond_class_url(String second_class_url) {
    this.second_class_url = second_class_url;
}

public String getThird_class_url() {
    return third_class_url;
}

public void setThird_class_url(String third_class_url) {
    this.third_class_url = third_class_url;
}

public String getFourth_class_url() {
    return fourth_class_url;
}

public void setFourth_class_url(String fourth_class_url) {
    this.fourth_class_url = fourth_class_url;
}

public String getFirst_class() {
    return first_class;
}

public void setFirst_class(String first_class) {
    this.first_class = first_class;
}

public String getSecond_class() {
    return second_class;
}

public void setSecond_class(String second_class) {
    this.second_class = second_class;
}

public String getThird_class() {
    return third_class;
}

public void setThird_class(String third_class) {
    this.third_class = third_class;
}

public String getFourth_class() {
    return fourth_class;
}

public void setFourth_class(String fourth_class) {
    this.fourth_class = fourth_class;
}

public String getArticle_little_content() {
    return article_little_content;
}

public void setArticle_little_content(String article_little_content) {
    this.article_little_content = article_little_content;
}

public int getArticle_id() {
    return article_id;
}

public void setArticle_id(int article_id) {
    this.article_id = article_id;
}

public String getArticle_name() {
    return article_name;
}

public void setArticle_name(String article_name) {
    this.article_name = article_name;
}

public String getArticle_content() {
    return article_content;
}

public void setArticle_content(String article_content) {
    this.article_content = article_content;
}

public Date getArticle_time() {
    return article_time;
}

public void setArticle_time(Date article_time) {
    this.article_time = article_time;
}

public Date getJiuyou_show_time() {
    return jiuyou_show_time;
}

public void setJiuyou_show_time(Date jiuyou_show_time) {
    this.jiuyou_show_time = jiuyou_show_time;
}

public String getArticle_author_name() {
    return article_author_name;
}

public void setArticle_author_name(String article_author_name) {
    this.article_author_name = article_author_name;
}

public int getArticle_user_id() {
    return article_user_id;
}

public void setArticle_user_id(int article_user_id) {
    this.article_user_id = article_user_id;
}

public String getArticle_status() {
    return article_status;
}

public void setArticle_status(String article_status) {
    this.article_status = article_status;
}

public Date getArticle_create_time() {
    return article_create_time;
}

public void setArticle_create_time(Date article_create_time) {
    this.article_create_time = article_create_time;
}

public Date getArticle_update_time() {
    return article_update_time;
}

public void setArticle_update_time(Date article_update_time) {
    this.article_update_time = article_update_time;
}

}

2 Common dictionary table JavaBean

import java.util.Date;

public class CommonBase {
    private int id;
    private String type_name;
    private String type_value;
    private Date type_createtime;
    private String type_status;
    private Date type_updatetime;
    private String type_flag;

public String getType_value() {
    return type_value;
}

public void setType_value(String type_value) {
    this.type_value = type_value;
}

public int getId() {
    return id;
}

public void setId(int id) {
    this.id = id;
}

public String getType_name() {
    return type_name;
}

public void setType_name(String type_name) {
    this.type_name = type_name;
}

public Date getType_createtime() {
    return type_createtime;
}

public void setType_createtime(Date type_createtime) {
    this.type_createtime = type_createtime;
}

public String getType_status() {
    return type_status;
}

public void setType_status(String type_status) {
    this.type_status = type_status;
}

public Date getType_updatetime() {
    return type_updatetime;
}

public void setType_updatetime(Date type_updatetime) {
    this.type_updatetime = type_updatetime;
}

public String getType_flag() {
    return type_flag;
}

public void setType_flag(String type_flag) {
    this.type_flag = type_flag;
}

}

3 Game order table JavaBean

import java.util.Date;

public class GameOrder {
    private int order_id;
    private String order_num;
    private int buyer_user_id;
    private String order_status;
    private String pay_type;
    private Double pay_money;
    private Date order_time;
    private Date pay_time;
    private String tax_percent;
    private String order_deleted;

public int getOrder_id() {
    return order_id;
}

public void setOrder_id(int order_id) {
    this.order_id = order_id;
}

public String getOrder_num() {
    return order_num;
}

public void setOrder_num(String order_num) {
    this.order_num = order_num;
}

public int getBuyer_user_id() {
    return buyer_user_id;
}

public void setBuyer_user_id(int buyer_user_id) {
    this.buyer_user_id = buyer_user_id;
}

public String getOrder_status() {
    return order_status;
}

public void setOrder_status(String order_status) {
    this.order_status = order_status;
}

public String getPay_type() {
    return pay_type;
}

public void setPay_type(String pay_type) {
    this.pay_type = pay_type;
}

public Double getPay_money() {
    return pay_money;
}

public void setPay_money(Double pay_money) {
    this.pay_money = pay_money;
}

public Date getOrder_time() {
    return order_time;
}

public void setOrder_time(Date order_time) {
    this.order_time = order_time;
}

public Date getPay_time() {
    return pay_time;
}

public void setPay_time(Date pay_time) {
    this.pay_time = pay_time;
}

public String getTax_percent() {
    return tax_percent;
}

public void setTax_percent(String tax_percent) {
    this.tax_percent = tax_percent;
}

public String getOrder_deleted() {
    return order_deleted;
}

public void setOrder_deleted(String order_deleted) {
    this.order_deleted = order_deleted;
}

}

4 Order-to-product relation table JavaBean

public class GameOrderProduct {
    private int game_product_id;
    private String game_order_id;
    private int product_id;
    private int product_num;
    private String product_name;
    private String product_url;
    private String seller_id;
    private Double product_price;

public Double getProduct_price() {
    return product_price;
}

public void setProduct_price(Double product_price) {
    this.product_price = product_price;
}

public String getSeller_id() {
    return seller_id;
}

public void setSeller_id(String seller_id) {
    this.seller_id = seller_id;
}

public int getGame_product_id() {
    return game_product_id;
}

public void setGame_product_id(int game_product_id) {
    this.game_product_id = game_product_id;
}

public String getGame_order_id() {
    return game_order_id;
}

public void setGame_order_id(String game_order_id) {
    this.game_order_id = game_order_id;
}

public int getProduct_id() {
    return product_id;
}

public void setProduct_id(int product_id) {
    this.product_id = product_id;
}

public int getProduct_num() {
    return product_num;
}

public void setProduct_num(int product_num) {
    this.product_num = product_num;
}

public String getProduct_name() {
    return product_name;
}

public void setProduct_name(String product_name) {
    this.product_name = product_name;
}

public String getProduct_url() {
    return product_url;
}

public void setProduct_url(String product_url) {
    this.product_url = product_url;
}

}

5 Game product table JavaBean

import java.util.Date;

public class GameProduct {
    private int game_id;
    private String game_title;
    private String game_pic;
    private String game_detail_url;
    private String game_title_second;
    private String game_qufu;
    private String game_credit;
    private Double game_price;
    private String game_stock;
    private String game_protection;
    private int game_user;
    private String game_type;
    private Date game_create_date;
    private Date game_check_date;
    private int game_check_user;
    private String game_check_reson;
    private String game_check_status;

public int getGame_id() {
    return game_id;
}

public void setGame_id(int game_id) {
    this.game_id = game_id;
}

public String getGame_title() {
    return game_title;
}

public void setGame_title(String game_title) {
    this.game_title = game_title;
}

public String getGame_pic() {
    return game_pic;
}

public void setGame_pic(String game_pic) {
    this.game_pic = game_pic;
}

public String getGame_detail_url() {
    return game_detail_url;
}

public void setGame_detail_url(String game_detail_url) {
    this.game_detail_url = game_detail_url;
}

public String getGame_title_second() {
    return game_title_second;
}

public void setGame_title_second(String game_title_second) {
    this.game_title_second = game_title_second;
}

public String getGame_qufu() {
    return game_qufu;
}

public void setGame_qufu(String game_qufu) {
    this.game_qufu = game_qufu;
}

public String getGame_credit() {
    return game_credit;
}

public void setGame_credit(String game_credit) {
    this.game_credit = game_credit;
}

public Double getGame_price() {
    return game_price;
}

public void setGame_price(Double game_price) {
    this.game_price = game_price;
}

public String getGame_stock() {
    return game_stock;
}

public void setGame_stock(String game_stock) {
    this.game_stock = game_stock;
}

public String getGame_protection() {
    return game_protection;
}

public void setGame_protection(String game_protection) {
    this.game_protection = game_protection;
}

public int getGame_user() {
    return game_user;
}

public void setGame_user(int game_user) {
    this.game_user = game_user;
}

public String getGame_type() {
    return game_type;
}

public void setGame_type(String game_type) {
    this.game_type = game_type;
}

public Date getGame_create_date() {
    return game_create_date;
}

public void setGame_create_date(Date game_create_date) {
    this.game_create_date = game_create_date;
}

public Date getGame_check_date() {
    return game_check_date;
}

public void setGame_check_date(Date game_check_date) {
    this.game_check_date = game_check_date;
}

public int getGame_check_user() {
    return game_check_user;
}

public void setGame_check_user(int game_check_user) {
    this.game_check_user = game_check_user;
}

public String getGame_check_reson() {
    return game_check_reson;
}

public void setGame_check_reson(String game_check_reson) {
    this.game_check_reson = game_check_reson;
}

public String getGame_check_status() {
    return game_check_status;
}

public void setGame_check_status(String game_check_status) {
    this.game_check_status = game_check_status;
}

}

6 Province/city/district dictionary JavaBean

public class ProvinceCode {
    private int id;
    private String name;
    private int pid;

public int getId() {
    return id;
}

public void setId(int id) {
    this.id = id;
}

public String getName() {
    return name;
}

public void setName(String name) {
    this.name = name;
}

public int getPid() {
    return pid;
}

public void setPid(int pid) {
    this.pid = pid;
}

}

7 User base table JavaBean

import java.util.Date;

public class UserBase {
    private int user_id;
    private String user_account;
    private String user_email;
    private String user_name;
    private String user_phone;
    private String user_origin_password;
    private String user_password;
    private String user_sex;
    private String user_card;
    private Date user_registertime;
    private Date user_updatetime;
    private String is_deleted;
    private int user_province;
    private int user_city;
    private int user_area;
    private String user_address_detail;
    private String register_ip;
    private String user_pic;

public String getUser_address_detail() {
    return user_address_detail;
}

public void setUser_address_detail(String user_address_detail) {
    this.user_address_detail = user_address_detail;
}

public String getRegister_ip() {
    return register_ip;
}

public void setRegister_ip(String register_ip) {
    this.register_ip = register_ip;
}

public String getUser_origin_password() {
    return user_origin_password;
}

public void setUser_origin_password(String user_origin_password) {
    this.user_origin_password = user_origin_password;
}

public int getUser_id() {
    return user_id;
}

public void setUser_id(int user_id) {
    this.user_id = user_id;
}

public String getUser_account() {
    return user_account;
}

public void setUser_account(String user_account) {
    this.user_account = user_account;
}

public String getUser_email() {
    return user_email;
}

public void setUser_email(String user_email) {
    this.user_email = user_email;
}

public String getUser_name() {
    return user_name;
}

public void setUser_name(String user_name) {
    this.user_name = user_name;
}

public String getUser_phone() {
    return user_phone;
}

public void setUser_phone(String user_phone) {
    this.user_phone = user_phone;
}

public String getUser_password() {
    return user_password;
}

public void setUser_password(String user_password) {
    this.user_password = user_password;
}

public String getUser_sex() {
    return user_sex;
}

public void setUser_sex(String user_sex) {
    this.user_sex = user_sex;
}

public String getUser_card() {
    return user_card;
}

public void setUser_card(String user_card) {
    this.user_card = user_card;
}

public Date getUser_registertime() {
    return user_registertime;
}

public void setUser_registertime(Date user_registertime) {
    this.user_registertime = user_registertime;
}

public Date getUser_updatetime() {
    return user_updatetime;
}

public void setUser_updatetime(Date user_updatetime) {
    this.user_updatetime = user_updatetime;
}

public String getIs_deleted() {
    return is_deleted;
}

public void setIs_deleted(String is_deleted) {
    this.is_deleted = is_deleted;
}

public int getUser_province() {
    return user_province;
}

public void setUser_province(int user_province) {
    this.user_province = user_province;
}

public int getUser_city() {
    return user_city;
}

public void setUser_city(int user_city) {
    this.user_city = user_city;
}

public int getUser_area() {
    return user_area;
}

public void setUser_area(int user_area) {
    this.user_area = user_area;
}

public String getUser_pic() {
    return user_pic;
}

public void setUser_pic(String user_pic) {
    this.user_pic = user_pic;
}

}

8 User login log table JavaBean

import java.util.Date;

public class UserLoginRecord {
    private int login_id;
    private int login_user_id;
    private Date login_time;
    private String login_mac;
    private String login_ip;
    private String login_message;
    private String login_status;
    private String login_fail_reson;

    public int getLogin_id() {
        return login_id;
    }

public void setLogin_id(int login_id) {
    this.login_id = login_id;
}

public int getLogin_user_id() {
    return login_user_id;
}

public void setLogin_user_id(int login_user_id) {
    this.login_user_id = login_user_id;
}

public Date getLogin_time() {
    return login_time;
}

public void setLogin_time(Date login_time) {
    this.login_time = login_time;
}

public String getLogin_mac() {
    return login_mac;
}

public void setLogin_mac(String login_mac) {
    this.login_mac = login_mac;
}

public String getLogin_ip() {
    return login_ip;
}

public void setLogin_ip(String login_ip) {
    this.login_ip = login_ip;
}

public String getLogin_message() {
    return login_message;
}

public void setLogin_message(String login_message) {
    this.login_message = login_message;
}

public String getLogin_status() {
    return login_status;
}

public void setLogin_status(String login_status) {
    this.login_status = login_status;
}

public String getLogin_fail_reson() {
    return login_fail_reson;
}

public void setLogin_fail_reson(String login_fail_reson) {
    this.login_fail_reson = login_fail_reson;
}

}

9 Video detail table JavaBean

import java.util.Date;

public class VideoDetail {
    private int video_id;
    private String video_name;
    private String article_little_content;  // leftover from ArticleDetail; not used by the video insert statements
    private String video_detail;
    private String video_type;
    private String video_url;
    private int video_time_long;
    private int video_comment_total;
    private int video_like_total;
    private Date video_upload_time;
    private Date video_check_time;
    private Date video_update_time;
    private int video_upload_user;
    private String video_status;
    private String video_comment;
    private int video_watch_time;
    private int video_age;

public String getArticle_little_content() {
    return article_little_content;
}

public void setArticle_little_content(String article_little_content) {
    this.article_little_content = article_little_content;
}

public int getVideo_id() {
    return video_id;
}

public void setVideo_id(int video_id) {
    this.video_id = video_id;
}

public String getVideo_name() {
    return video_name;
}

public void setVideo_name(String video_name) {
    this.video_name = video_name;
}

public String getVideo_detail() {
    return video_detail;
}

public void setVideo_detail(String video_detail) {
    this.video_detail = video_detail;
}

public String getVideo_type() {
    return video_type;
}

public void setVideo_type(String video_type) {
    this.video_type = video_type;
}

public String getVideo_url() {
    return video_url;
}

public void setVideo_url(String video_url) {
    this.video_url = video_url;
}

public int getVideo_time_long() {
    return video_time_long;
}

public void setVideo_time_long(int video_time_long) {
    this.video_time_long = video_time_long;
}

public int getVideo_comment_total() {
    return video_comment_total;
}

public void setVideo_comment_total(int video_comment_total) {
    this.video_comment_total = video_comment_total;
}

public int getVideo_like_total() {
    return video_like_total;
}

public void setVideo_like_total(int video_like_total) {
    this.video_like_total = video_like_total;
}

public Date getVideo_upload_time() {
    return video_upload_time;
}

public void setVideo_upload_time(Date video_upload_time) {
    this.video_upload_time = video_upload_time;
}

public Date getVideo_check_time() {
    return video_check_time;
}

public void setVideo_check_time(Date video_check_time) {
    this.video_check_time = video_check_time;
}

public Date getVideo_update_time() {
    return video_update_time;
}

public void setVideo_update_time(Date video_update_time) {
    this.video_update_time = video_update_time;
}

public int getVideo_upload_user() {
    return video_upload_user;
}

public void setVideo_upload_user(int video_upload_user) {
    this.video_upload_user = video_upload_user;
}

public String getVideo_status() {
    return video_status;
}

public void setVideo_status(String video_status) {
    this.video_status = video_status;
}

public String getVideo_comment() {
    return video_comment;
}

public void setVideo_comment(String video_comment) {
    this.video_comment = video_comment;
}

public int getVideo_watch_time() {
    return video_watch_time;
}

public void setVideo_watch_time(int video_watch_time) {
    this.video_watch_time = video_watch_time;
}

public int getVideo_age() {
    return video_age;
}

public void setVideo_age(int video_age) {
    this.video_age = video_age;
}

}

4 Define the MyBatis configuration SqlMapConfig.xml and the per-bean mapper XML files
1 Define the mapper XML files for the JavaBeans
Create a mappers folder under the project's resources directory, and put one mapper XML file per JavaBean inside it.

1 Define the ArticleDetail.xml mapper

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<!-- the opening tags of this file were lost in formatting; the namespace and statement id are restored guesses -->
<mapper namespace="mappers.ArticleDetail">
<insert id="addArticleDetail">
    insert into article_detail(article_id,article_name,article_little_content,article_content,article_time
    ,jiuyou_show_time,article_author_name,article_user_id,article_status,article_create_time,article_update_time
    ,first_class,first_class_url,second_class,second_class_url,third_class,third_class_url
    ,fourth_class,fourth_class_url,article_url)
    values (#{articleDetail.article_id},#{articleDetail.article_name},#{articleDetail.article_little_content},#{articleDetail.article_content},#{articleDetail.article_time},#{articleDetail.jiuyou_show_time},#{articleDetail.article_author_name},#{articleDetail.article_user_id},#{articleDetail.article_status},#{articleDetail.article_create_time},#{articleDetail.article_update_time},
    #{articleDetail.first_class},#{articleDetail.first_class_url},#{articleDetail.second_class}
    ,#{articleDetail.second_class_url},#{articleDetail.third_class},#{articleDetail.third_class_url}
    ,#{articleDetail.fourth_class},#{articleDetail.fourth_class_url},#{articleDetail.article_url})
</insert>

<insert id="addOne" parameterType="com.datas.beans.dbbeans.VideoDetail">
    insert into video_detail(video_id,video_name,video_detail,video_type,video_url,video_time_long,video_comment_total,video_like_total,video_upload_time,video_check_time,video_update_time,video_upload_user,video_status,video_comment,video_watch_time,video_age)
     values (#{video_id},#{video_name},#{video_detail},#{video_type},#{video_url},#{video_time_long},#{video_comment_total},#{video_like_total},#{video_upload_time},#{video_check_time},#{video_update_time},#{video_upload_user},#{video_status},#{video_comment},#{video_watch_time},#{video_age})
</insert>
<select id="getAllVideoDetail" resultType="com.datas.beans.dbbeans.VideoDetail">
    select * from video_detail;
</select>
<select id="getArticleListByUserId" resultType="com.datas.beans.dbbeans.ArticleDetail" parameterType="int">
    select * from article_detail where article_user_id = #{article_user_id}
</select>
</mapper>

2 Define the DBCommonBase.xml mapper

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<!-- the opening tags of this file were lost in formatting; the namespace and statement ids are restored guesses -->
<mapper namespace="mappers.DBCommonBase">
<insert id="addCommonBase">
    insert into common_base(type_name,type_value,type_createtime,type_status,type_updatetime,type_flag)
    values (#{commonBase.type_name},#{commonBase.type_value},#{commonBase.type_createtime},#{commonBase.type_status},#{commonBase.type_updatetime},#{commonBase.type_flag})
</insert>
<select id="getByUserId" resultType="com.datas.beans.dbbeans.DBCommonBase" parameterType="int">
    select id,type_name,type_value,type_createtime,type_status,type_updatetime,type_flag from user_base where user_id = #{user_id}
</select>
<select id="getSearchKeyByName" resultType="com.datas.beans.dbbeans.DBCommonBase">
    select id,type_name,type_value,type_createtime,type_status,type_updatetime,type_flag from common_base where type_flag = '4' and type_name = #{type_name}
</select>
<select id="getMacByUserId" resultType="com.datas.beans.dbbeans.DBCommonBase">
    select id,type_name,type_value,type_createtime,type_status,type_updatetime,type_flag from common_base where type_flag = '5' and type_name = #{type_name}
</select>
<select id="getAllCommonBase" resultType="com.datas.beans.dbbeans.DBCommonBase">
    select id,type_name,type_value,type_createtime,type_status,type_updatetime,type_flag from common_base;
</select>
<select id="getGameTypes" resultType="com.datas.beans.dbbeans.DBCommonBase">
    select id,type_name,type_value,type_createtime,type_status,type_updatetime,type_flag from common_base where type_flag = '1';
</select>
<select id="getSearchKey" resultType="com.datas.beans.dbbeans.DBCommonBase">
     select id,type_name,type_value,type_createtime,type_status,type_updatetime,type_flag from common_base where type_flag = '6';
</select>
</mapper>

3 Define the GameOrder.xml mapper

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<!-- the opening tags of this file were lost in formatting; the namespace and statement ids are restored guesses -->
<mapper namespace="mappers.GameOrder">
<insert id="addGameOrder" parameterType="com.datas.beans.dbbeans.GameOrder">
    insert into game_order(order_id,order_num,buyer_user_id,order_status,pay_type,pay_money,order_time,pay_time,tax_percent,order_deleted)
    values(#{order_id},#{order_num},#{buyer_user_id},#{order_status},#{pay_type},#{pay_money},#{order_time},#{pay_time},#{tax_percent},#{order_deleted})
</insert>
<select id="getOrderByBuyerUserId" resultType="com.datas.beans.dbbeans.GameOrder" parameterType="int">
    select * from game_order where buyer_user_id = #{buyer_user_id}
</select>
</mapper>

4 Define the GameOrderProduct.xml mapper

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<!-- the mapper opening tag was lost in formatting; the namespace is a restored guess -->
<mapper namespace="mappers.GameOrderProduct">
<insert id="addGameOrderProductList" parameterType="java.util.List"  >
    insert into game_order_product(game_product_id,game_order_id,product_id,product_num,product_name,product_url,seller_id,product_price)
    values
    <foreach collection ="list" item="gameOrderProduct"  index="index" separator =",">
        (#{gameOrderProduct.game_product_id},#{gameOrderProduct.game_order_id},#{gameOrderProduct.product_id},#{gameOrderProduct.product_num},#{gameOrderProduct.product_name},#{gameOrderProduct.product_url},#{gameOrderProduct.seller_id},#{gameOrderProduct.product_price})
    </foreach >
</insert>
</mapper>

5 Define the GameProduct.xml mapper

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<!-- the opening tags of this file were lost in formatting; the namespace and statement ids are restored guesses -->
<mapper namespace="mappers.GameProduct">
<insert id="addGameProduct">
    insert into game_product(game_id,game_title,game_pic,game_detail_url,game_title_second,game_qufu,game_credit,game_price,game_stock,game_protection,game_user,game_type,game_create_date,game_check_date,game_check_user,game_check_reson,game_check_status)
    values (#{gameProduct.game_id},#{gameProduct.game_title},#{gameProduct.game_pic},#{gameProduct.game_detail_url},#{gameProduct.game_title_second},#{gameProduct.game_qufu},#{gameProduct.game_credit},#{gameProduct.game_price},#{gameProduct.game_stock},#{gameProduct.game_protection},#{gameProduct.game_user},#{gameProduct.game_type},#{gameProduct.game_create_date},#{gameProduct.game_check_date},#{gameProduct.game_check_user},#{gameProduct.game_check_reson},#{gameProduct.game_check_status})
</insert>
<select id="getAllGameProduct" resultType="com.datas.beans.dbbeans.GameProduct">
    select * from game_product;
</select>
<select id="getGameProductById" resultType="com.datas.beans.dbbeans.GameProduct" parameterType="int">
    select * from game_product where game_id = #{game_id}
</select>
</mapper>

6 Define the ProvinceCode.xml mapper

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<!-- the opening tags of this file were lost in formatting; the namespace is a restored guess; the statement ids match the DBOperateUtils calls used below -->
<mapper namespace="mappers.ProvinceCode">
<select id="getProvince" resultType="com.datas.beans.dbbeans.ProvinceCode">
    select * from province_code WHERE pid = '0' and id != '0'
</select>
<select id="getCityByCode" resultType="com.datas.beans.dbbeans.ProvinceCode" parameterType="int">
    select * from province_code WHERE pid = #{pid}
</select>
<select id="getRegionByCode" resultType="com.datas.beans.dbbeans.ProvinceCode" parameterType="int">
    select * from province_code WHERE pid = #{pid} AND NAME != '市辖区'
</select>
</mapper>

7 Define the UserBase.xml mapper

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<!-- the opening tags of this file were lost in formatting; the namespace and statement ids are restored guesses -->
<mapper namespace="mappers.UserBase">
<insert id="addUserBase">
    insert into user_base(user_account,user_email,user_name,user_phone,user_origin_password,user_password,user_sex,user_card,user_registertime,user_updatetime,is_deleted,user_province,user_city,user_area,user_address_detail,register_ip,user_pic)
    values (#{userBase.user_account},#{userBase.user_email},#{userBase.user_name},#{userBase.user_phone},#{userBase.user_origin_password},#{userBase.user_password},#{userBase.user_sex},#{userBase.user_card},#{userBase.user_registertime},#{userBase.user_updatetime},#{userBase.is_deleted},#{userBase.user_province},#{userBase.user_city},#{userBase.user_area},#{userBase.user_address_detail},#{userBase.register_ip},#{userBase.user_pic})
</insert>
<select id="getUserById" resultType="com.datas.beans.dbbeans.UserBase" parameterType="int">
    select * from user_base where user_id = #{user_id}
</select>
<insert id="addOne" parameterType="com.datas.beans.dbbeans.UserBase">
    insert into user_base(user_id,user_account,user_email,user_name,user_phone,user_origin_password,user_password,user_sex,user_card,user_registertime,user_updatetime,is_deleted,user_province,user_city,user_area,user_address_detail,register_ip,user_pic)
    values (null,#{user_account},#{user_email},#{user_name},#{user_phone},#{user_origin_password},#{user_password},#{user_sex},#{user_card},#{user_registertime},#{user_updatetime},#{is_deleted},#{user_province},#{user_city},#{user_area},#{user_address_detail},#{register_ip},#{user_pic})
</insert>
<select id="getAllUserBase" resultType="com.datas.beans.dbbeans.UserBase">
    select * from user_base;
</select>
<select id="getUserByRegisterTime" resultType="com.datas.beans.dbbeans.UserBase">
    select * from user_base where user_registertime &lt;= #{user_registertime};
</select>
</mapper>

8 Define the UserLoginRecord.xml mapper

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<!-- the opening tags of this file were lost in formatting; the namespace and statement id are restored guesses -->
<mapper namespace="mappers.UserLoginRecord">
<insert id="addUserLoginRecord">
    insert into user_login_record(login_id,login_user_id,login_time,login_mac,login_ip,login_message,login_status,login_fail_reson)
    values (#{userLogin.login_id},#{userLogin.login_user_id},#{userLogin.login_time},#{userLogin.login_mac},#{userLogin.login_ip},#{userLogin.login_message},#{userLogin.login_status},#{userLogin.login_fail_reson})
</insert>
</mapper>

9 Define the VideoDetail.xml mapper

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<!-- the opening tags of this file were lost in formatting; the namespace and statement id are restored guesses -->
<mapper namespace="mappers.VideoDetail">
<insert id="addVideoDetail">
    insert into video_detail(video_id,video_name,video_detail,video_type,video_url,video_time_long,video_comment_total,video_like_total,video_upload_time,video_check_time,video_update_time,video_upload_user,video_status,video_comment,video_watch_time,video_age)
    values (#{videoDetail.video_id},#{videoDetail.video_name},#{videoDetail.video_detail},#{videoDetail.video_type},#{videoDetail.video_url},#{videoDetail.video_time_long},#{videoDetail.video_comment_total},#{videoDetail.video_like_total},#{videoDetail.video_upload_time},#{videoDetail.video_check_time},#{videoDetail.video_update_time},#{videoDetail.video_upload_user},#{videoDetail.video_status},#{videoDetail.video_comment},#{videoDetail.video_watch_time},#{videoDetail.video_age})
</insert>
<insert id="addOne" parameterType="com.datas.beans.dbbeans.VideoDetail">
    insert into video_detail(video_id,video_name,video_detail,video_type,video_url,video_time_long,video_comment_total,video_like_total,video_upload_time,video_check_time,video_update_time,video_upload_user,video_status,video_comment,video_watch_time,video_age)
     values (#{video_id},#{video_name},#{video_detail},#{video_type},#{video_url},#{video_time_long},#{video_comment_total},#{video_like_total},#{video_upload_time},#{video_check_time},#{video_update_time},#{video_upload_user},#{video_status},#{video_comment},#{video_watch_time},#{video_age})
</insert>
<select id="getAllVideoDetail" resultType="com.datas.beans.dbbeans.VideoDetail">
    select * from video_detail;
</select>
<select id="getVideoByUserId" parameterType="int" resultType="com.datas.beans.dbbeans.VideoDetail">
    select * from video_detail where video_upload_user = #{video_upload_user}
</select>
</mapper>

2 Define the MyBatis core configuration file SqlMapConfig.xml
Create SqlMapConfig.xml under the project's resources directory; it is the MyBatis core configuration file.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE configuration PUBLIC "-//mybatis.org//DTD Config 3.0//EN" "http://mybatis.org/dtd/mybatis-3-config.dtd">
<!-- the opening tags of this file were lost in formatting and are restored below -->
<configuration>
<environments default="development">
    <environment id="development">
        <transactionManager type="JDBC" />
        <dataSource type="POOLED">
            <!-- JDBC driver. com.mysql.jdbc.Driver matches the 5.1.47 connector declared above;
                 on MySQL 8.x / Connector-J 8.x use com.mysql.cj.jdbc.Driver instead. -->
            <property name="driver" value="com.mysql.jdbc.Driver" />

            <!-- Connection URL for the jiuyou_game database created earlier.
                 On MySQL 8.x, append &amp;serverTimezone=UTC to avoid time-zone errors. -->
            <property name="url" value="jdbc:mysql://localhost:3306/jiuyou_game?useSSL=false" />

            <!-- user name -->
            <property name="username" value="root" />

            <!-- password: replace with your own MySQL password -->
            <property name="password" value="123456" />

        </dataSource>
    </environment>
</environments>
<mappers>
    <!-- the core configuration file registers each mapper XML file: -->
    <mapper resource="mappers/ProvinceCode.xml"></mapper>
    <mapper resource="mappers/UserBase.xml"></mapper>
    <mapper resource="mappers/VideoDetail.xml"></mapper>
    <mapper resource="mappers/DBCommonBase.xml"></mapper>
    <mapper resource="mappers/GameProduct.xml"></mapper>

    <mapper resource="mappers/GameOrder.xml"></mapper>
    <mapper resource="mappers/GameOrderProduct.xml"></mapper>
    <mapper resource="mappers/UserLoginRecord.xml"></mapper>
    <mapper resource="mappers/ArticleDetail.xml"></mapper>
</mappers>
</configuration>

5 Copy in the base utility classes
Copy the utility classes the project depends on into the package com.datas.utils.

6 Write the user-data generator
Create the class UserGenerate in the package com.datas.dbgenerate; it generates rows for the user base table.
package com.datas.dbgenerate;

import com.datas.beans.dbbeans.ProvinceCode;
import com.datas.beans.dbbeans.UserBase;
import com.datas.utils.DBOperateUtils;
import com.datas.utils.DateUtil;
import com.datas.utils.RandomIdCardMaker;
import com.datas.utils.RandomValue;
import org.apache.commons.codec.digest.DigestUtils;

import java.io.IOException;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/**
 * Generates the user base table.
 */
public class UserGenerate {
    public static void main(String[] args) throws IOException, ParseException {
        String[] myArray = {"2021-03-15", "2021-03-16", "2021-03-17", "2021-03-18", "2021-03-19",
                "2021-03-20", "2021-03-21", "2021-03-22", "2021-03-23", "2021-03-24", "2021-03-25",
                "2021-03-26", "2021-03-27", "2021-03-28", "2021-03-29", "2021-03-30", "2021-03-31"};
        for (String eachDate : myArray) {
            // 300 to 800 new registrations per day
            List<UserBase> userBase = getUserBase(RandomValue.getRandomInt(300, 800), eachDate);
            DBOperateUtils.insertUserBase(userBase);
        }
    }

    public static List<UserBase> getUserBase(int totalUser, String formateDate) throws IOException, ParseException {
        // the register and update times are derived from the date passed in
        ArrayList<UserBase> userBaseList = new ArrayList<>();

        List<ProvinceCode> provinceCodeList = DBOperateUtils.getProvince();
        for (int i = 0; i < totalUser; i++) {
            UserBase userBase = new UserBase();
            userBase.setUser_account(RandomValue.getRandomUserName(8));
            userBase.setUser_email(RandomValue.getEmail(8, 12));
            userBase.setUser_name(RandomValue.getRandomUserName(10));
            userBase.setUser_phone(RandomValue.getUserPhone());
            String passwords = RandomValue.getPasswords(10);
            userBase.setUser_origin_password(passwords);

            userBase.setUser_password(DigestUtils.md5Hex(passwords));
            userBase.setUser_sex(RandomValue.getRandomInt(0, 1) + "");
            userBase.setUser_card(RandomIdCardMaker.getUserCard(1));

            Date date = DateUtil.generateRandomDate(formateDate);

            userBase.setUser_registertime(date);
            userBase.setUser_updatetime(date);

            userBase.setIs_deleted(RandomValue.getRandomInt(0, 1) + "");

            // pick a random province code
            int randomInt = RandomValue.getRandomInt(0, provinceCodeList.size() - 1);
            ProvinceCode provinceCode = provinceCodeList.get(randomInt);
            int provinceId = provinceCode.getId();

            // pick a random city under that province
            ProvinceCode city = DBOperateUtils.getCityByCode(provinceId);
            int cityId = city.getId();

            // pick a random district under that city
            int regionId = DBOperateUtils.getRegionByCode(cityId);

            userBase.setUser_province(provinceId);
            userBase.setUser_city(cityId);
            userBase.setUser_area(regionId);
            userBase.setUser_address_detail(RandomValue.getRoad());
            userBase.setRegister_ip(RandomValue.getRandomIp());
            userBaseList.add(userBase);
        }
        return userBaseList;
    }
}

7 Write the video detail generator
package com.datas.dbgenerate;

import com.datas.beans.dbbeans.VideoDetail;
import com.datas.utils.DBOperateUtils;
import com.datas.utils.JsoupGetAllVideo;

import java.io.IOException;
import java.util.List;

/**
 * Generates the video detail table (the videos users browse and watch).
 */
public class VideoGenerate {
    public static void main(String[] args) throws IOException {
        List<VideoDetail> videoListByJsoup = JsoupGetAllVideo.getVideoListByJsoup();
        DBOperateUtils.insertVideoBase(videoListByJsoup);
    }
}

8 Write the article data generator
package com.datas.dbgenerate;

import com.datas.utils.JSoupGetAllArticle;

import java.io.IOException;
import java.text.ParseException;

public class ArticleDetailGenerate {
    public static void main(String[] args) throws IOException, ParseException {
        JSoupGetAllArticle jSoupGetAllArticle = new JSoupGetAllArticle();
        jSoupGetAllArticle.getVideoListByJsoup("2021-03-18");
    }
}

9 Write the base dictionary generator
package com.datas.dbgenerate;

import com.datas.beans.dbbeans.CommonBase;
import com.datas.beans.dbbeans.UserBase;
import com.datas.utils.DBOperateUtils;
import com.datas.utils.RandomValue;

import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/**
 * Generates the base dictionaries plus each user's MAC and IP addresses.
 */
public class CommonBaseGenerateMacAndIp {

    public static void main(String[] args) {
        // game category dictionary (type_flag = "1")
        String[] type_dic = {"热门","古风江湖梦","日系宅世界","端游最经典","模拟","回合","竞技","卡牌","仙侠","魔幻","生存","角色","射击","魔性","赛车","中国风","动作","放置","休闲","经营","策略","塔防","消除","足球","传奇","体育","音乐"};
        ArrayList<CommonBase> type_dicList = new ArrayList<>();
        for (String s : type_dic) {
            CommonBase commonBase = new CommonBase();
            commonBase.setType_name(s);
            // commonBase.setType_value(s);
            commonBase.setType_createtime(new Date());
            commonBase.setType_status("1");
            commonBase.setType_updatetime(new Date());
            commonBase.setType_flag("1");
            type_dicList.add(commonBase);
        }
        DBOperateUtils.insertCommonBaseList(type_dicList);

        // download channel dictionary (type_flag = "2")
        String[] down_dic = {"应用宝","应用汇","华为商城","小米商城","苹果商城","魅族商城","三星商城","豌豆荚"};
        ArrayList<CommonBase> down_dicList = new ArrayList<>();
        for (String s : down_dic) {
            CommonBase commonBase = new CommonBase();
            commonBase.setType_name(s);
            // commonBase.setType_value(s);
            commonBase.setType_createtime(new Date());
            commonBase.setType_status("1");
            commonBase.setType_updatetime(new Date());
            commonBase.setType_flag("2");
            down_dicList.add(commonBase);
        }
        DBOperateUtils.insertCommonBaseList(down_dicList);

        // operating system dictionary (type_flag = "3")
        String[] operate_dic = {"android3.5","android3.6","android3.8","android4.2","android4.6","ios8.5","ios8.6","ios8.7","ios9.2","ios9.5","ios10","ios11"};
        ArrayList<CommonBase> operate_dicList = new ArrayList<>();
        for (String s : operate_dic) {
            CommonBase commonBase = new CommonBase();
            commonBase.setType_name(s);
            // commonBase.setType_value(s);
            commonBase.setType_createtime(new Date());
            commonBase.setType_status("1");
            commonBase.setType_updatetime(new Date());
            commonBase.setType_flag("3");
            operate_dicList.add(commonBase);
        }
        DBOperateUtils.insertCommonBaseList(operate_dicList);

        // search keyword dictionary (type_flag = "4")
        String[] searchKey = {"九游国风版本","英雄联盟手游","天涯明月刀","荒野乱斗","地下城与勇士","最强蜗牛","江南百景图","妖精的尾巴","梦幻西游","率师之滨","王者荣耀","和平精英","吃鸡","初始化","金币","钻石、宝石","力量","智力","运气","敏捷","体质、体力","创造","毒性","生命","蓝、魔法","魔法值、技能值","精神","恢复","攻击","魔法攻击","防御","魔法防御","护甲","物理","魔法","暴击","闪避","命中","魅力","冷却","范围","速度、频率","速率、射速","成功率","精品成功率","攻击成功率","获得","设置","支付","成功","失败","取消","分数","名声","因果值","死亡","英雄、玩家","怪物、敌人","药水","回血道具","过关攻略","CF如何卡bug","地下城与勇士bug攻略","秒杀大boss","体力回升药水","火力覆盖","如何屏蔽SB","如何防止被杀","怎么提高速度","怎么拿三杀","青铜如何快速升级","怎么避免小学生","小学生打我怎么办","草丛如何隐身","逃跑攻略","王者荣耀青铜到黄金","王者荣耀快速升级","钻石","黄金","白银","青铜"};
        ArrayList<CommonBase> search_keyList = new ArrayList<>();
        for (String s : searchKey) {
            CommonBase commonBase = new CommonBase();
            commonBase.setType_name("searchKey");
            commonBase.setType_value(s);
            commonBase.setType_createtime(new Date());
            commonBase.setType_status("1");
            commonBase.setType_updatetime(new Date());
            commonBase.setType_flag("4");
            search_keyList.add(commonBase);
        }
        DBOperateUtils.insertCommonBaseList(search_keyList);

        // bind one to five MAC addresses and IP addresses to every user
        List<UserBase> allUserBase = DBOperateUtils.getAllUserBase();
        for (UserBase userBase : allUserBase) {
            List<CommonBase> commonBaseMacList = getCommonBaseMacList(RandomValue.getRandomInt(1, 5), userBase.getUser_id());
            DBOperateUtils.insertCommonBaseList(commonBaseMacList);
            List<CommonBase> commonBaseList = generateUserIp(RandomValue.getRandomInt(1, 5), userBase.getUser_id());
            DBOperateUtils.insertCommonBaseList(commonBaseList);
        }
    }

    /**
     * Generates the user-to-MAC-address bindings (type_flag = "5";
     * type_name holds the user id, type_value the MAC address).
     */
    public static List<CommonBase> getCommonBaseMacList(int totalResult, int userId) {
        ArrayList<CommonBase> commonBaseList = new ArrayList<>();
        for (int i = 1; i <= totalResult; i++) {
            CommonBase commonBase = new CommonBase();
            commonBase.setType_name(userId + "");
            // one MAC address for the user
            commonBase.setType_value(RandomValue.randomMac4Qemu());
            commonBase.setType_createtime(new Date());
            commonBase.setType_status("1");
            commonBase.setType_updatetime(new Date());
            commonBase.setType_flag("5");
            commonBaseList.add(commonBase);
            // every ninth binding, give the user an extra MAC address
            if (i % 9 == 0) {
                CommonBase commonBase1 = new CommonBase();
                commonBase1.setType_name(userId + "");
                commonBase1.setType_value(RandomValue.randomMac4Qemu());
                commonBase1.setType_createtime(new Date());
                commonBase1.setType_status("1");
                commonBase1.setType_updatetime(new Date());
                commonBase1.setType_flag("5");
                commonBaseList.add(commonBase1);
            }
            // and a third one when i is a multiple of both 9 and 4
            if (i % 9 == 0 && i % 4 == 0) {
                CommonBase commonBase2 = new CommonBase();
                commonBase2.setType_name(userId + "");
                commonBase2.setType_value(RandomValue.randomMac4Qemu());
                commonBase2.setType_createtime(new Date());
                commonBase2.setType_status("1");
                commonBase2.setType_updatetime(new Date());
                commonBase2.setType_flag("5");
                commonBaseList.add(commonBase2);
            }
        }
        return commonBaseList;
    }

    /**
     * Generates the user-to-IP-address bindings (type_flag = "6"): one base IP
     * plus several nearby addresses, simulating logins from the same network.
     */
    public static List<CommonBase> generateUserIp(int totalResult, int userId) {
        ArrayList<CommonBase> commonBaseList = new ArrayList<>();
        for (int i = 1; i <= totalResult; i++) {
            CommonBase commonBase = new CommonBase();
            commonBase.setType_name(userId + "");
            // the base IP address for the user
            String randomIp = RandomValue.getRandomIp();
            commonBase.setType_value(randomIp);
            commonBase.setType_createtime(new Date());
            commonBase.setType_status("1");
            commonBase.setType_updatetime(new Date());
            commonBase.setType_flag("6");
            commonBaseList.add(commonBase);

            // derive several nearby IPs by shifting the last two octets,
            // clamped to 255
            String[] split = randomIp.split("\\.");

            for (int j = 2; j <= RandomValue.getRandomInt(5, 10); j++) {
                int third = RandomValue.getRandomInt(Integer.parseInt(split[2]), Integer.parseInt(split[2]) + 50);
                if (third > 255) {
                    third = 255;
                }
                int fourth = RandomValue.getRandomInt(Integer.parseInt(split[3]), Integer.parseInt(split[3]) + 50);
                if (fourth > 255) {
                    fourth = 255;
                }
                String randomnewIP = split[0] + "." + split[1] + "." + third + "." + fourth;
                CommonBase commonBaseJ = new CommonBase();
                commonBaseJ.setType_name(userId + "");
                commonBaseJ.setType_value(randomnewIP);
                commonBaseJ.setType_createtime(new Date());
                commonBaseJ.setType_status("1");
                commonBaseJ.setType_updatetime(new Date());
                commonBaseJ.setType_flag("6");
                commonBaseList.add(commonBaseJ);
            }
        }
        return commonBaseList;
    }
}

10 Write the product data generator
package com.datas.dbgenerate;

import com.datas.beans.dbbeans.GameProduct;
import com.datas.beans.dbbeans.UserBase;
import com.datas.utils.DBOperateUtils;
import com.datas.utils.DateUtil;
import com.datas.utils.RandomValue;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/**
 * Generates the game product table by scraping product listings with Jsoup.
 */
public class GameProductGenerate {
    public static void main(String[] args) throws IOException {
        List<UserBase> userList = DBOperateUtils.getAllUserBase();
        String urlFirst = "https://www.dd373.com/s-rbg22w-0-0-0-0-0-0-0-0-0-0-0-";
        String urlSecond = "-0-0-0.html";
        ArrayList<GameProduct> gameProducts = new ArrayList<>();
        for (int i = 1000; i <= 2096; i++) {
            String finalUrl = urlFirst + i + urlSecond;
            Document parse = Jsoup.parse(new URL(finalUrl), 60000);
            Elements select = parse.select("div.goods-list-item");
            for (Element element : select) {
                String gameTitle = element.select("div.game-account-flag").text();

                String game_detail_url = element.select("div.h1-box > h2 > a ").attr("href");

                String game_title_second = element.select("span.label-bg").text();
                Elements select1 = element.select("span.font12.color666.game-qufu-value.max-width506");
                String game_qufu = "";
                for (Element element1 : select1) {
                    game_qufu += element1.text() + "\001";
                }
                String game_type = select1.last().text();

                int game_credit = element.select("p.game-reputation").select("i").size();
                String game_price = element.select("div.goods-price > span").text();
                String game_stock = element.select("div.kucun > span").text();
                String game_protection = element.select("div.server-protection > a > span").text();
                GameProduct gameProduct = new GameProduct();
                gameProduct.setGame_title(gameTitle);
                gameProduct.setGame_pic(null);
                gameProduct.setGame_title_second(game_title_second);
                gameProduct.setGame_qufu(game_qufu);
                gameProduct.setGame_credit(game_credit + "");
                gameProduct.setGame_price(Double.valueOf(game_price.replace("¥", "")));
                gameProduct.setGame_stock(game_stock);
                gameProduct.setGame_protection(game_protection);
                gameProduct.setGame_type(game_type);
                gameProduct.setGame_detail_url(game_detail_url);

                // attach the product to a random existing user; the listing and
                // audit dates are derived from that user's registration time
                UserBase userBase = userList.get(RandomValue.getRandomInt(1, userList.size() - 1));
                gameProduct.setGame_user(userBase.getUser_id());
                Date game_create_date = DateUtil.generatePutOffTime(userBase.getUser_registertime());
                Date game_check_date = DateUtil.generatePutOffTime(game_create_date);
                gameProduct.setGame_create_date(game_create_date);
                gameProduct.setGame_check_date(game_check_date);
                gameProduct.setGame_check_user(Integer.MAX_VALUE);
                gameProduct.setGame_check_reson("");
                gameProduct.setGame_check_status(RandomValue.getRandomInt(0, 1) + "");
                gameProducts.add(gameProduct);
                // flush to MySQL in batches of 200
                if (gameProducts.size() >= 200) {
                    DBOperateUtils.insertGameProducts(gameProducts);
                    gameProducts.clear();
                }
            }
        }
        // insert whatever is left over
        DBOperateUtils.insertGameProducts(gameProducts);
    }
}

11 Generating user login record data
package com.datas.dbgenerate;
import com.datas.utils.UserLoginRecordUtils;

import java.text.ParseException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
/**
 * Generate the login log table produced by user logins
 */
public class UserLoginRecordGenerate {

    public static void main(String[] args) throws ParseException {
        // new UserLoginRecordUtils2().generateUserLogin();
        // run four generator tasks concurrently on a fixed thread pool
        ExecutorService es = Executors.newFixedThreadPool(8);
        es.submit(new UserLoginRecordUtils());
        es.submit(new UserLoginRecordUtils());
        es.submit(new UserLoginRecordUtils());
        es.submit(new UserLoginRecordUtils());
    }
}

12 Generating order and order-product table data
package com.datas.dbgenerate;
import com.datas.beans.dbbeans.GameOrder;
import com.datas.beans.dbbeans.GameOrderProduct;
import com.datas.beans.dbbeans.GameProduct;
import com.datas.beans.dbbeans.UserBase;
import com.datas.utils.DBOperateUtils;
import com.datas.utils.DateUtil;
import com.datas.utils.RandomValue;
import java.util.*;
/**
 * Generate the order table and the order-product link table
 */
public class GameOrderAndProductGenerate {
    public static void main(String[] args) {
        for(int i = 0; i < 1000; i++){
            Map<GameOrder, List<GameOrderProduct>> gameOrderAndProduct = getGameOrderAndProduct();
            Set<GameOrder> gameOrders = gameOrderAndProduct.keySet();
            for (GameOrder gameOrder : gameOrders) {
                List<GameOrderProduct> gameOrderProductList = gameOrderAndProduct.get(gameOrder);
                DBOperateUtils.insertGameOrder(gameOrder);
                DBOperateUtils.insertGameOrderProductList(gameOrderProductList);
            }
        }
    }

    /**
     * Generate one order together with the products it contains
     * @return a map from the order to its product list
     */
    public static Map<GameOrder, List<GameOrderProduct>> getGameOrderAndProduct(){
        Map<GameOrder, List<GameOrderProduct>> gameOrderAndProductMap = new HashMap<>();
        List<GameProduct> gameProductList = DBOperateUtils.GetAllGameProduct();
        List<UserBase> allUserBase = DBOperateUtils.getAllUserBase();
        UserBase buyUser = allUserBase.get(RandomValue.getRandomInt(0, allUserBase.size() - 1));
        List<GameOrderProduct> gameOrderProductList = new ArrayList<>();
        GameOrder gameOrder = new GameOrder();
        String orderNum = RandomValue.getOrderNum(Long.valueOf(buyUser.getUser_id()));
        gameOrder.setOrder_num(orderNum);
        gameOrder.setBuyer_user_id(buyUser.getUser_id());
        gameOrder.setOrder_status(RandomValue.getRandomInt(0, 4) + "");
        gameOrder.setPay_type(RandomValue.getRandomInt(0, 2) + "");

        double totalPrice = 0.0;

        // pick a random number of products for this order
        int randomInt = RandomValue.getRandomInt(1, 4);
        for(int i = 0; i < randomInt; i++) {
            GameProduct gameProduct = gameProductList.get(RandomValue.getRandomInt(0, gameProductList.size() - 1));
            GameOrderProduct gameOrderProduct = new GameOrderProduct();
            gameOrderProduct.setGame_order_id(orderNum);
            gameOrderProduct.setProduct_id(gameProduct.getGame_id());
            gameOrderProduct.setProduct_num(RandomValue.getRandomInt(1, 3));
            gameOrderProduct.setProduct_name(gameProduct.getGame_title());
            gameOrderProduct.setProduct_url(gameProduct.getGame_detail_url());
            gameOrderProduct.setSeller_id(gameProduct.getGame_user() + "");
            gameOrderProduct.setProduct_price(gameProduct.getGame_price());
            totalPrice = totalPrice + gameProduct.getGame_price();
            gameOrderProductList.add(gameOrderProduct);
        }
        // the order time must not be earlier than the latest product creation time
        Date getOrderTime = DateUtil.getMaxProductDate(gameOrderProductList);
        gameOrder.setPay_money(totalPrice);
        Date orderTime = DateUtil.generatePutOffTime(getOrderTime);
        gameOrder.setOrder_time(orderTime);
        gameOrder.setPay_time(DateUtil.generatePayTime(orderTime));
        gameOrder.setTax_percent("0.1");
        gameOrder.setOrder_deleted("1");
        gameOrderAndProductMap.put(gameOrder, gameOrderProductList);
        return gameOrderAndProductMap;
    }
}

4. User tracking log generation module
4.1 Basic tracking data format
Common fields: fields that essentially every Android phone reports
Business fields: fields reported by the tracking SDK for a specific business event type
The following example shows a payload carrying business fields.
{
"log_type":"app",
"event_array":[
{
"event_name":"search",
"event_json":{
"search_code":"1",
"search_time":"1615800500564"
},
"event_time":1615800496748
},
{
"event_name":"subscribe",
"event_json":{
"sub_user_name":"41lpzLV325",
"sub_type":"1",
"sub_goods_comment":"33",
"sub_goods_like":"80",
"sub_user_discuss":"54",
"sub_user_id":"505"
},
"event_time":1615800488946
},
{
"event_name":"cycleAdd",
"event_json":{
"cycle_id":"41",
"cycleOperateTime":1615800511002,
"cycle_operate_type":"1"
},
"event_time":1615800509870
},
{
"event_name":"sendVideo",
"event_json":{
"video_time":"1615800505380",
"video_name":"《穿行三国》郡城争夺-活动预览",
"video_user_id":"505",
"video_long":106,
"video_success_time":"1615800519237",
"video_type":"13"
},
"event_time":1615800508498
},
{
"event_name":"articleLike",
"event_json":{
"target_id":285831,
"type":"1",
"add_time":"1615800511556",
"userid":505
},
"event_time":1615800485912
},
{
"event_name":"articleComment",
"event_json":{
"p_comment_id":916,
"commentType":"1",
"commentId":"619779",
"praise_count":53,
"commentContent":"妈程汇岔饵裸看驳孤绞别织伎柔痛媒合叶昼悟兄",
"reply_count":181
},
"event_time":1615800511794
},
{
"event_name":"articleShare",
"event_json":{
"shareCycle":"1",
"shareTime":"1615800510091",
"shareArticleId":"3457",
"shareType":"2",
"shareUserId":505
},
"event_time":1615800505943
}
],
"common_base":{
"channel_num":"0",
"lat":"-6.0",
"lng":"-93.0",
"log_time":"1615803420052",
"mobile_brand":"htc",
"mobile_type":"htc10",
"net_type":"4G",
"operate_version":"android3.8",
"screen_size":"640*960",
"sdk_version":"sdk1.6.6",
"sys_lag":"简体中文",
"user_id":"505",
"user_ip":"36.58.146.211",
"version_name":"第一个版本2017-02-08",
"version_num":"1"
}
}

Example log line (server timestamp | log); the JSON payload is the same as the example above:
1615797260652|{"log_type":"app","event_array":[...],"common_base":{...}}
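To make the line format concrete, here is a minimal parsing sketch (not part of the project code; the class name is made up for illustration). It splits a log line into the server timestamp and the JSON payload, then walks the event array using the fastjson library the generator already depends on:

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

public class LogLineParseDemo {
    public static void main(String[] args) {
        String line = "1615797260652|{\"log_type\":\"app\",\"event_array\":[{\"event_name\":\"search\",\"event_json\":{\"search_code\":\"1\"},\"event_time\":1615800496748}],\"common_base\":{\"user_id\":\"505\"}}";
        // split into server timestamp and JSON payload (split at the first pipe only)
        String[] parts = line.split("\\|", 2);
        String serverTime = parts[0];
        JSONObject log = JSON.parseObject(parts[1]);
        String userId = log.getJSONObject("common_base").getString("user_id");
        JSONArray events = log.getJSONArray("event_array");
        // one log line carries many events; print one row per event
        for (int i = 0; i < events.size(); i++) {
            JSONObject event = events.getJSONObject(i);
            System.out.println(serverTime + " user=" + userId + " event=" + event.getString("event_name"));
        }
    }
}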

The detailed field formats of each tracking log type follow.
4.2 Event log data
4.2.1 Swipe operation log
Field name  Meaning
move_direct  swipe direction on the client: 1: swipe down 2: swipe up
start_offset  start offset of the swipe
end_offset  end offset of the swipe
move_menu  menu page on which the swipe happened: 0: home 1: circles 2: discover

4.2.2 Click operation log
Field name  Meaning
click_type  category level clicked: 1. level-1 category 2. level-2 category 3. level-3 category 4. level-4 category
type_name  name of the level-1/2/3/4 category
click_request_url  URL requested after the click; level-1 categories have no request URL, level-2 categories do
click_time  click time
click_entry  page entry source: 1: home-page browsing 2: push message 3: discover page 4: home-page recommendation

4.2.3 Search operation log
Field name  Meaning
search_key  search keyword
search_time  search start time
search_code  search result: 0: search failed 1: search succeeded

4.2.4 Subscribe operation log
Field name  Meaning
sub_type  subscription type: 0: unsubscribe 1: subscribe to a user
sub_user_id  id of the user subscribed to or unsubscribed from
sub_user_name  name of the subscribed user
sub_user_discuss  number of users discussing the subscribed channel
sub_goods_comment  number of comments on the subscribed video
sub_goods_like  number of likes on the subscribed item

4.2.5 Join-circle operation log
Field name  Meaning
cycle_operate_type  circle operation type: 0: leave the circle 1: join the circle
cycle_id  circle id
cycleOperateTime  time of joining the circle

4.2.6 Send-video operation log
Field name  Meaning
video_id  video id
video_name  video name
video_long  video duration
video_time  time the video send started
video_type  video send status: 0: send-start log record 1: send-success log record
video_success_time  time the video send succeeded
video_user_id  id of the user associated with the video

4.2.7 Post operation log
Field name  Meaning
article_id  post id
article_type  post action type: 0: posting 1: post succeeded 2: post failed
article_cycle  circle the post belongs to
article_pic_num  number of pictures in the post
article_time  time of the post action
article_user_id  id of the posting user

4.2.8 Like operation log
Field name  Meaning
userid  id of the user performing the like
target_id  id of the liked target (article or video)
type  like type
add_time  time the like was added

4.2.9 Comment operation log
Field name  Meaning
commentType  1: article comment 2: video comment
commentId  id of the commented video or article
p_comment_id  parent comment id (0 means a top-level comment; non-zero means a reply)
commentContent  comment content
commentTime  comment time
praise_count  number of likes
reply_count  number of replies

4.2.10 Share operation log
Field name  Meaning
shareUserId  id of the sharing user
shareType  share type: 1 article share 2 video share
shareArticleId  id of the shared video or article
shareTime  share time
shareCycle  share channel: 0 WeChat friend 1 WeChat Moments 2 QQ friend 3 Weibo

4.3 App start log
4.3.1 App start log
Field name  Meaning
entry  entry point: push=1, widget=2, icon=3, notification=4, lockscreen_widget=5
open_ad_type  splash ad type: native splash ad=1, interstitial splash ad=2
action  status: success=1 failure=2
loading_time  loading duration: time from the start of the pull-down until the interface returns data (0 is reported when loading starts; a real time is reported only on success or failure)
detail  failure code (empty if none)
extend1  failure message (empty if none)
en  start-log type marker

4.4 User log data generation
1 Copy the base classes for user log generation
[image]

2 Define the user operation log generation
package com.datas.loggenerate;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.datas.beans.dbbeans.*;
import com.datas.beans.logbeans.AppStart;
import com.datas.utils.DBOperateUtils;
import com.datas.utils.GenerateLogUtils;
import com.datas.utils.RandomValue;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.text.ParseException;
import java.util.List;
import java.util.Random;

/**
 * Main program that simulates user operation logs.
 * Operation logs are only generated for users that actually exist.
 */
public class OperateGenerate {
    private final static Logger logger = LoggerFactory.getLogger(OperateGenerate.class);
    private static Random rand = new Random();

    public static void main(String[] args) throws IOException, ParseException {
        String[] myArray = {"2021-03-15","2021-03-16","2021-03-17","2021-03-18","2021-03-19","2021-03-20","2021-03-21","2021-03-22","2021-03-23","2021-03-24","2021-03-25","2021-03-26","2021-03-27","2021-03-28","2021-03-29","2021-03-30","2021-03-31"};
        List<VideoDetail> videoDetailList = DBOperateUtils.getAllVideoDetail();
        List<GameProduct> gameProductList = DBOperateUtils.GetAllGameProduct();
        List<CommonBase> commonBaseList = DBOperateUtils.getTypeNameBase();
        for (String dateArray : myArray) {
            List<UserBase> userBaseList = DBOperateUtils.getUserBaseByDate(dateArray);
            // pick 30% to 70% of that day's users to perform operations
            double lower = userBaseList.size() * 0.3;
            double higher = userBaseList.size() * 0.7;
            int lowerNum = (int) lower;
            int higherNum = (int) higher;
            int finalNumUser = RandomValue.getRandomInt(lowerNum, higherNum);
            for (int i = 0; i < finalNumUser; i++) {
                UserBase userBase = userBaseList.get(RandomValue.getRandomInt(0, userBaseList.size() - 1));
                List<VideoDetail> videoDetailListByUserId = DBOperateUtils.getVideoListByUserId(userBase.getUser_id());
                List<ArticleDetail> articleDetailList = DBOperateUtils.getArticleDetailListByUserId(userBase.getUser_id());
                // each user gets a random number of app-start logs;
                // each start then produces a batch of operation logs
                int startTimes = RandomValue.getRandomInt(1, 5);
                for (int j = 0; j < startTimes; j++) {
                    AppStart appStart = GenerateLogUtils.getAppStart(userBase, dateArray);
                    // write the start log
                    String jsonString = JSON.toJSONString(appStart);
                    logger.info(jsonString);
                    // number of operation log lines for this start
                    int operateTimes = RandomValue.getRandomInt(startTimes * 10, startTimes * 40);
                    // operation types: 0 login (already covered by the login records) 1 move 2 click
                    // 3 search 4 subscribe 5 join circle 6 send video 7 post article 8 purchase/pay
                    // 9 like 10 comment 11 share
                    // app start time
                    String startTime = appStart.getLog_time();
                    for (int p = 0; p <= operateTimes; p++) {
                        JSONObject json = new JSONObject();
                        json.put("log_type", "app");
                        json.put("common_base", GenerateLogUtils.getCommonBase(userBase, dateArray));
                        JSONArray eventsArray = new JSONArray();
                        // 1 move
                        if (rand.nextBoolean()) {
                            eventsArray.add(GenerateLogUtils.getMoveDetail(userBase, dateArray, startTime));
                            json.put("event_array", eventsArray);
                        }
                        // 2 click
                        if (rand.nextBoolean()) {
                            eventsArray.add(GenerateLogUtils.getClickOperate(startTime, videoDetailList, gameProductList, commonBaseList));
                            json.put("event_array", eventsArray);
                        }
                        // 3 search
                        if (rand.nextBoolean()) {
                            eventsArray.add(GenerateLogUtils.getSearchDetail(startTime, commonBaseList));
                            json.put("event_array", eventsArray);
                        }
                        // 4 subscribe
                        if (rand.nextBoolean()) {
                            eventsArray.add(GenerateLogUtils.getSubscribeDetail(userBase, startTime));
                            json.put("event_array", eventsArray);
                        }
                        // 5 join circle
                        if (rand.nextBoolean()) {
                            eventsArray.add(GenerateLogUtils.getCycleAddDetail(userBase, startTime));
                            json.put("event_array", eventsArray);
                        }
                        // 6 send video
                        if (rand.nextBoolean()) {
                            if (videoDetailListByUserId.size() > 0) {
                                VideoDetail videoDetail = videoDetailListByUserId.get(RandomValue.getRandomInt(0, videoDetailListByUserId.size() - 1));
                                eventsArray.add(GenerateLogUtils.getSendVideo(userBase, startTime, videoDetail));
                                json.put("event_array", eventsArray);
                            }
                        }
                        // 7 post article
                        if (rand.nextBoolean()) {
                            if (articleDetailList.size() > 0) {
                                ArticleDetail articleDetail = articleDetailList.get(RandomValue.getRandomInt(0, articleDetailList.size() - 1));
                                eventsArray.add(GenerateLogUtils.getSendArticle(userBase, startTime, articleDetail));
                                json.put("event_array", eventsArray);
                            }
                        }
                        // 9 like
                        if (rand.nextBoolean()) {
                            JSONObject articleLike = GenerateLogUtils.getArticleLike(userBase, startTime);
                            eventsArray.add(articleLike);
                            json.put("event_array", eventsArray);
                        }
                        // 10 comment
                        if (rand.nextBoolean()) {
                            JSONObject articleComment = GenerateLogUtils.getArticleComment(userBase, startTime);
                            eventsArray.add(articleComment);
                            json.put("event_array", eventsArray);
                        }
                        // 11 share
                        if (rand.nextBoolean()) {
                            JSONObject articleShare = GenerateLogUtils.getArticleShare(userBase, startTime);
                            eventsArray.add(articleShare);
                            json.put("event_array", eventsArray);
                        }
                        // prefix the line with the server timestamp
                        long millis = System.currentTimeMillis();
                        System.out.println(millis + "|" + json.toJSONString());
                        logger.info(millis + "|" + json.toJSONString());
                    }
                }
            }
        }
    }
}

5. Packaging and running the log generation code
Adjust the properties in logback.xml

Specify the main class path in pom.xml

<plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <configuration>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
            <manifest>
                <mainClass>com.datas.loggenerate.OperateGenerate</mainClass>
            </manifest>
        </archive>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Package the project with maven
[image]

6. Uploading the database file and the jar
1. Upload the database file to node03
Upload the database file jiuyoudatas.sql to /kkb/soft on node03, open the mysql client, and execute the sql file
[image]

[hadoop@node03 ~]$ cd /kkb/soft/
[hadoop@node03 soft]$ mysql -uroot -p
mysql> source /kkb/soft/jiuyoudatas.sql

[image]

2. Upload the jar to node01
Upload the packaged jar to /kkb/soft on node01, then develop the scripts that start and stop log generation
[image]

Create the start/stop script for log generation
Run the following on node01 to create the script that starts or stops log file generation
[hadoop@node01 bin]$ mkdir -p /home/hadoop/bin/
[hadoop@node01 bin]$ vim data_generate.sh

#!/bin/bash
case $1 in
"start"){
        scp /kkb/soft/data_generate-1.0-jar-with-dependencies.jar node02:/kkb/soft
        for m in node01 node02
        do
                ssh $m "source /etc/profile;nohup java -jar /kkb/soft/data_generate-1.0-jar-with-dependencies.jar > /dev/null 2>&1 &"
        done
};;
"stop"){
        for i in node01 node02
        do
                ssh $i "source /etc/profile;ps -ef | grep data_generate | grep -v grep | awk '{print \$2}' | xargs kill"
        done
};;
esac
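Usage, matching the two case branches above:
[hadoop@node01 bin]$ sh data_generate.sh start
[hadoop@node01 bin]$ sh data_generate.sh stop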
3. Cluster time synchronization script
Switch to the root user on all three machines and use crontab -e to remove the scheduled clock synchronization job
[hadoop@node01 bin]$ su root
Password:
[root@node01 bin]# crontab -e
crontab: installing new crontab
Remove the clock synchronization task

1) Create the script dt.sh in /home/hadoop/bin on node01

[hadoop@node01 bin]$ vim dt.sh

2) Put the following into the script

#!/bin/bash

log_date=$1

for i in node01 node02 node03
do
        ssh -t $i "sudo date -s $log_date"
done

On ssh -t, see: https://www.cnblogs.com/kevingrace/p/6110842.html
3) Grant execute permission
[hadoop@node01 bin]$ chmod 777 dt.sh

4) Run the script
[hadoop@node01 bin]$ ./dt.sh 2021-03-15

4. Script to view all processes across the cluster
1) Create the script xcall.sh in /home/hadoop/bin
[hadoop@node01 bin]$ vim xcall.sh

2) Put the following into the script

#!/bin/bash

for i in node01 node02 node03
do
        echo --------- $i ----------
        ssh $i "source /etc/profile;$*"
done

3) Grant execute permission
[hadoop@node01 bin]$ chmod 777 xcall.sh

4) Run the script
[hadoop@node01 bin]$ ./xcall.sh jps

7. Collecting log data with Flume
1. Start the log generation script
Run the following on node01 to start log file generation
[hadoop@node01 bin]$ cd /home/hadoop/bin/
[hadoop@node01 bin]$ sh data_generate.sh start

2. Collect the log files with Flume
[image]

2.1 Installing the log collection Flume
See the Flume installation document.
Cluster plan:
node01: Flume (log collection)  node02: Flume (log collection)  node03: Flume (aggregation to HDFS)
2.2 Project experience: Flume components
1) Source
(1) Advantages of Taildir Source over Exec Source and Spooling Directory Source
Taildir Source: resumable reads and multiple directories. Before Flume 1.6 you had to write a custom source that recorded the read position of each file to get resumable reads.
Exec Source can collect data in real time, but data is lost if Flume is not running or the shell command fails.
Spooling Directory Source monitors a directory but does not support resuming.
(2) How should batchSize be set?
With events around 1 KB, 500-1000 is appropriate (the default is 100).
2) Channel
For data safety and reliability, use the file channel type, which buffers data on disk.
2.3 Flume ETL and log-type interceptors
This project defines two custom interceptors: an ETL interceptor and a log-type interceptor.
The ETL interceptor filters out logs with invalid timestamps or incomplete JSON.
The log-type interceptor separates start logs from event logs.
1) Create a Maven project flume_etl
2) Create the package: com.game.flume.etl
3) Add the following to pom.xml

<dependencies>
    <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-ng-core</artifactId>
        <version>1.9.0</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

4) Create the LogETLInterceptor class under the com.game.flume.etl package
Flume ETL interceptor LogETLInterceptor
package com.game.flume.etl;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class LogETLInterceptor implements Interceptor {

@Override
public void initialize() {

}

@Override
public Event intercept(Event event) {

    // 1 get the event body as text
    byte[] body = event.getBody();
    String log = new String(body, Charset.forName("UTF-8"));

    // 2 validate according to the log type (start vs event)
    if (log.contains("start")) {
        if (LogUtils.validateStart(log)){
            return event;
        }
    }else {
        if (LogUtils.validateEvent(log)){
            return event;
        }
    }

    // 3 invalid events are dropped by returning null
    return null;
}

@Override
public List<Event> intercept(List<Event> events) {

    ArrayList<Event> interceptors = new ArrayList<>();

    for (Event event : events) {
        Event intercept1 = intercept(event);

        if (intercept1 != null){
            interceptors.add(intercept1);
        }
    }

    return interceptors;
}

@Override
public void close() {

}

public static class Builder implements Interceptor.Builder{

    @Override
    public Interceptor build() {
        return new LogETLInterceptor();
    }

    @Override
    public void configure(Context context) {

    }
}

}

5) Flume log validation utility class
package com.game.flume.etl;
import org.apache.commons.lang.math.NumberUtils;

public class LogUtils {

public static boolean validateEvent(String log) {
    // server timestamp | json
    // 1549696569054 | {"cm":{"ln":"-89.2","sv":"V2.0.4","os":"8.2.0","g":"M67B4QYU@gmail.com","nw":"4G","l":"en","vc":"18","hw":"1080*1920","ar":"MX","uid":"u8678","t":"1549679122062","la":"-27.4","md":"sumsung-12","vn":"1.1.3","ba":"Sumsung","sr":"Y"},"ap":"weather","et":[]}

    // 1 split on the pipe separator
    String[] logContents = log.split("\\|");

    // 2 there must be exactly two parts
    if(logContents.length != 2){
        return false;
    }

    // 3 the server timestamp must be 13 numeric digits
    if (logContents[0].length()!=13 || !NumberUtils.isDigits(logContents[0])){
        return false;
    }

    // 4 the payload must look like a JSON object
    if (!logContents[1].trim().startsWith("{") || !logContents[1].trim().endsWith("}")){
        return false;
    }

    return true;
}

public static boolean validateStart(String log) {

// {"action":"1","ar":"MX","ba":"HTC","detail":"542","en":"start","entry":"2","extend1":"","g":"S3HQ7LKM@gmail.com","hw":"640*960","l":"en","la":"-43.4","ln":"-98.3","loading_time":"10","md":"HTC-5","mid":"993","nw":"WIFI","open_ad_type":"1","os":"8.2.1","sr":"D","sv":"V2.9.0","t":"1559551922019","uid":"993","vc":"0","vn":"1.1.5"}

    if (log == null){
        return false;
    }

    // validate the JSON envelope
    if (!log.trim().startsWith("{") || !log.trim().endsWith("}")){
        return false;
    }

    return true;
}

}
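A quick self-check sketch for the two validators (illustration only, not part of the project; the sample lines are made up):

public class LogUtilsDemo {
    public static void main(String[] args) {
        // valid: 13-digit timestamp, pipe, JSON object
        System.out.println(LogUtils.validateEvent("1615797260652|{\"log_type\":\"app\"}"));  // true
        // invalid: timestamp is not numeric
        System.out.println(LogUtils.validateEvent("bad-timestamp|{\"log_type\":\"app\"}"));  // false
        // invalid: payload is not wrapped in braces
        System.out.println(LogUtils.validateEvent("1615797260652|not-json"));                // false
        // start logs only need to be a JSON object
        System.out.println(LogUtils.validateStart("{\"en\":\"start\"}"));                    // true
    }
}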

6) Flume log-type interceptor LogTypeInterceptor
package com.game.flume.etl;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LogTypeInterceptor implements Interceptor {
@Override
public void initialize() {

}

@Override
public Event intercept(Event event) {

    // distinguish the log type from the body and record it in the header
    // 1 get the body
    byte[] body = event.getBody();
    String log = new String(body, Charset.forName("UTF-8"));

    // 2 get the headers
    Map<String, String> headers = event.getHeaders();

    // 3 determine the log type and tag it in the header
    if (log.contains("start")) {
        headers.put("topic","topic_start");
    }else {
        headers.put("topic","topic_event");
    }

    return event;
}

@Override
public List<Event> intercept(List<Event> events) {

    ArrayList<Event> interceptors = new ArrayList<>();

    for (Event event : events) {
        Event intercept1 = intercept(event);

        interceptors.add(intercept1);
    }

    return interceptors;
}

@Override
public void close() {

}

public static class Builder implements  Interceptor.Builder{

    @Override
    public Interceptor build() {
        return new LogTypeInterceptor();
    }

    @Override
    public void configure(Context context) {

    }
}

}
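The "topic" header written here is consumed downstream: the HDFS sink on node03 (configured in section 2.4 below) interpolates it via %{topic} in its output path, so start and event logs land in separate directories. For example (paths per the sink configuration later in this section):

# an event log collected on 2021-03-15 ends up under
#   /origin_data/game/log/topic_event/2021-03-15/
# a start log under
#   /origin_data/game/log/topic_start/2021-03-15/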

7) Packaging
The interceptor jar can be packaged on its own; the dependency jars do not need to be uploaded. After packaging, place the jar into Flume's lib folder.
[image]

Note: why are no dependency jars needed? Because the dependencies already exist in flume's lib directory.
8) Place the built jar into /kkb/install/apache-flume-1.9.0-bin/lib on node01, node02, and node03.
[image]

2.4 Log collection Flume configuration
1) Configuration analysis
[image]

Flume reads the log data directly; the log files are named app-yyyy-mm-dd.log.
2) The concrete Flume configuration:
(1) On node01 and node02, create flume-client.conf under /kkb/install/apache-flume-1.9.0-bin/conf with the following content

#flume-client.conf
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#configure the source
a1.sources.r1.type = taildir
a1.sources.r1.positionFile = /kkb/datas/index/log_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /kkb/datas/user_operate/app.+
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1

#interceptors
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = com.game.flume.etl.LogETLInterceptor$Builder
a1.sources.r1.interceptors.i2.type = com.game.flume.etl.LogTypeInterceptor$Builder

#configure the channel
a1.channels.c1.type = file
#checkpoint directory
a1.channels.c1.checkpointDir=/kkb/datas/flume_checkpoint
#data buffer directory
a1.channels.c1.dataDirs=/kkb/datas/flume_data

#configure the sink
a1.sinks.k1.channel = c1
a1.sinks.k1.type = avro
#node03
a1.sinks.k1.hostname = 192.168.52.120
a1.sinks.k1.port = 4141

Note:
com.game.flume.etl.LogETLInterceptor and com.game.flume.etl.LogTypeInterceptor are the fully qualified names of the custom interceptors.
Adjust them to match your own interceptor classes.
(2) On node03, create flume-hdfs.conf under /kkb/install/apache-flume-1.9.0-bin/conf

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.52.120
a1.sources.r1.port = 4141
a1.sources.r1.channels = c1

#configure the channel
a1.channels.c1.type = file
#checkpoint directory
a1.channels.c1.checkpointDir=/kkb/bigdata/flume_checkpoint
#data buffer directory
a1.channels.c1.dataDirs=/kkb/bigdata/flume_data

#configure the sink
a1.sinks.k1.channel = c1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node01:8020/origin_data/game/log/%{topic}/%Y-%m-%d

a1.sinks.k1.hdfs.filePrefix = logevent-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = second

#avoid producing many small files
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 1000
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.minBlockReplicas=1
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = lzop

Configure hadoop to support LZO compression
Hadoop needs LZO compression support here.
Edit core-site.xml on every server in the hadoop cluster, then restart the cluster.
Run the following on node01 to edit core-site.xml
[hadoop@node01 hadoop]$ cd /kkb/install/hadoop-3.1.4/etc/hadoop
[hadoop@node01 hadoop]$ vim core-site.xml

<property>
    <name>io.compression.codecs</name>
    <value>
        org.apache.hadoop.io.compress.GzipCodec,
        org.apache.hadoop.io.compress.DefaultCodec,
        org.apache.hadoop.io.compress.BZip2Codec,
        com.hadoop.compression.lzo.LzoCodec,
        com.hadoop.compression.lzo.LzopCodec
    </value>
</property>
<property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

Sync the modified core-site.xml from node01 to node02 and node03
[hadoop@node01 hadoop]$ cd /kkb/install/hadoop-3.1.4/etc/hadoop
[hadoop@node01 hadoop]$ scp core-site.xml node02:$PWD
[hadoop@node01 hadoop]$ scp core-site.xml node03:$PWD

Add the lzo support jar on all nodes
Copy hadoop-lzo-0.4.20.jar from the course material into the hadoop common directory on every node.
Upload hadoop-lzo-0.4.20.jar to node01, then copy the jar to node02 and node03
[image]

Copy it to node02 and node03
[hadoop@node01 common]$ scp hadoop-lzo-0.4.20.jar node02:$PWD
[hadoop@node01 common]$ scp hadoop-lzo-0.4.20.jar node03:$PWD

Restart the hdfs cluster
Run the following on node01 to restart hdfs
[hadoop@node01 hadoop-3.1.4]$ sbin/stop-dfs.sh
[hadoop@node01 hadoop-3.1.4]$ sbin/start-dfs.sh

Replace flume's guava jar
Flume 1.9 ships with guava-11.0.2.jar, which is quite old; delete it on all three machines and copy in a newer guava jar.
Run the following on node01 to delete the old guava jar on all three machines
[hadoop@node01 hadoop-3.1.4]$ ssh node01 "rm -rf /kkb/install/apache-flume-1.9.0-bin/lib/guava-11.0.2.jar"
[hadoop@node01 hadoop-3.1.4]$ ssh node02 "rm -rf /kkb/install/apache-flume-1.9.0-bin/lib/guava-11.0.2.jar"
[hadoop@node01 hadoop-3.1.4]$ ssh node03 "rm -rf /kkb/install/apache-flume-1.9.0-bin/lib/guava-11.0.2.jar"

Then copy the newer guava jar into flume's lib directory on all three machines
Run the following on node01 (the local cp for node01 itself is implied by "three machines")
[hadoop@node01 hadoop-3.1.4]$ cp /kkb/install/hadoop-3.1.4/share/hadoop/hdfs/lib/guava-27.0-jre.jar /kkb/install/apache-flume-1.9.0-bin/lib/
[hadoop@node01 hadoop-3.1.4]$ scp /kkb/install/hadoop-3.1.4/share/hadoop/hdfs/lib/guava-27.0-jre.jar node02:/kkb/install/apache-flume-1.9.0-bin/lib/
[hadoop@node01 hadoop-3.1.4]$ scp /kkb/install/hadoop-3.1.4/share/hadoop/hdfs/lib/guava-27.0-jre.jar node03:/kkb/install/apache-flume-1.9.0-bin/lib/

2.5 Flume start/stop script for log collection
1) Create the script in /home/hadoop/bin on node01
[hadoop@node01 bin]$ vim flume_stop_start.sh

Put the following into the script
#!/bin/bash

case $1 in
"start"){
        for i in node03 node02 node01
        do
                echo "----------- starting collection flume on $i -------------"
                if [ "node03" = $i ];then
                        ssh $i "source /etc/profile; nohup /kkb/install/apache-flume-1.9.0-bin/bin/flume-ng agent -n a1 -c /kkb/install/apache-flume-1.9.0-bin/conf -f /kkb/install/apache-flume-1.9.0-bin/conf/flume-hdfs.conf -Dflume.root.logger=info,console > /dev/null 2>&1 &"
                else
                        ssh $i "source /etc/profile; nohup /kkb/install/apache-flume-1.9.0-bin/bin/flume-ng agent -n a1 -c /kkb/install/apache-flume-1.9.0-bin/conf -f /kkb/install/apache-flume-1.9.0-bin/conf/flume-client.conf -Dflume.root.logger=info,console > /dev/null 2>&1 &"
                fi
        done
};;
"stop"){
        for i in node03 node02 node01
        do
                echo "----------- stopping collection flume on $i -------------"
                ssh $i "source /etc/profile; ps -ef | grep flume | grep -v grep | awk '{print \$2}' | xargs kill"
        done
};;
esac
Note 1: nohup keeps the command running after you log out or close the terminal; "nohup" means "no hang-up".
Note 2: /dev/null is linux's null device file; everything written to it is discarded, so it is nicknamed the "black hole".
standard input 0: read from the keyboard, /proc/self/fd/0
standard output 1: written to the screen (console), /proc/self/fd/1
standard error 2: written to the screen (console), /proc/self/fd/2
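For example, the redirection pattern used in the script above can be varied (illustration only, not a project command):

# keep the process alive after logout, discard stdout, keep stderr in a file
nohup java -jar /kkb/soft/data_generate-1.0-jar-with-dependencies.jar >/dev/null 2>error.log &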
2) Grant execute permission
[hadoop@node01 bin]$ chmod 777 flume_stop_start.sh

3) Start the flume cluster
[hadoop@node01 bin]$ sh flume_stop_start.sh start

4) Stop the flume cluster
[hadoop@node01 bin]$ sh flume_stop_start.sh stop

8. Data warehouse theory foundations
1. Why a data warehouse is layered

[image]

2. The main role of each layer
Data warehouses commonly use three, four, or even five layers; the four-layer scheme is the most common, dividing the warehouse into the ODS, DWD, DWS, and APP layers.
The following is an example of warehouse layering:
[image]
[image]

3. Data marts and data warehouses: basic concepts
Data warehouse: an integrated, subject-oriented collection of data designed to support the functions of a DSS (decision support system), in which every unit of data is tied to a specific point in time. A data warehouse contains both atomic-level data and lightly summarized data. It is a subject-oriented, integrated, non-volatile (stable), time-variant collection of data used to support decision making in business management.
A data warehouse should not be understood as simply a piece of software. Building one is the process of rebuilding the enterprise's data and information flows, constructing a decision-support environment distinct from the operational environment built by the existing business systems. The value of a data warehouse lies not in how much data it stores, but in the quality of the information and analysis results that can be obtained from it.
Data mart: a small, department- or workgroup-level data warehouse. There are two kinds: independent and dependent. An independent data mart takes its data directly from the operational environment; a dependent data mart takes its data from the enterprise data warehouse. In the long run, a dependent data mart is architecturally more stable than an independent one.
Independent data marts create the illusion that marts can be built first and simply converted into a data warehouse once they reach sufficient scale. This is incorrect: an accumulation of independent data marts does not add up to an enterprise data warehouse, which follows from the respective natures of warehouses and marts. Building multiple independent marts without a centralized warehouse only adds more information silos: the enterprise still cannot analyze its data from an enterprise-wide view, each mart serves a single department or workgroup, and inconsistencies arise between the marts. Independent data marts are, admittedly, a fact of life, analytical environments built to satisfy specific users' needs; in the long run, however, they are a stopgap and will inevitably be replaced by an enterprise-level data warehouse.

The difference between a data warehouse and a data mart can be shown intuitively in the figure below
[image]
As the figure shows, a data warehouse organizes its data with normalized schemas (relational database design theory), while a data mart uses star schemas (multidimensional database design theory), and warehouse data is finer-grained than mart data. The figure reflects only two characteristics, data structure and data content; the remaining differences are listed in the table below, illustrated briefly with a bank as the example.

[image]

4. Warehouse naming conventions
ODS-layer tables use the prefix ods_
DWD-layer tables use the prefix dwd_
DWS-layer tables use the prefix dws_
ADS-layer tables use the prefix ads_
Temporary databases use the prefix xxx_tmp_
Backup databases use the prefix xxx_bak_
For example, the raw start log later in this project lands in ods_start_log and its cleaned version in dwd_start_log.

9. Data warehouse environment preparation
1 Planned service layout across the cluster
[image]
[image]

2 Hive SQL caveats
1) When running MR, a process may be killed by the NodeManager for using too much virtual memory:
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1546781144082_0005 failed 2 times due to AM Container for appattempt_1546781144082_0005_000002 exited with exitCode: -103
For more detailed output, check application tracking page:http://node01:8088/cluster/app/application_1546781144082_0005Then, click on links to logs of each attempt.
Diagnostics: Container [pid=11116,containerID=container_1546781144082_0005_02_000001] is running beyond virtual memory limits. Current usage: 216.3 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.

This happens when a container running on a worker tries to use too much memory and is killed by the NodeManager.
[Quoted] The NodeManager is killing your container. It sounds like you are trying to use hadoop streaming which is running as a child process of the map-reduce task. The NodeManager monitors the entire process tree of the task and if it eats up more memory than the maximum set in mapreduce.map.memory.mb or mapreduce.reduce.memory.mb respectively, we would expect the Nodemanager to kill the task, otherwise your task is stealing memory belonging to other containers, which you don't want.
Solutions:
Option 1: disable the virtual memory check (the approach taken here). Edit yarn-site.xml, be sure to distribute the change to all nodes, and restart the hadoop cluster.

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

Option 2: configure the Map and Reduce task memory in mapred-site.xml as follows (adjust the values to your machines' memory and workload):

<property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
</property>
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
</property>
<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2560M</value>
</property>
10. Building the ODS layer
[image]

The raw data layer stores raw data: the original logs and data are loaded directly and kept as-is.
1 Create the database
1) Run the following on node03 to create the game database
[hadoop@node03 apache-hive-3.1.2]$ cd /kkb/install/apache-hive-3.1.2/
[hadoop@node03 apache-hive-3.1.2]$ bin/hive
hive (default)> create database game;

Note: if the database already exists and contains data, force-drop it with: drop database game cascade;
2) Use the game database
hive (default)> use game;
2 Create the start log table ods_start_log
[image]

1) Create a partitioned table with lzo input and text output that supports json parsing
hive (game)>
drop table if exists game.ods_start_log;
CREATE EXTERNAL TABLE game.ods_start_log (line string)
PARTITIONED BY (dt string)
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/hive/warehouse/game/ods/ods_start_log';
On Hive's LZO compression, see:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO
2) Load the data
hive (game)> load data inpath '/origin_data/game/log/topic_start/2021-03-15' into table game.ods_start_log partition(dt='2021-03-15');

Note: use the YYYY-MM-DD format for dates throughout; this is the time format Hive supports by default
3) Check that the load succeeded
hive (game)> select * from game.ods_start_log limit 10;
3 Create the event log table ods_event_log
[image]

1) Create a partitioned table with lzo input and text output that supports json parsing
hive (game)>
drop table if exists game.ods_event_log;
CREATE EXTERNAL TABLE game.ods_event_log
(line string)
PARTITIONED BY (dt string)
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/hive/warehouse/game/ods/ods_event_log';

2) Load the data
hive (game)>
load data inpath '/origin_data/game/log/topic_event/2021-03-15' into table game.ods_event_log partition(dt='2021-03-15');
Note: use the YYYY-MM-DD format for dates throughout; this is the time format Hive supports by default
3) Check that the load succeeded
hive (game)> select * from game.ods_event_log limit 2;
4 ODS data loading script
1) Create the script in /home/hadoop/bin on node03
[hadoop@node03 bin]$ vim ods_game_start_log.sh
Put the following into the script
#!/bin/bash

# define variables for easy modification
APP=game
hive=/kkb/install/apache-hive-3.1.2/bin/hive

# if a date argument is given, use it; otherwise default to yesterday
if [ -n "$1" ] ;then
        do_date=$1
else
        do_date=$(date -d "-1 day" +%F)
fi

echo "=== log date: $do_date ==="
sql="
load data inpath '/origin_data/game/log/topic_start/$do_date' into table "$APP".ods_start_log partition(dt='$do_date');

load data inpath '/origin_data/game/log/topic_event/$do_date' into table "$APP".ods_event_log partition(dt='$do_date');
"

$hive -e "$sql"

Note 1:
[ -n value ] tests whether the value is non-empty
-- a non-empty value returns true
-- an empty value returns false
Note 2:
for usage of the date command, see
[hadoop@node03 ~]$ date --help
2) Grant execute permission
[hadoop@node03 bin]$ chmod 777 ods_game_start_log.sh
3) Usage
[hadoop@node03 bin]$ ./ods_game_start_log.sh 2021-03-16
4) Check the imported data
hive (game)>
select * from game.ods_start_log where dt='2021-03-16' limit 2;
select * from game.ods_event_log where dt='2021-03-16' limit 2;

5) When the script runs
In production this typically runs daily between 00:30 and 01:00

11. Building the DWD layer
All DWD data comes from the ODS layer. The DWD layer processes the ODS data further: removing null values, dirty data, and values outside valid ranges, converting row storage to columnar storage, and changing the compression format.
1. Parsing the start log into the DWD layer
The ODS start log is parsed and stored into the DWD layer
[image]

1.1 Create the DWD start table
1) DDL
hive (game)> drop table if exists game.dwd_start_log;
hive (game)> CREATE EXTERNAL TABLE game.dwd_start_log(
action string,
channel_num string,
detail string,
en string,
entry string,
extend1 string,
lat string,
lng string,
loading_time string,
log_time string,
mac_id string,
mobile_brand string,
mobile_type string,
net_type string,
open_ad_type string,
operate_version string,
screen_size string,
sdk_version string,
sys_lag string,
user_id string,
user_ip string,
version_name string,
version_num string
)
PARTITIONED BY (dt string)
location '/user/hive/warehouse/game/dwd/dwd_start_log/';

1.2 Load data into the DWD start table
Enable hive local mode to speed up queries
set hive.exec.mode.local.auto=true;
Insert into the dwd layer
hive (game)>
insert overwrite table game.dwd_start_log
PARTITION (dt='2021-03-15')
select
get_json_object(line,'$.action') action,
get_json_object(line,'$.channel_num') channel_num,
get_json_object(line,'$.detail') detail,
get_json_object(line,'$.en') en,
get_json_object(line,'$.entry') entry,
get_json_object(line,'$.extend1') extend1,
get_json_object(line,'$.lat') lat,
get_json_object(line,'$.lng') lng,
get_json_object(line,'$.loading_time') loading_time,
get_json_object(line,'$.log_time') log_time,
get_json_object(line,'$.mac_id') mac_id,
get_json_object(line,'$.mobile_brand') mobile_brand,
get_json_object(line,'$.mobile_type') mobile_type,
get_json_object(line,'$.net_type') net_type,
get_json_object(line,'$.open_ad_type') open_ad_type,
get_json_object(line,'$.operate_version') operate_version,
get_json_object(line,'$.screen_size') screen_size,
get_json_object(line,'$.sdk_version') sdk_version,
get_json_object(line,'$.sys_lag') sys_lag,
get_json_object(line,'$.user_id') user_id,
get_json_object(line,'$.user_ip') user_ip,
get_json_object(line,'$.version_name') version_name,
get_json_object(line,'$.version_num') version_num
from game.ods_start_log
where dt='2021-03-15';

Verify that the insert succeeded
jdbc:hive2://node03:10000> select * from game.dwd_start_log limit 2;
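As a quick sanity check of how get_json_object behaves (the literals below are illustrations, not project data): it extracts one field per call and returns NULL when the path is missing.

hive (game)> select get_json_object('{"action":"1","entry":"2"}','$.action');   -- returns 1
hive (game)> select get_json_object('{"action":"1"}','$.entry');                -- returns NULL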

1.3 DWD start table loading script
1) Create the data loading script in /home/hadoop/bin on node03
[hadoop@node03 bin]$ vim dwd_game_start_log.sh
Put the following into the script
#!/bin/bash

# define variables for easy modification
APP=game
hive=/kkb/install/apache-hive-3.1.2/bin/hive

# if a date argument is given, use it; otherwise default to yesterday
if [ -n "$1" ] ;then
        do_date=$1
else
        do_date=$(date -d "-1 day" +%F)
fi

sql="
insert overwrite table "$APP".dwd_start_log
PARTITION (dt='$do_date')
select
get_json_object(line,'$.action') action,
get_json_object(line,'$.channel_num') channel_num,
get_json_object(line,'$.detail') detail,
get_json_object(line,'$.en') en,
get_json_object(line,'$.entry') entry,
get_json_object(line,'$.extend1') extend1,
get_json_object(line,'$.lat') lat,
get_json_object(line,'$.lng') lng,
get_json_object(line,'$.loading_time') loading_time,
get_json_object(line,'$.log_time') log_time,
get_json_object(line,'$.mac_id') mac_id,
get_json_object(line,'$.mobile_brand') mobile_brand,
get_json_object(line,'$.mobile_type') mobile_type,
get_json_object(line,'$.net_type') net_type,
get_json_object(line,'$.open_ad_type') open_ad_type,
get_json_object(line,'$.operate_version') operate_version,
get_json_object(line,'$.screen_size') screen_size,
get_json_object(line,'$.sdk_version') sdk_version,
get_json_object(line,'$.sys_lag') sys_lag,
get_json_object(line,'$.user_id') user_id,
get_json_object(line,'$.user_ip') user_ip,
get_json_object(line,'$.version_name') version_name,
get_json_object(line,'$.version_num') version_num
from "$APP".ods_start_log
where dt='$do_date';
"

$hive -e "$sql"

2) Grant execute permission
[hadoop@node03 bin]$ chmod 777 dwd_game_start_log.sh

3) Usage
[hadoop@node03 bin]$ ./dwd_game_start_log.sh 2021-03-16
4) Check the imported data
hive (game)>
select * from game.dwd_start_log where dt='2021-03-16' limit 2;

5) When the script runs
In production this typically runs daily between 00:30 and 01:00
2. Parsing the event log into the DWD layer
1. Create the base detail table
The base detail table stores the detail rows converted from the ODS raw table.
1) Create the event log base detail table
hive (game)>
drop table if exists game.dwd_base_event_log;
CREATE EXTERNAL TABLE game.dwd_base_event_log(
channel_num String,
lat String,
lng String,
log_time String,
mobile_brand String,
mobile_type String,
net_type String,
operate_version String,
screen_size String,
sdk_version String,
sys_lag String,
user_id String,
user_ip String,
version_name String,
version_num String,
event_name String,
event_json String,
server_time String
)
partitioned by (dt String)
stored as parquet
location '/user/hive/warehouse/game/dwd/dwd_base_event_log';
2) Note: event_name and event_json hold the event name and the whole event. This is where the one-to-many structure of the raw log is split out: to flatten the raw log we need a UDF and a UDTF.
2 Custom UDF (parses the common fields)
1) Create a maven project: hive_functions
2) Create the package: com.game.udfs
3) Add the following to pom.xml

<properties>
    <project.build.sourceEncoding>UTF8</project.build.sourceEncoding>
    <hive.version>3.1.2</hive.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>${hive.version}</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
4) The UDF parses the common fields

package com.game.udfs;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.json.JSONException;
import org.json.JSONObject;

public class CommonBaseFields extends UDF {

public String evaluate(String line,String jsonkeysString){

    // 0 prepare a StringBuilder
    StringBuilder sb = new StringBuilder();

    // 1 split the requested json keys
    String[] jsonkeys = jsonkeysString.split(",");

    // 2 split the line: server timestamp | json
    String[] logContents = line.split("\\|");

    // 3 validity check
    if (logContents.length != 2 || StringUtils.isBlank(logContents[1])) {
        return "";
    }

    // 4 parse the json
    try {
        JSONObject jsonObject = new JSONObject(logContents[1]);

        // get the common_base object
        JSONObject base = jsonObject.getJSONObject("common_base");

        // iterate over the keys and append each value
        for (int i = 0; i < jsonkeys.length; i++) {
            String filedName = jsonkeys[i].trim();

            if (base.has(filedName)) {
                sb.append(base.getString(filedName)).append("\t");
            } else {
                sb.append("\t");
            }
        }

        sb.append(jsonObject.getString("event_array")).append("\t");
        sb.append(logContents[0]).append("\t");
    } catch (JSONException e) {
        e.printStackTrace();
    }

    return sb.toString();

}



public static void main(String[] args) {
    String line = "1615759656072|{\"log_type\":\"app\",\"event_array\":[{\"event_name\":\"move\",\"event_json\":{\"move_menu\":\"1\",\"end_offset\":\"33\",\"start_offset\":\"10\",\"move_direct\":\"1\"},\"event_time\":1615769954087},{\"event_name\":\"click\",\"event_json\":{\"click_type\":\"3\",\"type_name\":\"下订单\",\"click_entry\":\"2\",\"click_request_url\":\"https://www.9game.cn/top/second/third/%E4%B8%8B%E8%AE%A2%E5%8D%95/detail-DB20210311154708-96544.html\"},\"event_time\":1615769958364},{\"event_name\":\"search\",\"event_json\":{\"search_code\":\"1\",\"search_time\":\"1615769974563\"},\"event_time\":1615769951474},{\"event_name\":\"cycleAdd\",\"event_json\":{\"cycle_id\":\"208\",\"cycleOperateTime\":1615769964820,\"cycle_operate_type\":\"1\"},\"event_time\":1615769959420},{\"event_name\":\"sendVideo\",\"event_json\":{\"video_time\":\"1615769969987\",\"video_name\":\"爱上一个《梦想仙侠》 如看一部五星电影大片\",\"video_user_id\":\"244\",\"video_long\":222,\"video_id\":\"7897\",\"video_success_time\":\"1615769982375\",\"video_type\":\"24\"},\"event_time\":1615769969946},{\"event_name\":\"articleShare\",\"event_json\":{\"shareCycle\":\"0\",\"shareTime\":\"1615769960809\",\"shareArticleId\":\"940469\",\"shareType\":\"1\",\"shareUserId\":244},\"event_time\":1615769975397}],\"common_base\":{\"channel_num\":\"4\",\"lat\":\"21.4\",\"lng\":\"-104.1\",\"log_time\":\"1615765320027\",\"mobile_brand\":\"appale\",\"mobile_type\":\"appale10\",\"net_type\":\"5G\",\"operate_version\":\"ios11\",\"screen_size\":\"1280*720\",\"sdk_version\":\"sdk1.8.2\",\"sys_lag\":\"简体中文\",\"user_id\":\"244\",\"user_ip\":\"123.234.22.204\",\"version_name\":\"第三个版本2017-09-22\",\"version_num\":\"3\"}}";
    String x = new CommonBaseFields().evaluate(line, "channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num");
    System.out.println(x);
}

}

Note: the main method is only for testing with simulated data.
3 Custom UDTF (parses the per-event fields)

1) Create the package: com.game.udtfs
2) Create the class EventUDTF in the com.game.udtfs package
3) It expands the business fields
package com.game.udtfs;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.json.JSONArray;
import org.json.JSONException;

import java.util.ArrayList;

public class EventUDTF extends GenericUDTF {

// this method declares the names and types of the output columns:
@Override
public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {

    ArrayList<String> fieldNames = new ArrayList<String>();
    ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();

    fieldNames.add("event_name");
    fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    fieldNames.add("event_json");
    fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

    return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
}

// one input record produces any number of output rows
@Override
public void process(Object[] objects) throws HiveException {

    // the first argument is the event_array JSON string
    String input = objects[0].toString();

    // if the input is blank, return immediately to filter the record out
    if (StringUtils.isBlank(input)) {
        return;
    } else {

        try {
            // parse the event array to see how many events it holds
            JSONArray ja = new JSONArray(input);

            if (ja == null)
                return;

            // iterate over each event
            for (int i = 0; i < ja.length(); i++) {
                String[] result = new String[2];

                try {
                    // extract the event name
                    result[0] = ja.getJSONObject(i).getString("event_name");

                    // extract the whole event object
                    result[1] = ja.getString(i);
                } catch (JSONException e) {
                    continue;
                }

                // emit one output row
                forward(result);
            }
        } catch (JSONException e) {
            e.printStackTrace();
        }
    }
}

// called when there are no more records to process; used for cleanup or extra output
@Override
public void close() throws HiveException {

}

}
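For intuition about the UDTF contract: given an event_array holding two events, process() forwards two rows (sketch; payloads abbreviated from the sample log earlier in this document):

-- input column (the event_array JSON string):
--   [{"event_name":"search","event_json":{...},"event_time":...},
--    {"event_name":"subscribe","event_json":{...},"event_time":...}]
-- output rows (event_name, event_json):
--   search      {"event_name":"search","event_json":{...},"event_time":...}
--   subscribe   {"event_name":"subscribe","event_json":{...},"event_time":...}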

4) Package the project

5) Upload hive_functions-1.0-SNAPSHOT.jar to /kkb/install/apache-hive-3.1.2/lib on node03
6) Add the jar to Hive's classpath
hive (game)>
add jar /kkb/install/apache-hive-3.1.2/lib/hive_functions-1.0-SNAPSHOT.jar;

7) Create temporary functions bound to the java classes just developed
hive (game)>
create temporary function common_base_analizer as 'com.game.udfs.CommonBaseFields';

create temporary function event_analizer as 'com.game.udtfs.EventUDTF';
4 Parse the event log into the base detail table
1) Parse the event log into the base detail table
hive (game)>
insert overwrite table game.dwd_base_event_log
PARTITION (dt='2021-03-15')
select
channel_num,
lat,
lng,
log_time,
mobile_brand,
mobile_type,
net_type,
operate_version,
screen_size,
sdk_version,
sys_lag,
user_id,
user_ip,
version_name,
version_num,
event_name_type,
event_json_array,
server_time
from (
select
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[0] as channel_num,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[1] as lat,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[2] as lng,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[3] as log_time,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[4] as mobile_brand,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[5] as mobile_type,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[6] as net_type,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[7] as operate_version,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[8] as screen_size,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[9] as sdk_version,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[10] as sys_lag,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[11] as user_id,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[12] as user_ip,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[13] as version_name,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[14] as version_num,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[15] as event_json,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[16] as server_time
from game.ods_event_log where dt='2021-03-15' and common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num') <> ''
) sdk_log lateral view event_analizer(event_json) tmp_k as event_name_type, event_json_array;
2) Test
hive (game)> select * from game.dwd_base_event_log where dt='2021-03-15' limit 10;
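The query above re-runs common_base_analizer once per projected column, i.e. 17 times per row. An alternative sketch (an optimization suggestion, not how the original project writes it) computes the split array once in an inner subquery and indexes into it, so the UDF runs only twice per row (once in the projection, once in the filter):

select
fields[0] as channel_num, fields[1] as lat, fields[2] as lng,
fields[3] as log_time, fields[4] as mobile_brand, fields[5] as mobile_type,
fields[6] as net_type, fields[7] as operate_version, fields[8] as screen_size,
fields[9] as sdk_version, fields[10] as sys_lag, fields[11] as user_id,
fields[12] as user_ip, fields[13] as version_name, fields[14] as version_num,
event_name_type, event_json_array, fields[16] as server_time
from (
  select split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t') as fields
  from game.ods_event_log
  where dt='2021-03-15'
    and common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num') <> ''
) sdk_log lateral view event_analizer(fields[15]) tmp_k as event_name_type, event_json_array;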
5 DWD data parsing script
1) Create the script in /home/hadoop/bin on node03
[hadoop@node03 bin]$ vim dwd_game_base_event_log.sh

Put the following into the script
#!/bin/bash

# define variables for easy modification
APP=game
hive=/kkb/install/apache-hive-3.1.2/bin/hive

# if a date argument is given, use it; otherwise default to yesterday
if [ -n "$1" ] ;then
        do_date=$1
else
        do_date=$(date -d "-1 day" +%F)
fi

sql="
add jar /kkb/install/apache-hive-3.1.2/lib/hive_functions-1.0-SNAPSHOT.jar;

create temporary function common_base_analizer as 'com.game.udfs.CommonBaseFields';
create temporary function event_analizer as 'com.game.udtfs.EventUDTF';

insert overwrite table "$APP".dwd_base_event_log
PARTITION (dt='$do_date')
select
channel_num,
lat,
lng,
log_time,
mobile_brand,
mobile_type,
net_type,
operate_version,
screen_size,
sdk_version,
sys_lag,
user_id,
user_ip,
version_name,
version_num,
event_name_type,
event_json_array,
server_time
from (
select
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[0] as channel_num,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[1] as lat,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[2] as lng,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[3] as log_time,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[4] as mobile_brand,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[5] as mobile_type,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[6] as net_type,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[7] as operate_version,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[8] as screen_size,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[9] as sdk_version,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[10] as sys_lag,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[11] as user_id,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[12] as user_ip,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[13] as version_name,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[14] as version_num,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[15] as event_json,
split(common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num'),'\t')[16] as server_time
from "$APP".ods_event_log where dt='$do_date' and common_base_analizer(line,'channel_num,lat,lng,log_time,mobile_brand,mobile_type,net_type,operate_version,screen_size,sdk_version,sys_lag,user_id,user_ip,version_name,version_num') <> ''
) sdk_log lateral view event_analizer(event_json) tmp_k as event_name_type, event_json_array;
"

$hive -e "$sql"

2) Grant execute permission
[hadoop@node03 bin]$ chmod 777 dwd_game_base_event_log.sh
3) Usage
[hadoop@node03 bin]$ ./dwd_game_base_event_log.sh 2021-03-16

4) Check the imported data
hive (game)> select * from game.dwd_base_event_log where dt='2021-03-16' limit 10;
5) When the script runs
In production this typically runs daily between 00:30 and 01:00

3. Parsing the DWD event tables

1 Parse the base table into the swipe log table (move)
Field name  Meaning
move_direct  swipe direction on the client: 1: swipe down 2: swipe up
start_offset  start offset of the swipe
end_offset  end offset of the swipe
move_menu  menu page on which the swipe happened: 0: home 1: circles 2: discover

1) DDL
hive (game)>
drop table if exists game.dwd_move_log;
CREATE EXTERNAL TABLE game.dwd_move_log(
channel_num String,
lat String,
lng String,
log_time String,
mobile_brand String,
mobile_type String,
net_type String,
operate_version String,
screen_size String,
sdk_version String,
sys_lag String,
user_id String,
user_ip String,
version_name String,
version_num String,
move_direct String,
start_offset String,
end_offset String,
move_menu String,
server_time String)
PARTITIONED BY (dt string)
location '/user/hive/warehouse/game/dwd/dwd_move_log/';

2)导入数据
hive (game)>
insert overwrite table game.dwd_move_log
PARTITION (dt='2021-03-15')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.move_direct') move_direct,
get_json_object(event_json,'$.event_json.start_offset') start_offset,
get_json_object(event_json,'$.event_json.end_offset') end_offset,
get_json_object(event_json,'$.event_json.move_menu') move_menu,
server_time
from game.dwd_base_event_log
where dt='2021-03-15' and event_name='move';
3)测试
hive (game)> select * from game.dwd_move_log where dt='2021-03-15' limit 2;

2 点击操作日志click
字段名称 含义
click_type 点击的分类:1.一级分类 2.二级分类 3.三级分类 4.四级分类
type_name 一级/二级/三级/四级分类名称
click_request_url 点击之后请求的URL地址;一级分类没有请求地址,二级分类有请求URL地址
click_time 点击时间
click_entry 页面入口来源:1:首页浏览点击进入 2:消息推送点击进入 3:新奇界面点击进入 4:首页推荐点击进入

1)建表语句
hive (game)>
drop table if exists game.dwd_click_log;
CREATE EXTERNAL TABLE game.dwd_click_log(
channel_num String ,
lat String ,
lng String ,
log_time String ,
mobile_brand String ,
mobile_type String ,
net_type String ,
operate_version String ,
screen_size String ,
sdk_version String ,
sys_lag String ,
user_id String ,
user_ip String ,
version_name String ,
version_num String ,
click_type String ,
type_name String ,
click_request_url String ,
click_time String ,
click_entry String ,
server_time String )
PARTITIONED BY (dt string)
location '/user/hive/warehouse/game/dwd/dwd_click_log/';

2)导入数据
hive (game)>
insert overwrite table game.dwd_click_log
PARTITION (dt='2021-03-15')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.click_type') click_type,
get_json_object(event_json,'$.event_json.type_name') type_name,
get_json_object(event_json,'$.event_json.click_request_url') click_request_url,
get_json_object(event_json,'$.event_json.click_time') click_time,
get_json_object(event_json,'$.event_json.click_entry') click_entry,
server_time
from game.dwd_base_event_log
where dt='2021-03-15' and event_name='click';
3)测试
hive (game)> select * from game.dwd_click_log where dt='2021-03-15' limit 2;

3 搜索操作日志search
字段名称 含义
search_key 搜索关键字
search_time 搜索开始时间
search_code 搜索的结果 0:搜索失败 1:搜索成功

1)建表语句
hive (game)>
drop table if exists game.dwd_search_log;
CREATE EXTERNAL TABLE game.dwd_search_log(
channel_num String ,
lat String ,
lng String ,
log_time String ,
mobile_brand String ,
mobile_type String ,
net_type String ,
operate_version String ,
screen_size String ,
sdk_version String ,
sys_lag String ,
user_id String ,
user_ip String ,
version_name String ,
version_num String ,
search_key String ,
search_time String ,
search_code String ,
server_time String )
PARTITIONED BY (dt string)
location '/user/hive/warehouse/game/dwd/dwd_search_log/';

2)导入数据
hive (game)>
insert overwrite table game.dwd_search_log
PARTITION (dt='2021-03-15')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.search_key') search_key,
get_json_object(event_json,'$.event_json.search_time') search_time,
get_json_object(event_json,'$.event_json.search_code') search_code,
server_time
from game.dwd_base_event_log
where dt='2021-03-15' and event_name='search';
3)测试
hive (game)> select * from game.dwd_search_log where dt='2021-03-15' limit 2;

4 订阅操作日志subscribe
字段名称 含义
sub_type 订阅类型 0:取消订阅 1:订阅用户
sub_user_id 取消或者订阅用户id
sub_user_name 订阅用户名称
sub_user_discuss 订阅的频道多少用户在讨论
sub_goods_comment 订阅视频评论数
sub_goods_like 订阅商品的点赞数量

1)建表语句
hive (game)>
drop table if exists game.dwd_subscribe_log;
CREATE EXTERNAL TABLE game.dwd_subscribe_log(
channel_num String ,
lat String ,
lng String ,
log_time String ,
mobile_brand String ,
mobile_type String ,
net_type String ,
operate_version String ,
screen_size String ,
sdk_version String ,
sys_lag String ,
user_id String ,
user_ip String ,
version_name String ,
version_num String ,
sub_type String ,
sub_user_id String ,
sub_user_name String ,
sub_user_discuss String ,
sub_goods_comment String ,
sub_goods_like String ,
server_time String )
PARTITIONED BY (dt string)
location '/user/hive/warehouse/game/dwd/dwd_subscribe_log/';

2)导入数据
hive (game)>
insert overwrite table game.dwd_subscribe_log
PARTITION (dt='2021-03-15')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.sub_type') sub_type,
get_json_object(event_json,'$.event_json.sub_user_id') sub_user_id,
get_json_object(event_json,'$.event_json.sub_user_name') sub_user_name,
get_json_object(event_json,'$.event_json.sub_user_discuss') sub_user_discuss,
get_json_object(event_json,'$.event_json.sub_goods_comment') sub_goods_comment,
get_json_object(event_json,'$.event_json.sub_goods_like') sub_goods_like,
server_time
from game.dwd_base_event_log
where dt='2021-03-15' and event_name='subscribe';
3)测试
hive (game)> select * from game.dwd_subscribe_log where dt='2021-03-15' limit 2;
5 加入圈子操作日志cycleAdd
字段名称 含义
cycle_operate_type 圈子操作类型 0:取消加入圈子 1:加入圈子
cycle_id 圈子id
cycleOperateTime 加入(或取消加入)圈子的时间

1)建表语句
hive (game)>
drop table if exists game.dwd_cycleAdd_log;
CREATE EXTERNAL TABLE game.dwd_cycleAdd_log(
channel_num String ,
lat String ,
lng String ,
log_time String ,
mobile_brand String ,
mobile_type String ,
net_type String ,
operate_version String ,
screen_size String ,
sdk_version String ,
sys_lag String ,
user_id String ,
user_ip String ,
version_name String ,
version_num String ,
cycle_operate_type String ,
cycle_id String ,
cycleOperateTime String ,
server_time String )
PARTITIONED BY (dt string)
location '/user/hive/warehouse/game/dwd/dwd_cycleAdd_log/';

2)导入数据
hive (game)>
insert overwrite table game.dwd_cycleAdd_log
PARTITION (dt='2021-03-15')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.cycle_operate_type') cycle_operate_type,
get_json_object(event_json,'$.event_json.cycle_id') cycle_id,
get_json_object(event_json,'$.event_json.cycleOperateTime') cycleOperateTime,
server_time
from game.dwd_base_event_log
where dt='2021-03-15' and event_name='cycleAdd';
3)测试
hive (game)> select * from game.dwd_cycleAdd_log where dt='2021-03-15' limit 2;

6 发视频操作日志sendVideo
字段名称 含义
video_id 视频id
video_name 视频名称
video_long 视频时长
video_time 开始发送视频时间
video_type 视频发送状态 0:开始发送日志记录 1:发送成功日志记录
video_success_time 视频发送成功记录时间
video_user_id 视频关联用户id
1)建表语句
hive (game)>
drop table if exists game.dwd_sendVideo_log;
CREATE EXTERNAL TABLE game.dwd_sendVideo_log(
channel_num String ,
lat String ,
lng String ,
log_time String ,
mobile_brand String ,
mobile_type String ,
net_type String ,
operate_version String ,
screen_size String ,
sdk_version String ,
sys_lag String ,
user_id String ,
user_ip String ,
version_name String ,
version_num String ,
video_id String ,
video_name String ,
video_long String ,
video_time String,
video_type String,
video_success_time String,
video_user_id String,
server_time String )
PARTITIONED BY (dt string)
location '/user/hive/warehouse/game/dwd/dwd_sendVideo_log/';

2)导入数据
hive (game)>
insert overwrite table game.dwd_sendVideo_log
PARTITION (dt='2021-03-15')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.video_id') video_id,
get_json_object(event_json,'$.event_json.video_name') video_name,
get_json_object(event_json,'$.event_json.video_long') video_long,
get_json_object(event_json,'$.event_json.video_time') video_time,
get_json_object(event_json,'$.event_json.video_type') video_type,
get_json_object(event_json,'$.event_json.video_success_time') video_success_time,
get_json_object(event_json,'$.event_json.video_user_id') video_user_id,
server_time
from game.dwd_base_event_log
where dt='2021-03-15' and event_name='sendVideo';
3)测试
hive (game)> select * from game.dwd_sendVideo_log where dt='2021-03-15' limit 2;

7 发帖子操作日志sendArticle
字段名称 含义
article_id 帖子id
article_type 发帖子类型 0:发帖 1:发帖成功 2:发帖失败
article_cycle 帖子所属圈子
article_pic_num 帖子图片数量
article_time 帖子操作时间
article_user_id 发帖关联用户id

1)建表语句
hive (game)>
drop table if exists game.dwd_sendArticle_log;
CREATE EXTERNAL TABLE game.dwd_sendArticle_log(
channel_num String ,
lat String ,
lng String ,
log_time String ,
mobile_brand String ,
mobile_type String ,
net_type String ,
operate_version String ,
screen_size String ,
sdk_version String ,
sys_lag String ,
user_id String ,
user_ip String ,
version_name String ,
version_num String ,
article_id String ,
article_type String ,
article_cycle String ,
article_pic_num String,
article_time String,
article_user_id String,
server_time String )
PARTITIONED BY (dt string)
location '/user/hive/warehouse/game/dwd/dwd_sendArticle_log/';

2)导入数据
hive (game)>
insert overwrite table game.dwd_sendArticle_log
PARTITION (dt='2021-03-15')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.article_id') article_id,
get_json_object(event_json,'$.event_json.article_type') article_type,
get_json_object(event_json,'$.event_json.article_cycle') article_cycle,
get_json_object(event_json,'$.event_json.article_pic_num') article_pic_num,
get_json_object(event_json,'$.event_json.article_time') article_time,
get_json_object(event_json,'$.event_json.article_user_id') article_user_id,
server_time
from game.dwd_base_event_log
where dt='2021-03-15' and event_name='sendArticle';
3)测试
hive (game)> select * from game.dwd_sendArticle_log where dt='2021-03-15' limit 2;

8 点赞操作日志articleLike
字段名称 含义
userid 点赞用户id
target_id 点赞对象id(帖子或视频id)
type 点赞类型
add_time 点赞添加时间

1)建表语句
hive (game)>
drop table if exists game.dwd_articleLike_log;
CREATE EXTERNAL TABLE game.dwd_articleLike_log(
channel_num String ,
lat String ,
lng String ,
log_time String ,
mobile_brand String ,
mobile_type String ,
net_type String ,
operate_version String ,
screen_size String ,
sdk_version String ,
sys_lag String ,
user_id String ,
user_ip String ,
version_name String ,
version_num String ,
userid String ,
target_id String ,
type String ,
add_time String,
server_time String )
PARTITIONED BY (dt string)
location '/user/hive/warehouse/game/dwd/dwd_articleLike_log/';

2)导入数据
hive (game)>
insert overwrite table game.dwd_articleLike_log
PARTITION (dt='2021-03-15')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.userid') userid,
get_json_object(event_json,'$.event_json.target_id') target_id,
get_json_object(event_json,'$.event_json.type') type,
get_json_object(event_json,'$.event_json.add_time') add_time,
server_time
from game.dwd_base_event_log
where dt='2021-03-15' and event_name='articleLike';
3)测试
hive (game)> select * from game.dwd_articleLike_log where dt='2021-03-15' limit 2;

9 评论操作日志articleComment
字段名称 含义
commentType 评论类型 1:文章评论 2:视频评论
commentId 评论视频或者文章id
p_comment_id 父级评论id(为0则是一级评论,不为0则是回复)
commentContent 评论内容
commentTime 评论时间
praise_count 点赞数量
reply_count 回复数量

1)建表语句
hive (game)>
drop table if exists game.dwd_articleComment_log;
CREATE EXTERNAL TABLE game.dwd_articleComment_log(
channel_num String ,
lat String ,
lng String ,
log_time String ,
mobile_brand String ,
mobile_type String ,
net_type String ,
operate_version String ,
screen_size String ,
sdk_version String ,
sys_lag String ,
user_id String ,
user_ip String ,
version_name String ,
version_num String ,
commentType String ,
commentId String ,
p_comment_id String ,
commentContent String ,
commentTime String ,
praise_count String ,
reply_count String ,
server_time String )
PARTITIONED BY (dt string)
location '/user/hive/warehouse/game/dwd/dwd_articleComment_log/';

2)导入数据
hive (game)>
insert overwrite table game.dwd_articleComment_log
PARTITION (dt='2021-03-15')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.commentType') commentType,
get_json_object(event_json,'$.event_json.commentId') commentId,
get_json_object(event_json,'$.event_json.p_comment_id') p_comment_id,
get_json_object(event_json,'$.event_json.commentContent') commentContent,
get_json_object(event_json,'$.event_json.commentTime') commentTime,
get_json_object(event_json,'$.event_json.praise_count') praise_count,
get_json_object(event_json,'$.event_json.reply_count') reply_count,
server_time
from game.dwd_base_event_log
where dt='2021-03-15' and event_name='articleComment';
3)测试
hive (game)> select * from game.dwd_articleComment_log where dt='2021-03-15' limit 2;

10 分享操作日志articleShare
字段名称 含义
shareUserId 分享用户id
shareType 分享类型 1 文章分享 2 视频分享
shareArticleId 分享视频或者文章id
shareTime 分享时间
shareCycle 分享方式 0 微信好友 1微信朋友圈 2 qq好友 3 微博分享
1)建表语句
hive (game)>
drop table if exists game.dwd_articleShare_log;
CREATE EXTERNAL TABLE game.dwd_articleShare_log(
channel_num String ,
lat String ,
lng String ,
log_time String ,
mobile_brand String ,
mobile_type String ,
net_type String ,
operate_version String ,
screen_size String ,
sdk_version String ,
sys_lag String ,
user_id String ,
user_ip String ,
version_name String ,
version_num String ,
shareUserId String ,
shareType String ,
shareArticleId String ,
shareTime String ,
shareCycle String ,
server_time String )
PARTITIONED BY (dt string)
location '/user/hive/warehouse/game/dwd/dwd_articleShare_log/';

2)导入数据
hive (game)>
insert overwrite table game.dwd_articleShare_log
PARTITION (dt='2021-03-15')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.shareUserId') shareUserId,
get_json_object(event_json,'$.event_json.shareType') shareType,
get_json_object(event_json,'$.event_json.shareArticleId') shareArticleId,
get_json_object(event_json,'$.event_json.shareTime') shareTime,
get_json_object(event_json,'$.event_json.shareCycle') shareCycle,
server_time
from game.dwd_base_event_log
where dt='2021-03-15' and event_name='articleShare';
3)测试
hive (game)> select * from game.dwd_articleShare_log where dt='2021-03-15' limit 2;

11 DWD层事件表加载数据脚本开发
1)在node03的/home/hadoop/bin目录下创建脚本
[hadoop@node03 bin]$ vim dwd_game_event_log.sh
在脚本中编写如下内容
#!/bin/bash

# 定义变量方便修改
APP=game
hive=/kkb/install/apache-hive-3.1.2/bin/hive

# 如果输入了日期参数则取输入日期;如果没输入日期则取当前时间的前一天
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi

sql="

insert overwrite table " A P P " . d w d m o v e l o g P A R T I T I O N ( d t = ′ APP".dwd_move_log PARTITION (dt=' APP".dwdmovelogPARTITION(dt=do_date’)
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.move_direct') move_direct,
get_json_object(event_json,'$.event_json.start_offset') start_offset,
get_json_object(event_json,'$.event_json.end_offset') end_offset,
get_json_object(event_json,'$.event_json.move_menu') move_menu,
server_time
from "$APP".dwd_base_event_log where dt='$do_date' and event_name='move';

insert overwrite table "$APP".dwd_click_log PARTITION (dt='$do_date')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.click_type') click_type,
get_json_object(event_json,'$.event_json.type_name') type_name,
get_json_object(event_json,'$.event_json.click_request_url') click_request_url,
get_json_object(event_json,'$.event_json.click_time') click_time,
get_json_object(event_json,'$.event_json.click_entry') click_entry,
server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='click';

insert overwrite table "$APP".dwd_search_log PARTITION (dt='$do_date')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.search_key') search_key,
get_json_object(event_json,'$.event_json.search_time') search_time,
get_json_object(event_json,'$.event_json.search_code') search_code,
server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='search';

insert overwrite table " A P P " . d w d s u b s c r i b e l o g P A R T I T I O N ( d t = ′ APP".dwd_subscribe_log PARTITION (dt=' APP".dwdsubscribelogPARTITION(dt=do_date’)
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.sub_type') sub_type,
get_json_object(event_json,'$.event_json.sub_user_id') sub_user_id,
get_json_object(event_json,'$.event_json.sub_user_name') sub_user_name,
get_json_object(event_json,'$.event_json.sub_user_discuss') sub_user_discuss,
get_json_object(event_json,'$.event_json.sub_goods_comment') sub_goods_comment,
get_json_object(event_json,'$.event_json.sub_goods_like') sub_goods_like,
server_time
from "$APP".dwd_base_event_log where dt='$do_date' and event_name='subscribe';

insert overwrite table "$APP".dwd_cycleAdd_log PARTITION (dt='$do_date')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.cycle_operate_type') cycle_operate_type,
get_json_object(event_json,'$.event_json.cycle_id') cycle_id,
get_json_object(event_json,'$.event_json.cycleOperateTime') cycleOperateTime,
server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='cycleAdd';

insert overwrite table "$APP".dwd_sendVideo_log PARTITION (dt='$do_date')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.video_id') video_id,
get_json_object(event_json,'$.event_json.video_name') video_name,
get_json_object(event_json,'$.event_json.video_long') video_long,
get_json_object(event_json,'$.event_json.video_time') video_time,
get_json_object(event_json,'$.event_json.video_type') video_type,
get_json_object(event_json,'$.event_json.video_success_time') video_success_time,
get_json_object(event_json,'$.event_json.video_user_id') video_user_id,
server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='sendVideo';

insert overwrite table " A P P " . d w d s e n d A r t i c l e l o g P A R T I T I O N ( d t = ′ APP".dwd_sendArticle_log PARTITION (dt=' APP".dwdsendArticlelogPARTITION(dt=do_date’)
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.article_id') article_id,
get_json_object(event_json,'$.event_json.article_type') article_type,
get_json_object(event_json,'$.event_json.article_cycle') article_cycle,
get_json_object(event_json,'$.event_json.article_pic_num') article_pic_num,
get_json_object(event_json,'$.event_json.article_time') article_time,
get_json_object(event_json,'$.event_json.article_user_id') article_user_id,
server_time
from "$APP".dwd_base_event_log where dt='$do_date' and event_name='sendArticle';

insert overwrite table " A P P " . d w d a r t i c l e L i k e l o g P A R T I T I O N ( d t = ′ APP".dwd_articleLike_log PARTITION (dt=' APP".dwdarticleLikelogPARTITION(dt=do_date’)
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.userid') userid,
get_json_object(event_json,'$.event_json.target_id') target_id,
get_json_object(event_json,'$.event_json.type') type,
get_json_object(event_json,'$.event_json.add_time') add_time,
server_time
from "$APP".dwd_base_event_log where dt='$do_date' and event_name='articleLike';

insert overwrite table "$APP".dwd_articleComment_log PARTITION (dt='$do_date')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.commentType') commentType,
get_json_object(event_json,'$.event_json.commentId') commentId,
get_json_object(event_json,'$.event_json.p_comment_id') p_comment_id,
get_json_object(event_json,'$.event_json.commentContent') commentContent,
get_json_object(event_json,'$.event_json.commentTime') commentTime,
get_json_object(event_json,'$.event_json.praise_count') praise_count,
get_json_object(event_json,'$.event_json.reply_count') reply_count,
server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='articleComment';

insert overwrite table "$APP".dwd_articleShare_log PARTITION (dt='$do_date')
select
channel_num ,
lat ,
lng ,
log_time ,
mobile_brand ,
mobile_type ,
net_type ,
operate_version ,
screen_size ,
sdk_version ,
sys_lag ,
user_id ,
user_ip ,
version_name ,
version_num ,
get_json_object(event_json,'$.event_json.shareUserId') shareUserId,
get_json_object(event_json,'$.event_json.shareType') shareType,
get_json_object(event_json,'$.event_json.shareArticleId') shareArticleId,
get_json_object(event_json,'$.event_json.shareTime') shareTime,
get_json_object(event_json,'$.event_json.shareCycle') shareCycle,
server_time
from "$APP".dwd_base_event_log
where dt='$do_date' and event_name='articleShare';

"

hive -e "$sql"

2)增加脚本执行权限
[hadoop@node03 bin]$ chmod 777 dwd_game_event_log.sh
3)脚本使用
[hadoop@node03 bin]$ ./dwd_game_event_log.sh 2021-03-16
4)查询导入结果
hive (game)>
select * from dwd_articleshare_log where dt='2021-03-16' limit 2;

5)脚本执行时间
企业开发中一般安排在每日凌晨0点30分~1点执行。

12、数仓业务知识逻辑介绍
1、基础业务逻辑介绍
用户
用户以设备为判断标准,在移动统计中,每个独立设备认为是一个独立用户。Android系统根据IMEI号,IOS系统根据OpenUDID来标识一个独立用户,每部手机一个用户。

新增用户
首次联网使用应用的用户。如果一个用户首次打开某APP,那这个用户定义为新增用户;卸载再安装的设备,不会被算作一次新增。新增用户包括日新增用户、周新增用户、月新增用户。

活跃用户
打开应用的用户即为活跃用户,不考虑用户的使用情况。每天一台设备打开多次会被计为一个活跃用户。

周(月)活跃用户
某个自然周(月)内启动过应用的用户,该周(月)内的多次启动只记一个活跃用户。

月活跃率
月活跃用户与截止到该月累计的用户总和之间的比例。

沉默用户
用户仅在安装当天(次日)启动一次,后续时间无再启动行为。该指标可以反映新增用户质量和用户与APP的匹配程度

版本分布
不同版本的周内各天新增用户数,活跃用户数和启动次数。利于判断APP各个版本之间的优劣和用户行为习惯。
本周回流用户
上周未启动过应用,本周启动了应用的用户。
连续n周活跃用户
连续n周,每周至少启动一次。

忠诚用户
连续活跃5周以上的用户

连续活跃用户
连续2周及以上活跃的用户

近期流失用户
连续n(2<= n <= 4)周没有启动应用的用户。(第n+1周没有启动过)

留存用户
某段时间内的新增用户,经过一段时间后,仍然使用应用的被认作是留存用户;这部分用户占当时新增用户的比例即是留存率。
例如,5月份新增用户200,这200人在6月份启动过应用的有100人,7月份启动过应用的有80人,8月份启动过应用的有50人;则5月份新增用户一个月后的留存率是50%,二个月后的留存率是40%,三个月后的留存率是25%。

用户新鲜度
每天启动应用的新老用户比例,即新增用户数占活跃用户数的比例。

单次使用时长
每次启动使用的时间长度。

日使用时长
累计一天内的使用时间长度。

启动次数计算标准
IOS平台应用退到后台就算一次独立的启动;Android平台我们规定,两次启动之间的间隔小于30秒,被计算一次启动。用户在使用过程中,若因收发短信或接电话等退出应用30秒又再次返回应用中,那这两次行为应该是延续而非独立的,所以可以被算作一次使用行为,即一次启动。业内大多使用30秒这个标准,但用户还是可以自定义此时间间隔。
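
按30秒间隔切分启动次数,可以直接在启动日志上用窗口函数实现。下面是一个HiveQL草稿(仅为示意:假设 dwd_start_log 中的 log_time 是秒级时间戳,这一点是假设而非日志规范):同一设备相邻两条记录间隔超过30秒,即记为一次新的启动。
hive (game)>
select mac_id, sum(new_launch) as launch_cnt
from
(
select mac_id,
       -- 与上一条记录的间隔大于30秒(或没有上一条记录)时,记为一次新启动
       case when cast(log_time as bigint)
                 - lag(cast(log_time as bigint), 1, 0)
                   over (partition by mac_id order by cast(log_time as bigint)) > 30
            then 1 else 0 end as new_launch
from game.dwd_start_log
where dt='2021-03-15'
) t
group by mac_id;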

2、Hive日期处理函数
1)date_format函数(根据格式整理日期)
hive (game)> select date_format('2021-03-15','yyyy-MM');

2021-03

2)date_add函数(加减日期)
hive (game)> select date_add('2021-03-15',1);
2021-03-16

hive (game)> select date_add('2021-03-15',-1);
2021-03-14

3)next_day函数
(1)取当前天的下一个周一
hive (game)> select next_day('2021-03-15','MO');
2021-03-22
说明:星期一到星期日的英文(Monday,Tuesday、Wednesday、Thursday、Friday、Saturday、Sunday)
(2)取当前周的周一
hive (game)> select date_add(next_day('2021-03-15','MO'),-7);
2021-03-15

4)last_day函数(求当月最后一天日期)
hive (game)> select last_day('2021-03-15');
2021-03-31
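
把上面几个函数组合起来,就能得到后面周活跃明细表分区键 wk_dt 所用的"周一_周日"字符串,例如:
hive (game)> select concat(date_add(next_day('2021-03-15','MO'),-7), '_', date_add(next_day('2021-03-15','MO'),-1));
2021-03-15_2021-03-21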

13、 需求一:用户活跃主题
1 DWS层
目标:统计当日、当周、当月活动的每个设备明细
1.1 每日活跃设备明细

1)建表语句
hive (game)>
drop table if exists game.dws_uv_detail_day;
create external table game.dws_uv_detail_day(
mac_id String,
user_id String,
version_num String,
version_name String,
sys_lag String,
channel_num String ,
operate_version String ,
mobile_type String ,
mobile_brand String ,
sdk_version String,
screen_size String,
log_time String,
net_type String,
lng String,
lat String ,
user_ip String
) PARTITIONED BY ( dt string)
stored as parquet
location '/user/hive/warehouse/game/dws/dws_uv_detail_day/';

2)数据导入
以设备(mac_id)单日访问为key进行聚合;如果某个设备在一天中使用了两种操作系统、两个系统版本、出现在多个地区或登录了不同账号,collect_set 会先对取值去重,再用'|'拼接保留在同一行中
hive (game)>
insert overwrite table game.dws_uv_detail_day partition(dt='2021-03-15')
select
mac_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_num)) version_num,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(sys_lag)) sys_lag,
concat_ws('|', collect_set(channel_num)) channel_num,
concat_ws('|', collect_set(operate_version)) operate_version,
concat_ws('|', collect_set(mobile_type)) mobile_type,
concat_ws('|', collect_set(mobile_brand)) mobile_brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(screen_size)) screen_size,
concat_ws('|', collect_set(log_time)) log_time,
concat_ws('|', collect_set(net_type)) net_type,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat,
concat_ws('|', collect_set(user_ip)) user_ip
from game.dwd_start_log
where dt='2021-03-15'
group by mac_id;

3)查询导入结果
hive (game)> select * from game.dws_uv_detail_day limit 1;
hive (game)> select count(*) from game.dws_uv_detail_day;

4)思考:不同渠道来源的每日活跃数统计怎么计算?
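一个参考思路(仅为示例:dws_uv_detail_day 中的 channel_num 是用'|'拼接的去重集合,这里简单地取第一个渠道;更严谨的做法是直接基于 dwd_start_log 按渠道去重统计):
hive (game)>
select split(channel_num, '\\|')[0] as channel, count(*) as day_active_cnt
from game.dws_uv_detail_day
where dt='2021-03-15'
group by split(channel_num, '\\|')[0];
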
1.2 每周活跃设备明细

根据日用户访问明细,获得周用户访问明细。
1)建表语句
hive (game)>
drop table if exists game.dws_uv_detail_wk;
create external table game.dws_uv_detail_wk(
mac_id String,
user_id String,
version_num String,
version_name String,
sys_lag String,
channel_num String ,
operate_version String ,
mobile_type String ,
mobile_brand String ,
sdk_version String,
screen_size String,
log_time String,
net_type String,
lng String,
lat String ,
user_ip String ,
monday_date string ,
sunday_date string
) PARTITIONED BY ( wk_dt string)
stored as parquet
location '/user/hive/warehouse/game/dws/dws_uv_detail_wk/';

2)数据导入
hive (game)>
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=600000;
set hive.exec.max.dynamic.partitions=6000000;
set hive.exec.max.created.files=6000000;
insert overwrite table game.dws_uv_detail_wk partition(wk_dt)
select
mac_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_num)) version_num,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(sys_lag)) sys_lag,
concat_ws('|', collect_set(channel_num)) channel_num,
concat_ws('|', collect_set(operate_version)) operate_version,
concat_ws('|', collect_set(mobile_type)) mobile_type,
concat_ws('|', collect_set(mobile_brand)) mobile_brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(screen_size)) screen_size,
concat_ws('|', collect_set(log_time)) log_time,
concat_ws('|', collect_set(net_type)) net_type,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat,
concat_ws('|', collect_set(user_ip)) user_ip ,
date_add(next_day('2021-03-15','MO'),-7),
date_add(next_day('2021-03-15','MO'),-1),
concat(date_add(next_day('2021-03-15','MO'),-7), '_', date_add(next_day('2021-03-15','MO'),-1))
from game.dws_uv_detail_day
where dt>=date_add(next_day('2021-03-15','MO'),-7) and dt<=date_add(next_day('2021-03-15','MO'),-1)
group by mac_id;

3)查询导入结果
hive (game)> select * from game.dws_uv_detail_wk limit 1;
hive (game)> select count(*) from game.dws_uv_detail_wk;

1.3 每月活跃设备明细

1)建表语句
hive (game)>
drop table if exists game.dws_uv_detail_mn;
create external table game.dws_uv_detail_mn(
mac_id String,
user_id String,
version_num String,
version_name String,
sys_lag String,
channel_num String ,
operate_version String ,
mobile_type String ,
mobile_brand String ,
sdk_version String,
screen_size String,
log_time String,
net_type String,
lng String,
lat String ,
user_ip String
) COMMENT '活跃用户按月明细'
PARTITIONED BY (mn string)
stored as parquet
location '/user/hive/warehouse/game/dws/dws_uv_detail_mn/';

2)数据导入
hive (game)>
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table game.dws_uv_detail_mn partition(mn)
select
mac_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_num)) version_num,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(sys_lag)) sys_lag,
concat_ws('|', collect_set(channel_num)) channel_num,
concat_ws('|', collect_set(operate_version)) operate_version,
concat_ws('|', collect_set(mobile_type)) mobile_type,
concat_ws('|', collect_set(mobile_brand)) mobile_brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(screen_size)) screen_size,
concat_ws('|', collect_set(log_time)) log_time,
concat_ws('|', collect_set(net_type)) net_type,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat,
concat_ws('|', collect_set(user_ip)) user_ip ,
date_format('2021-03-15','yyyy-MM')
from game.dws_uv_detail_day
where date_format(dt,'yyyy-MM') = date_format('2021-03-15','yyyy-MM')
group by mac_id;

3)查询导入结果
hive (game)> select * from game.dws_uv_detail_mn limit 1;
hive (game)> select count(*) from game.dws_uv_detail_mn ;

1.4 DWS层加载数据脚本
1)在node03 的/home/hadoop/bin目录下创建脚本
[hadoop@node03 bin]$ vim dws_uv_log.sh

在脚本中编写如下内容
#!/bin/bash

# 定义变量方便修改
APP=game
hive=/kkb/install/apache-hive-3.1.2/bin/hive

# 如果输入了日期参数则取输入日期;如果没输入日期则取当前时间的前一天
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=600000;
set hive.exec.max.dynamic.partitions=6000000;
set hive.exec.max.created.files=6000000;
insert overwrite table " A P P " . d w s u v d e t a i l d a y p a r t i t i o n ( d t = ′ APP".dws_uv_detail_day partition(dt=' APP".dwsuvdetaildaypartition(dt=do_date’)
select
mac_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_num)) version_num,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(sys_lag)) sys_lag,
concat_ws('|', collect_set(channel_num)) channel_num,
concat_ws('|', collect_set(operate_version)) operate_version,
concat_ws('|', collect_set(mobile_type)) mobile_type,
concat_ws('|', collect_set(mobile_brand)) mobile_brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(screen_size)) screen_size,
concat_ws('|', collect_set(log_time)) log_time,
concat_ws('|', collect_set(net_type)) net_type,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat,
concat_ws('|', collect_set(user_ip)) user_ip
from " A P P " . d w d s t a r t l o g w h e r e d t = ′ APP".dwd_start_log where dt=' APP".dwdstartlogwheredt=do_date’
group by mac_id;

insert overwrite table "$APP".dws_uv_detail_wk partition(wk_dt)
select
mac_id,
concat_ws('|', collect_set(user_id)) user_id,
concat_ws('|', collect_set(version_num)) version_num,
concat_ws('|', collect_set(version_name)) version_name,
concat_ws('|', collect_set(sys_lag)) sys_lag,
concat_ws('|', collect_set(channel_num)) channel_num,
concat_ws('|', collect_set(operate_version)) operate_version,
concat_ws('|', collect_set(mobile_type)) mobile_type,
concat_ws('|', collect_set(mobile_brand)) mobile_brand,
concat_ws('|', collect_set(sdk_version)) sdk_version,
concat_ws('|', collect_set(screen_size)) screen_size,
concat_ws('|', collect_set(log_time)) log_time,
concat_ws('|', collect_set(net_type)) net_type,
concat_ws('|', collect_set(lng)) lng,
concat_ws('|', collect_set(lat)) lat,
concat_ws('|', collect_set(user_ip)) user_ip ,
date_add(next_day('$do_date','MO'),-7),
date_add(next_day('$do_date','MO'),-1),
concat(date_add(next_day('$do_date','MO'),-7), '_', date_add(next_day('$do_date','MO'),-1))
from "$APP".dws_uv_detail_day
where dt>=date_add(next_day('$do_date','MO'),-7) and dt<=date_add(next_day('$do_date','MO'),-1)
group by mac_id;

insert overwrite table " A P P " . d w s u v d e t a i l m n p a r t i t i o n ( m n ) s e l e c t m a c i d , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( u s e r i d ) ) u s e r i d , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( v e r s i o n n u m ) ) v e r s i o n n u m , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( v e r s i o n n a m e ) ) v e r s i o n n a m e , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( s y s l a g ) ) s y s l a g , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( c h a n n e l n u m ) ) c h a n n e l n u m , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( o p e r a t e v e r s i o n ) ) o p e r a t e v e r s i o n , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( m o b i l e t y p e ) ) m o b i l e t y p e , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( m o b i l e b r a n d ) ) m o b i l e b r a n d , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( s d k v e r s i o n ) ) s d k v e r s i o n , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( s c r e e n s i z e ) ) s c r e e n s i z e , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( l o g t i m e ) ) l o g t i m e , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( n e t t y p e ) ) n e t t y p e , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( l n g ) ) l n g , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( l a t ) ) l a t , c o n c a t w s ( ′ ∣ ′ , c o l l e c t s e t ( u s e r i p ) ) u s e r i p , d a t e f o r m a t ( ′ APP".dws_uv_detail_mn partition(mn) select mac_id, concat_ws('|', collect_set(user_id)) user_id, concat_ws('|', collect_set(version_num)) version_num, concat_ws('|', collect_set(version_name)) version_name, concat_ws('|', collect_set(sys_lag)) sys_lag, concat_ws('|', collect_set(channel_num)) channel_num, concat_ws('|', collect_set(operate_version)) operate_version, concat_ws('|', collect_set(mobile_type)) mobile_type, concat_ws('|', collect_set(mobile_brand)) mobile_brand, concat_ws('|', collect_set(sdk_version)) sdk_version, concat_ws('|', collect_set(screen_size)) screen_size, concat_ws('|', collect_set(log_time)) log_time, concat_ws('|', collect_set(net_type)) net_type, concat_ws('|', collect_set(lng)) lng, concat_ws('|', collect_set(lat)) lat, concat_ws('|', collect_set(user_ip)) user_ip , date_format(' APP".dwsuvdetailmnpartition(mn)selectmacid,concatws(,collectset(userid))userid,concatws(,collectset(versionnum))versionnum,concatws(,collectset(versionname))versionname,concatws(,collectset(syslag))syslag,concatws(,collectset(channelnum))channelnum,concatws(,collectset(operateversion))operateversion,concatws(,collectset(mobiletype))mobiletype,concatws(,collectset(mobilebrand))mobilebrand,concatws(,collectset(sdkversion))sdkversion,concatws(,collectset(screensize))screensize,concatws(,collectset(logtime))logtime,concatws(,collectset(nettype))nettype,concatws(,collectset(lng))lng,concatws(,collectset(lat))lat,concatws(,collectset(userip))userip,dateformat(do_date’,‘yyyy-MM’)
from “ A P P " . d w s u v d e t a i l d a y w h e r e d a t e f o r m a t ( d t , ′ y y y y − M M ′ ) = d a t e f o r m a t ( ′ APP".dws_uv_detail_day where date_format(dt,'yyyy-MM') = date_format(' APP".dwsuvdetaildaywheredateformat(dt,yyyyMM)=dateformat(do_date’,‘yyyy-MM’)
group by mac_id;
"
hive -e "$sql"

2)增加脚本执行权限
[hadoop@node03 bin]$ chmod 777 dws_uv_log.sh

3)脚本使用
[hadoop@node03 bin]$ ./dws_uv_log.sh 2021-03-16

4)查询结果
hive (game)> select count(*) from game.dws_uv_detail_day where dt='2021-03-16';
hive (game)> select count(*) from game.dws_uv_detail_wk;
hive (game)> select count(*) from game.dws_uv_detail_mn;

5)脚本执行时间
企业开发中一般安排在每日凌晨0点30分~1点执行。
2 ADS层
目标:当日、当周、当月活跃设备数
2.1 活跃设备数

1)建表语句
hive (game)>
drop table if exists game.ads_uv_count;
create external table game.ads_uv_count(
dt string COMMENT '统计日期',
day_count bigint COMMENT '当日用户数量',
wk_count bigint COMMENT '当周用户数量',
mn_count bigint COMMENT '当月用户数量',
is_weekend string COMMENT 'Y,N是否是周末,用于得到本周最终结果',
is_monthend string COMMENT 'Y,N是否是月末,用于得到本月最终结果'
)
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/game/ads/ads_uv_count_day/';

2)导入数据
hive (game)>
insert into table game.ads_uv_count
select
'2021-03-15' dt,
daycount.ct,
wkcount.ct,
mncount.ct,
if(date_add(next_day('2021-03-15','MO'),-1)='2021-03-15','Y','N') ,
if(last_day('2021-03-15')='2021-03-15','Y','N')
from
(
select
'2021-03-15' dt,
count(*) ct
from game.dws_uv_detail_day
where dt='2021-03-15'
) daycount join
(
select
'2021-03-15' dt,
count(*) ct
from game.dws_uv_detail_wk
where wk_dt=concat(date_add(next_day('2021-03-15','MO'),-7),'_',date_add(next_day('2021-03-15','MO'),-1))
) wkcount on daycount.dt=wkcount.dt
join
(
select
'2021-03-15' dt,
count(*) ct
from game.dws_uv_detail_mn
where mn=date_format('2021-03-15','yyyy-MM')
) mncount on daycount.dt=mncount.dt ;
3)查询导入结果
hive (game)> select * from game.ads_uv_count ;

2.2 ADS层加载数据脚本
1)在node03的/home/hadoop/bin目录下创建脚本
[hadoop@node03 bin]$ vim ads_uv_log.sh

在脚本中编写如下内容
#!/bin/bash

# 定义变量方便修改
APP=game
hive=/kkb/install/apache-hive-3.1.2/bin/hive

# 如果输入了日期参数则取输入日期;如果没输入日期则取当前时间的前一天
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=600000;
set hive.exec.max.dynamic.partitions=6000000;
set hive.exec.max.created.files=6000000;

insert into table "$APP".ads_uv_count
select
'$do_date' dt,
daycount.ct,
wkcount.ct,
mncount.ct,
if(date_add(next_day('$do_date','MO'),-1)='$do_date','Y','N') ,
if(last_day('$do_date')='$do_date','Y','N')
from
(
select
'$do_date' dt,
count(*) ct
from "$APP".dws_uv_detail_day
where dt='$do_date'
) daycount join
(
select
'$do_date' dt,
count(*) ct
from "$APP".dws_uv_detail_wk
where wk_dt=concat(date_add(next_day('$do_date','MO'),-7),'_',date_add(next_day('$do_date','MO'),-1))
) wkcount on daycount.dt=wkcount.dt
join
(
select
'$do_date' dt,
count(*) ct
from "$APP".dws_uv_detail_mn
where mn=date_format('$do_date','yyyy-MM')
) mncount on daycount.dt=mncount.dt ;
"

hive -e "$sql"
2)增加脚本执行权限
[hadoop@node03 bin]$ chmod 777 ads_uv_log.sh
3)脚本使用
[hadoop@node03 bin]$ ./ads_uv_log.sh 2021-03-16
4)脚本执行时间
企业开发中一般安排在每日凌晨0点30分~1点执行。
5)查询导入结果
hive (game)> select * from game.ads_uv_count;
14、 需求二:用户新增主题
首次联网使用应用的用户。如果一个用户首次打开某APP,那这个用户定义为新增用户;卸载再安装的设备,不会被算作一次新增。新增用户包括日新增用户、周新增用户、月新增用户。
1 DWS层(每日新增设备明细表)

1)建表语句
hive (game)>
drop table if exists game.dws_new_mid_day;
create external table game.dws_new_mid_day
(
mac_id String,
user_id String,
version_num String,
version_name String,
sys_lag String,
channel_num String ,
operate_version String ,
mobile_type String ,
mobile_brand String ,
sdk_version String,
screen_size String,
log_time String,
net_type String,
lng String,
lat String ,
user_ip String,
create_date string
) COMMENT '每日新增设备信息'
stored as parquet
location '/user/hive/warehouse/game/dws/dws_new_mid_day/';
2)导入数据
用每日活跃用户表Left Join每日新增设备表,关联的条件是mac_id相等。如果是每日新增的设备,则在每日新增设备表中为null。
hive (game)>
insert into table game.dws_new_mid_day
select
ud.mac_id ,
ud.user_id ,
ud.version_num ,
ud.version_name ,
ud.sys_lag ,
ud.channel_num ,
ud.operate_version ,
ud.mobile_type ,
ud.mobile_brand ,
ud.sdk_version ,
ud.screen_size ,
ud.log_time ,
ud.net_type ,
ud.lng ,
ud.lat ,
ud.user_ip ,
'2021-03-15'
from game.dws_uv_detail_day ud left join game.dws_new_mid_day nm on ud.mac_id=nm.mac_id
where ud.dt='2021-03-15' and nm.mac_id is null;

3)查询导入数据
hive (game)> select count(*) from game.dws_new_mid_day ;
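
还可以顺手校验 mac_id 在该表中是否唯一(示例查询):
hive (game)>
select count(*) as total_cnt, count(distinct mac_id) as mac_cnt
from game.dws_new_mid_day;
-- 两个数值应当相等,否则说明同一设备被重复计入了新增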
2 ADS层(每日新增设备表)

1)建表语句
hive (game)>
drop table if exists game.ads_new_mid_count;
create external table game.ads_new_mid_count
(
create_date string comment '创建时间' ,
new_mid_count BIGINT comment '新增设备数量'
) COMMENT '每日新增设备信息数量'
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/game/ads/ads_new_mid_count/';

2)导入数据
hive (game)>
insert into table game.ads_new_mid_count
select
create_date,
count(*)
from game.dws_new_mid_day
where create_date='2021-03-15'
group by create_date;

3)查询导入数据
hive (game)> select * from game.ads_new_mid_count;
15、 需求三:用户留存主题
1 需求目标
1.1 用户留存概念

1.2 需求描述

2 DWS层
2.1 DWS层(每日留存用户明细表)

1)建表语句
hive (game)>
drop table if exists game.dws_user_retention_day;
create external table game.dws_user_retention_day
(
mac_id String,
user_id String,
version_num String,
version_name String,
sys_lag String,
channel_num String ,
operate_version String ,
mobile_type String ,
mobile_brand String ,
sdk_version String,
screen_size String,
log_time String,
net_type String,
lng String,
lat String ,
user_ip String,
create_date string comment '设备新增时间',
retention_day int comment '截止当前日期留存天数'
) COMMENT '每日用户留存情况'
PARTITIONED BY (dt string)
stored as parquet
location '/user/hive/warehouse/game/dws/dws_user_retention_day/';

2)导入数据(每天计算前1天的新用户访问留存明细)
hive (game)>
insert overwrite table game.dws_user_retention_day
partition(dt='2021-03-16')
select
nm.mac_id ,
nm.user_id ,
nm.version_num ,
nm.version_name ,
nm.sys_lag ,
nm.channel_num ,
nm.operate_version ,
nm.mobile_type ,
nm.mobile_brand ,
nm.sdk_version ,
nm.screen_size ,
nm.log_time ,
nm.net_type ,
nm.lng ,
nm.lat ,
nm.user_ip ,
nm.create_date,
1 retention_day
from game.dws_uv_detail_day ud join game.dws_new_mid_day nm on ud.mac_id=nm.mac_id
where ud.dt='2021-03-16' and nm.create_date=date_add('2021-03-16',-1);
3)查询导入数据(每天计算前1天的新用户访问留存明细)
hive (game)> select count(*) from game.dws_user_retention_day;

2.2 Union与Union all区别
1)准备两张表
tableA:
id  name  score
1   a     80
2   b     79
3   c     68

tableB:
id  name  score
1   d     48
2   e     23
3   c     86
2)采用union查询
select name from tableA             
union                        
select name from tableB             
查询结果
name
a
d
b
e
c
3)采用union all查询
select name from tableA
union all
select name from tableB
查询结果
name
a
b
c
d
e
c
4)总结
(1)union会将联合的结果集去重,效率较union all差
(2)union all不会对结果集去重,所以效率高
2.3 DWS层(1,2,3,n天留存用户明细表)
1)导入数据(每天计算前1,2,3,n天的新用户访问留存明细)
hive (game)>
insert overwrite table game.dws_user_retention_day
partition(dt='2021-03-16')
select
nm.mac_id ,
nm.user_id ,
nm.version_num ,
nm.version_name ,
nm.sys_lag ,
nm.channel_num ,
nm.operate_version ,
nm.mobile_type ,
nm.mobile_brand ,
nm.sdk_version ,
nm.screen_size ,
nm.log_time ,
nm.net_type ,
nm.lng ,
nm.lat ,
nm.user_ip ,
nm.create_date,
1 retention_day
from game.dws_uv_detail_day ud join game.dws_new_mid_day nm on ud.mac_id =nm.mac_id
where ud.dt='2021-03-16' and nm.create_date=date_add('2021-03-16',-1)

union all
select
nm.mac_id ,
nm.user_id ,
nm.version_num ,
nm.version_name ,
nm.sys_lag ,
nm.channel_num ,
nm.operate_version ,
nm.mobile_type ,
nm.mobile_brand ,
nm.sdk_version ,
nm.screen_size ,
nm.log_time ,
nm.net_type ,
nm.lng ,
nm.lat ,
nm.user_ip ,
nm.create_date,
2 retention_day
from game.dws_uv_detail_day ud join game.dws_new_mid_day nm on ud.mac_id =nm.mac_id
where ud.dt='2021-03-16' and nm.create_date=date_add('2021-03-16',-2)

union all
select
nm.mac_id ,
nm.user_id ,
nm.version_num ,
nm.version_name ,
nm.sys_lag ,
nm.channel_num ,
nm.operate_version ,
nm.mobile_type ,
nm.mobile_brand ,
nm.sdk_version ,
nm.screen_size ,
nm.log_time ,
nm.net_type ,
nm.lng ,
nm.lat ,
nm.user_ip ,
nm.create_date,
3 retention_day
from game.dws_uv_detail_day ud join game.dws_new_mid_day nm on ud.mac_id =nm.mac_id
where ud.dt='2021-03-16' and nm.create_date=date_add('2021-03-16',-3);
2) Query the loaded data (retention detail of new users from 1, 2, and 3 days earlier)
hive (game)>select retention_day , count(*) from game.dws_user_retention_day group by retention_day;
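The three hand-written UNION ALL branches differ only in the day offset, so they can also be generated in a loop; a sketch only (the script name is hypothetical, and it assumes game.dws_new_mid_day holds exactly the 17 columns of the target table in order, so nm.* can stand in for the explicit column list):

#!/bin/bash
# dws_user_retention.sh (hypothetical): build the 1..3-day retention load in a loop
hive=/kkb/install/apache-hive-3.1.2/bin/hive
APP=game

if [ -n "$1" ];then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi

sql="insert overwrite table ${APP}.dws_user_retention_day partition(dt='$do_date')"
sep=""
for i in 1 2 3; do
sql="$sql $sep
select nm.*, $i retention_day
from ${APP}.dws_uv_detail_day ud
join ${APP}.dws_new_mid_day nm on ud.mac_id=nm.mac_id
where ud.dt='$do_date' and nm.create_date=date_add('$do_date',-$i)"
sep="union all"
done

$hive -e "$sql"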

3 ADS Layer
3.1 Retained User Count

1) Table creation statement
hive (game)>
drop table if exists game.ads_user_retention_day_count;
create external table game.ads_user_retention_day_count
(
create_date string comment 'device add date',
retention_day int comment 'retention days as of the current date',
retention_count bigint comment 'retained user count'
) COMMENT 'daily user retention counts'
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/game/ads/ads_user_retention_day_count/';

2) Load data
hive (game)>
insert into table game.ads_user_retention_day_count
select
create_date,
retention_day,
count(*) retention_count
from game.dws_user_retention_day
where dt='2021-03-16'
group by create_date,retention_day;

3) Query the loaded data
hive (game)> select * from game.ads_user_retention_day_count;

3.2 Retention Rate

1) Table creation statement
hive (game)>
drop table if exists game.ads_user_retention_day_rate;

create external table game.ads_user_retention_day_rate
(
stat_date string comment 'statistics date',
create_date string comment 'device add date',
retention_day int comment 'retention days as of the current date',
retention_count bigint comment 'retained user count',
new_mid_count bigint comment 'new device count on that day',
retention_ratio decimal(10,2) comment 'retention rate'
) COMMENT 'daily user retention rate'
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/game/ads/ads_user_retention_day_rate/';

2) Load data
hive (game)>
insert into table game.ads_user_retention_day_rate
select
'2021-03-16',
ur.create_date,
ur.retention_day,
ur.retention_count,
nc.new_mid_count,
ur.retention_count/nc.new_mid_count*100
from game.ads_user_retention_day_count ur join game.ads_new_mid_count nc
on nc.create_date=ur.create_date;
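Note that retention_ratio is declared decimal(10,2) while the division above produces a double; Hive coerces the value on insert, but if you want the rounding to be explicit, a hedged variant of that select item is:

cast(ur.retention_count/nc.new_mid_count*100 as decimal(10,2)) retention_ratio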

3) Query the loaded data
hive (game)> select * from game.ads_user_retention_day_rate;

16. New Data Preparation
To analyze silent users, weekly returning users, churned users, users active in the last 3 consecutive weeks, and users active on 3 consecutive days within the last 7 days, we need to prepare data for 2021-03-17 and 2021-03-25.
1) Prepare the 2021-03-17 data
(1) Change the log date
[hadoop@node01 bin]$ ./dt.sh 2021-03-17
(2) Generate log data
[hadoop@node01 bin]$ sh data_generate.sh start
(3) Start Flume to collect the data
[hadoop@node01 bin]$ sh flume_stop_start.sh start
(4) Load the HDFS data into the ODS layer
[hadoop@node03 bin]$ ./ods_game_start_log.sh 2021-03-17
(5) Load the ODS data into the DWD layer
[hadoop@node03 bin]$ sh dwd_game_start_log.sh 2021-03-17
[hadoop@node03 bin]$ sh dwd_game_base_event_log.sh 2021-03-17
[hadoop@node03 bin]$ sh dwd_game_event_log.sh 2021-03-17

(6) Load the DWD data into the DWS layer
[hadoop@node03 bin]$ sh dws_uv_log.sh 2021-03-17
(7) Verify
hive(game)> select * from game.dws_uv_detail_day where dt='2021-03-17' limit 2;

2) Prepare the 2021-03-25 data
(1) Change the log date
[hadoop@node01 bin]$ ./dt.sh 2021-03-25
(2) Generate log data
[hadoop@node01 bin]$ sh data_generate.sh start
(3) Start Flume to collect the data
[hadoop@node01 bin]$ sh flume_stop_start.sh start

(4) Load the HDFS data into the ODS layer
[hadoop@node03 bin]$ sh ods_game_start_log.sh 2021-03-25
(5) Load the ODS data into the DWD layer
[hadoop@node03 bin]$ sh dwd_game_start_log.sh 2021-03-25
[hadoop@node03 bin]$ sh dwd_game_base_event_log.sh 2021-03-25
[hadoop@node03 bin]$ sh dwd_game_event_log.sh 2021-03-25

(6) Load the DWD data into the DWS layer
[hadoop@node03 bin]$ sh dws_uv_log.sh 2021-03-25
(7) Verify
hive(game)> select * from game.dws_uv_detail_day where dt='2021-03-25' limit 2;
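The two preparations above repeat the same seven steps, so they are easy to wrap in one script; a sketch only (the wrapper name is hypothetical, it assumes passwordless ssh from node03 to node01, and in practice you would wait for the generated logs to land on HDFS before loading ODS):

#!/bin/bash
# run_daily_flow.sh (hypothetical): run the whole daily flow for one date
do_date=$1

ssh node01 "/home/hadoop/bin/dt.sh $do_date"                 # set the log date
ssh node01 "sh /home/hadoop/bin/data_generate.sh start"      # generate log data
ssh node01 "sh /home/hadoop/bin/flume_stop_start.sh start"   # collect with Flume
# wait here until the day's logs have landed on HDFS

sh /home/hadoop/bin/ods_game_start_log.sh $do_date           # HDFS -> ODS
sh /home/hadoop/bin/dwd_game_start_log.sh $do_date           # ODS -> DWD
sh /home/hadoop/bin/dwd_game_base_event_log.sh $do_date
sh /home/hadoop/bin/dwd_game_event_log.sh $do_date
sh /home/hadoop/bin/dws_uv_log.sh $do_date                   # DWD -> DWS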

17. Requirement 4: Silent User Count
Silent users: devices that started the app only on their install day, and that day was more than a week ago.
1 DWS Layer
Use the daily active detail table dws_uv_detail_day as the DWS-layer input.
2 ADS Layer

1) Table creation statement
hive (game)>
drop table if exists game.ads_silent_count;
create external table game.ads_silent_count(
dt string COMMENT 'statistics date',
silent_count bigint COMMENT 'silent device count'
)
row format delimited fields terminated by '\t'
location '/warehouse/game/ads/ads_silent_count';
2) Load the 2021-03-25 data
hive (game)>
insert into table game.ads_silent_count
select
'2021-03-25' dt,
count(*) silent_count
from
(
select mac_id
from game.dws_uv_detail_day
where dt<='2021-03-25'
group by mac_id
having count(*)=1 and min(dt)<date_add('2021-03-25',-7)
) t1;

3) Query the loaded data
hive (game)> select * from game.ads_silent_count;
3 Write a Script
1) Create the script in the /home/hadoop/bin directory on node03
[hadoop@node03 bin]$ vim ads_silent_log.sh
Write the following into the script:
#!/bin/bash

hive=/kkb/install/apache-hive-3.1.2/bin/hive
APP=game

if [ -n "$1" ];then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi

echo "----------- import date: $do_date -----------"

sql="
insert into table "$APP".ads_silent_count
select
'$do_date' dt,
count(*) silent_count
from
(
select
mac_id
from "$APP".dws_uv_detail_day
where dt<='$do_date'
group by mac_id
having count(*)=1 and min(dt)<=date_add('$do_date',-7)
)t1;"

$hive -e "$sql"

2) Make the script executable
[hadoop@node03 bin]$ chmod 777 ads_silent_log.sh

3) Run the script
[hadoop@node03 bin]$ ./ads_silent_log.sh 2021-03-25
4) Query the result
hive (game)> select * from game.ads_silent_count;

5) Script schedule
In production this typically runs daily between 00:30 and 01:00.
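A crontab entry on node03 is the usual way to schedule it; a sketch (the exact minute and log path are assumptions):

# crontab -e on node03: daily at 00:30; with no argument the script defaults to yesterday
30 0 * * * /home/hadoop/bin/ads_silent_log.sh >> /home/hadoop/logs/ads_silent_log.log 2>&1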
18. Requirement 5: Weekly Returning User Count
Returning users this week = active this week - new this week - active last week; that is, users active this week who are neither new this week nor were active last week.
1 DWS Layer
Use the weekly active detail table dws_uv_detail_wk together with the new-device table dws_new_mid_day as the DWS-layer input (the SQL below reads both).
2 ADS Layer
1) Table creation statement
hive (game)>
drop table if exists game.ads_back_count;
create external table ads_back_count(
dt string COMMENT 'statistics date',
wk_dt string COMMENT 'week containing the statistics date',
wastage_count bigint COMMENT 'returning device count'
)
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/game/ads/ads_back_count';

2) Load data:
hive (game)>
insert into table game.ads_back_count
select
'2021-03-25' dt,
concat(date_add(next_day('2021-03-25','MO'),-7),'_',date_add(next_day('2021-03-25','MO'),-1)) wk_dt,
count(*)
from
(
select t1.mac_id
from
(
select mac_id
from game.dws_uv_detail_wk
where wk_dt=concat(date_add(next_day('2021-03-25','MO'),-7),'_',date_add(next_day('2021-03-25','MO'),-1))
)t1
left join
(
select mac_id
from game.dws_new_mid_day
where create_date<=date_add(next_day('2021-03-25','MO'),-1) and create_date>=date_add(next_day('2021-03-25','MO'),-7)
)t2
on t1.mac_id=t2.mac_id
left join
(
select mac_id
from game.dws_uv_detail_wk
where wk_dt=concat(date_add(next_day('2021-03-25','MO'),-7*2),'_',date_add(next_day('2021-03-25','MO'),-7-1))
)t3
on t1.mac_id=t3.mac_id
where t2.mac_id is null and t3.mac_id is null
)t4;

3) Query the result
hive (game)> select * from game.ads_back_count;
3 Write a Script
1) Create the script in the /home/hadoop/bin directory on node03
[hadoop@node03 bin]$ vim ads_back_log.sh

Write the following into the script:

#!/bin/bash

if [ -n "$1" ];then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi

hive=/kkb/install/apache-hive-3.1.2/bin/hive
APP=game

echo "----------- import date: $do_date -----------"

sql="
insert into table "$APP".ads_back_count
select
'$do_date' dt,
concat(date_add(next_day('$do_date','MO'),-7),'_',date_add(next_day('$do_date','MO'),-1)) wk_dt,
count(*)
from
(
select t1.mac_id
from
(
select mac_id
from "$APP".dws_uv_detail_wk
where wk_dt=concat(date_add(next_day('$do_date','MO'),-7),'_',date_add(next_day('$do_date','MO'),-1))
)t1
left join
(
select mac_id
from "$APP".dws_new_mid_day
where create_date<=date_add(next_day('$do_date','MO'),-1) and create_date>=date_add(next_day('$do_date','MO'),-7)
)t2
on t1.mac_id=t2.mac_id
left join
(
select mac_id
from "$APP".dws_uv_detail_wk
where wk_dt=concat(date_add(next_day('$do_date','MO'),-7*2),'_',date_add(next_day('$do_date','MO'),-7-1))
)t3
on t1.mac_id=t3.mac_id
where t2.mac_id is null and t3.mac_id is null
)t4;
"

$hive -e "$sql"
2) Make the script executable
[hadoop@node03 bin]$ chmod 777 ads_back_log.sh
3) Run the script
[hadoop@node03 bin]$ ./ads_back_log.sh 2021-03-25
4) Query the result
hive (game)> select * from game.ads_back_count;

5) Script schedule
In production this typically runs every Monday between 00:30 and 01:00.
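For this weekly job the crontab entry only narrows the weekday field; a sketch (minute and log path are assumptions):

# crontab -e on node03: Mondays at 00:30
30 0 * * 1 /home/hadoop/bin/ads_back_log.sh >> /home/hadoop/logs/ads_back_log.log 2>&1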
19. Requirement 6: Churned User Count
Churned users: users with no login during the last 7 days.
1 DWS Layer
Use the daily active detail table dws_uv_detail_day as the DWS-layer input.
2 ADS Layer

1) Table creation statement
hive (game)>
drop table if exists game.ads_wastage_count;
create external table ads_wastage_count(
dt string COMMENT 'statistics date',
wastage_count bigint COMMENT 'churned device count'
)
row format delimited fields terminated by '\t'
location '/warehouse/game/ads/ads_wastage_count';

2) Load the 2021-03-25 data
hive (game)>
insert into table game.ads_wastage_count
select
'2021-03-25',
count(*)
from
(
select mac_id
from dws_uv_detail_day
group by mac_id
having max(dt)<=date_add('2021-03-25',-7)
)t1;

3 Write a Script
1) Create the script in the /home/hadoop/bin directory on node03
[hadoop@node03 bin]$ vim ads_wastage_log.sh

Write the following into the script:
#!/bin/bash

if [ -n "$1" ];then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi

hive=/kkb/install/apache-hive-3.1.2/bin/hive
APP=game

echo "----------- import date: $do_date -----------"

sql="
insert into table "$APP".ads_wastage_count
select
'$do_date',
count(*)
from
(
select mac_id
from "$APP".dws_uv_detail_day
group by mac_id
having max(dt)<=date_add('$do_date',-7)
)t1;
"

$hive -e "$sql"

2) Make the script executable
[hadoop@node03 bin]$ chmod 777 ads_wastage_log.sh
3) Run the script
[hadoop@node03 bin]$ ./ads_wastage_log.sh 2021-03-25

4) Query the result
hive (game)> select * from game.ads_wastage_count;
5) Script schedule
In production this typically runs daily between 00:30 and 01:00.
20. Requirement 7: Users Active in the Last 3 Consecutive Weeks
Users active in each of the last 3 weeks: usually computed on Monday over the preceding 3 weeks, once per week.
1 DWS Layer
Use the weekly active detail table dws_uv_detail_wk as the DWS-layer input.
2 ADS Layer

1) Table creation statement
hive (game)>
drop table if exists game.ads_continuity_wk_count;
create external table game.ads_continuity_wk_count(
dt string COMMENT 'statistics date, usually the Sunday ending the last week; if computed daily, the current date',
wk_dt string COMMENT 'time span covered',
continuity_count bigint
)
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/game/ads/ads_continuity_wk_count';

2) Load the data for the week containing 2021-03-25
hive (game)>
insert into table game.ads_continuity_wk_count
select
'2021-03-25',
concat(date_add(next_day('2021-03-25','MO'),-7*3),'_',date_add(next_day('2021-03-25','MO'),-1)),
count(*)
from
(
select mac_id
from game.dws_uv_detail_wk
where wk_dt>=concat(date_add(next_day('2021-03-25','MO'),-7*3),'_',date_add(next_day('2021-03-25','MO'),-7*2-1))
and wk_dt<=concat(date_add(next_day('2021-03-25','MO'),-7),'_',date_add(next_day('2021-03-25','MO'),-1))
group by mac_id
having count(*)=3
)t1;
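For 2021-03-25 (a Thursday), next_day('2021-03-25','MO') is 2021-03-29, so the statement labels the span as '2021-03-08_2021-03-28' and scans the three week keys '2021-03-08_2021-03-14', '2021-03-15_2021-03-21', and '2021-03-22_2021-03-28'. Because wk_dt stores zero-padded ISO dates, the string >= and <= comparisons match chronological order.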

3) Query
hive (game)> select * from game.ads_continuity_wk_count;

3 Write a Script
1) Create the script in the /home/hadoop/bin directory on node03
[hadoop@node03 bin]$ vim ads_continuity_wk_log.sh
Write the following into the script:
#!/bin/bash

if [ -n "$1" ];then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi

hive=/kkb/install/apache-hive-3.1.2/bin/hive
APP=game

echo "----------- import date: $do_date -----------"

sql="
insert into table "$APP".ads_continuity_wk_count
select
'$do_date',
concat(date_add(next_day('$do_date','MO'),-7*3),'_',date_add(next_day('$do_date','MO'),-1)),
count(*)
from
(
select mac_id
from "$APP".dws_uv_detail_wk
where wk_dt>=concat(date_add(next_day('$do_date','MO'),-7*3),'_',date_add(next_day('$do_date','MO'),-7*2-1))
and wk_dt<=concat(date_add(next_day('$do_date','MO'),-7),'_',date_add(next_day('$do_date','MO'),-1))
group by mac_id
having count(*)=3
)t1;"

$hive -e "$sql"

2) Make the script executable
[hadoop@node03 bin]$ chmod 777 ads_continuity_wk_log.sh
3) Run the script
[hadoop@node03 bin]$ ./ads_continuity_wk_log.sh 2021-03-25
4) Query the result
hive (game)> select * from game.ads_continuity_wk_count;

5) Script schedule
In production this typically runs every Monday between 00:30 and 01:00.
21. Requirement 8: Users Active on 3 Consecutive Days Within the Last 7 Days
Note: count users who were active on at least 3 consecutive days within the last 7 days.
1 DWS Layer
Use the daily active detail table dws_uv_detail_day as the DWS-layer input.
2 ADS Layer

1) Table creation statement
hive (game)>
drop table if exists game.ads_continuity_uv_count;
create external table game.ads_continuity_uv_count(
dt string COMMENT 'statistics date',
wk_dt string COMMENT 'last-7-days range',
continuity_count bigint
) COMMENT 'continuously active device count'
row format delimited fields terminated by '\t'
location '/warehouse/game/ads/ads_continuity_uv_count';

2) SQL statement to load the data
hive (game)>
insert into table game.ads_continuity_uv_count
select
'2021-03-17',
concat(date_add('2021-03-17',-6),'_','2021-03-17'),
count(*)
from
(
select mac_id
from
(
select mac_id
from
(
select
mac_id,
date_sub(dt,rank) date_dif
from
(
select
mac_id,
dt,
rank() over(partition by mac_id order by dt) rank
from game.dws_uv_detail_day
where dt>=date_add('2021-03-17',-6) and dt<='2021-03-17'
)t1
)t2
group by mac_id,date_dif
having count(*)>=3
)t3
group by mac_id
)t4;

3) Query
hive (game)> select * from game.ads_continuity_uv_count;
3 Write a Script
1) Create the script in the /home/hadoop/bin directory on node03
[hadoop@node03 bin]$ vim ads_continuity_log.sh
Write the following into the script:
#!/bin/bash

if [ -n "$1" ];then
do_date=$1
else
do_date=$(date -d "-1 day" +%F)
fi

hive=/kkb/install/apache-hive-3.1.2/bin/hive
APP=game

echo "----------- import date: $do_date -----------"

sql="
insert into table "$APP".ads_continuity_uv_count
select
'$do_date',
concat(date_add('$do_date',-6),'_','$do_date') dt,
count(*)
from
(
select mac_id
from
(
select mac_id
from
(
select
mac_id,
date_sub(dt,rank) date_diff
from
(
select
mac_id,
dt,
rank() over(partition by mac_id order by dt) rank
from "$APP".dws_uv_detail_day
where dt>=date_add('$do_date',-6) and dt<='$do_date'
)t1
)t2
group by mac_id,date_diff
having count(*)>=3
)t3
group by mac_id
)t4;
"

$hive -e "$sql"
2) Make the script executable
[hadoop@node03 bin]$ chmod 777 ads_continuity_log.sh

3) Run the script
[hadoop@node03 bin]$ ./ads_continuity_log.sh 2021-03-17
4) Query the result
hive (game)> select * from game.ads_continuity_uv_count;
5) Script schedule
In production this typically runs daily between 00:30 and 01:00.
22 Summary
1 User Behavior Warehouse Business Summary
1.1 How many layers does the warehouse have, and what does each one do?
1) ODS layer (raw data layer)
Stores raw data: original logs and data are loaded as-is, without processing.
2) DWD layer (detail layer)
Cleans the ODS data (removes nulls, dirty data, and values outside valid ranges).
3) DWS layer (service data layer)
Lightly aggregates on top of DWD, e.g. per-user, per-device, and per-product daily summaries.
4) ADS layer (data application layer)
Stores the final aggregated results that directly serve reports and analysis applications.
1.2 Did you build custom UDF/UDTF functions in the project, and what problems did they solve?
Yes. A UDF parses the common fields; a UDTF parses the event fields.
1.3 How do you analyze active users?
Count the occurrences of each device id in the start log.
1.4 How do you analyze new users?
Left join the active-user table with the new-user table; devices whose mid is null in the new-user table are new users.
1.5 How do you analyze 1-day retention?
Retained users = previous day's new users join today's active users
Retention rate = retained users / previous day's new users
1.6 How do you analyze silent users?
Group the daily active table by device id; keep devices with exactly one login, where that login was over a week ago.
1.7 How do you analyze this week's returning users?
This week's active users left join this week's new users left join last week's active users, keeping rows where both the new-user id and the last-week active id are null.
1.8 How do you analyze churned users?
Group the daily active table by device id and keep devices with no login in the last 7 days.
1.9 How do you analyze users active in the last 3 consecutive weeks?
Group the weekly active table by device id and keep devices that appear exactly 3 times.
1.10 How do you analyze users active on 3 consecutive days within the last 7 days?
1) Select the last 7 days of active users and rank each user's active dates.
2) Subtract the rank from each active date.
3) Group by user and that difference, and count the rows in each group.
4) Keep groups whose count is 3 or more, then deduplicate users; these are the users active on 3 or more consecutive days (see the worked example below).
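Step 2 is the core trick: subtracting the rank from the date maps every run of consecutive days to a single constant value. A worked example for one device (the dates and ranks are illustrative):

dt           rank   date_sub(dt, rank)
2021-03-11   1      2021-03-10
2021-03-12   2      2021-03-10
2021-03-13   3      2021-03-10
2021-03-15   4      2021-03-11

The first three rows collapse to the key 2021-03-10 with a count of 3, so the device qualifies; the gap on 2021-03-14 starts a new key.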
