1. Structure of the target tables that the logs are loaded into
1.1 Target table structure in Hive
CREATE TABLE `yemao_log`(
`id` int,
`time` int,
`url_from` string,
`url_current` string,
`url_to` string,
`options` string,
`uid` int,
`new_visitor` string,
`province` string,
`city` string,
`site` string,
`device` string,
`phone` string,
`token` string,
`dorm` string,
`order_phone` string,
`order_dormitory` string,
`order_amount` string,
`order_id` int,
`uname` string,
`site_id` int,
`address` string,
`dorm_id` int,
`dormentry_id` int,
`rid` int,
`cart_quantity` string)
PARTITIONED BY (
`log_date` int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://Master:9000/user/hive/warehouse/yemao_log'
TBLPROPERTIES (
'transient_lastDdlTime'='1447308813');
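The table above stores plain text rows, with fields separated by ',' and rows separated by '\n'. As a rough illustration of what a row for this table has to look like, here is a minimal Python sketch that renders one log record as a delimited line. The function name `to_hive_line`, the dict-shaped input, and the choice to replace stray commas and line breaks with spaces are all assumptions made for this example and are not part of the original scheme; the column order is copied from the DDL, and `\N` is the marker Hive's default text format reads as NULL.

```python
# Column order copied from the Hive DDL above (the partition column
# log_date is not stored in the data file, so it is left out).
HIVE_COLUMNS = [
    "id", "time", "url_from", "url_current", "url_to", "options", "uid",
    "new_visitor", "province", "city", "site", "device", "phone", "token",
    "dorm", "order_phone", "order_dormitory", "order_amount", "order_id",
    "uname", "site_id", "address", "dorm_id", "dormentry_id", "rid",
    "cart_quantity",
]


def to_hive_line(record):
    """Render one log record (a dict) as a comma-delimited text line.

    Hypothetical helper: the original scheme does not show how rows
    are produced.
    """
    fields = []
    for name in HIVE_COLUMNS:
        value = record.get(name)
        # Missing values are written as \N, which Hive's default text
        # format reads back as NULL.
        text = r"\N" if value is None else str(value)
        # Assumption: the DDL declares no escape character, so a comma or
        # line break inside a value would shift or split the row. Here
        # they are simply replaced with spaces.
        for ch in (",", "\n", "\r"):
            text = text.replace(ch, " ")
        fields.append(text)
    return ",".join(fields)


if __name__ == "__main__":
    # Sample values are made up for illustration.
    sample = {"id": 1, "time": 1447308813, "url_current": "http://example.com/?a=1,2"}
    print(to_hive_line(sample))
```

Replacing delimiters with spaces loses information (for example inside `options` or URLs); declaring an escape character in the table definition would be an alternative, but that changes the DDL and is outside what the scheme shows.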
1.2 The corresponding table in MySQL, which is really just a temporary (staging) table
CREATE TABLE `yemao_log` (
`id` varchar(8000) DEFAULT NULL,
`time` varchar(8000) DEFAULT NULL,
`url_from` text,
`url_current` text,
`url_to` text,
`options` text,
`uid` text,
`new_visitor` text,
`province` text,
`city` text,
`site` text,
`device` text,
`phone` text,
`token` text,
`dorm` text,
`order_phone` text,
`order_dormitory` text,
`order_amount` text,
`order_id` text,
`uname` text,
`site_id` text,
`address` text,
`dorm_id` text,
`dormentry_id` text,
`rid` text,
`cart_quantity` text
);
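The staging table keeps every column as text, while the Hive table declares `id`, `time`, `uid`, `order_id`, `site_id`, `dorm_id`, `dormentry_id` and `rid` as int. The following Python sketch shows one way the text values could be checked against those types before loading. The helper name `cast_for_hive` is hypothetical, and returning `None` for bad values is a choice made here to mirror how Hive reads an unparseable or out-of-range int field in a text table as NULL; the original scheme does not say how such values are handled.

```python
# Columns declared as int in the Hive DDL (the partition column log_date
# is also int, but it is not a column of the MySQL staging table).
INT_COLUMNS = {
    "id", "time", "uid", "order_id",
    "site_id", "dorm_id", "dormentry_id", "rid",
}

# Hive's int type is a 32-bit signed integer.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def cast_for_hive(column, text):
    """Convert a staging-table text value to the type Hive declares.

    Hypothetical helper for illustration. Returns an int for int columns,
    the unchanged string for string columns, and None for values that
    Hive would read as NULL.
    """
    if text is None:
        return None
    if column not in INT_COLUMNS:
        return text
    try:
        number = int(text.strip())
    except ValueError:
        return None  # not a whole number, e.g. "abc", "" or "12.5"
    if not INT_MIN <= number <= INT_MAX:
        return None  # does not fit in a 32-bit int
    return number


if __name__ == "__main__":
    # Sample values are made up for illustration.
    for col, raw in [("uid", "42"), ("uid", "abc"), ("city", "Beijing")]:
        print(col, repr(raw), "->", repr(cast_for_hive(col, raw)))
```

Running a check like this over the staging rows makes it visible how many values would silently become NULL in Hive.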

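This scheme describes the log-processing workflow for a user-behavior analytics system, including the target table structures (Hive and MySQL), the stored procedures, and the shell scripts. The log data is first converted to standard JSON and imported into MongoDB; a subset of the fields is then loaded into the Hive and MySQL databases, with scheduled jobs automating the whole process.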