- After the server receives data, it writes it to a log file. We need a shell script to automate importing those log files into Hive.
- Log preparation
The log files are rolled hourly, following the naming pattern mqtt_yyyyMM-dd-HH:
mqtt_202101-07-10.log
mqtt_202101-07-11.log
mqtt_202101-07-12.log
...
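Given that naming scheme, the path of the file for the previous hour can be derived directly with date. A minimal sketch (assumes GNU date, and the /root/data directory used by the load script later in this post):

```shell
#!/bin/bash
# Build the expected log file path for the previous hour,
# matching the mqtt_yyyyMM-dd-HH naming scheme.
# Note: "date -d" relative parsing is a GNU coreutils feature.
stamp=$(date -d "-1 hour" +%Y%m-%d-%H)
logfile="/root/data/mqtt_${stamp}.log"
echo "${logfile}"
```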
- Hive table creation
Before loading data into Hive, the table must first be created to match the fields of the log records. The DDL below is the output of show create table tableName:
CREATE TABLE `mqttdata`(
`no` string COMMENT 'from deserializer',
`num` string COMMENT 'from deserializer',
`time` string COMMENT 'from deserializer',
`l/n` int COMMENT 'from deserializer',
`ca001` int COMMENT 'from deserializer',
`ca002` int COMMENT 'from deserializer',
`ca003` int COMMENT 'from deserializer',
`ca004` int COMMENT 'from deserializer',
`ca005` int COMMENT 'from deserializer',
`ca006` int COMMENT 'from deserializer',
`ca007` int COMMENT 'from deserializer',
`ca008` int COMMENT 'from deserializer',
`ca009` int COMMENT 'from deserializer',
`ca010` int COMMENT 'from deserializer',
`ca011` int COMMENT 'from deserializer',
`ca012` int COMMENT 'from deserializer',
`ca013` int COMMENT 'from deserializer',
`ca014` int COMMENT 'from deserializer',
`ca015` int COMMENT 'from deserializer',
`ca016` int COMMENT 'from deserializer',
`ca017` int COMMENT 'from deserializer',
`ca018` int COMMENT 'from deserializer',
`ca019` int COMMENT 'from deserializer',
`ca020` int COMMENT 'from deserializer',
`ca021` int COMMENT 'from deserializer',
`ca022` int COMMENT 'from deserializer',
`ca023` int COMMENT 'from deserializer',
`ca024` int COMMENT 'from deserializer',
`ca025` int COMMENT 'from deserializer',
`ca026` int COMMENT 'from deserializer',
`ca027` int COMMENT 'from deserializer',
`ca028` int COMMENT 'from deserializer',
`ca029` int COMMENT 'from deserializer',
`ca030` int COMMENT 'from deserializer',
`ca031` int COMMENT 'from deserializer',
`ca032` int COMMENT 'from deserializer',
`ca033` int COMMENT 'from deserializer')
PARTITIONED BY (
`ym` string,
`d` string,
`h` string)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://192.168.44.xx:8020/user/hive/warehouse/hive_test.db/mqttdata'
TBLPROPERTIES (
'transient_lastDdlTime'='1609987429')
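For reference, a log record that this JsonSerDe schema would parse looks roughly like the following. The field values here are made up purely for illustration; the real payload comes from the MQTT server:

```json
{"no":"001","num":"42","time":"2021-01-07 10:15:00","l/n":1,"ca001":0,"ca002":1,"ca003":0}
```

Each JSON key maps to the Hive column of the same name; columns absent from a record come back as NULL.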
- Shell script
Because the logs roll hourly, we run the load with a one-hour delay, so that the previous hour's file is complete before it is imported.
First, use date to get systime; then split systime into year-month ym, day d, and hour h; finally, run a hive statement to load the log file into the matching partition.
#!/bin/bash
# Load the previous hour's log file into the matching Hive partition.
systime=$(date -d "-1 hour" +%Y%m-%d-%H)
# Split yyyyMM-dd-HH into the three partition keys.
ym=$(echo "${systime}" | awk -F '-' '{print $1}')
d=$(echo "${systime}" | awk -F '-' '{print $2}')
h=$(echo "${systime}" | awk -F '-' '{print $3}')
hive -e "load data local inpath '/root/data/mqtt_${ym}-${d}-${h}.log' into table hive_test.mqttdata partition(ym='${ym}',d='${d}',h='${h}')"
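To see what the splitting step produces, feed it a fixed timestamp (taken from the sample filenames above) instead of the live clock:

```shell
#!/bin/bash
# Split a fixed systime value the same way the load script does.
systime="202101-07-10"   # corresponds to mqtt_202101-07-10.log
ym=$(echo "${systime}" | awk -F '-' '{print $1}')   # 202101
d=$(echo "${systime}" | awk -F '-' '{print $2}')    # 07
h=$(echo "${systime}" | awk -F '-' '{print $3}')    # 10
echo "partition(ym='${ym}',d='${d}',h='${h}')"
```

This prints partition(ym='202101',d='07',h='10'), i.e. the partition spec the hive statement targets for that file.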