Importing HDFS Data into HBase


Data format

hadoop fs -ls /warehouse/orc_elapsed_log

/warehouse/orc_elapsed_log/dt=20160101
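The warehouse is partitioned by day (`dt=YYYYMMDD`). As a sketch of how a date-range argument pair like the ones passed to `Main` below maps onto partition paths (the helper name and expansion logic are my own illustration, not code from the project):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class PartitionPaths {
    static final DateTimeFormatter DT = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Expands an inclusive [from, to] date range into one dt= partition path per day.
    public static List<String> expand(String base, String from, String to) {
        List<String> paths = new ArrayList<>();
        LocalDate d = LocalDate.parse(from, DT);
        LocalDate end = LocalDate.parse(to, DT);
        while (!d.isAfter(end)) {
            paths.add(base + "/dt=" + d.format(DT));
            d = d.plusDays(1);
        }
        return paths;
    }
}
```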

Run the processing script (a Java job over the Hive/ORC data):

cd wangchenlong/workspace/user-profile/processor/profile

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160101 20160131

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160201 20160229

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160301 20160331

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160401 20160430

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160501 20160731

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20160801 20161031

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main 20161101 20161231

hadoop fs -ls /tmp/wangchenlong/log_event

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -h 20160601 20160731

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -h 20160912 20161031

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -h 20161214 20161231

The Hive Maven jar conflicts with the ORC jar: the two pull in different versions, so the loaded classes differ and some methods cannot be resolved:

java.lang.Exception: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.getDataColumnCount()
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.getDataColumnCount()
    at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1044)

The cause is that hive-exec and orc-mapreduce depend on different versions of hive-storage-api, which leaves an incompatible VectorizedRowBatch class on the classpath.
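When two jars both provide a class, it helps to ask the JVM which one actually won. A small diagnostic sketch (my own helper, not part of the project) that prints the jar or directory a class was loaded from; in the real job you would probe `VectorizedRowBatch.class`:

```java
public class ClassOriginProbe {
    /** Returns the jar/dir a class was loaded from, or "bootstrap/unknown" for JDK-internal classes. */
    public static String originOf(Class<?> cls) {
        java.security.ProtectionDomain pd = cls.getProtectionDomain();
        if (pd != null && pd.getCodeSource() != null && pd.getCodeSource().getLocation() != null) {
            return pd.getCodeSource().getLocation().toString();
        }
        return "bootstrap/unknown";
    }

    public static void main(String[] args) {
        // In the real job: originOf(VectorizedRowBatch.class) shows whether
        // hive-exec or hive-storage-api supplied the class.
        System.out.println(originOf(ClassOriginProbe.class));
    }
}
```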

Test:

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_process.HiveMRDemo /tmp/wangchenlong/orc

Solution: declare hive-storage-api explicitly to force the newer class:

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-storage-api</artifactId>
    <version>2.4.0</version>
</dependency>
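An alternative worth knowing (a sketch of standard Maven practice, not from the original post) is pinning the version once via `dependencyManagement`, so every transitive request for hive-storage-api resolves to the same artifact:

```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-storage-api</artifactId>
      <version>2.4.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```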

Importing into HBase

Import from HDFS into HBase; first inspect the table:

hbase shell
list
desc 'cy_event'
scan 'cy_event', {LIMIT=>5} # show 5 rowkeys

Table data

user_time|1488384000000|29768601 column=info:assess_num, timestamp=1505384441438, value=3
user_time|1488384000000|29768601 column=info:duration, timestamp=1505384441438, value=42654
user_time|1488384000000|29768601 column=info:event_name, timestamp=1505384441438, value=user_time
user_time|1488384000000|29768601 column=info:event_time, timestamp=1505384441438, value=20170302_000000
user_time|1488384000000|29768601 column=info:login_zone, timestamp=1505384441438, value=0
user_time|1488384000000|29768601 column=info:uid, timestamp=1505384441438, value=29768601
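The scan output shows the rowkey layout `event_name|event_millis|uid`: rows for one event cluster together and sort by time, so a time range becomes a lexicographic scan range. A minimal sketch of that key construction (helper names are mine, not from the project):

```java
public class RowKeyUtil {
    // Rowkey layout observed in the scan output: event_name|event_millis|uid.
    public static String rowKey(String eventName, long eventMillis, String uid) {
        return eventName + "|" + eventMillis + "|" + uid;
    }

    // Lexicographic scan bounds for one event over [fromMillis, toMillis).
    // This works here because epoch-millis values in this era all have the same
    // digit width, so their string order matches their numeric order.
    public static String[] scanBounds(String eventName, long fromMillis, long toMillis) {
        return new String[] { eventName + "|" + fromMillis, eventName + "|" + toMillis };
    }
}
```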

Run the import from HDFS into HBase:

cd wangchenlong/workspace/user-profile/processor/profile

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -p 20170101 20170331

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -p 20170401 20170731

hadoop jar ./target/profile-1.1.1-jar-with-dependencies.jar me.chunyu.log_analysis.Main -p 20170731 20170909

The Processor business class

public class UserTimeHBaseProcessor extends BaseSavedProcessor<LogEntity> {

    public static final String DEF_NAME = UserTimeHBaseProcessor.class.getSimpleName();

    @Override
    protected void onProcess(LogEntity entity) {
        super.onProcess(entity);
        String line = entity.original_line;
        String[] items = line.split("\\|");
        if (items.length != 6) {
            return; // skip malformed lines
        }
        Map<String, String> map = new HashMap<>();
        String uid = items[0];
        String event_name = items[1];
        String time = items[2];
        Date date = LaDateUtils.parseWriteDate(time);
        if (date == null) {
            return; // skip lines with unparseable timestamps
        }
        String login_zone = items[3];
        String duration = items[4];
        String assess_num = items[5];
        // Rowkey: event_name|millis|uid, so rows cluster by event and sort by time
        String rowKey = event_name + "|" + date.getTime() + "|" + uid;
        map.put("uid", uid);
        map.put("event_name", event_name);
        map.put("event_time", time);
        map.put("login_zone", login_zone);
        map.put("duration", duration);
        map.put("assess_num", assess_num);
        saveHBase(rowKey, map);
    }
}
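One subtlety in the parsing above: `String.split("\\|")` silently drops trailing empty fields, so a line whose last field is empty yields only 5 items and is skipped by the `length == 6` check. A standalone sketch:

```java
public class SplitDemo {
    public static void main(String[] args) {
        String full = "29768601|user_time|20170302_000000|0|42654|3";
        String trailingEmpty = "29768601|user_time|20170302_000000|0|42654|";

        System.out.println(full.split("\\|").length);              // 6: accepted
        System.out.println(trailingEmpty.split("\\|").length);     // 5: rejected by the length check
        System.out.println(trailingEmpty.split("\\|", -1).length); // 6: limit -1 keeps trailing empties
    }
}
```

Whether skipping such lines is the right behavior depends on whether an empty assess_num is valid in the source data.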

Registering the Processor

public class ProcessorRegister extends BaseMainManager {

    private static class Holder {
        private static ProcessorRegister sInstance = new ProcessorRegister();
    }

    public static ProcessorRegister getInstance() {
        return Holder.sInstance;
    }

    private ProcessorRegister() {
        super();
        //++++++++++++++++++++ register processors here ++++++++++++++++++++//
        // registerProcessor(UserTimeProcessor.DEF_NAME, new UserTimeProcessor());
        registerProcessor(UserTimeHBaseProcessor.DEF_NAME, new UserTimeHBaseProcessor());
        //++++++++++++++++++++ register processors here ++++++++++++++++++++//
    }
}
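ProcessorRegister uses the initialization-on-demand holder idiom: the JVM runs Holder's static initializer exactly once, on the first `getInstance()` call, giving lazy and thread-safe construction without explicit locking. The pattern in isolation:

```java
public class LazySingleton {
    // Holder is not initialized until getInstance() first touches it;
    // the JVM class-loading machinery serializes that initialization.
    private static class Holder {
        private static final LazySingleton INSTANCE = new LazySingleton();
    }

    private LazySingleton() { }

    public static LazySingleton getInstance() {
        return Holder.INSTANCE;
    }
}
```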

Execution

case "-p":
    main = new ProcessMain(args[1], args[2], LaValues.PathFormat.USER_TIME_PATH_FORMAT); // process mode: the HDFS-to-HBase import
    break;

Built on the Log_Analysis analysis framework.

