I. Features covered
1. Import a TSV file into HBase with importtsv
2. Import a CSV file into HBase with importtsv
3. Bulk-load HFile data with completebulkload
II. Setup
1. Requirement
The stu_info table has 20 columns of data; the goal is to read the info:name column and write it into a second table, tb02.
2. Create two tables in the HBase shell
create 'stu_info','info','degree','work'
create 'tb02','info'
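The two create statements above can also be replayed non-interactively by saving them to a script file and feeding it to `hbase shell`. A minimal sketch (the file name create-tables.hbase is made up for illustration, and the actual shell invocation assumes `hbase` is on the PATH, so it is shown commented out):

```shell
# Write the DDL to a script file so it can be versioned and replayed.
cat > create-tables.hbase <<'EOF'
create 'stu_info','info','degree','work'
create 'tb02','info'
exit
EOF

# To execute against a running cluster (assumes `hbase` is on the PATH):
# hbase shell create-tables.hbase
```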
3. Insert sample data
put 'stu_info','10001','degree:xueli','benke'
put 'stu_info','10001','info:age','18'
put 'stu_info','10001','info:sex','male'
put 'stu_info','10001','info:name','tom'
put 'stu_info','10001','work:job','bigdata'
put 'stu_info','10002','degree:xueli','gaozhong'
put 'stu_info','10002','info:age','22'
put 'stu_info','10002','info:sex','female'
put 'stu_info','10002','info:name','jack'
put 'stu_info','10003','info:age','22'
put 'stu_info','10003','info:name','leo'
put 'stu_info','10004','info:age','18'
put 'stu_info','10004','info:name','peter'
put 'stu_info','10005','info:age','19'
put 'stu_info','10005','info:name','jim'
put 'stu_info','10006','info:age','20'
put 'stu_info','10006','info:name','zhangsan'
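Repetitive put statements like the ones above can be generated with a small loop instead of typing each one. A sketch for the name/age-only rows (the file name puts.hbase is hypothetical; piping into `hbase shell` needs a running cluster, so that line is commented out):

```shell
# Each entry is rowkey:name:age; expand it into two put statements per row.
rows="10003:leo:22 10004:peter:18 10005:jim:19 10006:zhangsan:20"
for r in $rows; do
  key=${r%%:*}; rest=${r#*:}
  name=${rest%%:*}; age=${rest#*:}
  echo "put 'stu_info','$key','info:name','$name'"
  echo "put 'stu_info','$key','info:age','$age'"
done > puts.hbase

# hbase shell puts.hbase   # assumes a running cluster
```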
III. Walkthrough
1. Using importtsv
(1) Goal: import a TSV-format file into the stu_info table. Create the TSV file:
vi hbase-test.tsv
(2) hbase-test.tsv contents (importtsv splits on the tab character by default, so the fields below must be separated by real tabs, not spaces)
10001 ngsan 12 male
10002 lisi 13 female
10003 wangwu 14 male
10004 zhaoliu 15 female
10005 xieqi 16 female
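Because the separator has to be a literal tab, it is safer to generate the file with printf than to rely on an editor preserving tabs. A sketch that writes the same five rows:

```shell
# printf reuses the format string for each group of four arguments,
# guaranteeing real tab characters between the fields of every row.
printf '%s\t%s\t%s\t%s\n' \
  10001 ngsan 12 male \
  10002 lisi 13 female \
  10003 wangwu 14 male \
  10004 zhaoliu 15 female \
  10005 xieqi 16 female > hbase-test.tsv
```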
(3) Upload it to HDFS
bin/hdfs dfs -put /opt/datas/hbase-test.tsv /hadoop
(4) Run: import /hadoop/hbase-test.tsv from HDFS into the stu_info table
bin/yarn jar /opt/modules/hbase-1.2.0-cdh5.7.0/lib/hbase-server-1.2.0-cdh5.7.0.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex stu_info /hadoop/hbase-test.tsv
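Before launching the job, it can help to verify that every row has exactly the four fields named in -Dimporttsv.columns, since importtsv skips rows with the wrong field count and reports them as bad lines. A small awk check (sample.tsv is a hypothetical file with one deliberately broken row):

```shell
# Build a sample where the second row is missing its last field.
printf '10001\tngsan\t12\tmale\n10002\tlisi\t13\n' > sample.tsv

# Each row must have exactly 4 tab-separated fields to match
# HBASE_ROW_KEY,info:name,info:age,info:sex; offenders are reported.
awk -F'\t' 'NF != 4 {print "line " NR " has " NF " fields"}' sample.tsv
# prints: line 2 has 3 fields
```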
(5) Check the result with scan 'stu_info'
ROW COLUMN+CELL
10001 column=degree:xueli, timestamp=1543583981287, value=benke
10001 column=info:age, timestamp=1543583981314, value=18
10001 column=info:name, timestamp=1543583981366, value=tom
10001 column=info:sex, timestamp=1543583981340, value=male
10001 column=work:job, timestamp=1543583981381, value=bigdata
10002 column=degree:xueli, timestamp=1543583981396, value=gaozhong
10002 column=info:age, timestamp=1543583981410, value=22
10002 column=info:name, timestamp=1543583981438, value=jack
10002 column=info:sex, timestamp=1543583981425, value=female
10003 column=info:age, timestamp=1543583981457, value=22
10003 column=info:name, timestamp=1543583981484, value=leo
10004 column=info:age, timestamp=1543583981497, value=18
10004 column=info:name, timestamp=1543583981509, value=peter
10005 column=info:age, timestamp=1543583981533, value=19
10005 column=info:name, timestamp=1543583981547, value=jim
10006 column=info:age, timestamp=1543583981559, value=20
10006 column=info:name, timestamp=1543583982459, value=zhangsan
2. Import a CSV file into the stu_info table
(1) Create the CSV file hbase-test2.csv
10011,ngsan,12,male
10012,lisi,13,female
10013,wangwu,14,male
10014,zhaoliu,15,female
10015,xieqi,16,female
(2) Upload it to HDFS
bin/hdfs dfs -put /opt/datas/hbase-test2.csv /hadoop
(3) Run (-Dimporttsv.separator=, sets the field delimiter to a comma)
$HADOOP_HOME/bin/yarn jar /opt/modules/hbase-1.2.0-cdh5.7.0/lib/hbase-server-1.2.0-cdh5.7.0.jar importtsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex stu_info /hadoop/hbase-test2.csv
(4) Check the result
ROW COLUMN+CELL
10001 column=degree:xueli, timestamp=1543583981287, value=benke
10001 column=info:age, timestamp=1543583981314, value=18
10001 column=info:name, timestamp=1543583981366, value=tom
10001 column=info:sex, timestamp=1543583981340, value=male
10001 column=work:job, timestamp=1543583981381, value=bigdata
10002 column=degree:xueli, timestamp=1543583981396, value=gaozhong
10002 column=info:age, timestamp=1543583981410, value=22
10002 column=info:name, timestamp=1543583981438, value=jack
10002 column=info:sex, timestamp=1543583981425, value=female
10003 column=info:age, timestamp=1543583981457, value=22
10003 column=info:name, timestamp=1543583981484, value=leo
10004 column=info:age, timestamp=1543583981497, value=18
10004 column=info:name, timestamp=1543583981509, value=peter
10005 column=info:age, timestamp=1543583981533, value=19
10005 column=info:name, timestamp=1543583981547, value=jim
10006 column=info:age, timestamp=1543583981559, value=20
10006 column=info:name, timestamp=1543583982459, value=zhangsan
10011 column=info:age, timestamp=1543585629390, value=12
10011 column=info:name, timestamp=1543585629390, value=ngsan
10011 column=info:sex, timestamp=1543585629390, value=male
10012 column=info:age, timestamp=1543585629390, value=13
10012 column=info:name, timestamp=1543585629390, value=lisi
10012 column=info:sex, timestamp=1543585629390, value=female
10013 column=info:age, timestamp=1543585629390, value=14
10013 column=info:name, timestamp=1543585629390, value=wangwu
10013 column=info:sex, timestamp=1543585629390, value=male
10014 column=info:age, timestamp=1543585629390, value=15
10014 column=info:name, timestamp=1543585629390, value=zhaoliu
10014 column=info:sex, timestamp=1543585629390, value=female
10015 column=info:age, timestamp=1543585629390, value=16
10015 column=info:name, timestamp=1543585629390, value=xieqi
10015 column=info:sex, timestamp=1543585629390, value=female
3. Bulk-loading HFile data with completebulkload
(1) How it works
Step 1: generating HFiles first decouples the MapReduce job from the actual import into HBase.
Step 2: the data files can be converted into the table's HFile format during off-peak hours; the final import then only has to move the files into place, so it completes quickly.
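The two steps above can be sketched as one script. The commands mirror the examples in this section, but both need a running cluster, so they are shown commented out; the variable names are illustrative only:

```shell
# Step 1: the MapReduce job only WRITES HFiles (no puts go to HBase),
#   because -Dimporttsv.bulk.output redirects output to an HDFS directory.
# Step 2: completebulkload moves those HFiles into the table's region
#   directories; this is essentially a rename, so it is fast and cheap.

HFILE_DIR=/testHfile   # HDFS output directory for the generated HFiles
TABLE=stu_info         # table the HFiles are generated for

# $HADOOP_HOME/bin/yarn jar $HBASE_HOME/lib/hbase-server-0.98.6-hadoop2.jar importtsv \
#   -Dimporttsv.separator=, \
#   -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex \
#   -Dimporttsv.bulk.output=$HFILE_DIR $TABLE /hbase-test3.csv
# $HADOOP_HOME/bin/yarn jar $HBASE_HOME/lib/hbase-server-0.98.6-hadoop2.jar completebulkload \
#   $HFILE_DIR $TABLE
echo "bulkload plan: $HFILE_DIR -> $TABLE"
```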
(2) First convert the data file into StoreFiles (HFile format) and write them to the HDFS path given by -Dimporttsv.bulk.output
(3) Create the data file hbase-test3.csv
10011,ngsan,12,male
10012,lisi,13,female
10013,wangwu,14,male
10014,zhaoliu,15,female
10015,xieqi,16,female
(4) Upload it to HDFS
/opt/modules/apache/hadoop-2.7.3/bin/hdfs dfs -put /opt/datas/hbase-test3.csv /hbase-test3.csv
(5) Run: the generated HFiles are written under /testHfile
export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
export HADOOP_HOME=/opt/modules/apache/hadoop-2.7.3
export HADOOP_CLASSPATH=/opt/modules/hbase-0.98.6-hadoop2/lib/*
$HADOOP_HOME/bin/yarn jar /opt/modules/hbase-0.98.6-hadoop2/lib/hbase-server-0.98.6-hadoop2.jar importtsv -Dimporttsv.separator=, -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex -Dimporttsv.bulk.output=/testHfile stu_info /hbase-test3.csv
Explanation:
-》-Dimporttsv.bulk.output=/testHfile : HDFS directory for the generated HFiles
-》stu_info : the table the HFiles are generated for
-》/hbase-test3.csv : the input data source
(6) Load the HFiles into HBase
At this point the files under /testHfile are moved into the table's column-family directories; this is effectively a move operation, not a copy.
bin/yarn jar /opt/modules/hbase-0.98.6-hadoop2/lib/hbase-server-0.98.6-hadoop2.jar completebulkload /testHfile stu_info
Explanation:
-》/testHfile : the HDFS directory holding the HFiles
-》stu_info : the target table