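This article uses HBase's built-in ImportTsv tool to import data. The data file is first uploaded to HDFS and then imported into an HBase table. ImportTsv reads delimited text and by default expects tab-separated (TSV) fields (a different separator can be set with -Dimporttsv.separator), so the downloaded .txt file is converted to TSV first.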
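1. Download the data. This article uses the public NOAA (National Oceanic and Atmospheric Administration) Climate Normals dataset. The hourly temperature data is in the products | hourly directory; download the file hly-temp-10pctl.txt from there.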
2. Convert the .txt file to a .tsv file with a Python script (to_tsv_hly.py), then upload the generated file to /home/hadoop/data on the Linux virtual machine:
D:\work\hbase学习>python to_tsv_hly.py -f hly-temp-10pctl.txt -t hly-temp-10pctl.tsv
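The contents of to_tsv_hly.py are not shown here. The following is only a minimal sketch of what such a conversion might do, not the author's script. It assumes each line of hly-temp-10pctl.txt holds whitespace-separated fields in the order station ID, month, day, then 24 hourly values, and it builds the row key as station + month + day, which is inferred from row keys such as AQW000617050101 in the scan output of step 7. The names convert_line and convert_file are hypothetical.

```python
def convert_line(line: str) -> str:
    """Convert one record into a TSV row: ROWKEY<TAB>V01<TAB>...<TAB>V24.

    Hypothetical helper. Assumed input layout (whitespace-separated):
    STATION MONTH DAY V01 ... V24
    """
    fields = line.split()
    if len(fields) != 27:
        raise ValueError(f"expected 27 fields, got {len(fields)}: {line!r}")
    station, month, day = fields[:3]
    # Assumed row key: station ID + month + day, e.g. AQW00061705 + 01 + 01
    row_key = station + month + day
    return "\t".join([row_key] + fields[3:])


def convert_file(src: str, dst: str) -> int:
    """Convert src (.txt) into dst (.tsv) and return the number of rows written."""
    rows = 0
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            fout.write(convert_line(line) + "\n")
            rows += 1
    return rows
```

Calling convert_file("hly-temp-10pctl.txt", "hly-temp-10pctl.tsv") would produce a file whose first column is the row key and whose remaining 24 columns line up with t:v01 to t:v24 in step 6. Writing with newline="\n" keeps Unix line endings even when the script runs on Windows, as in the command above.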
3. Create a directory on HDFS to hold the data:
hadoop fs -mkdir /input
4. Copy the data file into the HDFS data directory:
[hadoop@ora12c data]$ hadoop fs -copyFromLocal /home/hadoop/data/hly-temp-10pctl.tsv /input
[hadoop@ora12c data]$ hadoop fs -ls /input
Found 1 items
-rw-r--r-- 1 hadoop supergroup 22821236 2015-12-23 20:56 /input/hly-temp-10pctl.tsv
5. Create the target table in HBase:
create 'hly_temp', {NAME => 't', VERSIONS => 1}
6. Run the import:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,t:v01,t:v02,t:v03,t:v04,t:v05,t:v06,t:v07,t:v08,t:v09,t:v10,t:v11,t:v12,t:v13,t:v14,t:v15,t:v16,t:v17,t:v18,t:v19,t:v20,t:v21,t:v22,t:v23,t:v24 hly_temp /input
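The -Dimporttsv.columns value maps TSV fields to HBase columns by position: HBASE_ROW_KEY marks the field used as the row key, and each following entry names the family:qualifier for the next field. Typing 24 qualifiers by hand is error-prone, so a small helper can generate the exact command used above. The functions below are hypothetical and not part of HBase.

```python
def importtsv_columns(family: str = "t", n: int = 24) -> str:
    """Build the -Dimporttsv.columns value: row key first, then family:v01..vNN."""
    cols = ["HBASE_ROW_KEY"] + [f"{family}:v{i:02d}" for i in range(1, n + 1)]
    return ",".join(cols)


def importtsv_command(table: str, input_dir: str) -> str:
    """Assemble the ImportTsv invocation for the given table and HDFS input dir."""
    return (
        "hbase org.apache.hadoop.hbase.mapreduce.ImportTsv "
        f"-Dimporttsv.columns={importtsv_columns()} {table} {input_dir}"
    )


print(importtsv_command("hly_temp", "/input"))
```

Because the mapping is positional, every TSV line should carry exactly 25 fields (1 row key + 24 values). ImportTsv skips lines it cannot parse unless -Dimporttsv.skip.bad.lines=false is set, in which case the job fails on the first bad line.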
7. Check the data after the import:
hbase(main):002:0> count 'hly_temp'
Current count: 1000, row: FMW000405040927
Current count: 2000, row: GQW000414150624
Current count: 3000, row: RMW000407100321
..................................................................
Current count: 166000, row: USW000949101017
166805 row(s) in 35.6090 seconds
=> 166805
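The row keys printed by count, such as FMW000405040927, appear to follow a fixed 15-character layout. Assuming an 11-character station ID followed by a 2-digit month and a 2-digit day (consistent with keys like AQW000617050101 in the scan below), a hypothetical helper can split a key back into its parts:

```python
from typing import NamedTuple


class RowKey(NamedTuple):
    station: str
    month: int
    day: int


def split_row_key(key: str) -> RowKey:
    """Split a row key into station ID, month and day (assumed 11+2+2 layout)."""
    if len(key) != 15:
        raise ValueError(f"expected 15 characters, got {len(key)}: {key!r}")
    return RowKey(station=key[:11], month=int(key[11:13]), day=int(key[13:15]))


print(split_row_key("FMW000405040927"))
# RowKey(station='FMW00040504', month=9, day=27)
```

Since month and day are zero-padded, HBase's lexicographic row ordering keeps each station's rows together in calendar order, so all days for one station can be read with a single prefix scan.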
hbase(main):004:0> scan 'hly_temp', {COLUMNS => 't', LIMIT => 2}
ROW COLUMN+CELL
AQW000617050101 column=t:v01, timestamp=1450924204703, value=759P
AQW000617050101 column=t:v02, timestamp=1450924204703, value=766C
AQW000617050101 column=t:v03, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v04, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v05, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v06, timestamp=1450924204703, value=759C
AQW000617050101 column=t:v07, timestamp=1450924204703, value=752C
AQW000617050101 column=t:v08, timestamp=1450924204703, value=775C
AQW000617050101 column=t:v09, timestamp=1450924204703, value=801C
AQW000617050101 column=t:v10, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v11, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v12, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v13, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v14, timestamp=1450924204703, value=808C
AQW000617050101 column=t:v15, timestamp=1450924204703, value=806C
AQW000617050101 column=t:v16, timestamp=1450924204703, value=810C
AQW000617050101 column=t:v17, timestamp=1450924204703, value=808C
AQW000617050101 column=t:v18, timestamp=1450924204703, value=801C
AQW000617050101 column=t:v19, timestamp=1450924204703, value=801C
AQW000617050101 column=t:v20, timestamp=1450924204703, value=790C
AQW000617050101 column=t:v21, timestamp=1450924204703, value=781C
AQW000617050101 column=t:v22, timestamp=1450924204703, value=781C
AQW000617050101 column=t:v23, timestamp=1450924204703, value=770C
AQW000617050101 column=t:v24, timestamp=1450924204703, value=770C
AQW000617050102 column=t:v01, timestamp=1450924204703, value=768P
AQW000617050102 column=t:v02, timestamp=1450924204703, value=766C
AQW000617050102 column=t:v03, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v04, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v05, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v06, timestamp=1450924204703, value=759C
AQW000617050102 column=t:v07, timestamp=1450924204703, value=757C
AQW000617050102 column=t:v08, timestamp=1450924204703, value=775C
AQW000617050102 column=t:v09, timestamp=1450924204703, value=801C
AQW000617050102 column=t:v10, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v11, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v12, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v13, timestamp=1450924204703, value=811C
AQW000617050102 column=t:v14, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v15, timestamp=1450924204703, value=810C
AQW000617050102 column=t:v16, timestamp=1450924204703, value=808C
AQW000617050102 column=t:v17, timestamp=1450924204703, value=808C
AQW000617050102 column=t:v18, timestamp=1450924204703, value=801C
AQW000617050102 column=t:v19, timestamp=1450924204703, value=795C
AQW000617050102 column=t:v20, timestamp=1450924204703, value=790C
AQW000617050102 column=t:v21, timestamp=1450924204703, value=781C
AQW000617050102 column=t:v22, timestamp=1450924204703, value=781C
AQW000617050102 column=t:v23, timestamp=1450924204703, value=774C
AQW000617050102 column=t:v24, timestamp=1450924204703, value=770C
2 row(s) in 0.4910 seconds
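Each cell holds a short string such as 759P. Assuming the NOAA Climate Normals convention that the number is in tenths of a degree Fahrenheit and the trailing letter is a data-quality flag (verify against the dataset's documentation before relying on this), a hypothetical decoder could look like this:

```python
import re

# Assumed format: optional minus sign, integer tenths of a degree F, one-letter flag
_VALUE_RE = re.compile(r"^(-?\d+)([A-Z])$")


def decode_value(raw: str) -> tuple[float, str]:
    """Decode a cell such as '759P' into (75.9, 'P') under the assumed format."""
    m = _VALUE_RE.match(raw.strip())
    if m is None:
        raise ValueError(f"unrecognized value: {raw!r}")
    tenths, flag = m.groups()
    return int(tenths) / 10, flag


print(decode_value("759P"))  # (75.9, 'P')
print(decode_value("766C"))  # (76.6, 'C')
```

ImportTsv stores every value as raw bytes of the original text, so this kind of decoding has to happen in the client that reads the table.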
At this point, the import has completed successfully!