mysql import tsv: importing external data into HBase with ImportTsv

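This walkthrough uses ImportTsv, the import tool that ships with HBase. The data file is first uploaded to HDFS and then imported into an HBase table. With its default settings ImportTsv reads tab-separated (TSV) input, so the txt file has to be converted to TSV first.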

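1. Download the data. This article uses the public "Climate Normals" dataset from the US National Oceanic and Atmospheric Administration (NOAA). On the download site, the hourly temperature data is under the directory products | hourly. Download the file hly-temp-10pctl.txt.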

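2. Convert the txt file to a TSV file with a Python script, then upload the generated file to /home/hadoop/data on the Linux virtual machine. The script used here is to_tsv_hly.py.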

D:\work\hbase学习>python to_tsv_hly.py -f hly-temp-10pctl.txt -t hly-temp-10pctl.tsv
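
The conversion script itself is not reproduced in this article, so the following is only a minimal sketch of what such a conversion might look like. It assumes each input line is whitespace-separated as station id, month, day, followed by 24 hourly values, and that the first three fields are concatenated into the row key (which would match row keys such as AQW000617050101 seen later). The function names `to_tsv_line` and `convert` are hypothetical.

```python
import io


def to_tsv_line(line):
    """Convert one whitespace-separated line into a tab-separated line.

    Assumed layout (not confirmed by the article): station id, month, day,
    then 24 hourly values. The first three fields become the row key.
    """
    fields = line.split()
    if len(fields) != 27:
        raise ValueError(f"expected 27 fields, got {len(fields)}")
    station, month, day = fields[:3]
    row_key = station + month + day
    return "\t".join([row_key] + fields[3:])


def convert(src, dst):
    """Write every non-empty line of src to dst in TSV form."""
    for line in src:
        if line.strip():
            dst.write(to_tsv_line(line) + "\n")


# In-memory demonstration with made-up hourly values.
sample = "AQW00061705 01 01 " + " ".join(f"{700 + h}C" for h in range(24)) + "\n"
out = io.StringIO()
convert(io.StringIO(sample), out)
tsv = out.getvalue()
```

For real files, `convert` could be called with two objects returned by `open()`, one for the txt input and one for the tsv output.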

3. Create a directory on HDFS to hold the data

hadoop fs -mkdir /input

4. Copy the data file into the HDFS data directory

[hadoop@ora12c data]$ hadoop fs -copyFromLocal /home/hadoop/data/hly-temp-10pctl.tsv /input

[hadoop@ora12c data]$ hadoop fs -ls /input

Found 1 items

-rw-r--r--   1 hadoop supergroup   22821236 2015-12-23 20:56 /input/hly-temp-10pctl.tsv
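
Since the import maps fields to columns by position, it can help to sanity-check the local TSV file around this step. The sketch below counts lines that do not have exactly 25 tab-separated fields (one row key plus 24 hourly values). The function name `count_bad_lines` is hypothetical, and the sample rows are made up for illustration.

```python
def count_bad_lines(lines, expected_fields=25):
    """Return how many lines do not have the expected number of tab-separated fields."""
    bad = 0
    for line in lines:
        if len(line.rstrip("\n").split("\t")) != expected_fields:
            bad += 1
    return bad


# Made-up rows: one well-formed, one truncated.
good = "\t".join(["AQW000617050101"] + ["759C"] * 24) + "\n"
short = "AQW000617050102\t759C\n"
bad_count = count_bad_lines([good, short])
```

On the real file, the same function could be applied to an open file handle, for example `count_bad_lines(open("hly-temp-10pctl.tsv"))`.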

5. Create the target table in HBase

create 'hly_temp', {NAME => 't', VERSIONS => 1}

6. Run the import

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv  -Dimporttsv.columns=HBASE_ROW_KEY,t:v01,t:v02,t:v03,t:v04,t:v05,t:v06,t:v07,t:v08,t:v09,t:v10,t:v11,t:v12,t:v13,t:v14,t:v15,t:v16,t:v17,t:v18,t:v19,t:v20,t:v21,t:v22,t:v23,t:v24 hly_temp  /input
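
The column list in the command above is long and easy to mistype. As a small sketch, it could be generated rather than written by hand. The helper name `importtsv_columns` is hypothetical; the class name, table name, input path, and column names are taken from the command above.

```python
def importtsv_columns(family="t", hours=24):
    """Build the value for -Dimporttsv.columns: row key first, then one column per hour."""
    cols = ["HBASE_ROW_KEY"]
    cols += [f"{family}:v{h:02d}" for h in range(1, hours + 1)]
    return ",".join(cols)


columns = importtsv_columns()

# Assemble the same command line as in step 6.
command = [
    "hbase",
    "org.apache.hadoop.hbase.mapreduce.ImportTsv",
    f"-Dimporttsv.columns={columns}",
    "hly_temp",
    "/input",
]
print(" ".join(command))
```

The order of entries in the column list has to match the order of fields in each TSV line, with HBASE_ROW_KEY marking the field that is used as the row key.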

7. Check the data after the import

hbase(main):002:0> count 'hly_temp'

Current count: 1000, row: FMW000405040927

Current count: 2000, row: GQW000414150624

Current count: 3000, row: RMW000407100321

..................................................................

Current count: 166000, row: USW000949101017

166805 row(s) in 35.6090 seconds

=> 166805

hbase(main):004:0> scan 'hly_temp', {COLUMNS => 't', LIMIT => 2}

ROW                                COLUMN+CELL

AQW000617050101                   column=t:v01, timestamp=1450924204703, value=759P

AQW000617050101                   column=t:v02, timestamp=1450924204703, value=766C

AQW000617050101                   column=t:v03, timestamp=1450924204703, value=759C

AQW000617050101                   column=t:v04, timestamp=1450924204703, value=759C

AQW000617050101                   column=t:v05, timestamp=1450924204703, value=759C

AQW000617050101                   column=t:v06, timestamp=1450924204703, value=759C

AQW000617050101                   column=t:v07, timestamp=1450924204703, value=752C

AQW000617050101                   column=t:v08, timestamp=1450924204703, value=775C

AQW000617050101                   column=t:v09, timestamp=1450924204703, value=801C

AQW000617050101                   column=t:v10, timestamp=1450924204703, value=810C

AQW000617050101                   column=t:v11, timestamp=1450924204703, value=810C

AQW000617050101                   column=t:v12, timestamp=1450924204703, value=810C

AQW000617050101                   column=t:v13, timestamp=1450924204703, value=810C

AQW000617050101                   column=t:v14, timestamp=1450924204703, value=808C

AQW000617050101                   column=t:v15, timestamp=1450924204703, value=806C

AQW000617050101                   column=t:v16, timestamp=1450924204703, value=810C

AQW000617050101                   column=t:v17, timestamp=1450924204703, value=808C

AQW000617050101                   column=t:v18, timestamp=1450924204703, value=801C

AQW000617050101                   column=t:v19, timestamp=1450924204703, value=801C

AQW000617050101                   column=t:v20, timestamp=1450924204703, value=790C

AQW000617050101                   column=t:v21, timestamp=1450924204703, value=781C

AQW000617050101                   column=t:v22, timestamp=1450924204703, value=781C

AQW000617050101                   column=t:v23, timestamp=1450924204703, value=770C

AQW000617050101                   column=t:v24, timestamp=1450924204703, value=770C

AQW000617050102                   column=t:v01, timestamp=1450924204703, value=768P

AQW000617050102                   column=t:v02, timestamp=1450924204703, value=766C

AQW000617050102                   column=t:v03, timestamp=1450924204703, value=759C

AQW000617050102                   column=t:v04, timestamp=1450924204703, value=759C

AQW000617050102                   column=t:v05, timestamp=1450924204703, value=759C

AQW000617050102                   column=t:v06, timestamp=1450924204703, value=759C

AQW000617050102                   column=t:v07, timestamp=1450924204703, value=757C

AQW000617050102                   column=t:v08, timestamp=1450924204703, value=775C

AQW000617050102                   column=t:v09, timestamp=1450924204703, value=801C

AQW000617050102                   column=t:v10, timestamp=1450924204703, value=810C

AQW000617050102                   column=t:v11, timestamp=1450924204703, value=810C

AQW000617050102                   column=t:v12, timestamp=1450924204703, value=810C

AQW000617050102                   column=t:v13, timestamp=1450924204703, value=811C

AQW000617050102                   column=t:v14, timestamp=1450924204703, value=810C

AQW000617050102                   column=t:v15, timestamp=1450924204703, value=810C

AQW000617050102                   column=t:v16, timestamp=1450924204703, value=808C

AQW000617050102                   column=t:v17, timestamp=1450924204703, value=808C

AQW000617050102                   column=t:v18, timestamp=1450924204703, value=801C

AQW000617050102                   column=t:v19, timestamp=1450924204703, value=795C

AQW000617050102                   column=t:v20, timestamp=1450924204703, value=790C

AQW000617050102                   column=t:v21, timestamp=1450924204703, value=781C

AQW000617050102                   column=t:v22, timestamp=1450924204703, value=781C

AQW000617050102                   column=t:v23, timestamp=1450924204703, value=774C

AQW000617050102                   column=t:v24, timestamp=1450924204703, value=770C

2 row(s) in 0.4910 seconds
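
Each stored value, such as 759P, combines a number and a trailing letter. The sketch below splits such a cell into its two parts. It assumes that the number is in tenths of a degree Fahrenheit and that the letter is a data-quality flag; this interpretation is an assumption here and should be checked against the dataset's own documentation. The function name `parse_value` is hypothetical.

```python
def parse_value(raw):
    """Split a cell such as '759P' into (temperature, flag).

    Assumption: the digits are tenths of a degree Fahrenheit and the
    final character is a flag letter.
    """
    number, flag = raw[:-1], raw[-1]
    return int(number) / 10, flag


temp, flag = parse_value("759P")
print(temp, flag)
```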

At this point, the import has succeeded!
