ImportTsv

hbase

Usage 1

$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,f:q1,c <options> <tablename> <hdfs-inputdir>

The <tablename> and <hdfs-inputdir> arguments must come last on the command line, after all -D options.

Usage 2

$ HADOOP_CLASSPATH=`hbase classpath` hadoop jar ${HBASE_HOME}/lib/hbase-server-<version>.jar importtsv <options> <tablename> <hdfs-inputdir>

The <tablename> and <hdfs-inputdir> arguments must come last on the command line, after all -D options.
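
According to the tool's own help output (reproduced at the end of this post), the -Dimporttsv.columns value must name exactly one HBASE_ROW_KEY column and at most one HBASE_TS_KEY column. A minimal sketch of a local check for that rule before submitting the job follows. The function name check_columns is hypothetical and not part of HBase, and the check does not verify that the column families exist in the table.

```shell
#!/bin/sh
# check_columns SPEC
# SPEC is the comma-separated value passed to -Dimporttsv.columns.
# Returns 0 if SPEC contains exactly one HBASE_ROW_KEY and at most one
# HBASE_TS_KEY, and non-zero otherwise.
# Hypothetical helper for illustration; it is not shipped with HBase.
check_columns() {
  # Put one column name per line, then count exact (whole-line) matches.
  # "|| true" keeps the assignment from failing when grep finds nothing.
  row_keys=$(printf '%s\n' "$1" | tr ',' '\n' | grep -c -x 'HBASE_ROW_KEY' || true)
  ts_keys=$(printf '%s\n' "$1" | tr ',' '\n' | grep -c -x 'HBASE_TS_KEY' || true)
  [ "$row_keys" -eq 1 ] && [ "$ts_keys" -le 1 ]
}
```

For example, `check_columns 'HBASE_ROW_KEY,f:q1,f:q2' && echo ok` accepts the column spec used in the example below, while a spec with no row key is rejected.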

Example

Input file

$ hadoop fs -cat /hbasetest/test.txt
1|value1|valueq1
2|value2|valueq2

HBase table

$ hbase shell
create 'htable','f'

Usage 1

$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.columns=HBASE_ROW_KEY,f:q1,f:q2' '-Dimporttsv.separator=|' -Dimporttsv.skip.bad.lines=false  htable /hbasetest/test.txt

Usage 2

$ HADOOP_CLASSPATH=`hbase classpath` hadoop jar /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hbase/hbase-server-1.2.0-cdh5.10.0.jar importtsv '-Dimporttsv.columns=HBASE_ROW_KEY,f:q1,f:q2' '-Dimporttsv.separator=|' -Dimporttsv.skip.bad.lines=false  htable /hbasetest/test.txt
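
Because both commands pass -Dimporttsv.skip.bad.lines=false, the job fails on the first invalid line. One cheap precaution is to count lines whose field count does not match the number of entries in -Dimporttsv.columns before launching the job. The sketch below is an assumption of this write-up, not part of ImportTsv: count_bad_lines is a hypothetical helper, and it only checks the field count, so it will not catch every line ImportTsv would reject (for example, an invalid timestamp).

```shell
#!/bin/sh
# count_bad_lines SEP NFIELDS
# Reads the data on stdin and prints how many lines do not have exactly
# NFIELDS fields when split on the single-character separator SEP.
# Empty lines have zero fields and are therefore counted as bad.
# Hypothetical helper for illustration; it is not shipped with HBase.
count_bad_lines() {
  awk -F"$1" -v n="$2" 'NF != n { bad++ } END { print bad + 0 }'
}
```

With the sample file above, this could be run as `hadoop fs -cat /hbasetest/test.txt | count_bad_lines '|' 3`, where 3 matches the three entries in HBASE_ROW_KEY,f:q1,f:q2.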

Help output

$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv

Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>

Imports the given input directory of TSV data into the specified table.

The column names of the TSV data must be specified using the -Dimporttsv.columns
option. This option takes the form of comma-separated column names, where each
column name is either a simple column family, or a columnfamily:qualifier. The special
column name HBASE_ROW_KEY is used to designate that this column should be used
as the row key for each imported record. You must specify exactly one column
to be the row key, and you must specify a column name for every column that exists in the
input data. Another special columnHBASE_TS_KEY designates that this column should be
used as timestamp for each record. Unlike HBASE_ROW_KEY, HBASE_TS_KEY is optional.
You must specify at most one column as timestamp key for each imported record.
Record with invalid timestamps (blank, non-numeric) will be treated as bad record.
Note: if you use this option, then 'importtsv.timestamp' option will be ignored.

Other special columns that can be specified are HBASE_CELL_TTL and HBASE_CELL_VISIBILITY.
HBASE_CELL_TTL designates that this column will be used as a Cell's Time To Live (TTL) attribute.
HBASE_CELL_VISIBILITY designates that this column contains the visibility label expression.

HBASE_ATTRIBUTES_KEY can be used to specify Operation Attributes per record.
 Should be specified as key=>value where -1 is used 
 as the seperator.  Note that more than one OperationAttributes can be specified.
By default importtsv will load data directly into HBase. To instead generate
HFiles of data to prepare for a bulk data load, pass the option:
  -Dimporttsv.bulk.output=/path/for/output
  Note: if you do not use this option, then the target table must already exist in HBase

Other options that may be specified with -D include:
  -Dimporttsv.skip.bad.lines=false - fail if encountering an invalid line
  '-Dimporttsv.separator=|' - eg separate on pipes instead of tabs
  -Dimporttsv.timestamp=currentTimeAsLong - use the specified timestamp for the import
  -Dimporttsv.mapper.class=my.Mapper - A user-defined Mapper to use instead of org.apache.hadoop.hbase.mapreduce.TsvImporterMapper
  -Dmapreduce.job.name=jobName - use the specified mapreduce job name for the import
  -Dcreate.table=no - can be used to avoid creation of table by this tool
  Note: if you set this to 'no', then the target table must already exist in HBase
  -Dno.strict=true - ignore column family check in hbase table. Default is false

For performance consider the following options:
  -Dmapreduce.map.speculative=false
  -Dmapreduce.reduce.speculative=false
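
The help text above mentions -Dimporttsv.bulk.output for generating HFiles but does not show the follow-up step. A minimal sketch of the two-step flow follows, assuming HBase 1.x (as in the CDH 5.10 example earlier), where the HFiles are loaded with the org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles class; later HBase versions relocate that tool, so check your version. The function only prints the commands so they can be reviewed first. Its name and all of its arguments are placeholders chosen for this sketch.

```shell
#!/bin/sh
# bulk_load_commands TABLE INPUT HFILE_DIR COLUMNS SEP
# Prints (does not run) the two commands of a bulk-load workflow.
# Hypothetical helper for illustration; it is not shipped with HBase.
bulk_load_commands() {
  table=$1; input=$2; hfile_dir=$3; columns=$4; sep=$5
  # Step 1: ImportTsv writes HFiles to HFILE_DIR instead of writing to the table.
  printf "hbase org.apache.hadoop.hbase.mapreduce.ImportTsv"
  printf " '-Dimporttsv.columns=%s'" "$columns"
  printf " '-Dimporttsv.separator=%s'" "$sep"
  printf " -Dimporttsv.bulk.output=%s" "$hfile_dir"
  printf " %s %s\n" "$table" "$input"
  # Step 2: load the generated HFiles into the table (HBase 1.x class name).
  printf "hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles %s %s\n" \
    "$hfile_dir" "$table"
}
```

For the sample data, `bulk_load_commands htable /hbasetest/test.txt /tmp/hfiles 'HBASE_ROW_KEY,f:q1,f:q2' '|'` prints both commands. Here /tmp/hfiles is an arbitrary HDFS output path picked for illustration, and it should not already exist when the job starts.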

Reposted from: https://my.oschina.net/yulongblog/blog/849018
