When importing a delimited text file into HBase with importtsv, any trailing delimiter at the end of each line must be removed first; otherwise the import will fail. For example:
[hadoop@hadoop1 bin]$ cat /tmp/emp.txt
1,A,201304,
2,B,201305,
3,C,201306,
4,D,201307,
Note that every line in this file ends with an extra trailing comma.
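The trailing comma matters because importtsv maps fields positionally onto the columns named in -Dimporttsv.columns; a trailing separator produces one extra, empty field, so the field count no longer matches the three columns we declare below. A quick awk check (purely illustrative) makes the difference visible:

```shell
# A line with a trailing comma splits into 4 comma-separated fields
# (the last one empty) instead of the 3 that the column mapping expects.
printf '1,A,201304,\n' | awk -F',' '{print NF}'
# The same line without the trailing comma splits into exactly 3 fields.
printf '1,A,201304\n' | awk -F',' '{print NF}'
```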
[hadoop@hadoop1 bin]$ hadoop fs -put /tmp/emp.txt /emp.txt
hbase(main):017:0> describe 't'
DESCRIPTION                                                                  ENABLED
 {NAME => 't', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE',    true
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}]}
1 row(s) in 0.1410 seconds
Table t has a single column family, cf.
[hadoop@hadoop1 bin]$ hadoop jar /home/hadoop/hbase-0.94.6/hbase-0.94.6.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:c1,cf:c2 -Dimporttsv.separator=, t /emp.txt
............
13/04/10 08:06:24 INFO mapred.JobClient: Running job: job_201304100706_0008
13/04/10 08:06:25 INFO mapred.JobClient:  map 0% reduce 0%
13/04/10 08:07:24 INFO mapred.JobClient:  map 100% reduce 0%
13/04/10 08:07:29 INFO mapred.JobClient: Job complete: job_201304100706_0008
13/04/10 08:07:29 INFO mapred.JobClient: Counters: 19
13/04/10 08:07:29 INFO mapred.JobClient:   Job Counters
13/04/10 08:07:29 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=37179
13/04/10 08:07:29 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/10 08:07:29 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/10 08:07:29 INFO mapred.JobClient:     Rack-local map tasks=1
13/04/10 08:07:29 INFO mapred.JobClient:     Launched map tasks=1
13/04/10 08:07:29 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/04/10 08:07:29 INFO mapred.JobClient:   ImportTsv
13/04/10 08:07:29 INFO mapred.JobClient:     Bad Lines=4
13/04/10 08:07:29 INFO mapred.JobClient:   File Output Format Counters
13/04/10 08:07:29 INFO mapred.JobClient:     Bytes Written=0
13/04/10 08:07:29 INFO mapred.JobClient:   FileSystemCounters
13/04/10 08:07:29 INFO mapred.JobClient:     HDFS_BYTES_READ=145
13/04/10 08:07:29 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=33535
13/04/10 08:07:29 INFO mapred.JobClient:   File Input Format Counters
13/04/10 08:07:29 INFO mapred.JobClient:     Bytes Read=48
13/04/10 08:07:29 INFO mapred.JobClient:   Map-Reduce Framework
13/04/10 08:07:29 INFO mapred.JobClient:     Map input records=4
13/04/10 08:07:29 INFO mapred.JobClient:     Physical memory (bytes) snapshot=37830656
13/04/10 08:07:29 INFO mapred.JobClient:     Spilled Records=0
13/04/10 08:07:29 INFO mapred.JobClient:     CPU time spent (ms)=200
13/04/10 08:07:29 INFO mapred.JobClient:     Total committed heap usage (bytes)=8155136
13/04/10 08:07:29 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=345518080
13/04/10 08:07:29 INFO mapred.JobClient:     Map output records=0
13/04/10 08:07:29 INFO mapred.JobClient:     SPLIT_RAW_BYTES=97
As the counters show (Bad Lines=4, Map output records=0), all four lines were flagged as bad and discarded.
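The fix is simply to strip the trailing delimiter before re-uploading the file. One way to do this is a sed pass (a sketch; the demo file name here is made up):

```shell
# Build a small demo file with trailing commas, then strip them.
printf '1,A,201304,\n2,B,201305,\n' > /tmp/emp_demo.txt
# 's/,$//' deletes a single comma at the end of each line.
sed 's/,$//' /tmp/emp_demo.txt
```

In practice, `sed -i 's/,$//' /tmp/emp.txt` (GNU sed) edits the file in place.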
After removing the trailing commas, the file looks like this:
[hadoop@hadoop1 bin]$ cat /tmp/emp.txt
1,A,201304
2,B,201305
3,C,201306
4,D,201307
[hadoop@hadoop1 bin]$ hadoop fs -rmr /emp.txt
Deleted hdfs://192.168.0.88:9000/emp.txt
[hadoop@hadoop1 bin]$ hadoop fs -put /tmp/emp.txt /emp.txt
[hadoop@hadoop1 bin]$ hadoop jar /home/hadoop/hbase-0.94.6/hbase-0.94.6.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:c1,cf:c2 -Dimporttsv.separator=, t /emp.txt
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:host.name=hadoop1
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.home=/java/jdk1.7.0/jre
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/hadoop/hadoop-1.0.4/conf:/java/jdk1.7.0/lib/tools.jar:/home/hadoop/hadoop-1.0.4/libexec/..:/home/hadoop/hadoop-1.0.4/libexec/../hadoop-core-1.0.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/asm-3.2.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/aspectjrt-1.6.5.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/aspectjtools-1.6.5.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-beanutils-1.7.0.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-cli-1.2.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-codec-1.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-configuration-1.6.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-daemon-1.0.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-digester-1.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-el-1.0.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-io-2.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-lang-2.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-logging-api-1.0.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-math-2.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/commons-net-1.4.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/core-3.1.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/guava-11.0.2.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/hadoop-capacity-scheduler-1.0.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/hadoop-fairscheduler-1.0.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/hadoop-thriftfs-1.0.4.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/hbase-0.94.6-tests.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/hbase-0.94.6.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/hsqldb-1.8.0.10.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jasper-compiler-5.5.12.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jasper-runtime-5.5.12.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jdeb-0.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jersey-core-1.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jersey-json-1.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jersey-server-1.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jetty-6.1.26.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jsch-0.1.42.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/junit-4.5.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/log4j-1.2.15.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/mockito-all-1.8.5.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/oro-2.0.8.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/protobuf-java-2.4.0a.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/servlet-api-2.5-20081211.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/slf4j-api-1.4.3.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/zookeeper-3.4.3.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jsp-2.1/jsp-2.1.jar:/home/hadoop/hadoop-1.0.4/libexec/../lib/jsp-2.1/jsp-api-2.1.jar
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop-1.0.4/libexec/../lib/native/Linux-i386-32
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:java.compiler=
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:os.arch=i386
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.18-92.el5xen
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop/sqoop-1.4.3/bin
13/04/10 07:54:40 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=192.168.0.90:2181 sessionTimeout=180000 watcher=hconnection
13/04/10 07:54:40 INFO zookeeper.ClientCnxn: Opening socket connection to server /192.168.0.90:2181
13/04/10 07:54:40 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
13/04/10 07:54:40 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 12549@hadoop1
13/04/10 07:54:40 INFO zookeeper.ClientCnxn: Socket connection established to hadoop3/192.168.0.90:2181, initiating session
13/04/10 07:54:46 INFO zookeeper.ClientCnxn: Session establishment complete on server hadoop3/192.168.0.90:2181, sessionid = 0x13df12619940011, negotiated timeout = 180000
13/04/10 07:54:56 INFO mapreduce.TableOutputFormat: Created table instance for t
13/04/10 07:54:56 INFO input.FileInputFormat: Total input paths to process : 1
13/04/10 07:54:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/10 07:54:56 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/10 07:54:57 INFO mapred.JobClient: Running job: job_201304100706_0007
13/04/10 07:54:59 INFO mapred.JobClient:  map 0% reduce 0%
13/04/10 07:57:29 INFO mapred.JobClient:  map 100% reduce 0%
13/04/10 07:57:37 INFO mapred.JobClient: Job complete: job_201304100706_0007
13/04/10 07:57:37 INFO mapred.JobClient: Counters: 19
13/04/10 07:57:37 INFO mapred.JobClient:   Job Counters
13/04/10 07:57:37 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=125785
13/04/10 07:57:37 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/10 07:57:37 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/10 07:57:37 INFO mapred.JobClient:     Rack-local map tasks=1
13/04/10 07:57:37 INFO mapred.JobClient:     Launched map tasks=1
13/04/10 07:57:37 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/04/10 07:57:37 INFO mapred.JobClient:   ImportTsv
13/04/10 07:57:37 INFO mapred.JobClient:     Bad Lines=0
13/04/10 07:57:37 INFO mapred.JobClient:   File Output Format Counters
13/04/10 07:57:37 INFO mapred.JobClient:     Bytes Written=0
13/04/10 07:57:37 INFO mapred.JobClient:   FileSystemCounters
13/04/10 07:57:37 INFO mapred.JobClient:     HDFS_BYTES_READ=141
13/04/10 07:57:37 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=33537
13/04/10 07:57:37 INFO mapred.JobClient:   File Input Format Counters
13/04/10 07:57:37 INFO mapred.JobClient:     Bytes Read=44
13/04/10 07:57:37 INFO mapred.JobClient:   Map-Reduce Framework
13/04/10 07:57:37 INFO mapred.JobClient:     Map input records=4
13/04/10 07:57:37 INFO mapred.JobClient:     Physical memory (bytes) snapshot=37867520
13/04/10 07:57:37 INFO mapred.JobClient:     Spilled Records=0
13/04/10 07:57:37 INFO mapred.JobClient:     CPU time spent (ms)=170
13/04/10 07:57:37 INFO mapred.JobClient:     Total committed heap usage (bytes)=7950336
13/04/10 07:57:37 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=345387008
13/04/10 07:57:37 INFO mapred.JobClient:     Map output records=4
13/04/10 07:57:37 INFO mapred.JobClient:     SPLIT_RAW_BYTES=97
This time the counters show Bad Lines=0 and Map output records=4, and a scan confirms that all four rows were imported:
hbase(main):016:0> scan 't'
ROW   COLUMN+CELL
 1    column=cf:c1, timestamp=1365551680259, value=A
 1    column=cf:c2, timestamp=1365551680259, value=201304
 2    column=cf:c1, timestamp=1365551680259, value=B
 2    column=cf:c2, timestamp=1365551680259, value=201305
 3    column=cf:c1, timestamp=1365551680259, value=C
 3    column=cf:c2, timestamp=1365551680259, value=201306
 4    column=cf:c1, timestamp=1365551680259, value=D
 4    column=cf:c2, timestamp=1365551680259, value=201307
4 row(s) in 0.5480 seconds
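To catch this kind of problem before burning a whole MapReduce run, the input file can be validated up front. A minimal awk sketch (assuming the 3-column layout used above) that reports any line whose field count differs from the expected number:

```shell
# Report lines whose comma-separated field count is not exactly 3;
# importtsv would count such lines as Bad Lines and discard them.
printf '1,A,201304\n2,B,201305,\n' | awk -F',' 'NF != 3 {print NR ": " NF " fields"}'
```

Running this against the real file (e.g. `awk -F',' 'NF != 3' /tmp/emp.txt`) before `hadoop fs -put` gives an immediate, cheap sanity check.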