1. Create a test table card with ten million rows in MySQL.
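The original post does not show how the card table was populated. One common approach is to generate a CSV file and bulk-load it with MySQL's LOAD DATA INFILE. A minimal sketch, using a hypothetical card(card_id, card_no, balance) schema (the real table definition is not shown above):

```python
import csv
import random

def write_test_rows(path, n):
    """Write n test rows for a hypothetical card(card_id, card_no, balance) schema."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        for card_id in range(1, n + 1):
            # card_no and balance are made-up columns for illustration
            w.writerow([card_id, f"no_{card_id:010d}", round(random.uniform(0, 1000), 2)])

write_test_rows("card.csv", 10)  # use 10_000_000 for the full test set
```

The file can then be loaded with LOAD DATA LOCAL INFILE 'card.csv' INTO TABLE card FIELDS TERMINATED BY ',';, which is far faster than row-by-row INSERTs at this scale.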
2. Create the corresponding table 'test' in HBase, with one column family, info:
hbase shell
create 'test','info'
3. Import the MySQL data into HBase with Sqoop:
sqoop import \
--connect jdbc:mysql://192.168.20.160/test \
--username root \
--password 111111 \
--table card \
--hbase-table 'test' \
--hbase-row-key card_id \
--column-family 'info' \
--hbase-create-table
Flag notes:
--hbase-table 'test': the target HBase table name
--hbase-row-key card_id: use the MySQL primary key card_id as the HBase rowkey
--column-family 'info': the target column family
--hbase-create-table: create the 'test' table in HBase automatically; drop this flag if the table already exists
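Conceptually, each MySQL row becomes one HBase row: the card_id value becomes the rowkey, and every other column becomes a cell under the info column family, qualified by the column name. A small sketch of that mapping (the column names are hypothetical, since the card schema is not shown):

```python
def row_to_cells(row, rowkey_col="card_id", family="info"):
    """Map one relational row (a dict) to (rowkey, column, value) HBase cells."""
    rowkey = str(row[rowkey_col])
    # Every non-key column lands under the single column family
    return [(rowkey, f"{family}:{col}", str(val))
            for col, val in row.items() if col != rowkey_col]

cells = row_to_cells({"card_id": 42, "card_no": "no_001", "balance": "3.50"})
print(cells)  # [('42', 'info:card_no', 'no_001'), ('42', 'info:balance', '3.50')]
```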
One error came up along the way:
tool.ImportTool: Import failed: java.io.IOException: java.sql.SQLException: Incorrect key file for table './test/card.MYI'; try to repair it
This means the MySQL table's index file is corrupted; running REPAIR TABLE fixes it (REPAIR TABLE applies to MyISAM tables, which matches the .MYI index file named in the error):
repair table card;
4. The import succeeds:
19/05/15 15:08:25 INFO mapreduce.Job: Job job_1557888023370_0004 completed successfully
19/05/15 15:08:26 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=747592
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=476
HDFS: Number of bytes written=0
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=1347125
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=1347125
Total vcore-milliseconds taken by all map tasks=2694250
Total megabyte-milliseconds taken by all map tasks=1379456000
Map-Reduce Framework
Map input records=10000000
Map output records=10000000
Input split bytes=476
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=3329
CPU time spent (ms)=1170200
Physical memory (bytes) snapshot=1686634496
Virtual memory (bytes) snapshot=11169898496
Total committed heap usage (bytes)=1444937728
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
19/05/15 15:08:26 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 486.5645 seconds (0 bytes/sec)
19/05/15 15:08:26 INFO mapreduce.ImportJobBase: Retrieved 10000000 records.
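The "Launched map tasks=4" counter reflects Sqoop's default parallelism: with no -m flag, Sqoop runs four mappers and partitions the rows on the split column (here the primary key card_id), after querying its min and max values. A simplified sketch of that integer-range split, for illustration only, not Sqoop's actual implementation:

```python
def integer_splits(min_id, max_id, num_mappers):
    """Return [lo, hi) id ranges covering [min_id, max_id], one per mapper."""
    size = (max_id - min_id + 1) // num_mappers
    splits, lo = [], min_id
    for i in range(num_mappers):
        # The last mapper absorbs any remainder from the integer division
        hi = max_id + 1 if i == num_mappers - 1 else lo + size
        splits.append((lo, hi))
        lo = hi
    return splits

# Assuming card_id runs 1..10,000,000 and the default of 4 mappers:
print(integer_splits(1, 10_000_000, 4))
```

Each mapper then issues its own SELECT with a WHERE clause over its id range, which is why an indexed, evenly distributed split column matters for import speed.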
The log confirms ten million records were imported into the HBase table test. (The "Transferred 0 bytes" figure counts HDFS file output, which an HBase import does not produce; the mappers write the records directly into HBase.)
5. Go into HBase and verify that the data was imported.
Use the row-counting MapReduce tool bundled in the HBase jar to count the rows in the test table:
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'test'
The job output shows the test table now holds ten million rows.
You can also run
scan 'test'
in the HBase shell to browse the data (with ten million rows, limit the output, e.g. scan 'test', {LIMIT => 10}).