Reposted from:
http://www.micmiu.com/bigdata/sqoop/sqoop-setup-and-demo/
http://blog.csdn.net/aaronhadoop/article/details/26713431
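I chose Sqoop1 because Sqoop2 currently has too many problems and could not be made to work reliably. After comparing the two, I settled on Sqoop1.
Sqoop1 is fairly simple to install and configure.
I. Installation and deployment
(1) Download: http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.5.2.tar.gz
Extract it to /opt/cdh5/sqoop
(2) Copy the MySQL JDBC driver mysql-connector-java-5.1.31-bin.jar into the sqoop/lib directory.
(3) Configure the environment variables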
#sqoop
export SQOOP_HOME=/opt/cdh5/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
(4) Copy sqoop/conf/sqoop-env-template.sh to sqoop-env.sh
and add the relevant settings:
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/cdh5/hadoop
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/cdh5/hadoop
#Set the path to where bin/hbase is available
export HBASE_HOME=/opt/cdh5/hbase
#Set the path to where bin/hive is available
export HIVE_HOME=/opt/cdh5/hive
#Set the path for where zookeeper config dir is
export ZOOCFGDIR=/opt/cdh5/zookeeper
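Before testing, it can help to confirm that each directory named in sqoop-env.sh actually exists. The following is a minimal sketch: check_home is a hypothetical helper written for illustration (it is not part of Sqoop), and the paths are the ones assumed in this article.

```shell
# check_home NAME DIR: report whether the directory behind a variable exists.
# (Hypothetical helper for illustration; not part of Sqoop.)
check_home() {
  if [ -d "$2" ]; then
    echo "ok: $1=$2"
  else
    echo "missing: $1=$2"
  fi
}

# Paths as configured in this article; adjust them to your installation.
check_home HADOOP_COMMON_HOME /opt/cdh5/hadoop
check_home HADOOP_MAPRED_HOME /opt/cdh5/hadoop
check_home HBASE_HOME /opt/cdh5/hbase
check_home HIVE_HOME /opt/cdh5/hive
check_home ZOOCFGDIR /opt/cdh5/zookeeper
```

Any line that starts with "missing:" points at a setting worth fixing before running Sqoop.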
(5) Test Sqoop
Running sqoop version prints warnings.
Edit $SQOOP_HOME/bin/configure-sqoop
and comment out the HCatalog and Accumulo checks (unless you plan to use HCatalog, Accumulo, or similar components on Hadoop):
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
# echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
# echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
# echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
# echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
Run sqoop version again.
You can also view the usage of a specific command:
$ sqoop import --help
$ sqoop help import
An example of sqoop import:
$ sqoop import --connect jdbc:mysql://192.168.56.121:3306/metastore --username hiveuser --password redhat --table TBLS
You can also pass a file with --options-file, which lets you reuse a set of common parameters:
$ sqoop --options-file /users/homer/work/import.txt --table TEST
The file /users/homer/work/import.txt contains:
import
--connect
jdbc:mysql://192.168.56.121:3306/metastore
--username
hiveuser
--password
redhat
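As a sketch of the options-file approach, the snippet below writes the shared connection settings into a file and prints the command that would reuse it. The temporary directory, and the use of -P on the command line in place of a password stored in the file, are assumptions made for illustration rather than part of the original setup.

```shell
# Hypothetical location for the options file; the article uses
# /users/homer/work/import.txt instead.
OPTS_DIR="$(mktemp -d)"
OPTS_FILE="$OPTS_DIR/import.txt"

# One option or value per line, as in the file shown above.
cat > "$OPTS_FILE" <<'EOF'
import
--connect
jdbc:mysql://192.168.56.121:3306/metastore
--username
hiveuser
EOF

# Assumption: -P prompts for the password, so it is not stored in the file.
SQOOP_CMD="sqoop --options-file $OPTS_FILE -P --table TEST"
echo "$SQOOP_CMD"
```

Options that differ between runs, such as --table, stay on the command line, while the connection settings live in the file.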
II. Using Sqoop
1. Test the connection:
(1) List the MySQL databases:
sqoop list-databases --connect jdbc:mysql://hadoop003:3306/ --username root -P
(2) List all tables in a database:
sqoop list-tables --connect jdbc:mysql://hadoop003:3306/EDW --username root -P
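2. Transferring data between MySQL and HDFS
(1) Importing from MySQL into HDFS
Import the MySQL table fin_cashier_order into HDFS. Before importing, check how many rows fin_cashier_order holds:
199 rows in total.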
sqoop import --connect jdbc:mysql://hadoop003:3306/ssa --username root --password ***** --table fin_cashier_order --target-dir /user/hadoop/databases/ssa/fin_cashier_order -m 4
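-m sets the number of parallel map tasks.
If --target-dir is omitted, the data goes to /user/<username>/<tablename> on HDFS by default.
If you run the command a second time, it reports that the directory already exists; delete the directory manually first.
Instead of --password you can pass -P and type the password at the prompt.
Note: if you use a newly created MySQL user, that user needs privileges from every node in the cluster, e.g.:
grant all on *.* to wkz@master identified by 'wkz';
grant all on *.* to wkz@slave identified by 'wkz';
grant all on *.* to wkz@slave2 identified by 'wkz';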
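With more than a handful of nodes, writing the grants by hand gets tedious. The loop below is a minimal sketch that prints one statement per host; gen_grants is a hypothetical helper, and the host names, user, and password are simply the ones from the example above.

```shell
# Host names taken from the example above; replace with your own node list.
NODES="master slave slave2"

# gen_grants USER PASSWORD: print one GRANT statement per cluster node.
# (Hypothetical helper; it only prints SQL and does not connect to MySQL.)
gen_grants() {
  for host in $NODES; do
    printf "grant all on *.* to %s@%s identified by '%s';\n" "$1" "$host" "$2"
  done
}

gen_grants wkz wkz
```

The printed statements can be reviewed and then pasted into a mysql session.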
After the MapReduce job finishes, check the result on HDFS.
Verify the imported data on HDFS:
hadoop fs -ls /user/hadoop/databases/ssa/fin_cashier_order
hadoop fs -cat /user/hadoop/databases/ssa/fin_cashier_order/part-m-00000
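PS: with the default settings, the import path on HDFS is /user/username/tablename/(files). For example, my current user is hadoop, so the actual path is /user/hadoop/demo_blog/(files).
(2) Exporting from HDFS to MySQL
To export HDFS data into a MySQL table, first create the table fin_cashier_order2 in MySQL; it is empty at this point.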
sqoop export --connect jdbc:mysql://hadoop003:3306/ssa --table fin_cashier_order2 --username root --password ****** --export-dir hdfs://master:9000/user/hadoop/databases/ssa/fin_cashier_order/
Note: hdfs://master:9000/user/hadoop/databases/ssa/fin_cashier_order/ can be shortened to /user/hadoop/databases/ssa/fin_cashier_order/, because hdfs://master:9000 is already set in the Hadoop configuration files.
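The relationship between the two forms can be sketched with plain shell string handling. DEFAULT_FS below is an assumption that mirrors the value this article says is configured; on a real cluster it could be read with hdfs getconf -confKey fs.defaultFS.

```shell
FULL_URI='hdfs://master:9000/user/hadoop/databases/ssa/fin_cashier_order/'
# Assumed to match the default file system configured for Hadoop in this article.
DEFAULT_FS='hdfs://master:9000'

# Strip the default file system prefix; what remains is the short path.
SHORT_PATH=${FULL_URI#"$DEFAULT_FS"}
echo "$SHORT_PATH"
```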
When the job finishes it prints:
16/02/25 16:23:39 INFO mapreduce.ExportJobBase: Transferred 70.4619 KB in 48.3235 seconds (1.4581 KB/sec)
16/02/25 16:23:39 INFO mapreduce.ExportJobBase: Exported 199 records.
The log shows that 199 records were exported.
Check the table fin_cashier_order2:
It contains exactly 199 rows as well.
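The comparison of the exported count with the expected row count can also be scripted. This is a minimal sketch that parses the "Exported N records." log line shown above; the variable names are hypothetical, and the expected value of 199 comes from the row count checked before the import.

```shell
# Sample line copied from the export log above; in practice it would be
# captured from the output of the sqoop export run.
LOG_LINE='16/02/25 16:23:39 INFO mapreduce.ExportJobBase: Exported 199 records.'
EXPECTED=199

# Pull the number out of "Exported N records."
EXPORTED="$(printf '%s\n' "$LOG_LINE" | sed -n 's/.*Exported \([0-9][0-9]*\) records\..*/\1/p')"

if [ "$EXPORTED" -eq "$EXPECTED" ]; then
  echo "row counts match: $EXPORTED"
else
  echo "row count mismatch: exported $EXPORTED, expected $EXPECTED"
fi
```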
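This completes the verification of data transfer between MySQL and HDFS in both directions with Sqoop.
You can also specify other parameters: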
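Parameter | Description |
---|---|
--append | Appends data to a dataset that already exists in HDFS. With this option Sqoop first imports into a temporary directory and then renames the files into the final directory, so that they do not clash with files already there. |
--as-avrodatafile | Imports data into Avro data files. |
--as-sequencefile | Imports data into SequenceFiles. |
--as-textfile | Imports data as plain text files; once they are generated, the results can be queried from Hive with SQL. |
--boundary-query | Boundary query: a SQL query run before the import whose result (normally the minimum and maximum of the split column) bounds the rows that are imported. For example, a query restricted to id=3 imports only that record. The queried fields must not be of a string type, otherwise an error is raised. |
--columns | Specifies the columns to import, as a comma-separated list. |
--direct | Direct import mode, which uses the relational database's own import and export tools. The official documentation says this is faster. |
--direct-split-size | Used on top of --direct: splits the imported stream by number of bytes. In particular, when importing from PostgreSQL in direct mode, a file that reaches the configured size can be split into several separate files. |
--inline-lob-limit | Sets the maximum size of an inline large object (LOB). |
-m, --num-mappers | Starts N map tasks to import in parallel. The default is 4; it is best not to set this higher than the number of nodes in the cluster. |
--query, -e | Imports the result of a query. You must also specify --target-dir and, when importing in parallel, --split-by. The query must have a WHERE clause, and the WHERE clause must contain $CONDITIONS. |
--split-by | Column used to split the work units, usually the primary key ID. |
--table | Name of the relational database table to read from. |
--delete-target-dir | Deletes the target directory if it already exists. |
--target-dir | Specifies the HDFS path. |
--warehouse-dir | Cannot be used together with --target-dir. Specifies the directory under which imported data is stored; suitable for HDFS imports, not for importing into a Hive directory. |
--where | Query condition applied when importing from the relational database. |
-z, --compress | Enables compression. By default data is not compressed; this option compresses it with gzip. Applies to SequenceFile, text, and Avro files. |
--compression-codec | Hadoop compression codec; the default is gzip. |
--null-string | String written for a null value in string columns. Optional; if not specified, the string null is used. |
--null-non-string | String written for a null value in non-string columns. Optional; if not specified, the string null is used. |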
Example:
$ sqoop import --connect jdbc:mysql://192.168.56.121:3306/metastore --username hiveuser --password redhat --table TBLS --columns "tbl_id,create_time" --where "tbl_id > 1" --target-dir /user/hive/result
Using a SQL statement
As the table above notes, a query passed with --query must include $CONDITIONS:
$ sqoop import --connect jdbc:mysql://192.168.56.121:3306/metastore --username hiveuser --password redhat --query 'SELECT * from TBLS where tbl_id >2 and $CONDITIONS ' --split-by tbl_id -m 4 --target-dir /user/hive/result
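Note: Sqoop can import the result set of an arbitrary query. Instead of --table, --columns, and --where, pass a free-form SQL statement with --query. You must specify --target-dir, you must specify a --split-by column when importing in parallel, and the query must have a WHERE clause that includes $CONDITIONS, which each Sqoop process replaces with its own unique condition expression.
The command above uses -m 4 to set the number of parallel map tasks.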
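To make the role of $CONDITIONS more concrete, the sketch below splits an integer key range into contiguous sub-ranges and shows the query each map task would effectively run. It is a simplified illustration, not Sqoop's actual splitting code: split_conditions is a hypothetical helper, and the minimum 3 and maximum 42 are assumed values for tbl_id.

```shell
# split_conditions COLUMN MIN MAX NUM_SPLITS
# Print one range predicate per split, covering MIN..MAX without overlap.
# (Hypothetical helper; Sqoop computes its own boundaries internally.)
split_conditions() {
  col=$1; min=$2; max=$3; n=$4
  step=$(( (max - min + n) / n ))   # ceiling of (max - min + 1) / n
  lo=$min
  while [ "$lo" -le "$max" ]; do
    hi=$(( lo + step ))
    if [ "$hi" -gt "$max" ]; then
      echo "$col >= $lo AND $col <= $max"   # last range includes the maximum
    else
      echo "$col >= $lo AND $col < $hi"
    fi
    lo=$hi
  done
}

# Each predicate takes the place of $CONDITIONS in one map task's query.
split_conditions tbl_id 3 42 4 | while read -r cond; do
  printf 'SELECT * from TBLS where tbl_id > 2 and (%s)\n' "$cond"
done
```

With these assumed bounds and four splits, the ranges start at 3, 13, 23, and 33, so every row is read by exactly one task.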
Using direct mode:
$ sqoop import --connect jdbc:mysql://192.168.56.121:3306/metastore --username hiveuser --password redhat --table TBLS --delete-target-dir --direct --default-character-set UTF-8 --target-dir /user/hive/result
Specifying the output file format:
$ sqoop import --connect jdbc:mysql://192.168.56.121:3306/metastore --username hiveuser --password redhat --table TBLS --fields-terminated-by "\t" --lines-terminated-by "\n" --delete-target-dir --target-dir /user/hive/result
Now inspect the data in HDFS (check whether the delimiter is a tab):
$ hadoop fs -ls result
Found 5 items
-rw-r--r-- 3 root hadoop 0 2014-08-04 16:07 result/_SUCCESS
-rw-r--r-- 3 root hadoop 69 2014-08-04 16:07 result/part-m-00000
-rw-r--r-- 3 root hadoop 0 2014-08-04 16:07 result/part-m-00001
-rw-r--r-- 3 root hadoop 142 2014-08-04 16:07 result/part-m-00002
-rw-r--r-- 3 root hadoop 62 2014-08-04 16:07 result/part-m-00003
$ hadoop fs -cat result/part-m-00000
34 1406784308 8 0 root 0 45 test1 EXTERNAL_TABLE null null null
$ hadoop fs -cat result/part-m-00002
40 1406797005 9 0 root 0 52 test2 EXTERNAL_TABLE null null null
42 1407122307 7 0 root 0 59 test3 EXTERNAL_TABLE null null null
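Tabs and spaces look the same on screen, so counting fields with awk is a more reliable check. The sketch below rebuilds the first sample row with explicit tabs (an assumption about what the file contains); on a cluster the output of hadoop fs -cat would be piped into the same awk command instead.

```shell
# Sample row rebuilt with explicit tab separators for illustration.
ROW="$(printf '34\t1406784308\t8\t0\troot\t0\t45\ttest1\tEXTERNAL_TABLE\tnull\tnull\tnull')"

# With a tab as the field separator, NF is the number of columns in the row.
FIELDS="$(printf '%s\n' "$ROW" | awk -F '\t' '{ print NF }')"
echo "tab-separated fields: $FIELDS"
```

If the delimiter were not a tab, awk would report a single field for the whole line.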
Specifying the null string:
$ sqoop import --connect jdbc:mysql://192.168.56.121:3306/metastore --username hiveuser --password redhat --table TBLS --fields-terminated-by "\t" --lines-terminated-by "\n" --delete-target-dir --null-string '\\N' --null-non-string '\\N' --target-dir /user/hive/result
To enable compression:
$ sqoop import --connect jdbc:mysql://192.168.56.121:3306/metastore --username hiveuser --password redhat --table TBLS --fields-terminated-by "\t" --lines-terminated-by "\n" --delete-target-dir --null-string '\\N' --null-non-string '\\N' --compression-codec "com.hadoop.compression.lzo.LzopCodec" --target-dir /user/hive/result
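Appendix: the optional file-format parameters are listed in the table below.
Parameter | Description |
---|---|
--enclosed-by | Wraps every field value in the given character, such as a double quote. Sample output: "3","jimsss","dd@dd.com" |
--escaped-by | Escapes double quotes in field values: a value such as "测试" is stored in HDFS with each double quote preceded by the escape character. Has no effect on single quotes. |
--fields-terminated-by | Sets the character that terminates each field. The default is a comma, but another character such as a period can be used. |
--lines-terminated-by | Sets the separator between records. The default is a newline, but you can set your own, for example # to separate records with a hash sign. |
--mysql-delimiters | Uses MySQL's default delimiter set: fields are separated by commas, lines by newlines (\n), the escape character is a backslash (\), and field values are enclosed in single quotes ('). |
--optionally-enclosed-by | --enclosed-by forces the given character around every field value, whereas --optionally-enclosed-by adds it only to field values that contain double or single quotes, hence "optionally". |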
Errors encountered:
Error 1:
ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: DataStreamer Exception:
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:796)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.initDataStreaming(DFSOutputStream.java:581)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:669)
Solution:
net.ipv4.ip_local_port_range = 1024 65000
Adjust the system parameters:
echo "100000" > /proc/sys/kernel/threads-max
echo "100000" > /proc/sys/kernel/pid_max   # default 32768
echo "200000" > /proc/sys/vm/max_map_count   # default 65530
Edit /etc/security/limits.conf:
* - nproc 999999
* - nofile 999999
PS: nproc sets the max user processes limit and nofile sets the open files limit. Before Linux kernel 2.6.25, a macro fixed the maximum for nofile at 1024*1024 (1,048,576, roughly one million); from 2.6.25 onward the maximum can be set through /proc/sys/fs/nr_open. In any case, 999999 is enough.
Many people find that they can start only about 32,000 threads and no more; that ceiling comes from pid_max = 32768.
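Before changing anything, it is worth recording the current values. The following is a minimal sketch assuming a Linux host where these /proc files are readable; read_limit is a hypothetical helper that falls back to "unknown" when a file is missing.

```shell
# read_limit FILE: print the value stored in a /proc-style file, or "unknown".
# (Hypothetical helper for illustration.)
read_limit() {
  if [ -r "$1" ]; then
    cat "$1"
  else
    echo unknown
  fi
}

for f in /proc/sys/kernel/threads-max /proc/sys/kernel/pid_max /proc/sys/vm/max_map_count; do
  echo "$f = $(read_limit "$f")"
done

# Open files limit for the current shell (the nofile setting in limits.conf).
echo "open files (ulimit -n) = $(ulimit -n)"
```

Running it again after the changes, in a fresh login session, shows whether the new limits took effect.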
Increase the number of map tasks with sqoop's -m option. With more map tasks, each child process uses less heap space, which avoids exceeding the Java heap size configured for Hadoop.
sqoop ... -m <number of maps>
Error 2:
Caused by: java.lang.RuntimeException: java.sql.SQLException: Access denied for user 'root'@'hadoop003' (using password: YES)
at org.apache.sqoop.mapreduce.db.DBInputFormat.getConnection(DBInputFormat.java:220)
at org.apache.sqoop.mapreduce.db.DBInputFormat.setConf(DBInputFormat.java:165)
... 9 more
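The job runs normally and the result is correct, which shows that the database connection itself works; connecting with the mysql client directly also works. Yet this exception is still reported.
Query the user information in the database: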
![](https://i-blog.csdnimg.cn/blog_migrate/007bf33750244b5ab995af570f1569db.png)
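The JDBC connection uses the host name hadoop003. Although root has been granted remote access from all hosts ("%"), the account for host hadoop003 may not have remote access enabled, so grant it explicitly:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'hadoop003' IDENTIFIED BY '********' WITH GRANT OPTION;
Run the import command again. It succeeds and the error above no longer appears.
Importing from MySQL into Hive reports the following error:
ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
16/02/26 14:43:47 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
The data has already been imported into HDFS; the failure happens when it is moved from HDFS into Hive. The message suggests that HIVE_CONF_DIR is misconfigured.
However, adding HIVE_CONF_DIR to sqoop-env.sh, hadoop-env.sh, and hive-env.sh made no difference.
The fix that finally worked:
Add the following line to /etc/profile:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*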
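Because /etc/profile may be sourced more than once, a slightly more defensive variant can be useful. This is only a sketch: the fallback HIVE_HOME path is the one assumed in this article, and the logic simply avoids adding the same entry twice or leaving a leading colon when HADOOP_CLASSPATH starts out empty.

```shell
# Assumed install path from this article; used only if HIVE_HOME is not set.
: "${HIVE_HOME:=/opt/cdh5/hive}"

# Append Hive's jars once, without a leading ':' when the variable is empty.
case ":${HADOOP_CLASSPATH:-}:" in
  *":$HIVE_HOME/lib/*:"*) ;;   # entry already present, nothing to do
  *) HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$HIVE_HOME/lib/*" ;;
esac
export HADOOP_CLASSPATH
echo "$HADOOP_CLASSPATH"
```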
Sqoop 1.4.7 reportedly fixes this problem; I look forward to that release.
Importing data into HBase
The following command imports the table demo_blog into HBase, naming the HBase table demo_sqoop2hbase:
sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog --hbase-table demo_sqoop2hbase --hbase-create-table --hbase-row-key id --column-family url
Execution output:
$ sqoop import --connect jdbc:mysql://192.168.6.77/test --username root --password micmiu --table demo_blog --hbase-table demo_sqoop2hbase --hbase-create-table --hbase-row-key id --column-family url
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
14/04/09 16:23:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/04/09 16:23:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/04/09 16:23:38 INFO tool.CodeGenTool: Beginning code generation
14/04/09 16:23:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `demo_blog` AS t LIMIT 1
14/04/09 16:23:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `demo_blog` AS t LIMIT 1
14/04/09 16:23:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hadoop/compile/85408c854ee8fba75bbb2458e5e25093/demo_blog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/04/09 16:23:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/85408c854ee8fba75bbb2458e5e25093/demo_blog.jar
14/04/09 16:23:40 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/04/09 16:23:40 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/04/09 16:23:40 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/04/09 16:23:40 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/04/09 16:23:40 INFO mapreduce.ImportJobBase: Beginning import of demo_blog
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-0.98.0-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/04/09 16:23:40 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/04/09 16:23:40 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:host.name=Master.Hadoop
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_20
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:java.home=/java/jdk1.6.0_20/jre
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/local/hadoop/etc/hadoop: .......
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/hadoop/lib/native
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-71.el6.x86_64
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=Slave6.Hadoop:2181,Slave5.Hadoop:2181,Slave7.Hadoop:2181 sessionTimeout=90000 watcher=hconnection-0x57c8b24d, quorum=Slave6.Hadoop:2181,Slave5.Hadoop:2181,Slave7.Hadoop:2181, baseZNode=/hbase
14/04/09 16:23:41 INFO zookeeper.ClientCnxn: Opening socket connection to server Slave5.Hadoop/192.168.8.205:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
14/04/09 16:23:41 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x57c8b24d connecting to ZooKeeper ensemble=Slave6.Hadoop:2181,Slave5.Hadoop:2181,Slave7.Hadoop:2181
14/04/09 16:23:41 INFO zookeeper.ClientCnxn: Socket connection established to Slave5.Hadoop/192.168.8.205:2181, initiating session
14/04/09 16:23:41 INFO zookeeper.ClientCnxn: Session establishment complete on server Slave5.Hadoop/192.168.8.205:2181, sessionid = 0x453fecb6c50009, negotiated timeout = 90000
14/04/09 16:23:41 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=Slave6.Hadoop:2181,Slave5.Hadoop:2181,Slave7.Hadoop:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x57c8b24d, quorum=Slave6.Hadoop:2181,Slave5.Hadoop:2181,Slave7.Hadoop:2181, baseZNode=/hbase
14/04/09 16:23:41 INFO zookeeper.ClientCnxn: Opening socket connection to server Slave7.Hadoop/192.168.8.207:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
14/04/09 16:23:41 INFO zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x57c8b24d connecting to ZooKeeper ensemble=Slave6.Hadoop:2181,Slave5.Hadoop:2181,Slave7.Hadoop:2181
14/04/09 16:23:41 INFO zookeeper.ClientCnxn: Socket connection established to Slave7.Hadoop/192.168.8.207:2181, initiating session
14/04/09 16:23:41 INFO zookeeper.ClientCnxn: Session establishment complete on server Slave7.Hadoop/192.168.8.207:2181, sessionid = 0x2453fecb6f50008, negotiated timeout = 90000
14/04/09 16:23:41 INFO zookeeper.ZooKeeper: Session: 0x2453fecb6f50008 closed
14/04/09 16:23:41 INFO zookeeper.ClientCnxn: EventThread shut down
14/04/09 16:23:41 INFO mapreduce.HBaseImportJob: Creating missing HBase table demo_sqoop2hbase
14/04/09 16:23:42 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=Slave6.Hadoop:2181,Slave5.Hadoop:2181,Slave7.Hadoop:2181 sessionTimeout=90000 watcher=catalogtracker-on-hconnection-0x57c8b24d, quorum=Slave6.Hadoop:2181,Slave5.Hadoop:2181,Slave7.Hadoop:2181, baseZNode=/hbase
14/04/09 16:23:42 INFO zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x57c8b24d connecting to ZooKeeper ensemble=Slave6.Hadoop:2181,Slave5.Hadoop:2181,Slave7.Hadoop:2181
14/04/09 16:23:42 INFO zookeeper.ClientCnxn: Opening socket connection to server Slave7.Hadoop/192.168.8.207:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
14/04/09 16:23:42 INFO zookeeper.ClientCnxn: Socket connection established to Slave7.Hadoop/192.168.8.207:2181, initiating session
14/04/09 16:23:42 INFO zookeeper.ClientCnxn: Session establishment complete on server Slave7.Hadoop/192.168.8.207:2181, sessionid = 0x2453fecb6f50009, negotiated timeout = 90000
14/04/09 16:23:42 INFO zookeeper.ZooKeeper: Session: 0x2453fecb6f50009 closed
14/04/09 16:23:42 INFO zookeeper.ClientCnxn: EventThread shut down
14/04/09 16:23:42 INFO client.RMProxy: Connecting to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/04/09 16:23:47 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `demo_blog`
14/04/09 16:23:47 INFO mapreduce.JobSubmitter: number of splits:3
14/04/09 16:23:47 INFO Configuration.deprecation: mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files
14/04/09 16:23:47 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/04/09 16:23:47 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/04/09 16:23:47 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/04/09 16:23:47 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/04/09 16:23:47 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/04/09 16:23:47 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/04/09 16:23:47 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/04/09 16:23:47 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/04/09 16:23:47 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/04/09 16:23:47 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/04/09 16:23:47 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/04/09 16:23:47 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/04/09 16:23:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1396936838233_0005
14/04/09 16:23:47 INFO impl.YarnClientImpl: Submitted application application_1396936838233_0005 to ResourceManager at Master.Hadoop/192.168.6.77:8032
14/04/09 16:23:47 INFO mapreduce.Job: The url to track the job: http://Master.Hadoop:8088/proxy/application_1396936838233_0005/
14/04/09 16:23:47 INFO mapreduce.Job: Running job: job_1396936838233_0005
14/04/09 16:23:55 INFO mapreduce.Job: Job job_1396936838233_0005 running in uber mode : false
14/04/09 16:23:55 INFO mapreduce.Job: map 0% reduce 0%
14/04/09 16:24:05 INFO mapreduce.Job: map 33% reduce 0%
14/04/09 16:24:12 INFO mapreduce.Job: map 100% reduce 0%
14/04/09 16:24:12 INFO mapreduce.Job: Job job_1396936838233_0005 completed successfully
14/04/09 16:24:12 INFO mapreduce.Job: Counters: 27
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=354636
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=295
HDFS: Number of bytes written=0
HDFS: Number of read operations=3
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=3
Other local map tasks=3
Total time spent by all maps in occupied slots (ms)=35297
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=295
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=381
CPU time spent (ms)=11050
Physical memory (bytes) snapshot=543367168
Virtual memory (bytes) snapshot=3918925824
Total committed heap usage (bytes)=156958720
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
14/04/09 16:24:12 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 29.7126 seconds (0 bytes/sec)
14/04/09 16:24:12 INFO mapreduce.ImportJobBase: Retrieved 3 records.
Verify the imported data in the hbase shell:
hbase(main):009:0> list
TABLE
demo_sqoop2hbase
table_02
table_03
test_table
xyz
5 row(s) in 0.0310 seconds
=> ["demo_sqoop2hbase", "table_02", "table_03", "test_table", "xyz"]
hbase(main):010:0> scan "demo_sqoop2hbase"
ROW COLUMN+CELL
1 column=url:blog, timestamp=1397031850700, value=micmiu.com
2 column=url:blog, timestamp=1397031844106, value=ctosun.com
3 column=url:blog, timestamp=1397031849888, value=baby.micmiu.com
3 row(s) in 0.0730 seconds
hbase(main):011:0> describe "demo_sqoop2hbase"
DESCRIPTION ENABLED
'demo_sqoop2hbase', {NAME => 'url', DATA_BLOCK_ENCODING => 'NONE', BL true
OOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRE
SSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELET
ED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOC
KCACHE => 'true'}
1 row(s) in 0.0580 seconds
hbase(main):012:0>
The import is verified as successful.