Notes on using Sqoop



So what is Sqoop? Sqoop is simply a tool that wraps up copying data back and forth between an RDB and HDFS. Without it we could of course write a pile of scripts and run them by hand, but Sqoop makes the job much more convenient, so...


Installation steps:

1. Download the distribution from http://archive.cloudera.com/cdh/3/ ; these days you would generally grab cdh3u3.

2. Download the JDBC driver jar for your RDB; I tested with the MySQL driver.

3. After unpacking, put the MySQL driver jar into the lib directory under the Sqoop home.

4. Configure environment variables. This assumes Hadoop, Hive, HBase, and ZooKeeper are already installed and configured correctly; you only need to set SQOOP_HOME, because I found that bin/configure-sqoop already takes care of the following:

export SQOOP_CLASSPATH
export SQOOP_CONF_DIR
export SQOOP_JAR_DIR
export HADOOP_CLASSPATH
export HADOOP_HOME
export HBASE_HOME

SQOOP_HOME itself is referenced directly, so it must be set; otherwise the jars will not be found.

If HBase or ZooKeeper is not installed, comment out the corresponding checks in configure-sqoop.
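A minimal sketch of the environment setup described above (the install paths here are examples only; adjust them to wherever you actually unpacked each component):

```shell
# Set SQOOP_HOME yourself; bin/configure-sqoop derives the rest from it.
export SQOOP_HOME=/usr/local/sqoop
# These are normally already set if Hadoop/HBase are configured correctly:
export HADOOP_HOME=/usr/local/hadoop
export HBASE_HOME=/usr/local/hbase
# Put the sqoop launcher on the PATH:
export PATH=$PATH:$SQOOP_HOME/bin
echo "$SQOOP_HOME"
```

Putting these exports in the shell profile (e.g. ~/.bashrc) of the user that runs Sqoop keeps them consistent across sessions.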


Testing:

RDB to HDFS:

[hadoop@master cuirong]$ sqoop import --connect jdbc:mysql://master:3306/sqoop --username sqoop --password sqoop123 --table t_etl_cmd_relation -m 1 
12/03/01 12:12:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/03/01 12:12:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/03/01 12:12:56 INFO tool.CodeGenTool: Beginning code generation
12/03/01 12:12:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_etl_cmd_relation` AS t LIMIT 1
12/03/01 12:12:56 INFO orm.CompilationManager: HADOOP_HOME is /usr/local/hadoop
12/03/01 12:12:58 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/9ced3ca3be0a259acc993a93328c40c8/t_etl_cmd_relation.jar
12/03/01 12:12:58 WARN manager.MySQLManager: It looks like you are importing from mysql.
12/03/01 12:12:58 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
12/03/01 12:12:58 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
12/03/01 12:12:58 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
12/03/01 12:12:58 INFO mapreduce.ImportJobBase: Beginning import of t_etl_cmd_relation
12/03/01 12:13:00 INFO mapred.JobClient: Running job: job_201202212041_15467
12/03/01 12:13:01 INFO mapred.JobClient:  map 0% reduce 0%
12/03/01 12:13:06 INFO mapred.JobClient:  map 100% reduce 0%
12/03/01 12:13:06 INFO mapred.JobClient: Job complete: job_201202212041_15467
12/03/01 12:13:06 INFO mapred.JobClient: Counters: 16
12/03/01 12:13:06 INFO mapred.JobClient:   Job Counters 
12/03/01 12:13:06 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=4925
12/03/01 12:13:06 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/03/01 12:13:06 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/03/01 12:13:06 INFO mapred.JobClient:     Launched map tasks=1
12/03/01 12:13:06 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/03/01 12:13:06 INFO mapred.JobClient:   FileSystemCounters
12/03/01 12:13:06 INFO mapred.JobClient:     HDFS_BYTES_READ=87
12/03/01 12:13:06 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56161
12/03/01 12:13:06 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1131
12/03/01 12:13:06 INFO mapred.JobClient:   Map-Reduce Framework
12/03/01 12:13:06 INFO mapred.JobClient:     Map input records=16
12/03/01 12:13:06 INFO mapred.JobClient:     Physical memory (bytes) snapshot=130781184
12/03/01 12:13:06 INFO mapred.JobClient:     Spilled Records=0
12/03/01 12:13:06 INFO mapred.JobClient:     CPU time spent (ms)=640
12/03/01 12:13:06 INFO mapred.JobClient:     Total committed heap usage (bytes)=505348096
12/03/01 12:13:06 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1533403136
12/03/01 12:13:06 INFO mapred.JobClient:     Map output records=16
12/03/01 12:13:06 INFO mapred.JobClient:     SPLIT_RAW_BYTES=87
12/03/01 12:13:06 INFO mapreduce.ImportJobBase: Transferred 1.1045 KB in 8.0517 seconds (140.4665 bytes/sec)
12/03/01 12:13:06 INFO mapreduce.ImportJobBase: Retrieved 16 records.

The imported data lands under the hadoop user's HDFS home directory by default; a parameter can be used to specify a different directory.
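For example, a run that writes the output under /tmp/cui1 instead of the home directory might look like this (connection details reused from the session above; I am assuming the directory-selection parameter meant here is `--target-dir`):

```shell
# Sketch: import into an explicit HDFS directory rather than the default
# /user/hadoop/<table> location. Adjust host, credentials, and paths.
sqoop import \
  --connect jdbc:mysql://master:3306/sqoop \
  --username sqoop -P \
  --table t_etl_cmd_relation \
  --target-dir /tmp/cui1 \
  -m 1
```

Note the `-P` flag, which prompts for the password interactively instead of exposing it on the command line, exactly as the warning in the log above suggests.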

You can also import directly into Hive; see the official documentation for the details.
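A hedged sketch of what a direct Hive import could look like, assuming Hive is installed and on the PATH (the table name is just reused from the example above):

```shell
# Sketch: import a MySQL table straight into a Hive table.
# --hive-import loads the data into Hive's warehouse after the HDFS copy;
# --hive-table names the destination table.
sqoop import \
  --connect jdbc:mysql://master:3306/sqoop \
  --username sqoop -P \
  --table t_etl_cmd_relation \
  --hive-import \
  --hive-table t_etl_cmd_relation \
  -m 1
```

Sqoop generates the Hive DDL itself in this mode, so the Hive table does not need to exist beforehand.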

HDFS to RDB:


[hadoop@master cuirong]$ sqoop export --connect jdbc:mysql://master:3306/sqoop --username sqoop --password sqoop123 --table t_etl_cmd_relation --export-dir /tmp/cui1/part-m-00000
12/03/01 12:52:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/03/01 12:52:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/03/01 12:52:10 INFO tool.CodeGenTool: Beginning code generation
12/03/01 12:52:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `t_etl_cmd_relation` AS t LIMIT 1
12/03/01 12:52:10 INFO orm.CompilationManager: HADOOP_HOME is /usr/local/hadoop
12/03/01 12:52:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/44fdcaa5fbd99233f89008548b08b370/t_etl_cmd_relation.jar
12/03/01 12:52:11 INFO mapreduce.ExportJobBase: Beginning export of t_etl_cmd_relation
12/03/01 12:52:13 INFO input.FileInputFormat: Total input paths to process : 1
12/03/01 12:52:13 INFO input.FileInputFormat: Total input paths to process : 1
12/03/01 12:52:13 INFO mapred.JobClient: Running job: job_201202212041_15480
12/03/01 12:52:14 INFO mapred.JobClient:  map 0% reduce 0%
12/03/01 12:52:19 INFO mapred.JobClient:  map 100% reduce 0%
12/03/01 12:52:19 INFO mapred.JobClient: Job complete: job_201202212041_15480
12/03/01 12:52:19 INFO mapred.JobClient: Counters: 16
12/03/01 12:52:19 INFO mapred.JobClient:   Job Counters 
12/03/01 12:52:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=4093
12/03/01 12:52:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/03/01 12:52:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/03/01 12:52:19 INFO mapred.JobClient:     Rack-local map tasks=1
12/03/01 12:52:19 INFO mapred.JobClient:     Launched map tasks=1
12/03/01 12:52:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/03/01 12:52:19 INFO mapred.JobClient:   FileSystemCounters
12/03/01 12:52:19 INFO mapred.JobClient:     HDFS_BYTES_READ=1239
12/03/01 12:52:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=55964
12/03/01 12:52:19 INFO mapred.JobClient:   Map-Reduce Framework
12/03/01 12:52:19 INFO mapred.JobClient:     Map input records=16
12/03/01 12:52:19 INFO mapred.JobClient:     Physical memory (bytes) snapshot=123195392
12/03/01 12:52:19 INFO mapred.JobClient:     Spilled Records=0
12/03/01 12:52:19 INFO mapred.JobClient:     CPU time spent (ms)=680
12/03/01 12:52:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=505348096
12/03/01 12:52:19 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1457479680
12/03/01 12:52:19 INFO mapred.JobClient:     Map output records=16
12/03/01 12:52:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=114
12/03/01 12:52:19 INFO mapreduce.ExportJobBase: Transferred 1.21 KB in 7.2897 seconds (169.9657 bytes/sec)
12/03/01 12:52:19 INFO mapreduce.ExportJobBase: Exported 16 records.

Do pay attention to the field delimiter of the HDFS file: the default delimiter assumed on the MySQL side is a comma, so if the HDFS file uses anything else you must pass it explicitly, e.g. --input-fields-terminated-by '\001'
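The \001 case comes up a lot because Hive writes its files with \001 (Ctrl-A) as the field delimiter by default. A quick local demonstration of what such a line looks like (the file path is just an example):

```shell
# A sample line laid out the way a Hive-produced file would be,
# with fields separated by the \001 (Ctrl-A) byte:
printf 'id1\001val1\001val2\n' > /tmp/sample_row
# cat -v makes the non-printing \001 bytes visible as ^A:
cat -v /tmp/sample_row
# When exporting such a file back to MySQL, declare the delimiter, e.g.:
# sqoop export ... --export-dir /tmp/cui1 --input-fields-terminated-by '\001'
```

If the delimiter is not declared, the export maps fail to parse the rows and the job aborts with parse errors, so this is worth checking before blaming the database side.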
