A Super-Simple Sqoop Tutorial

Copyright notice: this is an original post by the author; reposting is welcome. https://blog.csdn.net/chengyuqiang/article/details/80401750

1. Download

http://sqoop.apache.org/
https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/

Note that starting with 1.4.5, the sqoop-1.4.x.tar.gz source packages no longer ship sqoop-1.4.x.jar under lib/, so download the prebuilt binary package sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz instead.

[root@node1 ~]# wget http://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[root@node1 ~]# tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

Move the extracted directory to /opt:

[root@node1 ~]# mv sqoop-1.4.7.bin__hadoop-2.6.0 /opt/sqoop-1.4.7
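Optionally, put Sqoop on your PATH so you can run it from any directory. A minimal sketch, assuming the /opt/sqoop-1.4.7 layout above (append these lines to /etc/profile or ~/.bash_profile and re-source it):

```shell
# Assumes Sqoop was moved to /opt/sqoop-1.4.7 as shown above
export SQOOP_HOME=/opt/sqoop-1.4.7
export PATH=$PATH:$SQOOP_HOME/bin
```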

2. Configuration

Change into the installation directory (it now lives under /opt):

[root@node1 ~]# cd /opt/sqoop-1.4.7
[root@node1 sqoop-1.4.7]# ls
bin        CHANGELOG.txt  conf  ivy      lib          NOTICE.txt   README.txt       sqoop-patch-review.py  src
build.xml  COMPILING.txt  docs  ivy.xml  LICENSE.txt  pom-old.xml  sqoop-1.4.7.jar  sqoop-test-1.4.7.jar   testdata
[root@node1 sqoop-1.4.7]#

Edit the configure-sqoop script and comment out the environment checks for components you have not installed (HCatalog, Accumulo, ZooKeeper), so Sqoop stops printing their warnings at startup:

[root@node1 sqoop-1.4.7]# vi bin/configure-sqoop
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
#  echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
#  echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi

#if [ ! -d "${ACCUMULO_HOME}" ]; then
#  echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
#  echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
#fi

3. Testing against MySQL

(1) Copy the JDBC driver jar into Sqoop's lib directory

[root@node1 mysql-connector-java-5.1.46]# cp mysql-connector-java-5.1.46-bin.jar /opt/sqoop-1.4.7/lib/

(2) Test the connection

[root@node1 sqoop-1.4.7]# bin/sqoop list-databases --connect jdbc:mysql://node3:3306/ --username root --password 123456
18/04/03 10:34:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/04/03 10:34:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/04/03 10:34:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hive
mysql
performance_schema
test
webMagic
[root@node1 sqoop-1.4.7]#
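As the WARN line above notes, putting the password on the command line is insecure: it ends up in your shell history and is visible in ps output. A safer pattern is --password-file; a sketch, with /tmp/sqoop.pwd as an example path:

```shell
# Store the password in a file readable only by its owner, then pass it to
# Sqoop with:  --password-file file:///tmp/sqoop.pwd  (instead of --password)
echo -n '123456' > /tmp/sqoop.pwd
chmod 400 /tmp/sqoop.pwd
```

Alternatively, use -P, which prompts for the password interactively.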

(3) Import a MySQL table into HDFS

[root@node1 sqoop-1.4.7]# bin/sqoop import --connect jdbc:mysql://node3:3306/webMagic --table csdnblog --username root --password 123456 -m 1
18/04/03 10:50:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/04/03 10:50:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/04/03 10:50:12 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/04/03 10:50:12 INFO tool.CodeGenTool: Beginning code generation
18/04/03 10:50:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `csdnblog` AS t LIMIT 1
18/04/03 10:50:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `csdnblog` AS t LIMIT 1
18/04/03 10:50:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.3
Note: /tmp/sqoop-root/compile/3e1f797b0f1cc2b27c58837b45ea81c1/csdnblog.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/04/03 10:50:19 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/3e1f797b0f1cc2b27c58837b45ea81c1/csdnblog.jar
18/04/03 10:50:19 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/04/03 10:50:19 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/04/03 10:50:19 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/04/03 10:50:19 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/04/03 10:50:19 INFO mapreduce.ImportJobBase: Beginning import of csdnblog
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/04/03 10:50:20 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/04/03 10:50:23 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/04/03 10:50:36 INFO db.DBInputFormat: Using read commited transaction isolation
18/04/03 10:50:37 INFO mapreduce.JobSubmitter: number of splits:1
18/04/03 10:50:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1522766299245_0001
18/04/03 10:50:41 INFO impl.YarnClientImpl: Submitted application application_1522766299245_0001
18/04/03 10:50:42 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1522766299245_0001/
18/04/03 10:50:42 INFO mapreduce.Job: Running job: job_1522766299245_0001
18/04/03 10:51:04 INFO mapreduce.Job: Job job_1522766299245_0001 running in uber mode : false
18/04/03 10:51:04 INFO mapreduce.Job:  map 0% reduce 0%
18/04/03 10:51:48 INFO mapreduce.Job:  map 100% reduce 0%
18/04/03 10:51:52 INFO mapreduce.Job: Job job_1522766299245_0001 completed successfully
18/04/03 10:51:52 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=138180
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=30206
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=40637
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=40637
        Total vcore-milliseconds taken by all map tasks=40637
        Total megabyte-milliseconds taken by all map tasks=41612288
    Map-Reduce Framework
        Map input records=182
        Map output records=182
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=169
        CPU time spent (ms)=2190
        Physical memory (bytes) snapshot=140386304
        Virtual memory (bytes) snapshot=2152366080
        Total committed heap usage (bytes)=44564480
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=30206
18/04/03 10:51:52 INFO mapreduce.ImportJobBase: Transferred 29.498 KB in 89.0977 seconds (339.0213 bytes/sec)
18/04/03 10:51:52 INFO mapreduce.ImportJobBase: Retrieved 182 records.
[root@node1 sqoop-1.4.7]# 

(4) Check the newly imported data on HDFS

[root@node1 sqoop-1.4.7]# hdfs dfs -ls /user/root
Found 3 items
drwxr-xr-x   - root supergroup          0 2017-09-22 10:31 /user/root/.sparkStaging
drwxr-xr-x   - root supergroup          0 2018-04-03 10:51 /user/root/csdnblog
drwxr-xr-x   - root supergroup          0 2017-10-05 09:40 /user/root/input
[root@node1 sqoop-1.4.7]# hdfs dfs -ls /user/root/csdnblog
Found 2 items
-rw-r--r--   3 root supergroup          0 2018-04-03 10:51 /user/root/csdnblog/_SUCCESS
-rw-r--r--   3 root supergroup      30206 2018-04-03 10:51 /user/root/csdnblog/part-m-00000
[root@node1 sqoop-1.4.7]#
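The _SUCCESS marker shows the job finished, and part-m-00000 holds the imported rows (comma-delimited text by default, one output file per map task, hence a single file with -m 1). To eyeball the data, a sketch assuming the paths from the import above:

```shell
# Print the first few imported rows
hdfs dfs -cat /user/root/csdnblog/part-m-00000 | head -n 5
# Or pull a local copy to open in an editor
hdfs dfs -get /user/root/csdnblog/part-m-00000 /tmp/csdnblog.txt
```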

4. A possible error

On some setups the default MySQL connection manager fails during code generation with the streaming-result-set error shown below:

[root@node1 sqoop-1.4.7]# bin/sqoop import --connect jdbc:mysql://10.17.12.160:3306/test --table users --username root --password 123456 -m 1
Warning: /opt/sqoop-1.4.7/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
18/05/22 13:32:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/05/22 13:32:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/05/22 13:32:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/05/22 13:32:02 INFO tool.CodeGenTool: Beginning code generation
18/05/22 13:32:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `users` AS t LIMIT 1
18/05/22 13:32:03 ERROR manager.SqlManager: Error reading from database: java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3f197a46 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
java.sql.SQLException: Streaming result set com.mysql.jdbc.RowDataDynamic@3f197a46 is still active. No statements may be issued when any streaming result sets are open and in use on a given connection. Ensure that you have called .close() on any active streaming result sets before attempting more queries.
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:931)
    at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingData(MysqlIO.java:2518)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1748)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1961)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2537)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2466)
    at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1383)
    at com.mysql.jdbc.ConnectionImpl.getMaxBytesPerChar(ConnectionImpl.java:2939)
    at com.mysql.jdbc.Field.getMaxBytesPerCharacter(Field.java:576)
    at com.mysql.jdbc.ResultSetMetaData.getPrecision(ResultSetMetaData.java:440)
    at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:305)
    at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
    at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
    at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
    at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1872)
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1671)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
18/05/22 13:32:03 ERROR tool.ImportTool: Import failed: java.io.IOException: No columns to generate for ClassWriter
    at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1677)
    at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

[root@node1 sqoop-1.4.7]# 

Solution: add the --driver option to specify the JDBC driver class explicitly (Sqoop then falls back to the generic JDBC connection manager, as the warning in the log notes):

[root@node1 sqoop-1.4.7]# bin/sqoop import --connect jdbc:mysql://10.17.12.160:3306/test --table users --username root --password 123456 -m 1 --driver com.mysql.jdbc.Driver
Warning: /opt/sqoop-1.4.7/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
18/05/22 13:35:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
18/05/22 13:35:48 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/05/22 13:35:48 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
18/05/22 13:35:48 INFO manager.SqlManager: Using default fetchSize of 1000
18/05/22 13:35:48 INFO tool.CodeGenTool: Beginning code generation
18/05/22 13:35:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM users AS t WHERE 1=0
18/05/22 13:35:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM users AS t WHERE 1=0
18/05/22 13:35:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-2.7.5
Note: /tmp/sqoop-root/compile/054354d9b9af11ea0e4c96f467800c74/users.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/05/22 13:35:50 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/054354d9b9af11ea0e4c96f467800c74/users.jar
18/05/22 13:35:50 INFO mapreduce.ImportJobBase: Beginning import of users
18/05/22 13:35:50 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/05/22 13:35:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM users AS t WHERE 1=0
18/05/22 13:35:51 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/05/22 13:35:51 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
18/05/22 13:35:56 INFO db.DBInputFormat: Using read commited transaction isolation
18/05/22 13:35:56 INFO mapreduce.JobSubmitter: number of splits:1
18/05/22 13:35:56 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1526097883376_0001
18/05/22 13:35:57 INFO impl.YarnClientImpl: Submitted application application_1526097883376_0001
18/05/22 13:35:57 INFO mapreduce.Job: The url to track the job: http://bigdata03-test:8088/proxy/application_1526097883376_0001/
18/05/22 13:35:57 INFO mapreduce.Job: Running job: job_1526097883376_0001
18/05/22 13:36:07 INFO mapreduce.Job: Job job_1526097883376_0001 running in uber mode : false
18/05/22 13:36:07 INFO mapreduce.Job:  map 0% reduce 0%
18/05/22 13:36:18 INFO mapreduce.Job:  map 100% reduce 0%
18/05/22 13:36:19 INFO mapreduce.Job: Job job_1526097883376_0001 completed successfully
18/05/22 13:36:19 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=142003
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=40
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=8202
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=8202
        Total vcore-milliseconds taken by all map tasks=8202
        Total megabyte-milliseconds taken by all map tasks=8398848
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=72
        CPU time spent (ms)=1510
        Physical memory (bytes) snapshot=164237312
        Virtual memory (bytes) snapshot=2122526720
        Total committed heap usage (bytes)=108527616
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=40
18/05/22 13:36:19 INFO mapreduce.ImportJobBase: Transferred 40 bytes in 28.4436 seconds (1.4063 bytes/sec)
18/05/22 13:36:19 INFO mapreduce.ImportJobBase: Retrieved 3 records.
[root@node1 sqoop-1.4.7]# 