Installing and Configuring Sqoop
- Download Sqoop and Hadoop:
- wget http://archive.cloudera.com/cdh/3/sqoop-1.3.0-cdh3u1.tar.gz
- wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u1.tar.gz
- Extract the archives: tar zxvf sqoop-1.3.0-cdh3u1.tar.gz and tar zxvf hadoop-0.20.2-cdh3u1.tar.gz
- Set the environment variable: export SQOOP_HOME=XXXXX
- Copy Hadoop's hadoop-core-0.20.2-cdh3u1.jar, HBase's hbase-XXX.jar, ZooKeeper's zookeeper-XXX.jar, and MySQL's mysql-connector-java-XXXX-bin.jar into Sqoop's lib directory
- Grant privileges on the MySQL database so it can be accessed remotely:
- [root@test1 bin]# /usr/local/mysql/bin/mysql -u root -p
- Enter password:
- Welcome to the MySQL monitor. Commands end with ; or \g.
- Your MySQL connection id is 110
- Server version: 5.1.59 MySQL Community Server (GPL)
- Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.
- Oracle is a registered trademark of Oracle Corporation and/or its
- affiliates. Other names may be trademarks of their respective
- owners.
- Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
- mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
- Query OK, 0 rows affected (0.00 sec)
- mysql> flush privileges;
- Query OK, 0 rows affected (0.01 sec)
Basic Sqoop Usage
Connecting to the MySQL database
Listing the databases in MySQL
- sqoop list-databases --connect jdbc:mysql://IP:PORT/ --username USER --password PASSWD
- sqoop list-databases --connect jdbc:mysql://IP:PORT/ --username USER -P (prompting for the password this way is a little more secure)
- [root@test1 bin]# sqoop list-databases --connect jdbc:mysql://IP:PORT/ --username root -P
- Enter password:
- 12/06/04 07:56:48 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- information_schema
- hive_metastore
- mysql
- test
Listing the tables in a database
- sqoop list-tables --connect jdbc:mysql://IP:PORT/database --username USER -P
- [root@test1 bin]# sqoop list-tables --connect jdbc:mysql://IP:PORT/mysql --username root -P
- Enter password:
- 12/06/04 07:55:57 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- columns_priv
- db
- event
- func
- general_log
- help_category
- help_keyword
- help_relation
- help_topic
- host
- ndb_binlog_index
- plugin
- proc
- procs_priv
- servers
- slow_log
- tables_priv
- time_zone
- time_zone_leap_second
- time_zone_name
- time_zone_transition
- time_zone_transition_type
- user
sqoop-import: RDBMS --> DFS
Import the data in the students table of MySQL's test database into DFS
- --connect: JDBC URL of the database to connect to
- --username: user name for the database login
- --password: password for the database login
- --table: the table to import
- --target-dir: the target directory on DFS
- --split-by: the column used to split the import across map tasks (the field delimiter is set separately with --fields-terminated-by)
- [root@test1 bin]# sqoop import --verbose --fields-terminated-by ',' --connect jdbc:mysql://IP:PORT/test --username root --password PASSWD --table students --target-dir /test/students --split-by 'age'
- 12/06/04 11:52:07 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
- 12/06/04 11:52:07 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- 12/06/04 11:52:07 INFO tool.CodeGenTool: Beginning code generation
- 12/06/04 11:52:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
- 12/06/04 11:52:07 INFO orm.CompilationManager: HADOOP_HOME is /usr/local/hadoop/install/bin/..
- Note: /tmp/sqoop-root/compile/821690e0ecd4932aeddf6b4ad1252aa0/students.java uses or overrides a deprecated API.
- Note: Recompile with -Xlint:deprecation for details.
- 12/06/04 11:52:14 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/821690e0ecd4932aeddf6b4ad1252aa0/students.jar
- 12/06/04 11:52:14 WARN manager.MySQLManager: It looks like you are importing from mysql.
- 12/06/04 11:52:14 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
- 12/06/04 11:52:14 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
- 12/06/04 11:52:14 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
- 12/06/04 11:52:14 INFO mapreduce.ImportJobBase: Beginning import of students
- 12/06/04 11:52:56 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`age`), MAX(`age`) FROM `students`
- 12/06/04 11:52:57 INFO mapred.JobClient: Running job: job_201206040901_0014
- 12/06/04 11:52:58 INFO mapred.JobClient: map 0% reduce 0%
- 12/06/04 11:54:04 INFO mapred.JobClient: map 100% reduce 0%
- 12/06/04 11:54:18 INFO mapred.JobClient: Job complete: job_201206040901_0014
- 12/06/04 11:54:18 INFO mapred.JobClient: Counters: 14
- 12/06/04 11:54:18 INFO mapred.JobClient: Job Counters
- 12/06/04 11:54:18 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=35427
- 12/06/04 11:54:18 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
- 12/06/04 11:54:18 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
- 12/06/04 11:54:18 INFO mapred.JobClient: Launched map tasks=1
- 12/06/04 11:54:18 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
- 12/06/04 11:54:18 INFO mapred.JobClient: File Output Format Counters
- 12/06/04 11:54:18 INFO mapred.JobClient: Bytes Written=21
- 12/06/04 11:54:18 INFO mapred.JobClient: FileSystemCounters
- 12/06/04 11:54:18 INFO mapred.JobClient: GFS_BYTES_WRITTEN=21
- 12/06/04 11:54:18 INFO mapred.JobClient: FILE_BYTES_WRITTEN=39425
- 12/06/04 11:54:18 INFO mapred.JobClient: GFS_BYTES_READ=106
- 12/06/04 11:54:18 INFO mapred.JobClient: File Input Format Counters
- 12/06/04 11:54:18 INFO mapred.JobClient: Bytes Read=0
- 12/06/04 11:54:18 INFO mapred.JobClient: Map-Reduce Framework
- 12/06/04 11:54:18 INFO mapred.JobClient: Map input records=2
- 12/06/04 11:54:18 INFO mapred.JobClient: Spilled Records=0
- 12/06/04 11:54:18 INFO mapred.JobClient: Map output records=2
- 12/06/04 11:54:18 INFO mapred.JobClient: SPLIT_RAW_BYTES=103
- 12/06/04 11:54:18 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 123.2224 seconds (0 bytes/sec)
- 12/06/04 11:54:18 INFO mapreduce.ImportJobBase: Retrieved 2 records.
- [root@test1 bin]# hadoop fs -ls /test/students
- Found 3 items
- -rwxrwxrwx 2 0 2012-06-04 11:54 /test/students/_SUCCESS
- drwxrwxrwx - 0 2012-06-04 11:52 /test/students/_logs
- -rwxrwxrwx 2 21 2012-06-04 11:53 /test/students/part-m-00000
- [root@test1 bin]# hadoop fs -cat /test/students/part-m-00000
- aaaaaa,26
- bbbbbb,27
- [root@test1 bin]#
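A quick local sanity check on the imported file format: the line count should equal the "Retrieved 2 records" figure in the Sqoop log, and since the import used --fields-terminated-by ',', the second comma-separated field is the age column. This sketch replays the two records shown above on a local copy (plain awk, no Hadoop assumed):

```shell
# Recreate the imported records locally (same content as part-m-00000 above).
printf 'aaaaaa,26\nbbbbbb,27\n' > /tmp/part-m-00000
# Record count -- should match "Retrieved 2 records" in the log.
wc -l < /tmp/part-m-00000
# Sum the second comma-separated field (the age column).
awk -F, '{sum += $2} END {print sum}' /tmp/part-m-00000
```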
- [root@test1 conf]# /usr/local/mysql/bin/mysql -uUSER -pPASSWD
- Welcome to the MySQL monitor. Commands end with ; or \g.
- Your MySQL connection id is 230
- Server version: 5.1.59 MySQL Community Server (GPL)
- Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
- mysql> select * from test.students;
- +---------+------+
- | name | age |
- +---------+------+
- | aaaaaa | 26 |
- | bbbbbb | 27 |
- +---------+------+
- 2 rows in set (0.01 sec)
- mysql>
sqoop-export: DFS --> RDBMS
Export data from DFS into MySQL:
- [root@test1 bin]# sqoop export --connect jdbc:mysql://IP:PORT/test --username USER --password PASSWD --table students --export-dir /test/students/ --input-fields-terminated-by '\t'
- 12/06/04 12:31:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
- 12/06/04 12:31:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- 12/06/04 12:31:49 INFO tool.CodeGenTool: Beginning code generation
- 12/06/04 12:31:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
- 12/06/04 12:31:50 INFO orm.CompilationManager: HADOOP_HOME is /usr/local/hadoop/install/bin/..
- Note: /tmp/sqoop-root/compile/546460e114dd8311c4a1dc80cac7419b/students.java uses or overrides a deprecated API.
- Note: Recompile with -Xlint:deprecation for details.
- 12/06/04 12:32:01 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/546460e114dd8311c4a1dc80cac7419b/students.jar
- 12/06/04 12:32:01 INFO mapreduce.ExportJobBase: Beginning export of students
- 12/06/04 12:32:40 INFO input.FileInputFormat: Total input paths to process : 1
- 12/06/04 12:32:40 INFO input.FileInputFormat: Total input paths to process : 1
- 12/06/04 12:32:41 INFO mapred.JobClient: Running job: job_201206040901_0021
- 12/06/04 12:32:42 INFO mapred.JobClient: map 0% reduce 0%
- 12/06/04 12:33:27 INFO mapred.JobClient: map 100% reduce 0%
- 12/06/04 12:33:38 INFO mapred.JobClient: Job complete: job_201206040901_0021
- 12/06/04 12:33:39 INFO mapred.JobClient: Counters: 14
- 12/06/04 12:33:39 INFO mapred.JobClient: Job Counters
- 12/06/04 12:33:39 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=20935
- 12/06/04 12:33:39 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
- 12/06/04 12:33:39 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
- 12/06/04 12:33:39 INFO mapred.JobClient: Rack-local map tasks=1
- 12/06/04 12:33:39 INFO mapred.JobClient: Launched map tasks=1
- 12/06/04 12:33:39 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
- 12/06/04 12:33:39 INFO mapred.JobClient: File Output Format Counters
- 12/06/04 12:33:39 INFO mapred.JobClient: Bytes Written=0
- 12/06/04 12:33:39 INFO mapred.JobClient: FileSystemCounters
- 12/06/04 12:33:39 INFO mapred.JobClient: FILE_BYTES_WRITTEN=39238
- 12/06/04 12:33:39 INFO mapred.JobClient: GFS_BYTES_READ=148
- 12/06/04 12:33:39 INFO mapred.JobClient: File Input Format Counters
- 12/06/04 12:33:39 INFO mapred.JobClient: Bytes Read=0
- 12/06/04 12:33:39 INFO mapred.JobClient: Map-Reduce Framework
- 12/06/04 12:33:39 INFO mapred.JobClient: Map input records=2
- 12/06/04 12:33:39 INFO mapred.JobClient: Spilled Records=0
- 12/06/04 12:33:39 INFO mapred.JobClient: Map output records=2
- 12/06/04 12:33:39 INFO mapred.JobClient: SPLIT_RAW_BYTES=108
- 12/06/04 12:33:39 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 96.3916 seconds (0 bytes/sec)
- 12/06/04 12:33:39 INFO mapreduce.ExportJobBase: Exported 2 records.
Check whether the data on DFS has been exported into MySQL:
- [root@test1 bin]# hadoop fs -cat /test/students/part-m*
- cccccc 26
- dddddd 24
- [root@test1 bin]# /usr/local/mysql/bin/mysql -uUSER -pPASSWD
- Welcome to the MySQL monitor. Commands end with ; or \g.
- Your MySQL connection id is 262
- Server version: 5.1.59 MySQL Community Server (GPL)
- Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
- mysql> select * from test.students;
- +---------+------+
- | name | age |
- +---------+------+
- | aaaaaa | 26 |
- | bbbbbb | 27 |
- | cccccc | 26 |
- | dddddd | 24 |
- +---------+------+
- 4 rows in set (0.00 sec)
- mysql>
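Note that the export declared --input-fields-terminated-by '\t', so the part files it reads must be tab-delimited, which matches the tab-separated cat output above; the earlier import wrote comma-separated fields, so the delimiters must be kept consistent in each direction. A minimal local illustration of what a mismatch looks like (plain awk, no Hadoop assumed):

```shell
# With the declared tab delimiter, a tab-separated record splits into two
# fields (name, age); a comma-separated record collapses into one field and
# the export would fail to parse it.
printf 'cccccc\t26\n' | awk -F'\t' '{print NF}'   # tab-delimited: 2 fields
printf 'cccccc,26\n'  | awk -F'\t' '{print NF}'   # comma-delimited: 1 field
```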
Importing rows that match a condition into DFS
- sqoop import --connect jdbc:mysql://IP:PORT/test --username USER --password PASSWD --query 'SELECT * FROM students WHERE age=26 AND $CONDITIONS' -m 1 --target-dir /test/student26
- 12/06/05 05:53:55 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
- 12/06/05 05:53:55 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- 12/06/05 05:53:55 INFO tool.CodeGenTool: Beginning code generation
- 12/06/05 05:53:56 INFO manager.SqlManager: Executing SQL statement: SELECT * FROM students WHERE age=26 AND (1 = 0)
- 12/06/05 05:53:56 INFO manager.SqlManager: Executing SQL statement: SELECT * FROM students WHERE age=26 AND (1 = 0)
- 12/06/05 05:53:56 INFO orm.CompilationManager: HADOOP_HOME is /usr/local/hadoop/install/bin/..
- 12/06/05 05:54:06 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/38bfade178cccb8dfaa2b31c2502eec3/QueryResult.jar
- 12/06/05 05:54:06 INFO mapreduce.ImportJobBase: Beginning query import.
- 12/06/05 05:54:38 INFO mapred.JobClient: Running job: job_201206040901_0033
- 12/06/05 05:54:39 INFO mapred.JobClient: map 0% reduce 0%
- 12/06/05 05:55:11 INFO mapred.JobClient: map 100% reduce 0%
- 12/06/05 05:55:19 INFO mapred.JobClient: Job complete: job_201206040901_0033
- 12/06/05 05:55:19 INFO mapred.JobClient: Counters: 14
- 12/06/05 05:55:19 INFO mapred.JobClient: Job Counters
- 12/06/05 05:55:19 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21246
- 12/06/05 05:55:19 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
- 12/06/05 05:55:19 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
- 12/06/05 05:55:19 INFO mapred.JobClient: Launched map tasks=1
- 12/06/05 05:55:19 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
- 12/06/05 05:55:19 INFO mapred.JobClient: File Output Format Counters
- 12/06/05 05:55:19 INFO mapred.JobClient: Bytes Written=20
- 12/06/05 05:55:19 INFO mapred.JobClient: FileSystemCounters
- 12/06/05 05:55:19 INFO mapred.JobClient: GFS_BYTES_WRITTEN=20
- 12/06/05 05:55:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=32065
- 12/06/05 05:55:19 INFO mapred.JobClient: GFS_BYTES_READ=90
- 12/06/05 05:55:19 INFO mapred.JobClient: File Input Format Counters
- 12/06/05 05:55:19 INFO mapred.JobClient: Bytes Read=0
- 12/06/05 05:55:19 INFO mapred.JobClient: Map-Reduce Framework
- 12/06/05 05:55:19 INFO mapred.JobClient: Map input records=2
- 12/06/05 05:55:19 INFO mapred.JobClient: Spilled Records=0
- 12/06/05 05:55:19 INFO mapred.JobClient: Map output records=2
- 12/06/05 05:55:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
- 12/06/05 05:55:19 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 72.4785 seconds (0 bytes/sec)
- 12/06/05 05:55:19 INFO mapreduce.ImportJobBase: Retrieved 2 records.
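One quoting detail in the free-form query above: the literal token $CONDITIONS must reach Sqoop unexpanded so that Sqoop can substitute per-split range predicates (the log shows it replaced by (1 = 0) during code generation). Single quotes keep it literal; inside double quotes the shell would expand it, so it has to be escaped:

```shell
# Both forms pass the same literal query string to Sqoop.
q1='SELECT * FROM students WHERE age=26 AND $CONDITIONS'
q2="SELECT * FROM students WHERE age=26 AND \$CONDITIONS"
[ "$q1" = "$q2" ] && echo "identical query strings"
```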
So far, data can be moved in both directions between MySQL and DFS, which is workable for now.
Importing MySQL Data into HBase with Sqoop
- sqoop import --connect jdbc:mysql://IP:PORT/test --table students --hbase-table sqoop_test --column-family st_info --hbase-row-key id --hbase-create-table --username USER --password PASSWD
- 12/06/13 15:09:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- 12/06/13 15:09:10 INFO tool.CodeGenTool: Beginning code generation
- 12/06/13 15:09:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
- 12/06/13 15:09:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
- 12/06/13 15:09:11 INFO orm.CompilationManager: HADOOP_HOME is /usr/local/hadoop/install/bin/..
- 12/06/13 15:09:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/807268e14873b072ec8b9a9e7f46f326/students.jar
- 12/06/13 15:09:18 WARN manager.MySQLManager: It looks like you are importing from mysql.
- 12/06/13 15:09:18 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
- 12/06/13 15:09:18 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
- 12/06/13 15:09:18 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
- 12/06/13 15:09:18 INFO mapreduce.ImportJobBase: Beginning import of students
- 12/06/13 15:09:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
- 12/06/13 15:09:20 INFO zookeeper.ZooKeeper: Session: 0x437cf07b9a50013 closed
- 12/06/13 15:09:20 INFO mapreduce.HBaseImportJob: Creating missing HBase table sqoop_test
- 12/06/13 15:09:20 INFO zookeeper.ClientCnxn: EventThread shut down
- 12/06/13 15:09:54 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `students`
- 12/06/13 15:09:56 INFO mapred.JobClient: Running job: job_201206131415_0002
- 12/06/13 15:09:57 INFO mapred.JobClient: map 0% reduce 0%
- 12/06/13 15:10:29 INFO mapred.JobClient: map 33% reduce 0%
- 12/06/13 15:10:54 INFO mapred.JobClient: map 66% reduce 0%
- 12/06/13 15:10:58 INFO mapred.JobClient: map 100% reduce 0%
- 12/06/13 15:11:09 INFO mapred.JobClient: Job complete: job_201206131415_0002
- 12/06/13 15:11:09 INFO mapred.JobClient: Counters: 13
- 12/06/13 15:11:09 INFO mapred.JobClient: Job Counters
- 12/06/13 15:11:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=48737
- 12/06/13 15:11:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
- 12/06/13 15:11:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
- 12/06/13 15:11:09 INFO mapred.JobClient: Launched map tasks=3
- 12/06/13 15:11:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
- 12/06/13 15:11:09 INFO mapred.JobClient: File Output Format Counters
- 12/06/13 15:11:09 INFO mapred.JobClient: Bytes Written=0
- 12/06/13 15:11:09 INFO mapred.JobClient: FileSystemCounters
- 12/06/13 15:11:09 INFO mapred.JobClient: FILE_BYTES_WRITTEN=124270
- 12/06/13 15:11:09 INFO mapred.JobClient: GFS_BYTES_READ=304
- 12/06/13 15:11:09 INFO mapred.JobClient: File Input Format Counters
- 12/06/13 15:11:09 INFO mapred.JobClient: Bytes Read=0
- 12/06/13 15:11:09 INFO mapred.JobClient: Map-Reduce Framework
- 12/06/13 15:11:09 INFO mapred.JobClient: Map input records=4
- 12/06/13 15:11:09 INFO mapred.JobClient: Spilled Records=0
- 12/06/13 15:11:09 INFO mapred.JobClient: Map output records=4
- 12/06/13 15:11:09 INFO mapred.JobClient: SPLIT_RAW_BYTES=295
- 12/06/13 15:11:09 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 103.0751 seconds (0 bytes/sec)
- 12/06/13 15:11:09 INFO mapreduce.ImportJobBase: Retrieved 4 records.
- hbase(main):004:0> list
- TABLE
- sqoop_test
- 1 row(s) in 0.7200 seconds
- hbase(main):005:0> scan 'sqoop_test'
- ROW COLUMN+CELL
- 1 column=st_info:age, timestamp=1339571427057, value=26
- 1 column=st_info:name, timestamp=1339571427057, value=aaaaaa
- 2 column=st_info:age, timestamp=1339571450535, value=27
- 2 column=st_info:name, timestamp=1339571450535, value=bbbbbb
- 3 column=st_info:age, timestamp=1339571451670, value=26
- 3 column=st_info:name, timestamp=1339571451670, value=cccccc
- 4 column=st_info:age, timestamp=1339571451670, value=24
- 4 column=st_info:name, timestamp=1339571451670, value=dddddd
- 4 row(s) in 0.5560 seconds
- hbase(main):006:0>
Performance Testing
1) MySQL to DFS
The MySQL table used for the test:
- mysql> desc relation;
- +-----------------+------------------+------+-----+---------+-------+
- | Field | Type | Null | Key | Default | Extra |
- +-----------------+------------------+------+-----+---------+-------+
- | cid | varchar(40) | | | | |
- | recommended_cid | varchar(40) | | | | |
- | similarity | int(10) unsigned | | | 0 | |
- +-----------------+------------------+------+-----+---------+-------+
- 3 rows in set (0.00 sec)
- mysql> select count(*) from relation;
- +----------+
- | count(*) |
- +----------+
- | 23512528 |
- +----------+
- 1 row in set (0.00 sec)
| split-by | map num | time |
| -------- | ------- | --------- |
| cid | 6 | 4m16.053s |
| cid | 12 | 2m36.905s |
| | 4 | 3m36.390s |
| | 10 | 2m5.120s |
Note: an integer-typed column is recommended as the --split-by argument:
- 12/06/11 17:03:16 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
2) DFS to MySQL
The data is the same as above, 23512528 rows in total.
The test results are shown below:
| map num | time |
| ------- | ---------- |
| 4 | 42m56.324s |
| 10 | 41m31.888s |
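Some rough arithmetic on the two tables above (times rounded to whole seconds, so the figures are approximate): at 10 map tasks, the import moves the 23512528 rows roughly twenty times faster than the export.

```shell
# Approximate throughput: 2m5.120s ~ 125 s (import, 10 maps) and
# 41m31.888s ~ 2492 s (export, 10 maps).
rows=23512528
awk -v r="$rows" 'BEGIN { printf "import: %d rows/s\nexport: %d rows/s\n", r/125, r/2492 }'
```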
While importing data into Hive, the following problem was encountered; no solution has been found yet:
- [root@test1 ~]# sqoop import --connect jdbc:mysql://IP:PORT/test --username USER --password PASSWD --table students --input-fields-terminated-by '\t' --hive-import -m 1
- 12/06/05 06:34:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
- 12/06/05 06:34:32 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
- 12/06/05 06:34:32 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
- 12/06/05 06:34:32 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- 12/06/05 06:34:32 INFO tool.CodeGenTool: Beginning code generation
- 12/06/05 06:34:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
- 12/06/05 06:34:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
- 12/06/05 06:34:33 INFO orm.CompilationManager: HADOOP_HOME is /usr/local/hadoop/install/bin/..
- 12/06/05 06:34:39 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/97263d4aeecd15c92e0b3d570c3aa024/students.jar
- 12/06/05 06:34:39 WARN manager.MySQLManager: It looks like you are importing from mysql.
- 12/06/05 06:34:39 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
- 12/06/05 06:34:39 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
- 12/06/05 06:34:39 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
- 12/06/05 06:34:39 INFO mapreduce.ImportJobBase: Beginning import of students
- 12/06/05 06:34:40 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
- 12/06/05 06:35:08 INFO mapred.JobClient: Running job: job_201206040901_0035
- 12/06/05 06:35:09 INFO mapred.JobClient: map 0% reduce 0%
- 12/06/05 06:35:52 INFO mapred.JobClient: map 100% reduce 0%
- 12/06/05 06:36:03 INFO mapred.JobClient: Job complete: job_201206040901_0035
- 12/06/05 06:36:03 INFO mapred.JobClient: Counters: 14
- 12/06/05 06:36:03 INFO mapred.JobClient: Job Counters
- 12/06/05 06:36:03 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=28519
- 12/06/05 06:36:03 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
- 12/06/05 06:36:03 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
- 12/06/05 06:36:03 INFO mapred.JobClient: Launched map tasks=1
- 12/06/05 06:36:03 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
- 12/06/05 06:36:03 INFO mapred.JobClient: File Output Format Counters
- 12/06/05 06:36:03 INFO mapred.JobClient: Bytes Written=41
- 12/06/05 06:36:03 INFO mapred.JobClient: FileSystemCounters
- 12/06/05 06:36:03 INFO mapred.JobClient: GFS_BYTES_WRITTEN=41
- 12/06/05 06:36:03 INFO mapred.JobClient: FILE_BYTES_WRITTEN=31978
- 12/06/05 06:36:03 INFO mapred.JobClient: GFS_BYTES_READ=90
- 12/06/05 06:36:03 INFO mapred.JobClient: File Input Format Counters
- 12/06/05 06:36:03 INFO mapred.JobClient: Bytes Read=0
- 12/06/05 06:36:03 INFO mapred.JobClient: Map-Reduce Framework
- 12/06/05 06:36:03 INFO mapred.JobClient: Map input records=4
- 12/06/05 06:36:03 INFO mapred.JobClient: Spilled Records=0
- 12/06/05 06:36:03 INFO mapred.JobClient: Map output records=4
- 12/06/05 06:36:03 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
- 12/06/05 06:36:03 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 83.3192 seconds (0 bytes/sec)
- 12/06/05 06:36:03 INFO mapreduce.ImportJobBase: Retrieved 4 records.
- 12/06/05 06:36:03 INFO hive.HiveImport: Removing temporary files from import process: students/_logs
- 12/06/05 06:36:03 INFO hive.HiveImport: Loading uploaded data into Hive
- 12/06/05 06:36:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
- 12/06/05 06:36:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `students` AS t LIMIT 1
- hive-log4j.properties not found
- Hive history file=/tmp/root/hive_job_log_root_201206050636_1946485104.txt
- 12/06/05 06:36:05 INFO exec.HiveHistory: Hive history file=/tmp/root/hive_job_log_root_201206050636_1946485104.txt
- 12/06/05 06:36:05 INFO parse.ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS `students` ( `name` STRING, `age` INT) COMMENT 'Imported by sqoop on 2012/06/05 06:36:03' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
- 12/06/05 06:36:06 INFO parse.ParseDriver: Parse Completed
- 12/06/05 06:36:06 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
- 12/06/05 06:36:06 INFO parse.SemanticAnalyzer: Creating table students position=27
- org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: URI: does not have a scheme
- at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:919)
- at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:904)
- at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:7074)
- at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6573)
- at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
- at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
- at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
- at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
- at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
- at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:262)
- at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:269)
- at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:430)
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
- at java.lang.reflect.Method.invoke(Method.java:616)
- at com.cloudera.sqoop.hive.HiveImport.executeScript(HiveImport.java:270)
- at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:218)
- at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362)
- at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
- at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
- at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
- at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
- at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
- at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
- at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)
- Caused by: java.lang.IllegalArgumentException: URI: does not have a scheme
- at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:127)
- at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:1868)
- at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:1878)
- at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:917)
- ... 25 more
- 12/06/05 06:36:06 INFO ql.Driver: Semantic Analysis Completed
- 12/06/05 06:36:06 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
- 12/06/05 06:36:06 INFO ql.Driver: Starting command: CREATE TABLE IF NOT EXISTS `students` ( `name` STRING, `age` INT) COMMENT 'Imported by sqoop on 2012/06/05 06:36:03' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE
- 12/06/05 06:36:06 INFO exec.DDLTask: Default to LazySimpleSerDe for table students
- 12/06/05 06:36:06 INFO hive.log: DDL: struct students { string name, i32 age}
- FAILED: Error in metadata: java.lang.IllegalArgumentException: URI: does not have a scheme
- 12/06/05 06:36:07 ERROR exec.DDLTask: FAILED: Error in metadata: java.lang.IllegalArgumentException: URI: does not have a scheme
- org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: URI: does not have a scheme
- at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:476)
- at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3146)
- at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:213)
- at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
- at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
- at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
- at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
- at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
- at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
- at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
- at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:262)
- at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:269)
- at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:430)
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
- at java.lang.reflect.Method.invoke(Method.java:616)
- at com.cloudera.sqoop.hive.HiveImport.executeScript(HiveImport.java:270)
- at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:218)
- at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362)
- at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
- at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
- at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
- at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
- at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
- at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
- at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)
- Caused by: java.lang.IllegalArgumentException: URI: does not have a scheme
- at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:127)
- at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:1868)
- at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:1878)
- at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:470)
- ... 26 more
- FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
- 12/06/05 06:36:07 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
- 12/06/05 06:36:08 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Hive CliDriver exited with status=9
- at com.cloudera.sqoop.hive.HiveImport.executeScript(HiveImport.java:294)
- at com.cloudera.sqoop.hive.HiveImport.importTable(HiveImport.java:218)
- at com.cloudera.sqoop.tool.ImportTool.importTable(ImportTool.java:362)
- at com.cloudera.sqoop.tool.ImportTool.run(ImportTool.java:423)
- at com.cloudera.sqoop.Sqoop.run(Sqoop.java:144)
- at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
- at com.cloudera.sqoop.Sqoop.runSqoop(Sqoop.java:180)
- at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:219)
- at com.cloudera.sqoop.Sqoop.runTool(Sqoop.java:228)
- at com.cloudera.sqoop.Sqoop.main(Sqoop.java:237)
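One hypothesis worth checking (an assumption, not a verified fix): the `hive-log4j.properties not found` line suggests Sqoop launched Hive without locating its conf directory, so hive-site.xml may never have been read, leaving the metastore or warehouse URI without a scheme. A sketch of the checks, with hypothetical paths:

```shell
# Hypothetical checks, not a verified fix -- the paths below are assumptions.
export HIVE_HOME=/usr/local/hive            # wherever Hive is installed
export HIVE_CONF_DIR=$HIVE_HOME/conf        # must contain hive-site.xml
export PATH=$HIVE_HOME/bin:$PATH
# The default filesystem and warehouse URIs should carry an explicit scheme
# (e.g. hdfs://host:port/...), not a bare path:
grep -A1 'fs.default.name' "$HADOOP_HOME/conf/core-site.xml"
grep -A1 'hive.metastore.warehouse.dir' "$HIVE_CONF_DIR/hive-site.xml"
```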