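I have not installed HDFS and HBase yet, so for now I am just getting familiar with the commands. I found a good article, reposted here from: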
http://blog.csdn.net/niityzu/article/details/42834993
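Guiding questions:
1. What does the --connect parameter do?
2. Which parameter reads the database password from the console?
3. What are the minimum parameters and the basic command for importing a relational database table into HDFS with Sqoop?
4. Which HDFS path does the data go to by default?
5. What does the --columns parameter do?
6. What does the --where parameter do?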
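I. Key parameters
- Parameter: Description
- --connect <jdbc-uri>: Specifies the JDBC connection string of the relational database
- --connection-manager <class-name>: Specifies the connection manager class to use for the database
- --driver <class-name>: Manually specifies the JDBC driver class to use
- --hadoop-mapred-home <dir>: Overrides $HADOOP_MAPRED_HOME
- --help: Prints usage help
- --password-file: Sets the path of a file that contains the authentication password
- -P: Reads the database password from the console
- --password <password>: Sets the database authentication password
- --username <username>: Sets the database user
- --verbose: Prints more information about the execution flow
- --connection-param-file <filename>: Optional properties file that supplies connection parameters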
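The three password parameters above differ only in where the password comes from. The following is a minimal sketch, not a definitive recipe: it reuses the connection string, user, and table from the example later in this article. The temporary password file is a hypothetical location chosen for illustration, and the `file://` prefix is an assumption about how Sqoop resolves a path on the local filesystem rather than in HDFS.

```shell
#!/bin/sh
# Hypothetical location: a temporary file stands in for something like
# a fixed path in your home directory.
PWD_FILE="$(mktemp)"

# printf writes no trailing newline, so the file holds only the password.
printf '%s' 'hive' > "$PWD_FILE"
# Read-only for the owner, nobody else.
chmod 400 "$PWD_FILE"

# Only call Sqoop if it is actually installed on this machine.
if command -v sqoop >/dev/null 2>&1; then
    # --password-file keeps the password out of the process list and the
    # shell history; replace it with -P to be prompted on the console.
    # Assumption: file:// makes Sqoop read from the local filesystem.
    sqoop import --connect jdbc:mysql://secondmgt:3306/spice \
        --username hive --password-file "file://$PWD_FILE" \
        --table users
fi
```

Either variant avoids the "Setting your password on the command-line is insecure" warning that `--password` triggers.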
- [hadoopUser@secondmgt ~]$ mysql -uhive -phive spice
- Reading table information for completion of table and column names
- You can turn off this feature to get a quicker startup with -A
- Welcome to the MySQL monitor. Commands end with ; or \g.
- Your MySQL connection id is 419
- Server version: 5.1.73 Source distribution
- Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
- Oracle is a registered trademark of Oracle Corporation and/or its
- affiliates. Other names may be trademarks of their respective
- owners.
- Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
- mysql> select * from users;
- +----+----------+----------+-----+---------+------------+-------+------+
- | id | username | password | sex | content | datetime | vm_id | isad |
- +----+----------+----------+-----+---------+------------+-------+------+
- | 56 | hua | hanyun | 男 | 开通 | 2013-12-02 | 0 | 1 |
- | 58 | feng | 123456 | 男 | 开通 | 2013-11-22 | 0 | 0 |
- | 59 | test | 123456 | 男 | 开通 | 2014-03-05 | 58 | 0 |
- | 60 | user1 | 123456 | 男 | 开通 | 2014-06-26 | 66 | 0 |
- | 61 | user2 | 123 | 男 | 开通 | 2013-12-13 | 56 | 0 |
- | 62 | user3 | 123456 | 男 | 开通 | 2013-12-14 | 0 | 0 |
- | 64 | kai.zhou | 123456 | ? | ?? | 2014-03-05 | 65 | 0 |
- +----+----------+----------+-----+---------+------------+-------+------+
- 7 rows in set (0.00 sec)
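III. Importing the users table from the database above into HDFS
To run an import, you must specify at least the database connection string, the username, the password, and the name of the table to import. By default, the data is written to /user/hadoopUser/<table name>/ in HDFS, that is, a directory named after the table under the user's HDFS home directory. You can also choose the import directory with the --target-dir parameter, as follows: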
- [hadoopUser@secondmgt ~]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive --table users --target-dir /output/sqoop/
- Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
- Please set $HCAT_HOME to the root of your HCatalog installation.
- 15/01/17 20:28:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
- 15/01/17 20:28:16 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
- 15/01/17 20:28:16 INFO tool.CodeGenTool: Beginning code generation
- 15/01/17 20:28:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `users` AS t LIMIT 1
- 15/01/17 20:28:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `users` AS t LIMIT 1
- 15/01/17 20:28:16 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0
- Note: /tmp/sqoop-hadoopUser/compile/c010e7410ec7339ef9b4d9dc2ddaac80/users.java uses or overrides a deprecated API.
- Note: Recompile with -Xlint:deprecation for details.
- 15/01/17 20:28:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoopUser/compile/c010e7410ec7339ef9b4d9dc2ddaac80/users.jar
- 15/01/17 20:28:18 WARN manager.MySQLManager: It looks like you are importing from mysql.
- 15/01/17 20:28:18 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
- 15/01/17 20:28:18 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
- 15/01/17 20:28:18 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
- 15/01/17 20:28:18 INFO mapreduce.ImportJobBase: Beginning import of users
- 15/01/17 20:28:18 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
- SLF4J: Class path contains multiple SLF4J bindings.
- SLF4J: Found binding in [jar:file:/home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
- SLF4J: Found binding in [jar:file:/home/hadoopUser/cloud/hbase/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
- SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
- SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
- 15/01/17 20:28:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
- 15/01/17 20:28:19 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
- 15/01/17 20:28:19 INFO client.RMProxy: Connecting to ResourceManager at secondmgt/192.168.2.133:8032
- 15/01/17 20:28:20 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `users`
- 15/01/17 20:28:20 INFO mapreduce.JobSubmitter: number of splits:4
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files
- 15/01/17 20:28:20 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
- 15/01/17 20:28:20 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
- 15/01/17 20:28:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1421373857783_0002
- 15/01/17 20:28:21 INFO impl.YarnClientImpl: Submitted application application_1421373857783_0002 to ResourceManager at secondmgt/192.168.2.133:8032
- 15/01/17 20:28:21 INFO mapreduce.Job: The url to track the job: http://secondmgt:8088/proxy/application_1421373857783_0002/
- 15/01/17 20:28:21 INFO mapreduce.Job: Running job: job_1421373857783_0002
- 15/01/17 20:28:34 INFO mapreduce.Job: Job job_1421373857783_0002 running in uber mode : false
- 15/01/17 20:28:34 INFO mapreduce.Job: map 0% reduce 0%
- 15/01/17 20:28:44 INFO mapreduce.Job: map 25% reduce 0%
- 15/01/17 20:28:49 INFO mapreduce.Job: map 75% reduce 0%
- 15/01/17 20:28:54 INFO mapreduce.Job: map 100% reduce 0%
- 15/01/17 20:28:54 INFO mapreduce.Job: Job job_1421373857783_0002 completed successfully
- 15/01/17 20:28:54 INFO mapreduce.Job: Counters: 27
- File System Counters
- FILE: Number of bytes read=0
- FILE: Number of bytes written=368040
- FILE: Number of read operations=0
- FILE: Number of large read operations=0
- FILE: Number of write operations=0
- HDFS: Number of bytes read=401
- HDFS: Number of bytes written=288
- HDFS: Number of read operations=16
- HDFS: Number of large read operations=0
- HDFS: Number of write operations=8
- Job Counters
- Launched map tasks=4
- Other local map tasks=4
- Total time spent by all maps in occupied slots (ms)=174096
- Total time spent by all reduces in occupied slots (ms)=0
- Map-Reduce Framework
- Map input records=7
- Map output records=7
- Input split bytes=401
- Spilled Records=0
- Failed Shuffles=0
- Merged Map outputs=0
- GC time elapsed (ms)=205
- CPU time spent (ms)=10510
- Physical memory (bytes) snapshot=599060480
- Virtual memory (bytes) snapshot=3535720448
- Total committed heap usage (bytes)=335544320
- File Input Format Counters
- Bytes Read=0
- File Output Format Counters
- Bytes Written=288
- 15/01/17 20:28:54 INFO mapreduce.ImportJobBase: Transferred 288 bytes in 35.2792 seconds (8.1635 bytes/sec)
- 15/01/17 20:28:54 INFO mapreduce.ImportJobBase: Retrieved 7 records.
- [hadoopUser@secondmgt ~]$ hadoop fs -cat /output/sqoop/*
- 56,hua,hanyun,男,开通,2013-12-02,0,1
- 58,feng,123456,男,开通,2013-11-22,0,0
- 59,test,123456,男,开通,2014-03-05,58,0
- 60,user1,123456,男,开通,2014-06-26,66,0
- 61,user2,123,男,开通,2013-12-13,56,0
- 62,user3,123456,男,开通,2013-12-14,0,0
- 64,kai.zhou,123456,?,??,2014-03-05,65,0
The records match those in the source database, so the import succeeded.
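The run above used --target-dir, so it does not show the default location. As a rough sketch of the default case, assuming the same user, host, and table as in the session above, the path can be derived and then listed after an import that omits --target-dir:

```shell
#!/bin/sh
# Values assumed from the session above.
HDFS_USER="hadoopUser"
TABLE="users"

# Default import directory when --target-dir is not given:
# the user's HDFS home directory plus the table name.
DEFAULT_DIR="/user/${HDFS_USER}/${TABLE}"
echo "Without --target-dir, Sqoop would write to: ${DEFAULT_DIR}"

# Only touch the cluster if both tools are installed.
if command -v sqoop >/dev/null 2>&1 && command -v hadoop >/dev/null 2>&1; then
    # -P prompts for the password instead of exposing it on the command line.
    sqoop import --connect jdbc:mysql://secondmgt:3306/spice \
        --username hive -P --table "$TABLE"
    # Each map task writes its own part-m-NNNNN file into the directory.
    hadoop fs -ls "$DEFAULT_DIR"
fi
```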
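V. Importing a subset of the data
1. Selecting columns with --columns
By default, Sqoop imports every column of every record in the table. Sometimes only some of the columns are needed; in that case, use the --columns parameter to name the columns to import, separating multiple columns with commas. The following imports the username, sex, and datetime columns of the users table into HDFS: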
- [hadoopUser@secondmgt ~]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive \
- > --table users --columns "username,sex,datetime" --target-dir /output/sqoop/
- [hadoopUser@secondmgt ~]$ hadoop fs -cat /output/sqoop/*
- hua,男,2013-12-02
- feng,男,2013-11-22
- test,男,2014-03-05
- user1,男,2014-06-26
- user2,男,2013-12-13
- user3,男,2013-12-14
- kai.zhou,?,2014-03-05
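Note that the command above writes to /output/sqoop/ again, the directory the full-table import already filled. An import into an existing output directory normally fails, so the directory presumably was cleared between runs, although the original does not show that step. A hedged sketch of one way to do it, assuming the same paths and columns as above:

```shell
#!/bin/sh
TARGET_DIR="/output/sqoop"
COLUMNS="username,sex,datetime"

if command -v hadoop >/dev/null 2>&1 && command -v sqoop >/dev/null 2>&1; then
    # Remove output left behind by a previous import; -f suppresses the
    # error when the directory does not exist yet.
    hadoop fs -rm -r -f "$TARGET_DIR"

    # -P prompts for the password on the console.
    sqoop import --connect jdbc:mysql://secondmgt:3306/spice \
        --username hive -P \
        --table users --columns "$COLUMNS" --target-dir "$TARGET_DIR"
else
    echo "hadoop/sqoop not found; skipping import into $TARGET_DIR"
fi
```

If your Sqoop version supports it, the --delete-target-dir option on the import command may replace the explicit `hadoop fs -rm` step.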
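Another parameter, --where, filters rows so that only the records matching a condition are imported rather than the whole table. The following imports the rows of the users table whose id is greater than 60 into HDFS: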
- [hadoopUser@secondmgt conf]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive \
- > --table users --where " id > 60" --target-dir /output/sqoop/
- [hadoopUser@secondmgt conf]$ hadoop fs -cat /output/sqoop/*
- 61,user2,123,男,开通,2013-12-13,56,0
- 62,user3,123456,男,开通,2013-12-14,0,0
- 64,kai.zhou,123456,?,??,2014-03-05,65,0
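--columns and --where can also be used together in a single import. The sketch below is an illustration under stated assumptions, not part of the original article: the target directory /output/sqoop_filtered is a hypothetical name, and the SQL string is only an approximation of what Sqoop sends, since Sqoop also appends its own range conditions on the split column for each map task.

```shell
#!/bin/sh
TABLE="users"
COLUMNS="username,sex,datetime"
WHERE="id > 60"
# Hypothetical directory, chosen so it does not collide with earlier runs.
TARGET_DIR="/output/sqoop_filtered"

# Approximate shape of the resulting query, for orientation only.
SQL="SELECT ${COLUMNS} FROM ${TABLE} WHERE ${WHERE}"
echo "$SQL"

if command -v sqoop >/dev/null 2>&1; then
    # -P prompts for the password on the console.
    sqoop import --connect jdbc:mysql://secondmgt:3306/spice \
        --username hive -P \
        --table "$TABLE" --columns "$COLUMNS" --where "$WHERE" \
        --target-dir "$TARGET_DIR"
fi
```

With the data shown earlier, this would be expected to keep only the username, sex, and datetime values of the rows for user2, user3, and kai.zhou.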