Sqoop 1
1. Import MySQL data into HDFS with Sqoop
[root@srv01 ~]# sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --target-dir '/sqoop/user'; // -m sets the number of map tasks; --target-dir sets the HDFS output directory
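The same connection flags recur in every example below. As a small convenience, a shell helper can assemble the command string from just the table name and target directory; this helper is hypothetical (not part of Sqoop), and the connection values are the ones used above:

```shell
# Hypothetical helper: prints a complete sqoop import command, keeping the
# connection settings in one place. Run the printed string to do the import.
build_import_cmd() {
  local table=$1 target_dir=$2
  echo "sqoop import --connect jdbc:mysql://localhost:3306/test" \
       "--username root --password root" \
       "--table $table -m 1 --target-dir $target_dir"
}

build_import_cmd user /sqoop/user
```

Wrapping the boilerplate this way also makes it harder to mistype a flag (e.g. the single-dash `-target-dir`, which Sqoop rejects).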
2. Import MySQL data into Hive with Sqoop
[root@srv01 ~]# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1
3. Import MySQL data into Hive, specifying the Hive table name
[root@srv01 ~]# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --hive-table user1; // if the table does not exist in Hive, it is created to hold the imported data
4. Import MySQL data into Hive with a WHERE condition
[root@srv01 ~]# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --hive-table user2 --where 'uid=10'
5. Import MySQL data into Hive using a free-form query
[root@srv01 ~]# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root -m 1 --hive-table user6 --query 'select * from user where uid<10 and $CONDITIONS' --target-dir /sqoop/user5
// `and $CONDITIONS` must appear in the query (note the uppercase placeholder); without it, Sqoop reports an error
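A shell-quoting point worth emphasizing: the `$CONDITIONS` placeholder must reach Sqoop literally, so the `--query` value belongs in single quotes; inside double quotes the shell would expand `$CONDITIONS` to an empty string before Sqoop ever sees it. A quick demonstration with plain `echo`:

```shell
# Single quotes keep $CONDITIONS literal, so Sqoop can substitute its own
# split predicates into the query at run time.
QUERY='select * from user where uid<10 and $CONDITIONS'
echo "$QUERY"

# In double quotes the shell expands the (unset) variable and the
# placeholder disappears from the query:
echo "select * from user where uid<10 and $CONDITIONS"
```

If you must use double quotes, escape the placeholder as `\$CONDITIONS`.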
6. Export data from Hive to MySQL with Sqoop
[root@srv01 ~]# sqoop export --connect jdbc:mysql://localhost:3306/test --username root --password root -m 1 --table user5 --export-dir /sqoop/user5 // the number and types of columns in the two tables must match
Sqoop 2
sqoop2-shell
Starting sqoop2-shell
jjzhu:bin didi$ sqoop2-shell
Setting conf dir: /opt/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/sqoop-1.99.7
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
sqoop:000> show version --all
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]
sqoop:000>
Configure the sqoop server
sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
Check that the server connection is available
sqoop:000> show version --all
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]
sqoop:000>
Creating links
List the connectors available on the sqoop server
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+
sqoop:000>
generic-jdbc-connector
A connector built on Java JDBC; it can serve as either the source or the destination of a transfer.
hdfs-connector
A connector that uses HDFS as the source or the destination.
Create a generic-jdbc-connector link with the following command:
sqoop:002> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql_weibouser_link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://127.0.0.1:3306/spider
Username: root
Password: ****
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#

SQL Dialect

Identifier enclose:   // Note: do not simply press Enter here; type a single space instead! If you press Enter, Sqoop wraps table names in "" when querying MySQL and the queries fail.

New link was successfully created with validation status OK and name mysql_weibouser_link
Create the HDFS link
sqoop:002> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs_weibouser_link

HDFS cluster

URI: hdfs://localhost:9000
Conf directory: /opt/hadoop-2.7.3/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name hdfs_weibouser_link
View the links
sqoop:002> show link
+----------------------+------------------------+---------+
|         Name         |     Connector Name     | Enabled |
+----------------------+------------------------+---------+
| mysql_weibouser      | generic-jdbc-connector | true    |
| mysql_weibouser_link | generic-jdbc-connector | true    |
| hdfs_link            | hdfs-connector         | true    |
| hdfs_link2           | hdfs-connector         | true    |
| hdfs_weibouser_link  | hdfs-connector         | true    |
+----------------------+------------------------+---------+
Create a job
sqoop:002> create job -f "mysql_weibouser_link" -t "hdfs_weibouser_link"
Creating job for links with from name mysql_weibouser_link and to name hdfs_weibouser_link
Please fill following values to create new job object
Name: job_weibouser

Database source

Schema name: spider
Table name: spiders_weibouser
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec:
Output directory: hdfs://localhost:9000/usr/jjzhu/spider/spiders_weibouser
Append mode:

Throttling resources

Extractors: 2
Loaders: 2

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name job_weibouser
Meaning of each field:
Name: an identifier; pick whatever you like.
Schema name: the name of the Database/Schema. In MySQL a schema is much the same as a database; I have not dug into the exact difference, but the docs treat them almost identically at creation time.
Table name: the table to transfer.
SQL statement: a free-form query. The documentation says it must contain a $CONDITIONS placeholder, apparently a condition clause, though I never managed to create a job this way.
After the fields above, the element# prompt appears, asking for some map entries; just press Enter to skip:
Partition column:
Partition column nullable:
Boundary query:
Last value:
Then come the settings for the destination:
Null value: roughly, what to write in place of null values.
File format: the format of the data files in HDFS; TEXT_FILE, the simplest plain-text format, is used here.
Compression codec: the compression algorithm for the exported data files; NONE here. CUSTOM allows a user-defined algorithm implemented against the corresponding Java interface.
Custom codec: the custom algorithm itself; since NONE was chosen, just press Enter.
Output directory: the storage path in HDFS. It seems best to give an existing path, or one that exists but is empty; apparently only then does the job succeed.
Append mode: whether to append new data to already-existing export files.
Extractors: 2
Loaders: 2
Finally the element# prompt appears once more, for the extra mapper jars; leave it blank and press Enter.

If "successful" is reported, the job was created.
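Repeating this interactive Q&A by hand gets tedious. The Sqoop2 shell can also execute a batch script file (e.g. `sqoop2-shell /path/to/script.sqoop`, per the Sqoop 1.99 client documentation). The sketch below only generates such a script rather than running it; the commands inside are the ones used in this walkthrough:

```shell
# Write a batch script for sqoop2-shell; running it afterwards with
#   sqoop2-shell /tmp/sqoop_batch.sqoop
# would replay the server setup and list the jobs non-interactively.
cat > /tmp/sqoop_batch.sqoop <<'EOF'
set server --host localhost --port 12000 --webapp sqoop
show job
EOF

cat /tmp/sqoop_batch.sqoop
```

Keeping the `set server` line in the script means each batch run starts from a known server configuration.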
View the created jobs
sqoop:002> show job
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
| Id |     Name      |                From Connector                 |             To Connector             | Enabled |
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
| 1  | spider_job    | mysql_weibouser (generic-jdbc-connector)      | hdfs_link (hdfs-connector)           | true    |
| 2  | job_weibouser | mysql_weibouser_link (generic-jdbc-connector) | hdfs_weibouser_link (hdfs-connector) | true    |
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
sqoop:002>
Start the job
sqoop:002> start job -n job_weibouser
Submission details
Job Name: job_weibouser
Server URL: http://localhost:12000/sqoop/
Created by: didi
Creation date: 2017-04-11 14:37:46 CST
Lastly updated by: didi
External ID: job_1491888730134_0003
        http://jjzhu:8088/proxy/application_1491888730134_0003/
2017-04-11 14:37:46 CST: BOOTING  - Progress is not available
Check the job status
sqoop:002> status job -n job_weibouser
Submission details
Job Name: job_weibouser
Server URL: http://localhost:12000/sqoop/
Created by: didi
Creation date: 2017-04-11 14:37:46 CST
Lastly updated by: didi
External ID: job_1491888730134_0003
        http://jjzhu:8088/proxy/application_1491888730134_0003/
2017-04-11 14:38:41 CST: SUCCEEDED
Counters:
        org.apache.hadoop.mapreduce.FileSystemCounter
                FILE_LARGE_READ_OPS: 0
                FILE_WRITE_OPS: 0
                HDFS_READ_OPS: 2
                HDFS_BYTES_READ: 290
                HDFS_LARGE_READ_OPS: 0
                FILE_READ_OPS: 0
                FILE_BYTES_WRITTEN: 51361466
                FILE_BYTES_READ: 25115854
                HDFS_WRITE_OPS: 2
                HDFS_BYTES_WRITTEN: 24652721
        org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
                BYTES_WRITTEN: 0
        org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
                BYTES_READ: 0
        org.apache.hadoop.mapreduce.JobCounter
                TOTAL_LAUNCHED_MAPS: 2
                VCORES_MILLIS_REDUCES: 20225
                MB_MILLIS_MAPS: 27120640
                TOTAL_LAUNCHED_REDUCES: 2
                SLOTS_MILLIS_REDUCES: 20225
                VCORES_MILLIS_MAPS: 26485
                MB_MILLIS_REDUCES: 20710400
                SLOTS_MILLIS_MAPS: 26485
                MILLIS_REDUCES: 20225
                OTHER_LOCAL_MAPS: 2
                MILLIS_MAPS: 26485
        org.apache.sqoop.submission.counter.SqoopCounters
                ROWS_READ: 109408
                ROWS_WRITTEN: 109408
        org.apache.hadoop.mapreduce.TaskCounter
                MAP_OUTPUT_MATERIALIZED_BYTES: 25115866
                REDUCE_INPUT_RECORDS: 109408
                SPILLED_RECORDS: 218816
                MERGED_MAP_OUTPUTS: 4
                VIRTUAL_MEMORY_BYTES: 0
                MAP_INPUT_RECORDS: 0
                SPLIT_RAW_BYTES: 290
                FAILED_SHUFFLE: 0
                MAP_OUTPUT_BYTES: 24762129
                REDUCE_SHUFFLE_BYTES: 25115866
                PHYSICAL_MEMORY_BYTES: 0
                GC_TIME_MILLIS: 1648
                REDUCE_INPUT_GROUPS: 109408
                COMBINE_OUTPUT_RECORDS: 0
                SHUFFLED_MAPS: 4
                REDUCE_OUTPUT_RECORDS: 109408
                MAP_OUTPUT_RECORDS: 109408
                COMBINE_INPUT_RECORDS: 0
                CPU_MILLISECONDS: 0
                COMMITTED_HEAP_BYTES: 1951399936
        Shuffle Errors
                CONNECTION: 0
                WRONG_LENGTH: 0
                BAD_ID: 0
                WRONG_MAP: 0
                WRONG_REDUCE: 0
                IO_ERROR: 0
Job executed successfully
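The counters worth checking first are the Sqoop row counts: ROWS_READ should equal ROWS_WRITTEN. As a sketch, a quick `awk` comparison over the transcript; the excerpt is inlined as a string here, whereas in practice you would capture the `status job` output:

```shell
# Extract ROWS_READ / ROWS_WRITTEN from a saved `status job` transcript
# (sample lines taken from the run above) and compare them.
status='ROWS_READ: 109408
ROWS_WRITTEN: 109408'

rows_read=$(printf '%s\n' "$status" | awk '/ROWS_READ/ {print $2}')
rows_written=$(printf '%s\n' "$status" | awk '/ROWS_WRITTEN/ {print $2}')

if [ "$rows_read" = "$rows_written" ]; then
  echo "row counts match: $rows_read"
else
  echo "mismatch: read=$rows_read written=$rows_written"
fi
```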
Check the relevant HDFS path to see whether the output files were written
jjzhu:~ didi$ hdfs dfs -ls /usr/jjzhu/spider
Found 4 items
drwxr-xr-x   - didi supergroup          0 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser
drwxr-xr-x   - 777  supergroup          0 2017-04-11 10:58 /usr/jjzhu/spider/weibouser
drwxr-xr-x   - 777  supergroup          0 2017-04-11 13:33 /usr/jjzhu/spider/weobouser
drwxr-xr-x   - didi supergroup          0 2017-04-11 13:39 /usr/jjzhu/spider/weobouser2
jjzhu:~ didi$ hdfs dfs -ls /usr/jjzhu/spider/spiders_weibouser
Found 2 items
-rw-r--r--   1 didi supergroup   12262783 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser/33b56441-b638-48cc-8d0d-37a808f25653.txt
-rw-r--r--   1 didi supergroup   12389938 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser/73b20d50-de72-4aea-8c8c-d97cdc48e667.txt
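As a final consistency check, the two output files should add up to the job's HDFS_BYTES_WRITTEN counter (24652721 above). A sketch that sums the size column of an `-ls` listing with `awk`; the listing is inlined here, whereas in practice you would pipe `hdfs dfs -ls` output directly:

```shell
# Sum column 5 (file size in bytes) of an HDFS listing. The two sizes are the
# ones reported above; together they total 24652721, matching HDFS_BYTES_WRITTEN.
listing='-rw-r--r--   1 didi supergroup   12262783 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser/33b56441-b638-48cc-8d0d-37a808f25653.txt
-rw-r--r--   1 didi supergroup   12389938 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser/73b20d50-de72-4aea-8c8c-d97cdc48e667.txt'

printf '%s\n' "$listing" | awk '{sum += $5} END {print sum " bytes total"}'
# prints: 24652721 bytes total
```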
Source: https://yq.aliyun.com/articles/73582