Sqoop export and table creation: importing and exporting with Sqoop1 and Sqoop2

sqoop1

1. Import MySQL data into HDFS with Sqoop

[root@srv01 ~]# sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --target-dir '/sqoop/user'
// -m sets the number of map tasks; --target-dir sets the HDFS directory to store the data
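
To confirm the import, the target directory can be listed and one of the generated files inspected (a minimal check; part-m-00000 is the usual file name for a single-mapper import and may differ on your cluster):

[root@srv01 ~]# hdfs dfs -ls /sqoop/user
[root@srv01 ~]# hdfs dfs -cat /sqoop/user/part-m-00000 | head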

2. Import MySQL data into Hive with Sqoop

[root@srv01 ~]# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1

3. Import MySQL data into Hive with Sqoop, specifying the Hive table name

[root@srv01 ~]# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --hive-table user1
// If the table does not exist in Hive, it is created to hold the imported data
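
If the job should fail instead of silently reusing an existing Hive table, Sqoop also has a --create-hive-table flag; a variant of the command above might look like this:

[root@srv01 ~]# sqoop import --hive-import --create-hive-table --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --hive-table user1
// --create-hive-table aborts the job if the target Hive table already exists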

4. Import MySQL data into Hive with Sqoop, using a WHERE condition

[root@srv01 ~]# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --hive-table user2 --where 'uid=10'
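
A closely related pattern is an incremental import, where each run pulls only rows beyond a remembered value; a sketch, assuming uid is a monotonically increasing key and /sqoop/user_incr is just an example directory:

[root@srv01 ~]# sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root --table user --columns 'uid,uname' -m 1 --target-dir /sqoop/user_incr --incremental append --check-column uid --last-value 10
// Imports only rows with uid > 10 and prints the new --last-value to use on the next run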

5. Import MySQL data into Hive with Sqoop, using a free-form query

[root@srv01 ~]# sqoop import --hive-import --connect jdbc:mysql://localhost:3306/test --username root --password root -m 1 --hive-table user6 --query 'select * from user where uid<10 and $CONDITIONS' --target-dir /sqoop/user5

// The query must contain the $CONDITIONS token (appended here with AND); without it Sqoop reports an error
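
The $CONDITIONS placeholder is what allows Sqoop to split a free-form query across multiple map tasks: each mapper substitutes its own range predicate for it. A sketch with more than one mapper (which then also requires --split-by; /sqoop/user_parallel is just an example directory):

[root@srv01 ~]# sqoop import --connect jdbc:mysql://localhost:3306/test --username root --password root -m 4 --split-by uid --query 'select * from user where uid<10 and $CONDITIONS' --target-dir /sqoop/user_parallel
// Each of the 4 map tasks replaces $CONDITIONS with its own uid range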

6. Export data from Hive to MySQL with Sqoop

[root@srv01 ~]# sqoop export --connect jdbc:mysql://localhost:3306/test --username root --password root -m 1 --table user5 --export-dir /sqoop/user5
// The MySQL table and the exported data must have the same number of columns, with compatible types
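
The target table must already exist in MySQL, with a compatible column layout, before running the export. A hedged sketch of preparing it (the INT/VARCHAR types are assumptions based on the uid/uname columns used earlier) and of exporting data that sits in the Hive warehouse, which by default uses the \001 field delimiter rather than a comma:

mysql> CREATE TABLE user5 (uid INT, uname VARCHAR(100));  -- assumed column types

[root@srv01 ~]# sqoop export --connect jdbc:mysql://localhost:3306/test --username root --password root -m 1 --table user5 --export-dir /user/hive/warehouse/user5 --input-fields-terminated-by '\001'
// /user/hive/warehouse is the typical default Hive warehouse path; adjust it to where your table actually lives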

sqoop2

sqoop-shell

Start sqoop2-shell

jjzhu:bin didi$ sqoop2-shell
Setting conf dir: /opt/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/sqoop-1.99.7
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000> set server --host localhost --port 12000 --webapp sqoop
Server is set successfully
sqoop:000> show version --all
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]
sqoop:000>

Configure the Sqoop server

sqoop:000> set server --host localhost --port 12000 --webapp sqoop

Server is set successfully
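
While you are in the shell, it can also help to enable verbose output so that errors come with full stack traces; Sqoop 1.99.x exposes this as a client-side option:

sqoop:000> set option --name verbose --value true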

Check whether the server connection is usable

sqoop:000> show version --all

client version:

  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
server version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
API versions:
  [v1]
sqoop:000>

Create links

List the connectors available on the Sqoop server

sqoop:000> show connector
+------------------------+---------+-------------------------------------------------------------+----------------------+
| Name                   | Version | Class                                                       | Supported Directions |
+------------------------+---------+-------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector               | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                 | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector               | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector             | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector               | TO                   |
+------------------------+---------+-------------------------------------------------------------+----------------------+
sqoop:000>

generic-jdbc-connector

A connector built on Java JDBC; it can act as either the source or the target of a transfer.

hdfs-connector

A connector that uses HDFS as the source or the target.

Create a generic-jdbc-connector link with the following command:

sqoop:002> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql_weibouser_link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://127.0.0.1:3306/spider
Username: root
Password: ****
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#

SQL Dialect

Identifier enclose:   // Note: do not simply press Enter here! Type a single space instead. If you leave it at the default, Sqoop wraps MySQL table names in "" when querying, and the query fails.
New link was successfully created with validation status OK and name mysql_weibouser_link

Create an HDFS link

sqoop:002> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs_weibouser_link

HDFS cluster

URI: hdfs://localhost:9000
Conf directory: /opt/hadoop-2.7.3/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name hdfs_weibouser_link
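
If a value was mistyped, the link does not have to be deleted and recreated: in 1.99.7 links are addressed by name, so it should be possible to edit one in place and walk through the same prompts again (a sketch; check help update in your shell if the flag differs):

sqoop:002> update link -n hdfs_weibouser_link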

List the links

sqoop:002> show link
+----------------------+------------------------+---------+
| Name                 | Connector Name         | Enabled |
+----------------------+------------------------+---------+
| mysql_weibouser      | generic-jdbc-connector | true    |
| mysql_weibouser_link | generic-jdbc-connector | true    |
| hdfs_link            | hdfs-connector         | true    |
| hdfs_link2           | hdfs-connector         | true    |
| hdfs_weibouser_link  | hdfs-connector         | true    |
+----------------------+------------------------+---------+

Create a job

sqoop:002> create job -f "mysql_weibouser_link" -t "hdfs_weibouser_link"
Creating job for links with from name mysql_weibouser_link and to name hdfs_weibouser_link
Please fill following values to create new job object
Name: job_weibouser

Database source

Schema name: spider
Table name: spiders_weibouser
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec:
Output directory: hdfs://localhost:9000/usr/jjzhu/spider/spiders_weibouser
Append mode:

Throttling resources

Extractors: 2
Loaders: 2

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name job_weibouser


Meaning of each field:

Name: an identifier for the job; choose whatever you like.
Schema name: the Database/Schema name. In MySQL a Schema is much the same as a Database; I have not dug into the exact difference, but the official docs describe them almost identically at creation time.
Table name: the table to transfer.
SQL statement: a free-form SQL query. The documentation says it must contain a $CONDITIONS token, which appears to be a condition clause, but I never managed to create a job this way.
After these items an element# prompt appears asking for extra values; just press Enter past it, as with the following fields:
Partition column:
Partition column nullable:
Boundary query:
Last value:
The remaining fields configure the target side:
Null value: roughly, the string used to stand in for NULL values.
File format: the format of the data files written to HDFS; TEXT_FILE, the simplest plain-text format, is used here.
Compression codec: the compression algorithm applied to the exported files; NONE here. CUSTOM lets you supply your own codec by implementing the corresponding Java interface.
Custom codec: the custom codec class; since NONE was chosen, just press Enter past it.
Output directory: the destination path in HDFS. It is best to point at a path that does not exist yet, or exists but is empty; otherwise the job seems to fail.
Append mode: whether new data should be appended when exported files already exist.
Extractors: 2
Loaders: 2
Finally another element# prompt appears for the extra mapper jars list; leave it empty and press Enter.

If "successfully" appears at this point, the job has been created.
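
Since the output directory is expected to be empty or absent, it can help to clear it before re-running the job (a minimal sketch, using the directory chosen above):

jjzhu:~ didi$ hdfs dfs -rm -r /usr/jjzhu/spider/spiders_weibouser
// Deletes the previous output so the next run writes into a clean target directory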


List the created jobs

sqoop:002> show job
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
| Id | Name          | From Connector                                | To Connector                         | Enabled |
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
| 1  | spider_job    | mysql_weibouser (generic-jdbc-connector)      | hdfs_link (hdfs-connector)           | true    |
| 2  | job_weibouser | mysql_weibouser_link (generic-jdbc-connector) | hdfs_weibouser_link (hdfs-connector) | true    |
+----+---------------+-----------------------------------------------+--------------------------------------+---------+
sqoop:002>


Start the job

sqoop:002> start job -n job_weibouser
Submission details
Job Name: job_weibouser
Server URL: http://localhost:12000/sqoop/
Created by: didi
Creation date: 2017-04-11 14:37:46 CST
Lastly updated by: didi
External ID: job_1491888730134_0003
        http://jjzhu:8088/proxy/application_1491888730134_0003/
2017-04-11 14:37:46 CST: BOOTING - Progress is not available
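
If a submission needs to be cancelled, the shell can also stop it by job name, mirroring start job and status job (a sketch, using the job created above):

sqoop:002> stop job -n job_weibouser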


Check the job's running status

sqoop:002> status job -n job_weibouser
Submission details
Job Name: job_weibouser
Server URL: http://localhost:12000/sqoop/
Created by: didi
Creation date: 2017-04-11 14:37:46 CST
Lastly updated by: didi
External ID: job_1491888730134_0003
        http://jjzhu:8088/proxy/application_1491888730134_0003/
2017-04-11 14:38:41 CST: SUCCEEDED
Counters:
    org.apache.hadoop.mapreduce.FileSystemCounter
        FILE_LARGE_READ_OPS: 0
        FILE_WRITE_OPS: 0
        HDFS_READ_OPS: 2
        HDFS_BYTES_READ: 290
        HDFS_LARGE_READ_OPS: 0
        FILE_READ_OPS: 0
        FILE_BYTES_WRITTEN: 51361466
        FILE_BYTES_READ: 25115854
        HDFS_WRITE_OPS: 2
        HDFS_BYTES_WRITTEN: 24652721
    org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
        BYTES_WRITTEN: 0
    org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
        BYTES_READ: 0
    org.apache.hadoop.mapreduce.JobCounter
        TOTAL_LAUNCHED_MAPS: 2
        VCORES_MILLIS_REDUCES: 20225
        MB_MILLIS_MAPS: 27120640
        TOTAL_LAUNCHED_REDUCES: 2
        SLOTS_MILLIS_REDUCES: 20225
        VCORES_MILLIS_MAPS: 26485
        MB_MILLIS_REDUCES: 20710400
        SLOTS_MILLIS_MAPS: 26485
        MILLIS_REDUCES: 20225
        OTHER_LOCAL_MAPS: 2
        MILLIS_MAPS: 26485
    org.apache.sqoop.submission.counter.SqoopCounters
        ROWS_READ: 109408
        ROWS_WRITTEN: 109408
    org.apache.hadoop.mapreduce.TaskCounter
        MAP_OUTPUT_MATERIALIZED_BYTES: 25115866
        REDUCE_INPUT_RECORDS: 109408
        SPILLED_RECORDS: 218816
        MERGED_MAP_OUTPUTS: 4
        VIRTUAL_MEMORY_BYTES: 0
        MAP_INPUT_RECORDS: 0
        SPLIT_RAW_BYTES: 290
        FAILED_SHUFFLE: 0
        MAP_OUTPUT_BYTES: 24762129
        REDUCE_SHUFFLE_BYTES: 25115866
        PHYSICAL_MEMORY_BYTES: 0
        GC_TIME_MILLIS: 1648
        REDUCE_INPUT_GROUPS: 109408
        COMBINE_OUTPUT_RECORDS: 0
        SHUFFLED_MAPS: 4
        REDUCE_OUTPUT_RECORDS: 109408
        MAP_OUTPUT_RECORDS: 109408
        COMBINE_INPUT_RECORDS: 0
        CPU_MILLISECONDS: 0
        COMMITTED_HEAP_BYTES: 1951399936
    Shuffle Errors
        CONNECTION: 0
        WRONG_LENGTH: 0
        BAD_ID: 0
        WRONG_MAP: 0
        WRONG_REDUCE: 0
        IO_ERROR: 0
Job executed successfully

Check the corresponding HDFS path to confirm that output files were written

jjzhu:~ didi$ hdfs dfs -ls /usr/jjzhu/spider
Found 4 items
drwxr-xr-x   - didi supergroup          0 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser
drwxr-xr-x   - 777  supergroup          0 2017-04-11 10:58 /usr/jjzhu/spider/weibouser
drwxr-xr-x   - 777  supergroup          0 2017-04-11 13:33 /usr/jjzhu/spider/weobouser
drwxr-xr-x   - didi supergroup          0 2017-04-11 13:39 /usr/jjzhu/spider/weobouser2
jjzhu:~ didi$ hdfs dfs -ls /usr/jjzhu/spider/spiders_weibouser
Found 2 items
-rw-r--r--   1 didi supergroup   12262783 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser/33b56441-b638-48cc-8d0d-37a808f25653.txt
-rw-r--r--   1 didi supergroup   12389938 2017-04-11 14:38 /usr/jjzhu/spider/spiders_weibouser/73b20d50-de72-4aea-8c8c-d97cdc48e667.txt
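
To spot-check what was written, the text files can be read directly from HDFS (a minimal sketch; the glob simply matches the two .txt files listed above):

jjzhu:~ didi$ hdfs dfs -cat '/usr/jjzhu/spider/spiders_weibouser/*.txt' | head -n 5
// Each line is one row transferred from the spiders_weibouser table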

Source: https://yq.aliyun.com/articles/73582
