Incremental Imports
Append-mode incremental import on an increasing column
# Initial import (rows with actor_id < 50)
[root@node222 ~]# /usr/local/sqoop-1.4.7/bin/sqoop import --connect jdbc:mysql://192.168.0.200:3306/sakila?useSSL=false --table actor --where "actor_id < 50" --username sakila -P --num-mappers 1 --target-dir /tmp/hive/sqoop/actor_all
...
18/10/15 14:32:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
...
18/10/15 14:32:34 INFO mapreduce.Job: Running job: job_1539583112983_0002
18/10/15 14:32:49 INFO mapreduce.Job: Job job_1539583112983_0002 running in uber mode : false
18/10/15 14:32:49 INFO mapreduce.Job: map 0% reduce 0%
18/10/15 14:33:06 INFO mapreduce.Job: map 100% reduce 0%
18/10/15 14:33:07 INFO mapreduce.Job: Job job_1539583112983_0002 completed successfully
18/10/15 14:33:08 INFO mapreduce.Job: Counters: 30
...
18/10/15 14:33:08 INFO mapreduce.ImportJobBase: Transferred 1.8262 KB in 40.9516 seconds (45.6636 bytes/sec)
18/10/15 14:33:08 INFO mapreduce.ImportJobBase: Retrieved 49 records.
[hadoop@node224 ~]$ hdfs dfs -cat /tmp/hive/sqoop/actor_all/part-m-00000
1,PENELOPE,GUINESS,2006-02-15 04:34:33.0
...
48,FRANCES,DAY-LEWIS,2006-02-15 04:34:33.0
49,ANNE,CRONYN,2006-02-15 04:34:33.0
[hadoop@node224 ~]$
# Append-mode incremental import of the rows beyond the initial actor_id < 50 load
# Specify the incremental mode
--incremental (append|lastmodified)
# Specify the column checked to decide which rows are new
--check-column (col)
# Specify the last imported value; only rows strictly greater than it are imported
--last-value (value)
# Run the append-mode incremental import
[root@node222 ~]# /usr/local/sqoop-1.4.7/bin/sqoop import --connect jdbc:mysql://192.168.0.200:3306/sakila?useSSL=false --table actor --username sakila -P --incremental append --check-column actor_id --last-value 49 --num-mappers 1 --target-dir /tmp/hive/sqoop/actor_all
...
Enter password:
...
18/10/15 14:43:03 INFO mapreduce.Job: Running job: job_1539583112983_0003
18/10/15 14:43:19 INFO mapreduce.Job: Job job_1539583112983_0003 running in uber mode : false
18/10/15 14:43:19 INFO mapreduce.Job: map 0% reduce 0%
18/10/15 14:43:34 INFO mapreduce.Job: map 100% reduce 0%
18/10/15 14:43:35 INFO mapreduce.Job: Job job_1539583112983_0003 completed successfully
18/10/15 14:43:35 INFO mapreduce.Job: Counters: 30
...
18/10/15 14:43:35 INFO mapreduce.ImportJobBase: Transferred 5.79 KB in 38.6992 seconds (153.2074 bytes/sec)
18/10/15 14:43:35 INFO mapreduce.ImportJobBase: Retrieved 151 records.
18/10/15 14:43:35 INFO util.AppendUtils: Appending to directory actor_all
18/10/15 14:43:35 INFO util.AppendUtils: Using found partition 1
18/10/15 14:43:35 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:
18/10/15 14:43:35 INFO tool.ImportTool: --incremental append
18/10/15 14:43:35 INFO tool.ImportTool: --check-column actor_id
18/10/15 14:43:35 INFO tool.ImportTool: --last-value 200
18/10/15 14:43:35 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
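As the log line above suggests, tracking `--last-value` by hand can be avoided by saving the import as a Sqoop job; the metastore then records the new high-water mark after every successful run. A sketch, with the job name `incr_actor` being an assumption, not part of the original session:

```shell
# Save the append-mode import as a reusable job (job name is hypothetical);
# Sqoop stores the updated --last-value in its metastore after each run.
/usr/local/sqoop-1.4.7/bin/sqoop job --create incr_actor -- import \
  --connect jdbc:mysql://192.168.0.200:3306/sakila?useSSL=false \
  --table actor --username sakila -P \
  --incremental append --check-column actor_id --last-value 200 \
  --num-mappers 1 --target-dir /tmp/hive/sqoop/actor_all

# Later runs resume from the stored last-value automatically:
/usr/local/sqoop-1.4.7/bin/sqoop job --exec incr_actor
```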
[hadoop@node224 ~]$ hdfs dfs -cat /tmp/hive/sqoop/actor_all/part-m-00001
50,NATALIE,HOPKINS,2006-02-15 04:34:33.0
...
200,JULIA,FAWCETT,2006-02-15 04:34:33.0
Lastmodified-mode incremental import on a timestamp column
# Randomize last_update in actor_new so the timestamps are no longer identical
UPDATE actor_new SET last_update = DATE_ADD(last_update, INTERVAL (FLOOR(RAND()*199+1)) DAY);
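The same randomization can be run from the shell with the `mysql` client (host and credentials taken from the import commands in this walkthrough); `FLOOR(RAND()*199+1)` shifts each row forward by 1 to 199 days:

```shell
# Shift every last_update in actor_new forward by a random 1-199 days
# so that the rows carry distinct modification times.
mysql -h 192.168.0.200 -u sakila -p sakila -e \
  "UPDATE actor_new SET last_update = DATE_ADD(last_update, INTERVAL FLOOR(RAND()*199+1) DAY);"
```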
# Initial import (rows with last_update < '2006-04-25 04:34:33')
[root@node222 ~]# /usr/local/sqoop-1.4.7/bin/sqoop import --connect jdbc:mysql://192.168.0.200:3306/sakila?useSSL=false --table actor_new --where "last_update < '2006-04-25 04:34:33'" --username sakila -P --num-mappers 1 --target-dir /tmp/hive/sqoop/actor_lastmodified --delete-target-dir
...
18/10/15 14:57:23 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
...
18/10/15 14:57:42 INFO mapreduce.Job: Running job: job_1539583112983_0004
18/10/15 14:58:01 INFO mapreduce.Job: Job job_1539583112983_0004 running in uber mode : false
18/10/15 14:58:01 INFO mapreduce.Job: map 0% reduce 0%
18/10/15 14:58:22 INFO mapreduce.Job: map 100% reduce 0%
18/10/15 14:58:23 INFO mapreduce.Job: Job job_1539583112983_0004 completed successfully
...
18/10/15 14:58:23 INFO mapreduce.ImportJobBase: Transferred 2.6592 KB in 32.9053 seconds (82.7527 bytes/sec)
18/10/15 14:58:23 INFO mapreduce.ImportJobBase: Retrieved 69 records.
# When importing with the lastmodified incremental mode
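The note above is cut off, but the corresponding lastmodified run would look roughly like the following sketch. The `--last-value` is taken from the `--where` filter of the initial import; `--merge-key actor_id` is an assumption here — lastmodified mode into an existing target directory requires either `--append` or `--merge-key` so Sqoop knows how to combine new rows with the data already on HDFS:

```shell
# Lastmodified-mode incremental import sketch: pulls rows whose last_update is
# at or after --last-value, then merges them into the target directory,
# deduplicating on actor_id (--merge-key is an assumed choice).
/usr/local/sqoop-1.4.7/bin/sqoop import \
  --connect jdbc:mysql://192.168.0.200:3306/sakila?useSSL=false \
  --table actor_new --username sakila -P \
  --incremental lastmodified --check-column last_update \
  --last-value '2006-04-25 04:34:33' \
  --merge-key actor_id \
  --num-mappers 1 --target-dir /tmp/hive/sqoop/actor_lastmodified
```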