Sqoop: Transferring Data Between Oracle and HDFS

This example uses Oracle Database 10.2.0.2, Hadoop 1.2.0, and Sqoop 1.4.3. The sample data comes from Oracle's built-in sample schemas.

I. From Oracle to HDFS

1. Edit the script

[hadoop@node1 bin]$ vi oracle2hdfs.sh
# JDBC connection string
CONNECTURL=jdbc:oracle:thin:@10.190.105.65:1521:PROD
# Oracle username; note that it must be upper case
ORANAME=HR
# Oracle user password
ORAPWD=oracle
# Table to export
oraTName=COUNTRIES
# Columns to export
columns=COUNTRY_ID,COUNTRY_NAME,REGION_ID
# Target location in HDFS
hdfsPath=/user/hadoop/test/$oraTName
./sqoop import --append --connect $CONNECTURL --username $ORANAME --password $ORAPWD --target-dir $hdfsPath --num-mappers 1 --table $oraTName --columns $columns --fields-terminated-by '\001'

For detailed Sqoop usage, see: http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html
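The job log later in this post warns that passing `--password` on the command line is insecure and suggests `-P`, which prompts for the password at run time. A sketch of how oracle2hdfs.sh might be adapted, reusing the same variables; here the command is only composed and echoed so it can be reviewed, not executed:

```shell
# Sketch of oracle2hdfs.sh using -P (interactive password prompt)
# instead of --password on the command line.
CONNECTURL=jdbc:oracle:thin:@10.190.105.65:1521:PROD
ORANAME=HR
oraTName=COUNTRIES
columns=COUNTRY_ID,COUNTRY_NAME,REGION_ID
hdfsPath=/user/hadoop/test/$oraTName

# Compose the command and echo it for inspection rather than running it.
CMD="./sqoop import --append --connect $CONNECTURL --username $ORANAME -P \
--target-dir $hdfsPath --num-mappers 1 --table $oraTName \
--columns $columns --fields-terminated-by '\001'"
echo "$CMD"
```

With `-P`, Sqoop reads the password from the terminal, so it never appears in shell history or `ps` output.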

2. Check the data in the Oracle source table

[oracle@edbjr2p5 ~]$ sqlplus hr/oracle

SQL*Plus: Release 10.2.0.2.0 - Production on Tue Jun 25 10:37:32 2013

Copyright (c) 1982, 2005, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
With the Partitioning, OLAP and Data Mining options
HR@PROD>select * from countries;

CO COUNTRY_NAME                              REGION_ID
-- ---------------------------------------- ----------
AR Argentina                                         2
AU Australia                                         3
BE Belgium                                           1
BR Brazil                                            2
CA Canada                                            2
CH Switzerland                                       1
CN China                                             3
DE Germany                                           1
DK Denmark                                           1
EG Egypt                                             4
FR France                                            1

CO COUNTRY_NAME                              REGION_ID
-- ---------------------------------------- ----------
HK HongKong                                          3
IL Israel                                            4
IN India                                             3
IT Italy                                             1
JP Japan                                             3
KW Kuwait                                            4
MX Mexico                                            2
NG Nigeria                                           4
NL Netherlands                                       1
SG Singapore                                         3
UK United Kingdom                                    1

CO COUNTRY_NAME                              REGION_ID
-- ---------------------------------------- ----------
US United States of America                          2
ZM Zambia                                            4
ZW Zimbabwe                                          4

25 rows selected.

HR@PROD>

 
3. Run the script to migrate the data

[hadoop@node1 bin]$ sh oracle2hdfs.sh
Warning: $HADOOP_HOME is deprecated.

13/06/25 16:32:14 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/06/25 16:32:14 INFO manager.SqlManager: Using default fetchSize of 1000
13/06/25 16:32:14 INFO tool.CodeGenTool: Beginning code generation
13/06/25 16:32:15 INFO manager.OracleManager: Time zone has been set to GMT
13/06/25 16:32:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM COUNTRIES t WHERE 1=0
13/06/25 16:32:15 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hadoop-1.2.0
Note: /tmp/sqoop-hadoop/compile/2c0151b667fb37735eba29c28dab82e7/COUNTRIES.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/25 16:32:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/2c0151b667fb37735eba29c28dab82e7/COUNTRIES.jar
13/06/25 16:32:17 INFO manager.OracleManager: Time zone has been set to GMT
13/06/25 16:32:17 INFO mapreduce.ImportJobBase: Beginning import of COUNTRIES
13/06/25 16:32:20 INFO mapred.JobClient: Running job: job_201306251627_0001
13/06/25 16:32:21 INFO mapred.JobClient:  map 0% reduce 0%
13/06/25 16:32:31 INFO mapred.JobClient:  map 100% reduce 0%
13/06/25 16:32:33 INFO mapred.JobClient: Job complete: job_201306251627_0001
13/06/25 16:32:33 INFO mapred.JobClient: Counters: 18
13/06/25 16:32:33 INFO mapred.JobClient:   Job Counters
13/06/25 16:32:33 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=8778
13/06/25 16:32:33 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/25 16:32:33 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/06/25 16:32:33 INFO mapred.JobClient:     Launched map tasks=1
13/06/25 16:32:33 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/06/25 16:32:33 INFO mapred.JobClient:   File Output Format Counters
13/06/25 16:32:33 INFO mapred.JobClient:     Bytes Written=348
13/06/25 16:32:33 INFO mapred.JobClient:   FileSystemCounters
13/06/25 16:32:33 INFO mapred.JobClient:     HDFS_BYTES_READ=87
13/06/25 16:32:33 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58138
13/06/25 16:32:33 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=348
13/06/25 16:32:33 INFO mapred.JobClient:   File Input Format Counters
13/06/25 16:32:33 INFO mapred.JobClient:     Bytes Read=0
13/06/25 16:32:33 INFO mapred.JobClient:   Map-Reduce Framework
13/06/25 16:32:33 INFO mapred.JobClient:     Map input records=25
13/06/25 16:32:33 INFO mapred.JobClient:     Physical memory (bytes) snapshot=81510400
13/06/25 16:32:33 INFO mapred.JobClient:     Spilled Records=0
13/06/25 16:32:33 INFO mapred.JobClient:     CPU time spent (ms)=1530
13/06/25 16:32:33 INFO mapred.JobClient:     Total committed heap usage (bytes)=15925248
13/06/25 16:32:33 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=564215808
13/06/25 16:32:33 INFO mapred.JobClient:     Map output records=25
13/06/25 16:32:33 INFO mapred.JobClient:     SPLIT_RAW_BYTES=87
13/06/25 16:32:33 INFO mapreduce.ImportJobBase: Transferred 348 bytes in 15.4518 seconds (22.5217 bytes/sec)
13/06/25 16:32:33 INFO mapreduce.ImportJobBase: Retrieved 25 records.

13/06/25 16:32:33 INFO util.AppendUtils: Creating missing output directory - COUNTRIES
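The transfer rate Sqoop reports is simply bytes transferred divided by elapsed seconds; checking the figures from the log above:

```shell
# 348 bytes / 15.4518 s ~= 22.52 bytes/sec, matching the log line.
awk 'BEGIN { printf "%.2f\n", 348 / 15.4518 }'
# prints: 22.52
```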

4. Verify the data in HDFS

List the target directory in HDFS:

[hadoop@node1 bin]$ hadoop fs -ls test
Warning: $HADOOP_HOME is deprecated.

Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2013-06-25 16:32 /user/hadoop/test/COUNTRIES

Download the imported COUNTRIES directory from HDFS:

[hadoop@node1 bin]$ hadoop fs -get /user/hadoop/test/COUNTRIES /home/hadoop
Warning: $HADOOP_HOME is deprecated.

Inspect the contents of COUNTRIES:

[hadoop@node1 COUNTRIES]$ cd /home/hadoop/COUNTRIES
[hadoop@node1 COUNTRIES]$ ls
_logs  part-m-00000
[hadoop@node1 COUNTRIES]$ vi part-m-00000
  1 AR^AArgentina^A2
  2 AU^AAustralia^A3
  3 BE^ABelgium^A1
  4 BR^ABrazil^A2
  5 CA^ACanada^A2
  6 CH^ASwitzerland^A1
  7 CN^AChina^A3
  8 DE^AGermany^A1
  9 DK^ADenmark^A1
 10 EG^AEgypt^A4
 11 FR^AFrance^A1
 12 HK^AHongKong^A3
 13 IL^AIsrael^A4
 14 IN^AIndia^A3
 15 IT^AItaly^A1
 16 JP^AJapan^A3
 17 KW^AKuwait^A4
 18 MX^AMexico^A2
 19 NG^ANigeria^A4
 20 NL^ANetherlands^A1
 21 SG^ASingapore^A3
 22 UK^AUnited Kingdom^A1
 23 US^AUnited States of America^A2
 24 ZM^AZambia^A4
 25 ZW^AZimbabwe^A4
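The `^A` shown by vi is the non-printing `\001` (Ctrl-A) byte requested with `--fields-terminated-by '\001'`. A quick local demonstration, independent of HDFS, that makes the delimiter visible:

```shell
# One record in the same \001-delimited form as the imported file;
# tr converts the delimiter to a comma so it can be seen.
printf 'AR\001Argentina\001%s\n' 2 | tr '\001' ','
# prints: AR,Argentina,2
```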


Or view the data directly in HDFS (the \001 delimiters are non-printing, so the fields appear run together):

[hadoop@node1 ~]$ hadoop fs -ls  /user/hadoop/test/COUNTRIES
Warning: $HADOOP_HOME is deprecated.

Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2013-06-25 16:32 /user/hadoop/test/COUNTRIES/_logs
-rw-r--r--   3 hadoop supergroup        348 2013-06-25 16:32 /user/hadoop/test/COUNTRIES/part-m-00000
[hadoop@node1 ~]$  hadoop fs -cat /user/hadoop/test/COUNTRIES/part-m-00000
Warning: $HADOOP_HOME is deprecated.

ARArgentina2
AUAustralia3
BEBelgium1
BRBrazil2
CACanada2
CHSwitzerland1
CNChina3
DEGermany1
DKDenmark1
EGEgypt4
FRFrance1
HKHongKong3
ILIsrael4
INIndia3
ITItaly1
JPJapan3
KWKuwait4
MXMexico2
NGNigeria4
NLNetherlands1
SGSingapore3
UKUnited Kingdom1
USUnited States of America2
ZMZambia4
ZWZimbabwe4
[hadoop@node1 ~]$

II. From HDFS to Oracle

1. Edit the script

[hadoop@node1 bin]$ vi hdfs2oracle.sh
CONNECTURL=jdbc:oracle:thin:@10.190.105.65:1521:PROD
ORACLENAME=SCOTT
ORACLEPASSWORD=oracle
oracleTableName=COUNTRIES
hdfsPath=/user/hadoop/test/COUNTRIES/part-m-00000
./sqoop export --connect $CONNECTURL --username $ORACLENAME --password $ORACLEPASSWORD --export-dir $hdfsPath --num-mappers 1 --table $oracleTableName --fields-terminated-by '\001'
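On export, Sqoop splits each line of the HDFS file on the `\001` delimiter and binds the fields to the table's columns in order. A local sketch of that parsing step, with awk standing in for Sqoop's record parser (the record is illustrative):

```shell
# Split one \001-delimited record into the three COUNTRIES columns.
printf 'US\001United States of America\001%s\n' 2 \
  | awk -F'\001' '{printf "COUNTRY_ID=%s COUNTRY_NAME=%s REGION_ID=%s\n", $1, $2, $3}'
# prints: COUNTRY_ID=US COUNTRY_NAME=United States of America REGION_ID=2
```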

2. Create the target table under the SCOTT user in Oracle

SCOTT@PROD>create table countries as select * from hr.countries where 1=0;

Table created.

SCOTT@PROD>desc countries;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 COUNTRY_ID                                NOT NULL CHAR(2)
 COUNTRY_NAME                                       VARCHAR2(40)
 REGION_ID                                          NUMBER

3. Run the script

[hadoop@node1 bin]$ sh hdfs2oracle.sh
Warning: $HADOOP_HOME is deprecated.

13/06/25 22:57:43 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/06/25 22:57:44 INFO manager.SqlManager: Using default fetchSize of 1000
13/06/25 22:57:44 INFO tool.CodeGenTool: Beginning code generation
13/06/25 22:57:44 INFO manager.OracleManager: Time zone has been set to GMT
13/06/25 22:57:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM COUNTRIES t WHERE 1=0
13/06/25 22:57:44 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hadoop-1.2.0
Note: /tmp/sqoop-hadoop/compile/ff168bd9efae3174b4fec94442ca1892/COUNTRIES.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/25 22:57:46 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/ff168bd9efae3174b4fec94442ca1892/COUNTRIES.jar
13/06/25 22:57:46 INFO mapreduce.ExportJobBase: Beginning export of COUNTRIES
13/06/25 22:57:47 INFO manager.OracleManager: Time zone has been set to GMT
13/06/25 22:57:48 INFO input.FileInputFormat: Total input paths to process : 1
13/06/25 22:57:48 INFO input.FileInputFormat: Total input paths to process : 1
13/06/25 22:57:48 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/25 22:57:48 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/25 22:57:48 INFO mapred.JobClient: Running job: job_201306251627_0013
13/06/25 22:57:49 INFO mapred.JobClient:  map 0% reduce 0%
13/06/25 22:57:56 INFO mapred.JobClient:  map 100% reduce 0%
13/06/25 22:57:57 INFO mapred.JobClient: Job complete: job_201306251627_0013
13/06/25 22:57:57 INFO mapred.JobClient: Counters: 18
13/06/25 22:57:57 INFO mapred.JobClient:   Job Counters
13/06/25 22:57:57 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=7020
13/06/25 22:57:57 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/25 22:57:57 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/06/25 22:57:57 INFO mapred.JobClient:     Launched map tasks=1
13/06/25 22:57:57 INFO mapred.JobClient:     Data-local map tasks=1
13/06/25 22:57:57 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/06/25 22:57:57 INFO mapred.JobClient:   File Output Format Counters
13/06/25 22:57:57 INFO mapred.JobClient:     Bytes Written=0
13/06/25 22:57:57 INFO mapred.JobClient:   FileSystemCounters
13/06/25 22:57:57 INFO mapred.JobClient:     HDFS_BYTES_READ=472
13/06/25 22:57:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58417
13/06/25 22:57:57 INFO mapred.JobClient:   File Input Format Counters
13/06/25 22:57:57 INFO mapred.JobClient:     Bytes Read=0
13/06/25 22:57:57 INFO mapred.JobClient:   Map-Reduce Framework
13/06/25 22:57:57 INFO mapred.JobClient:     Map input records=25
13/06/25 22:57:57 INFO mapred.JobClient:     Physical memory (bytes) snapshot=77176832
13/06/25 22:57:57 INFO mapred.JobClient:     Spilled Records=0
13/06/25 22:57:57 INFO mapred.JobClient:     CPU time spent (ms)=1440
13/06/25 22:57:57 INFO mapred.JobClient:     Total committed heap usage (bytes)=15925248
13/06/25 22:57:57 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=566824960
13/06/25 22:57:57 INFO mapred.JobClient:     Map output records=25
13/06/25 22:57:57 INFO mapred.JobClient:     SPLIT_RAW_BYTES=121
13/06/25 22:57:57 INFO mapreduce.ExportJobBase: Transferred 472 bytes in 10.3519 seconds (45.5956 bytes/sec)
13/06/25 22:57:57 INFO mapreduce.ExportJobBase: Exported 25 records.

4. Verify the data in Oracle

SCOTT@PROD>select * from countries;

CO COUNTRY_NAME                              REGION_ID
-- ---------------------------------------- ----------
AR Argentina                                         2
AU Australia                                         3
BE Belgium                                           1
BR Brazil                                            2
CA Canada                                            2
CH Switzerland                                       1
CN China                                             3
DE Germany                                           1
DK Denmark                                           1
EG Egypt                                             4
FR France                                            1

CO COUNTRY_NAME                              REGION_ID
-- ---------------------------------------- ----------
HK HongKong                                          3
IL Israel                                            4
IN India                                             3
IT Italy                                             1
JP Japan                                             3
KW Kuwait                                            4
MX Mexico                                            2
NG Nigeria                                           4
NL Netherlands                                       1
SG Singapore                                         3
UK United Kingdom                                    1

CO COUNTRY_NAME                              REGION_ID
-- ---------------------------------------- ----------
US United States of America                          2
ZM Zambia                                            4
ZW Zimbabwe                                          4

25 rows selected.