Connecting Oracle and Hadoop (4): Using OSCH to Access Hive Tables from Oracle

OSCH is short for Oracle SQL Connector for HDFS, a component of the Oracle Big Data Connectors (BDC) suite. It maps files in HDFS to Oracle external tables, so that HDFS data can be queried directly from Oracle.

This article simulates an Oracle database and a Hadoop cluster on the same server. Goal of the experiment: use OSCH to access a Hive table on the Hadoop side directly from the Oracle database.

  • Software environment overview

OSCH supports three source formats:

  • Delimited text files in HDFS
  • Data Pump files generated by OLH (Oracle Loader for Hadoop)
  • Hive tables stored as delimited text

To access Hive tables, OSCH must be installed on both the Oracle side and the Hadoop side.

 

Oracle side:

| Server  | System user | Installed software            | Install path                            |
|---------|-------------|-------------------------------|-----------------------------------------|
| Server1 | oracle      | Oracle Database 12.1.0.2      | /u01/app/oracle/product/12.1.0/dbhome_1 |
| Server1 | oracle      | jdk1.8.0_65                   | /home/oracle/jdk1.8.0_65                |
| Server1 | oracle      | Hadoop 2.6.2 (as HDFS client) | /home/oracle/hadoop-2.6.2               |
| Server1 | oracle      | OSCH 3.3.0                    | /home/oracle/orahdfs-3.3.0              |

 

Hadoop cluster side:

| Server  | System user | Installed software | Install path               |
|---------|-------------|--------------------|----------------------------|
| Server1 | hadoop      | Hadoop 2.6.2       | /home/hadoop/hadoop-2.6.2  |
| Server1 | hadoop      | Hive 1.1.1         | /home/hadoop/hive-1.1.1    |
| Server1 | hadoop      | HBase 1.1.2        | /home/hadoop/hbase-1.1.2   |
| Server1 | hadoop      | jdk1.8.0_65        | /home/hadoop/jdk1.8.0_65   |
| Server1 | hadoop      | OSCH 3.3.0         | /home/hadoop/orahdfs-3.3.0 |

 

  • Deploying the software environment on the Oracle side

Deploy the Oracle database, then, as the oracle user, extract the JDK, Hadoop, and OSCH packages into the corresponding directories; a sketch of the extraction commands follows the listing below.

    [oracle@server1 ~]$ tree -L 1
    .
    ├── hadoop-2.6.2
    ├── jdk1.8.0_65
    └── orahdfs-3.3.0
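
A minimal sketch of that extraction step, assuming typical archive names (the exact file names are assumptions and depend on what you downloaded):

    # Archive names below are assumptions; adjust to your downloads
    cd /home/oracle
    tar -xzf jdk-8u65-linux-x64.tar.gz    # unpacks to jdk1.8.0_65
    tar -xzf hadoop-2.6.2.tar.gz          # unpacks to hadoop-2.6.2
    unzip orahdfs-3.3.0.zip               # unpacks to orahdfs-3.3.0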

 

  • Configuring environment variables on the Oracle side

    #Oracle Database
    ORACLE_SID=orcl; export ORACLE_SID
    ORACLE_BASE=/u01/app/oracle; export ORACLE_BASE
    ORACLE_HOME=$ORACLE_BASE/product/12.1.0/dbhome_1; export ORACLE_HOME
    ORACLE_TERM=xterm; export ORACLE_TERM

    NLS_LANG="AMERICAN_AMERICA.zhs16gbk"; export NLS_LANG
    NLS_DATE_FORMAT="YYYY-MON-DD HH24:MI:SS"; export NLS_DATE_FORMAT

    LD_LIBRARY_PATH=$ORACLE_HOME/lib
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ORACLE_HOME/oracm/lib
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
    export LD_LIBRARY_PATH

    #JAVA (set before CLASSPATH, which references $JAVA_HOME)
    export JAVA_HOME=/home/oracle/jdk1.8.0_65

    CLASSPATH=$ORACLE_HOME/JRE
    CLASSPATH=${CLASSPATH}:$ORACLE_HOME/jlib
    CLASSPATH=${CLASSPATH}:$ORACLE_HOME/rdbms/jlib
    CLASSPATH=${CLASSPATH}:$ORACLE_HOME/oc4j/ant/lib/ant.jar
    CLASSPATH=${CLASSPATH}:$ORACLE_HOME/oc4j/ant/lib/ant-launcher.jar
    CLASSPATH=${CLASSPATH}:$JAVA_HOME/db/lib/derby.jar
    CLASSPATH=${CLASSPATH}:$ORACLE_HOME/network/jlib
    export CLASSPATH

    #Hadoop
    export HADOOP_USER_NAME=hadoop
    export HADOOP_HOME=/home/oracle/hadoop-2.6.2
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
    export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_YARN_HOME=${HADOOP_HOME}
    export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

    #OSCH_HOME
    export OSCH_HOME=/home/oracle/orahdfs-3.3.0

    #HADOOP_CLASSPATH must include the OSCH jars
    export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar:$OSCH_HOME/jlib/*

    #PATH
    export PATH=$ORACLE_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
 

  • Creating the required directory objects in the Oracle database (baron is the database user that will own the external table)

    # As the oracle OS user, create the directory for OSCH location files:
    mkdir -p /home/oracle/osch_output

    -- Then, in SQL*Plus:
    CONNECT / AS SYSDBA;
    DROP DIRECTORY osch_bin_path;
    CREATE OR REPLACE DIRECTORY osch_bin_path AS '/home/oracle/orahdfs-3.3.0/bin';
    GRANT READ, EXECUTE ON DIRECTORY osch_bin_path TO baron;

    DROP DIRECTORY osch_hive_dir;
    CREATE OR REPLACE DIRECTORY osch_hive_dir AS '/home/oracle/osch_output';
    GRANT READ, WRITE ON DIRECTORY osch_hive_dir TO baron;
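
Both directory objects can be confirmed with a standard dictionary query (a suggested check, not part of the original walkthrough):

    SELECT directory_name, directory_path
    FROM   dba_directories
    WHERE  directory_name IN ('OSCH_BIN_PATH', 'OSCH_HIVE_DIR');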

 

  • Configuring the HDFS client on the Oracle side

Only the host and port of the Hadoop cluster's NameNode need to be set:

    [oracle@server1 ~]$ cat hadoop-2.6.2/etc/hadoop/core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://server1:8020</value>
      </property>
    </configuration>
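
A quick way to confirm the client picked up the setting, using the standard hdfs getconf command (a suggested check; the output simply echoes the configured value):

    [oracle@server1 ~]$ hdfs getconf -confKey fs.defaultFS
    hdfs://server1:8020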

 

  • Verifying that the Oracle-side HDFS client can access the Hadoop cluster

    [oracle@server1 ~]$ hdfs dfs -ls /
    Found 5 items
    drwxr-xr-x - hadoop supergroup 0 2015-12-11 02:18 /catalog
    drwxr-xr-x - hadoop supergroup 0 2015-12-23 06:48 /hbase
    drwx------ - hadoop supergroup 0 2015-12-21 01:31 /tmp
    drwxr-xr-x - hadoop supergroup 0 2015-12-07 22:34 /user
    drwxr-xr-x - hadoop supergroup 0 2015-12-06 10:49 /yarn

 

  • Deploying the software environment on the Hadoop side

    [hadoop@server1 ~]$ tree -L 1
    .
    ├── hadoop-2.6.2
    ├── hbase-1.1.2
    ├── hive-1.1.1
    ├── jdk1.8.0_65
    └── orahdfs-3.3.0

 

  • Configuring environment variables on the Hadoop side

    export JAVA_HOME=/home/hadoop/jdk1.8.0_65

    export HADOOP_USER_NAME=hadoop
    export HADOOP_YARN_USER=hadoop
    export HADOOP_HOME=/home/hadoop/hadoop-2.6.2
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
    export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_YARN_HOME=${HADOOP_HOME}
    export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

    export HIVE_HOME=/home/hadoop/hive-1.1.1
    export HIVE_CONF_DIR=${HIVE_HOME}/conf

    #HBase (referenced in PATH below)
    export HBASE_HOME=/home/hadoop/hbase-1.1.2

    export OSCH_HOME=/home/hadoop/orahdfs-3.3.0

    #HADOOP_CLASSPATH must include the Hive jars, the metastore JDBC driver, and the OSCH jars
    export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar:/usr/share/java/mysql-connector-java.jar
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OSCH_HOME/jlib/*

    export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin:$PATH
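
After sourcing the profile, it is worth confirming that the Hive and OSCH jars are actually visible to Hadoop (a suggested check, not from the original post):

    $ hadoop classpath | tr ':' '\n' | grep -E 'orahdfs|hive'
    $ hive --version    # should report Hive 1.1.1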

 

  • Creating the Hive table on the Hadoop side

The table is a plain comma-delimited text table named catalog. Its columns are declared as Hive STRING, which is consistent with the VARCHAR2(4000) columns of the external table that OSCH generates further down:

    hive> CREATE TABLE catalog (
            catalogid INT,
            journal   STRING,
            publisher STRING,
            edition   STRING,
            title     STRING,
            author    STRING
          )
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

    # From the shell, prepare the sample data file:
    $ echo '1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss
    2,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,From ADF UIX to JSF,Jonas Jacobi
    3,Oracle Magazine,Oracle Publishing,March-April 2005,Starting with Oracle ADF,Steve Muench' > catalog.txt

    hive> LOAD DATA LOCAL INPATH '/home/hadoop/catalog.txt' INTO TABLE catalog;
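
Before involving Oracle, the load can be verified from Hive, and the raw file can be inspected directly in HDFS; the warehouse path matches the location-file URI shown in the OSCH output below:

    hive> select count(*) from catalog;

    $ hdfs dfs -cat /user/hive/warehouse/catalog/catalog.txt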

 

  • On the Hadoop side, run the OSCH tool, which connects to Oracle over JDBC and creates the external table

    hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
        oracle.hadoop.exttab.ExternalTable \
        -D oracle.hadoop.exttab.tableName=catalog_ext \
        -D oracle.hadoop.exttab.sourceType=hive \
        -D oracle.hadoop.exttab.locationFileCount=2 \
        -D oracle.hadoop.exttab.hive.tableName=catalog \
        -D oracle.hadoop.exttab.hive.databaseName=default \
        -D oracle.hadoop.exttab.defaultDirectory=osch_hive_dir \
        -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//server1:1521/orcl \
        -D oracle.hadoop.connection.user=baron \
        -D oracle.hadoop.exttab.printStackTrace=true \
        -createTable

The output is as follows:

    Oracle SQL Connector for HDFS Release 3.3.0 - Production

    Copyright (c) 2011, 2015, Oracle and/or its affiliates. All rights reserved.

    [Enter Database Password:]
    15/12/15 04:45:30 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    15/12/15 04:45:31 INFO metastore.ObjectStore: ObjectStore, initialize called
    15/12/15 04:45:31 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    15/12/15 04:45:31 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/hbase-1.1.2/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    15/12/15 04:45:33 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    15/12/15 04:45:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    15/12/15 04:45:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    15/12/15 04:45:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    15/12/15 04:45:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    15/12/15 04:45:37 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
    15/12/15 04:45:37 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
    15/12/15 04:45:37 INFO metastore.ObjectStore: Initialized ObjectStore
    15/12/15 04:45:38 INFO metastore.HiveMetaStore: Added admin role in metastore
    15/12/15 04:45:38 INFO metastore.HiveMetaStore: Added public role in metastore
    15/12/15 04:45:38 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
    15/12/15 04:45:39 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=catalog
    15/12/15 04:45:39 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_table : db=default tbl=catalog
    The create table command succeeded.

    User : "BARON" performed the following actions in schema: BARON

    CREATE TABLE "BARON"."CATALOG_EXT"
    (
     "CATALOGID" INTEGER,
     "JOURNAL" VARCHAR2(4000),
     "PUBLISHER" VARCHAR2(4000),
     "EDITION" VARCHAR2(4000),
     "TITLE" VARCHAR2(4000),
     "AUTHOR" VARCHAR2(4000)
    )
    ORGANIZATION EXTERNAL
    (
       TYPE ORACLE_LOADER
       DEFAULT DIRECTORY "CATALOG_HIVE_DIR"
       ACCESS PARAMETERS
       (
         RECORDS DELIMITED BY 0X'0A'
         CHARACTERSET AL32UTF8
         PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
         FIELDS TERMINATED BY 0X'2C'
         MISSING FIELD VALUES ARE NULL
         (
           "CATALOGID" CHAR NULLIF "CATALOGID"=0X'5C4E',
           "JOURNAL" CHAR(4000) NULLIF "JOURNAL"=0X'5C4E',
           "PUBLISHER" CHAR(4000) NULLIF "PUBLISHER"=0X'5C4E',
           "EDITION" CHAR(4000) NULLIF "EDITION"=0X'5C4E',
           "TITLE" CHAR(4000) NULLIF "TITLE"=0X'5C4E',
           "AUTHOR" CHAR(4000) NULLIF "AUTHOR"=0X'5C4E'
         )
       )
       LOCATION
       (
         'osch-20151215044541-2290-1'
       )
    ) PARALLEL REJECT LIMIT UNLIMITED;

    The following location files were created.

    osch-20151215044541-2290-1 contains 1 URI, 263 bytes

             263 hdfs://server1:8020/user/hive/warehouse/catalog/catalog.txt

Once the command completes, the external table already exists in the Oracle database; no separate DDL step is needed on the Oracle side.
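
The same ExternalTable tool can be invoked again later to inspect the table and its location files. A sketch, assuming the -describe operation of the OSCH 3.x command set (verify against your release's documentation):

    hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
        oracle.hadoop.exttab.ExternalTable \
        -D oracle.hadoop.exttab.tableName=catalog_ext \
        -D oracle.hadoop.connection.url=jdbc:oracle:thin:@//server1:1521/orcl \
        -D oracle.hadoop.connection.user=baron \
        -describe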

 

  • Querying the Hive external table from the Oracle database

    SQL> select count(*) from catalog_ext;

      COUNT(*)
    ----------
             3

    SQL> select * from catalog_ext;

     CATALOGID JOURNAL         PUBLISHER         EDITION          TITLE                     AUTHOR
    ---------- --------------- ----------------- ---------------- ------------------------- --------------
             1 Oracle Magazine Oracle Publishing Nov-Dec 2004     Database Resource Manager Kimberly Floss
             2 Oracle Magazine Oracle Publishing Nov-Dec 2004     From ADF UIX to JSF       Jonas Jacobi
             3 Oracle Magazine Oracle Publishing March-April 2005 Starting with Oracle ADF  Steve Muench
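
Because the external table behaves like any other table in SELECT statements, the Hive data can also be materialized into a regular Oracle table with a plain CTAS (catalog_copy below is an arbitrary illustrative name):

    SQL> create table catalog_copy as select * from catalog_ext;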

Reposted from: https://www.cnblogs.com/panwenyu/p/5065414.html
