OSCH (Oracle SQL Connector for HDFS) is a component of Oracle Big Data Connectors (BDC). It maps files stored in HDFS to Oracle external tables, so that HDFS data can be queried directly from Oracle.
This article simulates an Oracle database and a Hadoop cluster on the same server. The goal of the experiment: use OSCH to access a Hive table on the Hadoop side directly from the Oracle database.
- Software environment overview
OSCH supports three source formats:
1. Delimited text files
2. Data Pump files generated by OLH (Oracle Loader for Hadoop)
3. Text-based Hive tables; accessing Hive tables requires OSCH to be installed on both the Oracle side and the Hadoop side
Oracle side:

| Server | OS user | Software | Install path |
| --- | --- | --- | --- |
| Server1 | oracle | Oracle Database 12.1.0.2 | /u01/app/oracle/product/12.1.0/dbhome_1 |
| | | jdk1.8.0_65 | /home/oracle/jdk1.8.0_65 |
| | | Hadoop 2.6.2 (as HDFS client) | /home/oracle/hadoop-2.6.2 |
| | | OSCH 3.3.0 | /home/oracle/orahdfs-3.3.0 |
Hadoop cluster side:

| Server | OS user | Software | Install path |
| --- | --- | --- | --- |
| Server1 | hadoop | Hadoop 2.6.2 | /home/hadoop/hadoop-2.6.2 |
| | | Hive 1.1.1 | /home/hadoop/hive-1.1.1 |
| | | HBase 1.1.2 | /home/hadoop/hbase-1.1.2 |
| | | jdk1.8.0_65 | /home/hadoop/jdk1.8.0_65 |
| | | OSCH 3.3.0 | /home/hadoop/orahdfs-3.3.0 |
- Deploy the Oracle-side software
Deploy the Oracle database, then, as the oracle user, unpack the JDK, Hadoop, and OSCH packages into their respective directories:

[oracle@server1 ~]$ tree -L 1
.
├── hadoop-2.6.2
├── jdk1.8.0_65
├── orahdfs-3.3.0
- Configure the Oracle-side environment variables
(Note: JAVA_HOME is exported first because CLASSPATH references it, and ORACLE_HOME matches the 12.1.0 install path from the table above.)

#JAVA
export JAVA_HOME=/home/oracle/jdk1.8.0_65

#Oracle Database
ORACLE_SID=orcl; export ORACLE_SID
ORACLE_BASE=/u01/app/oracle; export ORACLE_BASE
ORACLE_HOME=$ORACLE_BASE/product/12.1.0/dbhome_1; export ORACLE_HOME
ORACLE_TERM=xterm; export ORACLE_TERM

NLS_LANG="AMERICAN_AMERICA.zhs16gbk"; export NLS_LANG
NLS_DATE_FORMAT="YYYY-MON-DD HH24:MI:SS"; export NLS_DATE_FORMAT

LD_LIBRARY_PATH=$ORACLE_HOME/lib
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$ORACLE_HOME/oracm/lib
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib
export LD_LIBRARY_PATH

CLASSPATH=$ORACLE_HOME/JRE
CLASSPATH=${CLASSPATH}:$ORACLE_HOME/jlib
CLASSPATH=${CLASSPATH}:$ORACLE_HOME/rdbms/jlib
CLASSPATH=${CLASSPATH}:$ORACLE_HOME/oc4j/ant/lib/ant.jar
CLASSPATH=${CLASSPATH}:$ORACLE_HOME/oc4j/ant/lib/ant-launcher.jar
CLASSPATH=${CLASSPATH}:$JAVA_HOME/db/lib/derby.jar
CLASSPATH=${CLASSPATH}:$ORACLE_HOME/network/jlib
export CLASSPATH

#Hadoop
export HADOOP_USER_NAME=hadoop
export HADOOP_HOME=/home/oracle/hadoop-2.6.2
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

#OSCH_HOME
export OSCH_HOME=/home/oracle/orahdfs-3.3.0

#PATH
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar:$OSCH_HOME/jlib/*
export PATH=$ORACLE_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
- Create the required directory objects in the Oracle database
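A minimal sketch of the directory objects OSCH needs, run as a privileged user. The filesystem path for the default directory and the grantee `baron` are assumptions (the user name follows the `oracle.hadoop.connection.user` setting used later); adjust both to your environment.

```sql
-- OSCH_BIN_PATH must point to the bin directory of the OSCH install;
-- it is referenced by the PREPROCESSOR clause of the generated external table.
CREATE OR REPLACE DIRECTORY osch_bin_path AS '/home/oracle/orahdfs-3.3.0/bin';

-- Default directory for the generated location files. The path here is an
-- assumption; any directory readable and writable by the oracle user works.
CREATE OR REPLACE DIRECTORY osch_hive_dir AS '/home/oracle/osch_hive_dir';

GRANT READ, EXECUTE ON DIRECTORY osch_bin_path TO baron;
GRANT READ, WRITE ON DIRECTORY osch_hive_dir TO baron;
```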
- Configure the HDFS client on the Oracle side
Only the IP/hostname and port of the Hadoop cluster NameNode need to be set:

[oracle@server1 ~]$ cat hadoop-2.6.2/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://server1:8020</value>
  </property>
</configuration>
- Verify that the Oracle-side HDFS client can reach the Hadoop cluster

[oracle@server1 ~]$ hdfs dfs -ls /
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2015-12-11 02:18 /catalog
drwxr-xr-x - hadoop supergroup 0 2015-12-23 06:48 /hbase
drwx------ - hadoop supergroup 0 2015-12-21 01:31 /tmp
drwxr-xr-x - hadoop supergroup 0 2015-12-07 22:34 /user
drwxr-xr-x - hadoop supergroup 0 2015-12-06 10:49 /yarn
- Deploy the Hadoop-side software

[hadoop@server1 ~]$ tree -L 1
.
├── hadoop-2.6.2
├── hbase-1.1.2
├── hive-1.1.1
├── jdk1.8.0_65
├── orahdfs-3.3.0
- Configure the Hadoop-side environment variables
(Note: HBASE_HOME is exported here because PATH references it; the value matches the install table above.)

export JAVA_HOME=/home/hadoop/jdk1.8.0_65

export HADOOP_USER_NAME=hadoop
export HADOOP_YARN_USER=hadoop
export HADOOP_HOME=/home/hadoop/hadoop-2.6.2
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

export HIVE_HOME=/home/hadoop/hive-1.1.1
export HIVE_CONF_DIR=${HIVE_HOME}/conf

export HBASE_HOME=/home/hadoop/hbase-1.1.2

export OSCH_HOME=/home/hadoop/orahdfs-3.3.0

export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar:/usr/share/java/mysql-connector-java.jar
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$OSCH_HOME/jlib/*

export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin:$PATH
- On the Hadoop side, create the Hive table

(
journal VARCHAR2(25),
publisher VARCHAR2(25),
edition VARCHAR2(25),
title VARCHAR2(45),
author VARCHAR2(25)
);

Generate a sample data file:

$ echo '1,Oracle Magazine,Oracle Publishing,Nov-Dec 2004,Database Resource Manager,Kimberly Floss
3,Oracle Magazine,Oracle Publishing,March-April 2005,Starting with Oracle ADF,Steve Muench' > catalog.txt
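The table definition above is truncated (its opening CREATE TABLE line is missing) and uses Oracle column types. A hedged HiveQL reconstruction might look like the following; the table name follows the `oracle.hadoop.exttab.hive.tableName=orders_raw` setting used later, and the leading `catalogid` column is an assumption inferred from the CATALOGID column in the external-table DDL that OSCH prints.

```sql
-- Hypothetical HiveQL sketch; names and types are assumptions, not the
-- author's original DDL.
CREATE TABLE default.orders_raw (
  catalogid INT,
  journal   STRING,
  publisher STRING,
  edition   STRING,
  title     STRING,
  author    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Load the sample file generated above into the table.
LOAD DATA LOCAL INPATH 'catalog.txt' INTO TABLE default.orders_raw;
```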
- On the Hadoop side, run the OSCH jar, which connects to Oracle over JDBC and creates the external table

hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
-D oracle.hadoop.exttab.tableName=orders_ext \
-D oracle.hadoop.exttab.sourceType=hive \
-D oracle.hadoop.exttab.locationFileCount=2 \
-D oracle.hadoop.exttab.hive.tableName=orders_raw \
-D oracle.hadoop.exttab.hive.databaseName=default \
-D oracle.hadoop.exttab.defaultDirectory=osch_hive_dir \
-D oracle.hadoop.connection.url=jdbc:oracle:thin:@//server1:1521/orcl \
-D oracle.hadoop.connection.user=baron \
-D oracle.hadoop.exttab.printStackTrace=true \
-createTable
The output is as follows:

[Enter Database Password:]
15/12/15 04:45:30 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/12/15 04:45:31 INFO metastore.ObjectStore: ObjectStore, initialize called
15/12/15 04:45:31 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
15/12/15 04:45:33 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/12/15 04:45:37 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
15/12/15 04:45:37 INFO metastore.ObjectStore: Initialized ObjectStore
15/12/15 04:45:38 INFO metastore.HiveMetaStore: Added admin role in metastore

(
"CATALOGID" INTEGER,
"JOURNAL" VARCHAR2(4000),
"PUBLISHER" VARCHAR2(4000),
"EDITION" VARCHAR2(4000),
"TITLE" VARCHAR2(4000),
"AUTHOR" VARCHAR2(4000)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY "CATALOG_HIVE_DIR"
ACCESS PARAMETERS
(
RECORDS DELIMITED BY 0X'0A'
CHARACTERSET AL32UTF8
PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
FIELDS TERMINATED BY 0X'2C'
(
)
)
LOCATION
(
'osch-20151215044541-2290-1'
)
) PARALLEL REJECT LIMIT UNLIMITED;

The following location files were created.

osch-20151215044541-2290-1 contains 1 URI, 263 bytes
Once the run completes, the external table already exists in the Oracle database.
- In the Oracle database, query the Hive external table
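The external table can now be queried like any ordinary table; a minimal sketch (the table name `orders_ext` follows the -createTable run above, and it is assumed to be queried as the owning user):

```sql
-- Reads go through the hdfs_stream preprocessor, pulling data from HDFS
-- at query time; no data is copied into the database.
SELECT * FROM orders_ext;

-- Standard SQL works unchanged, e.g. an aggregate:
SELECT publisher, COUNT(*) FROM orders_ext GROUP BY publisher;
```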