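Tajo is a distributed data warehouse system that a Korean research team presented at ICDE 2013. It is now a project of the Apache Software Foundation. As a data warehouse in the Hadoop ecosystem, Tajo is designed on fundamentally different principles from Hive, which makes it an interesting system to study. I recently spent a week deploying Tajo in both single-node and cluster modes, and this post summarizes the installation steps and the lessons learned.
Tajo paper: http://ieeexplore.ieee.org/document/6544934/
Tajo website: http://tajo.apache.org/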
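1. Pre-installation preparation
1.1 Hardware
The test platform is a cluster of three nodes (each with 8 GB of RAM, a 128 GB SSD, and three 3 TB SATA disks) connected through a gigabit switch.
I had already deployed distributed Hadoop-2.7.3, Zookeeper-3.4.9, HBase-1.2.4, and MySQL on this cluster. For details of those deployments, see my earlier posts:
- Hadoop: https://my.oschina.net/xhhuang/blog/807914
- Zookeeper: https://my.oschina.net/xhhuang/blog/808600
- HBase: https://my.oschina.net/xhhuang/blog/809210
- MySQL: https://my.oschina.net/xhhuang/blog/865372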
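1.2 Software and versions
1) Tajo: this post deploys Tajo 0.11.3
- Tajo-0.11.3 download mirrors: http://www.apache.org/dyn/closer.cgi/tajo
- Tajo-0.11.3 documentation: http://tajo.apache.org/docs/current/index.html
The tajo-0.11.3 directory on the mirror offers three files: a src.tar.gz source archive, tajo-0.11.3.tar.gz, and tajo-jdbc-0.11.3.jar.
The src.tar.gz archive can be built with Maven to produce the tajo-0.11.3.tar.gz installation package, but the official instructions for building against a specific Hadoop version contain some errors, so I do not recommend building it yourself. Downloading tajo-0.11.3.tar.gz and using it directly works; I have tested it. tajo-jdbc-0.11.3.jar is the JDBC driver, which lets Java programs query Tajo much as they would an RDBMS. Only tajo-0.11.3.tar.gz and tajo-jdbc-0.11.3.jar need to be downloaded from this page.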
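2) JDK
Tajo-0.11.3 requires JDK 1.6 or later. I had already installed JDK 1.8 when setting up Hadoop-2.7.3, so the installation steps are not repeated here.
3) Hadoop
Tajo-0.11.3 requires Hadoop 2.3.0 or later. The installation is not repeated here.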
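The minimum-version requirement above can be checked before going further. The following is a minimal sketch, not part of the Tajo distribution: version_ge and hadoop_version_of are helper names made up for this post, and the sketch assumes that the first line of `hadoop version` has the form "Hadoop 2.7.3".

```shell
# version_ge A B: succeed when dotted version A is >= B, comparing field by field.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = x[i] + 0; yi = y[i] + 0   # missing fields count as 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# hadoop_version_of TEXT: extract the version number from `hadoop version` output.
hadoop_version_of() {
    printf '%s\n' "$1" | sed -n '1s/^Hadoop \([0-9][0-9.]*\).*/\1/p'
}

# Only runs the check when hadoop is actually on PATH.
if command -v hadoop >/dev/null 2>&1; then
    v=$(hadoop_version_of "$(hadoop version 2>/dev/null)")
    if [ -n "$v" ] && version_ge "$v" 2.3.0; then
        echo "Hadoop $v satisfies the 2.3.0 minimum"
    else
        echo "Hadoop version '$v' does not satisfy the 2.3.0 minimum"
    fi
fi
```

The comparison is numeric per field, so 2.10.0 is correctly treated as newer than 2.3.0, which a plain string comparison would get wrong.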
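4) Protocol Buffer
The official Tajo-0.11.3 documentation specifies Protocol Buffer 2.5.0.
Download (requires access to the external internet): https://github.com/google/protobuf/releases?after=v2.6.1
On that page, download protobuf-2.5.0.tar.gz.
After extracting it, install it with the following commands: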
./configure
make
make check
sudo make install
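After the install, it is worth confirming that the protoc on PATH is really 2.5.0. A small sketch: protoc_version_of is a helper name invented here, and it relies on `protoc --version` printing a line of the form "libprotoc 2.5.0".

```shell
# protoc_version_of TEXT: pull the version number out of `protoc --version` output.
protoc_version_of() {
    printf '%s\n' "$1" | sed -n 's/^libprotoc \([0-9][0-9.]*\).*/\1/p'
}

if command -v protoc >/dev/null 2>&1; then
    found=$(protoc_version_of "$(protoc --version)")
    if [ "$found" = "2.5.0" ]; then
        echo "protoc 2.5.0 found"
    else
        echo "expected protoc 2.5.0, found '$found'"
    fi
else
    echo "protoc is not on PATH"
fi
```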
5) mysql-connector-java-5.1.42.tar.gz
Needed for the cluster deployment.
Download: https://dev.mysql.com/downloads/connector/j/
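2. Single-node deployment
2.1 Extract tajo-0.11.3.tar.gz
Run
tar -xzvf tajo-0.11.3.tar.gz
to extract tajo-0.11.3.tar.gz into the directory of your choice, then add the Tajo paths to /etc/profile:
sudo vi /etc/profile
Append the following lines to the end of the file: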
export TAJO_HOME=/home/hadoop/tajo-0.11.3
export PATH=$PATH:$TAJO_HOME/bin
Then load the new settings into the current shell. source is a shell builtin, so it cannot be run through sudo:
source /etc/profile
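If the profile is edited by script rather than by hand, the lines should be appended only once, otherwise PATH grows on every run. A minimal sketch; append_once is a helper name made up for this post.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line is already there.
# The file is created if it does not exist yet.
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

Run as a user allowed to write the file, the two exports from this section would be added with append_once /etc/profile 'export TAJO_HOME=/home/hadoop/tajo-0.11.3' and append_once /etc/profile 'export PATH=$PATH:$TAJO_HOME/bin'. The single quotes keep $PATH and $TAJO_HOME unexpanded so that they are evaluated at login time.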
2.2 Configuration files
Go to the conf/ directory.
Create catalog-site.xml, storage-site.json, and tajo-site.xml from their templates:
cp catalog-site.xml.template catalog-site.xml
cp storage-site.json.template storage-site.json
cp tajo-site.xml.template tajo-site.xml
For a single-node deployment, only tajo-env.sh needs editing: set the Hadoop and JDK paths and Tajo is ready to run.
For example:
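The three copies can also be done in a loop that never overwrites a file you have already edited. A sketch only; copy_templates is a name invented for this post.

```shell
# copy_templates CONF_DIR: create each config file from its .template,
# leaving any file that already exists untouched.
copy_templates() {
    for name in catalog-site.xml storage-site.json tajo-site.xml; do
        if [ ! -e "$1/$name" ] && [ -e "$1/$name.template" ]; then
            cp "$1/$name.template" "$1/$name"
        fi
    done
}
```

With TAJO_HOME set as in section 2.1, the call would be copy_templates "$TAJO_HOME/conf".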
# Hadoop home. Required
# export HADOOP_HOME=
export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
# The java implementation to use. Required.
# export JAVA_HOME=/usr/java/default
export JAVA_HOME=/usr/java/jdk1.8.0_112
2.3 Create directories
In the tajo-0.11.3 directory, use mkdir to create the logs and pids directories, then set the following entries in tajo-env.sh:
# Where log files are stored. $TAJO_HOME/logs by default.
# export TAJO_LOG_DIR=${TAJO_HOME}/logs
export TAJO_LOG_DIR=${TAJO_HOME}/logs
# The directory where pid files are stored. /tmp by default.
# export TAJO_PID_DIR=/var/tajo/pids
export TAJO_PID_DIR=${TAJO_HOME}/pids
Once this is configured, run start-tajo.sh to start Tajo, then run tsql to open the SQL shell.
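The directory step in 2.3 is easy to script so that it can be repeated on every node. A minimal sketch; prepare_dirs is a helper name made up here, and the directory names match the TAJO_LOG_DIR and TAJO_PID_DIR values set above.

```shell
# prepare_dirs TAJO_HOME: create the logs and pids directories if they are missing.
# mkdir -p succeeds silently when a directory already exists.
prepare_dirs() {
    mkdir -p "$1/logs" "$1/pids"
}
```

For the layout used in this post the call would be prepare_dirs /home/hadoop/tajo-0.11.3.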
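3. Cluster deployment
In a cluster deployment, the master and the workers are configured separately.
3.1 Configure the catalog store
By default Tajo stores its catalog in Derby, but keeping the catalog in an RDBMS lets it perform better, so this post stores the catalog in MySQL. The steps are as follows:
- Create a tajo database and a tajo user in MySQL and grant the necessary privileges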
mysql> create user 'tajo'@'localhost' identified by 'hadoop';
Query OK, 0 rows affected (0.00 sec)
mysql> create database tajo;
Query OK, 1 row affected (0.00 sec)
mysql> grant all on tajo.* to 'tajo'@'localhost';
Query OK, 0 rows affected (0.01 sec)
- Install mysql-connector-java-5.1.42.tar.gz
Extract the downloaded mysql-connector-java-5.1.42.tar.gz and copy
mysql-connector-java-5.1.42-bin.jar
from it into the lib directory under tajo-0.11.3
- Configure catalog-site.xml
<!-- JDBC Common Settings -->
<!-- Please remove comments if you want JDBC-based catalog store. -->
<property>
<name>tajo.catalog.jdbc.connection.id</name>
<value>tajo</value>
</property>
<property>
<name>tajo.catalog.jdbc.connection.password</name>
<value>hadoop</value>
</property>
<!-- JDBC Store Section -->
<!-- Please remove comments corresponding to your desired catalog store. -->
<!-- You must choose only one catalog store driver among them. -->
<!-- MySQL Catalog Store Driver -->
<property>
<name>tajo.catalog.store.class</name>
<value>org.apache.tajo.catalog.store.MySQLStore</value>
</property>
<property>
<name>tajo.catalog.jdbc.uri</name>
<value>jdbc:mysql://localhost:3306/tajo?rewriteBatchedStatements=true</value>
</property>
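Because the host in tajo.catalog.jdbc.uri has to match the host the MySQL account was created for, it helps to read the value back out of the file and look at it. The sketch below is not a real XML parser: xml_prop and jdbc_host are helper names invented for this post, and xml_prop assumes that each <name> line is followed by its <value> on a later line, as in the snippet above.

```shell
# xml_prop FILE NAME: print the <value> that follows <name>NAME</name>.
xml_prop() {
    awk -v want="<name>$2</name>" '
        index($0, want) { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
            print; exit
        }
    ' "$1"
}

# jdbc_host URI: print the host part of jdbc:mysql://HOST:PORT/...
jdbc_host() {
    printf '%s\n' "$1" | sed -n 's|^jdbc:mysql://\([^:/]*\).*|\1|p'
}
```

For the configuration above, jdbc_host "$(xml_prop catalog-site.xml tajo.catalog.jdbc.uri)" prints localhost, which is the host used in the create user and grant statements earlier in this section.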
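3.2 Configure tajo-site.xml
The tajo-site.xml settings on the master and on the slave nodes differ. For convenience, I configured everything on the master and then used scp to distribute the tajo-0.11.3 directory to the slave nodes.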
<property>
<name>tajo.rootdir</name>
<value>hdfs://master:9000/tajo</value>
<description>Base directory including system directories.</description>
</property>
<property>
<name>tajo.master.umbilical-rpc.address</name>
<value>master:26001</value>
<description>TajoMaster binding address between master and workers.</description>
</property>
<property>
<name>tajo.master.client-rpc.address</name>
<value>master:26002</value>
<description>TajoMaster binding address between master and clients.</description>
</property>
<property>
<name>tajo.resource-tracker.rpc.address</name>
<value>master:26003</value>
<description>TajoMaster binding address between master and workers.</description>
</property>
<property>
<name>tajo.catalog.client-rpc.address</name>
<value>master:26005</value>
<description>CatalogServer binding address between catalog server and workers.</description>
</property>
<property>
<name>tajo.worker.tmpdir.locations</name>
<value>/home/hadoop/tajo-0.11.3/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>tajo.worker.resource.dfs-dir-aware</name>
<value>true</value>
</property>
3.3 Configure the workers file
The workers file in the conf directory lists the nodes that run as workers:
slave1
slave2
3.4 Distribute the configured tajo directory with scp
scp -r /home/hadoop/tajo-0.11.3 hadoop@slave1:/home/hadoop/
scp -r /home/hadoop/tajo-0.11.3 hadoop@slave2:/home/hadoop/
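With more than a couple of workers, the copy can be driven from the worker list itself. A sketch under stated assumptions: distribute and the DRY_RUN variable are names invented for this post, the remote user defaults to hadoop as in the commands above, and the worker list is assumed to hold one hostname per line.

```shell
# distribute SRC_DIR WORKERS_FILE [USER]: copy SRC_DIR into the same parent
# directory on every host listed in WORKERS_FILE.
# With DRY_RUN=1 the scp commands are printed instead of executed.
distribute() {
    src=$1; workers=$2; user=${3:-hadoop}
    dest=$(dirname "$src")
    # Skip blank lines and comment lines in the worker list.
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$workers" | while read -r host; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp -r $src $user@$host:$dest/"
        else
            # </dev/null keeps scp from swallowing the rest of the host list.
            scp -r "$src" "$user@$host:$dest/" </dev/null
        fi
    done
}
```

Running DRY_RUN=1 distribute /home/hadoop/tajo-0.11.3 /home/hadoop/tajo-0.11.3/conf/workers first shows exactly which commands would be issued.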
Run start-tajo.sh to start the Tajo cluster, then open master:26080 in a browser to see the Tajo web UI.
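4. Bug record
While configuring the Tajo cluster I ran into the following problem: the cluster would not start, and tsql failed with this error: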
[hadoop@master ~]$ start-tajo.sh
Starting single TajoMaster
starting master, logging to /home/hadoop/tajo-0.11.3/logs/tajo-hadoop-master-master.out
slave1: starting worker, logging to /home/hadoop/tajo-0.11.3/logs/tajo-hadoop-worker-slave1.out
slave2: starting worker, logging to /home/hadoop/tajo-0.11.3/logs/tajo-hadoop-worker-slave2.out
Tajo master web UI: http://master:26080
Tajo Client Service: master:26002
[hadoop@master ~]$ tsql
Exception in thread "main" org.apache.tajo.exception.TajoRuntimeException: org.apache.tajo.client.v2.exception.ClientConnectionException: Connection refused: master/59.71.229.189:26002 (Notes: the IP address of the master is 59.71.229.189)
at org.apache.tajo.client.SessionConnection.getTajoMasterConnection(SessionConnection.java:137)
at org.apache.tajo.client.SessionConnection.<init>(SessionConnection.java:110)
at org.apache.tajo.client.TajoClientImpl.<init>(TajoClientImpl.java:64)
at org.apache.tajo.cli.tsql.TajoCli.<init>(TajoCli.java:255)
at org.apache.tajo.cli.tsql.TajoCli.main(TajoCli.java:719)
Caused by: org.apache.tajo.client.v2.exception.ClientConnectionException: Connection refused: master/59.71.229.189:26002
... 5 more
Caused by: io.netty.channel.ConnectTimeoutException: Connection refused: master/59.71.229.189:26002
at org.apache.tajo.rpc.NettyClientBase.makeConnectException(NettyClientBase.java:202)
at org.apache.tajo.rpc.NettyClientBase.doReconnect(NettyClientBase.java:251)
at org.apache.tajo.rpc.NettyClientBase.connect(NettyClientBase.java:220)
at org.apache.tajo.rpc.RpcClientManager.newBlockingClient(RpcClientManager.java:164)
at org.apache.tajo.client.SessionConnection.getTajoMasterConnection(SessionConnection.java:134)
... 4 more
[hadoop@master ~]$
The master's log file in the logs directory contains the following:
tajo-hadoop-master-master.out
org.apache.tajo.exception.TajoInternalError: internal error: null
at org.apache.tajo.catalog.CatalogServer.serviceInit(CatalogServer.java:140)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.tajo.master.TajoMaster.serviceInit(TajoMaster.java:213)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.tajo.master.TajoMaster.main(TajoMaster.java:603)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.tajo.catalog.CatalogServer.serviceInit(CatalogServer.java:135)
... 5 more
Caused by: org.apache.tajo.exception.MetadataConnectionException: cannot connect metadata store 'jdbc:mysql://master:3306/tajo?rewriteBatchedStatements=true': Access denied for user 'tajo'@'master' (using password: YES)
at org.apache.tajo.catalog.store.AbstractDBStore.<init>(AbstractDBStore.java:163)
at org.apache.tajo.catalog.store.MySQLStore.<init>(MySQLStore.java:34)
... 10 more
Caused by: java.sql.SQLException: Access denied for user 'tajo'@'master' (using password: YES)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:964)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:873)
at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1710)
at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1226)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2194)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2225)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2024)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:779)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:389)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:330)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.apache.tajo.catalog.store.MySQLStore.createConnection(MySQLStore.java:49)
at org.apache.tajo.catalog.store.AbstractDBStore.<init>(AbstractDBStore.java:160)
... 11 more
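In a trace this long, the innermost "Caused by:" line is usually the one that matters. A small sketch for pulling it out of a log file; root_cause is a helper name made up for this post.

```shell
# root_cause FILE: print the last "Caused by:" line of a Java stack trace,
# which is the innermost exception in the chain.
root_cause() {
    grep 'Caused by:' "$1" | tail -n 1
}
```

Applied to the tajo-hadoop-master-master.out content shown above, it prints the java.sql.SQLException line reporting that access was denied for user 'tajo'@'master'.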
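After checking the tajo database in MySQL and working through the log, I traced the error to a misconfiguration of the following entry in catalog-site.xml on the master (shown here with its correct value):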
<property>
<name>tajo.catalog.jdbc.uri</name>
<value>jdbc:mysql://localhost:3306/tajo?rewriteBatchedStatements=true</value>
</property>
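I had changed jdbc:mysql://localhost:3306/tajo?rewriteBatchedStatements=true to
jdbc:mysql://master:3306/tajo?rewriteBatchedStatements=true
MySQL is installed locally on the master, and when I created the tajo database and user I granted privileges only to 'tajo'@'localhost'. A connection made through the hostname master reaches MySQL as 'tajo'@'master', an account that does not exist, so access was denied. Tracking this down took three days.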
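The fix used in this post is to point the URI back at localhost. If the URI really has to name the host master, for example because the catalog server later moves to another machine, the other option would be to create the account for that host as well. I have not tested this variant on the cluster described here; the sketch below only generates the statements, mirroring the ones from section 3.1. grant_sql is a helper name invented for this post.

```shell
# grant_sql HOST PASSWORD: print the MySQL statements that create the tajo
# account for connections arriving from HOST and grant it the tajo database.
grant_sql() {
    cat <<EOF
create user 'tajo'@'$1' identified by '$2';
grant all on tajo.* to 'tajo'@'$1';
EOF
}
```

The output of grant_sql master hadoop can be reviewed first and then piped into the mysql client under an administrative account.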