First, install the SequoiaDB database. Then deploy Spark and PostgreSQL on top of SequoiaDB as the SQL parsing layer. Spark's own built-in metadata storage is very limited, so PostgreSQL is used to persist Spark's metastore data. The steps are as follows.
Start PG as the sdbadmin user:
bin/postgres -D pg_data/ >> logfile 2>&1 &
#check the current values of the two settings to be changed
cat ./pg_data/postgresql.conf | grep -e listen -e standard_conforming_strings
vi ./pg_data/postgresql.conf
Modify the following two settings:
listen_addresses = '*'
standard_conforming_strings = off
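For a scripted setup, the same two edits can be applied non-interactively (a sketch; assumes GNU sed and that both settings appear, possibly commented out, in postgresql.conf):
sed -i "s/^#\?listen_addresses = .*/listen_addresses = '*'/" ./pg_data/postgresql.conf
sed -i 's/^#\?standard_conforming_strings = .*/standard_conforming_strings = off/' ./pg_data/postgresql.conf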
In /pgsql/pg_data/pg_hba.conf, append a new line after the following entry:
#host all all 127.0.0.1/32 trust
#append this line:
host all all 0.0.0.0/0 trust
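Equivalently, the line can be appended from the shell (adjust the path to match your pg_data directory):
echo 'host all all 0.0.0.0/0 trust' >> ./pg_data/pg_hba.conf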
Download the PostgreSQL JDBC driver.
Copy the JDBC jar into Spark's lib directory:
cp /usr/share/java/postgresql-jdbc.jar /opt/spark/spark-2.1.0-bin-hadoop2.7/lib/
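/usr/share/java/postgresql-jdbc.jar is where the driver lands when it is installed from the distribution's repositories; assuming a RHEL/CentOS host, one way to obtain it (on other systems, download the jar from https://jdbc.postgresql.org/ instead):
yum install -y postgresql-jdbc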
Configure spark-env.sh under Spark's conf directory:
vi conf/spark-env.sh
SPARK_CLASSPATH=/opt/spark/spark-2.1.0-bin-hadoop2.7/lib/sequoiadb.jar:/opt/spark/spark-2.1.0-bin-hadoop2.7/lib/spark-sequoiadb_2.11-2.6.0.jar:/opt/spark/spark-2.1.0-bin-hadoop2.7/lib/postgresql-jdbc.jar
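A quick sanity check that every jar on the classpath actually exists (a sketch; run it in a shell after sourcing conf/spark-env.sh):
echo "$SPARK_CLASSPATH" | tr ':' '\n' | while read -r jar; do
  [ -f "$jar" ] && echo "OK      $jar" || echo "MISSING $jar"
done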
Restart PG:
#stop PG
bin/pg_ctl -s -D pg_data/ -m fast stop
#start PG
bin/postgres -D pg_data/ >> logfile 2>&1 &
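Equivalently, pg_ctl can perform the stop and start in one step:
bin/pg_ctl -D pg_data/ -l logfile restart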
Place the following two Hive metastore schema scripts in the same directory:
hive-schema-1.2.0.postgres.sql
hive-txn-schema-0.13.0.postgres.sql
Here they are placed under /home/sdbadmin/pgsql/.
hive-schema-1.2.0.postgres.sql needs one edit: open it and, in the \i line at the end of the file, prefix hive-txn-schema-0.13.0.postgres.sql with its absolute path:
vi hive-schema-1.2.0.postgres.sql
\i /home/sdbadmin/pgsql/hive-txn-schema-0.13.0.postgres.sql;
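To confirm the edit, check that the \i line at the end of the file now carries the absolute path:
grep -n '^\\i' hive-schema-1.2.0.postgres.sql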
Connect to PG (the username and password created here are the PostgreSQL user and password; both are sdbadmin):
./bin/psql -p 5432 postgres
postgres=# CREATE USER sdbadmin WITH PASSWORD 'sdbadmin';
postgres=# CREATE DATABASE metastore;
postgres=# \c metastore;
You are now connected to database "metastore" as user "sdbadmin".
Run the Hive schema script (this simply pre-creates in PostgreSQL the tables that the metastore will need later and sets a few variables):
metastore=# \i /home/sdbadmin/pgsql/hive-schema-1.2.0.postgres.sql
metastore=# \d
metastore=# \o /tmp/grant-privs
metastore=# SELECT 'GRANT SELECT,INSERT,UPDATE,DELETE ON "' || schemaname || '"."' || tablename || '" TO sdbadmin ;' FROM pg_tables WHERE tableowner = CURRENT_USER and schemaname = 'public';
metastore=# \o
metastore=# \i /tmp/grant-privs
Note: paste the SELECT statement in full; no part of it may be left out.
Note: replace the username sdbadmin in the statement above with your own username.
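To verify that the grants took effect, psql's \dp meta-command lists per-table access privileges; every metastore table should now show sdbadmin with select/insert/update/delete rights:
metastore=# \dp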
Switch to Spark's conf directory:
cd /opt/spark/spark-2.1.0-bin-hadoop2.7/conf
Create a new configuration file, hive-site.xml:
vi hive-site.xml
<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://T-NL-HISDB01:5432/metastore</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>sdbadmin</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>sdbadmin</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
    <description>creates necessary schema on startup if one doesn't exist; set this to false after creating it once</description>
  </property>
</configuration>
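A malformed hive-site.xml only surfaces as an error at Spark startup, so it is worth checking well-formedness right away (a sketch, assuming xmllint is installed; it prints nothing when the file is valid):
xmllint --noout hive-site.xml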
Note:
javax.jdo.option.ConnectionURL is the PostgreSQL service address, port, and database name.
javax.jdo.option.ConnectionUserName is the PostgreSQL username (sdbadmin).
javax.jdo.option.ConnectionPassword is that user's password (sdbadmin).
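Before restarting Spark, it is worth confirming that the metastore is reachable with exactly the credentials configured above (a sketch; replace T-NL-HISDB01 with your own host and run it from the PG install directory):
./bin/psql -h T-NL-HISDB01 -p 5432 -U sdbadmin -d metastore -c '\dt'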
Restart Spark by running the following scripts in order:
cd /opt/spark/spark-2.1.0-bin-hadoop2.7/sbin
./stop-all.sh
./stop-thriftserver.sh
#confirm that the Spark processes have exited
jps
./start-all.sh
./start-thriftserver.sh
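The Thrift JDBC server listens on port 10000 by default; a quick check that it came up (assuming the default port has not been changed):
netstat -nap | grep 10000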
Restart PG:
#check whether port 5432 is listening
netstat -nap | grep 5432
#stop PG
bin/pg_ctl -s -D pg_data/ -m fast stop
#start PG
bin/postgres -D pg_data/ >> logfile 2>&1 &
netstat -nap | grep 5432
Connect to PG and check whether Spark has connected:
select * from pg_stat_activity;
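If the listing is noisy, a filtered variant shows only the metastore connections (assumes PostgreSQL 9.2 or later, where the state column exists):
select datname, usename, client_addr, state from pg_stat_activity where datname = 'metastore';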