I. Preliminary Preparation
1. Prepare three physical machines: master (192.168.251.8), dataserver1 (192.168.251.9), and dataserver2 (192.168.251.10).
2. The latest release at the time of writing is 2.4.0.
Official site: http://hawq.apache.org/
Apache's official guide to building and installing from source: https://cwiki.apache.org/confluence/display/HAWQ/Build+and+Install
3. This walkthrough installs the pre-built package published on the official site, downloaded from http://apache.org/dyn/closer.cgi/hawq/2.4.0.0/apache-hawq-rpm-2.4.0.0.tar.gz ; copy the package to each of the servers.
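The copy step can be scripted from the machine that downloaded the package. A minimal sketch, assuming root SSH access to the three example hosts and /root/ as the target path (both assumptions, not requirements of this guide); it prints the commands as a dry run, clear RUN (RUN='') to execute:

```shell
# Dry-run sketch for distributing the package; clear RUN (RUN='') to execute.
# Host names are this guide's example machines; /root/ as target is an assumption.
RUN=${RUN:-echo}
for h in master dataserver1 dataserver2; do
  $RUN scp apache-hawq-rpm-2.4.0.0.tar.gz "root@$h:/root/"
done
```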
4. Disable the firewall
Stop the firewall: systemctl stop firewalld
Keep it from starting at boot: systemctl disable firewalld
Check its status: systemctl status firewalld
5. HAWQ runs on top of Hadoop; make sure a working Hadoop cluster is in place before installing HAWQ.
II. Dependencies and Initial Configuration
Make sure the network is working before you begin.
1. Install the dependencies by running the following commands in order:
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# For CentOS 7 the link is https://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-9.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm
yum makecache
# On RHEL 7, make sure the rhel-7-server-extras-rpms and rhel-7-server-optional-rpms
# channels are enabled in /etc/yum.repos.d/redhat.repo; otherwise yum will report
# that some packages (e.g. gperf) cannot be found
yum install -y man passwd sudo tar which git mlocate links make bzip2 net-tools \
autoconf automake libtool m4 gcc gcc-c++ gdb bison flex gperf maven indent \
libuuid-devel krb5-devel libgsasl-devel expat-devel libxml2-devel \
perl-ExtUtils-Embed pam-devel python-devel libcurl-devel snappy-devel \
thrift-devel libyaml-devel libevent-devel bzip2-devel openssl-devel \
openldap-devel protobuf-devel readline-devel net-snmp-devel apr-devel \
libesmtp-devel python-pip json-c-devel \
java-1.7.0-openjdk-devel lcov cmake3 \
openssh-clients openssh-server perl-JSON perl-Env
# tomcat6 is needed if building with enable-rps;
# download it from http://archive.apache.org/dist/tomcat/tomcat-6/v6.0.44/
ln -s /usr/bin/cmake3 /usr/bin/cmake
pip --retries=50 --timeout=300 install pycrypto
2. Adjust the kernel parameters: open /etc/sysctl.conf (e.g. vim /etc/sysctl.conf) and add the following settings:
kernel.shmmax = 1000000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 200000
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1281 65535
net.core.netdev_max_backlog = 200000
vm.overcommit_memory = 2
fs.nr_open = 3000000
kernel.threads-max = 798720
kernel.pid_max = 798720
# larger network buffers
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
Run the following command to apply the updated /etc/sysctl.conf to the running system:
sysctl -p
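The kernel.shmmax value above is a fixed example. A common rule of thumb (an assumption here, not a requirement of this guide) is to size it at roughly half of physical RAM; a quick way to compute that on the target machine:

```shell
# Compute a kernel.shmmax candidate as half of physical RAM (rule of thumb only;
# tune for your actual workload). Reads MemTotal (in kB) from /proc/meminfo.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
printf 'kernel.shmmax = %s\n' $(( mem_kb * 1024 / 2 ))
```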
3. Edit the /etc/security/limits.conf file with a text editor
# add the following entries in exactly the order listed
# (make sure fs.nr_open = 3000000 has been applied before editing limits.conf, otherwise you may be unable to ssh into the machine)
* soft nofile 2900000
* hard nofile 2900000
* soft nproc 131072
* hard nproc 131072
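After logging out and back in, the new limits can be spot-checked for the current session:

```shell
# Print the per-session limits set above; run as gpadmin after a fresh login.
# Expect 2900000 open files and 131072 processes if limits.conf was applied.
echo "open files:         $(ulimit -n)"
echo "max user processes: $(ulimit -u)"
```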
4. Create the gpadmin user (HAWQ cannot be run as root)
useradd -m gpadmin -G root -s /bin/bash
passwd gpadmin
5. Grant administrator privileges
Run visudo and find the line root ALL=(ALL) ALL.
Add gpadmin ALL=(ALL) ALL on a new line at the bottom, then save and exit.
6. Set up passwordless SSH login for the gpadmin user
1) In the gpadmin home directory on each machine, run ssh-keygen -t rsa to generate that machine's key pair;
2) Collect every node's public key into a single authorized_keys file;
3) Copy the authorized_keys file into ~/.ssh/ on every node in the cluster;
4) Restrict the file's permissions so that only its owner can read and write it:
chmod 600 authorized_keys
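Steps 1)-4) can be combined into one script run from any node. A minimal sketch using this guide's example host names; it prints each remote command as a dry run (clear RUN to actually execute, and note that until the keys are in place each ssh/scp still prompts for the gpadmin password):

```shell
# Dry-run sketch of the key exchange; clear RUN (RUN='') to execute for real.
RUN=${RUN:-echo}
HOSTS="master dataserver1 dataserver2"
KEYS=$(mktemp)

for h in $HOSTS; do
  # 1) generate a key pair on each node and 2) collect its public key
  $RUN ssh "gpadmin@$h" "ssh-keygen -q -t rsa -N '' -f ~/.ssh/id_rsa; cat ~/.ssh/id_rsa.pub" >> "$KEYS"
done
for h in $HOSTS; do
  # 3) push the merged authorized_keys to every node and 4) lock down its mode
  $RUN scp "$KEYS" "gpadmin@$h:~/.ssh/authorized_keys"
  $RUN ssh "gpadmin@$h" "chmod 600 ~/.ssh/authorized_keys"
done
rm -f "$KEYS"
```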
III. Installing HAWQ
1. Extract the HAWQ package into the target directory /usr/local:
tar -zxvf apache-hawq-rpm-2.4.0.0.tar.gz -C /usr/local/
2. Change into the extracted directory and install HAWQ with rpm:
cd hawq_rpm_packages/
rpm -ivh apache-hawq-2.4.0.0-el7.x86_64.rpm
3. Change the owner and group of the install directory to gpadmin:
chown -hR gpadmin /usr/local/apache-hawq/
chgrp -hR gpadmin /usr/local/apache-hawq/
4. Under /usr/local/apache-hawq, create the hawq-data-directory folder with masterdd and segmentdd subdirectories (mkdir -p creates the parent directory as needed):
mkdir -p /usr/local/apache-hawq/hawq-data-directory/masterdd
mkdir -p /usr/local/apache-hawq/hawq-data-directory/segmentdd
5. Configure the hawq-site.xml file in /usr/local/apache-hawq/etc (the key settings are shown below; leave the rest at their defaults):
<configuration>
<property>
<name>hawq_master_address_host</name>
<value>master</value>
<description>The host name of hawq master.</description>
</property>
<property>
<name>hawq_master_address_port</name>
<value>5432</value>
<description>The port of hawq master.</description>
</property>
<property>
<name>hawq_standby_address_host</name>
<value>none</value>
<description>The host name of hawq standby master.</description>
</property>
<property>
<name>hawq_segment_address_port</name>
<value>40000</value>
<description>The port of hawq segment.</description>
</property>
<property>
<name>hawq_dfs_url</name>
<value>master:9000/hawq_default</value> <!-- the host and port must match the HDFS fs.defaultFS setting -->
<description>URL for accessing HDFS.</description>
</property>
<property>
<name>hawq_master_directory</name>
<value>/usr/local/apache-hawq/hawq-data-directory/masterdd</value>
<description>The directory of hawq master.</description>
</property>
<property>
<name>hawq_segment_directory</name>
<value>/usr/local/apache-hawq/hawq-data-directory/segmentdd</value>
<description>The directory of hawq segment.</description>
</property>
<property>
<name>hawq_master_temp_directory</name>
<value>/usr/local/apache-hawq/tmp</value>
<description>The temporary directory reserved for hawq master.</description>
</property>
<property>
<name>hawq_segment_temp_directory</name>
<value>/usr/local/apache-hawq/tmp</value>
<description>The temporary directory reserved for hawq segment.</description>
</property>
<property>
<name>hawq_rm_yarn_address</name>
<value>master:8032</value>
<description>The address of YARN resource manager server.</description>
</property>
<property>
<name>hawq_rm_yarn_scheduler_address</name>
<value>master:8030</value>
<description>The address of YARN scheduler server.</description>
</property>
<property>
<name>hawq_rm_yarn_queue_name</name>
<value>default</value>
<description>The YARN queue name to register hawq resource manager.</description>
</property>
<property>
<name>hawq_rps_address_port</name>
<value>8432</value>
<description>The port number of the Ranger Plugin Service. The HAWQ RPS address is
http://$rps_host(hawq_master_address_host or hawq_standby_address_host):$hawq_rps_address_port/rps
For example, http://localhost:8432/rps
</description>
</property>
<property>
<name>default_hash_table_bucket_number</name>
<value>12</value>
</property>
</configuration>
Note: install the HAWQ master and standby on the Hadoop NameNode and SecondaryNameNode hosts; segment data directories (segmentdd) go on the DataNode hosts.
6. Exchange SSH keys for the gpadmin user:
cd /usr/local/apache-hawq # enter the HAWQ install directory
source greenplum_path.sh
cd bin
./hawq ssh-exkeys -h master -h dataserver1 -h dataserver2
7. Switch to the hadoop user, create the HDFS directory HAWQ needs, and change its owner:
su hadoop
hdfs dfs -mkdir /hawq_default
hdfs dfs -chown gpadmin:gpadmin /hawq_default
8. Initialize HAWQ
cd /usr/local/apache-hawq/bin
./hawq init cluster
Once initialization completes, HAWQ is left running by default.
9. Start and stop HAWQ
Make sure the Hadoop services are running before starting HAWQ.
./hawq start cluster
./hawq stop cluster
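To check whether the cluster actually came up between restarts, the hawq state subcommand can be used (hedged: confirm it exists in your build with ./hawq --help). The sketch below only prints the command; clear RUN to execute:

```shell
# Dry-run sketch; clear RUN (RUN='') and run as gpadmin from
# /usr/local/apache-hawq/bin (with greenplum_path.sh sourced) to execute.
RUN=${RUN:-echo}
$RUN ./hawq state cluster
```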
10. Add the following line to pg_hba.conf (located in the master data directory, masterdd):
host all gpadmin 192.168.251.0/24 trust
This permits remote connections (for example, with a client tool such as Navicat).
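With that rule in place, access can be verified from any machine in the subnet using the standard psql client (a hedged example: the database name postgres is an assumption, while port 5432 and the master address match the configuration above). Printed as a dry run; clear RUN and run it on a remote host with psql installed:

```shell
# Dry-run connection test; clear RUN (RUN='') to execute from a remote host.
# 192.168.251.8 is this guide's master; database name "postgres" is an assumption.
RUN=${RUN:-echo}
$RUN psql -h 192.168.251.8 -p 5432 -U gpadmin -d postgres -c 'SELECT version();'
```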
Official documentation: http://hawq.apache.org/docs/userguide/2.3.0.0-incubating/tutorial/overview.html