1. Download the installation package
Download it here: https://download.csdn.net/download/fm0517/10577703
2. Node types and descriptions
【GTM】
The global transaction manager, which guarantees data consistency across the cluster. It communicates constantly with the Coordinator and Datanode nodes and is the core of the whole cluster. Only one GTM exists, optionally accompanied by a GTM Standby node that backs it up in real time. If the GTM fails, the entire cluster immediately becomes inaccessible; you can then fail over to the GTM Standby. If you deploy a GTM Standby, you should also deploy GTM Proxy, typically on the same servers as the Coordinators and Datanodes. GTM Proxy proxies Coordinator and Datanode access to the GTM, which reduces the GTM's load. Its other important role is assisting GTM failover: when the GTM fails and the GTM Standby becomes the new GTM, the Coordinators and Datanodes do not need to be re-pointed at a new GTM address; only the GTM Proxy has to reconnect to the new GTM.
【Coordinator】
The node that receives data-access requests; it is essentially a set of PostgreSQL backend processes. On receiving a query, the Coordinator builds the execution plan and dispatches the query to the data nodes that hold the relevant data. On writes, it likewise routes the data to the appropriate nodes according to the table's distribution strategy. In that sense, the Coordinator holds the cluster's global data-location knowledge. Coordinator nodes can be scaled out arbitrarily; apart from their addresses they are fully equivalent, and data updated through one node is immediately visible on any other. Each Coordinator can be paired with a standby node to avoid a single point of failure.
【Datanode】
The nodes that actually store and retrieve data; they receive requests from the Coordinators, execute the SQL, and also communicate with one another. In general, a single data node does not hold the complete data set, and data nodes do not serve clients directly. A table's data is laid out across the data nodes in one of two modes: replicated or distributed. In replicated mode, full copies of the table exist on each of the specified nodes; in distributed mode, the rows are spread over several data nodes according to a rule, and together those nodes hold one complete copy. The mode is chosen when the table is created, in the CREATE TABLE statement, with the following syntax:
CREATE TABLE table_name(...)
DISTRIBUTE BY
HASH(col)|MODULO(col)|ROUNDROBIN|REPLICATION
TO NODE(nodename1,nodename2...)
As the syntax shows, REPLICATION after DISTRIBUTE BY selects replicated mode; the other options select distributed mode. HASH distributes rows by the hash value of the specified column, MODULO by the modulo of the specified column, and ROUNDROBIN cycles through the nodes in turn. TO NODE restricts distribution to the listed data nodes; if it is omitted, all data nodes participate. If no distribution clause is given at all (a plain CREATE TABLE), Postgres-XL defaults to distributed mode across all data nodes.
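As an illustration, the hypothetical tables below (the table and column names are examples, not from the source) show one replicated and one hash-distributed table, using the data node names defined later in this tutorial's configuration:

```sql
-- Replicated: every listed node holds a full copy (suits small lookup tables).
CREATE TABLE country (code CHAR(2) PRIMARY KEY, name TEXT)
DISTRIBUTE BY REPLICATION
TO NODE (datanode1, datanode2);

-- Distributed: rows are spread across the nodes by the hash of id.
CREATE TABLE orders (id BIGINT, amount NUMERIC)
DISTRIBUTE BY HASH (id)
TO NODE (datanode1, datanode2);
```

These statements assume a running Postgres-XL cluster whose data nodes are named datanode1 and datanode2, as in step 14 below.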
3. Host planning
GTM: server0: 192.168.51.140
Coordinator: server1: 192.168.51.141
Datanode1 Master: server2: 192.168.51.142
Datanode2 Master: server3: 192.168.51.143
Datanode1 Slave: server4: 192.168.51.144
Datanode2 Slave: server5: 192.168.51.145
----- Datanode3 is used later to demonstrate adding and removing nodes dynamically -----
Datanode3 Master: server6: 192.168.51.146
【Steps 4 through 13 below must be executed on every node】
4. Edit the hosts file
# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.51.140 server0
192.168.51.141 server1
192.168.51.142 server2
192.168.51.143 server3
192.168.51.144 server4
192.168.51.145 server5
192.168.51.146 server6
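Since the seven entries follow one pattern (192.168.51.140 through .146 map to server0 through server6), they can be generated instead of typed; a minimal sketch that prints the block (append it to /etc/hosts as root):

```shell
# Build the server0..server6 host entries in one loop.
hosts_block=$(for i in 0 1 2 3 4 5 6; do
  printf '192.168.51.14%s server%s\n' "$i" "$i"
done)
echo "$hosts_block"   # as root: echo "$hosts_block" >> /etc/hosts
```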
5. Install dependency packages
openssh-clients, flex, bison, readline-devel, zlib-devel, openjade, docbook-style-dsssl, gcc
Check whether each package is already installed with rpm:
# rpm -qa |grep xxx
Install any missing packages with yum.
6. Remove the operating system's bundled PostgreSQL
# rpm -qa |grep postgresql
# rpm -qa |grep postgresql | xargs rpm -e --nodeps
# rpm -qa |grep postgresql
7. Create the postgres user
Create the group:
# groupadd postgres
Create the user:
# useradd -m -d /home/postgres postgres -g postgres
Set the password:
# passwd postgres
Enter a password, e.g. 12345678
Note: to delete the postgres user later, run as root:
# userdel -r postgres
8. Configure passwordless SSH login
As the root user:
# ssh-keygen
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server0
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server1
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server2
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server3
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server4
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server5
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server6
As the postgres user:
$ ssh-keygen
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server0
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server2
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server3
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server4
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server5
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server6
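The per-host ssh-copy-id calls can be scripted; the sketch below only prints the commands for the postgres user (remove the echo to execute them, and repeat analogously as root):

```shell
# Print one ssh-copy-id command per node; drop `echo` to actually run them.
cmds=$(for i in 0 1 2 3 4 5 6; do
  echo "ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server$i"
done)
echo "$cmds"
```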
If SSH still asks for a password afterwards, fix the permissions as follows:
chmod 755 /home/postgres
chmod 700 /home/postgres/.ssh
chmod 644 /home/postgres/.ssh/authorized_keys
9. Adjust kernel parameters
Add the kernel.sem parameter to /etc/sysctl.conf:
# vi /etc/sysctl.conf
kernel.sem = 50100 128256000 50100 2560
Apply the setting:
# sysctl -p
Verify that it took effect:
# ipcs -ls
------ Semaphore Limits --------
max number of arrays = 2560
max semaphores per array = 50100
max semaphores system wide = 128256000
max ops per semop call = 50100
semaphore max value = 32767
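The four numbers in kernel.sem correspond, in order, to the standard System V limits SEMMSL, SEMMNS, SEMOPM, and SEMMNI, which is why the same values reappear reordered in the ipcs output above; labeled explicitly:

```shell
# kernel.sem = SEMMSL SEMMNS SEMOPM SEMMNI, in that fixed order.
set -- 50100 128256000 50100 2560
echo "max semaphores per array (SEMMSL): $1"
echo "max semaphores system wide (SEMMNS): $2"
echo "max ops per semop call (SEMOPM): $3"
echo "max number of arrays (SEMMNI): $4"
```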
10. Configure the firewall
Either disable the firewall:
# systemctl stop firewalld
# systemctl disable firewalld
or open the required ports in the firewall.
Use the second approach when the firewall cannot be disabled.
The ports to open follow from the host-planning section; taking TCP/5432 as an example, open it permanently with:
# firewall-cmd --zone=public --add-port=5432/tcp --permanent
# firewall-cmd --reload
Note: check the firewall status: systemctl status firewalld
Note: start the firewall: systemctl start firewalld
Note: stop the firewall: systemctl stop firewalld
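If you keep the firewall on, 5432 is not the only port in play: based on the configuration in step 14, the GTM (6666), the coordinator (5432 and pooler 20004), the datanode masters (5433 and pooler 20005), and the datanode slaves (15433 and pooler 20015) all listen on their own ports. A sketch that prints the required commands (drop the echo to execute as root):

```shell
# One --add-port call per port used by this tutorial's cluster layout.
fw_cmds=$(for p in 6666 5432 20004 5433 20005 15433 20015; do
  echo "firewall-cmd --zone=public --add-port=$p/tcp --permanent"
done)
echo "$fw_cmds"
echo "firewall-cmd --reload"
```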
11. Disable SELinux
Check the SELinux status:
# /usr/sbin/sestatus -v
If the SELinux status field reads enabled, SELinux is on.
To disable SELinux permanently:
# vi /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled.
The change takes effect after a reboot.
12. Install the Postgres-XL software
Unpack:
# tar -jxvf postgres-xl-9.5r1.6.tar.bz2
# chown -R postgres:postgres postgres-xl-9.5r1.6
Switch user:
# su - postgres
Build and install:
$ cd postgres-xl-9.5r1.6
$ ./configure --prefix=/home/postgres/pgxl9.5
$ make
$ make install
Build and install the contrib extensions:
$ cd contrib
$ make
$ make install
13. Configure environment variables
Add the following to .bash_profile:
# su - postgres
$ vi .bash_profile
export PGHOME=/home/postgres/pgxl9.5
export PGUSER=postgres
export LD_LIBRARY_PATH=$PGHOME/lib:$LD_LIBRARY_PATH
export PATH=$PGHOME/bin:$PATH
Apply the environment variables (source the profile you just edited):
$ source ~/.bash_profile
Verify that they took effect:
$ pg_ctl --version
Output: pg_ctl (PostgreSQL) 9.5.8 (Postgres-XL 9.5r1.6)
14. Configure the Postgres-XL cluster
On the GTM node, as the postgres user:
$ pgxc_ctl --- the first run prints an error about a missing configuration file; ignore it
PGXC prepare --- generates a configuration file template
PGXC exit --- exits the pgxc_ctl interactive shell
Afterwards a pgxc_ctl directory appears under the postgres user's home directory; edit the pgxc_ctl.conf file inside it:
$ vi pgxc_ctl.conf
Change it to:
#!/usr/bin/env bash
# pgxcInstallDir variable is needed if you invoke "deploy" command from pgxc_ctl utility.
# If you don't, you don't need this variable.
pgxcInstallDir=$PGHOME
pgxlDATA=$PGHOME/data
#---- OVERALL -----------------------------------------------------------------------------
#
pgxcOwner=postgres # owner of the Postgres-XC database cluster. Here, we use this
# both as linux user and database user. This must be
# the super user of each coordinator and datanode.
pgxcUser=$pgxcOwner # OS user of Postgres-XC owner
tmpDir=/tmp # temporary dir used in XC servers
localTmpDir=$tmpDir # temporary dir used here locally
configBackup=n # If you want config file backup, specify y to this value.
configBackupHost=pgxc-linker # host to backup config file
configBackupDir=$HOME/pgxc # Backup directory
configBackupFile=pgxc_ctl.bak # Backup file name --> Need to synchronize when original changed.
#---- GTM ------------------------------------------------------------------------------------
#---- GTM Master -----------------------------------------------
#---- Overall ----
gtmName=gtm
gtmMasterServer=server0
gtmMasterPort=6666
gtmMasterDir=$pgxlDATA/nodes/gtm
#---- Configuration ---
gtmExtraConfig=none # Will be added to gtm.conf for both Master and Slave (done at initialization only)
gtmMasterSpecificExtraConfig=none # Will be added to Master's gtm.conf (done at initialization only)
#---- Coordinators ----------------------------------------------------------------------------------------------------
#---- shortcuts ----------
coordMasterDir=$pgxlDATA/nodes/coord
coordSlaveDir=$pgxlDATA/nodes/coord_slave
coordArchLogDir=$pgxlDATA/nodes/coord_archlog
#---- Overall ------------
coordNames=(coord1) # Master and slave use the same name
coordPorts=(5432) # Master ports
poolerPorts=(20004) # Master pooler ports
coordPgHbaEntries=(0.0.0.0/0) # Assumes that all the coordinator (master/slave) accepts
# the same connection
# This entry allows only $pgxcOwner to connect.
# If you'd like to setup another connection, you should
# supply these entries through files specified below.
#---- Master -------------
coordMasterServers=server1 # none means this master is not available
coordMasterDirs=$coordMasterDir
coordMaxWALsernder=10 # max_wal_senders: needed to configure slave. If zero value is specified,
# it is expected to supply this parameter explicitly by external files
# specified in the following. If you don't configure slaves, leave this value to zero.
coordMaxWALSenders=$coordMaxWALsernder
# max_wal_senders configuration for each coordinator.
#---- Configuration files---
coordExtraConfig=coordExtraConfig # Extra configuration file for coordinators.
# This file will be added to all the coordinators'
# postgresql.conf
# Please note that the following sets up minimum parameters which you may want to change.
# You can put your postgresql.conf lines here.
cat > $coordExtraConfig <<EOF
#================================================
# Added to all the coordinator postgresql.conf
# Original: $coordExtraConfig
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
max_connections = 512
EOF
# Additional Configuration file for specific coordinator master.
# You can define each setting by similar means as above.
coordSpecificExtraConfig=(none none)
coordExtraPgHba=none # Extra entry for pg_hba.conf. This file will be added to all the coordinators' pg_hba.conf
coordSpecificExtraPgHba=(none none)
#---- Datanodes -------------------------------------------------------------------------------------------------------
#---- Shortcuts --------------
datanodeMasterDir=$pgxlDATA/nodes/dn_master
datanodeSlaveDir=$pgxlDATA/nodes/dn_slave
datanodeArchLogDir=$pgxlDATA/nodes/datanode_archlog
#---- Overall ---------------
#primaryDatanode=datanode1 # Primary Node.
# At present, xc has a problem issuing ALTER NODE against the primary node. Until it is fixed, the test will be done
# without this feature.
primaryDatanode=datanode1 # Primary Node.
datanodeNames=(datanode1 datanode2)
datanodePorts=(5433 5433) # Master ports
datanodePoolerPorts=(20005 20005) # Master pooler ports
datanodePgHbaEntries=(0.0.0.0/0) # Assumes that all the datanodes (master/slave) accept
# the same connection
# This list sets up pg_hba.conf for $pgxcOwner user.
# If you'd like to setup other entries, supply them
# through extra configuration files specified below.
# Note: The above parameter is extracted as "host all all 0.0.0.0/0 trust". If you don't want
# such setups, specify the value () to this variable and supply what you want using datanodeExtraPgHba
# and/or datanodeSpecificExtraPgHba variables.
#datanodePgHbaEntries=(::1/128) # Same as above but for IPv6 addresses
#---- Master ----------------
datanodeMasterServers=(server2 server3) # none means this master is not available.
# This means that there should be the master but is down.
# The cluster is not operational until the master is
# recovered and ready to run.
datanodeMasterDirs=($datanodeMasterDir/dn1 $datanodeMasterDir/dn2)
datanodeMaxWalSender=10 # max_wal_senders: needed to configure slave. If zero value is
# specified, it is expected this parameter is explicitly supplied
# by external configuration files.
# If you don't configure slaves, leave this value zero.
datanodeMaxWALSenders=($datanodeMaxWalSender $datanodeMaxWalSender)
# max_wal_senders configuration for each datanode
#---- Slave -----------------
datanodeSlave=y # Specify y if you configure at least one datanode slave. Otherwise, the following
# configuration parameters will be set to empty values.
# If no effective server names are found (that is, every servers are specified as none),
# then datanodeSlave value will be set to n and all the following values will be set to
# empty values.
datanodeSlaveServers=(server4 server5) # value none means this slave is not available
datanodeSlavePorts=(15433 15433) # value none means this slave is not available
datanodeSlavePoolerPorts=(20015 20015) # value none means this slave is not available
datanodeSlaveSync=y # If datanode slave is connected in synchronized mode
datanodeSlaveDirs=($datanodeSlaveDir $datanodeSlaveDir)
datanodeArchLogDirs=( $datanodeArchLogDir $datanodeArchLogDir)
# ---- Configuration files ---
# You may supply your bash script to setup extra config lines and extra pg_hba.conf entries here.
# These files will go to corresponding files for the master.
# Or you may supply these files manually.
datanodeExtraConfig=none # Extra configuration file for datanodes. This file will be added to all the
# datanodes' postgresql.conf
datanodeSpecificExtraConfig=(none none)
datanodeExtraPgHba=none # Extra entry for pg_hba.conf. This file will be added to all the datanodes' postgresql.conf
datanodeSpecificExtraPgHba=(none none)
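Because pgxc_ctl.conf starts with #!/usr/bin/env bash and is read as a shell script, a plain bash syntax check catches unbalanced quotes or array parentheses before pgxc_ctl does. A minimal sketch (the temporary file here stands in for ~/pgxc_ctl/pgxc_ctl.conf):

```shell
# Syntax-check a pgxc_ctl.conf-style bash file without executing it.
conf=$(mktemp)
printf 'coordNames=(coord1)\ndatanodeNames=(datanode1 datanode2)\n' > "$conf"
if bash -n "$conf"; then check=ok; else check=failed; fi
echo "syntax check: $check"
rm -f "$conf"
```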
15. Initialize the cluster
On the GTM node, as the postgres user:
Initialize the cluster:
$ pgxc_ctl -c ~/pgxc_ctl/pgxc_ctl.conf init all
Start the cluster:
$ pgxc_ctl -c /home/postgres/pgxc_ctl/pgxc_ctl.conf start all
Stop the cluster:
$ pgxc_ctl -c /home/postgres/pgxc_ctl/pgxc_ctl.conf stop all
16. Remove the cluster
On every node, as the postgres user (note that nodes is a directory, so -r is required):
$ rm -rf /home/postgres/pgxl9.5/data/nodes