System-level configuration (required on every node)
1. Installation plan
Three machines:
192.168.0.111 mdw
192.168.0.112 sdw1
192.168.0.113 sdw2
2. Configure /etc/hosts
Add entries for every node to /etc/hosts (the sample below uses different example addresses than the plan above; use the real IPs of your three machines consistently):
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.80 dw-greenplum-1 mdw
192.168.0.81 dw-greenplum-2 sdw1
192.168.0.82 dw-greenplum-3 sdw2
3. Set the hostname
[root@localhost ~]# vi /etc/hostname
mdw
Set the corresponding hostname (mdw, sdw1, sdw2) on each host.
4. Disable the firewall
CentOS 7 and later:
[root@localhost ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
CentOS 6 and earlier:
service iptables stop     # stops the firewall now, but it comes back after a reboot
chkconfig iptables off    # disables it at boot; takes effect on the next reboot
Run both commands together so that no reboot is needed.
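The two cases above can be collapsed into one hedged helper that detects the init system: if systemctl is present we assume CentOS 7+, otherwise we fall back to the SysV commands. The function only prints the commands (pipe its output to sh as root to actually run them); treat this as a sketch, not the official procedure:

```shell
# Hedged helper: choose the firewall-disable commands for the init system at hand.
# It prints the commands rather than executing them, so it is safe to inspect first.
firewall_off_cmds() {
    if command -v systemctl >/dev/null 2>&1; then
        echo "systemctl stop firewalld"      # stop it now
        echo "systemctl disable firewalld"   # keep it off across reboots
    else
        echo "service iptables stop"         # stop it now
        echo "chkconfig iptables off"        # keep it off across reboots
    fi
}
firewall_off_cmds
```

Run `firewall_off_cmds | sh` as root once the printed commands look right.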
5. Disable SELinux
vi /etc/selinux/config
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
# targeted - Targeted processes are protected
A reboot is required for this change to take effect.
6. Create the gpadmin user
userdel gpadmin     # remove any existing account first (ignore errors if it does not exist)
groupdel gpadmin
groupadd -g 530 gpadmin
useradd -g 530 -u 530 -m -d /home/gpadmin -s /bin/bash gpadmin
Give the home directory to gpadmin and set a password for the account:
chown -R gpadmin:gpadmin /home/gpadmin/
passwd gpadmin
[enter a password, e.g. gpadmin]
7. Set up the gpadmin user environment
[gpadmin@bj-gp-node1 ~]$ cd /home/gpadmin
[gpadmin@bj-gp-node1 ~]$ vi .bashrc
[gpadmin@bj-gp-node1 ~]$ vi .bash_profile
Append the following two lines to .bashrc or .bash_profile (either file is enough):
source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/data/gpdata/master/gpseg-1
Note: MASTER_DATA_DIRECTORY must match the master directory chosen at initialization; later steps in this document use /data/master/gpseg-1.
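The edit above can also be done non-interactively with a heredoc appended to the current user's .bashrc (run as gpadmin; the path is the one from this step):

```shell
# Append the two Greenplum environment lines to the current user's .bashrc.
cat >> "$HOME/.bashrc" <<'EOF'
source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/data/gpdata/master/gpseg-1
EOF
grep MASTER_DATA_DIRECTORY "$HOME/.bashrc"   # confirm the line landed
```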
8. Edit /etc/sysctl.conf
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736 # total physical memory in bytes (size in GB x 1024^3)
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 16777216 # total memory divided by the 4 KB page size
kernel.shmmni = 4096
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1 # removed in Linux 4.12+; omit this line on newer kernels
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 2
vm.overcommit_ratio = 90
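The two memory comments above amount to simple arithmetic, sketched below. RAM_GB=64 is an assumed example size that happens to reproduce exactly the figures used in this file; substitute the node's real memory:

```shell
# Derive kernel.shmmax / kernel.shmall from total RAM.
# RAM_GB=64 is an illustrative assumption; use the node's real memory size.
RAM_GB=64
PAGE_SIZE=4096                               # the usual 4 KB page (see: getconf PAGE_SIZE)
SHMMAX=$((RAM_GB * 1024 * 1024 * 1024))      # RAM in bytes = GB x 1024^3
SHMALL=$((SHMMAX / PAGE_SIZE))               # RAM expressed in 4 KB pages
echo "kernel.shmmax = $SHMMAX"
echo "kernel.shmall = $SHMALL"
```

With 64 GB this prints the exact values above, 68719476736 and 16777216.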
9. Configure /etc/security/limits.conf
Add the following lines:
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
The next two lines may already be present; check the end of the file:
* soft memlock unlimited
* hard memlock unlimited
10. Set the read-ahead value
# /sbin/blockdev --getra /dev/sda     # show the current read-ahead (default 256 sectors)
# /sbin/blockdev --setra 16384 /dev/sda     # set the read-ahead
11. Set the disk I/O scheduler
# echo deadline > /sys/block/sda/queue/scheduler    [takes effect immediately, lost on reboot]
# grubby --update-kernel=ALL --args="elevator=deadline"    [persistent, takes effect after reboot]
On CentOS 6.x the grubby command appears not to work (it does on Red Hat), so edit the boot configuration by hand:
CentOS 7: /boot/grub2/grub.cfg
CentOS 6: vi /etc/grub.conf
and append elevator=deadline to the linux (or kernel) line.
Also add the data filesystem to the end of /etc/fstab:
LABEL=/data /data xfs rw,noatime,inode64,allocsize=16m 1 1
After rebooting, check that the setting took effect:
cat /sys/block/sda/queue/scheduler
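The kernel marks the active scheduler in brackets in that file (e.g. `noop [deadline] cfq`), so the post-reboot check can be scripted across all disks. The `sd*` glob is an assumption; widen it for vd* or nvme* devices:

```shell
# Print the active I/O scheduler for each disk; the active one is the
# bracketed entry, which the sed expression extracts.
for f in /sys/block/sd*/queue/scheduler; do
    [ -e "$f" ] || continue                       # skip if the glob matched nothing
    printf '%s -> %s\n' "$f" "$(sed 's/.*\[\(.*\)\].*/\1/' "$f")"
done
```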
12. Set up passwordless SSH
Note: configure this both as root and as gpadmin. In this example 192.168.30.207 is the master; 251 and 252 are the compute nodes.
On all three hosts, check that ~/.ssh exists and create it if it does not.
ssh-keygen -t rsa    # press Enter through every prompt; this produces id_rsa and id_rsa.pub
Copy id_rsa.pub to the two nodes, 251 and 252:
scp ~/.ssh/id_rsa.pub root@192.168.30.251:~/.ssh
scp ~/.ssh/id_rsa.pub root@192.168.30.252:~/.ssh
On 251 and 252, build authorized_keys:
cat id_rsa.pub >> authorized_keys
Then scp authorized_keys back to 207:
scp ~/.ssh/authorized_keys root@192.168.30.207:~/.ssh
Finally, on all three machines set the .ssh/ directory to mode 700 and authorized_keys to 600 (or 644):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
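The scp/cat steps above can also be done with ssh-copy-id (from openssh-clients), which appends the key to the remote authorized_keys and fixes permissions in one step. A sketch using the example's node IPs; DRY_RUN=1 only prints the commands:

```shell
# Generate a key if absent, then push it to each node with ssh-copy-id.
# DRY_RUN=1 just prints the per-host command; unset it to actually connect.
DRY_RUN=1
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" -q
for host in 192.168.30.251 192.168.30.252; do
    cmd="ssh-copy-id root@$host"
    if [ -n "$DRY_RUN" ]; then echo "$cmd"; else $cmd; fi
done
```

Run the same loop once as root and once as gpadmin, matching the note above.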
Installing the database
1. Upload and unpack the installer
(on the master node; a tar package can also be used)
[yum install unzip net-tools perl -y]
# unzip greenplum-db-4.3.16.1-rhel5-x86_64.zip
2. Install the software (master node)
# /bin/bash greenplum-db-4.3.16.1-rhel5-x86_64.bin
The installer shows a series of prompts; accepting the defaults is fine.
The directory layout after installation looks like this:
3. Configure environment variables
The same variables are also set in /usr/local/greenplum-db/greenplum_path.sh, which can be used as a reference.
Edit /etc/profile and add the following:
vim /etc/profile
GPHOME=/usr/local/greenplum-db-4.3.6.2    (the directory matching the version you installed)
PATH=$GPHOME/bin:$GPHOME/ext/Python/bin:$PATH
export GPHOME
export PATH
The variables can also be loaded for the current session only (master node):
# source /usr/local/greenplum-db/greenplum_path.sh
4. Distribute the installation directory to every node
# scp -r /usr/local/greenplum-5.2 sdw1:/usr/local/
# scp -r /usr/local/greenplum-5.2 sdw2:/usr/local/
On each node, create a symlink to the installation directory to make future upgrades easier:
# ln -s /usr/local/greenplum-5.2 greenplum
Give both paths to gpadmin:
chown -R gpadmin:gpadmin /usr/local/greenplum-5.2
chown -R gpadmin:gpadmin /usr/local/greenplum
The same steps can be performed with the gpssh utility, or with gpseginstall (master node):
# gpssh-exkeys -f all_hosts -u gpadmin -p gpadmin
# gpseginstall -f all_hosts -u gpadmin -p gpadmin
[If the target directory does not exist after this step, scp greenplum-db-4.3.16.1 over by hand,
create the symlink: ln -fs greenplum-db-4.3.16.1 greenplum-db
and fix ownership: chown -R gpadmin:gpadmin /usr/local/greenplum-db]
all_hosts is the host-list file created in step 5 below. The installer prompts for the password of each host and reports success when finished, as in the figure below:
5. Create the all_hosts file
A file containing every host name:
mdw
sdw1
sdw2
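Both host-list files (this one and the seg_hosts file used in step 8 below) can be generated together; the target directory is an assumption, since the gp* tools just need whatever path you pass with -f:

```shell
# Write all_hosts, then derive seg_hosts (segment hosts only) from it.
# HOSTDIR is an illustrative assumption; any readable location works.
HOSTDIR=${HOSTDIR:-$HOME}
printf '%s\n' mdw sdw1 sdw2 > "$HOSTDIR/all_hosts"
grep -v '^mdw$' "$HOSTDIR/all_hosts" > "$HOSTDIR/seg_hosts"
cat "$HOSTDIR/seg_hosts"
```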
6. Switch to gpadmin and verify passwordless login
(1) Switch user (everything above was done as root):
$ su - gpadmin
(2) Use gpssh to test passwordless login to all hosts; the result looks like the figure below:
$ source /usr/local/greenplum-db/greenplum_path.sh
(or add that line to .bash_profile)
$ gpssh -f all_hosts -e ls -l $GPHOME
7. Configure environment variables (master node)
Append ". /usr/local/greenplum-db/greenplum_path.sh" to the end of .bashrc,
then copy .bashrc to sdw1 and sdw2:
scp .bashrc sdw1:~
scp .bashrc sdw2:~
Also append "export MASTER_DATA_DIRECTORY=/data/master/gpseg-1" to ~/.bashrc and sync it to sdw1 and sdw2 in the same way:
$ scp .bashrc sdw1:~
$ scp .bashrc sdw2:~
Both lines must be added; run "source ~/.bashrc" afterwards to pick them up.
8. Create the data directories
(1) Create the master data storage area:
# mkdir -p /data/master
(2) Change ownership of the directory (not needed if it was created as gpadmin):
# chown gpadmin:gpadmin /data/master
(3) Create a file named seg_hosts containing all segment host names:
sdw1
sdw2
(4) Use gpssh to create the primary and mirror data directories on all segment hosts (skip the mirror directories if mirroring is not configured):
# gpssh -f seg_hosts -e 'mkdir -p /data1/primary /data2/primary'    [or gpdata/data1, data2]
# gpssh -f seg_hosts -e 'mkdir -p /data1/mirror /data2/mirror'
# gpssh -f seg_hosts -e 'chown -R gpadmin:gpadmin /data1/primary /data2/primary'
# gpssh -f seg_hosts -e 'chown -R gpadmin:gpadmin /data1/mirror /data2/mirror'
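If gpssh is not yet on the calling shell's PATH, a plain ssh loop does the same; directory names follow step (4), and DRY_RUN=1 only prints the per-host commands:

```shell
# Create the primary and mirror directories on each segment host over plain ssh.
# DRY_RUN=1 just prints each command; unset it to actually connect.
DRY_RUN=1
dirs="/data1/primary /data2/primary /data1/mirror /data2/mirror"
for host in sdw1 sdw2; do
    cmd="mkdir -p $dirs && chown -R gpadmin:gpadmin $dirs"
    if [ -n "$DRY_RUN" ]; then
        echo "ssh root@$host '$cmd'"
    else
        ssh "root@$host" "$cmd"
    fi
done
```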
9. Synchronize the system clocks [as root]
(1) On the master host, edit /etc/ntp.conf and set:
server 127.127.1.0
(2) On the segment hosts, edit /etc/ntp.conf and set:
server mdw
(3) On the master host, synchronize the clocks through the NTP daemon [gpadmin user]:
# gpssh -f all_hosts -v -e 'ntpd'
Restart the ntpd service: /etc/init.d/ntpd restart
Check synchronization status: ntpq -p
Make ntpd start automatically after a reboot:
chkconfig --level 0123456 ntpd on
10. Validate the operating system settings
# gpcheck -f all_hosts -m mdw
Check the output for error messages; fix each one and re-run the command until no errors remain.
For example:
20170215:00:22:13:004992 gpcheck:test01:root-[ERROR]:-GPCHECK_ERROR host(test02): on device (sr0) IO scheduler 'cfq' does not match expected value 'deadline'
20170215:00:22:13:004992 gpcheck:test01:root-[ERROR]:-GPCHECK_ERROR host(test01): on device (sr0) IO scheduler 'cfq' does not match expected value 'deadline'
# echo deadline > /sys/block/sr0/queue/scheduler    # fixes this particular error
A successful run looks like this:
11. Create the Greenplum database configuration file
(1) Log in as gpadmin:
# su - gpadmin
(2) Copy the gpinitsystem_config template:
$ cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpinitsystem_config
$ chmod 775 gpinitsystem_config
(3) Set all required parameters (most options already exist in the template; edit them in place):
ARRAY_NAME="EMC Greenplum DW"
SEG_PREFIX=gpseg
PORT_BASE=40000
declare -a DATA_DIRECTORY=(/data1/primary /data2/primary /data1/primary /data2/primary)
MASTER_HOSTNAME=mdw
MASTER_DIRECTORY=/data/master
MASTER_PORT=5432
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
(4) Set the optional (mirroring) parameters:
MIRROR_PORT_BASE=50000
REPLICATION_PORT_BASE=41000
MIRROR_REPLICATION_PORT_BASE=51000
declare -a MIRROR_DATA_DIRECTORY=(/data1/mirror /data2/mirror /data1/mirror /data2/mirror)
(5) Set tuning parameters [these can also be changed after initialization]:
shared_buffers=2GB # per-segment memory buffer for disk reads and writes
gp_vmem_protect_limit=28GB # memory allocated to each segment
statement_mem = 27.79*1024*0.9/500 = 51.222528 (MB)
sed -i 's/shared_buffers = 125MB/shared_buffers = 1GB/g' /data/*/*/postgresql.conf
sed -i 's/gp_vmem_protect_limit = 8192/gp_vmem_protect_limit = 30720/g' /data/*/*/postgresql.conf
sed -i 's/kernel.shmmax = 500000000/kernel.shmmax = 53687091200/g' /etc/sysctl.conf
sysctl -p
HINT: Valid units for this parameter are "kB", "MB", and "GB".
postgres=# alter database lxcos set statement_mem to '1.8GB';    [2GB may be rejected]
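The statement_mem arithmetic above, spelled out: per-query memory is roughly gp_vmem_protect_limit x 0.9, divided by the expected number of concurrent statements. A sketch with the example's figures (27.79 GB and 500 statements, both taken from the line above):

```shell
# Reproduce the statement_mem estimate from the example figures.
awk 'BEGIN {
    vmem_gb     = 27.79   # gp_vmem_protect_limit, in GB (example value)
    concurrency = 500     # expected concurrent statements (example value)
    printf "statement_mem = %.6f MB\n", vmem_gb * 1024 * 0.9 / concurrency
}'
```

Substitute your own limit and concurrency to size statement_mem for a real cluster.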
Before initialization, log out of the remote session and back in, and make sure net-tools is installed (yum install net-tools).
12. Run the initialization utility
$ gpinitsystem -c gpinitsystem_config -h seg_hosts
On success the database is started automatically; the output looks like this:
13. Start and stop the database
Verify that it starts and stops cleanly:
$ gpstart
$ gpstop
$ gpstate -s
Post-installation checks
1. Connect to the database
psql postgres
Run a query:
select datname, datdba, encoding, datacl from pg_database;
The output looks like this:
2. Check replication and segment status
On the master, run:
postgres=# select * from pg_stat_replication ;
procpid | usesysid | usename | application_name | client_addr | client_port | backend_start | state | sent_location | write_location | flush_loca
tion | replay_location | sync_priority | sync_state
---------+----------+---------+------------------+---------------+-------------+-------------------------------+-----------+---------------+----------------+-----------
-----+-----------------+---------------+------------
2270 | 10 | gpadmin | walreceiver | 192.168.0.110 | 18608 | 2017-09-24 22:29:46.733544-04 | streaming | 0/C000410 | 0/C000410 | 0/C000410
| 0/C000410 | 1 | sync
(1 row)
Check whether any segment is down (a down segment has status = 'd'):
postgres=# select * from gp_segment_configuration ;
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+----------+---------+------------------+------------
1 | -1 | p | p | s | u | 5432 | mdw | mdw | |
3 | 1 | p | p | s | u | 40000 | sdw1 | sdw1 | 41000 |
4 | 2 | p | p | s | u | 40000 | sdw2 | sdw2 | 41000 |
5 | 3 | p | p | s | u | 40000 | sdw3 | sdw3 | 41000 |
8 | 1 | m | m | s | u | 50000 | sdw2 | sdw2 | 51000 |
9 | 2 | m | m | s | u | 50000 | sdw3 | sdw3 | 51000 |
10 | 3 | m | m | s | u | 50000 | sdw4 | sdw4 | 51000 |
12 | -1 | m | m | s | u | 5432 | sby | sby | |
2 | 0 | m | p | s | d | 40000 | sby | sby | 41000 |
7 | 0 | p | m | c | u | 50000 | sdw1 | sdw1 | 51000 |
6 | 4 | p | p | c | u | 40000 | sdw4 | sdw4 | 41000 |
11 | 4 | m | m | s | d | 50000 | sby | sby | 51000 |
(12 rows)
3. Generate test data and run a test query
Connect with psql postgres.
The following runs inside the database:
create table test01(id int primary key, t text);
select * from gp_segment_configuration;
insert into test01 select seq, seq||'xxxxxxxxxxxxxxxxxxx' from generate_series(1, 1000000) as t(seq);
insert into test01 select seq, seq||'xxxxxxxxxxxxxxxxxxx' from generate_series(1000001, 2000000) as t(seq);  -- second batch; repeating 1..1000000 would violate the primary key
create table test02(i1 int, i2 int, t text) DISTRIBUTED BY (i2);
insert into test02 select seq, seq, seq||'xxxxxxxxxxxxxxxxxxx' from generate_series(1, 1000000) as t(seq);
test02 is a second test table, distributed by i2 instead of the primary key.
explain analyze select a.*,b.* from test01 a, test02 b where a.id=b.i1;
In psql, run \timing to print the execution time of each SQL statement, then run the join again:
explain analyze select a.*,b.* from test01 a, test02 b where a.id=b.i1;