Greenplum Learning and Practice - [Installation and Deployment] - 2. Deploying a 5.10 Cluster
Chapter 1 Environment Overview
1.1 Official Documentation
Official Greenplum installation guide:
https://gpdb.docs.pivotal.io/5160/install_guide/install_extensions.html
1.2 System Requirements
Item | Requirement
---|---
Operating System | • Red Hat Enterprise Linux 64-bit 7.x (see the following Note) • Red Hat Enterprise Linux 64-bit 6.x • SuSE Linux Enterprise Server 64-bit 12 SP2 and SP3 with kernel version greater than 4.4.73-5 (see the following Note) • SuSE Linux Enterprise Server 64-bit 11 SP4 (see the following Note) • CentOS 64-bit 7.x • CentOS 64-bit 6.x
File Systems | xfs required for data storage on SUSE Linux and Red Hat (ext3 supported for root file system)
Minimum CPU | Pentium Pro compatible (P3/Athlon and above)
Minimum Memory | 16 GB RAM per server
Disk Requirements | • 150 MB per host for Greenplum installation • Approximately 300 MB per segment instance for metadata • Appropriate free space for data, with disks at no more than 70% capacity • High-speed, local storage
Network Requirements | 10 Gigabit Ethernet within the array; dedicated, non-blocking switch; NIC bonding is recommended when multiple interfaces are present
Software and Utilities | zlib compression libraries, bash shell, GNU tar, GNU zip, GNU sed (used by Greenplum Database gpinitsystem), perl, secure shell
**Important:** SSL is supported only on the Greenplum Database master host system.
1.3 Deployment Environment
Operating system: CentOS Linux release 7.4.1708 (Core)

Item | Value
---|---
Server model | PowerEdge R330 × 4
CPU | Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.00GHz, 4 physical cores per server (16 cores total)
Memory | 16 GB
Disk | 4 TB
Swap | 32 GB
Chapter 2 Installation Architecture
2.1 Installation Steps
Perform the following tasks in order:
1. Make sure your systems meet the System Requirements
2. Setting the Greenplum Recommended OS Parameters
3. (master only) Creating the Greenplum Database Administrative User Account
4. (master only) Installing the Greenplum Database Software
5. Installing and Configuring Greenplum on all Hosts
6. (Optional) Installing Oracle Compatibility Functions
7. (Optional) Installing Optional Modules
8. (Optional) Installing Greenplum Database Extensions
9. (Optional) Installing and Configuring the Greenplum Platform Extension Framework (PXF)
10. Creating the Data Storage Areas
11. Synchronizing System Clocks
12. Next Steps
2.2 Software Packages
greenplum-db-5.10.2-rhel6-x86_64.zip is the MPP database software;
greenplum-cc-web-4.3.1-LINUX-x86_64.zip is the web monitoring console (Greenplum Command Center).
2.3 Architecture
Four servers are prepared: one is the master, and the other three are segment hosts, which together host 6 primary segments and their mirrors.
IP | Hostname | CPU (cores) | Memory (GB) | Planned components
---|---|---|---|---
10.102.254.24 | sdw1 | 16 | 16 | 4 × segment
10.102.254.25 | sdw2 | 16 | 16 | 4 × segment
10.102.254.26 | sdw3 | 16 | 16 | 4 × segment
10.102.254.27 | mdw | 16 | 16 | master

Target layout:

mdw | sdw1 | sdw2 | sdw3
---|---|---|---
master | seg0p seg1p seg5m seg4m | seg2p seg3p seg0m seg1m | seg4p seg5p seg2m seg3m smdw
Chapter 3 System Preparation
3.1 Disable the Firewall and SELinux (all nodes)
systemctl stop firewalld.service
systemctl disable firewalld.service
iptables -F
vi /etc/selinux/config
SELINUX=disabled
# sestatus
SELinux status: disabled
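If you do not want to reboot right away, SELinux can also be switched off for the current session (a small extra step, not part of the original procedure):
setenforce 0   # permissive immediately; the config file change above takes effect at the next reboot
getenforce     # verify the current mode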
3.2 Edit the hosts File (all nodes)
-
vi /etc/hosts

10.102.254.24 sdw1
10.102.254.25 sdw2
10.102.254.26 sdw3
10.102.254.27 mdw
3.3 Set the Hostname
hostnamectl set-hostname sdw1   # repeat on each node with its own hostname
hostnamectl status              # check the result
(On RHEL/CentOS 6 the hostname is configured in /etc/sysconfig/network instead.)
3.4 Kernel Parameter Tuning
vi /etc/sysctl.conf

kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 2

sysctl -p

Or rewrite the whole file in one step (this variant also raises kernel.sem and adds vm.swappiness and kernel.pid_max):

cat > /etc/sysctl.conf << EOF
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same name in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 500 1024000 200 4096
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 2
vm.swappiness = 1
kernel.pid_max = 655350
EOF
sysctl -p
3.5 Raise Linux Resource Limits
cat /etc/security/limits.conf
vi /etc/security/limits.conf

* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072

Or write the file in one step:

cat > /etc/security/limits.conf << EOF
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
EOF
If you are on RHEL 6.x, also pay attention to /etc/security/limits.d/90-nproc.conf, which overrides the nproc setting above; see the official documentation for details.
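For RHEL/CentOS 6.x only, a minimal sketch of that override file (the values simply mirror the nproc limits chosen above):
cat > /etc/security/limits.d/90-nproc.conf << EOF
* soft nproc 131072
root soft nproc unlimited
EOF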
3.6 Devices and I/O: File System
Create an XFS file system for the data area and mount it.
EXT4 (fourth extended filesystem) is a journaling file system for Linux and the successor to ext3.
An ext4 file system can be as large as 1 EB, with single files up to 16 TB. For ordinary desktops and servers this hardly matters, but it is significant for users of large disk arrays.
XFS is a 64-bit file system that supports a single file system of up to 8 EB minus one byte, although the practical limit depends on the host operating system's block limits; on a 32-bit Linux system, files and file systems are capped at 16 TB.
Each has its own strengths, and their performance is broadly comparable. For example, when Google decided to upgrade from ext2 it settled on ext4; the company said it had also evaluated XFS and JFS, found that ext4 and XFS performed similarly, and chose ext4 because upgrading from ext2 to ext4 was easier than migrating to XFS.
Example:
cat >> /etc/fstab << EOF
/dev/sdb1 /greenplum xfs rw,nodev,noatime,inode64,allocsize=16m 0 0
EOF
Recommended XFS mount options for Greenplum: rw,nodev,noatime,nobarrier,inode64
cat /etc/fstab
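For completeness, a sketch of creating and mounting the data file system; the device /dev/sdb1 and mount point /greenplum are taken from the fstab entry above, so adjust them to your own disks:
mkfs.xfs -f /dev/sdb1    # format the data disk as XFS (destroys any existing data)
mkdir -p /greenplum      # create the mount point
mount /greenplum         # mount it using the /etc/fstab entry added above
df -hT /greenplum        # verify the mount and the file system type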
3.7 Disk I/O Scheduler
The Linux disk I/O scheduler supports several access policies. The default is CFQ; Greenplum recommends deadline.
Check the current I/O scheduler of each disk; by default it shows [cfq].
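For example (assuming sda is the data disk):
cat /sys/block/sda/queue/scheduler    # the scheduler shown in [brackets] is the active one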
The deadline scheduler option is recommended. To specify a scheduler until the next system reboot,
run the following:
# echo schedulername > /sys/block/devname/queue/scheduler
echo deadline > /sys/block/sda/queue/scheduler
On RHEL/CentOS 7, make the setting persistent across reboots with grubby:
# grubby --update-kernel=ALL --args="elevator=deadline"
grubby --info=ALL
3.8 Disk Read-Ahead
fdisk -l
Check:
/sbin/blockdev --getra /dev/sda
Set:
/sbin/blockdev --setra 16384 /dev/sda
Add the setting to /etc/rc.d/rc.local so that it persists across reboots:
DELL: blockdev --setra 16384 /dev/sd*   (adjust the device pattern to match your data disks)
HP:   blockdev --setra 16384 /dev/cciss/c?d?*
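A sketch of the rc.local entry for the DELL pattern above (note that on CentOS 7 rc.local is not executable by default):
cat >> /etc/rc.d/rc.local << EOF
/sbin/blockdev --setra 16384 /dev/sd*
EOF
chmod +x /etc/rc.d/rc.local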
3.9 Disable Transparent Huge Pages (THP)
On systems that use grub2 such as RHEL 7.x or CentOS 7.x, use the system utility grubby. This
command adds the parameter when run as root.
# grubby --update-kernel=ALL --args="transparent_hugepage=never"
After adding the parameter, reboot the system.
This cat command checks the state of THP. The output indicates that THP is disabled.
$ cat /sys/kernel/mm/*transparent_hugepage/enabled
always [never]
Registering the settings as services
# Create the init.d script
echo '#!/bin/sh
case $1 in
start)
if [ -d /sys/kernel/mm/transparent_hugepage ]; then
thp_path=/sys/kernel/mm/transparent_hugepage
elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then
thp_path=/sys/kernel/mm/redhat_transparent_hugepage
else
exit 0
fi
echo never > ${thp_path}/enabled
echo never > ${thp_path}/defrag
unset thp_path
;;
esac' > /etc/init.d/disable-transparent-hugepages
# Register the systemd unit file
echo '[Unit]
Description=Disable Transparent Hugepages
After=multi-user.target
[Service]
ExecStart=/etc/init.d/disable-transparent-hugepages start
Type=simple
[Install]
WantedBy=multi-user.target' > /etc/systemd/system/disable-thp.service
# Disk read-ahead, persisted the same way
/sbin/blockdev --getra /dev/sdb1 # check the current value
/sbin/blockdev --setra 65535 /dev/sdb1 # set the value
# Create the init.d script
echo '#!/bin/sh
device_name=/dev/sdb1
case $1 in
start)
if mount | grep "^${device_name}" > /dev/null; then
/sbin/blockdev --setra 65535 ${device_name}
else
exit 0
fi
unset device_name
;;
esac' > /etc/init.d/blockdev-setra-sdb
# Register the systemd unit file
echo '[Unit]
Description=Blockdev --setra N
After=multi-user.target
[Service]
ExecStart=/etc/init.d/blockdev-setra-sdb start
Type=simple
[Install]
WantedBy=multi-user.target' > /etc/systemd/system/blockdev-setra-sdb.service
# Set permissions and enable the services at boot
chmod 755 /etc/init.d/disable-transparent-hugepages
chmod 755 /etc/init.d/blockdev-setra-sdb
chmod 755 /etc/systemd/system/disable-thp.service
chmod 755 /etc/systemd/system/blockdev-setra-sdb.service
systemctl enable disable-thp blockdev-setra-sdb
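To confirm both services behave as expected without waiting for a reboot, something like the following can be run (device name as assumed in the scripts above):
systemctl daemon-reload
systemctl start disable-thp blockdev-setra-sdb
cat /sys/kernel/mm/transparent_hugepage/enabled   # expect: always madvise [never]
/sbin/blockdev --getra /dev/sdb1                  # expect: 65535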
3.10 Disable IPC Object Removal (RHEL 7 / CentOS 7)
Set this parameter in /etc/systemd/logind.conf on the Greenplum
Database host systems.
RemoveIPC=no
The setting takes effect after restarting the systemd-login service or rebooting the system. To
restart the service, run this command as the root user.
service systemd-logind restart
cat /etc/systemd/logind.conf
3.11 Time Synchronization
Configuration file: /etc/chrony.conf
systemctl status chronyd.service   # check status
systemctl start chronyd.service    # start the service
systemctl enable chronyd.service   # enable it at boot
systemctl status chronyd.service
Add the preferred time source to /etc/chrony.conf, for example:
server 10.1.3.1 prefer
Check the time sources:
chronyc sources -v
chronyc sourcestats -v
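In this cluster the usual arrangement would be for mdw to sync to the site NTP server and for the segment hosts to sync to mdw; a sketch of the relevant chrony.conf lines (the allow directive and IPs are assumptions based on the addresses used in this document):
# on mdw, /etc/chrony.conf
server 10.1.3.1 prefer
allow 10.102.254.0/24
# on sdw1-sdw3, /etc/chrony.conf
server 10.102.254.27 prefer
# then, on every host
systemctl restart chronyd.service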
3.12 Limit Concurrent SSH Connections
/etc/ssh/sshd_config
MaxStartups 10:30:200
(Start refusing 30% of new unauthenticated connections once 10 are pending, and refuse all once 200 are pending. Greenplum utilities such as gpssh and gpseginstall open many SSH connections in parallel, so this limit should be raised.)
systemctl restart sshd.service
3.13 Install System Dependency Packages
yum -y install epel-release
yum -y install wget cmake3 git gcc gcc-c++ bison flex libedit-devel zlib zlib-devel perl-devel perl-ExtUtils-Embed python-devel libevent libevent-devel libxml2 libxml2-devel libcurl libcurl-devel bzip2 bzip2-devel net-tools libffi-devel openssl-devel
Chapter 4 Installing the Software
4.1 Create the User and Group
(master only)
# groupadd gpadmin
# useradd gpadmin -g gpadmin
# passwd gpadmin
New password: <changeme>
Retype new password: <changeme>
echo gpadmin | passwd gpadmin --stdin
4.2 Unpack and Install as root
./greenplum-db-5.10.2-rhel6-x86_64.bin

I HAVE READ AND AGREE TO THE TERMS OF THE ABOVE PIVOTAL SOFTWARE LICENSE AGREEMENT.
********************************************************************************
Do you accept the Pivotal Database license agreement? [yes|no]
********************************************************************************
yes
********************************************************************************
Provide the installation path for Greenplum Database or press ENTER to
accept the default installation path: /usr/local/greenplum-db-5.10.2
********************************************************************************
********************************************************************************
Install Greenplum Database into /usr/local/greenplum-db-5.10.2? [yes|no]
********************************************************************************
yes
********************************************************************************
/usr/local/greenplum-db-5.10.2 does not exist.
Create /usr/local/greenplum-db-5.10.2 ? [yes|no]
(Selecting no will exit the installer)
********************************************************************************

After installation, change the ownership (run after the gpadmin user has been created):
# chown -R gpadmin /usr/local/greenplum*
# chgrp -R gpadmin /usr/local/greenplum*
4.3 Edit the Environment Variables
cat >> .bashrc << EOF
export MASTER_DATA_DIRECTORY=/greenplum/gpdata/master/gpseg-1
source /usr/local/greenplum-db/greenplum_path.sh
EOF
source .bashrc

cat >> /home/gpadmin/.bash_profile << EOF
export MASTER_DATA_DIRECTORY=/greenplum/gpdata/master/gpseg-1
source /usr/local/greenplum-db/greenplum_path.sh
export PGPORT=5432
export PGDATABASE=archdata
EOF
source /home/gpadmin/.bash_profile
4.4 Create Configuration Files
Switch to root:
source /usr/local/greenplum-db/greenplum_path.sh
------ run only on mdw and smdw
mkdir /home/gpadmin/gpconfig
chown -R gpadmin:gpadmin /home/gpadmin/gpconfig
------ run only on mdw and smdw
cat >> /home/gpadmin/gpconfig/all_host <<EOF
mdw
sdw1
sdw2
sdw3
EOF
------ run only on mdw and smdw
cat >> /home/gpadmin/gpconfig/all_segment <<EOF
sdw1
sdw2
sdw3
EOF
chown -R gpadmin:gpadmin /home/gpadmin/gpconfig/all_host
chown -R gpadmin:gpadmin /home/gpadmin/gpconfig/all_segment
4.5 Set Up Passwordless SSH Between Hosts
source /usr/local/greenplum-db/greenplum_path.sh
/usr/local/greenplum-db/bin/gpssh-exkeys -f /home/gpadmin/gpconfig/all_host
4.6 Verify Host Connectivity
gpssh -f /home/gpadmin/gpconfig/all_host -e "ls -l"
4.7 Create the gpadmin User on the Other Nodes in Batch
gpssh -f /home/gpadmin/gpconfig/all_segment
groupadd gpadmin
useradd gpadmin -g gpadmin
passwd gpadmin
echo gpadmin | passwd gpadmin --stdin
4.8 Exchange SSH Keys as the gpadmin User
source /usr/local/greenplum-db/greenplum_path.sh
/usr/local/greenplum-db/bin/gpssh-exkeys -f /home/gpadmin/gpconfig/all_host
gpssh -f /home/gpadmin/gpconfig/all_host -e "ls -l"
4.9 Check Time Synchronization
gpssh -f /home/gpadmin/gpconfig/all_host -e "date"
4.10 Distribute the Software to All Segment Hosts
Run as root:
source /usr/local/greenplum-db/greenplum_path.sh
gpseginstall -f /home/gpadmin/gpconfig/all_host -u gpadmin -p gpadmin
4.11 Verify the Installation
• Log in as the gpadmin user and source the Greenplum environment file:
• source /usr/local/greenplum-db/greenplum_path.sh
• Use the gpssh utility to check that you can log in to all hosts without a password prompt.
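For example, reusing the host file created earlier (the same check as in section 4.8):
gpssh -f /home/gpadmin/gpconfig/all_host -e "ls -l"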
4.12 Create the Data Directories (as root)
mkdir -p /greenplum/gpdata/master
chown gpadmin:gpadmin /greenplum/gpdata/master
gpssh -f /home/gpadmin/gpconfig/all_segment -e 'mkdir -p /greenplum/gpdata/primary1'
gpssh -f /home/gpadmin/gpconfig/all_segment -e 'mkdir -p /greenplum/gpdata/primary2'
gpssh -f /home/gpadmin/gpconfig/all_segment -e 'chown -R gpadmin:gpadmin /greenplum/gpdata'
gpssh -f /home/gpadmin/gpconfig/all_segment -e 'mkdir -p /greenplum/gpdata/mirror1'
gpssh -f /home/gpadmin/gpconfig/all_segment -e 'mkdir -p /greenplum/gpdata/mirror2'
gpssh -f /home/gpadmin/gpconfig/all_segment -e 'chown -R gpadmin:gpadmin /greenplum/gpdata'
Or create them in batch:
gpssh -f /home/gpadmin/gpconfig/all_segment -e 'mkdir -p /greenplum/gpdata/primary{1..2}'
gpssh -f /home/gpadmin/gpconfig/all_segment -e 'chown -R gpadmin:gpadmin /greenplum/gpdata'
4.13 Validate the System
Check the system parameters and test hardware performance.
Check command: gpcheck -f host_file -m mdw -s smdw
Validating Hardware Performance
• gpcheckperf can be used to identify hardware and system-level issues on the machines in your Greenplum Database array.
• Network Performance (gpnetbench*)
  gpcheckperf -f hostfile_gpchecknet_ic1 -r N -d /tmp > subnet1.out
• Disk I/O Performance (dd test) & Memory Bandwidth (stream test)
  gpcheckperf -f hostfile_gpcheckperf -r ds -D -d /data/primary -d /data/mirror
4.14 Validate the OS Configuration
source /usr/local/greenplum-db/greenplum_path.sh
gpcheck -f /home/gpadmin/gpconfig/all_host -m mdw
Validate hardware performance (network and disk I/O); review the results below:
gpcheckperf -f /home/gpadmin/gpconfig/all_host -r N -d /tmp > checknetwork.out

[root@mdw greenplum-db]# cat checknetwork.out
/usr/local/greenplum-db/./bin/gpcheckperf -f /home/gpadmin/gpconfig/all_host -r N -d /tmp
-------------------
--  NETPERF TEST
-------------------
====================
==  RESULT
====================
Netperf bisection bandwidth test
mdw -> sdw1 = 112.340000
sdw2 -> sdw3 = 112.340000
sdw1 -> mdw = 112.330000
sdw3 -> sdw2 = 112.330000

Summary:
sum = 449.34 MB/sec
min = 112.33 MB/sec
max = 112.34 MB/sec
avg = 112.33 MB/sec
median = 112.34 MB/sec

gpcheckperf -f /home/gpadmin/gpconfig/all_host -r ds -D -d /greenplum/gpdata/primary1 -d /greenplum/gpdata/mirror1 > checkDISKIO.out

[root@mdw greenplum-db]# gpcheckperf -f /home/gpadmin/gpconfig/all_host -r ds -D -d /greenplum/gpdata/primary1 -d /greenplum/gpdata/mirror1
/usr/local/greenplum-db/./bin/gpcheckperf -f /home/gpadmin/gpconfig/all_host -r ds -D -d /greenplum/gpdata/primary1 -d /greenplum/gpdata/mirror1
--------------------
--  DISK WRITE TEST
--------------------
--------------------
--  DISK READ TEST
--------------------
--------------------
--  STREAM TEST
--------------------
====================
==  RESULT
====================
disk write avg time (sec): 20.88
disk write tot bytes: 132920115200
disk write tot bandwidth (MB/s): 6074.65
disk write min bandwidth (MB/s): 1476.04 [ mdw]
disk write max bandwidth (MB/s): 1551.18 [sdw3]
-- per host bandwidth --
disk write bandwidth (MB/s): 1476.04 [ mdw]
disk write bandwidth (MB/s): 1537.63 [sdw1]
disk write bandwidth (MB/s): 1509.80 [sdw2]
disk write bandwidth (MB/s): 1551.18 [sdw3]

disk read avg time (sec): 59.80
disk read tot bytes: 132920115200
disk read tot bandwidth (MB/s): 2175.57
disk read min bandwidth (MB/s): 454.54 [sdw2]
disk read max bandwidth (MB/s): 700.04 [sdw1]
-- per host bandwidth --
disk read bandwidth (MB/s): 520.03 [ mdw]
disk read bandwidth (MB/s): 700.04 [sdw1]
disk read bandwidth (MB/s): 454.54 [sdw2]
disk read bandwidth (MB/s): 500.96 [sdw3]

stream tot bandwidth (MB/s): 49348.52
stream min bandwidth (MB/s): 12297.76 [ mdw]
stream max bandwidth (MB/s): 12388.57 [sdw2]
-- per host bandwidth --
stream bandwidth (MB/s): 12297.76 [ mdw]
stream bandwidth (MB/s): 12321.47 [sdw1]
stream bandwidth (MB/s): 12388.57 [sdw2]
stream bandwidth (MB/s): 12340.73 [sdw3]
Chapter 5 Initialize the Database
-
Copy the configuration file template:
cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpconfig/gpinitsystem_config
-
Create the host list file for gpinitsystem:
cat >> /home/gpadmin/gpconfig/hostfile_gpinitsystem << EOF
sdw1
sdw2
sdw3
EOF
chown -R gpadmin:gpadmin /home/gpadmin/gpconfig/gpinitsystem_config
chown -R gpadmin:gpadmin /home/gpadmin/gpconfig/hostfile_gpinitsystem
-
Adjust the parameters:
ARRAY_NAME="EMC Greenplum DW"
PORT_BASE=40000
SEG_PREFIX=gpseg
declare -a DATA_DIRECTORY=(/greenplum/gpdata/primary1 /greenplum/gpdata/primary2)
MASTER_HOSTNAME=mdw
MASTER_DIRECTORY=/greenplum/gpdata/master
MASTER_PORT=5432
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
MIRROR_PORT_BASE=50000
REPLICATION_PORT_BASE=41000
MIRROR_REPLICATION_PORT_BASE=51000
declare -a MIRROR_DATA_DIRECTORY=(/greenplum/gpdata/mirror1 /greenplum/gpdata/mirror2)

vim /home/gpadmin/gpconfig/gpinitsystem_config
Edit it as follows:

[gpadmin@mdw ~]$ cat /home/gpadmin/gpconfig/gpinitsystem_config
# FILE NAME: gpinitsystem_config
# Configuration file needed by the gpinitsystem
################################################
#### REQUIRED PARAMETERS
################################################
#### Name of this Greenplum system enclosed in quotes.
ARRAY_NAME="Greenplum Data Platform"
#### Naming convention for utility-generated data directories.
SEG_PREFIX=gpseg
#### Base number by which primary segment port numbers
#### are calculated.
PORT_BASE=40000
#### File system location(s) where primary segment data directories
#### will be created. The number of locations in the list dictate
#### the number of primary segments that will get created per
#### physical host (if multiple addresses for a host are listed in
#### the hostfile, the number of segments will be spread evenly across
#### the specified interface addresses).
declare -a DATA_DIRECTORY=(/greenplum/gpdata/primary1 /greenplum/gpdata/primary2)
#### OS-configured hostname or IP address of the master host.
MASTER_HOSTNAME=mdw
#### File system location where the master data directory
#### will be created.
MASTER_DIRECTORY=/greenplum/gpdata/master
#### Port number for the master instance.
MASTER_PORT=5432
#### Shell utility used to connect to remote hosts.
TRUSTED_SHELL=ssh
#### Maximum log file segments between automatic WAL checkpoints.
CHECK_POINT_SEGMENTS=8
#### Default server-side character set encoding.
ENCODING=UNICODE
################################################
#### OPTIONAL MIRROR PARAMETERS
################################################
#### Base number by which mirror segment port numbers
#### are calculated.
MIRROR_PORT_BASE=50000
#### Base number by which primary file replication port
#### numbers are calculated.
REPLICATION_PORT_BASE=41000
#### Base number by which mirror file replication port
#### numbers are calculated.
MIRROR_REPLICATION_PORT_BASE=51000
#### File system location(s) where mirror segment data directories
#### will be created. The number of mirror locations must equal the
#### number of primary locations as specified in the
#### DATA_DIRECTORY parameter.
declare -a MIRROR_DATA_DIRECTORY=(/greenplum/gpdata/mirror1 /greenplum/gpdata/mirror2)
################################################
#### OTHER OPTIONAL PARAMETERS
################################################
#### Create a database of this name after initialization.
#DATABASE_NAME=name_of_database
#### Specify the location of the host address file here instead of
#### with the -h option of gpinitsystem.
#MACHINE_LIST_FILE=/home/gpadmin/gpconfigs/hostfile_gpinitsystem
-
Initialize the database
Run as the gpadmin user:
gpinitsystem -c /home/gpadmin/gpconfig/gpinitsystem_config -h /home/gpadmin/gpconfig/hostfile_gpinitsystem

To add a standby master and use a spread mirror layout instead of the default grouped layout:
gpinitsystem -c gpconfigs/gpinitsystem_config -h gpconfigs/hostfile_gpinitsystem -s standby_master_hostname -S
(with a standby master and a spread mirror configuration)
-
Check the environment variables:
MASTER_DATA_DIRECTORY=/data/master/gpseg-1
GPHOME=/usr/local/greenplum-db
PGDATABASE=gpadmin

[gpadmin@mdw ~]$ cat .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH
export MASTER_DATA_DIRECTORY=/greenplum/gpdata/master/gpseg-1
source /usr/local/greenplum-db/greenplum_path.sh
export PGPORT=5432
export PGDATABASE=archdata
Chapter 6 Connection Tests
6.1 Set a Password for Remote Access by gpadmin
psql postgres gpadmin
alter user gpadmin encrypted password 'gpadmin';
\q
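Setting the password alone is not enough for remote clients; the master's pg_hba.conf must also allow them. A sketch, assuming the 10.102.254.0/24 network used in this document (adjust the CIDR to your client network):
# append to /greenplum/gpdata/master/gpseg-1/pg_hba.conf
host    all    gpadmin    10.102.254.0/24    md5
# reload the configuration without restarting
gpstop -u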
6.2 Query Tests
psql -hmdw -p 5432 -d postgres -U gpadmin -c 'select dfhostname, dfspace,dfdevice from gp_toolkit.gp_disk_free order by dfhostname;'
[gpadmin@mdw ~]$ psql -hmdw -p 5432 -d postgres -U gpadmin -c 'select dfhostname, dfspace,dfdevice from gp_toolkit.gp_disk_free order by dfhostname;'
dfhostname | dfspace | dfdevice
------------+----------+----------------------------
sdw1 | 98708120 | /dev/mapper/VolGroup-root
sdw1 | 98708120 | /dev/mapper/VolGroup-root
sdw2 | 98705600 | /dev/mapper/VolGroup-root
sdw2 | 98705600 | /dev/mapper/VolGroup-root
sdw3 | 98705144 | /dev/mapper/VolGroup-root
sdw3 | 98705144 | /dev/mapper/VolGroup-root
(6 rows)
psql -h mdw -p 5432 -d postgres -U gpadmin -c '\l+'
[gpadmin@mdw ~]$ psql -h mdw -p 5432 -d postgres -U gpadmin -c '\l+'
List of databases
Name | Owner | Encoding | Access privileges | Size | Tablespace | Description
-----------+---------+----------+---------------------+-------+------------+---------------------------
postgres | gpadmin | UTF8 | | 73 MB | pg_default |
template0 | gpadmin | UTF8 | =c/gpadmin | 72 MB | pg_default |
: gpadmin=CTc/gpadmin
template1 | gpadmin | UTF8 | =c/gpadmin | 73 MB | pg_default | default template database
: gpadmin=CTc/gpadmin
(3 rows)
[gpadmin@mdw ~]$
Chapter 7 Common Commands
-
Start the database: gpstart. Common options: -a: start directly without prompting for confirmation; -m: start only the master instance, mainly used for troubleshooting.
-
Stop the database: gpstop. Common options: -a: stop directly without prompting for confirmation; -m: stop only the master instance (the counterpart of gpstart -m); -M fast: stop the database, terminating all connections and rolling back running transactions; -u: do not stop the database, only reload the runtime parameters in pg_hba.conf and postgresql.conf, used after changing configuration. Note: -a is mostly used in shell scripts; the most frequently used form is -M fast.
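Typical invocations with the options described above:
gpstart -a          # start the cluster without prompting
gpstop -a -M fast   # fast stop: terminate connections and roll back open transactions
gpstop -u           # reload pg_hba.conf and runtime parameters without stopping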
-
View instance configuration and status: select * from gp_configuration order by 1; Key columns: content: two instances sharing the same value form a primary (P) / mirror (M) pair; isprimary: whether the instance is running as the primary; valid: whether the instance is alive (false means it has gone down); port: the port the instance runs on; datadir: the instance's data directory.
-
gpstate: shows the running status, detailed configuration, and other information about the Greenplum database. Common options: -c: show the mapping between primary and mirror instances; -m: list only the status and configuration of mirror instances; -f: show details of the standby master; -Q: show a status summary. By default the command prints a summary of the database's running state and is commonly used for daily inspection. Note: early on, a NIC driver problem caused segments to go down frequently once mirrors were enabled, so the -Q summary was quite useful.
-
View user sessions and the queries they have submitted: select * from pg_stat_activity. This view shows the client IP address, user name, submitted query, and so on for each connection. You can also inspect processes on the master host, since the master creates one process per client connection: ps -ef | grep -i postgres | grep -i con. Note: a command I use constantly to find out which SQL the database is stuck on.
-
View the space used by a table or database: select pg_size_pretty(pg_relation_size('schema.tablename')); select pg_size_pretty(pg_database_size('databasename')); Keep at least 30% free space on the storage backing the database, and make checking the remaining capacity part of daily inspection. Note: these functions report the size of any database object; pg_size_pretty renders the value in readable units such as MB, and they can be joined with system tables such as pg_tables and pg_indexes to build all kinds of space statistics.
-
Collect statistics and reclaim space: run VACUUM ANALYZE tablename periodically to reclaim garbage and refresh statistics; this is especially important after large deletes or loads. Note: that description is incomplete. There are really two operations: ANALYZE, which refreshes planner statistics, and VACUUM, which cleans up dead rows. Postgres does not physically remove deleted rows right away; it only marks them, and the physical removal happens only when VACUUM runs. This matters a lot: frequently updated tables get slower and slower for queries and updates, and the cause is usually that VACUUM has never been run.
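A short example on a hypothetical table public.orders:
VACUUM ANALYZE public.orders;   -- reclaim dead rows and refresh planner statistics
ANALYZE public.orders;          -- refresh statistics only, no space reclamation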
-
Check data distribution. Two ways: (1) select gp_segment_id, count(*) from tablename group by 1; (2) from the command line, run gpskew -t public.ate -a postgres. If data is unevenly distributed, the cluster's parallelism cannot be exploited and performance suffers badly. Note: used very often; Greenplum depends on keeping data evenly distributed.
-
Recover failed instances: gprecoverseg. When gpstate or gp_configuration shows that an instance is down, use this command to recover it.
-
View lock information: SELECT locktype, database, c.relname, l.relation, l.transactionid, l.transaction, l.pid, l.mode, l.granted, a.current_query FROM pg_locks l, pg_class c, pg_stat_activity a WHERE l.relation=c.oid AND l.pid=a.procpid ORDER BY c.relname; Key columns: relname is the table name; locktype and mode identify the kind of lock.
-
explain: before submitting a large query, use EXPLAIN to analyze the execution plan, spot optimization opportunities, and avoid exhausting system resources. Note: ANALYZE is missing here; the timings estimated by a plain EXPLAIN are very unreliable, so to get accurate execution times you must use EXPLAIN ANALYZE.
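For example, on a hypothetical table public.orders:
EXPLAIN SELECT count(*) FROM public.orders;           -- plan and cost estimates only
EXPLAIN ANALYZE SELECT count(*) FROM public.orders;   -- actually executes the query and reports real row counts and timings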
-
Database backup: gp_dump. Common options: -s: dump only object definitions (table structures, functions, etc.); -n: dump only a given schema. By default gp_dump produces these files in the master data directory: gp_catalog_1__ : backup of the database system configuration; gp_cdatabase_1: the database creation statement; gp_dump_1: DDL for database objects; gp_dump_status_1: log of the backup run. On each segment instance's data directory it produces: gp_dump_0: user data backup; gp_dump_status_0: backup log.
-
Database restore: gp_restore. Required options: --gp-k=key: key is the timestamp suffix of the files produced by gp_dump; -d dbname: restore the backup into dbname.
-
Log in to and out of Greenplum:
# normal login
psql gpdb
psql -d gpdb -h gphostm -p 5432 -U gpadmin
# utility mode
PGOPTIONS="-c gp_session_role=utility" psql -h hostname -p port -d dbname
# quit
Run \q at the psql prompt.
-
Query parameters:
psql -c 'SHOW ALL;' -d gpdb
gpconfig --show max_connections
Note: used often; the output can be piped to grep.
-
Create a database: createdb -h localhost -p 5432 dhdw
-
Create a Greenplum filespace:
# filespace name: gpfsdw
# on the segment hosts, create one directory per segment
mkdir -p /gpfsdw/seg1
mkdir -p /gpfsdw/seg2
chown -R gpadmin:gpadmin /gpfsdw
# on the master host
mkdir -p /gpfsdw/master
chown -R gpadmin:gpadmin /gpfsdw
gpfilespace -o gpfilespace_config
gpfilespace -c gpfilespace_config
-
Create a Greenplum tablespace:
psql gpdb
create tablespace TBS_DW_DATA filespace gpfsdw;
SET default_tablespace = TBS_DW_DATA;
-
Delete a Greenplum system: gpdeletesystem -d /gpmaster/gpseg-1 -f
-
View the segment configuration: select * from gp_segment_configuration;
-
View filespace entries: select * from pg_filespace_entry;
-
Disk and database space:
SELECT * FROM gp_toolkit.gp_disk_free ORDER BY dfsegment;
SELECT * FROM gp_toolkit.gp_size_of_database ORDER BY sodddatname;
Logs:
SELECT * FROM gp_toolkit.gp_log_master_ext;
SELECT * FROM gp_toolkit.gp_log_segment_ext;
-
Table data distribution: SELECT gp_segment_id, count(*) FROM <table_name> GROUP BY gp_segment_id;
-
Table disk usage:
SELECT relname as name, sotdsize/1024/1024 as size_MB, sotdtoastsize as toast, sotdadditionalsize as other FROM gp_toolkit.gp_size_of_table_disk as sotd, pg_class WHERE sotd.sotdoid = pg_class.oid ORDER BY relname;
-
Index disk usage:
SELECT soisize/1024/1024 as size_MB, relname as indexname FROM pg_class, gp_toolkit.gp_size_of_index WHERE pg_class.oid = gp_size_of_index.soioid AND pg_class.relkind='i';
-
Operation statistics for an object:
SELECT schemaname as schema, objname as table, usename as role, actionname as action, subtype as type, statime as time FROM pg_stat_operations WHERE objname = '<table_name>';
-
Locks:
SELECT locktype, database, c.relname, l.relation, l.transactionid, l.transaction, l.pid, l.mode, l.granted, a.current_query FROM pg_locks l, pg_class c, pg_stat_activity a WHERE l.relation=c.oid AND l.pid=a.procpid ORDER BY c.relname;
-
Resource queues: SELECT * FROM pg_resqueue_status;
-
gpfdist external table (readable):
# start the gpfdist service
gpfdist -d /share/txt -p 8081 -l /share/txt/gpfdist.log &
# create the external table; the delimiter is '\t'
drop EXTERNAL TABLE TD_APP_LOG_BUYER;
CREATE EXTERNAL TABLE TD_APP_LOG_BUYER
( IP text, ACCESSTIME text, REQMETHOD text, URL text, STATUSCODE int, REF text, name text, VID text)
LOCATION ('gpfdist://gphostm:8081/xxx.txt')
FORMAT 'TEXT' (DELIMITER E'\t' FILL MISSING FIELDS)
SEGMENT REJECT LIMIT 1 percent;
-
Create a regular table from it:
create table test as select * from TD_APP_LOG_BUYER;
# index
# CREATE INDEX idx_test ON test USING bitmap (ip);
# query the data
select ip, count(*) from test group by ip order by count(*);
gpload
# create the control file, then load the data
gpload -f my_load.yml
copy
COPY country FROM '/data/gpdb/country_data' WITH DELIMITER '|' LOG ERRORS INTO err_country SEGMENT REJECT LIMIT 10 ROWS;
-
gpfdist writable external table:
# create the writable external table
CREATE WRITABLE EXTERNAL TABLE unload_expenses ( LIKE expenses )
LOCATION ('gpfdist://etlhost-1:8081/expenses1.out', 'gpfdist://etlhost-2:8081/expenses2.out')
FORMAT 'TEXT' (DELIMITER ',')
DISTRIBUTED BY (exp_id);
# grant write permission
GRANT INSERT ON writable_ext_table TO <role_name>;
# write data
INSERT INTO writable_ext_table SELECT * FROM regular_table;
copy
COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO '/home/gpadmin/a_list_countries.out';
Execute a SQL file:
psql gpdbname -f yoursqlfile.sql
or, after logging in with psql, run \i yoursqlfile.sql