greenplum—1、部署安装

1、概念简介

官网地址:https://greenplum.org/

2、部署安装

github下载地址:https://github.com/greenplum-db/gpdb/releases

按照linux操作系统版本下载对于安装包:open-source-greenplum-db-6.21.0-rhel7-x86_64.rpm

部署指导:https://docs.vmware.com/en/VMware-Tanzu-Greenplum/6/greenplum-database/GUID-install_guide-install_guide.html
参考指令:https://www.cnblogs.com/pl-boke/p/9852383.html

2.1、架构设计

greenplum集群模式下,按照 M-S主从模式分布,其中管理节点分为 master节点和 standy节点;数据节点即segment节点,分为primary节点和 mirror节点;
其中 standy节点相当于master节点的备份,当master节点down机时提供服务;

参考文档:https://www.jianshu.com/p/1761a9a9acb6

mirror分为两种部署模式:
A:grouped模式:该模式下主机的mirror节点全部放在下一个主机上;
B:spread模式:该模式下主机的第1个mirror在下个主机,第二个mirror在次下个主机,第三mirror在次次下个主机,以此类推;

优缺点比较:
A:grouped模式:如果其中一台挂掉,那么拥有该主机mirror的机器负载加重一倍;在segment主机数非常多的情况下,至少允许两台主机挂掉;
B:spread模式:如果其中一台挂掉,那么拥有该主机mirror的机器负载均衡,没有grouped的压力大;在segment主机数非常多的情况下,只可以挂掉一台主机;

提醒:这里使用 spread模式时,slave节点数量要 大于等于 单台 节点的 primary seg 数量;

在这里插入图片描述
说明:该种镜像模式需要手动配置,在某些场景下会比较适合;

2.2、单节点

master/standy节点规划:
/datadir/master/primary

segment节点规划:
/datadir/data1/primary/gpseg0
/datadir/data1/mirror/gpseg0
/datadir/data2/primary/gpseg1
/datadir/data2/mirror/gpseg1
/datadir/data3/primary/gpseg2
/datadir/data3/mirror/gpseg2

添加如下内容到:/home/gpadmin/.bash_profile

# add for gp-config
source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/datadir/master/primary
export PGPORT=5432
export PGUSER=gpadmin
export PGDATABASE=postgres

gpinitsystem_config

ARRAY_NAME="Greenplum Data Platform"
SEG_PREFIX=gpseg
PORT_BASE=6000
declare -a DATA_DIRECTORY=(/dbdata/greenplum/data1/primary /dbdata/greenplum/data2/primary /dbdata/greenplum/data3/primary)
MASTER_HOSTNAME=VirtualServer
MASTER_DIRECTORY=/datadir/master/primary
MASTER_PORT=5432
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
# 快速停止进群
gpstop -a

gpstop -M fast

  • 访问管理

Greenplum的IP白名单配置文件 pg_hba.conf;该文件在 master 节点和 slave节点都存在;但是master节点的文件实现外部访问集群的IP策略控制;slave节点的配置文件实现 master 到 slave节点的IP访问策略控制;

local、host 方式
trust、md5、等方式;

在这里插入代码片# PostgreSQL Client Authentication Configuration File
# ===================================================
#
# Refer to the "Client Authentication" section in the PostgreSQL
# documentation for a complete description of this file.  A short
# synopsis follows.
#
# This file controls: which hosts are allowed to connect, how clients
# are authenticated, which PostgreSQL user names they can use, which
# databases they can access.  Records take one of these forms:
#
# local      DATABASE  USER  METHOD  [OPTIONS]
# host       DATABASE  USER  CIDR-ADDRESS  METHOD  [OPTIONS]
# hostssl    DATABASE  USER  CIDR-ADDRESS  METHOD  [OPTIONS]
# hostnossl  DATABASE  USER  CIDR-ADDRESS  METHOD  [OPTIONS]
#
# (The uppercase items must be replaced by actual values.)
#
# The first field is the connection type: "local" is a Unix-domain
# socket, "host" is either a plain or SSL-encrypted TCP/IP socket,
# "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a
# plain TCP/IP socket.
#
# DATABASE can be "all", "sameuser", "samerole", a
# database name, or a comma-separated list thereof.
#
# USER can be "all", a user name, a group name prefixed with "+", or a
# comma-separated list thereof.  In both the DATABASE and USER fields
# you can also write a file name prefixed with "@" to include names
# from a separate file.
#
# CIDR-ADDRESS specifies the set of hosts the record matches.  It is
# made up of an IP address and a CIDR mask that is an integer (between
# 0 and 32 (IPv4) or 128 (IPv6) inclusive) that specifies the number
# of significant bits in the mask.  Alternatively, you can write an IP
# address and netmask in separate columns to specify the set of hosts.
# Instead of a CIDR-address, you can write "samehost" to match any of
# the server's own IP addresses, or "samenet" to match any address in
# any subnet that the server is directly connected to.
#
# METHOD can be "trust", "reject", "md5", "password", "gss", "sspi",
# "krb5", "ident", "pam", "ldap", "radius" or "cert".  Note that
# "password" sends passwords in clear text; "md5" is preferred since
# it sends encrypted passwords.
#
# OPTIONS are a set of options for the authentication in the format
# NAME=VALUE.  The available options depend on the different
# authentication methods -- refer to the "Client Authentication"
# section in the documentation for a list of which options are
# available for which authentication methods.
#
# Database and user names containing spaces, commas, quotes and other
# special characters must be quoted.  Quoting one of the keywords
# "all", "sameuser", "samerole" or "replication" makes the name lose
# its special character, and just match a database or username with
# that name.
#
# This file is read on server startup and when the postmaster receives
# a SIGHUP signal.  If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect.  You can
# use "pg_ctl reload" to do that.
# Put your actual configuration here
# ----------------------------------
#
# If you want to allow non-local connections, you need to add more
# "host" records.  In that case you will also need to make PostgreSQL
# listen on a non-local interface via the listen_addresses
# configuration parameter, or via the -i or -h command line switches.
# CAUTION: Configuring the system for local "trust" authentication allows
# any local user to connect as any PostgreSQL user, including the database
# superuser. If you do not trust all your local users, use another
# authentication method.
# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD
# "local" is for Unix domain socket connections only
# IPv4 local connections:
# IPv6 local connections:
local    all         gpadmin         ident
local    all         all             md5
host     all         all             0.0.0.0/0                  md5
#host     all         gpadmin         0.0.0.0/0                 trust
host     all         gpadmin         xx.xx.131.0/24            trust
host     all         gpadmin         xx.xx.132.252/32          trust
host     all         gpadmin         xx.xx.0.1/32              trust
host     all         gpadmin         xx.xx.132.253/32          trust
host     all         gpadmin         xx.xx.130.0/24            trust
host     all         gpadmin         ::1/128       trust
local    replication gpadmin         ident
host     replication gpadmin         samenet       trust

2.3、集群

2.4、参数配置

gp参数官方计算器:
https://greenplum.org/calculator/

修改/etc/sysctl.conf,添加如下设置:

kernel.shmall = 197951838 # 官方建议为物理内存一半,单位为页数:echo $(expr $(getconf _PHYS_PAGES) / 2)
kernel.shmmax = 810810728448 # 官方建议为物理内存一半,单位为字节:echo $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
kernel.shmmni = 4096
vm.overcommit_memory = 2 # OOM相关参数设置,必须设置为2
vm.overcommit_ratio = 95 # OOM相关参数设置,设置多少百分比的内存供应用使用,剩余留给操作系统

net.ipv4.ip_local_port_range = 10000 65535 # 使用gpinitsystem初始化集群时,不得使用这个范围内的端口,或者使用net.ipv4.ip_local_reserved_ports设置保留端口
kernel.sem = 500 2048000 200 4096
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.swappiness = 10
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_background_ratio = 0 # See System Memory
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736 # 物理内存超过64G时设置为1.5G,如果小于64G,则使用vm.dirty_background_ratio = 3替换该行
vm.dirty_bytes = 4294967296 # 物理内存超过64G时设置为4G,如果小于64G,则使用vm.dirty_ratio = 10替换该行
  • 内存参数
show optimizer;
show gp_vmem_protect_limit; --每个segment的分配总内存,MB单位,单个设置为segment25G
/*
控制了每个段数据库为所有运行的查询分配的内存总量。如果查询需要的内存超过此值,则会失败。使用下面公式确定合适的值:

(swap + (RAM * vm.overcommit_ratio)) * .9 / number_of_Segments_per_server
例如,具有下面配置的段服务器:

8GB 交换空间
128GB 内存
vm.overcommit_ratio = 50
8 个段数据库
(8 + (128 * .5)) * .9 / 8 = 8 GB, 则设置gp_vmem_protect_limit为 8GB:
这个数值是通过网页计算出来的,
需要注意的是,连接的内存并没有单独计算
 */
show gp_resqueue_priority_cpucores_per_segment; --每个segment的分配线程数
show maintenance_work_mem;  --每个segment用于VACUUM,CREATE INDEX等操作的内存大小
--64MB
show shared_buffers;        --每个segment用作磁盘读写的内存缓冲区,提高该值可以减少数据库的磁盘IO。系统已自带
--1250MB
show effective_cache_size;  --每个segment能使用的缓存大小?? 设置稍大,优化器更倾向使用索引扫描而不是顺序扫描
--512MB
show work_mem;              --每个segment用作sort,hash操作的内存大小
--32MB
show statement_mem;         --单作业使用内存
/*
服务器配置参数 gp_statement_mem 控制段数据库上单个查询可以使用的内存总量。如果语句需要更多内存,则会溢出数据到磁盘。用下面公式确定合适的值:

(gp_vmem_protect_limit * .9) / max_expected_concurrent_queries
例如,如果并发度为40, gp_vmeme_protect_limit为8GB,则 gp_statement_mem 为:

(8192MB * .9) / 40 = 184MB,每个查询最多可以使用 184MB 内存,之后将溢出到磁盘。
 */
--125MB
show max_statement_mem;     --单作业最大使用内存
--2000MB

--默认参数测试
select count(*) from sjck.WD_NSR_C0 WHERE ZTNSRBZ = '1'; --2268648,cost   656ms

select count(*) from DZDZ.dzdz_fpxx_zzsfp;  --88875676 cost 1s 687ms

SELECT NSRSBH
FROM sjck.wd_nsr_c0 N
  LEFT JOIN DZDZ.dzdz_fpxx_zzsfp T ON N.NSRSBH = T.GFSBH  WHERE ZTNSRBZ = '1' LIMIT 1; --9s 937ms,作为对照
--ORACLE未开并发,cost 3分30秒

--改参数,就是加大内存
SET effective_cache_size TO '1GB';
set work_mem to '1000MB';
set optimizer to on;
show effective_cache_size
--改好参数后上述join 7s 953ms

说明:如果集群共4个节点,每个节点是 3个主 seg,则最大 max_acting_primary_segments只能达到4;因为当超过2个几点down时候,集群将不可用;

1、vmem异常

postgresql.util.PSOLException: ERROR Canceling guery because of high ymem usage

分析思路:
1)按照配置验证 gp_vmem_protect_limit 参数;
2)调整 vm.overcommit_ratio = 95 参数的配置值为 75;
3)查看 max_statement_mem、statement_mem 的参数配置;
4)查看 gp_workfile_limit_files_per_query 的参数配置;
5)优化器 optimizer 是否开启;
6)统计信息和对象的分布键以及数据分布情况;

=========================================== over ===========================================

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值