数据存储---GreenPlum实践

原创 2017年10月06日 22:14:35

       GreenPlum作为一个很好的开源的数据仓库,一直想体验一把,最近装了一个.关于安装,网上资料介绍也多了不少,至少越来越多,不过真正装一把才知道还是有各种坑的,不是那么一帆风顺的.

       Greenplum在线安装文档 http://gpdb.docs.pivotal.io/500/install_guide/prep_os_install_gpdb.html#topic8

       整个文档的确是相当长,而且还需要一点E文和Linux.我也没有完全按照上面说的来操作.下面记录下基本的操作和遇到的一些问题.

       1, 调整系统参数,具体的不贴了,文档上面也有,Centos6.5和7上面的操作还是有些区别的

  vi /etc/sysctl.conf
  vi /etc/security/limits.conf

       2, 防火墙和HOSTS配置,这些操作看似简单,但是最容易出错,一定要注意检查.

       3, 下载,解压,安装,具体步骤省略......

      gpadmin用户下

      vi ~/.bash_profile 

      source /usr/local/greenplum-db/greenplum_path.sh
      export MASTER_DATA_DIRECTORY=/data/master/gpseg-1

  或者 source /usr/local/greenplum-db/greenplum_path.sh

      指定hostfile_exkeys文件,里面包括所有的主机,这个文件放的位置可以随意,只要下面执行的能找到文件或者参数里面把路径加上就可以

  gpseginstall -f hostfile_exkeys

     确认安装情况

gpssh -f hostfile_exkeys -e ls -l $GPHOME

gpssh-exkeys -f hostfile_exkeys


MASTER上创建数据存储目录
mkdir /data/master
chown gpadmin /data/master

Segment上创建数据存储目录
gpssh -f hostfile_gpssh_segonly -e 'mkdir /data/primary'
gpssh -f hostfile_gpssh_segonly -e 'mkdir /data/mirror'
gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin /data/primary'
gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin /data/mirror'

     4, 修改配置文件

     cd /home/gpadmin/gpconfigs (没有就创建目录)

    vi gpinitsystem_config

ARRAY_NAME="EMC Greenplum DW"
SEG_PREFIX=gpseg
PORT_BASE=40000

declare -a DATA_DIRECTORY=(/data/primary)

MASTER_HOSTNAME=vm11
MASTER_DIRECTORY=/data/master
MASTER_PORT=5432

TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODEdeclare -a MIRROR_DATA_DIRECTORY=(/data/mirror)

DATABASE_NAME=gpexmp
MACHINE_LIST_FILE=/home/gpadmin/seg_hosts

    我把所有的注释都去掉了,这里面几个是一定要配置的,需要注意!

      5, 开始初始化GP.

     gpinitsystem -c /home/gpadmin/gpconfigs/gpinitsystem_config  执行之后的日志是这样的:



[gpadmin@vm11 ~]$ gpinitsystem -c /home/gpadmin/gpconfigs/gpinitsystem_config
20171006:06:17:43:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Checking configuration parameters, please wait...
20171006:06:17:43:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Reading Greenplum configuration file /home/gpadmin/gpconfigs/gpinitsystem_config
20171006:06:17:43:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Locale has not been set in /home/gpadmin/gpconfigs/gpinitsystem_config, will set to default value
20171006:06:17:44:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Locale set to en_US.utf8
20171006:06:17:44:109203 gpinitsystem:vm11:gpadmin-[INFO]:-MASTER_MAX_CONNECT not set, will set to default value 250
20171006:06:17:44:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Checking configuration parameters, Completed
20171006:06:17:44:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Commencing multi-home checks, please wait...
...
20171006:06:17:45:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Configuring build for standard array
20171006:06:17:45:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Commencing multi-home checks, Completed
20171006:06:17:45:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Building primary segment instance array, please wait...
...
20171006:06:17:47:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Checking Master host
20171006:06:17:47:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Checking new segment hosts, please wait...
..20171006:06:17:52:109203 gpinitsystem:vm11:gpadmin-[WARN]:-----------------------------------------------------------
20171006:06:17:52:109203 gpinitsystem:vm11:gpadmin-[WARN]:-Host vm22 is assigned as localhost in /etc/hosts
20171006:06:17:52:109203 gpinitsystem:vm11:gpadmin-[WARN]:-This will cause segment->master communication failures
20171006:06:17:52:109203 gpinitsystem:vm11:gpadmin-[WARN]:-Remove vm22 from local host line in /etc/hosts
20171006:06:17:52:109203 gpinitsystem:vm11:gpadmin-[WARN]:-----------------------------------------------------------
.
20171006:06:17:54:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Checking new segment hosts, Completed
20171006:06:17:54:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Greenplum Database Creation Parameters
20171006:06:17:54:109203 gpinitsystem:vm11:gpadmin-[INFO]:---------------------------------------
20171006:06:17:54:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Master Configuration
20171006:06:17:54:109203 gpinitsystem:vm11:gpadmin-[INFO]:---------------------------------------
20171006:06:17:54:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Master instance name       = EMC Greenplum DW
20171006:06:17:54:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Master hostname            = vm11
20171006:06:17:54:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Master port                = 5432
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Master instance dir        = /data/master/gpseg-1
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Master LOCALE              = en_US.utf8
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Greenplum segment prefix   = gpseg
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Master Database            = gpexmp
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Master connections         = 250
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Master buffers             = 128000kB
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Segment connections        = 750
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Segment buffers            = 128000kB
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Checkpoint segments        = 8
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Encoding                   = UNICODE
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Postgres param file        = Off
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Initdb to be used          = /usr/local/greenplum-db/./bin/initdb
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-GP_LIBRARY_PATH is         = /usr/local/greenplum-db/./lib
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Ulimit check               = Passed
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Array host connect type    = Single hostname per node
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Master IP address [1]      = 192.168.122.1
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Master IP address [2]      = 192.168.37.128
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Master IP address [3]      = fe80::20c:29ff:feb1:4c90
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Standby Master             = Not Configured
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Primary segment #          = 1
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Total Database segments    = 3
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Trusted shell              = ssh
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Number segment hosts       = 3
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Mirroring config           = OFF
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:----------------------------------------
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Greenplum Primary Segment Configuration
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:----------------------------------------
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-vm11         /data/primary/gpseg0    40000   2       0
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-vm22         /data/primary/gpseg1    40000   3       1
20171006:06:17:55:109203 gpinitsystem:vm11:gpadmin-[INFO]:-vm33         /data/primary/gpseg2    40000   4       2
Continue with Greenplum creation Yy/Nn>
Y
20171006:06:18:22:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Building the Master instance database, please wait...
20171006:06:18:30:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Starting the Master in admin mode
20171006:06:18:56:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Commencing parallel build of primary segment instances
20171006:06:18:57:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Spawning parallel processes    batch [1], please wait...
...
20171006:06:18:57:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Waiting for parallel processes batch [1], please wait...
............................
20171006:06:19:26:109203 gpinitsystem:vm11:gpadmin-[INFO]:------------------------------------------------
20171006:06:19:26:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Parallel process exit status
20171006:06:19:26:109203 gpinitsystem:vm11:gpadmin-[INFO]:------------------------------------------------
20171006:06:19:26:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Total processes marked as completed           = 3
20171006:06:19:26:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Total processes marked as killed              = 0
20171006:06:19:26:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Total processes marked as failed              = 0
20171006:06:19:26:109203 gpinitsystem:vm11:gpadmin-[INFO]:------------------------------------------------
20171006:06:19:27:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Deleting distributed backout files
20171006:06:19:27:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Removing back out file
20171006:06:19:27:109203 gpinitsystem:vm11:gpadmin-[INFO]:-No errors generated from parallel processes
20171006:06:19:27:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Restarting the Greenplum instance in production mode
20171006:06:19:27:122973 gpstop:vm11:gpadmin-[INFO]:-Starting gpstop with args: -a -l /home/gpadmin/gpAdminLogs -i -m -d /data/master/gpseg-1
20171006:06:19:27:122973 gpstop:vm11:gpadmin-[INFO]:-Gathering information and validating the environment...
20171006:06:19:27:122973 gpstop:vm11:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20171006:06:19:27:122973 gpstop:vm11:gpadmin-[INFO]:-Obtaining Segment details from master...
20171006:06:19:27:122973 gpstop:vm11:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.16.1 build 1'
20171006:06:19:27:122973 gpstop:vm11:gpadmin-[INFO]:-There are 0 connections to the database
20171006:06:19:27:122973 gpstop:vm11:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='immediate'
20171006:06:19:27:122973 gpstop:vm11:gpadmin-[INFO]:-Master host=vm11
20171006:06:19:27:122973 gpstop:vm11:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=immediate
20171006:06:19:27:122973 gpstop:vm11:gpadmin-[INFO]:-Master segment instance directory=/data/master/gpseg-1
20171006:06:19:28:122973 gpstop:vm11:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process
20171006:06:19:28:122973 gpstop:vm11:gpadmin-[INFO]:-Terminating processes for segment /data/master/gpseg-1
20171006:06:19:29:123060 gpstart:vm11:gpadmin-[INFO]:-Starting gpstart with args: -a -l /home/gpadmin/gpAdminLogs -d /data/master/gpseg-1
20171006:06:19:29:123060 gpstart:vm11:gpadmin-[INFO]:-Gathering information and validating the environment...
20171006:06:19:29:123060 gpstart:vm11:gpadmin-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.16.1 build 1'
20171006:06:19:29:123060 gpstart:vm11:gpadmin-[INFO]:-Greenplum Catalog Version: '201310150'
20171006:06:19:29:123060 gpstart:vm11:gpadmin-[INFO]:-Starting Master instance in admin mode
20171006:06:19:30:123060 gpstart:vm11:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20171006:06:19:30:123060 gpstart:vm11:gpadmin-[INFO]:-Obtaining Segment details from master...
20171006:06:19:30:123060 gpstart:vm11:gpadmin-[INFO]:-Setting new master era
20171006:06:19:30:123060 gpstart:vm11:gpadmin-[INFO]:-Master Started...
20171006:06:19:30:123060 gpstart:vm11:gpadmin-[INFO]:-Shutting down master
20171006:06:19:31:123060 gpstart:vm11:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
........ 
20171006:06:19:39:123060 gpstart:vm11:gpadmin-[INFO]:-Process results...
20171006:06:19:39:123060 gpstart:vm11:gpadmin-[INFO]:-----------------------------------------------------
20171006:06:19:39:123060 gpstart:vm11:gpadmin-[INFO]:-   Successful segment starts                                            = 3
20171006:06:19:39:123060 gpstart:vm11:gpadmin-[INFO]:-   Failed segment starts                                                = 0
20171006:06:19:39:123060 gpstart:vm11:gpadmin-[INFO]:-   Skipped segment starts (segments are marked down in configuration)   = 0
20171006:06:19:39:123060 gpstart:vm11:gpadmin-[INFO]:-----------------------------------------------------
20171006:06:19:39:123060 gpstart:vm11:gpadmin-[INFO]:-
20171006:06:19:39:123060 gpstart:vm11:gpadmin-[INFO]:-Successfully started 3 of 3 segment instances 
20171006:06:19:39:123060 gpstart:vm11:gpadmin-[INFO]:-----------------------------------------------------
20171006:06:19:39:123060 gpstart:vm11:gpadmin-[INFO]:-Starting Master instance vm11 directory /data/master/gpseg-1 
20171006:06:19:51:123060 gpstart:vm11:gpadmin-[INFO]:-Command pg_ctl reports Master vm11 instance active
20171006:06:19:53:123060 gpstart:vm11:gpadmin-[INFO]:-No standby master configured.  skipping...
20171006:06:19:53:123060 gpstart:vm11:gpadmin-[INFO]:-Database successfully started
20171006:06:19:58:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Completed restart of Greenplum instance in production mode
20171006:06:19:58:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Loading gp_toolkit...
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Scanning utility log file for any warning messages
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[WARN]:-*******************************************************
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[WARN]:-Scan of log file indicates that some warnings or errors
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[WARN]:-were generated during the array creation
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Please review contents of log file
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-/home/gpadmin/gpAdminLogs/gpinitsystem_20171006.log
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-To determine level of criticality
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-These messages could be from a previous run of the utility
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-that was called today!
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[WARN]:-*******************************************************
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Greenplum Database instance successfully created
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-------------------------------------------------------
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-To complete the environment configuration, please 
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-update gpadmin .bashrc file with the following
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-1. Ensure that the greenplum_path.sh file is sourced
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-2. Add "export MASTER_DATA_DIRECTORY=/data/master/gpseg-1"
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-   to access the Greenplum scripts for this instance:
20171006:06:20:09:109203 gpinitsystem:vm11:gpadmin-[INFO]:-   or, use -d /data/master/gpseg-1 option for the Greenplum scripts
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-   Example gpstate -d /data/master/gpseg-1
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Script log file = /home/gpadmin/gpAdminLogs/gpinitsystem_20171006.log
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-To remove instance, run gpdeletesystem utility
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-To initialize a Standby Master Segment for this Greenplum instance
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Review options for gpinitstandby
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-------------------------------------------------------
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-The Master /data/master/gpseg-1/pg_hba.conf post gpinitsystem
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-has been configured to allow all hosts within this new
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-array to intercommunicate. Any hosts external to this
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-new array must be explicitly added to this file
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-Refer to the Greenplum Admin support guide which is
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-located in the /usr/local/greenplum-db/./docs directory
20171006:06:20:10:109203 gpinitsystem:vm11:gpadmin-[INFO]:-------------------------------------------------------

    出错过好几次,总算搞定了. gpstop出错,目录存在,系统已经在运行...等等.

     ps -ef | grep gpadmin

[gpadmin@vm11 ~]$ kill -9 101840  
[gpadmin@vm11 ~]$  rm /tmp/.s.PGSQL.5432.lock


rm -rf /data/master/gpseg-1
rm -rf /data/primary/gpseg1
rm -rf /data/primary/gpseg0
rm -rf /data/primary/gpseg1
rm -rf /data/primary/gpseg2

    6, 运行SQL

[gpadmin@vm11 conf]$ psql -p 5432 -d gpexmp
psql (8.2.15)
Type "help" for help.


create table t1(id int primary key,cn int,name varchar(40)) distributed by (id);
insert into t1 select generate_series(1,100),generate_series(1,100),generate_series(1,100);


select gp_segment_id,count(1) from t1 group by 1;
 gp_segment_id | count 
---------------+-------
             2 |    37
             1 |    32
             0 |    31

看到效果了,成功了

版权声明:本文为博主原创文章,未经博主允许不得转载。

安装greenplum-loader

1.将greenplum-loaders-5.2.0-rhel6-x86_64.zip上传到linux上,根据文件名可以看出本次安装的loaders 版本为5.2.0, redhat 6,32与64...

数据存储技术与实践

  • 2016年09月06日 13:15
  • 26.31MB
  • 下载

HBASE---数据存储实践

HBASE(来自Hadoop database)是一个很好的BigTable的实现,能够存储上百亿行和百万列的数据,是一个高可靠性、高性能、面向列、可伸缩的分布式存储系统。          HBAS...
  • hugolyl
  • hugolyl
  • 2016年03月13日 13:48
  • 274

千亿级服务器监控数据存储实践

千亿级服务器监控数据存储实践 背景 近几年开源的大数据处理系统已经逐步发展到一个比较成熟的阶段了,各类大数据处理的场景都有了相应的解决方案,如同 mysql 在当今互联...

第14周上机实践项目5——体会棋盘游戏中的数据存储(1)

问题及代码 /* * Copyright (c) 2014, 烟台大学计算机学院 * All rights reserved. * 文件名称:test.cpp * 作 者:辛彬 * ...

第14周上机实践项目5——体会棋盘游戏中的数据存储(3)

问题及代码 /* * Copyright (c) 2014, 烟台大学计算机学院 * All rights reserved. * 文件名称:test.cpp * 作 者:辛彬 * ...

第14周上机实践项目5——体会棋盘游戏中的数据存储(2)

问题及代码 /* * Copyright (c) 2014, 烟台大学计算机学院 * All rights reserved. * 文件名称:test.cpp * 作 者:辛彬 * ...

【BDTC2016】大数据分析与生态系统论坛:大数据存储、处理技术大比评 百花齐放落地实践大展现

2016年12月8-10日,由中国计算机学会(CCF)主办,CCF大数据专家委员会承办,中国科学院计算技术研究所、中科天玑数据科技股份有限公司与CSDN共同协办,以“聚焦行业最佳实践,数据与应用的深度...
内容举报
返回顶部
收藏助手
不良信息举报
您举报文章:数据存储---GreenPlum实践
举报原因:
原因补充:

(最多只允许输入30个字)