Notes from a recent big-data cluster installation:
Versions installed: Presto 317, Kudu 1.10, Hive 3.1.2, Hadoop 3.1.2
Download locations for the main packages:
presto https://prestosql.io/docs/current/index.html
kudu (RPM builds) https://github.com/MartinWeindel/kudu-rpm/releases
hive http://mirror.bit.edu.cn/apache/hive/
hadoop http://archive.apache.org/dist/hadoop/core/
Machine layout:
Unless noted otherwise, every step below is performed on all hosts; use whatever batch-execution tool you prefer. I use SaltStack.
1. Preparation
Disable SELinux and the firewall, install the basic tooling, install the JDK (jdk1.8.0_211), set up passwordless SSH between all hosts, and configure the hosts file (delete the two loopback 127.0.0.1 lines):
192.168.86.101 centos101 master
192.168.86.102 centos102 slave1
192.168.86.103 centos103 slave2
192.168.86.104 centos104 slave3
192.168.86.105 centos105 slave4
192.168.86.106 centos106 slave5
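The six entries can also be generated and appended in one go. A minimal sketch (hosts_block is a hypothetical helper for this post; the 192.168.86.x addressing matches the plan above):

```shell
# Hypothetical helper: print the /etc/hosts entries for the six nodes.
hosts_block() {
  i=1
  for name in master slave1 slave2 slave3 slave4 slave5; do
    printf '192.168.86.10%d centos10%d %s\n' "$i" "$i" "$name"
    i=$((i + 1))
  done
}
hosts_block                    # review the output first
# hosts_block >> /etc/hosts    # then append on every node
```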
Download and unpack the main packages into /data, which should end up looking like this:
[root@centos102 ~]# ll /data
total 765476
drwxr-xr-x 12 root root 183 Sep 14 16:29 hadoop-3.1.2
drwxr-xr-x 7 10 143 245 Apr 2 11:51 jdk1.8.0_211
drwxr-xr-x 4 kudu kudu 35 Sep 14 22:24 kudu
drwxr-xr-x 3 root root 42 Sep 15 10:49 presto
drwxr-xr-x 6 root root 85 Sep 14 23:19 presto-server-317
/data/presto will be Presto's data directory; create it:
mkdir -p /data/presto/
Configure environment variables by adding the following to /etc/profile:
JAVA_HOME=/data/jdk1.8.0_211
JRE_HOME=$JAVA_HOME/jre
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
HADOOP_HOME=/data/hadoop-3.1.2
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME JRE_HOME CLASSPATH HADOOP_HOME PATH
# add these only if you hit native-library warnings
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
2. Install Hadoop
vi /data/hadoop-3.1.2/etc/hadoop/core-site.xml and add the following:
<configuration>
<!-- master is the NameNode hostname; 8020 is the default filesystem port (hbase.rootdir uses the same port) -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
</property>
<!-- Size of read/write buffer used in SequenceFiles. -->
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop-3.1.2/tmp</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
vi /data/hadoop-3.1.2/etc/hadoop/hadoop-env.sh and add the following:
export JAVA_HOME=/data/jdk1.8.0_211
export HADOOP_HOME=/data/hadoop-3.1.2
export PATH=$PATH:/data/hadoop-3.1.2/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
vi /data/hadoop-3.1.2/etc/hadoop/hdfs-site.xml and add the following:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/hadoop-3.1.2/dfs/data</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/hadoop-3.1.2/dfs/name</value>
</property>
<property>
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
</configuration>
vi /data/hadoop-3.1.2/etc/hadoop/mapred-site.xml and add the following:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
vi /data/hadoop-3.1.2/etc/hadoop/workers and add the worker hostnames, one per line (they resolve via the hosts entries above):
slave1
slave2
slave3
slave4
slave5
vi /data/hadoop-3.1.2/etc/hadoop/yarn-site.xml and add the following:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<!-- works around a MapReduce job classpath error -->
<property>
<name>mapreduce.application.classpath</name>
<value>/data/hadoop-3.1.2/share/hadoop/mapreduce/*,/data/hadoop-3.1.2/share/hadoop/mapreduce/lib/*</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
</configuration>
That covers all six Hadoop config files; four start/stop scripts also need edits.
Add the following at the top of /data/hadoop-3.1.2/sbin/start-dfs.sh and stop-dfs.sh:
HDFS_DATANODE_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
Add the following at the top of /data/hadoop-3.1.2/sbin/start-yarn.sh and stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Once everything is configured, create these users on all servers:
useradd -m hadoop -G root -s /bin/bash
useradd -m hdfs -G root -s /bin/bash
useradd -m yarn -G root -s /bin/bash
Format HDFS (strictly, only the NameNode host needs this; hdfs namenode -format is the non-deprecated form of the same command):
hadoop namenode -format
If that reports no errors, run the following on the HDFS master node, centos101:
/data/hadoop-3.1.2/sbin/start-all.sh
If nothing errors out, run jps on every server and check that the processes match the role plan above.
If startup complains about permissions, make the bin and sbin scripts executable and retry:
chmod +x /data/hadoop-3.1.2/bin/*
chmod +x /data/hadoop-3.1.2/sbin/*
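To spot-check the daemons without SaltStack, a plain SSH loop works too. A sketch, assuming the passwordless SSH set up earlier and the six hostnames from the plan:

```shell
# Loop over the cluster and print each node's Java processes via jps.
HOSTS="centos101 centos102 centos103 centos104 centos105 centos106"
for h in $HOSTS; do
  echo "== $h =="
  ssh -o ConnectTimeout=3 -o BatchMode=yes "$h" jps || echo "unreachable: $h"
done
```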
3. Install Hive on the master node centos101
cd /data/hive-3.1.2/conf/
cp hive-default.xml.template hive-site.xml (copy a config file from the template)
vi /data/hive-3.1.2/conf/hive-site.xml and add the following:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.exec.mode.local.auto</name>
<value>true</value>
<description> Let Hive determine whether to run in local mode automatically </description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>111111</value>
</property>
<!-- print column headers in query output -->
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<!-- show the current database in the CLI prompt -->
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>NONE</value>
</property>
</configuration>
cd /data/hive-3.1.2/conf/
cp hive-env.sh.template hive-env.sh
vi /data/hive-3.1.2/conf/hive-env.sh and add the following:
HADOOP_HOME=/data/hadoop-3.1.2
export HIVE_CONF_DIR=/data/hive-3.1.2/conf
export HIVE_AUX_JARS_PATH=/data/hive-3.1.2/lib
Put the MySQL JDBC driver into Hive's lib directory:
cp mysql-connector-java-5.1.46.jar /data/hive-3.1.2/lib
Install MySQL (MariaDB) on the master node centos101:
yum install -y mariadb-server
systemctl start mariadb
systemctl enable mariadb
Initialize and secure MySQL:
mysql_secure_installation
Create the Hive metastore database (inside the mysql client):
CREATE DATABASE hive CHARACTER SET utf8;
CREATE USER 'hive'@'%' IDENTIFIED BY '111111';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%';
FLUSH PRIVILEGES;
Create the Hive warehouse and temp directories in HDFS:
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /user/hive/warehouse
hadoop fs -mkdir -p /tmp
hadoop fs -chmod g+w /tmp
Initialize the Hive metastore schema:
schematool -dbType mysql -initSchema
Start the Hive services (run from /data/hive-3.1.2):
nohup sh bin/hive --service metastore -p 9083 &
nohup sh bin/hive --service hiveserver2 &
You can also go to the bin directory and run hive directly to verify the setup.
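beeline exercises the whole chain through HiveServer2 (default port 10000; with hive.server2.authentication set to NONE above, any user name is accepted). A sketch:

```shell
# Smoke-test HiveServer2 via beeline (run from /data/hive-3.1.2 once the
# services above are up; NONE auth means any user name works).
HS2_URL="jdbc:hive2://centos101:10000"
bin/beeline -u "$HS2_URL" -n root -e 'show databases;' || echo "HiveServer2 not reachable yet"
```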
4. Install Kudu
cd /data && rpm -ivh kudu-1.10.0-1.x86_64.rpm
On centos101, centos102, and centos103, create the master data directories:
mkdir -p /data/kudu/master/logs /data/kudu/master/wals /data/kudu/master/data
On all nodes, create the tserver data directories:
mkdir -p /data/kudu/tserver/logs /data/kudu/tserver/data /data/kudu/tserver/wals
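Both trees follow the same role/{logs,wals,data} shape, so a small loop covers them. A sketch (BASE is a stand-in so it can be dry-run anywhere; on the real nodes it is /data/kudu):

```shell
# Create the Kudu directory trees. Only the master tree is needed on
# centos101-103, but creating both everywhere is harmless.
BASE="${BASE:-./kudu}"        # use BASE=/data/kudu on the real nodes
for role in master tserver; do
  for d in logs wals data; do
    mkdir -p "$BASE/$role/$d"
  done
done
```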
vi /etc/kudu/conf/master.gflagfile
## Comma-separated list of the RPC addresses belonging to all Masters in this cluster.
## NOTE: if not specified, configures a non-replicated Master.
#--master_addresses=
--master_addresses=centos101:7051,centos102:7051,centos103:7051
--log_dir=/data/kudu/master/logs
--fs_wal_dir=/data/kudu/master/wals
--fs_data_dirs=/data/kudu/master/data
vi /etc/kudu/conf/tserver.gflagfile
#Comma separated addresses of the masters which the tablet server should connect to.
--tserver_master_addrs=centos101:7051,centos102:7051,centos103:7051
--log_dir=/data/kudu/tserver/logs
--fs_wal_dir=/data/kudu/tserver/wals
--fs_data_dirs=/data/kudu/tserver/data
Fix ownership of the Kudu directories:
chown -R kudu:kudu /data/kudu
Start the Kudu masters on centos101, centos102, and centos103:
systemctl start kudu-master
Start the Kudu tablet servers on all nodes:
systemctl start kudu-tserver
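The Kudu CLI ships a consistency checker that verifies cluster health from any node. A quick check might look like this (master list as configured above):

```shell
# Ask the Kudu CLI to check cluster health against the three masters.
MASTERS="centos101:7051,centos102:7051,centos103:7051"
command -v kudu >/dev/null && kudu cluster ksck "$MASTERS" || echo "kudu CLI not on PATH"
```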
5. Install Presto
mkdir -p /data/presto-server-317/etc/catalog
cd /data/presto-server-317/etc
Config file on the coordinator (master node):
vi config.properties
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=18080
discovery-server.enabled=true
discovery.uri=http://192.168.86.101:18080
query.max-memory=8GB
query.max-memory-per-node=1GB
query.max-run-time=600s
The same file, config.properties, on the worker (slave) nodes:
coordinator=false
http-server.http.port=18080
discovery.uri=http://192.168.86.101:18080
query.max-memory=8GB
query.max-memory-per-node=1GB
The following files are identical on all nodes.
vi jvm.config
-server
-Xmx20G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:+CMSClassUnloadingEnabled
-XX:+AggressiveOpts
-DHADOOP_USER_NAME=root
vi log.properties
# presto-server-317 is PrestoSQL, so the logger namespace is io.prestosql
# (com.facebook.presto applies to the older PrestoDB line)
io.prestosql=INFO
vi node.properties
node.environment=mycluster
# Note: the 101 suffix in node.id follows the hostname; on host centos102 change it to 102, and so on. IDs must be unique across the cluster.
node.id=node_coordinator_101
node.data-dir=/data/presto/
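Since only node.id changes per host, it can be derived from the hostname instead of edited by hand. A sketch (gen_node_props is a hypothetical helper, and it uses a simplified node_<suffix> id scheme rather than the node_coordinator_101 name above):

```shell
# Print a node.properties whose node.id follows the host's numeric suffix.
gen_node_props() {
  suffix="${1#centos}"       # centos102 -> 102
  printf 'node.environment=mycluster\n'
  printf 'node.id=node_%s\n' "$suffix"
  printf 'node.data-dir=/data/presto/\n'
}
gen_node_props centos101
# on each node: gen_node_props "$(hostname -s)" > /data/presto-server-317/etc/node.properties
```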
cd /data/presto-server-317/etc/catalog
vi hive.properties
# Note: the connector name selects the driver version; it currently must be hive-hadoop2, even with Hadoop 3.
connector.name=hive-hadoop2
## hive metastore address
hive.metastore.uri=thrift://centos101:9083
hive.config.resources=/data/hadoop-3.1.2/etc/hadoop/core-site.xml,/data/hadoop-3.1.2/etc/hadoop/hdfs-site.xml
vi kudu.properties
connector.name=kudu
## List of Kudu master addresses, at least one is needed (comma separated)
## Supported formats: example.com, example.com:7051, 192.0.2.1, 192.0.2.1:7051,
## [2001:db8::1], [2001:db8::1]:7051, 2001:db8::1
kudu.client.master-addresses=centos101:7051,centos102:7051,centos103:7051
## Kudu does not support schemas, but the connector can emulate them optionally.
## By default, this feature is disabled, and all tables belong to the default schema.
## For more details see connector documentation.
kudu.schema-emulation.enabled=true
## Prefix to use for schema emulation (only relevant if `kudu.schema-emulation.enabled=true`)
## The standard prefix is `presto::`. Empty prefix is also supported.
## For more details see connector documentation.
#kudu.schema-emulation.prefix=
#######################
### Advanced Kudu Java client configuration
#######################
## Default timeout used for administrative operations (e.g. createTable, deleteTable, etc.)
#kudu.client.defaultAdminOperationTimeout = 30s
kudu.client.default-admin-operation-timeout = 60s
## Default timeout used for user operations
#kudu.client.defaultOperationTimeout = 30s
kudu.client.default-operation-timeout = 60s
## Default timeout to use when waiting on data from a socket
#kudu.client.defaultSocketReadTimeout = 30s
kudu.client.default-socket-read-timeout = 60s
## Disable Kudu client's collection of statistics.
#kudu.client.disableStatistics = false
Start the Presto server on every node:
/data/presto-server-317/bin/launcher start
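Once the launcher reports it has started, the coordinator can be probed over HTTP before firing up any clients; its /v1/info endpoint returns the server's version JSON. A sketch:

```shell
# Probe the coordinator's info endpoint; prints a note if it is not up yet.
COORD="http://192.168.86.101:18080"
curl -s --max-time 5 "$COORD/v1/info" || echo "coordinator not reachable yet"
```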
That completes the installation. If anything errored along the way, check the corresponding service logs. Finally, the web UIs of the main services:
hdfs http://192.168.86.101:50070
yarn http://192.168.86.101:8088
kudu http://192.168.86.101:8051
presto http://192.168.86.101:18080