Base environment
CentOS 6.8
hadoop-3.1.1.tar.gz
spark-2.3.0-bin-hadoop2.7.tgz
zookeeper-3.4.9.tar.gz
pip-18.0.tar.gz
setuptools-40.2.0.zip
Three servers
10.0.0.11 s11
10.0.0.12 s12
10.0.0.13 s13
Preparation
Perform all of the following steps as the root user
Install JDK 1.8 and configure its environment variables
Edit /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_121
export JRE_HOME=/usr/java/jdk1.8.0_121/jre/
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
Configure the Scala environment
Extract scala-2.12.6.tgz
Rename the extracted directory to scala
Edit /etc/profile
export SCALA_HOME=/app/appuser/apps/scala
export PATH=$SCALA_HOME/bin:$PATH
Configure hosts
In /etc/hosts, add the IP-to-hostname mappings for the three servers listed above
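A minimal sketch of adding those mappings without creating duplicates when the step is repeated. The helper name add_host is hypothetical (not part of any tool), and the file path is passed as an argument so the function can be tried on a copy first.

```shell
# add_host FILE IP NAME
# Append "IP NAME" to FILE unless that exact line is already present.
add_host() {
    # -x matches the whole line, -F treats the pattern as a fixed string
    grep -qxF "$2 $3" "$1" 2>/dev/null || printf '%s %s\n' "$2" "$3" >> "$1"
}

# Usage, as root on each of the three servers:
#   add_host /etc/hosts 10.0.0.11 s11
#   add_host /etc/hosts 10.0.0.12 s12
#   add_host /etc/hosts 10.0.0.13 s13
```

Because the check is on the exact line, running the same commands again leaves /etc/hosts unchanged.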
Configure the open-file and thread limits
Append the following to the end of /etc/security/limits.conf
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096
Edit /etc/security/limits.d/90-nproc.conf
* soft nproc 2048
* hard nproc 4096
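The new limits only apply to sessions started after the change, so it can help to verify them after logging in again. The following is a sketch of one way to do that; limit_ok is a hypothetical helper, and the thresholds in the usage comments are the soft values configured above.

```shell
# limit_ok CURRENT REQUIRED
# Succeed if CURRENT is "unlimited" or numerically at least REQUIRED.
limit_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Usage in a fresh login shell (bash):
#   limit_ok "$(ulimit -n)" 65536 && echo "nofile ok"
#   limit_ok "$(ulimit -u)" 2048  && echo "nproc ok"
```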
Upgrade Python to 2.7
Install the dependencies that the Python build may need
yum install -y zlib-devel bzip2-devel openssl-devel xz-libs wget gcc
Compile and install Python
./configure
make && make install
Fix yum, which depends on Python 2.6.6
mv /usr/bin/python /usr/bin/python2.6.6
ln -s /usr/local/bin/python2.7 /usr/bin/python
Edit /usr/bin/yum and change the first line from
#!/usr/bin/python
to
#!/usr/bin/python2.6.6
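The same first-line change can be scripted instead of made in an editor. This is a sketch only; set_shebang is a hypothetical helper name, and it assumes the interpreter path contains no | or & characters, which sed would interpret.

```shell
# set_shebang FILE INTERPRETER
# Replace the "#!..." on line 1 of FILE with "#!INTERPRETER"; other lines are untouched.
set_shebang() {
    sb_tmp="$1.tmp.$$"
    # "1s" restricts the substitution to the first line; | is the delimiter because paths contain /
    # Writing back with cat keeps the original file's owner and permissions.
    sed "1s|^#!.*|#!$2|" "$1" > "$sb_tmp" && cat "$sb_tmp" > "$1"
    rm -f "$sb_tmp"
}

# Usage, as root:
#   set_shebang /usr/bin/yum /usr/bin/python2.6.6
```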
Install setuptools
python setup.py install
Install pip
python setup.py install
Install the Python modules for connecting to HDFS and Spark
pip install hdfs
pip install pyspark
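Make the following changes for the appuser user:
Set up passwordless SSH login for appuser between every pair of the three servers, and from each server to itself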
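One common way to set this up is with ssh-keygen and ssh-copy-id; the source does not say which method was used, so treat the following as a sketch. The function name ssh_setup_commands is hypothetical, the key path is the OpenSSH default for RSA keys, and the commands are printed rather than executed so they can be reviewed before being run by hand.

```shell
# Hostnames taken from the server list at the top of this document.
HOSTS="s11 s12 s13"

# Print the commands appuser would run on ONE server.
# Repeat on each of the three servers so every pair, and each host to itself, is covered.
ssh_setup_commands() {
    # Generate a key pair only if none exists yet; -N '' sets an empty passphrase
    echo "[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"
    for h in $HOSTS; do
        # Copies the public key into appuser's authorized_keys on host $h (asks for the password once)
        echo "ssh-copy-id appuser@$h"
    done
}

# Usage, as appuser:
#   ssh_setup_commands
```

Afterwards, running ssh s11, ssh s12 and ssh s13 from each server should log in without a password prompt.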
Install the ZooKeeper cluster
Unless otherwise noted, run the following steps as the appuser user
Install HDFS
Role assignment
s11 runs NameNode, JournalNode, DataNode, ResourceManager, NodeManager
s12 runs NameNode, JournalNode, DataNode, NodeManager
s13 runs JournalNode, DataNode, NodeManager
Configuration file changes
Edit core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either