1. Cluster Environment
CentOS: 7
JDK: 1.8
Hadoop: 3.0.0
Spark: 2.4.0
HBase: 2.1.0
The cluster consists of three nodes:
Hostname IP
node01 192.168.4.211
node02 192.168.4.212
node03 192.168.4.213
Edit the hosts file
vim /etc/hosts
192.168.4.211 node01
192.168.4.212 node02
192.168.4.213 node03
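A quick sanity check that each hostname now resolves and each node is reachable (a minimal sketch using the hostnames above):

```shell
# verify that all three hostnames resolve and respond
for host in node01 node02 node03; do
    ping -c 1 "$host" && echo "$host OK"
done
```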
Passwordless SSH between the cluster nodes (see the linked walkthrough):
https://blog.csdn.net/jia_7m/article/details/88240369
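The linked post covers this in detail; the usual approach is roughly the following (a sketch, run as root on each node):

```shell
# generate a key pair without a passphrase (skip if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# push the public key to every node, including this one
for host in node01 node02 node03; do
    ssh-copy-id "root@$host"
done
```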
Configure the JDK
# Download jdk-8u171-linux-x64.tar.gz and extract it
cd /user
tar -zxvf ./tgz/jdk-8u171-linux-x64.tar.gz
mv jdk1.8.0_171 java8   # rename the extracted directory to match JAVA_HOME below
# Configure the environment variables
vim /etc/profile
export JAVA_HOME=/user/java8
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
# Verify the configuration
java -version
Download the Hadoop, HBase and Zookeeper packages
1) Download Hadoop 3.0.0
https://dist.apache.org/repos/dist/release/hadoop/common/
2) Download HBase 2.1.0
http://mirrors.hust.edu.cn/apache/hbase/
3) Download Zookeeper 3.4.13
http://mirrors.hust.edu.cn/apache/zookeeper/
Upload the downloaded packages to /user/tgz on the Linux machines
1.1 Zookeeper Cluster Configuration
cd /user
# Extract
tar -zxvf ./tgz/zookeeper-3.4.13.tar.gz
cd ./zookeeper-3.4.13/conf
1) cp zoo_sample.cfg zoo.cfg && vim zoo.cfg
dataDir=/tmp/zookeeper
server.1=node01:2888:3888
server.2=node02:2888:3888
server.3=node03:2888:3888
2) Copy the configured Zookeeper directory to the other nodes
scp -r /user/zookeeper-3.4.13/ root@192.168.4.212:/user/
scp -r /user/zookeeper-3.4.13/ root@192.168.4.213:/user/
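Each Zookeeper node must also have a myid file whose value matches its server.N line in zoo.cfg. A small helper for deriving the id from a nodeNN hostname (a sketch; the mapping node01→1, node02→2, node03→3 is assumed from the zoo.cfg above):

```shell
# derive a Zookeeper myid from a "nodeNN" hostname, e.g. node02 -> 2
myid_from_hostname() {
    local n="${1#node}"   # strip the "node" prefix: node02 -> 02
    echo $((10#$n))       # base-10 conversion drops the leading zero: 02 -> 2
}

# on each node, after copying the zookeeper directory over:
#   mkdir -p /tmp/zookeeper
#   myid_from_hostname "$(hostname)" > /tmp/zookeeper/myid
```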
1.2 Hadoop Installation
cd /user
# Extract
tar -zxvf ./tgz/hadoop-3.0.0.tar.gz
1) vim /etc/profile
export HADOOP_HOME=/user/hadoop-3.0.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Reload /etc/profile
source /etc/profile
2) Set the Java environment in hadoop-env.sh
cd ./hadoop-3.0.0/etc/hadoop/
vim hadoop-env.sh
export JAVA_HOME=/user/java8
3) Configure core-site.xml
vim core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/user/hadoop-3.0.0/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node01:9000</value>
</property>
</configuration>
4) Configure mapred-site.xml
vim mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>node01:49001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/user/hadoop-3.0.0/var</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5) Configure hdfs-site.xml
vim hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/user/hadoop-3.0.0/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/user/hadoop-3.0.0/dfs/data</value>
<description>Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>Disable HDFS permission checking.</description>
</property>
</configuration>
6) Configure yarn-site.xml
vim yarn-site.xml
<configuration>
<property>
<!-- Set the master node here -->
<name>yarn.resourcemanager.hostname</name>
<value>node01</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Then copy the Hadoop directory to the other nodes
1.3 HBase Cluster Configuration
cd /user
# Extract
tar -zxvf ./tgz/hbase-2.1.0-bin.tar.gz
1) vim /etc/profile
export HBASE_HOME=/user/hbase-2.1.0
export PATH=$PATH:$HBASE_HOME/bin
# Reload /etc/profile
source /etc/profile
2) Configure hbase-env.sh
cd /user/hbase-2.1.0/conf/
vim hbase-env.sh
export JAVA_HOME=/user/java8
export HBASE_MANAGES_ZK=false
# HBASE_MANAGES_ZK=false means HBase uses the external Zookeeper; setting it to true would use HBase's bundled Zookeeper
3) Configure hbase-site.xml
vim hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://node01:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>node01:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>node01,node02,node03</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/tmp/zookeeper/hbase</value>
</property>
</configuration>
Notes:
hbase.cluster.distributed sets HBase's run mode: false is standalone, true is distributed.
hbase.zookeeper.quorum is the comma-separated list of Zookeeper cluster hosts.
4) Configure regionservers
node01
node02
node03
Then copy the HBase directory to the other nodes as well
1.4 Starting the Cluster
1. Start the Zookeeper cluster (run on every node)
cd /user/zookeeper-3.4.13
./bin/zkServer.sh start
2. Check the Zookeeper status (leader/follower)
./bin/zkServer.sh status
3. Start the Hadoop cluster (on the master node)
hdfs namenode -format # only before the first start: format the NameNode
./sbin/start-all.sh
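To confirm that HDFS came up, the dfsadmin report should list all three DataNodes (a hedged check; paths and hostnames as configured above):

```shell
# after start-all.sh, all three DataNodes should register with the NameNode
hdfs dfsadmin -report | grep -i "live datanodes"
# the NameNode web UI is also available at http://node01:9870 in Hadoop 3.x
```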
4. Start the HBase cluster (on the master node)
./bin/start-hbase.sh
Check the processes with jps:
24241 ThriftServer
3362 ResourceManager
6082 QuorumPeerMain
18098 HMaster
9970 Master
15460 RunJar
3958 Jps
2839 NameNode
3111 SecondaryNameNode
10057 Worker
18267 HRegionServer
1.5 Spark Installation
# Extract
cd /user
tar -zxvf ./tgz/spark-2.4.0-bin-hadoop2.7.tgz
mv spark-2.4.0-bin-hadoop2.7 spark-2.4.0   # rename the extracted directory to match SPARK_HOME below
1) Configure the environment variables
vim /etc/profile
# Add
export SPARK_HOME=/user/spark-2.4.0
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
2) Configure spark-env.sh
cd ./spark-2.4.0
cp conf/spark-env.sh.template conf/spark-env.sh # copy the template
vim conf/spark-env.sh
export JAVA_HOME=/user/java8
export HADOOP_HOME=/user/hadoop-3.0.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_MASTER_HOST=192.168.4.211
export SPARK_WORKER_MEMORY=1g # maximum memory each worker node can allocate to executors
export SPARK_WORKER_CORES=1 # number of CPU cores used on each worker node
export SPARK_WORKER_INSTANCES=1 # number of worker instances started on each machine
3) Configure slaves
vim conf/slaves
node01
node02
node03
4) Copy spark-2.4.0 to the other nodes
5) Start the cluster
./sbin/start-all.sh
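A quick smoke test against the standalone master (a sketch; the examples jar name is taken from the 2.4.0 distribution layout, and SPARK_HOME=/user/spark-2.4.0 as configured above is assumed):

```shell
# run the bundled SparkPi example on the standalone cluster
/user/spark-2.4.0/bin/spark-submit \
    --master spark://node01:7077 \
    --class org.apache.spark.examples.SparkPi \
    /user/spark-2.4.0/examples/jars/spark-examples_2.11-2.4.0.jar 10
# the master web UI is at http://node01:8080
```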
1.6 Kafka Installation
# Extract
cd /user
tar -zxvf ./tgz/kafka_2.12-2.1.0.tgz
1) Configure server.properties
cd kafka_2.12-2.1.0
vim config/server.properties
broker.id=2 (this ID must be different on every node, and must be a non-negative integer)
host.name=192.168.4.211 (this node's IP)
listeners=PLAINTEXT://192.168.4.211:9092 (IP + port, different on every node)
zookeeper.connect=192.168.4.211:2181 (your Zookeeper connection address; separate multiple addresses with commas)
2) Edit the Zookeeper configuration file (already done in 1.1)
a. # cd zookeeper/conf
b. # cp zoo_sample.cfg zoo.cfg
c. # configure it
3) Copy Kafka to the other nodes
and change broker.id on each of them
4) Start
# zookeeper/bin/zkServer.sh start
# kafka/bin/kafka-server-start.sh -daemon config/server.properties
5) Verify
# jps
If every node shows a Kafka process, the cluster started successfully
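A further end-to-end check (a sketch using the CLI tools shipped with kafka_2.12-2.1.0; in this release kafka-topics.sh still talks to Zookeeper directly):

```shell
cd /user/kafka_2.12-2.1.0
# create a topic replicated across all three brokers
./bin/kafka-topics.sh --create --zookeeper 192.168.4.211:2181 \
    --replication-factor 3 --partitions 3 --topic test
# list topics; "test" should appear
./bin/kafka-topics.sh --list --zookeeper 192.168.4.211:2181
```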
2. Errors
2.1 Zookeeper Startup Error
Start Zookeeper:
# cd /user/zookeeper-3.4.13
# ./bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /user/zookeeper-3.4.13/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Check the status:
# ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /user/zookeeper-3.4.13/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
Check the configuration file conf/zoo.cfg:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=node01:2888:3888
server.2=node02:2888:3888
server.3=node03:2888:3888
Check /tmp/zookeeper/myid:
No such file or directory: the myid file is missing
# touch /tmp/zookeeper/myid
# echo 1 > /tmp/zookeeper/myid
The value must match this node's server.N entry in zoo.cfg: write 1 on node01, 2 on node02, 3 on node03. Then restart Zookeeper.
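The restart and recheck across all nodes can be sketched as follows (one node should report leader, the rest follower):

```shell
# restart Zookeeper on every node, then print each node's role
for host in node01 node02 node03; do
    ssh "root@$host" "/user/zookeeper-3.4.13/bin/zkServer.sh restart"
done
for host in node01 node02 node03; do
    echo "== $host =="
    ssh "root@$host" "/user/zookeeper-3.4.13/bin/zkServer.sh status"
done
```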