3. Hadoop Cluster Installation and Configuration
3.1 Hadoop Cluster Deployment Plan
|      | linux101            | linux102                     | linux103                     |
|------|---------------------|------------------------------|------------------------------|
| HDFS | NameNode, DataNode  | DataNode                     | SecondaryNameNode, DataNode  |
| YARN | NodeManager         | ResourceManager, NodeManager | NodeManager                  |
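This plan assumes the three hostnames resolve on every node; a minimal /etc/hosts sketch with placeholder IP addresses (substitute your own):

```
192.168.10.101 linux101
192.168.10.102 linux102
192.168.10.103 linux103
```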
3.2 Hadoop Installation and Configuration
- Upload hadoop-3.1.3, extract it, and configure the environment variables:

```bash
[longlong@linux101 software]$ sudo vim /etc/profile.d/my-env.sh
## HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$HADOOP_HOME/bin:$PATH
export PATH=$HADOOP_HOME/sbin:$PATH
```
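A quick way to verify the variables took effect (assuming the tarball has already been extracted to /opt/module/hadoop-3.1.3):

```bash
source /etc/profile.d/my-env.sh
hadoop version   # should report Hadoop 3.1.3
```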
- Configure the cluster
- Configure core-site.xml:

```xml
<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://linux101:8020</value>
    </property>
    <!-- Hadoop data storage directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.1.3/data</value>
    </property>
    <!-- Static user for the HDFS web UI -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>longlong</value>
    </property>
    <!-- Hosts from which the longlong (superuser) proxy may connect -->
    <property>
        <name>hadoop.proxyuser.longlong.hosts</name>
        <value>*</value>
    </property>
    <!-- Groups that longlong (superuser) may impersonate -->
    <property>
        <name>hadoop.proxyuser.longlong.groups</name>
        <value>*</value>
    </property>
    <!-- Users that longlong (superuser) may impersonate -->
    <property>
        <name>hadoop.proxyuser.longlong.users</name>
        <value>*</value>
    </property>
</configuration>
```
- Configure hdfs-site.xml:

```xml
<configuration>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>linux101:9870</value>
    </property>
    <!-- SecondaryNameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>linux103:9868</value>
    </property>
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
```
- Configure yarn-site.xml:

```xml
<configuration>
    <!-- Use the MapReduce shuffle auxiliary service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager address -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>linux102</value>
    </property>
    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <!-- Minimum and maximum memory a YARN container may be allocated -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>
    <!-- Physical memory the NodeManager may hand out to containers -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>
    <!-- Disable the physical- and virtual-memory limit checks -->
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log server address -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://linux101:19888/jobhistory/logs</value>
    </property>
    <!-- Keep aggregated logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
```
- Configure mapred-site.xml:

```xml
<configuration>
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>linux101:10020</value>
    </property>
    <!-- JobHistory server web address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>linux101:19888</value>
    </property>
</configuration>
```
- Configure workers:

```bash
[longlong@linux101 hadoop]$ vim /opt/module/hadoop-3.1.3/etc/hadoop/workers
linux101
linux102
linux103
```
- Distribute the Hadoop directory to the other machines (sketch below).
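The distribution command is not shown in these notes; a minimal sketch using scp (rsync would also work), assuming passwordless SSH for longlong and that /opt/module is writable on every node:

```bash
# Copy the Hadoop directory to the other two nodes
for host in linux102 linux103; do
    scp -r /opt/module/hadoop-3.1.3 longlong@$host:/opt/module/
done
```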
3.3 Cluster Startup and Testing
- Start the cluster
- On first startup, format the NameNode:

```bash
[longlong@linux101 bin]$ hdfs namenode -format
```
- Start HDFS (on the NameNode host):

```bash
[longlong@linux101 bin]$ start-dfs.sh
```
- Start YARN (on the ResourceManager host):

```bash
[longlong@linux102 ~]$ start-yarn.sh
```
- Cluster testing
- Check the running daemons:

```bash
[longlong@linux101 hadoop-3.1.3]$ myjps.sh
=================> linux101 JPS <=================
16834 Jps
16281 NameNode
16732 NodeManager
16431 DataNode
=================> linux102 JPS <=================
2602 ResourceManager
2412 DataNode
3053 Jps
2909 NodeManager
=================> linux103 JPS <=================
15122 NodeManager
15222 Jps
15016 SecondaryNameNode
14907 DataNode
```
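myjps.sh is a helper script that is not shown in these notes; a minimal sketch, assuming passwordless SSH and that jps is on each host's PATH for non-interactive shells:

```bash
#!/bin/bash
# Print the JVM processes on every cluster node
for host in linux101 linux102 linux103; do
    echo "=================> $host JPS <================="
    ssh "$host" jps
done
```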
- Web UIs
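Given the configuration above, the NameNode UI should be at http://linux101:9870 and the SecondaryNameNode UI at http://linux103:9868; the YARN ResourceManager UI is normally at http://linux102:8088 (the YARN default, since no web port was configured), and the JobHistory UI at http://linux101:19888 once that server is started.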
3.4 HDFS High Availability Setup and Configuration
- Upload and extract Hadoop on linux101, configure the environment variables, and distribute them to the other machines:

```bash
# Configure environment variables
[longlong@linux101 hadoop-3.1.3]$ sudo vim /etc/profile.d/my-env.sh
# HADOOP
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$HADOOP_HOME/bin:$PATH
export PATH=$HADOOP_HOME/sbin:$PATH

# Distribute the configuration
[longlong@linux101 hadoop-3.1.3]$ scp /etc/profile.d/my-env.sh root@linux102:/etc/profile.d/
```
- Configure the nameservice by editing hdfs-site.xml (add the following; the duplicated dfs.nameservices entry has been removed, dfs.permissions.enable corrected to dfs.permissions.enabled, and the SecondaryNameNode address dropped, since an HA cluster does not run a SecondaryNameNode):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Nameservice (logical cluster name) -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- NameNode data directory -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.tmp.dir}/name</value>
    </property>
    <!-- DataNode data directory -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.tmp.dir}/data</value>
    </property>
    <!-- JournalNode data directory -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>${hadoop.tmp.dir}/jn</value>
    </property>
    <!-- NameNodes in the cluster -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>linux101,linux102,linux103</value>
    </property>
    <!-- NameNode RPC addresses -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.linux101</name>
        <value>linux101:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.linux102</name>
        <value>linux102:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.linux103</name>
        <value>linux103:8020</value>
    </property>
    <!-- NameNode HTTP addresses -->
    <property>
        <name>dfs.namenode.http-address.mycluster.linux101</name>
        <value>linux101:9870</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.linux102</name>
        <value>linux102:9870</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.linux103</name>
        <value>linux103:9870</value>
    </property>
    <!-- Where NameNode metadata is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://linux101:8485;linux102:8485;linux103:8485/mycluster</value>
    </property>
    <!-- Proxy provider the client uses to find the active NameNode -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing: only one NameNode may serve clients at a time -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <!-- sshfence requires passwordless SSH -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/longlong/.ssh/id_rsa</value>
    </property>
    <!-- Disable permission checking -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <!-- Enable automatic failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>
```
- Configure core-site.xml:

```xml
<configuration>
    <!-- Default filesystem: the HA nameservice -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- ZooKeeper quorum for automatic failover -->
    <!-- (dfs.journalnode.edits.dir is already set in hdfs-site.xml above) -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>linux101:2181,linux102:2181,linux103:2181</value>
    </property>
    <!-- Hadoop data storage directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.1.3/data</value>
    </property>
    <!-- Static user for the HDFS web UI -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>longlong</value>
    </property>
    <!-- Hosts from which the longlong (superuser) proxy may connect -->
    <property>
        <name>hadoop.proxyuser.longlong.hosts</name>
        <value>*</value>
    </property>
    <!-- Groups that longlong (superuser) may impersonate -->
    <property>
        <name>hadoop.proxyuser.longlong.groups</name>
        <value>*</value>
    </property>
    <!-- Users that longlong (superuser) may impersonate -->
    <property>
        <name>hadoop.proxyuser.longlong.users</name>
        <value>*</value>
    </property>
</configuration>
```
- Configure hadoop-env.sh:

```bash
[longlong@linux101 hadoop]$ vim hadoop-env.sh
export JAVA_HOME=/opt/module/java-xxx
export HDFS_NAMENODE_USER=longlong
export HDFS_DATANODE_USER=longlong
export HDFS_ZKFC_USER=longlong
export HDFS_JOURNALNODE_USER=longlong
export YARN_RESOURCEMANAGER_USER=longlong
export YARN_NODEMANAGER_USER=longlong
```
- Distribute the configuration files to linux102 and linux103 (the distribution sketch from 3.2 works here too).
- Cluster startup test:

```bash
# 0. The ZooKeeper cluster must be started first
# 1. Start a JournalNode on each node
[longlong@linux101 ~]$ hdfs --daemon start journalnode
[longlong@linux102 ~]$ hdfs --daemon start journalnode
[longlong@linux103 ~]$ hdfs --daemon start journalnode
# 2. On first startup, format nn1 and start it
[longlong@linux101 ~]$ hdfs namenode -format
[longlong@linux101 ~]$ hdfs --daemon start namenode
# 3. Sync nn1's metadata to nn2 and nn3
[longlong@linux102 ~]$ hdfs namenode -bootstrapStandby
[longlong@linux103 ~]$ hdfs namenode -bootstrapStandby
# 4. Start nn2 and nn3
[longlong@linux102 ~]$ hdfs --daemon start namenode
[longlong@linux103 ~]$ hdfs --daemon start namenode
# 5. Stop HDFS
[longlong@linux101 module]$ stop-dfs.sh
# 6. Initialize the HA state in ZooKeeper
[longlong@linux101 ~]$ hdfs zkfc -formatZK
# 7. Start the cluster
[longlong@linux101 module]$ start-dfs.sh
```
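Once everything is up, you can check which NameNode is active; the service IDs below match dfs.ha.namenodes.mycluster from the configuration above:

```bash
# Each call prints "active" or "standby"
hdfs haadmin -getServiceState linux101
hdfs haadmin -getServiceState linux102
hdfs haadmin -getServiceState linux103
```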
4. MySQL Installation
- Remove the MariaDB packages that ship with Linux:

```bash
[longlong@linux101 mysql-8.0.18]$ sudo yum -y remove mariadb*
```
- Upload the mysql-8.0.18 RPM bundle and extract it:

```bash
[longlong@linux101 mysql-8.0.18]$ tar -xf mysql-8.0.18-1.el7.x86_64.rpm-bundle.tar
```
- Install the packages:

```bash
# 1.
[longlong@linux101 mysql-8.0.18]$ sudo rpm -ivh mysql-community-common-8.0.18-1.el7.x86_64.rpm
# 2.
[longlong@linux101 mysql-8.0.18]$ sudo rpm -ivh mysql-community-libs-8.0.18-1.el7.x86_64.rpm
# 3.
[longlong@linux101 mysql-8.0.18]$ sudo rpm -ivh mysql-community-client-8.0.18-1.el7.x86_64.rpm
# 4.
[longlong@linux101 mysql-8.0.18]$ sudo yum -y install libaio
# 5.
[longlong@linux101 mysql-8.0.18]$ sudo rpm -ivh mysql-community-server-8.0.18-1.el7.x86_64.rpm
```
- Start MySQL, change the root password, and enable remote login:

```bash
# 1. Start the mysqld service
[longlong@linux101 mysql-8.0.18]$ sudo systemctl start mysqld
# 2. Find the temporary root password in the log
[longlong@linux101 mysql-8.0.18]$ sudo cat /var/log/mysqld.log | grep password
# 3. Log in
[longlong@linux101 mysql-8.0.18]$ mysql -u root -p
# 4. Change the password
# 4.1 Relax the password policy
set global validate_password.policy=low;
set global validate_password.length=11;
# 4.2 Set the password and disable expiry
ALTER USER 'root'@'localhost' IDENTIFIED BY 'SJL@123456#' PASSWORD EXPIRE NEVER;
# 4.3 Switch the auth plugin to mysql_native_password
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'SJL@123456#';
# 5. Reload privileges
FLUSH PRIVILEGES;
# 6. Enable remote login for root
use mysql;
update user set host ='%' where user = 'root';
commit;
exit;
```
5. Hive Installation and Configuration
5.1 Installing Hive
- Upload and extract Hive (hive-3.1.2)
- Configure the environment variables:

```bash
## HIVE_HOME
export HIVE_HOME=/opt/module/hive-3.1.2
export PATH=$HIVE_HOME/bin:$PATH
```
- Resolve the jar conflict (Hadoop ships its own SLF4J binding):

```bash
[longlong@linux101 hive-3.1.2]$ mv $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.jar $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.bak
```
5.2 Storing Hive Metadata in MySQL
- Copy the MySQL JDBC driver into Hive's lib directory (sketch below).
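A sketch of the copy step, assuming a Connector/J jar matching the installed MySQL 8.0.18 (the exact jar name and location depend on the version you downloaded):

```bash
# Hypothetical jar name and path; adjust to your download
cp /opt/software/mysql-connector-java-8.0.18.jar $HIVE_HOME/lib/
```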
- Configure the metastore connection to MySQL in hive-site.xml (note that `&` must be escaped as `&amp;` inside XML):

```bash
[longlong@linux101 conf]$ vim hive-site.xml
```

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- JDBC connection URL -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://linux101:3306/metastore?useUnicode=true&amp;serverTimezone=GMT&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
    </property>
    <!-- JDBC driver -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
    </property>
    <!-- JDBC username -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <!-- JDBC password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>SJL@123456#</value>
    </property>
    <!-- Hive's default working directory on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <!-- Disable metastore schema verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <!-- Metastore event notification API auth -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <!-- Print column headers and the current database in the CLI -->
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
    <!-- HiveServer2 bind host -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>linux101</value>
    </property>
    <!-- HiveServer2 port -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <!-- Metastore URI -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://linux101:9083</value>
    </property>
</configuration>
```
5.3 Starting Hive
- Initialize the metastore database
- Log in to MySQL and create the metastore database (one-liner below).
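A one-liner for the step above; the database name must match the JDBC URL in hive-site.xml:

```bash
mysql -u root -p -e "CREATE DATABASE metastore;"
```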
- Initialize the Hive metastore schema:

```bash
[longlong@linux101 conf]$ schematool -initSchema -dbType mysql -verbose
```
- Start Hive
- Start the Hadoop cluster first
- Start the metastore service, then start Hive (in a second terminal, since the metastore runs in the foreground):

```bash
[longlong@linux101 conf]$ hive --service metastore
2022-03-05 15:32:23: Starting Hive Metastore Server
[longlong@linux101 conf]$ hive
```
- Access Hive over JDBC
- Start the metastore:

```bash
[longlong@linux101 conf]$ hive --service metastore
```
- Start the HiveServer2 service:

```bash
[longlong@linux101 conf]$ hive --service hiveserver2   # or simply: hiveserver2
```
- Start the Beeline client:

```bash
[longlong@linux101 ~]$ beeline -u jdbc:hive2://linux101:10000 -n longlong
```
6. Kafka Installation and Configuration
6.1 Installing Kafka
- Upload and extract the package
- Create a logs directory under the Kafka home (one-liner below).
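A one-liner for the step above, matching the log.dirs path configured next:

```bash
mkdir /opt/module/kafka_2.11-2.4.1/logs
```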
- Edit the configuration file:

```bash
[longlong@linux101 config]$ vim server.properties
# Globally unique broker id; must differ on every node
broker.id=0
log.dirs=/opt/module/kafka_2.11-2.4.1/logs
zookeeper.connect=linux101:2181,linux102:2181,linux103:2181
```
- Configure the environment variables:

```bash
## KAFKA_HOME
export KAFKA_HOME=/opt/module/kafka_2.11-2.4.1
export PATH=$KAFKA_HOME/bin:$PATH
```
- Distribute Kafka to the other machines
- Change broker.id on linux102 and linux103:

```
linux102 => broker.id=1
linux103 => broker.id=2
```
6.2 Kafka Startup Test

```bash
# 1. Start the ZooKeeper cluster first
# 2. Then start Kafka on every broker
#    (run from $KAFKA_HOME so the relative path resolves)
kafka-server-start.sh -daemon config/server.properties
```
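Optionally, a quick smoke test once all three brokers are up, assuming they listen on the default port 9092:

```bash
# Create a topic replicated across all three brokers
kafka-topics.sh --bootstrap-server linux101:9092 --create --topic test --partitions 3 --replication-factor 3
# Produce a few messages interactively (this Kafka release uses --broker-list for the console producer)
kafka-console-producer.sh --broker-list linux101:9092 --topic test
# In another terminal, consume them back
kafka-console-consumer.sh --bootstrap-server linux101:9092 --topic test --from-beginning
```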
7. HBase Installation and Configuration
7.1 Installing HBase
- Upload and extract hbase-2.4.10
- Configure the environment variables:

```bash
## HBASE_HOME
export HBASE_HOME=/opt/module/hbase-2.4.10
export PATH=$HBASE_HOME/bin:$PATH
```
7.2 Editing the Configuration
- Edit conf/regionservers:

```bash
[longlong@linux101 hbase-2.4.10]$ vim conf/regionservers
linux101
linux102
linux103
```
- Create a file named backup-masters in the conf directory containing the hostname linux102:

```bash
[longlong@linux101 hbase-2.4.10]$ vim conf/backup-masters
linux102
```
- Edit conf/hbase-site.xml:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- HBase root directory on HDFS -->
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://linux101/hbase</value>
    </property>
    <!-- Run in fully distributed mode -->
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- External ZooKeeper quorum -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>linux101,linux102,linux103</value>
    </property>
    <property>
        <name>hbase.master.port</name>
        <value>16000</value>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>./tmp</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/longlong/zookeeper-hbase</value>
    </property>
    <property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
    </property>
</configuration>
```
- Append the following to hbase-env.sh:

```bash
export JAVA_HOME=/opt/module/jdk1.8.0_212
export HBASE_MANAGES_ZK=false
```
- Copy hdfs-site.xml into the conf directory:

```bash
[longlong@linux101 conf]$ cp /opt/module/hadoop-3.1.3/etc/hadoop/hdfs-site.xml /opt/module/hbase-2.4.10/conf/
```
- Distribute the configuration
7.3 Testing HBase
- Start ZooKeeper
- Start Hadoop
- Start HBase:

```bash
start-hbase.sh
```
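A quick, non-interactive way to confirm the cluster is healthy is to pipe a command into the HBase shell; the master web UI should also respond (port 16010 by default in HBase 2.x):

```bash
# 'status' prints the number of live/dead region servers
echo "status" | hbase shell
```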
- HBase web UI