# Basic Information

## System Initialization

Create virtual machines using CentOS 7 as the OS image.

## Cluster Configuration
| Linux server | hadoop01 (192.168.3.201) | hadoop02 (192.168.3.202) | hadoop03 (192.168.3.203) |
|---|---|---|---|
| hadoop 3.2.2 | namenode:9870 | | secondarynamenode:9868 |
| hadoop 3.2.2 | datanode | datanode | datanode |
| hadoop 3.2.2 | | resourcemanager | |
| hadoop 3.2.2 | jobhistory:19888 | | |
| mysql 5.7 | mysql | | |
| hive 3.1.2 | hive:10000 | | |
| zookeeper 3.7.0 | √ | √ | √ |
## Server Directory Layout

| Linux path | Purpose |
|---|---|
| /opt/soft | installation packages |
| /opt/apps | installation directory |
| /opt/data | data directory |
| /opt/logs | log directory |
| ~/bin | automation scripts |
## Web UI Addresses

| Component | URL |
|---|---|
| HDFS (NameNode) | http://hadoop01:9870/dfshealth.html#tab-overview |
| YARN | http://hadoop02:8088/cluster |
| JobHistory server | http://hadoop01:19888/jobhistory |
| HBase | http://hadoop01:16010/master-status |
# Component Installation

## Relational Database

### MySQL
- Locate `mysql-5.7.36-1.el7.x86_64.rpm-bundle.tar`, extract it, and install the RPMs in dependency order:

  ```shell
  mkdir mysql
  tar -xvf mysql-5.7.36-1.el7.x86_64.rpm-bundle.tar -C mysql
  rpm -ivh mysql-community-common-5.7.36-1.el7.x86_64.rpm
  rpm -ivh mysql-community-libs-5.7.36-1.el7.x86_64.rpm
  rpm -ivh mysql-community-libs-compat-5.7.36-1.el7.x86_64.rpm
  rpm -ivh mysql-community-client-5.7.36-1.el7.x86_64.rpm
  rpm -ivh mysql-community-server-5.7.36-1.el7.x86_64.rpm
  ```
- Initialize the database and start the service:

  ```shell
  mysqld --initialize --user=root
  systemctl start mysqld
  ```
- Retrieve the generated root password, then change it and enable remote access:

  ```shell
  grep 'temporary password' /var/log/mysqld.log
  ```

  ```sql
  -- change the default password
  alter user 'root'@'localhost' identified by 'Admin@123';
  -- grant remote access
  grant all privileges on *.* to 'root'@'%' identified by 'Admin@123';
  -- reload privileges
  flush privileges;
  ```
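When scripting this step, the temporary password can be extracted directly, since it is the last field of the log line. A sketch with a hypothetical log line standing in for the real `grep` output from `/var/log/mysqld.log`:

```shell
# Stand-in for: grep 'temporary password' /var/log/mysqld.log
# (the log line below is an illustrative example, not real output)
line='2021-11-01T00:00:00.000000Z 1 [Note] A temporary password is generated for root@localhost: Xk2#mPq9rTc'
# the password is the last whitespace-separated field
pass=$(echo "$line" | awk '{print $NF}')
echo "$pass"
```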
## Big Data Components

### Hadoop
- `core-site.xml`:

  ```xml
  <?xml version="1.0" encoding="UTF-8"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
      <!-- Hadoop data storage directory -->
      <property>
          <name>hadoop.tmp.dir</name>
          <value>/opt/data/hadoopdata</value>
      </property>
      <!-- NameNode address -->
      <property>
          <name>fs.defaultFS</name>
          <value>hdfs://hadoop01:9000</value>
      </property>
      <!-- static user for the web UI -->
      <property>
          <name>hadoop.http.staticuser.user</name>
          <value>root</value>
      </property>
      <!-- disable permission checks -->
      <property>
          <name>dfs.permissions.enabled</name>
          <value>false</value>
      </property>
      <!-- proxy (superuser) settings -->
      <property>
          <name>hadoop.proxyuser.root.hosts</name>
          <value>*</value>
      </property>
      <property>
          <name>hadoop.proxyuser.root.groups</name>
          <value>*</value>
      </property>
  </configuration>
  ```
- `hdfs-site.xml`:

  ```xml
  <?xml version="1.0" encoding="UTF-8"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
      <property>
          <name>dfs.replication</name>
          <value>3</value>
      </property>
      <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:/opt/data/hadoopdata/name</value>
      </property>
      <property>
          <name>dfs.datanode.data.dir</name>
          <value>file:/opt/data/hadoopdata/data</value>
      </property>
      <property>
          <name>dfs.namenode.http-address</name>
          <value>hadoop01:9870</value>
          <description>NameNode web UI address</description>
      </property>
      <property>
          <name>dfs.namenode.secondary.http-address</name>
          <value>hadoop03:9868</value>
          <description>SecondaryNameNode web UI address</description>
      </property>
  </configuration>
  ```
- `mapred-site.xml`:

  ```xml
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
      <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
      </property>
      <property>
          <name>mapreduce.jobhistory.address</name>
          <value>hadoop01:10020</value>
      </property>
      <property>
          <name>mapreduce.jobhistory.webapp.address</name>
          <value>hadoop01:19888</value>
      </property>
  </configuration>
  ```
- `yarn-site.xml`:

  ```xml
  <?xml version="1.0"?>
  <configuration>
      <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
      </property>
      <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>hadoop02</value>
      </property>
      <property>
          <name>yarn.nodemanager.env-whitelist</name>
          <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
      </property>
      <property>
          <name>yarn.scheduler.minimum-allocation-mb</name>
          <value>512</value>
      </property>
      <property>
          <name>yarn.scheduler.maximum-allocation-mb</name>
          <value>4096</value>
      </property>
      <property>
          <name>yarn.nodemanager.resource.memory-mb</name>
          <value>4096</value>
      </property>
      <property>
          <name>yarn.log-aggregation-enable</name>
          <value>true</value>
      </property>
      <property>
          <name>yarn.log-aggregation.retain-seconds</name>
          <value>604800</value>
      </property>
      <!-- Flink settings -->
      <!-- max restart attempts for a failed master (JobManager) -->
      <property>
          <name>yarn.resourcemanager.am.max-attempts</name>
          <value>4</value>
      </property>
      <!-- disable YARN memory checks: by default (true) a monitor thread
           kills any container that exceeds its allocated memory -->
      <property>
          <name>yarn.nodemanager.pmem-check-enabled</name>
          <value>false</value>
      </property>
      <!-- Flink on YARN easily exceeds the virtual-memory limit and YARN
           would kill the job, so turn this check off as well -->
      <property>
          <name>yarn.nodemanager.vmem-check-enabled</name>
          <value>false</value>
      </property>
  </configuration>
  ```
- `workers` (DataNode hostnames, one per line):

  ```
  hadoop01
  hadoop02
  hadoop03
  ```
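The workers file can also be written non-interactively; a sketch, assuming the current directory is `$HADOOP_HOME/etc/hadoop`:

```shell
# Write the DataNode host list, one hostname per line.
cat > workers <<'EOF'
hadoop01
hadoop02
hadoop03
EOF
cat workers
```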
### Hive
- Upload the installation package and extract it.
- Add the install path to `/etc/profile`:

  ```shell
  vim /etc/profile
  # HIVE_HOME
  export HIVE_HOME=/opt/apps/hive-3.1.2
  export PATH=$PATH:$HIVE_HOME/bin
  ```
- Rename the logging jar in `lib/` so it does not conflict with Hadoop's SLF4J binding at startup:

  ```shell
  mv /opt/apps/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar \
     /opt/apps/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar.bak
  ```
- Create Hive's metastore database in MySQL.
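The database name must match the one in the JDBC URL of `hive-site.xml` (`hive` here). A minimal sketch, run in the MySQL client on hadoop01; the character set is an assumption:

```sql
-- metastore database referenced by hive-site.xml's ConnectionURL
create database hive default character set utf8;
```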
- Create `conf/hive-site.xml` with the following contents:

  ```xml
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
      <!-- JDBC connection URL -->
      <property>
          <name>javax.jdo.option.ConnectionURL</name>
          <value>jdbc:mysql://hadoop01:3306/hive?useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
      </property>
      <!-- JDBC driver -->
      <property>
          <name>javax.jdo.option.ConnectionDriverName</name>
          <value>com.mysql.jdbc.Driver</value>
      </property>
      <!-- JDBC username -->
      <property>
          <name>javax.jdo.option.ConnectionUserName</name>
          <value>root</value>
      </property>
      <!-- JDBC password -->
      <property>
          <name>javax.jdo.option.ConnectionPassword</name>
          <value>Admin@123</value>
      </property>
      <!-- Hive's default working directory on HDFS -->
      <property>
          <name>hive.metastore.warehouse.dir</name>
          <value>/user/hive/warehouse</value>
      </property>
      <!-- metastore schema verification -->
      <property>
          <name>hive.metastore.schema.verification</name>
          <value>false</value>
      </property>
      <property>
          <name>datanucleus.schema.autoCreateAll</name>
          <value>true</value>
      </property>
      <!-- metastore connection address -->
      <property>
          <name>hive.metastore.uris</name>
          <value>thrift://hadoop01:9083</value>
      </property>
      <!-- hiveserver2 bind host -->
      <property>
          <name>hive.server2.thrift.bind.host</name>
          <value>hadoop01</value>
      </property>
      <!-- hiveserver2 port -->
      <property>
          <name>hive.server2.thrift.port</name>
          <value>10000</value>
      </property>
      <!-- print column names -->
      <property>
          <name>hive.cli.print.header</name>
          <value>true</value>
      </property>
      <!-- print the current database name -->
      <property>
          <name>hive.cli.print.current.db</name>
          <value>true</value>
      </property>
      <property>
          <name>hive.exec.post.hooks</name>
          <value>org.apache.atlas.hive.hook.HiveHook</value>
      </property>
      <property>
          <name>hive.metastore.event.db.notification.api.auth</name>
          <value>false</value>
      </property>
      <property>
          <name>hive.server2.enable.doAs</name>
          <value>false</value>
      </property>
  </configuration>
  ```
- Initialize the Hive metastore schema:

  ```shell
  schematool -initSchema -dbType mysql -verbose
  ```
- Chinese comments are not displayed correctly by default, so change the character set of the relevant metastore columns:

  ```sql
  -- column comments
  alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
  -- table comments
  alter table TABLE_PARAMS modify column PARAM_VALUE varchar(20000) character set utf8;
  -- partition parameters
  alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(20000) character set utf8;
  alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(20000) character set utf8;
  -- index comments
  alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
  -- view definitions
  alter table TBLS modify column VIEW_EXPANDED_TEXT mediumtext character set utf8;
  alter table TBLS modify column VIEW_ORIGINAL_TEXT mediumtext character set utf8;
  ```
### ZooKeeper
- Upload the installation package and extract it:

  ```shell
  tar -zxvf apache-zookeeper-3.7.0-bin.tar.gz -C /opt/apps/
  ```
- Create a `zkData` folder and, inside it, a `myid` file containing `1`. This file identifies each server in the ensemble:

  ```shell
  mkdir zkData
  echo 1 > zkData/myid
  ```
- In the `conf` directory, rename `zoo_sample.cfg` to `zoo.cfg` and edit it:

  ```
  # data directory
  dataDir=/opt/apps/zookeeper-3.7.0/zkData
  # ensemble addresses
  server.1=hadoop01:2888:3888
  server.2=hadoop02:2888:3888
  server.3=hadoop03:2888:3888
  ```
- Distribute the ZooKeeper installation directory to the cluster.
- On the other nodes, change the id in `zkData/myid` to 2 and 3 respectively.
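Instead of editing each myid by hand, the id can be derived from the hadoopNN hostname convention; a sketch showing only the mapping (the ssh/write step to each node is omitted):

```shell
# Map hadoop01..hadoop03 to myid 1..3 by stripping the hostname prefix.
for host in hadoop01 hadoop02 hadoop03; do
  id=$((10#${host#hadoop}))   # force base 10 so a leading zero does not break
  echo "$host myid=$id"
done
```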
- Start the ZooKeeper ensemble, one node at a time (single quotes so `$ZOOKEEPER_HOME` is expanded on the remote host, after its profile is sourced):

  ```shell
  #!/bin/bash
  ssh hadoop01 'source /etc/profile && $ZOOKEEPER_HOME/bin/zkServer.sh start'
  ssh hadoop02 'source /etc/profile && $ZOOKEEPER_HOME/bin/zkServer.sh start'
  ssh hadoop03 'source /etc/profile && $ZOOKEEPER_HOME/bin/zkServer.sh start'
  ```
### HBase
- Upload the installation package and extract it:

  ```shell
  tar -zxvf hbase-2.4.9-bin.tar.gz -C /opt/apps/
  ```
- Edit `hbase-env.sh`:

  ```shell
  export HBASE_MANAGES_ZK=false  # use the external ZooKeeper ensemble
  ```
- Edit `hbase-site.xml`:

  ```xml
  <?xml version="1.0"?>
  <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
  <configuration>
      <!-- root directory for HBase data on HDFS; must match fs.defaultFS -->
      <property>
          <name>hbase.rootdir</name>
          <value>hdfs://hadoop01:9000/HBase</value>
      </property>
      <!-- true = distributed deployment -->
      <property>
          <name>hbase.cluster.distributed</name>
          <value>true</value>
      </property>
      <!-- ZooKeeper ensemble; separate hosts with commas -->
      <property>
          <name>hbase.zookeeper.quorum</name>
          <value>hadoop01,hadoop02,hadoop03</value>
      </property>
      <!-- root znode for HBase data in ZooKeeper -->
      <property>
          <name>zookeeper.znode.parent</name>
          <value>/hbase</value>
      </property>
  </configuration>
  ```
- Edit the `regionservers` file:

  ```
  hadoop01
  hadoop02
  hadoop03
  ```
- Distribute the installation directory to the cluster.
- Configure the environment variables and apply them immediately.
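The profile entries can mirror the Hive setup above; a sketch, assuming the same /opt/apps layout (the `hbase-2.4.9` directory name is an assumption based on the tarball version):

```shell
# Append these to /etc/profile, then run `source /etc/profile` to apply.
export HBASE_HOME=/opt/apps/hbase-2.4.9
export PATH=$PATH:$HBASE_HOME/bin
# quick check that the bin directory landed on PATH
echo "$PATH" | tr ':' '\n' | grep -x "$HBASE_HOME/bin"
```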
- Start HBase:

  ```shell
  start-hbase.sh  # start the HBase cluster
  ```