5 Setting up Hadoop
5.1 Unpack the installation package and rename the extracted directory
Extract the Hadoop package into /opt on the master node:
tar -zxvf hadoop-3.1.3.tar.gz -C /opt
mv /opt/hadoop-3.1.3 /opt/hadoop
5.2 Configure environment variables
vi /etc/profile
export HADOOP_HOME=/opt/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$PATH
source /etc/profile
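A quick sanity check can be run at this point (a sketch, assuming the /opt/hadoop paths configured above): confirm the variables resolved and that the Hadoop bin directories made it onto PATH.

```shell
# Sketch: re-export the key variables as /etc/profile does, then verify them.
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
[ -n "$HADOOP_HOME" ] && echo "HADOOP_HOME=$HADOOP_HOME"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH contains \$HADOOP_HOME/bin" ;;
  *) echo "PATH is missing \$HADOOP_HOME/bin" ;;
esac
```

If `hadoop version` still reports "command not found" afterwards, the profile was likely not sourced in the current shell.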
5.3 Modify the configuration
5.3.1 Configure hadoop-env.sh
Enter the Hadoop configuration directory:
cd /opt/hadoop/etc/hadoop/
vi hadoop-env.sh
export JAVA_HOME=/opt/jdk1.8.0_251
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
Add the lines above, then save and exit.
5.3.2 Configure core-site.xml
Enter the Hadoop configuration directory:
cd /opt/hadoop/etc/hadoop/
vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/srv/data2/hadoop/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>openlookeng-gz-gh-0001:2181,openlookeng-gz-gh-0002:2181,openlookeng-gz-gh-0003:2181</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
After adding the content above, save the file. Replace the example hostnames with the actual hostnames of your nodes; the value of hadoop.tmp.dir can also be changed to suit your environment.
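Before distributing the file, it can help to confirm the property names this guide relies on are actually present. A minimal sketch (the heredoc below mirrors two of the properties above in abbreviated form; point `conf` at the real core-site.xml to check the live file):

```shell
# Sketch: grep a core-site.xml-style file for required property names.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property><name>hadoop.tmp.dir</name><value>/srv/data2/hadoop/tmp</value></property>
  <property><name>ha.zookeeper.quorum</name><value>openlookeng-gz-gh-0001:2181</value></property>
</configuration>
EOF
missing=0
for p in hadoop.tmp.dir ha.zookeeper.quorum; do
  grep -q "<name>$p</name>" "$conf" || { echo "missing: $p"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "required properties present"
rm -f "$conf"
```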
5.3.3 Configure hdfs-site.xml
vi hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<!-- Number of replicas kept for each file in the cluster -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- mycluster has two NameNodes, nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>openlookeng-gz-gh-0001:9000</value>
</property>
<!-- HTTP address of nn1; serves the web UI, which can also be used to browse and download files -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>openlookeng-gz-gh-0001:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>openlookeng-gz-gh-0002:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>openlookeng-gz-gh-0002:50070</value>
</property>
<!-- RPC address of nn3 (only takes effect if nn3 is added to dfs.ha.namenodes.mycluster above) -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn3</name>
<value>openlookeng-gz-gh-0003:9000</value>
</property>
<!-- HTTP address of nn3 (likewise unused with the nn1,nn2 list above) -->
<property>
<name>dfs.namenode.http-address.mycluster.nn3</name>
<value>openlookeng-gz-gh-0003:50070</value>
</property>
<!-- Shared location on the JournalNodes where the NameNode edit log is stored -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://openlookeng-gz-gh-0001:8485;openlookeng-gz-gh-0002:8485;openlookeng-gz-gh-0003:8485/mycluster</value>
</property>
<!-- Local disk path where each JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/srv/data2/hadoop/journal</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- The sshfence mechanism requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<!-- sshfence connection timeout: if the standby cannot reach the failed active NameNode within 30 seconds, fencing falls through and the standby becomes active anyway -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/srv/data1/hadoop,/srv/data2/hadoop,/srv/data3/hadoop,/srv/data4/hadoop,/srv/data5/hadoop,/srv/data6/hadoop,/srv/data7/hadoop,/srv/data8/hadoop,/srv/data9/hadoop</value>
</property>
</configuration>
Adjust the hostnames and paths above to match your environment.
5.3.4 Configure mapred-site.xml
vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
5.3.5 Configure yarn-site.xml
vi yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Cluster id for the ResourceManager HA pair -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- Logical ids of the ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostname of each ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>openlookeng-gz-gh-0002</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>openlookeng-gz-gh-0003</value>
</property>
<!-- ZooKeeper ensemble address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>openlookeng-gz-gh-0001:2181,openlookeng-gz-gh-0002:2181,openlookeng-gz-gh-0003:2181</value>
</property>
<!-- Reducers fetch map output through the shuffle service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>5</value>
</property>
</configuration>
Adjust the hostnames above to match your environment.
5.3.6 Configure slaves and workers
List the hostnames of all cluster nodes in the slaves and workers files. (Hadoop 3 reads the workers file; slaves is the legacy Hadoop 2 name, kept here for compatibility.)
openlookeng-gz-gh-0001
openlookeng-gz-gh-0002
openlookeng-gz-gh-0003
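The file above can also be generated from the host list rather than edited by hand. A sketch (hostnames are the example nodes used throughout this guide):

```shell
# Sketch: write one hostname per line into a temporary workers file.
workers=$(mktemp)
printf '%s\n' \
  openlookeng-gz-gh-0001 \
  openlookeng-gz-gh-0002 \
  openlookeng-gz-gh-0003 > "$workers"
# In production, install it as /opt/hadoop/etc/hadoop/workers (and slaves).
n=$(wc -l < "$workers")
echo "workers file lists $n nodes"
rm -f "$workers"
```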
5.4 Configure the other nodes
Copy the configured Hadoop directory from the master to the other nodes:
scp -r /opt/hadoop root@192.168.0.200:/opt
scp -r /opt/hadoop root@192.168.0.158:/opt
Configure the environment variables as in section 5.2.
On every node, create the hadoop.tmp.dir directory configured in core-site.xml:
mkdir -p /srv/data2/hadoop/tmp
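The directory can be created on all nodes in one pass. A sketch (dry run): it only prints the commands; drop the leading `echo` to execute them over the passwordless SSH already set up for this cluster.

```shell
# Sketch: print the mkdir command for each node (remove 'echo' to run it).
count=0
for host in openlookeng-gz-gh-0001 openlookeng-gz-gh-0002 openlookeng-gz-gh-0003; do
  echo ssh root@"$host" "mkdir -p /srv/data2/hadoop/tmp"
  count=$((count + 1))
done
echo "prepared $count nodes"
```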
5.5 Format the NameNode and the ZKFC service
1. Start the JournalNode service on every node:
cd /opt/hadoop/sbin
hdfs --daemon start journalnode
2. On the master node, format the NameNode:
hdfs namenode -format
3. Copy the metadata directory generated by formatting nn1 to the corresponding directory on nn2:
scp -r /srv/data2/hadoop/tmp/dfs root@192.168.0.200:/srv/data2/hadoop/tmp/
4. On the master node, format the ZKFC:
hdfs zkfc -formatZK
5.6 Start Hadoop
On the cluster master node, run the startup script to bring up the HDFS and YARN services:
start-all.sh
To stop the services, run stop-all.sh.
6 Setting up Hive
6.1 Install MySQL
6.1.1 Install MySQL and start the service
6.1.2 Change the root user's password and privileges
- Look up the initial password: cat /var/log/mysqld.log |grep password
- Log in to the mysql client with the initial password: mysql -u root -p (press Enter, then type the initial password)
- Change the root user's password and privileges:
alter user 'root'@'localhost' identified by 'openLooKeng@123';
use mysql;
update user set host='%' where user='root';
select host, user from user;
flush privileges;
grant all PRIVILEGES on *.* to 'root'@'%' WITH GRANT OPTION;
flush privileges;
6.2 Configure Hive
6.2.1 Unpack the installation package and rename the directory
Extract the Hive package into /opt on the master node:
tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt
cd /opt && mv apache-hive-3.1.2-bin hive
Then place the MySQL JDBC driver jar in Hive's lib directory.
6.2.2 Configure environment variables
vi /etc/profile
export HIVE_HOME=/opt/hive
export HIVE_CONF_DIR=/opt/hive/conf
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH
source /etc/profile
6.2.3 Modify the configuration files
Enter Hive's configuration directory:
cd /opt/hive/conf
Copy Hadoop's core-site.xml and hdfs-site.xml into this directory:
cp /opt/hadoop/etc/hadoop/core-site.xml .
cp /opt/hadoop/etc/hadoop/hdfs-site.xml .
Configure hive-site.xml:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.0.131:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;useSSL=false&amp;allowPublicKeyRetrieval=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>openLooKeng@123</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.0.37:9083,thrift://192.168.0.157:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value/>
<description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
</property>
<property>
<name>hive.server2.logging.operation.enabled</name>
<value>true</value>
<description>When true, HS2 will save operation logs and make them available for clients</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/tmp/operation_logs</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
<name>hive.execution.engine</name>
<value>mr</value>
</property>
<property>
<name>hive.exec.mode.local.auto</name>
<value>true</value>
</property>
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
</configuration>
Adjust the addresses, ports, and credentials above to match your environment.
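Note that hive-site.xml is XML, so the `&` separators in the JDBC connection URL must be escaped as `&amp;` or Hive will fail to parse the file. A small sketch that performs the escaping:

```shell
# Sketch: escape '&' as '&amp;' so the JDBC URL is valid inside XML.
raw='jdbc:mysql://192.168.0.131:3306/hive?createDatabaseIfNotExist=true&useUnicode=true&useSSL=false'
escaped=$(printf '%s' "$raw" | sed 's/&/\&amp;/g')
echo "$escaped"
```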
6.2.4 Configure the other nodes
Copy the configured Hive directory from the master to the other nodes:
scp -r /opt/hive root@192.168.0.200:/opt
scp -r /opt/hive root@192.168.0.158:/opt
Configure the environment variables as in section 6.2.2.
6.2.5 Initialize the database
1. Create the warehouse directories on HDFS and open up their permissions:
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /tmp/
hdfs dfs -chmod -R 777 /user/hive/warehouse
hdfs dfs -chmod -R 777 /tmp/
2. On the master node, initialize the metastore database:
schematool -initSchema -dbType mysql
6.3 Start Hive
Start the metastore service:
nohup hive --service metastore > metastore.log &
After starting the hiveserver2 service, you can connect with the beeline client:
nohup hive --service hiveserver2 > hiveserver2.log &
Connect with Beeline:
beeline -u jdbc:hive2://192.168.0.131:10000 -n root