Centos7.5+CDH 6.2搭建大数据平台
1.CDH介绍
目前Hadoop比较流行的主要有2个版本,Apache和Cloudera版本。
- Apache Hadoop:社区人员比较多,更新频率比较快,但是稳定性比较差,安装配置繁琐,实际使用者少。
- Cloudera Hadoop(CDH):Cloudera公司的发行版本,基于Apache Hadoop的二次开发,优化了组件兼容和交互接口、简化安装配置、提供界面统一管理程序。
2.Cloudera Manager 介绍
Cloudera Manager 是用于管理cdh集群的端到端应用程序,统一管理和安装。CDH除了可以通过cm安装也可以通过yum,tar,rpm安装。主要由如下几部分组成:
-
服务端/Server:
Cloudera Manager 的核心。主要用于管理 web server 和应用逻辑。它用于安装软件,配置,开始和停止服务,以及管理服务运行的集群。 -
代理/agent:
安装在每台主机上。它负责启动和停止进程,部署配置,触发安装和监控主机。 -
数据库/Database:
存储配置和监控信息。通常可以在一个或多个数据库服务器上运行的多个逻辑数据库。例如,所述的 Cloudera 管理器服务和监视,后台程序使用不同的逻辑数据库。
Cloudera Repository:由cloudera manager 提供的软件分发库。 -
客户端/Clients:
提供了一个与 Server 交互的接口。
3.环境准备
3.1.节点准备(四个节点)
主机名 | IP | CM管理软件 |
---|---|---|
nn01 | 192.168.18.110 | Cloudera Manager Server&Agent ,MySQL |
dn01 | 192.168.18.111 | Cloudera Manager Agent |
dn02 | 192.168.18.112 | Cloudera Manager Agent |
dn03 | 192.168.18.113 | Cloudera Manager Agent |
3.2.配置主机名和hosts解析(所有节点)
编辑/etc/hostname,修改主机名,并使用命令hostname使其立刻生效。编辑文件/etc/hosts,增加如下内容。
192.168.18.110 nn01.yunlu.cn nn01
192.168.18.111 dn01.yunlu.cn dn01
192.168.18.112 dn02.yunlu.cn dn02
192.168.18.113 dn03.yunlu.cn dn03
3.3.关闭防火墙
# systemctl stop firewalld.service && systemctl disable firewalld.service
3.4.关闭SELinux
# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
# setenforce 0
3.5.配置时间同步
chrony既可作时间服务器服务端,也可作客户端。chrony性能比ntp要好很多,且chrony配置简单、管理方便。 但是此次我们采用定时任务同步网络时间的方法。
- 添加定时任务
# echo "$((RANDOM%60)) $((RANDOM%24)) * * * /usr/sbin/ntpdate time1.aliyun.com" >> /var/spool/cron/root
3.6.禁用透明大页面压缩,CDH配置需要
# echo never > /sys/kernel/mm/transparent_hugepage/defrag
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
- 并将上面的两条命令写入开机自启动/etc/rc.local
tee -a /etc/rc.local <<-'EOF'
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
EOF
3.7.优化交换分区
# echo "vm.swappiness = 10" >> /etc/sysctl.conf
# sysctl -p
3.8.配置SSH免密登录
- 所有节点执行如下命令(四次回车):
# ssh-keygen -t rsa
- 用拷贝的方法分发秘钥,所有节点执行如下命令:
# ssh-copy-id [nn01,dn01-dn03]
总共四次拷贝,每次拷贝按提示输入
yes
和相应节点的密码。
4.安装 CM 和 CDH
4.1.配置 Cloudera Manager 仓库(所有节点)
# wget https://archive.cloudera.com/cm6/6.2.0/redhat7/yum/cloudera-manager.repo -P /etc/yum.repos.d/
# rpm --import https://archive.cloudera.com/cm6/6.2.0/redhat7/yum/RPM-GPG-KEY-cloudera
使用在线安装会比较慢,建议先把需要的rpm下载下来,进行离线安装或者建私有仓库,涉及下面三个软件包:
cloudera-manager-agent-6.2.0-968826.el7.x86_64.rpm
cloudera-manager-server-6.2.0-968826.el7.x86_64.rpm
cloudera-manager-daemons-6.2.0-968826.el7.x86_64.rpm
4.2.配置 JDK (所有节点)
# rpm -ivh jdk-8u202-linux-x64.rpm
4.3.安装 CM Server 和 Agent
建议离线安装,把rpm包下载到服务器上面,传到其他节点一份,再本地安装,速度会快很多。
- nn01:
# yum localinstall cloudera-manager-daemons-6.2.0-968826.el7.x86_64.rpm -y
# yum localinstall cloudera-manager-agent-6.2.0-968826.el7.x86_64.rpm -y
# yum localinstall cloudera-manager-server-6.2.0-968826.el7.x86_64.rpm -y
- dn[01-03]:
# yum localinstall cloudera-manager-daemons-6.2.0-968826.el7.x86_64.rpm -y
# yum localinstall cloudera-manager-agent-6.2.0-968826.el7.x86_64.rpm -y
4.4.安装MySQL数据库(在nn01节点)
此次安装 mysql 是按照官网教程安装的,链接地址:
https://www.cloudera.com/documentation/enterprise/6/6.0/topics/cm_ig_mysql.html#cmig_topic_5_5
# wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
# rpm -ivh mysql-community-release-el7-5.noarch.rpm
# yum install mysql-server -y
# systemctl start mysqld
- 查看状态
- 可选步骤。根据官方推荐的配置,编辑文件
/etc/my.cnf
,修改成如下内容:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
symbolic-links = 0
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M
#log_bin should be on a disk with enough free space.
#Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your
#system and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log
#In later versions of MySQL, if you enable the binary log and do not set
#a server_id, MySQL will not start. The server_id must be unique within
#the replicating group.
server_id=1
binlog_format = mixed
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
sql_mode=STRICT_ALL_TABLES
以上配置的含义,请参考本节开头的文档。
- 设置MySQL的登录密码,按照相关提示操作即可
# /usr/bin/mysql_secure_installation
- 将mysql 加到 开机启动中
# systemctl enable mysqld
4.5.安装 MySQL JDBC 驱动(所有节点)
用于各节点连接数据库。
# wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
# tar xf mysql-connector-java-5.1.46.tar.gz
# mkdir -p /usr/share/java/
# cd mysql-connector-java-5.1.46
# cp mysql-connector-java-5.1.46-bin.jar /usr/share/java/mysql-connector-java.jar
4.6.为 Cloudera 各软件创建数据库(在nn01节点)
Service | Database | User |
---|---|---|
Cloudera Manager Server | scm | scm |
Activity Monitor | amon | amon |
Reports Manager | rman | rman |
Sentry Server | sentry | sentry |
Cloudera Navigator Audit Server | nav | nav |
Cloudera Navigator Metadata Server | navms | navms |
Hive Metastore Server | hive | hive |
Hue | hue | hue |
Oozie | oozie | oozie |
- 将如下内容,写入到
cdh.sql
文件中
CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY 'scm';
CREATE DATABASE amon DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY 'amon';
CREATE DATABASE rman DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON rman.* TO 'rman'@'%' IDENTIFIED BY 'rman';
CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON hue.* TO 'hue'@'%' IDENTIFIED BY 'hue';
CREATE DATABASE hive DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
CREATE DATABASE sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON sentry.* TO 'sentry'@'%' IDENTIFIED BY 'sentry';
CREATE DATABASE nav DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON nav.* TO 'nav'@'%' IDENTIFIED BY 'nav';
CREATE DATABASE navms DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON navms.* TO 'navms'@'%' IDENTIFIED BY 'navms';
CREATE DATABASE oozie DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY 'oozie';
- 执行sql文件
# mysql -uroot -p<ROOT_PASSWORD> < ./cdh.sql
4.7.设置 Cloudera Manager 数据库
# /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm
接着,输入scm数据库密码
4.8.安装 CDH(在nn01节点)
CM安装成功之后,接下来我们就可以通过CM安装CDH的方式构建企业大数据平台。所以首先需要把CDH的parcels包下载到CM主服务器上。同样的,我们为了加速我们的安装,我们可以把需要下载的软件包提前下载下来,也可以创建CDH私有仓库。
- 下载CDH的软件包 parcels
# cd /opt/cloudera/parcel-repo
# wget https://archive.cloudera.com/cdh6/6.2.0/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373-el7.parcel
# wget https://archive.cloudera.com/cdh6/6.2.0/parcels/manifest.json
- 生成sha文件
# sha1sum CDH-6.2.0-1.cdh6.2.0.p0.967373-el7.parcel | awk '{ print $1 }' > CDH-6.2.0-1.cdh6.2.0.p0.967373-el7.parcel.sha
- 修改属主属组
# chown -R cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/*
4.9.启动 Cloudera Manager Server(在nn01节点)
# systemctl start cloudera-scm-server
如果启动中有什么问题,可以查看日志。
# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log
在最后显示的日志中,有显示启动监听的端口。
Started ServerConnector@da518cb{SSL,[ssl, http/1.1]}{0.0.0.0:7183}
Started ServerConnector@a77165b{HTTP/1.1,[http/1.1]}{0.0.0.0:7180}
5.初始化 Cloudera Manager
稍等下,浏览器打开http://nn01:7180,用户名和密码默认都是admin。
- 然后按需,继续下一步操作即可。
5.1.CDH集群安装
- 按照提示操作即可,一般选默认就行。
5.2.集群设置
- 数据库设置
- 其它按照提示操作即可,一般选默认就行。