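Preface
I wrote this guide while building my own big data cluster. I went through many installation tutorials online, only to hit all kinds of thorny problems and errors during the installation. After tracking down the cause of each error, I put together this detailed procedure, which actually works. The settings that are easy to get wrong are marked in red and deserve particular care. I hope it is a useful reference for fellow learners; experts, please go easy on me!
Enough talk, let's get to work!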
Chapter 1 Environment Setup
1.1. Preparation
1.1.1. Servers
Prepare at least three servers.
IP address | Hostname |
192.168.80.239 | n1 |
192.168.80.192 | n2 |
192.168.80.190 | n3 |
1.1.2. Operating system
(1) CentOS 7
http://isoredirect.centos.org/centos/7/isos/x86_64/CentOS-7-x86_64-DVD-1708.iso
(2) If you install CentOS 7 in a virtual machine, download VMware Workstation.
1.1.3. Oracle JDK 8u131
1.1.4. MySQL database
1.1.5. Database driver package
1.1.6. Cloudera big data packages
(1) Cloudera Manager 5.13.0
(2) CDH 5.13.0
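1.2. Installation and configuration
1.2.1. Install the operating system
(1) Installing from DVD: just click Next through the installer (all later steps are performed as the root user).
(2) Installing from a USB drive: some motherboards fail to recognize the drive's label, because the volume label written when the bootable drive was created exceeds the allowed length and is not displayed in full. In that case, point the installer manually at the USB drive's mount location.
(3) A virtual machine is more flexible: select CentOS as the guest operating system and load the image, and choose bridged mode when configuring the network.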
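1.2.2. Configure the operating system
(1) Set the hostname (n1 is the master NameNode; n2 and n3 are DataNodes) (as root) (all nodes)
#vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=n1
#source /etc/sysconfig/network
Restart the network service with service network restart for the change to take effect. On CentOS 7 the persistent hostname is stored in /etc/hostname, so if the name does not stick, set it with hostnamectl set-hostname n1.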
(2) Map IP addresses to hostnames (all nodes)
#vi /etc/hosts
192.168.80.239 n1
192.168.80.192 n2
192.168.80.190 n3
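Editing /etc/hosts by hand on three machines is easy to get slightly wrong, so the mapping can also be scripted. The following is only a sketch: the function name add_host_entry is hypothetical and not part of any tool used in this guide. It appends an entry only when the hostname is not already mapped, so it is safe to run more than once.

```shell
#!/usr/bin/env bash
# add_host_entry FILE IP NAME  (hypothetical helper, not a system command)
# Appends "IP NAME" to FILE unless NAME already appears in a hostname column,
# so re-running it on a node does not create duplicate lines.
add_host_entry() {
  local file=$1 ip=$2 name=$3
  # NAME must be preceded by whitespace and followed by whitespace or end of line.
  if ! grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
    printf '%s %s\n' "$ip" "$name" >> "$file"
  fi
}
```

As root on each node it would be called as add_host_entry /etc/hosts 192.168.80.239 n1, and likewise for n2 and n3.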
(3) Configure the network (all nodes)
#vi /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="a278b039-9450-40e3-bbfc-be31b1d35e21"
DEVICE="ens33"
ONBOOT="yes"
GATEWAY="192.168.80.1"
IPADDR="192.168.80.239"
NETMASK="255.255.255.0"
DNS1="8.8.8.8"
DNS2="8.8.4.4"
Note: not every machine uses the file "ifcfg-ens33". Look in /etc/sysconfig/network-scripts/ for the file whose name starts with ifcfg-ens.
Do not change the UUID.
(4) Restart the network and test connectivity (the three machines are configured almost identically, so only n1 is shown here) (all nodes)
#/etc/init.d/network restart
Verify:
#ping n1
#ping www.baidu.com
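To confirm that every node can reach every other node by name, the ping test can be wrapped in a loop. This is a minimal sketch assuming the Linux iputils ping (where -W is the reply timeout in seconds); check_nodes is a hypothetical name.

```shell
#!/usr/bin/env bash
# check_nodes HOST...  (hypothetical helper)
# Sends one ping to each host and reports the result.
# Returns non-zero if any host did not answer.
check_nodes() {
  local host failed=0
  for host in "$@"; do
    if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
      echo "$host reachable"
    else
      echo "$host UNREACHABLE"
      failed=1
    fi
  done
  return "$failed"
}
```

Running check_nodes n1 n2 n3 on each machine shows at a glance whether /etc/hosts and the network settings are consistent.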
(5) Configure the firewall (all nodes)
Stop the firewall
#systemctl stop firewalld.service
Disable it at boot
#systemctl disable firewalld.service
Check its state
#firewall-cmd --state
(6) Disable SELinux (all nodes)
#vi /etc/selinux/config
SELINUX=disabled
The change to this file takes effect after a reboot.
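The same edit can be made without opening an editor. The sketch below assumes GNU sed (the default on CentOS) and uses a hypothetical function name; it rewrites only the SELINUX= line and leaves SELINUXTYPE= untouched.

```shell
#!/usr/bin/env bash
# set_selinux_disabled FILE  (hypothetical helper)
# Rewrites the "SELINUX=..." line in FILE to "SELINUX=disabled".
# The pattern requires "=" right after SELINUX, so SELINUXTYPE= is not matched.
set_selinux_disabled() {
  local file=$1
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$file"
}
```

On a node it would be run as set_selinux_disabled /etc/selinux/config, followed by a reboot.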
(7) Install or update the required Linux packages (run the commands one at a time; do not paste them all at once) (all nodes)
#yum install -y chkconfig
#yum install -y python
#yum install -y bind-utils
#yum install -y psmisc
#yum install -y libxslt
#yum install -y zlib
#yum install -y sqlite
#yum install -y cyrus-sasl-gssapi
#yum install -y fuse
#yum install -y portmap
#yum install -y fuse-libs
#yum install -y redhat-lsb
#yum install -y iw
#yum install -y net-tools
#yum install -y perl perl-devel autoconf libaio
#yum install -y python-lxml
#yum install -y python-psycopg2
#yum install -y mod_ssl
#yum install -y httpd
#yum install -y MySQL-python
#yum install krb5-devel cyrus-sasl-gssapi cyrus-sasl-devel libxml2-devel libxslt-devel mysql mysql-devel openldap-devel python-devel python-simplejson sqlite-devel
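Installing the packages one by one makes it clear which package failed. A loop gives the same visibility with less typing. This is a sketch only; install_each is a hypothetical name, and the package names passed to it are the ones listed above.

```shell
#!/usr/bin/env bash
# install_each PKG...  (hypothetical helper)
# Runs "yum install -y" separately for every package and collects the
# names that failed, instead of aborting on the first error.
install_each() {
  local pkg failed=""
  for pkg in "$@"; do
    if ! yum install -y "$pkg"; then
      failed="$failed $pkg"
    fi
  done
  if [ -n "$failed" ]; then
    echo "failed:$failed"
    return 1
  fi
}
```

For example, install_each chkconfig python bind-utils psmisc installs four packages and prints a single "failed:" line at the end if any of them could not be installed.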
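1.2.3. Install Oracle JDK (all nodes)
Uninstall the JDK that ships with the system:
#rpm -qa | grep jdk
#rpm -e [package name]
Run rpm -e once for each package listed by the first command.
Use a Linux file transfer tool to upload the JDK package to /tmp.
Create the java directory
#mkdir /usr/java
Copy the package to the java directory
#cp /tmp/jdk-8u131-linux-x64.tar /usr/java
Extract the package into /usr/java
#tar -xf /usr/java/jdk-8u131-linux-x64.tar -C /usr/java
Rename the extracted directory
#mv /usr/java/jdk1.8.0_131 /usr/java/jdk1.8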
Configure the Java environment variables
#vi /etc/profile
Append the following at the end of the file
JAVA_HOME=/usr/java/jdk1.8
JRE_HOME=/usr/java/jdk1.8/jre
CLASS_PATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export JAVA_HOME JRE_HOME CLASS_PATH PATH
Apply the configuration
#source /etc/profile
Verify
#java -version
#java
#javac
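1.2.4. Configure passwordless SSH
(1) On all nodes, run ssh-keygen -t rsa and press Enter at every prompt to generate a key pair without a passphrase.
#ssh-keygen -t rsa
Two files (id_rsa and id_rsa.pub) are generated in each node's ~/.ssh directory.
Run the following commands on every node
(they may fail on the other nodes if the IP-to-hostname mapping has not been set up there: #vi /etc/hosts)
#ssh-copy-id n1
#ssh-copy-id n2
#ssh-copy-id n3
The following steps are mandatory!
#chmod -R 700 ~/.ssh/
#chmod 600 ~/.ssh/authorized_keys
Test: on the master node, run ssh n2. If everything is set up correctly, you are logged in without being asked for a password.
#ssh root@n2
#ssh root@n3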
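An interactive ssh login silently falls back to a password prompt, which can hide a broken key setup. The sketch below uses the standard OpenSSH options BatchMode and ConnectTimeout so that a missing key is reported as a failure instead; the function name check_passwordless_ssh is hypothetical.

```shell
#!/usr/bin/env bash
# check_passwordless_ssh HOST...  (hypothetical helper)
# Tries a key-only login as root on each host and runs "true" there.
# BatchMode=yes makes ssh fail rather than prompt for a password.
check_passwordless_ssh() {
  local host rc=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
      echo "$host: ok"
    else
      echo "$host: key login failed"
      rc=1
    fi
  done
  return "$rc"
}
```

Running check_passwordless_ssh n1 n2 n3 on each node verifies all nine connections the cluster relies on.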
1.2.5. Install and configure MySQL
n1 (master) node only
Use a Linux file transfer tool to upload mysql-5.7.21-linux-glibc2.12-x86_64.tar.gz to /tmp.
Copy the package to the installation directory
#cp /tmp/mysql-5.7.21-linux-glibc2.12-x86_64.tar.gz /usr/local/
Extract the package
#cd /usr/local/
#tar -zvxf mysql-5.7.21-linux-glibc2.12-x86_64.tar.gz
Rename the extracted directory
#mv mysql-5.7.21-linux-glibc2.12-x86_64 mysql
Check whether mariadb is installed
#rpm -qa|grep mariadb
Uninstall mariadb
#rpm -e --nodeps [package name]
Delete my.cnf under /etc (if it exists)
#cd /etc
#rm -rf my.cnf
Create the mysql group
#groupadd mysql
Create the mysql user and add it to the mysql group
#useradd -g mysql mysql
Create the my.cnf file
#vi /etc/my.cnf
[client]
#character-set-server=utf8
port=3306
socket=/var/lib/mysql/mysql.sock
[mysqld]
#skip-grant-tables
# Listen on port 3306
port=3306
socket=/var/lib/mysql/mysql.sock
# MySQL installation directory
basedir=/usr/local/mysql
# Directory where the database data is stored
datadir=/usr/local/mysql/data
# Maximum number of connections
max_connections=200
# Server character set (the built-in default is the 8-bit latin1)
character-set-server=utf8
# Default storage engine for new tables
default-storage-engine=INNODB
user=mysql
[mysqld_safe]
pid-file=/usr/local/mysql/data/n1.pid
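Create the directory that holds the socket (socket=/var/lib/mysql/mysql.sock) and give ownership to the mysql user
#mkdir /var/lib/mysql
#chown -R mysql:mysql /var/lib/mysql
Make the mysql user the owner of /usr/local/mysql
#cd /usr/local/mysql
#chown -R mysql:mysql ./
Install and initialize the database (mysql_install_db is deprecated in MySQL 5.7 in favor of mysqld --initialize, but it is still shipped with 5.7.21)
#/usr/local/mysql/bin/mysql_install_db --user=mysql --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data/
Make the mysql user the owner of the data directory
#chown -R mysql:mysql /usr/local/mysql/data
Set the permissions of my.cnf (MySQL ignores a world-writable configuration file, so do not use 777)
#chmod 644 /etc/my.cnf
Copy the startup script to the init directory
#cp -a /usr/local/mysql/support-files/mysql.server /etc/init.d/mysqld
Make the mysqld service script executable
#chmod +x /etc/init.d/mysqld
Register mysqld as a system service
#chkconfig --add mysqld
#/usr/local/mysql/bin/mysqld_safe --user=mysql &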
Configure MYSQL_HOME
#vi /etc/profile
MYSQL_HOME=/usr/local/mysql
PATH=$PATH:$MYSQL_HOME/bin
export MYSQL_HOME PATH
#source /etc/profile
Start the MySQL service
#/etc/init.d/mysqld restart
Show the default password
#cat /root/.mysql_secret
Log in for the first time
#/usr/local/mysql/bin/mysql -uroot -p
If the login is refused, add skip-grant-tables under [mysqld] in /etc/my.cnf and restart MySQL.
Comment the option out again once you are done.
Change the password
mysql>SET PASSWORD = PASSWORD('jtv.123456');
Allow remote access:
mysql>use mysql;
mysql>update user set host = '%' where user = 'root';
mysql>select host, user from user;
Restart the MySQL service for the configuration to take effect
#/etc/init.d/mysqld restart
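1.2.6. NTP time synchronization
(1) Install the ntp package on all nodes
Check whether the ntp service is installed
#rpm -q ntp
Install it if it is missing; skip this step if it is already installed
#yum -y install ntp
Set the time zone
#timedatectl set-timezone Asia/Shanghai
Enable the service at boot
#systemctl enable ntpd
Start the service
#systemctl start ntpd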
(2) Configure the n1 node (NTP server)
Synchronize the time once
#ntpdate -u cn.pool.ntp.org
Edit the parts marked in red (the restrict line for the 192.168.80.0 subnet, the three server lines, and the restrict and local-clock lines that follow the broadcast examples)
#vi /etc/ntp.conf
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
restrict 192.168.80.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
server 2.cn.pool.ntp.org
server 1.asia.pool.ntp.org
server 2.asia.pool.ntp.org
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Allow the upstream time servers to adjust this machine's clock
restrict 2.cn.pool.ntp.org nomodify notrap noquery
restrict 1.asia.pool.ntp.org nomodify notrap noquery
restrict 2.asia.pool.ntp.org nomodify notrap noquery
# Local clock as a fallback (127.127.1.0 is the address of ntpd's local clock driver, not 127.0.0.1)
server 127.127.1.0
fudge 127.127.1.0 stratum 10
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Disable the monitoring facility to prevent amplification attacks using ntpdc
# monlist command when default restrict does not include the noquery flag. See
# CVE-2013-5211 for more details.
# Note: Monitoring will not be disabled with the limited restriction flag.
disable monitor
(3) Configure time synchronization on the NTP clients; edit the parts marked in red, that is, the server, restrict and fudge lines below the commented-out pool servers (other nodes)
#vi /etc/ntp.conf
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
# Synchronize with n1 (192.168.80.239), the NTP server configured in step (2)
server 192.168.80.239
restrict 192.168.80.239 nomodify notrap noquery
# Local clock as a fallback (127.127.1.0 is the address of ntpd's local clock driver, not 127.0.0.1)
server 127.127.1.0
fudge 127.127.1.0 stratum 10
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Disable the monitoring facility to prevent amplification attacks using ntpdc
# monlist command when default restrict does not include the noquery flag. See
# CVE-2013-5211 for more details.
# Note: Monitoring will not be disabled with the limited restriction flag.
disable monitor
Restart the ntp service (all nodes)
#systemctl restart ntpd
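After the restart it is worth checking which time source each node actually selected. In the output of ntpq -pn, the peer that ntpd is currently synchronized to is prefixed with an asterisk. The sketch below extracts that address from the output; selected_peer is a hypothetical name, and it may take a few minutes after a restart before any peer is marked.

```shell
#!/usr/bin/env bash
# selected_peer  (hypothetical helper)
# Reads "ntpq -pn" output on stdin and prints the address of the peer
# whose line starts with "*", i.e. the currently selected time source.
# Prints nothing if no peer has been selected yet.
selected_peer() {
  awk '/^\*/ { print substr($1, 2) }'
}
```

It is used as ntpq -pn | selected_peer. On n2 and n3 the result should be the address of the NTP server configured for the clients.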
Chapter 2 Installing CDH
2.1. Install and configure Cloudera Manager
(1) Configure the n1 node
Use a file transfer tool to upload the following packages to /tmp:
cloudera-manager-centos7-cm5.13.0_x86_64.tar.gz
CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel
CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel.sha1
manifest.json
Copy the CM package to /opt
#cp /tmp/cloudera-manager-centos7-cm5.13.0_x86_64.tar.gz /opt/
Extract CM
#cd /opt/
#tar -zxvf cloudera-manager-centos7-cm5.13.0_x86_64.tar.gz
Change the owner of the two extracted directories to root
#chown -R root:root /opt/cloudera/
#chown -R root:root /opt/cm-5.13.0/
Copy the three files other than the CM package to /opt/cloudera/parcel-repo, which was created when CM was extracted
#cp /tmp/CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel /opt/cloudera/parcel-repo
#cp /tmp/CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel.sha1 /opt/cloudera/parcel-repo
#cp /tmp/manifest.json /opt/cloudera/parcel-repo
Remove the trailing 1 from the name of CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel.sha1
#cd /opt/cloudera/parcel-repo
#mv CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel.sha1 CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel.sha
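A corrupted or incomplete parcel upload tends to surface much later as a confusing error in the installer, so it can help to compare the parcel against its checksum file right after the rename. This is a sketch under two assumptions: the .sha file holds the SHA-1 hex digest as its first field, and sha1sum from coreutils is available. The function name verify_parcel is hypothetical.

```shell
#!/usr/bin/env bash
# verify_parcel PARCEL  (hypothetical helper)
# Compares the SHA-1 of PARCEL with the digest stored in PARCEL.sha.
# Assumes the digest is the first whitespace-separated field of the .sha file.
# Returns 0 when they match, non-zero otherwise.
verify_parcel() {
  local parcel=$1 expected actual
  expected=$(awk '{ print $1; exit }' "${parcel}.sha")
  actual=$(sha1sum "$parcel" | awk '{ print $1 }')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

For example: verify_parcel /opt/cloudera/parcel-repo/CDH-5.13.0-1.cdh5.13.0.p0.29-el7.parcel && echo "checksum ok".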
Point the agent at the server's hostname
#vi /opt/cm-5.13.0/etc/cloudera-scm-agent/config.ini
server_host=n1
Create the pid directory for cloudera-scm-agent
#mkdir /opt/cm-5.13.0/run/cloudera-scm-agent
Put the MySQL JDBC driver into CM's /opt/cm-5.13.0/share/cmf/lib/ directory
#cp /tmp/mysql-connector-java-5.1.45-bin.jar /opt/cm-5.13.0/share/cmf/lib
Create the databases used by CM in MySQL
-- hive database
mysql>create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
-- oozie database
mysql>create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
-- hue database
mysql>create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
-- reports database
mysql>create database reports DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Grant access
mysql>grant all privileges on *.* to 'root'@'n1' identified by 'jtv.123456' with grant option;
mysql>flush privileges;
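The four CREATE DATABASE statements differ only in the database name, so they can be generated rather than typed. The sketch below is an illustration, not part of the original procedure: create_db_sql is a hypothetical name, and IF NOT EXISTS is an addition that makes the statements safe to re-run.

```shell
#!/usr/bin/env bash
# create_db_sql NAME...  (hypothetical helper)
# Prints one CREATE DATABASE statement per name, using the same charset
# and collation as the statements above. IF NOT EXISTS is added so the
# output can be applied more than once.
create_db_sql() {
  local db
  for db in "$@"; do
    printf 'CREATE DATABASE IF NOT EXISTS `%s` DEFAULT CHARSET utf8 COLLATE utf8_general_ci;\n' "$db"
  done
}
```

The output can be piped straight into the client: create_db_sql hive oozie hue reports | mysql -uroot -p.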
Create the database for CM
#/opt/cm-5.13.0/share/cmf/schema/scm_prepare_database.sh mysql cm -h localhost -uroot -pjtv.123456 --scm-host localhost scm scm scm
Set the swappiness
#vi /etc/sysctl.conf
Append at the end of the file:
vm.swappiness=10
Create the cloudera-scm user (all nodes)
#useradd --system --home=/opt/cm-5.13.0/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
(2) Configure the worker nodes
Copy /etc/sysctl.conf to the worker nodes
#scp /etc/sysctl.conf root@n2:/etc/sysctl.conf
#scp /etc/sysctl.conf root@n3:/etc/sysctl.conf
Copy /opt/cm-5.13.0 to the other nodes
#scp -r /opt/cm-5.13.0 root@n2:/opt/
#scp -r /opt/cm-5.13.0 root@n3:/opt/
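With more than two worker nodes, repeating the scp lines by hand becomes error-prone. A small loop does the same thing; this is a sketch with a hypothetical function name, relying on the passwordless SSH set up in section 1.2.4.

```shell
#!/usr/bin/env bash
# copy_to_nodes SRC DEST NODE...  (hypothetical helper)
# Copies SRC recursively to DEST on every listed node as root.
# Stops at the first node where scp fails.
copy_to_nodes() {
  local src=$1 dest=$2 node
  shift 2
  for node in "$@"; do
    scp -r "$src" "root@${node}:${dest}" || return 1
  done
}
```

The two copies above would then be copy_to_nodes /etc/sysctl.conf /etc/sysctl.conf n2 n3 and copy_to_nodes /opt/cm-5.13.0 /opt/ n2 n3.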
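(3) Start the services
Reboot each of the three machines
#reboot
Start the CM server service (n1 node)
#/opt/cm-5.13.0/etc/init.d/cloudera-scm-server start
Start the CM agent service on each node (n1, n2, n3)
#/opt/cm-5.13.0/etc/init.d/cloudera-scm-agent start
The services take a few minutes to start. Then open the installation page:
http://n1:7180/cmf/login
The default username and password are admin/admin.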
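Instead of refreshing the browser until the page loads, the login URL can be polled from the shell. The sketch below assumes curl is installed and that the login page answers with a successful HTTP status once the server is up; wait_for_url and its default of 60 attempts at 5-second intervals are illustrative choices, not values from Cloudera.

```shell
#!/usr/bin/env bash
# wait_for_url URL [TRIES]  (hypothetical helper)
# Polls URL every 5 seconds until curl gets a successful response,
# giving up after TRIES attempts (default 60, about five minutes).
wait_for_url() {
  local url=$1 tries=${2:-60} i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: fail on HTTP errors, -s: no progress output, -o /dev/null: discard body
    if curl -fs -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  return 1
}
```

For example: wait_for_url http://n1:7180/cmf/login && echo "Cloudera Manager is up".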
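2.2. Install and configure CDH
(1) After logging in, choose the free edition, which no longer has a node limit.
(2) If everything is correct, the list of currently managed hosts shows all three machines (n1, n2, n3). Select them and click Continue. If some hosts are missing, there are three likely causes.
Cause 1: an agent is not pointing at the n1 server
#vi /opt/cm-5.13.0/etc/cloudera-scm-agent/config.ini
server_host=n1
Cause 2: a node still has a stale random uuid; delete it and restart the agent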
#/opt/cm-5.13.0/etc/init.d/cloudera-scm-agent stop
#rm /opt/cm-5.13.0/lib/cloudera-scm-agent/uuid
#/opt/cm-5.13.0/etc/init.d/cloudera-scm-agent start
Cause 3: passwordless SSH is not set up; check it
#ssh root@n2
#ssh root@n3
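(3) Select the version CDH-5.13.0-1.cdh5.13.0.p0.29, then click Continue.
The cluster installation from the Parcel normally completes in ten-odd minutes. If a host is reported as being in bad health: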
#/opt/cm-5.13.0/etc/init.d/cloudera-scm-agent stop
#rm -f /opt/cm-5.13.0/lib/cloudera-scm-agent/cm_guid
#/opt/cm-5.13.0/etc/init.d/cloudera-scm-agent start
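If "Failure due to stall on seeded torrent" appears,
restart the agent service on the node named in the message.
(4) Cluster installation
(5) Continue. Some of the warnings are harmless, but they can also be resolved as follows:
#echo 0 > /proc/sys/vm/swappiness
#echo never > /sys/kernel/mm/transparent_hugepage/defrag
#echo "echo 0 > /proc/sys/vm/swappiness" >>/etc/rc.d/rc.local
#echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag" >>/etc/rc.d/rc.local
On CentOS 7, /etc/rc.d/rc.local only runs at boot if it is executable (chmod +x /etc/rc.d/rc.local).
After applying the fix, click Run Again:
(6) If the machines are not powerful, choose Custom Services and select only what you need now; more services can be added later.
Here all services are selected.
(7) Service configuration: the defaults are usually fine (Cloudera Manager configures the services automatically based on the machines' resources; adjust them yourself only if you have special requirements).
(8) Next comes the database setup. Once the connection test passes, you can proceed to the next step:
If an error is reported here,
run the following commands in order on the master node.
#yum install python-psycopg2
#yum install libxml2-python
#yum install mysql*
(9) Next is the review page for the cluster settings; keep all the defaults:
(10) Finally we reach the installation of the individual services. Note that installing Hive or Oozie may fail here: MySQL is used as the Hive metastore, and Hive does not ship with the MySQL driver. Copy one over with the following commands
(some of the directories may not match your system; adjust them to your actual paths):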
# cp /opt/cm-5.13.0/share/cmf/lib/mysql-connector-java-5.1.45-bin.jar /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hive/lib/
# cp /opt/cm-5.13.0/share/cmf/lib/mysql-connector-java-5.1.45-bin.jar /opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/oozie/lib/
# cp /opt/cm-5.13.0/share/cmf/lib/mysql-connector-java-5.1.45-bin.jar /var/lib/oozie/
# cp /opt/cm-5.13.0/share/cmf/lib/mysql-connector-java-5.1.45-bin.jar /usr/share/java/mysql-connector-java.jar
The installation completes in roughly ten-odd minutes.
Note: an error may still be reported at this point.
If so, return to the Cloudera Manager home page; the Oozie service is already listed there, so simply start it!
All done!
Final step: set HADOOP_CLASSPATH
# vi ~/.bash_profile
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hive/lib/*
PATH=$PATH:$HOME/bin:$HOME/.local/bin
export PATH
# source ~/.bash_profile
2.3. Upgrading CDH components
See this expert's blog post: http://f.dataguru.cn/spark-919931-1-1.html