How to build and set up a Cloudera Distribution Hadoop environment


CDH (Cloudera's Distribution Including Apache Hadoop) is an open-source Apache Hadoop distribution provided by Cloudera Inc., a Palo Alto-based American enterprise software company. Cloudera describes it as the most complete, tested, and widely deployed distribution of Apache Hadoop.

 
 

1.Preparation

Operating System

I am using CentOS 7.2, available from http://vault.centos.org/
Burn it to a CD-ROM or a USB boot device, or just install it as a VM. I chose the VM here.

#turn off firewall service
systemctl stop firewalld.service
systemctl disable firewalld.service

#turn off selinux
vi /etc/selinux/config
SELINUX=disabled
save and reboot
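
The manual edit above can also be scripted. This is a minimal sketch, assuming the config file already contains a line starting with SELINUX=; the helper name disable_selinux_in is made up for illustration, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper: rewrite the SELINUX= line of a config file to "disabled".
# Pass the path explicitly (normally /etc/selinux/config) so it can be tested on a copy.
disable_selinux_in() {
    cfg="$1"
    # Only the line starting with SELINUX= is touched; SELINUXTYPE= is left alone.
    # The result is written back with cat so the original file (inode, mode) is kept.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg" > "$cfg.tmp" \
        && cat "$cfg.tmp" > "$cfg" \
        && rm -f "$cfg.tmp"
}

# Usage (as root), followed by a reboot:
#   disable_selinux_in /etc/selinux/config
```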

#hostnames
vi /etc/sysconfig/network, add HOSTNAME=#HOSTNAME#
vi /etc/hosts, add entries for all the nodes
vi /etc/sysctl.conf, add kernel.hostname=#HOSTNAME#
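
The /etc/hosts entries for all the nodes can be generated rather than typed. A small sketch, using the IP range from this post; only the master's hostname (cdh-centos72-test1) appears in the post, so the names for the other three nodes are assumptions made up for illustration.

```shell
# Print /etc/hosts entries for the four nodes.
# ASSUMPTION: only cdh-centos72-test1 (the master) is named in this post;
# the names test2..test4 for the slave nodes are hypothetical.
print_hosts_entries() {
    i=1
    for last_octet in 90 91 92 93; do
        printf '10.1.90.%s cdh-centos72-test%s\n' "$last_octet" "$i"
        i=$((i + 1))
    done
}

# Usage (as root, on every node):
#   print_hosts_entries >> /etc/hosts
```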

#install and set ntp service
yum install ntp
vi /etc/ntp.conf
server cn.ntp.org.cn iburst
save file and exit
systemctl start ntpd
systemctl enable ntpd

#ssh login without password verification
master node: 10.1.90.90
slave nodes: 10.1.90.91-10.1.90.93
ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): (press Enter)
Enter passphrase (empty for no passphrase): (press Enter)
Enter same passphrase again: (press Enter)
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is: bc:14:e2:59:42:1f:21:80:eb:49:d8:3b:7c:8d:b7:f9 root@cdh-centos72-test1
The key’s randomart image is:
+--[ RSA 2048]----+
| …o o. |
| . . o . |
| o . o + |
|. + . * . |
| + o oo S |
| * o o. . |
| o . o. |
| o |
| .E |
+-----------------+
ssh-copy-id 10.1.90.91
ssh-copy-id 10.1.90.92
ssh-copy-id 10.1.90.93
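
With more slave nodes, the three ssh-copy-id calls are easier to handle as a loop. A sketch using the slave addresses from this post; the DRY_RUN switch is an addition of mine so the commands can be reviewed before anything is copied.

```shell
# Slave node addresses from this post.
SLAVES="10.1.90.91 10.1.90.92 10.1.90.93"

# Print (dry run, the default) or run ssh-copy-id for every slave.
# Set DRY_RUN=0 to actually execute the commands.
copy_key_to_slaves() {
    for host in $SLAVES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh-copy-id $host"
        else
            ssh-copy-id "$host"
        fi
    done
}

copy_key_to_slaves
```

Run it once as shown to check the list, then again with DRY_RUN=0 on the master node.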

#other configs and dependencies
yum -y install psmisc MySQL-python at bc bind-libs bind-utils cups-client cups-libs cyrus-sasl-gssapi cyrus-sasl-plain ed fuse fuse-libs httpd httpd-tools \
    keyutils-libs-devel krb5-devel libcom_err-devel libselinux-devel libsepol-devel libverto-devel mailcap mailx mod_ssl openssl-devel pcre-devel postgresql-libs \
    python-psycopg2 redhat-lsb-core redhat-lsb-submod-security spax time zlib-devel
chmod +x /etc/rc.d/rc.local
echo "echo 0 > /proc/sys/vm/swappiness" >>/etc/rc.d/rc.local
echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag" >>/etc/rc.d/rc.local
echo 0 > /proc/sys/vm/swappiness
echo never > /sys/kernel/mm/transparent_hugepage/defrag
yum -y install rpcbind
systemctl start rpcbind
echo "systemctl start rpcbind" >> /etc/rc.d/rc.local
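
After a reboot it is worth confirming that the swappiness and transparent hugepage settings survived. A minimal check, assuming the defrag file marks the active value in square brackets (for example "always madvise [never]"); the function names are made up, and the file paths are arguments so the check can be tried against sample files.

```shell
# Extract the active value from a transparent_hugepage file,
# whose content looks like: "always madvise [never]".
active_thp_value() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Report whether swappiness and THP defrag match the values set above.
# Defaults point at the real kernel files; pass other paths for testing.
check_tuning() {
    swap_file="${1:-/proc/sys/vm/swappiness}"
    thp_file="${2:-/sys/kernel/mm/transparent_hugepage/defrag}"
    [ "$(cat "$swap_file")" = "0" ] || { echo "swappiness is not 0"; return 1; }
    [ "$(active_thp_value "$thp_file")" = "never" ] || { echo "THP defrag is not never"; return 1; }
    echo "tuning OK"
}

# Usage on a node:
#   check_tuning
```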

 
 

Dependencies

JDK
#download
https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

#remove openjdk
rpm -qa | grep java
rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.64-0.b14.el7_2.x86_64
rpm -e --nodeps java-1.8.0-openjdk-1.8.0.64-0.b14.el7_2.x86_64

#deploy and setup jdk
mkdir -p /opt/java
tar xvf jdk-8u211-linux-x64.tar.gz -C /opt/java
vi /root/.bash_profile
export JAVA_HOME=/opt/java/jdk1.8.0_211
PATH=$PATH:$HOME/bin:/opt/java/gradle-3.4/bin:/opt/java/jdk1.8.0_211/bin
export PATH
Save the file and source it: source /root/.bash_profile
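
A quick sanity check that JAVA_HOME points at a usable JDK can save a failed Cloudera Manager start later. A small sketch; the function name jdk_home_ok is hypothetical and the default path is the JAVA_HOME used in this post.

```shell
# Return success if DIR looks like a usable JDK home (has an executable bin/java).
# Defaults to the JAVA_HOME from this post.
jdk_home_ok() {
    dir="${1:-/opt/java/jdk1.8.0_211}"
    [ -x "$dir/bin/java" ]
}

# Usage:
#   jdk_home_ok "$JAVA_HOME" && "$JAVA_HOME/bin/java" -version
```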

 

Database
I am using MySQL 5.7 here. However, according to https://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html#cdh_cm_supported_db, only MySQL 5.6.x is officially supported by the CDH 5.7.0 release I am installing, so this is a gamble.
Use the links below for other options.

 

#links
https://dev.mysql.com/doc/mysql-yum-repo-quick-guide/en/
https://dev.mysql.com/downloads/repo/yum/
https://dev.mysql.com/downloads/mysql/
https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm
https://www.linuxidc.com/Linux/2017-05/143934.htm
https://mvnrepository.com/artifact/mysql/mysql-connector-java

 

#remove mariadb which is the default for CentOS7
rpm -qa | grep mariadb
mariadb-libs-5.5.41-2.el7_0.x86_64
rpm -e --nodeps mariadb-libs-5.5.41-2.el7_0.x86_64

 

#get and install repo
wget https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm
rpm -ivh mysql57-community-release-el7-11.noarch.rpm

 

#install mysql
yum install mysql-server

 

#modify configs
vi /etc/my.cnf
add "skip-grant-tables" under [mysqld] to skip authentication if you need to reset the root password

 

[mysqld]
default-storage-engine = innodb
innodb_file_per_table
collation-server = utf8_general_ci
init-connect = 'SET NAMES utf8'
character-set-server = utf8
default-time-zone = '+8:00'

 

#restart service
systemctl restart mysqld
systemctl enable mysqld
yum install -y perl-Module-Install.noarch

 

#check initial pwd
grep 'temporary password' /var/log/mysqld.log

 

#login and modify password
mysql -u root -p
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('newpass');
FLUSH PRIVILEGES;

 

#prepare needed databases
CREATE DATABASE hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
CREATE DATABASE amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
CREATE DATABASE hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
CREATE DATABASE monitor DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
CREATE DATABASE oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
GRANT ALL ON *.* TO 'root'@'%' IDENTIFIED BY '123456';
FLUSH PRIVILEGES;
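
The five CREATE DATABASE statements differ only in the database name, so they can be generated in a loop. A sketch under the assumption that the same charset and collation apply to every database; IF NOT EXISTS is my addition so the script can be re-run safely.

```shell
# Emit one CREATE DATABASE statement per service database named in this post.
print_create_db_sql() {
    for db in hive amon hue monitor oozie; do
        echo "CREATE DATABASE IF NOT EXISTS $db DEFAULT CHARSET utf8 COLLATE utf8_general_ci;"
    done
}

# Usage:
#   print_create_db_sql | mysql -u root -p
```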

 

Files and Packages
CM 5.7.0
cloudera-manager-centos7-cm5.7.0_x86_64.tar.gz
http://archive.cloudera.com/cm5/cm/5/cloudera-manager-centos7-cm5.7.0_x86_64.tar.gz

 

CDH 5.7.0
CDH-5.7.0-1.cdh5.7.0.p0.45-el7.parcel
CDH-5.7.0-1.cdh5.7.0.p0.45-el7.parcel.sha1
manifest.json
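Note: the two parcel URLs below point at the 5.7.1 build, while the file names above are 5.7.0; the 5.7.0 files are listed under http://archive.cloudera.com/cdh5/parcels/5.7.0/
http://archive.cloudera.com/cdh5/parcels/5.7/CDH-5.7.1-1.cdh5.7.1.p0.11-el7.parcel
http://archive.cloudera.com/cdh5/parcels/5.7/CDH-5.7.1-1.cdh5.7.1.p0.11-el7.parcel.sha1
http://archive.cloudera.com/cdh5/parcels/5.7/manifest.json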
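
Parcels are large, so it is worth checking the download against the .sha1 file before distributing it. A minimal sketch, assuming the first whitespace-separated field of the .sha1 file is the hex digest and that sha1sum is installed; the function name verify_parcel is made up.

```shell
# Compare the SHA-1 of a parcel with the hash stored next to it in <parcel>.sha1.
# ASSUMPTION: the first field on the first line of the .sha1 file is the hex digest.
verify_parcel() {
    parcel="$1"
    expected=$(awk 'NR==1 {print $1}' "$parcel.sha1")
    actual=$(sha1sum "$parcel" | awk '{print $1}')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage:
#   verify_parcel CDH-5.7.0-1.cdh5.7.0.p0.45-el7.parcel && echo "checksum OK"
```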

 

mysql-connector-java-5.1.22
https://mvnrepository.com/artifact/mysql/mysql-connector-java

 
 

2.Installation

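(1) On all the nodes, put the files in /opt/ and extract the Cloudera Manager tarball. Note that the later steps use paths under /opt/cloudera-manager/cm-5.7.0, so adjust them if your cm-5.7.0 directory ends up elsewhere.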
ls
CDH-5.7.0-1.cdh5.7.0.p0.45-el7.parcel
cloudera-manager-centos7-cm5.7.0_x86_64.tar.gz manifest.json
CDH-5.7.0-1.cdh5.7.0.p0.45-el7.parcel.sha1
tar zxvf cloudera-manager-centos7-cm5.7.0_x86_64.tar.gz -C /opt/
ls
/opt/ cloudera cm-5.7.0

(2)On all the nodes, create user
useradd --system --home=/opt/cloudera-manager/cm-5.7.0/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm

(3)On master node, create local meta data dir for cloudera-manager-server
mkdir /var/cloudera-scm-server
chown cloudera-scm:cloudera-scm /var/cloudera-scm-server
chown cloudera-scm:cloudera-scm /opt/cloudera-manager

(4)On slave nodes, point cloudera-manager-agent to master
vi /opt/cloudera-manager/cm-5.7.0/etc/cloudera-scm-agent/config.ini
Change server_host to the cloudera-manager-server host, i.e. cdh-centos72-test1
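
On several slave nodes, the server_host change is quicker to script than to edit by hand. A sketch, assuming config.ini contains a line of the form server_host=<value>; the helper name set_server_host is hypothetical.

```shell
# Point the agent at the Cloudera Manager server by rewriting server_host in config.ini.
# ASSUMPTION: the file has a line that starts with "server_host=".
set_server_host() {
    ini="$1"
    host="$2"
    sed "s/^server_host=.*/server_host=$host/" "$ini" > "$ini.tmp" \
        && cat "$ini.tmp" > "$ini" \
        && rm -f "$ini.tmp"
}

# Usage (on each slave node):
#   set_server_host /opt/cloudera-manager/cm-5.7.0/etc/cloudera-scm-agent/config.ini cdh-centos72-test1
```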

(5)On master node, create parcel-repo dir
mkdir -p /opt/cloudera/parcel-repo
chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
cp CDH-5.7.0-1.cdh5.7.0.p0.45-el7.parcel CDH-5.7.0-1.cdh5.7.0.p0.45-el7.parcel.sha manifest.json /opt/cloudera/parcel-repo
Note: rename the .sha1 file so that its suffix is .sha (drop the trailing 1) before copying.
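
The copy and the .sha1 to .sha rename can be done in one step. A minimal sketch; the function name stage_parcel and its argument order are my own choices for illustration.

```shell
# Copy a parcel, its hash file and manifest.json into the parcel repo,
# renaming <parcel>.sha1 to <parcel>.sha on the way.
stage_parcel() {
    src_dir="$1"
    parcel="$2"
    repo="$3"
    cp "$src_dir/$parcel" "$repo/" &&
    cp "$src_dir/$parcel.sha1" "$repo/$parcel.sha" &&
    cp "$src_dir/manifest.json" "$repo/"
}

# Usage (on the master node):
#   stage_parcel /opt CDH-5.7.0-1.cdh5.7.0.p0.45-el7.parcel /opt/cloudera/parcel-repo
```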

(6)On all the nodes, create parcels dir
mkdir -p /opt/cloudera/parcels
chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
Explanation: Cloudera Manager takes the CDH parcels from /opt/cloudera/parcel-repo on the master node, then distributes, unpacks and activates them under /opt/cloudera/parcels on every node.

(7)On master node, initialize the database
/opt/cloudera-manager/cm-5.7.0/share/cmf/schema/scm_prepare_database.sh mysql -hcdh1 -uroot -pPASSWORD --scm-host cdh-centos72-test1 scmdbn scmdbu scmdbp
Explanation: this script creates and configures the database that the Cloudera Manager Server (CMS) needs. The arguments are:
mysql: the database type is MySQL; if you installed Oracle instead, change this argument to oracle.
-h cdh-centos72-test1 (written as -hcdh1 above): the database lives on the cdh1 host, i.e. the master node
-u root: run mysql as root
-p PASSWORD: the MySQL root password
--scm-host cdh-centos72-test1: the CMS host, usually the same host that MySQL is installed on
The last three arguments are the database name, the database user name, and the database password.

If you get this error:
ERROR com.cloudera.enterprise.dbutil.DbProvisioner - Exception when creating/dropping database with user 'root' and jdbc url 'jdbc:mysql://localhost/?useUnicode=true&characterEncoding=UTF-8' java.sql.SQLException: Access denied for user 'root'@'cdh-centos72-test1' (using password: YES)
then see http://forum.spring.io/forum/spring-projects/web/57254-java-sql-sqlexception-access-denied-for-user-root-localhost-using-password-yes

Run the following statements (note that on MySQL 5.7 the password column in mysql.user is named authentication_string rather than PASSWORD):
update user set PASSWORD=PASSWORD('123456') where user='root';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'cdh1' IDENTIFIED BY '123456' WITH GRANT OPTION;
FLUSH PRIVILEGES;

(8)On master node, start cloudera-scm-server
cp /opt/cloudera-manager/cm-5.7.0/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
chkconfig cloudera-scm-server on
vi /etc/init.d/cloudera-scm-server and change the default in CMF_DEFAULTS=${CMF_DEFAULTS:-/etc/default} to /opt/cloudera-manager/cm-5.7.0/etc/default
service cloudera-scm-server start
add to /etc/rc.local: service cloudera-scm-server restart

(9)On all the nodes, start cloudera-scm-agent
mkdir /opt/cloudera-manager/cm-5.7.0/run/cloudera-scm-agent
cp /opt/cloudera-manager/cm-5.7.0/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
chkconfig cloudera-scm-agent on
vi /etc/init.d/cloudera-scm-agent and change the default in CMF_DEFAULTS=${CMF_DEFAULTS:-/etc/default} to /opt/cloudera-manager/cm-5.7.0/etc/default
service cloudera-scm-agent start
add to /etc/rc.local: service cloudera-scm-agent restart

 
 

3.Access and Configuration

Log in at http://10.1.90.90:7180/ (master node IP address + default port), then follow these steps:

#Select the hosts

#Select the correct parcels

#Start installing on all the nodes together

#Self-check & resolve one by one

#Make sure the databases are created and user has permissions

#Start everything by the console

#Now everything works fine

 
 

4.Verification and Usage

TO BE CONTINUED

 
 

5.Miscellaneous

#links
https://www.cloudera.com/documentation/enterprise/release-notes/topics/rn_consolidated_pcm.html
http://archive.cloudera.com/cm5/cm/5/
http://archive.cloudera.com/cdh5/parcels/5.7.0/
https://www.cnblogs.com/fujiangong/p/5620050.html
https://www.cnblogs.com/zhangleisanshi/p/7575579.html
https://blog.csdn.net/qq_42564846/article/details/81178847
https://www.cnblogs.com/ee900222/p/hadoop_3.html
https://mvnrepository.com/artifact/mysql/mysql-connector-java

#Pitfall 1
Problem:
Python in worker has different version X.X than that in driver X.X, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
Solution:
Make sure the client side and the server side run the same Python minor version (for example both on 2.7, or both on 3.5.x, 3.6.x or 3.7.x).
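
The comparison PySpark makes can be reproduced by hand before submitting a job. A small sketch that reduces two version strings to major.minor and compares them; the function names are made up, and how you obtain the worker's version (for example by running python -V on a worker node) is left to you.

```shell
# Reduce a version string such as "2.7.5" to "major.minor" ("2.7").
major_minor() {
    echo "$1" | cut -d. -f1,2
}

# Succeed when two Python versions share the same major.minor pair.
same_minor() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Usage: compare the driver's interpreter with a version seen on a worker.
#   same_minor "$(python -V 2>&1 | awk '{print $2}')" "2.7.5" || echo "version mismatch"
```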

#Pitfall 2
Problem:
MySQL: Unknown system variable 'tx_read_only' when trying to read/write with gbase (technically a wrapped MySQL instance)
Solution:
According to https://stackoverflow.com/questions/16515249/mysql-unknown-system-variable-tx-read-only:
The variable tx_read_only was introduced in MySQL 5.6.5.
The MySQL version is probably older than that, but Connector/J tries to use the new variable anyway.
According to the release notes, support for this variable arrived in Connector/J 5.1.23.
In my case I tried 5.1.22 and it worked fine; I never tried "gbase-connector-java-8.3.81.53-build54.1-bin.jar".

#Pitfall 3
Problem:
When starting cloudera-scm-agent on CentOS 7, the log keeps reporting this error:
[17/Jan/2018 15:21:41 +0000] 14583 MainThread agent ERROR Caught unexpected exception in main loop.
Traceback (most recent call last):
  File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.1-py2.7.egg/cmf/agent.py", line 686, in start
    self._init_after_first_heartbeat_response(heartbeat_response["data"])
  File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.1-py2.7.egg/cmf/agent.py", line 816, in _init_after_first_heartbeat_response
    self.client_configs.load()
  File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.1-py2.7.egg/cmf/client_configs.py", line 682, in load
    new_deployed.update(self._lookup_alternatives(fname))
  File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.1-py2.7.egg/cmf/client_configs.py", line 432, in _lookup_alternatives
    return self._parse_alternatives(alt_name, out)
  File "/opt/cm-5.7.1/lib64/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.7.1-py2.7.egg/cmf/client_configs.py", line 444, in _parse_alternatives
    path, _, _, priority_str = line.rstrip().split(" ")
ValueError: too many values to unpack
The agent keeps looping on this error and cannot go on to load its dependencies.
Solution:
See https://www.jianshu.com/p/ef7d01b544b3; the CDH code has to be modified to work around the problem.

TO BE CONTINUED
