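1. Background
CDH ships with its own Spark, but the bundled version is old (1.6), so a production cluster needs Spark 2 installed separately.
Cloudera's official documentation covers the CDS installation.
1.2 Version notes
CDS 2.2 and higher require JDK 8 only. If you are using CDS 2.2 or higher, you must remove JDK 7 from all cluster and gateway hosts to ensure proper operation.
Version check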
CDS Powered by Apache Spark Version | Supported CDH Versions |
---|---|
2.4 Release 2 | CDH 5.10 and any higher CDH 5.x versions |
2.3 Release 4 | CDH 5.9 and any higher CDH 5.x versions |
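The table can be turned into a quick pre-flight check. The following is a minimal sketch that encodes only the two rows shown above; the function names `min_cdh_for_cds` and `cdh_supports_cds` are hypothetical helpers, not part of any Cloudera tooling.

```shell
# Sketch only: encodes just the two table rows listed in this post.

# Print the minimum CDH version required by a given CDS release.
min_cdh_for_cds() {
  case "$1" in
    "2.4 Release 2") echo "5.10" ;;
    "2.3 Release 4") echo "5.9" ;;
    *) return 1 ;;   # release not covered by the table
  esac
}

# Return 0 if CDH version $2 is a 5.x release at or above the minimum for CDS release $1.
cdh_supports_cds() {
  min="$(min_cdh_for_cds "$1")" || return 2
  major="${2%%.*}"          # "5.13.3" -> "5"
  rest="${2#*.}"            # "5.13.3" -> "13.3"
  minor="${rest%%.*}"       # "13.3"   -> "13"
  [ "$major" = "5" ] && [ "$minor" -ge "${min#5.}" ]
}

# The parcel used in this post targets cdh5.13.3.
cdh_supports_cds "2.4 Release 2" "5.13.3" && echo "supported"
```

Comparing the minor version as an integer matters here: a plain string comparison would rank 5.9 above 5.10.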
Scala 2.11 Requirement
Spark 2 does not work with Scala 2.10. Use Scala 2.11 only.
Python Requirement
CDS Powered by Apache Spark requires one of the following Python versions:
Python 2.7 or higher, when using Python 2.
Python 3.4 or higher, when using Python 3. (CDS 2.0 only supports Python 3.4 and 3.5; CDS 2.1 and higher include support for Python 3.6 and higher.)
2. Installation preparation
2.1 Service descriptor file
[root@ifeng01 spark2_parcel]# wget http://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.4.0.cloudera2.jar
Parcel download
The server OS version is 7.7, so choose the el7 parcel.
wget http://archive.cloudera.com/spark2/parcels/2.4.0.cloudera2/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel
wget http://archive.cloudera.com/spark2/parcels/2.4.0.cloudera2/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel.sha1
wget http://archive.cloudera.com/spark2/parcels/2.4.0.cloudera2/manifest.json
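Picking the parcel suffix from the OS version can be sketched as below. This assumes a RHEL-compatible system, where the major version number maps directly to the `el` suffix; `parcel_suffix` is a hypothetical helper, and the rest of the file name is copied from the URL above.

```shell
# Sketch only: parcel_suffix is a hypothetical helper, not a Cloudera tool.
# Map a RHEL/CentOS-style version string such as "7.7" to a parcel suffix such as "el7".
parcel_suffix() {
  echo "el${1%%.*}"
}

# File name pieces copied from the download URL used in this post.
parcel="SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-$(parcel_suffix 7.7).parcel"
echo "$parcel"
```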
Because CDS Powered by Apache Spark is only installable using the parcel mechanism, it can only be used on clusters managed by Cloudera Manager. Additionally, because Cloudera Manager does not support using parcels and packages in the same cluster, you cannot use CDS if you are using a package-based installation of CDH.
Downloaded files
-rw-r--r--. 1 root root 5181 Apr 29 2019 manifest.json
-rw-r--r--. 1 root root 14080531 Sep 8 05:59 spark-1.6.0-cdh5.16.2-src.tar.gz
-rw-r--r--. 1 root root 41 Sep 8 05:59 SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel.sha1
-rw-r--r--. 1 root root 16708823 Sep 8 05:59 spark-2.4.6.tgz
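The .sha1 file also needs to be renamed to .sha.
If the suffix is left as .sha1, Cloudera Manager treats the parcel as still downloading.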
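Before renaming, it is worth confirming that the parcel actually matches its checksum. A minimal sketch, assuming the .sha1 file holds the hex digest as its first field and that GNU `sha1sum` is available; `verify_parcel` is a hypothetical helper name.

```shell
# Sketch only: verify_parcel is a hypothetical helper.
# Return 0 if the SHA-1 of parcel file $1 matches the hash stored in $1.sha1.
verify_parcel() {
  parcel="$1"
  # First whitespace-separated field of the checksum file is taken as the digest.
  expected="$(awk '{print $1; exit}' "$parcel.sha1")" || return 2
  actual="$(sha1sum "$parcel" | awk '{print $1}')"
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage (file name from this post):
#   verify_parcel SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel && echo "checksum OK"
```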
3. Installation and deployment
3.1 Prepare the descriptor file
- Create ~/spark2_parcel
mkdir ~/spark2_parcel
- Download the descriptor file and the parcel files
[root@ifeng01 spark2_parcel]# wget http://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.4.0.cloudera2.jar
wget http://archive.cloudera.com/spark2/parcels/2.4.0.cloudera2/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel
wget http://archive.cloudera.com/spark2/parcels/2.4.0.cloudera2/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p0.1041012-el7.parcel.sha1
wget http://archive.cloudera.com/spark2/parcels/2.4.0.cloudera2/manifest.json
- Rename the .sha1 file to .sha
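The rename step has no command in the list above, so here is one way it could look. This is a sketch, with `rename_sha1` as a hypothetical helper that renames every `*.parcel.sha1` in a directory to `*.parcel.sha`.

```shell
# Sketch only: rename_sha1 is a hypothetical helper.
# Rename every *.parcel.sha1 in directory $1 to *.parcel.sha.
rename_sha1() {
  for f in "$1"/*.parcel.sha1; do
    [ -e "$f" ] || continue          # the glob matched nothing
    mv -- "$f" "${f%.sha1}.sha"      # strip ".sha1", append ".sha"
  done
}

# Usage (directory from this post):
#   rename_sha1 ~/spark2_parcel
```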
3.2 Default location for CSD files
Log on to the Cloudera Manager Server host, and copy the CDS Powered by Apache Spark service descriptor in the location configured for service descriptor files.
- Create the csd directory
mkdir /opt/cloudera/csd
- Copy the jar to /opt/cloudera/csd
cp SPARK2_ON_YARN-2.4.0.cloudera2.jar /opt/cloudera/csd
- Set the owner, group, and permissions
Set the file ownership to cloudera-scm:cloudera-scm with permission 644.
chown cloudera-scm:cloudera-scm /opt/cloudera/csd/SPARK2_ON_YARN-2.4.0.cloudera2.jar
chmod 644 /opt/cloudera/csd/SPARK2_ON_YARN-2.4.0.cloudera2.jar
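A quick way to confirm the result before restarting the server is to read the mode and ownership back. The sketch below assumes GNU `stat` (as found on el7); `csd_perms_ok` is a hypothetical helper name.

```shell
# Sketch only: csd_perms_ok is a hypothetical helper; requires GNU stat.
# Return 0 if file $1 has mode 644 and is owned by user:group $2
# (default: cloudera-scm:cloudera-scm, as required above).
csd_perms_ok() {
  want="${2:-cloudera-scm:cloudera-scm}"
  [ "$(stat -c '%a %U:%G' "$1")" = "644 $want" ]
}

# Usage (path from this post):
#   csd_perms_ok /opt/cloudera/csd/SPARK2_ON_YARN-2.4.0.cloudera2.jar && echo "CSD permissions OK"
```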
3.3 Restart the CM server
systemctl restart cloudera-scm-server
Restarting the CM server does not affect HDFS or the other running services.
3.4 Add the CDS parcel repository in Parcel Configuration Settings
In the Cloudera Manager Admin Console, add the CDS Powered by Apache Spark parcel repository to the Remote Parcel Repository URLs in Parcel Settings as described in Parcel Configuration Settings.
3.5 Configure an offline mirror repository
cp -R spark2_parcel /var/www/html/
Switch to the second machine
yum install -y httpd
Copy the parcel directory to the second machine
[root@ifeng01 ~]# scp -r spark2_parcel root@10.2.0.7:/var/www/html/
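Before pointing Cloudera Manager at the mirror, the copied directory can be checked for completeness. This is a sketch under the assumption that a usable mirror needs `manifest.json` plus a `.sha` file next to each parcel; `mirror_ready` is a hypothetical helper name.

```shell
# Sketch only: mirror_ready is a hypothetical helper.
# Return 0 if directory $1 holds manifest.json and, for every *.parcel, a matching *.parcel.sha.
mirror_ready() {
  dir="$1"
  [ -f "$dir/manifest.json" ] || { echo "missing manifest.json" >&2; return 1; }
  found=0
  for p in "$dir"/*.parcel; do
    [ -e "$p" ] || continue          # the glob matched nothing
    found=1
    [ -f "$p.sha" ] || { echo "missing $p.sha" >&2; return 1; }
  done
  [ "$found" -eq 1 ] || { echo "no parcel found in $dir" >&2; return 1; }
}

# Usage on the second machine (path from this post):
#   mirror_ready /var/www/html/spark2_parcel && echo "mirror looks complete"
```

If httpd is running with its default document root of /var/www/html, the repository URL to enter in Cloudera Manager would then be http://10.2.0.7/spark2_parcel/ (an assumption based on the scp target above).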
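3.6 CM configuration
Once the parcel shows up as Available Remotely, the offline repository is configured correctly.
Click Download.
Click Distribute.
Note: the parcel must be activated manually.
- Go back
- Activate
Then click Next through the remaining steps.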