Analysis and Resolution of Problems Installing MMLSpark on a CDH Cluster

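This article describes the problems encountered when installing and using mmlspark on a CDH cluster, and how to solve them. Topics include extraClassPath configuration, the missing libgomp.so.1 library, specifying pyFiles, and setting options in spark-defaults.conf. The key steps are: downloading mmlspark's dependencies, adding the jars through extraClassPath, fixing the missing libgomp.so.1, and resolving py4j-related errors. With these settings in place, mmlspark runs correctly on the CDH cluster.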

1. Overview

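MMLSpark (Microsoft Machine Learning for Apache Spark) is a deep learning and data science toolkit for Apache Spark, open-sourced by Microsoft. It lets you quickly build powerful, highly scalable predictive and analytical models on large image and text datasets.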

2. Downloading the packages

Following the official examples, install and use it as a Spark package:

spark-shell --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1
pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1
spark-submit --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 MyApp.jar
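The three commands above share the same `--packages` argument (a Maven coordinate of the form groupId:artifactId:version) and differ only in the launcher and the application. The following is a minimal Python 3 sketch that assembles such a command line. The helper name `build_submit_args` is hypothetical. It relies only on the fact that `--packages` and `--repositories` each take a comma-separated list.

```python
from typing import List, Optional

# Maven coordinate used in the commands above (groupId:artifactId:version).
MMLSPARK_PACKAGE = "com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1"


def build_submit_args(
    app: Optional[str],
    packages: List[str],
    repositories: Optional[List[str]] = None,
    launcher: str = "spark-submit",
) -> List[str]:
    """Assemble an argv list for spark-submit, pyspark or spark-shell (hypothetical helper)."""
    # --packages takes a comma-separated list of Maven coordinates.
    args = [launcher, "--packages", ",".join(packages)]
    if repositories:
        # --repositories takes a comma-separated list of extra remote repository URLs.
        args += ["--repositories", ",".join(repositories)]
    if app:
        # Interactive launchers (pyspark, spark-shell) have no application argument.
        args.append(app)
    return args


if __name__ == "__main__":
    argv = build_submit_args(
        "MyApp.jar",
        [MMLSPARK_PACKAGE],
        repositories=["https://mmlspark.azureedge.net/maven"],
    )
    print(" ".join(argv))
```

Returning a list rather than a single string keeps the arguments safe to pass straight to `subprocess.run` without shell quoting.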

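However, because the CDH cluster's data center is on a mainland-China network, access to the project's Maven repository is extremely slow and practically unusable. For example: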

# pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 --repositories=https://mmlspark.azureedge.net/maven
After running this, the dependencies download extremely slowly, taking several hours at best.

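Looking at how `spark --packages` works: it first downloads the dependency jars and caches them locally in the user's home directory under .ivy2/jars (/root/.ivy2/jars for root, as the log below shows). The dependencies can therefore be downloaded over the AWS network instead: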
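Once the cached jars have been copied from the AWS machine to the CDH node, they can be passed to Spark either through `--jars` (a comma-separated list) or through `spark.driver.extraClassPath` / `spark.executor.extraClassPath` (a classpath string, ":"-separated on Linux). Below is a minimal Python 3 sketch that builds both values from a jar directory. The function names are hypothetical, and the directory used in the example is an assumption about where the copied `.ivy2/jars` ends up.

```python
import os
from typing import List, Tuple


def collect_cached_jars(jar_dir: str) -> List[str]:
    """Return absolute paths of all .jar files in jar_dir, sorted by file name."""
    base = os.path.abspath(jar_dir)
    return sorted(
        os.path.join(base, name)
        for name in os.listdir(base)
        if name.endswith(".jar")
    )


def jar_settings(jar_dir: str) -> Tuple[str, str]:
    """Return (value for --jars, value for spark.*.extraClassPath)."""
    jars = collect_cached_jars(jar_dir)
    # --jars / spark.jars expects commas; a classpath uses os.pathsep (":" on Linux).
    return ",".join(jars), os.pathsep.join(jars)


if __name__ == "__main__":
    # Assumed location of the copied cache directory on the CDH node.
    jar_dir = os.path.expanduser("~/.ivy2/jars")
    if os.path.isdir(jar_dir):
        jars_csv, classpath = jar_settings(jar_dir)
        print("--jars " + jars_csv)
        print("spark.driver.extraClassPath " + classpath)
        print("spark.executor.extraClassPath " + classpath)
    else:
        print("no jar directory at " + jar_dir)
```

Note that extraClassPath entries are local paths, so for executors the same jars must exist at the same path on every node.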

[root@ip-192-168-15-101 spark]# pyspark --packages com.microsoft.ml.spark:mmlspark_2.11:1.0.0-rc1 --repositories=https://mmlspark.azureedge.net/maven
Python 2.7.18 (default, May  7 2020, 09:20:17)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
https://mmlspark.azureedge.net/maven added as a remote repository with the name: repo-1
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.microsoft.ml.spark#mmlspark_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-fcd33844-8550-4d17-b600-7a07aa1902c9;1.0
confs: [default]
found com.microsoft.ml.spark#mmlspark_2.11;1.0.0-rc1 in repo-1
found org.scalactic#scalactic_2.11;3.0.5 in central
found org.scala-lang#scala-reflect;2.11.12 in central
found org.scalatest#scalatest_2.11;3.0.5 in central
found org.scala-lang.modules#scala-xml_2.11;1.0.6 in central
found io.spray#spray-json_2.11;1.3.2 in central
found com.microsoft.cntk#cntk;2.4 in central
found org.openpnp#opencv;3.2.0-1 in central
found com.jcraft#jsch;0.1.54 in central
found org.apache.httpcomponents#httpclient;4.5.6 in central
found org.apache.httpcomponents#httpcore;4.4.10 in central
found commons-logging#commons-logging;1.2 in local-m2-cache
found commons-codec#commons-codec;1.10 in local-m2-cache
found com.microsoft.ml.lightgbm#lightgbmlib;2.3.100 in central
found com.github.vowpalwabbit#vw-jni;8.7.0.3 in central
:: resolution report :: resolve 690ms :: artifacts dl 16ms
:: modules in use:
com.github.vowpalwabbit#vw-jni;8.7.0.3 from central in [default]
com.jcraft#jsch;0.1.54 from central in [default]
com.microsoft.cntk#cntk;2.4 from central in [default]
com.microsoft.ml.lightgbm#lightgbmlib;2.3.100 from central in [default]
com.microsoft.ml.spark#mmlspark_2.11;1.0.0-rc1 from repo-1 in [default]
commons-codec#commons-codec;1.10 from local-m2-cache in [default]
commons-logging#commons-logging;1.2 from local-m2-cache in [default]
io.spray#spray-json_2.11;1.3.2 from central in [default]
org.apache.httpcomponents#httpclient;4.5.6 from central in [default]
org.apache.httpcomponents#httpcore;4.4.10 from central in [default]
org.openpnp#opencv;3.2.0-1 from central in [defa