Compiling Spark AE

Install the R package and its dependent libraries

  • Install the R language environment (a minimal sketch follows this list)
  • Install the R libraries with the commands below
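
For the first bullet, a minimal sketch of installing R itself, assuming macOS with Homebrew (the /Library/Frameworks paths later in this section suggest a Mac; use your distribution's package manager otherwise):

brew install r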

R -e "install.packages(c('knitr', 'rmarkdown', 'devtools', 'e1071', 'survival'), repos='http://cran.us.r-project.org')"
R -e "devtools::install_version('testthat', version = '1.0.2', repos='http://cran.us.r-project.org')"
R -e "install.packages(c('roxygen2'), repos='http://cran.us.r-project.org')"

  • Install MiKTeX: download the dmg from https://miktex.org/download and install it
  • Run ./R/run-tests.sh to test the R environment
  • Run cd R && /Library/Frameworks/R.framework/Resources/bin/R CMD check --as-cran --no-tests SparkR_2.3.0.tar.gz to check the built package

Build error:
LaTeX errors when creating PDF version.
This typically indicates Rd problems.
checking PDF version of manual without hyperrefs or index … ERROR

This error is caused by a missing MiKTeX installation.
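
After installing MiKTeX, a quick sanity check that the TeX toolchain R CMD check needs is actually on the PATH:

which pdflatex
pdflatex --version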

Compiling the Spark package

git clone https://github.com/Intel-bigdata/spark-adaptive.git

./dev/make-distribution.sh --name spark-ae-2.3 --pip --r --tgz -Psparkr -Phadoop-2.6 -Phive -Phive-thriftserver  -Pyarn -DskipTests
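
If the build succeeds, make-distribution.sh leaves the distribution tarball in the source root; it is the same file the deployment step below unpacks:

ls -lh spark-*-bin-spark-ae-2.3.tgz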

Production deployment

sudo tar --same-owner -zxvf spark-2.3.0-bin-spark-ae-2.3.tgz
sudo chown -R hadoop:hadoop spark-2.3.0-bin-spark-ae-2.3

# Alternatively, copy an existing installation from another node:
sudo scp -pr hadoop@x.x.x.x:/usr/lib/spark-2.3.0-bin-spark-ae-2.3 /usr/lib/spark-2.3.0-bin-spark-ae-2.3

# Move the default config into /etc/spark/conf first, then replace the conf
# directory with a symlink (moving after the symlink exists would fail):
sudo mkdir -p /etc/spark/conf && sudo chown -R hadoop:hadoop /etc/spark && sudo chmod -R 755 /etc/spark
sudo mv spark-2.3.0-bin-spark-ae-2.3/conf/* /etc/spark/conf/
sudo rm -rf /usr/lib/spark-2.3.0-bin-spark-ae-2.3/conf && sudo ln -s /etc/spark/conf /usr/lib/spark-2.3.0-bin-spark-ae-2.3/conf


sudo ln -fTs /usr/lib/spark-2.3.0-bin-spark-ae-2.3 /usr/lib/spark
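
A quick smoke test of the deployed build:

/usr/lib/spark/bin/spark-submit --version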

Compiling Apache Spark with SBT

Compiling Spark 3.0

git clone git@github.com:apache/spark.git
sbt package

Spark 3.0 is built with Scala 2.12 by default; compiling with 2.11 fails, because lambdas are only converted to Java SAM interfaces such as Runnable from Scala 2.12 onward:

[error] /Users/wankun/ws/apache/spark/core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala:178: type mismatch;
[error]  found   : () => Unit
[error]  required: Runnable
[error]         threadpool.execute(() => DfsAsyncWriter.this.close())
[error]                               ^
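
If you need to pin the Scala version explicitly, the same -Dscala.version override used for the 2.4 build below should work here as well; 2.12.10 is only an example, use whatever patch version your checkout's pom declares:

sbt -Dscala.version=2.12.10 package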

Compiling Spark 2.4

sbt -Dscala.version=2.11.12 -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver package

Compiling CDH Spark spark2-2.4.0-cloudera1

git clone https://github.com/cloudera/spark.git
# Create a branch from the corresponding tag
git checkout -b spark2-2.4.0-cloudera1 spark2-2.4.0-cloudera1
  • Annoyingly, this version's sbt build definition, project/SparkBuild.scala, ships with a bug: the pattern on the left binds exactly six projects, but the Seq on the right supplies seven names, so sbt fails to load the build. Drop the extra "hive-exec":
 val sqlProjects@Seq(catalyst, sql, hive, hiveThriftServer, sqlKafka010, avro) = Seq(
-    "catalyst", "sql", "hive", "hive-thriftserver", "sql-kafka-0-10", "avro", "hive-exec"
+    "catalyst", "sql", "hive", "hive-thriftserver", "sql-kafka-0-10", "avro"
   ).map(ProjectRef(buildLocation, _))
  • Add custom repositories to build/sbt (the full diff is shown below; a sketch of the repositories file follows it)
  • Update the corresponding dependency versions in pom.xml
--- a/pom.xml
+++ b/pom.xml
@@ -129,15 +129,15 @@
     <hbase.version>${cdh.hbase.version}</hbase.version>
     <hbase.artifact>hbase-server</hbase.artifact>
     <flume.version>${cdh.flume-ng.version}</flume.version>
-    <zookeeper.version>${cdh.zookeeper.version}</zookeeper.version>
-    <curator.version>${cdh.curator.version}</curator.version>
+    <zookeeper.version>3.4.5-cdh5.13.3</zookeeper.version>
+    <curator.version>2.7.1</curator.version>
     <hive.group>org.apache.hive</hive.group>
     <!-- Version used in Maven Hive dependency -->
-    <hive.version>${cdh.hive.version}</hive.version>
+    <hive.version>1.1.0-cdh5.13.3</hive.version>
     <!-- Version used for internal directory structure -->
     <hive.version.short>1.1.0</hive.version.short>
     <derby.version>10.12.1.1</derby.version>
-    <parquet.version>${cdh.parquet.version}</parquet.version>
+    <parquet.version>1.5.0-cdh5.13.3</parquet.version>
     <orc.version>1.5.5</orc.version>
     <orc.classifier>nohive</orc.classifier>
     <hive.parquet.version>${parquet.version}</hive.parquet.version>
@@ -147,7 +147,7 @@
     <ivy.version>2.4.0</ivy.version>
     <oro.version>2.0.8</oro.version>
     <codahale.metrics.version>3.1.5</codahale.metrics.version>
-    <avro.version>${cdh.avro.version}</avro.version>
+    <avro.version>1.7.6-cdh5.13.3</avro.version>
     <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
     <aws.kinesis.client.version>1.8.10</aws.kinesis.client.version>
     <!-- Should be consistent with Kinesis client dependency -->

--- a/build/sbt
+++ b/build/sbt
@@ -47,6 +47,9 @@ realpath () {
 )
 }

+SBT_REPOSITORIES_CONFIG="$(dirname "$(realpath "$0")")/sbt-config/repositories"
+export SBT_OPTS="-Dsbt.override.build.repos=true -Dsbt.repository.config=$SBT_REPOSITORIES_CONFIG"
+
 . "$(dirname "$(realpath "$0")")"/sbt-launch-lib.bash

--- a/dev/make-distribution.sh
+++ b/dev/make-distribution.sh
@@ -291,11 +291,11 @@ if [ -d "$SPARK_HOME/R/lib/SparkR" ]; then
 fi

 # CDH: remove scripts for which the actual code is not included.
-rm "$DISTDIR/bin/spark-sql"
-rm "$DISTDIR/bin/beeline"
-rm "$DISTDIR/bin/sparkR"
-rm "$DISTDIR/sbin/start-thriftserver.sh"
-rm "$DISTDIR/sbin/stop-thriftserver.sh"
+# rm "$DISTDIR/bin/spark-sql"
+# rm "$DISTDIR/bin/beeline"
+# rm "$DISTDIR/bin/sparkR"
+# rm "$DISTDIR/sbin/start-thriftserver.sh"
+# rm "$DISTDIR/sbin/stop-thriftserver.sh"



--- a/project/plugins.sbt
+++ b/project/plugins.sbt
@@ -43,3 +43,7 @@ addSbtPlugin("com.simplytyped" % "sbt-antlr4" % "0.7.11")
 // the plugin; this is tracked at SPARK-14401.

 addSbtPlugin("org.spark-project" % "sbt-pom-reader" % "1.0.0-spark")
+
+logLevel := Level.Debug
+
+addSbtPlugin("io.get-coursier" % "sbt-coursier" % "1.0.3")

List of modified files

	modified:   build/sbt
	new file:   build/sbt-config/repositories
	modified:   pom.xml
	modified:   project/MimaBuild.scala
	modified:   project/MimaExcludes.scala
	modified:   project/SparkBuild.scala
	modified:   project/plugins.sbt
	new file:   project/project/build.properties
	new file:   project/project/plugins.sbt

sbt -Dscala.version=2.11.12 -Pyarn -Phive -Phive-thriftserver package

  • You can pass the -d flag to sbt to debug the build
  • If you run into inexplicable errors, try sbt clean and then rebuild (see the combined example below)
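
Putting both tips together, a typical clean rebuild with debug logging looks like:

sbt clean
sbt -d -Dscala.version=2.11.12 -Pyarn -Phive -Phive-thriftserver package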

Compiling the CDH version

dev/make-distribution.sh

 # CDH: remove scripts for which the actual code is not included.
-rm "$DISTDIR/bin/spark-sql"
-rm "$DISTDIR/bin/beeline"
-rm "$DISTDIR/bin/sparkR"
-rm "$DISTDIR/sbin/start-thriftserver.sh"
-rm "$DISTDIR/sbin/stop-thriftserver.sh"
+# rm "$DISTDIR/bin/spark-sql"
+# rm "$DISTDIR/bin/beeline"
+# rm "$DISTDIR/bin/sparkR"
+# rm "$DISTDIR/sbin/start-thriftserver.sh"
+# rm "$DISTDIR/sbin/stop-thriftserver.sh"

pom.xml

   <repositories>
+    <repository>
+      <id>spring</id>
+      <name>Spring repo</name>
+      <url>https://repo.spring.io/plugins-release/</url>
+      <releases>
+        <enabled>true</enabled>
+      </releases>
+    </repository>
     <repository>

./dev/make-distribution.sh --name 2.6.0-cdh5.13.3 --tgz  -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0-cdh5.13.3 -Dhive.version=1.1.0-cdh5.13.3 -Dzookeeper.version=3.4.5-cdh5.13.3 -DskipTests  -T 4C
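
If the build succeeds, the distribution tarball, named after the --name argument, should appear in the source root:

ls spark-*-bin-2.6.0-cdh5.13.3.tgz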