Table of Contents
- 1. Environment setup
- 1.1 Install the [JDK](https://www.oracle.com/java/technologies/javase-downloads.html)
- 1.2 Install [Scala](https://www.scala-lang.org/download/)
- 1.3 Install Git ([download page](https://git-scm.com/downloads)); after installing, you can use Git Bash to build the Spark source code
- 1.4 Download the Spark source, either from [GitHub](https://github.com/apache/spark) or from the [Spark archive](https://archive.apache.org/dist/spark/); if the download is slow, you can use [Gitee](https://gitee.com) or a [GitHub download proxy](https://g.widora.cn/)
- 2. Building the source
- 3. Problems encountered running spark-examples
- 3.1 Error:(45, 66) not found: type SparkFlumeProtocol
- 3.2 Exception in thread "main" java.lang.NoClassDefFoundError: scala/Function1
- 3.3 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
- 3.4 Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/Maps
- 3.5 org.apache.spark.SparkException: A master URL must be set in your configuration
- 3.6 Exception in thread "main" java.lang.NoClassDefFoundError: org/eclipse/jetty/util/thread/ThreadPool
1. Environment setup
1.1 Install the JDK
1.2 Install Scala
1.3 Install Git (download link above); after installing, you can use Git Bash to build the Spark source code
1.4 Download the Spark source, either from GitHub or from the Spark archive; if the download is slow, you can use Gitee or a GitHub download proxy
2. Building the source
2.1 Go to the source directory, right-click, and choose "Git Bash Here"
The opened window looks like this:
2.2 Increase the memory available to Maven, to avoid running out of heap space during the build ([ERROR] Java heap space):

```shell
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"
```
2.3 Build with build/mvn:

```shell
./build/mvn -DskipTests clean package
```

During the build, the following three files are downloaded, and the downloads can be slow; you can fetch them yourself beforehand and put them in the build directory:

```
http://archive.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
http://downloads.typesafe.com/zinc/0.3.5.3/zinc-0.3.5.3.tgz
http://downloads.typesafe.com/scala/2.10.5/scala-2.10.5.tgz
```
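If you downloaded those three archives manually, they can be dropped into build/ so the build script skips re-downloading them; a minimal sketch, assuming the archives sit in the current directory with the file names from the URLs above:

```shell
# Move the pre-downloaded archives into build/, skipping any that are missing.
mkdir -p build
for f in apache-maven-3.3.3-bin.tar.gz zinc-0.3.5.3.tgz scala-2.10.5.tgz; do
  if [ -f "$f" ]; then
    mv "$f" build/
  fi
done
```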
Downloading the dependencies plus compiling takes roughly 40 minutes in total.
After a successful build, the output looks like this:
2.4 Import the built project into IDEA
2.5 Run org.apache.spark.examples.LocalPi and org.apache.spark.examples.SparkPi from spark-examples as a sanity check
3. Problems encountered running spark-examples
3.1 Error:(45, 66) not found: type SparkFlumeProtocol
```
Error:(45, 66) not found: type SparkFlumeProtocol
val transactionTimeout: Int, val backOffInterval: Int) extends SparkFlumeProtocol with Logging {
```

Solution:
In the Maven tool window (upper right of the IDEA window), find the flume-sink module, then right-click -> Generate Sources and Update Folders.
3.2 Exception in thread "main" java.lang.NoClassDefFoundError: scala/Function1
```
Exception in thread "main" java.lang.NoClassDefFoundError: scala/Function1
	at org.apache.spark.examples.LocalPi.main(LocalPi.scala)
Caused by: java.lang.ClassNotFoundException: scala.Function1
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 1 more
```
Solution:
1. Open the project's Project Structure dialog.
2. Add the Scala SDK dependency to the affected Spark modules.
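An equivalent fix from the Maven side, assuming the missing class comes from the scala-library jar being marked provided in the examples module, is to declare it with compile scope in spark-examples' pom.xml (a sketch, not the project's official configuration; the version is managed by the parent pom):

```xml
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <scope>compile</scope>
</dependency>
```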
3.3 Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:28)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 2 more
```
Solution:
Comment out the provided scope of the spark-core dependency in spark-examples' pom.xml.
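The change looks roughly like this (a sketch of the spark-examples pom.xml, assuming the Scala 2.10 build; keep the version/artifactId as already declared there):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${project.version}</version>
  <!-- commented out so the examples can run from the IDE -->
  <!-- <scope>provided</scope> -->
</dependency>
```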
3.4 Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/Maps
```
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/Maps
	at org.apache.hadoop.metrics2.lib.MetricsRegistry.<init>(MetricsRegistry.java:42)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.<init>(MetricsSystemImpl.java:87)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.<init>(MetricsSystemImpl.java:133)
	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
	at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
	at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:99)
	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:192)
	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2136)
	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2136)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2136)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:322)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
```
Solution:
1. Add a guava dependency to spark-examples' pom.xml:

```xml
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
</dependency>
```

2. Comment out the provided scope on the guava dependency in the root project's (spark-parent's) pom.xml.
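In the parent pom.xml the change is roughly the following (a sketch; keep the version element already declared there):

```xml
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <!-- keep the <version> already declared here -->
  <!-- <scope>provided</scope> -->
</dependency>
```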
3.5 org.apache.spark.SparkException: A master URL must be set in your configuration
```
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:401)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
```
Solution (either of the following):
1. In the Run Configuration, add the VM option -Dspark.master=local
2. Set the master in code: val conf = new SparkConf().setAppName("Spark Pi").setMaster("local[2]")
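Putting the in-code fix in context, a minimal SparkPi-style driver looks roughly like this (a sketch against the Spark 1.x API, not the exact source of SparkPi; local[2] means two local worker threads):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalMasterExample {
  def main(args: Array[String]): Unit = {
    // setMaster supplies the master URL directly, so -Dspark.master is not needed
    val conf = new SparkConf().setAppName("Spark Pi").setMaster("local[2]")
    val sc = new SparkContext(conf)
    // Monte Carlo estimate of pi: count random points that land inside the unit circle
    val n = 100000
    val count = sc.parallelize(1 to n).filter { _ =>
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      x * x + y * y < 1
    }.count()
    println(s"Pi is roughly ${4.0 * count / n}")
    sc.stop()
  }
}
```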
3.6 Exception in thread "main" java.lang.NoClassDefFoundError: org/eclipse/jetty/util/thread/ThreadPool
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/eclipse/jetty/util/thread/ThreadPool
	at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:78)
	at org.apache.spark.ui.WebUI$$anonfun$attachTab$1.apply(WebUI.scala:62)
	at org.apache.spark.ui.WebUI$$anonfun$attachTab$1.apply(WebUI.scala:62)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.ui.WebUI.attachTab(WebUI.scala:62)
	at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:63)
	at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:76)
	at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:195)
	at org.apache.spark.ui.SparkUI$.createLiveUI(SparkUI.scala:146)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:473)
	at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:29)
	at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
```
Solution: add the Jetty dependencies to spark-examples' pom.xml:

```xml
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-server</artifactId>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-continuation</artifactId>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-servlet</artifactId>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-security</artifactId>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-http</artifactId>
  <scope>compile</scope>
</dependency>
```