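I. To read the Spark source code comfortably, build Spark from source first, so that you can add your own comments to it.
The procedure is as follows:
1. Download the Spark 2.4.3 source from the official site and import it into IDEA.
2. For the build steps, see the official guide: http://spark.apache.org/docs/2.4.3/building-spark.html
Many people online have also written up their own approaches; drawing on those, here are the steps I used for my build:
a) Point the Maven repository at the Aliyun mirror (in the pom.xml at the root of the Spark source)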
<repositories>
  <!-- This should be at top, it makes maven try the central repo first and then others and hence faster dep resolution
  <repository>
    <id>central</id>
    <name>Maven Repository</name>
    <url>https://repo.maven.apache.org/maven2</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository> -->
  <repository>
    <id>maven-ali</id>
    <url>http://maven.aliyun.com/nexus/content/groups/public//</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
      <checksumPolicy>fail</checksumPolicy>
    </snapshots>
  </repository>
</repositories>
b) Give Maven more memory
vim /etc/profile
Append at the end of the file:
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
Apply the change:
source /etc/profile
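If editing /etc/profile is not convenient (it needs root and affects every user), the same setting can be applied to the current shell session only. A minimal sketch, assuming a POSIX-style shell and reusing the values from step b):

```shell
# Set the Maven JVM options for this shell session only.
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"

# Print the value to confirm it is exported before starting the build.
echo "MAVEN_OPTS=$MAVEN_OPTS"
```

The variable disappears when the terminal is closed, so it has to be set again before a later build.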
c) Build the source
mvn -Pyarn -Phadoop-2.7 -Dscala-2.11 -DskipTests clean package
The whole build takes a little over 20 minutes.
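After the build finishes, a quick way to check that it produced output is to look for jars under assembly/target/scala-2.11/jars in the source tree. The helper below is only a sketch: the function name spark_build_ok is hypothetical, and the path assumes the default Scala 2.11 build of Spark 2.4.3.

```shell
# Hypothetical helper: succeed if the Spark source tree contains built jars.
# $1 is the Spark source root; defaulting to "." is an assumption.
spark_build_ok() {
  src_root="${1:-.}"
  jars_dir="$src_root/assembly/target/scala-2.11/jars"
  # Treat the build as done if the directory holds at least one jar file.
  ls "$jars_dir"/*.jar >/dev/null 2>&1
}
```

For example, run from the Spark source root: spark_build_ok . && echo "build output found"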