1. Install Maven
(1) Set MAVEN_HOME.
(2) Add $MAVEN_HOME/bin to the PATH variable.
(3) Set the MAVEN_OPTS memory parameters:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
If this is not set, the build is likely to fail with errors like the following, because compiling Spark needs a lot of memory:
[INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-2.10/classes...
[ERROR] PermGen space -> [Help 1]
[INFO] Compiling 203 Scala sources and 9 Java sources to /Users/me/Development/spark/core/target/scala-2.10/classes...
[ERROR] Java heap space -> [Help 1]
2. Build Spark
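Before starting the build, it may help to confirm that the settings from step 1 are in effect. The following is only a minimal sketch: the install location /opt/maven is a hypothetical placeholder, not something these notes specify, so replace it with the directory where Maven was actually unpacked.

```shell
# Assumption: Maven was unpacked to /opt/maven (hypothetical path).
# An already exported MAVEN_HOME takes precedence over this default.
export MAVEN_HOME="${MAVEN_HOME:-/opt/maven}"
export PATH="$MAVEN_HOME/bin:$PATH"
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

# Check that mvn can be found before starting a long build.
if command -v mvn >/dev/null 2>&1; then
  mvn -version
else
  echo "mvn not found on PATH; check MAVEN_HOME=$MAVEN_HOME"
fi
```

Placing the three export lines in a shell profile such as ~/.bash_profile would make them apply to every new terminal session.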
(1) Download Spark from:
http://spark.apache.org/downloads.html
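(2) Extract the downloaded file.
(3) Change into the root directory of the extracted source.
Modify the source file mllib\src\main\scala\org\apache\spark\mllib\optimization\Gradient.scala. Without this change, the build fails with the following error: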
[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.4.0:check (default) on project spark-mllib_2.10: Failed during scalastyle execution : You have 1 Scalastyle violation(s). -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :spark-mllib_2.10
Delete the two lines containing "Our loss function"; otherwise the build fails with the Scalastyle error shown above.
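The deletion can be done in any editor. As one possible sketch, the helper below (remove_loss_lines is a hypothetical name introduced here) strips every line containing "Our loss function" and keeps a .bak copy of the original. It assumes it is run from the Spark source root and writes the path with forward slashes, as a Unix shell expects.

```shell
# Hypothetical helper: remove every line containing "Our loss function"
# from the given file, keeping the original as <file>.bak.
remove_loss_lines() {
  file="$1"
  cp "$file" "$file.bak"
  # grep -v prints the lines that do NOT match; -F treats the pattern as a fixed string.
  # "|| true" covers the case where no lines remain and grep exits with status 1.
  grep -vF 'Our loss function' "$file.bak" > "$file" || true
}

# Assumption: the current directory is the Spark source root.
GRADIENT=mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala
if [ -f "$GRADIENT" ]; then
  remove_loss_lines "$GRADIENT"
fi
```

Comparing the file with its .bak copy (for example with diff) before building is a quick way to confirm that only the two intended lines were removed.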
(4) Run the following command in the root directory to build:
mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
If the YARN and Hadoop versions differ, specify each version number separately:
mvn -Pyarn-alpha -Phadoop-2.6 -Dhadoop.version=2.6.0 -Dyarn.version=2.6.0 -DskipTests clean package
The build takes a long time, so be patient.
(5) Alternatively, skip (4) and build and package in one step with:
./make-distribution.sh --name hadoop2.6 --tgz -Pyarn -Phive -Phive-thriftserver -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests
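In the commands from step (4), the profile name and the hadoop.version value have to agree. One way to keep them in sync is to derive both from a single variable, as in the sketch below. HADOOP_VERSION, HADOOP_PROFILE and BUILD_CMD are names introduced here for illustration, and the sketch assumes that a profile named after the major.minor version exists in the Spark source being built, which is not true for every Hadoop release.

```shell
# Assumption: the target Hadoop version; 2.6.0 matches the commands in these notes.
HADOOP_VERSION="${HADOOP_VERSION:-2.6.0}"

# Derive the profile name from major.minor, e.g. 2.6.0 -> hadoop-2.6.
HADOOP_PROFILE="hadoop-${HADOOP_VERSION%.*}"

# Assemble the build command from step (4).
BUILD_CMD="mvn -Pyarn -P$HADOOP_PROFILE -Dhadoop.version=$HADOOP_VERSION -DskipTests clean package"

# Print the command for review instead of running it.
echo "$BUILD_CMD"
```

The sketch only prints the command; once it looks right, it can be copied and run from the Spark root directory.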