I wanted to add the Spark-LIBLINEAR-1.95 jar as a dependency of a newly created Spark application. The code of the main object is as follows:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import tw.edu.ntu.csie.liblinear._

object LR {
  def main(args: Array[String]) {
    val logFile = "/user/gaopeng/epsilon_normalized_converse.txt"
    val conf = new SparkConf().setAppName("Spark-LIBLINEAR")
    val sc = new SparkContext(conf)
    // Load training data in LIBSVM format.
    val data = Utils.loadLibSVMData(sc, logFile)
    // Train the model.
    val model = SparkLiblinear.train(data, "-s 0 -c 1 -e 0.00035 -N 8")
    // Predict on the training set and compute the accuracy.
    val labelAndPreds = data.map { point =>
      val prediction = model.predict(point)
      (point.y, prediction)
    }
    val accuracy = labelAndPreds.filter(r => r._1 == r._2).count.toDouble / data.count
    println("Training Accuracy = " + accuracy)
  }
}
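For the code above to compile, sbt has to see the Spark-LIBLINEAR classes at compile time. One simple alternative to copying the sources (which is what I ended up doing, see the final layout below) is to drop the jar into a lib/ directory, which sbt treats as an unmanaged dependency; the jar file name here is an assumption:

./lib/spark-liblinear-1.95.jar   (sbt automatically puts any jar under lib/ on the classpath)
./build.sbt
./src/main/scala/Spark-LIBLINEAR.scala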
I consulted the official documentation at http://spark.apache.org/docs/1.2.0/submitting-applications.html: a fat jar should be created with sbt's assembly plugin, but I never managed to get that workflow working.
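For the record, the assembly route I was attempting looks roughly like the sketch below with the 0.11.2 plugin (the version linked in the references that follow). The exact incantation differs between sbt-assembly versions, so treat this as a hedged sketch rather than a verified recipe.

In project/assembly.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

In build.sbt (0.11.x style):

import AssemblyKeys._

assemblySettings

// The submitting-applications page recommends marking Spark itself as
// "provided" so it is not bundled into the fat jar.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"

Then sbt assembly should produce a single fat jar under target/scala-2.10/.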
Below are the relevant references I found, which I then debugged against one by one:
http://www.scala-sbt.org/0.13/tutorial/
http://www.open-open.com/lib/view/open1393753753443.html
http://www.cnblogs.com/jerrylead/archive/2012/08/13/2636115.html
http://segmentfault.com/blog/timger/1190000002484984
http://www.tuicool.com/articles/f26Bjq
https://github.com/sbt/sbt-assembly/tree/0.11.2
https://github.com/CSUG/real_world_scala/blob/master/02_sbt.markdown
Today, in order to finally run my app with spark-submit, I settled on the following directory structure:
./build.sbt
./src/main/scala/Spark-LIBLINEAR.scala
./src/main/scala/tw (the tw.edu.ntu.csie.liblinear package sources, placed directly in the project)
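With this layout, a minimal build.sbt is enough for sbt package. The name, version, and Scala version below are assumptions (Spark 1.2.0 is built against Scala 2.10):

name := "Spark-LIBLINEAR-App"

version := "1.0"

scalaVersion := "2.10.4"

// Spark is provided by the cluster at runtime, so it does not need to be packaged.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"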
Finally, running sbt package in the project root produces my app's jar.
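The resulting jar can then be submitted in the usual way. The jar name below follows from the assumed build.sbt above, and the master URL is an assumption for a standalone cluster:

spark-submit \
  --class LR \
  --master spark://master:7077 \
  target/scala-2.10/spark-liblinear-app_2.10-1.0.jar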