Spark Part 4: Programming Guide
@(SPARK)[spark, big data]
(1) Quick Start: Basic Steps
1. Create a Maven project.
2. Add the Spark dependency to pom.xml:
```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.5.1</version>
</dependency>
```
3. Write the code:
```scala
package com.lujinhong.sparkdemo

import org.apache.spark.SparkContext

object GrepWord {
  // Count the lines under `path` (an HDFS file or directory) that contain `keyWord`.
  def grepCountLog(path: String, keyWord: String): Unit = {
    println("grep " + keyWord + " in " + path + ", the lineCount is: ")
    // Master, app name, etc. are supplied by spark-submit, so the
    // no-argument SparkContext constructor is enough here.
    val all = new SparkContext().textFile(path)
    val ret = all.filter(line => line.contains(keyWord))
    println(ret.count)
  }

  def main(args: Array[String]): Unit = {
    grepCountLog("/tmp/lujinhong", "\"server\"")
  }
}
```
The code above greps for the keyword "server" under a directory in HDFS and prints the number of matching lines.
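The filter-then-count pattern is the same one Scala's collection API uses, which makes the core logic easy to try without a cluster. A minimal plain-Scala sketch (the sample log lines are hypothetical stand-ins for the HDFS file contents):

```scala
object GrepWordLocal {
  def main(args: Array[String]): Unit = {
    // Hypothetical log lines standing in for the contents of the HDFS path.
    val lines = Seq(
      "2015-10-01 12:00:01 \"server\" started",
      "2015-10-01 12:00:02 client connected",
      "2015-10-01 12:00:03 \"server\" accepted connection"
    )
    // Same transformation as the RDD version: keep matching lines, then count.
    val count = lines.count(_.contains("\"server\""))
    println(count) // prints 2
  }
}
```

In the Spark version the `Seq` is replaced by the RDD returned from `textFile`, and the filter and count run distributed across the executors instead of in a single JVM.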
4. Package the code (e.g. with `mvn clean package`, which produces the jar under `target/`).
5. Run the code:

```shell
/home/hadoop/spark/bin/spark-submit --master yarn-client \
  --class com.lujinhong.sparkdemo.GrepWord \
  target/sparkdemo-0.0.1-SNAPSHOT.jar
```