Spark Programming

Quick Start: the Spark Shell

./bin/spark-shell
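This launches Spark's interactive Scala shell (run it from the root of your Spark installation). Spark's primary abstraction is a distributed collection of items called a Dataset; we start by creating one from the text of the README file: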


scala> val textFile = spark.read.textFile("README.md")

textFile: org.apache.spark.sql.Dataset[String] = [value: string]


scala> textFile.count() // Number of items in this Dataset

res0: Long = 126 // May be different from yours as README.md will change over time, similar to other outputs


scala> textFile.first() // First item in this Dataset

res1: String = # Apache Spark
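
We can transform this Dataset into a new one by calling filter, which here returns the subset of lines that contain "Spark" (this defines the linesWithSpark Dataset that is cached later in this session):

scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))

linesWithSpark: org.apache.spark.sql.Dataset[String] = [value: string]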


scala> val wordCounts = textFile.flatMap(line => line.split(" ")).groupByKey(identity).count()

wordCounts: org.apache.spark.sql.Dataset[(String, Long)] = [value: string, count(1): bigint]
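
At this point wordCounts is still a Dataset and nothing has been computed; to materialize the word counts we can call collect (the exact values depend on your copy of README.md):

scala> wordCounts.collect()

res6: Array[(String, Long)] = Array((means,1), (under,2), (this,3), ...) // values vary with your README.md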


Datasets can also be pulled into a cluster-wide in-memory cache. Let's mark the linesWithSpark Dataset defined above to be cached:

scala> linesWithSpark.cache()

res7: linesWithSpark.type = [value: string]
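
Counting linesWithSpark twice now shows the cache in action: the first count() materializes and caches the Dataset, and the second is served from memory (the counts below are illustrative and will differ as README.md changes):

scala> linesWithSpark.count()

res8: Long = 15

scala> linesWithSpark.count()

res9: Long = 15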


Applications

We’ll create a very simple Spark application in Java, SimpleApp.java, which counts the number of lines containing 'a' and the number containing 'b' in Spark's README:

/* SimpleApp.java */

import org.apache.spark.sql.SparkSession;

import org.apache.spark.sql.Dataset;


public class SimpleApp {

  public static void main(String[] args) {

    String logFile = "YOUR_SPARK_HOME/README.md"; // Should be some file on your system

    SparkSession spark = SparkSession.builder().appName("Simple Application").getOrCreate();

    Dataset<String> logData = spark.read().textFile(logFile).cache();


    long numAs = logData.filter(s -> s.contains("a")).count();

    long numBs = logData.filter(s -> s.contains("b")).count();


    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);


    spark.stop();

  }

}
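
Note that you'll need to replace YOUR_SPARK_HOME with the location where Spark is installed. The lambda expressions passed to filter require Java 8; on older Java versions you would implement the org.apache.spark.api.java.function.FilterFunction interface instead.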



To build the program, we also write a Maven pom.xml file that lists Spark as a dependency. Note that Spark artifacts are tagged with a Scala version.

<project>

  <groupId>edu.berkeley</groupId>

  <artifactId>simple-project</artifactId>

  <modelVersion>4.0.0</modelVersion>

  <name>Simple Project</name>

  <packaging>jar</packaging>

  <version>1.0</version>

  <dependencies>

    <dependency> <!-- Spark dependency -->

      <groupId>org.apache.spark</groupId>

      <artifactId>spark-sql_2.11</artifactId>

      <version>2.3.0</version>

    </dependency>

  </dependencies>

</project>
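
The _2.11 suffix on spark-sql_2.11 is the Scala version the artifact was built against; pick the suffix and Spark version (2.3.0 here) that match your installation.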


We lay out these files according to the canonical Maven directory structure:

$ find .

./pom.xml

./src

./src/main

./src/main/java

./src/main/java/SimpleApp.java



Now, we can package the application using Maven and execute it with ./bin/spark-submit.

# Package a JAR containing your application

$ mvn package

...

[INFO] Building jar: {..}/{..}/target/simple-project-1.0.jar


# Use spark-submit to run your application

$ YOUR_SPARK_HOME/bin/spark-submit \

  --class "SimpleApp" \

  --master local[4] \

  target/simple-project-1.0.jar

...

Lines with a: 46, lines with b: 23
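
The --master local[4] flag runs the application locally with four worker threads; to run on a cluster, pass the cluster's master URL (for example, spark://HOST:PORT) instead.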
