How to write a Spark program in IDEA and submit it to a cluster
1. Install the Scala SDK
1. Download the Scala installer
# Download page
https://www.scala-lang.org/download
# Windows installer
https://downloads.lightbend.com/scala/2.13.1/scala-2.13.1.msi
2. Configure the environment variables
- Add a new system variable SCALA_HOME pointing to the Scala install directory
- Append %SCALA_HOME%\bin; to the PATH variable
- Configure CLASSPATH
- Open cmd and run scala: if version output like the sketch below appears, the installation succeeded
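A minimal check, assuming scala is now on the PATH: start the REPL from cmd and print the version string shipped with the standard library.

// In a cmd window, run `scala` to open the REPL, then evaluate:
println(util.Properties.versionString)  // prints e.g. "version 2.13.1", matching the SDK installed above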
2. Configuration in IDEA
1. Install the Scala plugin
2. Create a Maven project
3. After the project is created, right-click the project name
3. Add Scala framework support
4. Import the Maven dependencies
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>cn.itcast</groupId>
    <artifactId>spark</artifactId>
    <version>0.1.0</version>

    <properties>
        <scala.version>2.11.8</scala.version>
        <spark.version>2.2.0</spark.version>
        <slf4j.version>1.7.16</slf4j.version>
        <log4j.version>1.2.17</log4j.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>jcl-over-slf4j</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>${log4j.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.10</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.0</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.1.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass></mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
5. Select the Scala SDK
6. Create the Scala source directories (src/main/scala and src/test/scala)
Local debugging
- Create a file named wordcount.txt with the following content
spark hello
hadoop hello
java hello
golang hello
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCountLocal {
  def main(args: Array[String]): Unit = {
    // 1. Create the SparkContext (local mode with 2 threads)
    val conf = new SparkConf().setMaster("local[2]").setAppName("test")
    val sc: SparkContext = new SparkContext(conf)
    // 2. Read the file and count the words
    val source: RDD[String] = sc.textFile("e:/wordcount.txt", 2)
    val words: RDD[String] = source.flatMap { line => line.split(" ") }
    val wordsTuple: RDD[(String, Int)] = words.map { word => (word, 1) }
    val wordsCount: RDD[(String, Int)] = wordsTuple.reduceByKey { (x, y) => x + y }
    // 3. Print the result
    wordsCount.foreach(println(_))
  }
}
Output:
(spark,1)
(golang,1)
(hadoop,1)
(hello,4)
(java,1)
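The order of the lines above is arbitrary because the RDD is partitioned. If you want the counts sorted, a small variation works; this is just a sketch reusing the wordsCount RDD from the code above:

// Sort by count in descending order, bring the (small) result back to the driver, then print
wordsCount
  .sortBy(pair => pair._2, ascending = false)
  .collect()
  .foreach(println)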
Running on the cluster
1. Upload wordcount.txt to HDFS
hdfs dfs -put wordcount.txt /data
2. Write the code
package scl

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object MyRdd {
  def main(args: Array[String]): Unit = {
    // 1. Create the SparkContext (no master is set here; it is supplied by spark-submit)
    val conf = new SparkConf().setAppName("word_count")
    val sc = new SparkContext(conf)
    // 2. Load the file
    //    RDD characteristics:
    //    1. An RDD is a data set
    //    2. An RDD is a programming model
    //    3. RDDs depend on one another
    //    4. An RDD can be partitioned
    val rdd1: RDD[String] = sc.textFile("hdfs:///data/wordcount.txt")
    // 3. Process
    //    1. Split each line into words
    val rdd2: RDD[String] = rdd1.flatMap(item => item.split(" "))
    //    2. Map each word to a count of 1
    val rdd3: RDD[(String, Int)] = rdd2.map(item => (item, 1))
    //    3. Aggregate the counts by key
    val rdd4: RDD[(String, Int)] = rdd3.reduceByKey((curr, agg) => curr + agg)
    // 4. Collect and print the result
    val result: Array[(String, Int)] = rdd4.collect()
    result.foreach(item => println(item))
  }
}
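Since the JAR is launched with spark-submit, it is also common to take the input and output paths from the command line instead of hard-coding them. A minimal sketch of that variant (WordCountArgs is a hypothetical class name used only for this example; the paths arrive as the app options of spark-submit):

package scl

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical variant: spark-submit ... --class scl.WordCountArgs <jar> <input path> <output path>
object WordCountArgs {
  def main(args: Array[String]): Unit = {
    val Array(inputPath, outputPath) = args          // expects exactly two arguments
    val sc = new SparkContext(new SparkConf().setAppName("word_count"))
    sc.textFile(inputPath)
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(outputPath)                    // writes part files into the output directory
    sc.stop()
  }
}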
3. The spark-submit command
spark-submit [options] <app jar> <app options>
- app jar: the application JAR
- app options: the arguments passed to the application's main method
- options: options for submitting the application, which include the following

Available options:

Parameter | Description |
---|---|
`--master <url>` | Same as the Spark shell's master: a spark, yarn, mesos, kubernetes, etc. URL |
`--deploy-mode <client or cluster>` | Where the Driver runs: client (locally) or cluster (on a Worker) |
`--class <class full name>` | The class inside the JAR that is the program entry point |
`--jars <dependencies path>` | Location of dependency JARs |
`--driver-memory <memory size>` | Memory for the Driver process, default 1G (1024M) |
`--executor-memory <memory size>` | Memory for each Executor, default 1G |
4. Submit to a Spark Standalone cluster
1. Package the project (e.g. mvn package)
2. The generated JAR file (under the target directory)
3. Go to the bin folder of the Spark installation directory and run spark-submit
cd /usr/local/spark/bin
# --master : the master URL of the cluster to submit to
# --class  : the fully qualified name of the main class in the JAR
# the last argument is the JAR to run
spark-submit --master spark://zhen:7077 \
  --class scl.MyRdd \
  /home/spark/original-lmspark-1.0-SNAPSHOT.jar
4. On the Spark web UI you can see the job running
5. Results
Wrapping up
That's all for today. Feel free to add me on WeChat so we can improve together, and if this helped, a tip is appreciated.