Spark on YARN: History Server Configuration; Transformation vs. Action Operators

小胖超凶哦！

Last modified: 2022-05-15 21:50:04

Categories: Spark基础 (Spark Basics), 初学大数据 (Big Data for Beginners); Tags: Spark

First published: 2022-05-15 21:29:11

Article link: https://blog.csdn.net/ZZJXP/article/details/124787410

[root@master spark-2.4.5]# cd conf/
[root@master conf]# ls
docker.properties.template   slaves.template
fairscheduler.xml.template   spark-defaults.conf.template
log4j.properties.template    spark-env.sh
metrics.properties.template  spark-env.sh.template
slaves
[root@master conf]# cp spark-defaults.conf.template spark-defaults.conf
[root@master conf]# ls
docker.properties.template   slaves.template
fairscheduler.xml.template   spark-defaults.conf
log4j.properties.template    spark-defaults.conf.template
metrics.properties.template  spark-env.sh
slaves                       spark-env.sh.template
[root@master conf]# vim spark-defaults.conf

spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://master:9000/user/spark/applicationHistory
spark.yarn.historyServer.address        master:18080
spark.eventLog.compress true
spark.history.fs.logDirectory   hdfs://master:9000/user/spark/applicationHistory
spark.history.retainedApplications      15
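
The `spark.eventLog.*` entries above are read by each application when it writes its event log, while the `spark.history.*` entries are read by the History Server when it loads those logs, so both sides have to point at the same directory. As a minimal sketch (not part of the original post), the application-side entries could also be set per application on a `SparkConf`. The object name `EventLogConfSketch` and the method `buildConf` are hypothetical; the values simply mirror the file above.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper: builds a SparkConf carrying the same application-side
// settings as the spark-defaults.conf entries above. No SparkContext is
// created, so this only assembles and prints the configuration.
object EventLogConfSketch {
  def buildConf(appName: String): SparkConf =
    new SparkConf()
      .setAppName(appName)
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs://master:9000/user/spark/applicationHistory")
      .set("spark.eventLog.compress", "true")
      .set("spark.yarn.historyServer.address", "master:18080")

  def main(args: Array[String]): Unit = {
    val conf = buildConf("EventLogConfSketch")
    // Print every explicitly set key=value pair, one per line.
    println(conf.toDebugString)
  }
}
```

Values set on a `SparkConf` take precedence over `spark-defaults.conf`. The two `spark.history.*` entries stay in the file, since the History Server process reads them when it starts.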

[root@master conf]# hdfs dfs -mkdir -p /user/spark/applicationHistory
[root@master conf]# cd ..
[root@master spark-2.4.5]# cd sbin/
[root@master sbin]# ls
slaves.sh                       start-slaves.sh
spark-config.sh                 start-thriftserver.sh
spark-daemon.sh                 stop-all.sh
spark-daemons.sh                stop-history-server.sh
start-all.sh                    stop-master.sh
start-history-server.sh         stop-mesos-dispatcher.sh
start-master.sh                 stop-mesos-shuffle-service.sh
start-mesos-dispatcher.sh       stop-shuffle-service.sh
start-mesos-shuffle-service.sh  stop-slave.sh
start-shuffle-service.sh        stop-slaves.sh
start-slave.sh                  stop-thriftserver.sh
[root@master sbin]# ./start-history-server.sh 
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/soft/spark-2.4.5/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-master.out

[root@master soft]# cd spark-2.4.5/examples/jars/
[root@master jars]# spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --executor-memory 512M --num-executors 2 spark-examples_2.11-2.4.5.jar 100
[root@master jars]# jps
62017 Jps
62018 Jps
61589 HistoryServer
58953 NameNode
58554 ResourceManager
59163 SecondaryNameNode
[root@master jars]# cd ..
[root@master examples]# cd ..
[root@master spark-2.4.5]# ls
bin   examples    LICENSE   NOTICE  README.md  yarn
conf  jars        licenses  python  RELEASE
data  kubernetes  logs      R       sbin
[root@master spark-2.4.5]# ./sbin/stop-history-server.sh 
stopping org.apache.spark.deploy.history.HistoryServer

package com.shujia.core

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Demo03Map {
  def main(args: Array[String]): Unit = {
    // Create the SparkContext
    val conf: SparkConf = new SparkConf()
    conf.setAppName("Demo03Map")
    conf.setMaster("local")

    val sc: SparkContext = new SparkContext(conf)

    // Build RDDs from Scala collections
    val intRDD: RDD[Int] = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7))

    val kvRDD: RDD[(String, String)] = sc.makeRDD(List(("k1", "v1"), "k2" -> "v2"))

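    /**
     * Commonly used operators in Spark:
     * Transformations: turn one RDD into another RDD. They are lazy and only run once an Action triggers them.
     * Actions: turn an RDD into a plain Scala value, or produce a side effect such as output.
     *
     * How to tell a transformation from an action?
     * Check whether the operator's return type is an RDD.
     *
     * Levels of a Spark job: Application -> Job -> Stage -> Task
     * Each Action call triggers at least one Job, so the number of Jobs follows the number of Action calls.
     */

    // Multiply every element by 2
    /**
     * map (transformation): similar to map on a Scala List.
     * Each element of the RDD is passed to the function f given to map, and the results form a new RDD.
     *
     * foreach (action): similar to foreach on a Scala List.
     * Each element of the RDD is passed to the function given to foreach; unlike map, it returns nothing.
     * It is typically used at the end to print the data or write it to an external system.
     */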
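    val int2RDD: RDD[Int] = intRDD
      .map(i => {
        println("Entered the map logic")
        i * 2
      })
    // Each Action call triggers a Job
    int2RDD.foreach(println) // foreach is an action, so it triggers a Job
    int2RDD.foreach(println) // foreach is an action, so it triggers a Job
    int2RDD.foreach(println) // foreach is an action, so it triggers a Job
    int2RDD.foreach(println) // foreach is an action, so it triggers a Job
    int2RDD.foreach(println) // foreach is an action, so it triggers a Job

    // map on a plain Scala List, for comparison: it is eager and runs immediately, with no action needed
    List(1, 2, 3, 4, 5, 6).map(i => {
      println("Entered the List map logic")
      i * 3
    })

    // Keep the driver alive (presumably so the Spark UI can still be inspected); stop the process manually
    while (true) {
      Thread.sleep(1000)
    }
  }
}

Entered the map logic
2
Entered the map logic
4
Entered the map logic
6
Entered the map logic
8
Entered the map logic
10
Entered the map logic
12
Entered the map logic
14
Entered the map logic
2
Entered the map logic
4
Entered the map logic
6
Entered the map logic
8
Entered the map logic
10
Entered the map logic
12
Entered the map logic
14
Entered the map logic
2
Entered the map logic
4
Entered the map logic
6
Entered the map logic
8
Entered the map logic
10
Entered the map logic
12
Entered the map logic
14
Entered the map logic
2
Entered the map logic
4
Entered the map logic
6
Entered the map logic
8
Entered the map logic
10
Entered the map logic
12
Entered the map logic
14
Entered the map logic
2
Entered the map logic
4
Entered the map logic
6
Entered the map logic
8
Entered the map logic
10
Entered the map logic
12
Entered the map logic
14
Entered the List map logic
Entered the List map logic
Entered the List map logic
Entered the List map logic
Entered the List map logic
Entered the List map logic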
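
In the output above, the map message appears 35 times: 7 elements times 5 `foreach` calls. The `map` itself runs nothing; each action starts a new Job that recomputes it from the source RDD. The following is a minimal sketch of the same effect, not code from the original post. It assumes a local Spark 2.x setup like the one used above, and the object name `MapCallCounter` is hypothetical. It uses a `LongAccumulator` to count how often the function passed to `map` actually runs.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object (not from the original post): counts how many
// times the function passed to map actually runs.
object MapCallCounter {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MapCallCounter").setMaster("local")
    val sc = new SparkContext(conf)

    // Counter that tasks add to and the driver can read.
    val mapCalls = sc.longAccumulator("mapCalls")

    val doubled = sc
      .parallelize(List(1, 2, 3, 4, 5, 6, 7))
      .map { i =>
        mapCalls.add(1) // executed once per element, in every job
        i * 2
      }

    // Only a transformation has been declared so far; no job has run.
    println(s"after map, before any action: ${mapCalls.value}")

    doubled.count() // first action: one job, map runs over all elements
    println(s"after 1st action: ${mapCalls.value}")

    doubled.count() // second action: a new job recomputes the map
    println(s"after 2nd action: ${mapCalls.value}")

    sc.stop()
  }
}
```

In a local run without task retries, the three printed values should be 0, 7 and 14. Spark only guarantees exactly-once accumulator updates inside actions, so a counter updated inside a transformation like this one is suitable for illustration rather than for precise bookkeeping.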
