Spark Quick Notes

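These are working notes on the big data processing tool Spark. They cover Spark's main terms, its deploy modes, the workflow of YARN cluster mode, Dataset features, Transformation operations, and problems encountered along with their solutions, such as the roles of the Driver and the ApplicationMaster and how to handle classes that cannot be found when submitting a Spark job.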

Big Data Work Notes: Spark


Spark Terms
  • Driver: the process that runs the application's main() function and creates the SparkContext (along with its DAGScheduler and TaskScheduler). It is an application-level process.
  • YARN ApplicationMaster (application-level service): a lightweight process that coordinates the execution of an application's tasks and asks the ResourceManager for resource containers to run them. It monitors tasks, restarts failed ones, etc. It can run any type of task, whether MapReduce tasks or Spark tasks.
  • Master: the cluster manager of a Spark standalone cluster. In general, a cluster manager is an external, long-running service that acquires resources on the cluster at the cluster level (e.g., Standalone, Mesos, YARN). The Master web UI is the web UI server for the standalone master. Example log output when the master starts:
INFO Master: Starting Spark master at spark://japila.local:7077
INFO Master: Running Spark version 1.6.0-SNAPSHOT
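The log lines above show the URL an application uses to reach the standalone master: `spark://<host>:<port>` (7077 is the default port). As a minimal Python sketch, assuming the log line has exactly the format shown above, the helpers below pull that URL out of the log and build a `spark-submit` argument list that points an application at it. The helper names, `my-app.jar`, and `com.example.MyApp` are hypothetical placeholders; only the `--master` and `--class` options belong to the real `spark-submit` CLI.

```python
import re
from typing import List, Optional, Tuple

# Assumes the startup line format shown above, e.g.
# "INFO Master: Starting Spark master at spark://japila.local:7077"
_MASTER_LINE = re.compile(r"Starting Spark master at (spark://([^:\s]+):(\d+))")


def master_url_from_log(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (url, host, port) if the line announces the master URL, else None."""
    match = _MASTER_LINE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))


def connect_args(master_url: str, app_jar: str, main_class: str) -> List[str]:
    """Build a spark-submit argument list that targets the given master.

    app_jar and main_class are hypothetical placeholders for your own application.
    """
    # spark-submit expects options first and the application jar last.
    return ["spark-submit", "--master", master_url, "--class", main_class, app_jar]


if __name__ == "__main__":
    log_line = "INFO Master: Starting Spark master at spark://japila.local:7077"
    parsed = master_url_from_log(log_line)
    if parsed is not None:
        url, host, port = parsed
        print(f"master {host} on port {port}")
        print(" ".join(connect_args(url, "my-app.jar", "com.example.MyApp")))
```

Passing the returned list to `subprocess.run` would launch the job, assuming `spark-submit` is on the `PATH` of the submitting machine.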

Spark Deploy Mode
  • Standalone (cluster): Spark manages the whole cluster by itself (its own cluster manager, the Master, plus the worker/slave nodes). By default, each application tries to use all available nodes in the cluster.

    • By default, an application acquires all cores in the cluster, which only makes sense if you run just one application at a time. You can limit the number of cores by setting spark.cores.max in SparkConf.
    • Alternatively, add the following to conf/spark-env.sh on the cluster master process to change the default for applications that don't set spark.cores.max:
      export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=<value>"
  • YARN: uses the YARN ResourceManager to manage cluster resources.

    • Client: use when your application is submitted from a machine that is physically co-located with your worker machines. The driver is launched directly within the spark-submit process, which acts as a client to the cluster.

    • Cluster: use when your application is submitted remotely (e.g., locally from a laptop). The driver runs inside the YARN ApplicationMaster on the cluster, which minimizes network latency between the driver and the executors.
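The deploy-mode rules above can be condensed into a short Python sketch, under stated assumptions. `standalone_core_cap` is a simplified model of the standalone core limits (it ignores per-executor settings such as `spark.executor.cores` and how cores are spread across workers); `submit_args` only assembles an argument list for the real `spark-submit` CLI using its `--master`, `--deploy-mode`, `--class`, and `--conf` options; `driver_location` restates where the driver runs in each YARN mode. The function names, the jar name, and the class name are hypothetical.

```python
from typing import Dict, List, Optional


def standalone_core_cap(free_cores: int,
                        app_conf: Dict[str, str],
                        default_cores: Optional[int] = None) -> int:
    """Simplified model of how many cores a standalone application can take.

    - spark.cores.max set by the application wins;
    - otherwise the master-side spark.deploy.defaultCores applies;
    - if neither is set, the application may take every free core.
    """
    if "spark.cores.max" in app_conf:
        limit = int(app_conf["spark.cores.max"])
    elif default_cores is not None:
        limit = default_cores
    else:
        return free_cores  # no limit configured: grab the whole cluster
    return min(limit, free_cores)


def submit_args(master: str, deploy_mode: str, app_jar: str, main_class: str,
                conf: Optional[Dict[str, str]] = None) -> List[str]:
    """Assemble a spark-submit argument list (nothing is executed here)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    args = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode,
            "--class", main_class]
    for key, value in sorted((conf or {}).items()):
        args += ["--conf", f"{key}={value}"]  # one --conf per property
    args.append(app_jar)  # the application jar comes after all options
    return args


def driver_location(deploy_mode: str) -> str:
    """Where the driver runs for a YARN application in the given deploy mode."""
    if deploy_mode == "client":
        return "inside the local spark-submit process"
    if deploy_mode == "cluster":
        return "inside the YARN ApplicationMaster on the cluster"
    raise ValueError("deploy_mode must be 'client' or 'cluster'")


if __name__ == "__main__":
    # Standalone cluster with 64 free cores, application capped at 8.
    print(standalone_core_cap(64, {"spark.cores.max": "8"}))
    print(" ".join(submit_args("spark://japila.local:7077", "client",
                               "my-app.jar", "com.example.MyApp",
                               {"spark.cores.max": "8"})))
    # Where the driver lives in each YARN mode.
    for mode in ("client", "cluster"):
        print(mode, "->", driver_location(mode))
    # A remote submission to YARN that runs the driver on the cluster.
    print(" ".join(submit_args("yarn", "cluster", "my-app.jar", "com.example.MyApp")))
```

Keeping the command as a list rather than one string makes it easy to pass to `subprocess.run` without shell-quoting problems.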
