Spark面试术语总结

Are you more knowledgeable today than you were yesterday? Have you made more progress than yesterday? Rather than wasting time accomplishing nothing, it is better to sit down and learn something solid. Your progress may be slow, but as long as you don't give up, you will get there.
————— To everyone who is working hard
Today's notes:
Glossary
The following glossary summarizes terms you'll see used to refer to cluster concepts:

Application: User program built on Spark. Consists of a driver program and executors on the cluster.
Application jar: A jar containing the user's Spark application. In some cases users will want to create an "uber jar" containing their application along with its dependencies. The user's jar should never include Hadoop or Spark libraries; however, these will be added at runtime.
Driver program: The process running the main() function of the application and creating the SparkContext.
Cluster manager: An external service for acquiring resources on the cluster (e.g. standalone manager, Mesos, YARN).
Deploy mode: Distinguishes where the driver process runs. In "cluster" mode, the framework launches the driver inside of the cluster. In "client" mode, the submitter launches the driver outside of the cluster.
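The deploy mode is usually chosen at submit time via the --deploy-mode flag of spark-submit. A hedged illustration (the class name, jar name, and YARN master are placeholder assumptions, not from this article):

```shell
# Client mode: the driver runs on the machine where spark-submit is invoked,
# so you see driver output (and driver logs) in your local terminal.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.WordCount \
  my-app.jar

# Cluster mode: the framework launches the driver inside the cluster,
# and spark-submit returns after handing the application over.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.WordCount \
  my-app.jar
```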
Worker node: Any node that can run application code in the cluster.
Executor: A process launched for an application on a worker node, which runs tasks and keeps data in memory or on disk across them. Each application has its own executors.
Task: A unit of work that will be sent to one executor.
Job: A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action (e.g. save, collect); you'll see this term used in the driver's logs.
Stage: Each job gets divided into smaller sets of tasks called stages that depend on each other (similar to the map and reduce stages in MapReduce); you'll see this term used in the driver's logs.
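As a rough mental model (this is an illustration only, not Spark's real DAGScheduler code): a job is cut into stages at shuffle boundaries, i.e. wherever a wide dependency such as reduceByKey or groupByKey appears, while narrow transformations like map and filter are pipelined into the same stage. A minimal sketch:

```python
# Toy model of stage splitting: each wide (shuffle) transformation closes the
# current stage and starts a new one. Illustrative only, not Spark internals.

WIDE = {"reduceByKey", "groupByKey", "join", "repartition"}  # shuffle boundaries

def split_into_stages(transformations):
    """Group a linear chain of transformations into stages.

    Narrow transformations (map, filter, ...) are pipelined into the same
    stage; each wide transformation ends the current stage.
    """
    stages, current = [], []
    for t in transformations:
        current.append(t)
        if t in WIDE:          # shuffle boundary: close this stage
            stages.append(current)
            current = []
    if current:                # whatever remains forms the final stage
        stages.append(current)
    return stages

job = ["map", "filter", "reduceByKey", "map", "collect"]
print(split_into_stages(job))
# [['map', 'filter', 'reduceByKey'], ['map', 'collect']]
```

This matches the glossary definition above: one job, two stages that depend on each other across the shuffle.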
To summarize:
Application = 1 Driver + N Executors
Driver: a process; its main() creates the SparkContext
client: the driver runs outside the cluster, on the submitting machine
cluster: the driver runs inside the cluster
Executor: a process that runs tasks (e.g. map/filter)
Worker ==> NM (the YARN NodeManager)
Job ==> triggered by an action operator
Stage ==> one Job may be split into multiple stages
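The "Job ==> action" line above can be sketched with a toy lazy collection: transformations only record work, and nothing runs until an action is called. This is an illustrative stand-in for RDD behavior, not the real Spark API:

```python
# Toy illustration of lazy transformations vs. eager actions. Real RDDs are
# distributed and partitioned; this only mimics the job-triggering behavior.

class ToyRDD:
    def __init__(self, data, ops=None):
        self.data = data
        self.ops = ops or []          # recorded transformations, not yet run

    def map(self, f):                 # transformation: lazy, returns a new ToyRDD
        return ToyRDD(self.data, self.ops + [("map", f)])

    def filter(self, p):              # transformation: lazy
        return ToyRDD(self.data, self.ops + [("filter", p)])

    def collect(self):                # action: triggers the "job" now
        out = self.data
        for kind, fn in self.ops:
            if kind == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

rdd = ToyRDD([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
# No computation has happened yet; collect() is the action that runs the job.
print(rdd.collect())  # [20, 30, 40]
```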
Official docs: https://spark.apache.org/docs/latest/cluster-overview.html
Navigation path: https://spark.apache.org/ » Documentation » Latest Release (Spark 2.4.0) » Deploying » Overview
