Apache Spark: How to read from an HDFS file

weixin_39765796

Tags: spark hdfs java

Article link: https://blog.csdn.net/weixin_39765796/article/details/114437421

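This post discusses how to run a Spark application in cluster mode with `spark-submit`, emphasizing the --files flag, which is used to pass files from the driver to the worker nodes. In cluster mode the driver and the workers may be on different machines, so the files to send must be specified explicitly. The post also covers other parameters such as --master, --deploy-mode, and --executor-memory, and provides configuration examples.

Summary generated by CSDN using intelligent technology.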

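If you run your application in cluster mode with spark-submit, it accepts the --files flag, which is used to pass files from the driver node to the workers. I believe the reason you can run it in local mode is that your driver and workers are on the same machine, whereas in cluster mode the driver and workers may be on different machines. In that case, Spark needs to know which files to send to the worker nodes. You can use the following flags, as described in the book Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia: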

--master

Indicates the cluster manager to connect to. The options for this flag are described in Table 7-1 of the book.

--deploy-mode

Whether to launch the driver program locally (“client”) or on one of the worker machines inside the cluster (“cluster”). In client mode spark-submit will run your driver on the same machine where spark-submit is itself being invoked. In cluster mode, the driver will be shipped to execute on a worker node in the cluster. The default is client mode.

--class

The “main” class of your application if you’re running a Java or Scala program.

--name

A human-readable name for your application. This will be displayed in Spark’s web UI.

--jars

A list of JAR files to upload and place on the classpath of your application. If your application depends on a small number of third-party JARs, you can add them here.

--files

A list of files to be placed in the working directory of your application. This can be used for data files that you want to distribute to each node.

--py-files

A list of files to be added to the PYTHONPATH of your application. This can contain .py, .egg, or .zip files.

--executor-memory

The amount of memory to use for executors, in bytes. Suffixes can be used to specify larger quantities such as “512m” (512 megabytes) or “15g” (15 gigabytes).

--driver-memory

The amount of memory to use for the driver process, in bytes. Suffixes can be used to specify larger quantities such as “512m” (512 megabytes) or “15g” (15 gigabytes).
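The flags described above can be combined into a single spark-submit invocation. The following is a minimal sketch in plain Java that only assembles the command line; the helper class `SparkSubmitCommand`, the class name `com.example.ReadFile`, and every path and memory value are hypothetical placeholders, not taken from the original answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles a spark-submit command line from the flags listed above.
final class SparkSubmitCommand {

    // Builds the argument list. All values passed in are placeholders chosen by the caller.
    static List<String> build(String master, String deployMode, String mainClass,
                              String appName, String files, String executorMemory,
                              String driverMemory, String appJar) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.add("--master");
        cmd.add(master);
        cmd.add("--deploy-mode");
        cmd.add(deployMode);
        cmd.add("--class");
        cmd.add(mainClass);
        cmd.add("--name");
        cmd.add(appName);
        cmd.add("--files");            // files to ship to the working directory of the application
        cmd.add(files);
        cmd.add("--executor-memory");
        cmd.add(executorMemory);
        cmd.add("--driver-memory");
        cmd.add(driverMemory);
        cmd.add(appJar);               // the application JAR comes after all flags
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("yarn", "cluster", "com.example.ReadFile", "read-file-demo",
                "/local/path/lookup.txt", "2g", "1g", "/local/path/app.jar");
        // Print the command instead of running it, so the sketch works without a Spark install.
        System.out.println(String.join(" ", cmd));
        // To launch it for real, one option is: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```

Printing the command rather than executing it keeps the sketch independent of any particular cluster; the same flags can of course be typed directly in a shell.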

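Update: I am assuming that Kiran has a Hadoop setup (as he mentioned elsewhere) and is unable to read from HDFS programmatically. If that is not the case, please ignore this answer.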
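For the case where the input already lives in HDFS, a minimal sketch of reading it programmatically with the Spark Java API might look like the following. It assumes the Spark core library is on the classpath; the class name `ReadFromHdfs` and the NameNode host, port, and path are hypothetical and must be replaced with your cluster's values. Because HDFS is reachable from every node, such a file does not need to be shipped with --files.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class ReadFromHdfs {
    public static void main(String[] args) {
        // The master and deploy mode are left to spark-submit (--master, --deploy-mode).
        SparkConf conf = new SparkConf().setAppName("read-from-hdfs");

        // JavaSparkContext is Closeable, so try-with-resources stops it on exit.
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Hypothetical location: take the path from the first argument if given.
            String path = args.length > 0
                    ? args[0]
                    : "hdfs://namenode:8020/user/example/input.txt";

            // textFile returns one RDD element per line of the file.
            JavaRDD<String> lines = sc.textFile(path);
            System.out.println("line count: " + lines.count());
        }
    }
}
```

Such a class would be passed to spark-submit through --class, with the HDFS path supplied as an application argument after the JAR.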
