Spark Core: sc.textFile vs sc.wholeTextFiles


When loading an RDD from source data, there are two choices that look similar:

scala> val movies = sc.textFile("movies")

scala> val movies = sc.wholeTextFiles("movies")

sc.textFile

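SparkContext’s textFile method, i.e., sc.textFile in the Spark shell, creates an RDD with each line as an element. Each file typically contributes at least one partition, so if there are 10 small files in the movies folder, 10 partitions will be created; a file larger than one input split (usually one HDFS block) is divided into several partitions. You can verify the number of partitions with:

scala> movies.partitions.length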
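
To make the line-per-element behaviour concrete, here is a minimal sketch meant to be pasted into the Spark shell, where sc is already defined. The temporary directory, the file names and the sample text are made up for illustration and are not part of the original example.

```scala
// Paste into spark-shell; `sc` is the SparkContext the shell provides.
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical sample data: three small files, two lines each.
val lineDir: Path = Files.createTempDirectory("movies")
for (i <- 1 to 3) {
  val text = s"movie$i line one\nmovie$i line two\n"
  Files.write(lineDir.resolve(s"movie$i.txt"), text.getBytes(StandardCharsets.UTF_8))
}

// RDD[String]: every line of every file becomes one element.
val lines = sc.textFile(lineDir.toUri.toString)

println(lines.count())            // 3 files x 2 lines = 6 elements
println(lines.partitions.length)  // driven by files and input splits
lines.take(2).foreach(println)    // file boundaries are not visible here
```

Note that the resulting RDD carries no information about which file a line came from.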

sc.wholeTextFiles

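SparkContext’s wholeTextFiles method, i.e., sc.wholeTextFiles in the Spark shell, creates a pair RDD with one element per file. The key is the file name with its full path, such as “hdfs://m1.zettabytes.com:9000/user/hduser/movies/movie1.txt”. The value is the whole content of the file as a String, so each file must fit in memory. Here the number of partitions is not tied to the number of files: small files are combined, and you get 1 or more partitions depending on the total input size and the optional minPartitions hint, whose default is derived from the default parallelism and therefore from how many executor cores you have.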
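
The pair structure can be sketched the same way in the Spark shell. As before, the temporary directory, file names and contents are hypothetical sample data; minPartitions is a real optional parameter of wholeTextFiles, but it is only a hint, so the partition counts printed below may differ between environments.

```scala
// Paste into spark-shell; `sc` is the SparkContext the shell provides.
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical sample data: three small files, two lines each.
val fileDir: Path = Files.createTempDirectory("movies")
for (i <- 1 to 3) {
  val text = s"movie$i line one\nmovie$i line two\n"
  Files.write(fileDir.resolve(s"movie$i.txt"), text.getBytes(StandardCharsets.UTF_8))
}

// RDD[(String, String)]: key = full path of the file, value = its entire content.
val files = sc.wholeTextFiles(fileDir.toUri.toString)

files.collect().foreach { case (path, content) =>
  println(s"$path -> ${content.length} characters")
}
println(files.count())            // one element per file: 3
println(files.partitions.length)  // small files may share a partition

// Ask for more partitions; Spark treats the value as a hint, not a guarantee.
val hinted = sc.wholeTextFiles(fileDir.toUri.toString, minPartitions = 3)
println(hinted.partitions.length)
```

Because the key keeps the path, this form is convenient when each file has to be processed as a unit, for example when parsing one document per file.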
