Basic Spark Transformation and Action Functions

forever19870418

Article link: https://blog.csdn.net/forever19870418/article/details/62222508

Function examples:

map function:

Dataset used:

In [1]: rdd = sc.textFile("file:/home/training/training_materials/data/weblogs/2014-03-15.log")

In [3]: rdd.map(lambda x: x.split()).take(2)
Out[3]:
[[u'234.206.18.239',
u'-',
u'8495',
u'[15/Mar/2014:23:59:30',
u'+0100]',
u'"GET',
u'/KBDOC-00082.html',
u'HTTP/1.0"',
u'200',
u'9054',
u'"http://www.loudacre.com"',
u'"Loudacre',
u'Mobile',
u'Browser',
u'Titanic',
u'2200"'],
[u'234.206.18.239',
u'-',
u'8495',
u'[15/Mar/2014:23:59:30',
u'+0100]',
u'"GET',
u'/theme.css',
u'HTTP/1.0"',
u'200',
u'4552',
u'"http://www.loudacre.com"',
u'"Loudacre',
u'Mobile',
u'Browser',
u'Titanic',
u'2200"']]

In [4]: rdd.map(lambda x: x.split()).map(lambda field: (field[0],field[2])).take(2)
Out[4]: [(u'234.206.18.239', u'8495'), (u'234.206.18.239', u'8495')]

In [5]: rdd.map(lambda x: x.split()).map(lambda field: (field[0]+"/"+field[2])).take(2)
Out[5]: [u'234.206.18.239/8495', u'234.206.18.239/8495']

In [8]: rdd.map(lambda x: len(x)).take(3)
Out[8]: [158, 151, 167]
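The map calls above each produce exactly one output element per input line. A minimal plain-Python sketch of the same per-line logic, with no Spark involved, may make that easier to see. The sample line is copied from the output shown above; the variable names are illustrative assumptions, and list comprehensions stand in for `rdd.map`.

```python
# Plain-Python model of the map examples; this is NOT the Spark API.
# The log line is copied from the transcript output above.
line = ('234.206.18.239 - 8495 [15/Mar/2014:23:59:30 +0100] '
        '"GET /KBDOC-00082.html HTTP/1.0" 200 9054 '
        '"http://www.loudacre.com" "Loudacre Mobile Browser Titanic 2200"')
lines = [line]

# Like rdd.map(lambda x: x.split()): one list of fields per line.
fields = [l.split() for l in lines]

# Like the second map: keep only the IP address (index 0) and user ID (index 2).
pairs = [(f[0], f[2]) for f in fields]

# Like the third map: join the same two fields into a single string.
joined = [f[0] + "/" + f[2] for f in fields]

print(pairs)
print(joined)
```

Because map is one-to-one, `fields`, `pairs`, and `joined` all have the same length as `lines`.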

keyBy function:

In [26]: rdd.keyBy(lambda x: x.split()[2]).take(2)
Out[26]:
[(u'8495',
u'234.206.18.239 - 8495 [15/Mar/2014:23:59:30 +0100] "GET /KBDOC-00082.html HTTP/1.0" 200 9054 "http://www.loudacre.com" "Loudacre Mobile Browser Titanic 2200"'),
(u'8495',
u'234.206.18.239 - 8495 [15/Mar/2014:23:59:30 +0100] "GET /theme.css HTTP/1.0" 200 4552 "http://www.loudacre.com" "Loudacre Mobile Browser Titanic 2200"')]
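As the output above suggests, keyBy pairs each record with a key computed from it and leaves the record itself unchanged; in PySpark, `rdd.keyBy(f)` behaves like `rdd.map(lambda x: (f(x), x))`. The sketch below models this in plain Python. The helper name `key_by` and the abbreviated log lines are assumptions made for illustration.

```python
# Plain-Python model of keyBy; this is NOT the Spark API.
def key_by(records, key_func):
    """Pair every record with key_func(record), keeping the record intact."""
    return [(key_func(r), r) for r in records]

# Abbreviated, hypothetical log lines (the real lines carry more fields).
logs = [
    '234.206.18.239 - 8495 "GET /KBDOC-00082.html HTTP/1.0" 200 9054',
    '234.206.18.239 - 8495 "GET /theme.css HTTP/1.0" 200 4552',
]

# The third whitespace-separated field (index 2) is used as the key.
keyed = key_by(logs, lambda x: x.split()[2])

for key, record in keyed:
    print(key, "->", record)
```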

filter function:

In [7]: rdd.filter(lambda x: ".jpg" in x).take(3)
Out[7]:
[u'64.190.73.51 - 74328 [15/Mar/2014:23:49:14 +0100] "GET /titanic_2400.jpg HTTP/1.0" 200 6021 "http://www.loudacre.com" "Loudacre Mobile Browser iFruit 5"',
u'25.101.226.55 - 72619 [15/Mar/2014:23:32:32 +0100] "GET /ifruit_2.jpg HTTP/1.0" 200 18225 "http://www.loudacre.com" "Loudacre Mobile Browser Sorrento F01L"',
u'34.5.4.77 - 72680 [15/Mar/2014:23:25:12 +0100] "GET /titanic_2100.jpg HTTP/1.0" 200 8290 "http://www.loudacre.com" "Loudacre Mobile Browser Titanic 2400"']

flatMap function:

Dataset used:

In [9]: rdd = sc.textFile("file:/home/training/a.txt")

In [10]: rdd.flatMap(lambda x: x.split()).collect()
Out[10]: [u'a', u'b', u'c', u'd', u'e', u'f', u'g', u'h', u'e', u'f', u'g', u'h']

In [11]: rdd.flatMap(lambda x: x.split()).map(lambda x:(x,1)).collect()
Out[11]:
[(u'a', 1),
(u'b', 1),
(u'c', 1),
(u'd', 1),
(u'e', 1),
(u'f', 1),
(u'g', 1),
(u'h', 1),
(u'e', 1),
(u'f', 1),
(u'g', 1),
(u'h', 1)]
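The difference between map and flatMap is easy to miss in the transcript above: map would return one list per line, while flatMap flattens those lists into a single sequence of words. The following plain-Python sketch shows the contrast. The contents of a.txt are not given in the post, so the three lines below are an assumption chosen to match the output above.

```python
from itertools import chain

# Assumed contents of a.txt (not shown in the original post).
lines = ["a b c d", "e f g h", "e f g h"]

# Like rdd.map(lambda x: x.split()): one list of words per input line.
mapped = [l.split() for l in lines]

# Like rdd.flatMap(lambda x: x.split()): the per-line lists are flattened.
flat_mapped = list(chain.from_iterable(l.split() for l in lines))

# Like the follow-up .map(lambda x: (x, 1)): pair every word with a count of 1.
word_pairs = [(w, 1) for w in flat_mapped]

print(mapped)
print(flat_mapped)
```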

reduceByKey function:

In [12]: rdd.flatMap(lambda x: x.split()).map(lambda x:(x,1)).reduceByKey(lambda x,y:(x+y)).collect()
Out[12]:
[(u'a', 1),
(u'c', 1),
(u'b', 1),
(u'e', 2),
(u'd', 1),
(u'g', 2),
(u'f', 2),
(u'h', 2)]
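reduceByKey merges all values that share a key by applying the given two-argument function repeatedly. A rough single-machine model in plain Python is sketched below; the helper name `reduce_by_key` is an assumption for illustration, and real Spark performs the merge per partition and then across partitions, which is why the function should be associative and commutative.

```python
# Plain-Python model of reduceByKey; this is NOT the Spark API.
def reduce_by_key(pairs, func):
    """Fold the values of each key together with func(accumulated, value)."""
    result = {}
    for key, value in pairs:
        if key in result:
            result[key] = func(result[key], value)
        else:
            result[key] = value
    return result

# The words produced by the flatMap example above.
words = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'e', 'f', 'g', 'h']

# Word count: pair each word with 1, then add up the 1s per word.
counts = reduce_by_key([(w, 1) for w in words], lambda x, y: x + y)

print(counts)
```

The order of the keys in the Spark output above (a, c, b, e, ...) is not the input order; the order of results returned by reduceByKey should not be relied upon.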

upper():

Dataset used:

In [15]: rdd = sc.textFile("file:/home/training/c.txt")

In [16]: rdd.map(lambda x: x.upper()).collect()
Out[16]:
[u"I'VE NEVER SEEN A PURPLE COW.",
u'I NEVER HOPE TO SEE ONE;',
u'BUT I CAN TELL YOU, ANYHOW,',
u"I'D RATHER SEE THAN BE ONE."]

startswith('I')

In [20]: rdd.filter(lambda x: x.startswith('I')).collect()
Out[20]:
[u"I've never seen a purple cow.",
u'I never hope to see one;',
u"I'd rather see than be one."]

parallelize(collection) function:

In [21]: myData = ["Alice","Carlos","Frank","Barbara"]

In [22]: myRdd = sc.parallelize(myData)

In [23]: myRdd.take(2)
Out[23]: ['Alice', 'Carlos']

union function:

In [32]: rdd1 = sc.parallelize(['Chicago','Boston','Paris','San Francisco','Tokyo'])

In [33]: rdd2 = sc.parallelize(['San Francisco','Boston','Amsterdam','Mumbai','McMurdo Station'])

In [35]: rdd1.union(rdd2).collect()
Out[35]:
['Chicago',
'Boston',
'Paris',
'San Francisco',
'Tokyo',
'San Francisco',
'Boston',
'Amsterdam',
'Mumbai',
'McMurdo Station']
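Note that 'Boston' and 'San Francisco' each appear twice in the output above: union concatenates the two datasets and does not remove duplicates. In PySpark, deduplication is a separate step, for example `rdd1.union(rdd2).distinct()`. The plain-Python sketch below models both behaviours with lists; the variable names are illustrative assumptions.

```python
# Plain-Python model of union and distinct; this is NOT the Spark API.
cities1 = ['Chicago', 'Boston', 'Paris', 'San Francisco', 'Tokyo']
cities2 = ['San Francisco', 'Boston', 'Amsterdam', 'Mumbai', 'McMurdo Station']

# Like rdd1.union(rdd2): plain concatenation, duplicates are kept.
combined = cities1 + cities2

# Like a following .distinct(): each value is kept once.
# dict.fromkeys preserves first-seen order here; Spark's distinct()
# does not guarantee any particular order.
unique = list(dict.fromkeys(combined))

print(len(combined), len(unique))
```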

zip function:

In [37]: rdd1.zip(rdd2).collect()
Out[37]:
[('Chicago', 'San Francisco'),
('Boston', 'Boston'),
('Paris', 'Amsterdam'),
('San Francisco', 'Mumbai'),
('Tokyo', 'McMurdo Station')]
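The zip example above works because both RDDs hold five elements. Unlike Python's built-in `zip`, which silently truncates to the shorter input, Spark's `RDD.zip` expects the two RDDs to have the same number of partitions and the same number of elements in each partition, and fails otherwise. The sketch below models that stricter behaviour in plain Python; the helper name `zip_strict` is an assumption for illustration, and it checks only the total length because plain lists have no partitions.

```python
# Plain-Python model of a strict zip; this is NOT the Spark API.
def zip_strict(left, right):
    """Pair elements by position, refusing inputs of different lengths."""
    if len(left) != len(right):
        raise ValueError(
            "cannot zip sequences of different lengths: %d vs %d"
            % (len(left), len(right))
        )
    return list(zip(left, right))

cities1 = ['Chicago', 'Boston', 'Paris', 'San Francisco', 'Tokyo']
cities2 = ['San Francisco', 'Boston', 'Amsterdam', 'Mumbai', 'McMurdo Station']

zipped = zip_strict(cities1, cities2)
print(zipped)
```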
