Python内置函数map、reduce、filter在文本处理中的应用

文件是由很多行组成的,这些行组成一个列表,python提供了处理列表很有用的三个函数:map、reduce、filter。因此在文本处理中,可以使用这三个函数达到代码的更加精简清晰。

这里的map、reduce是python的内置函数,跟hadoop的map、reduce函数没有关系,不过使用的目的有点类似,map函数做预处理、reduce函数一般做聚合。

map、reduce、filter在文本处理中的使用

下面是一个文本文件的内容,第1列是ID,第4列是权重,我们的目标是获取所有ID是奇数的行,将这些行的权重翻倍,最后返回权重值的总和。

ID权重
1name1value111
2name2value212
3name3value313
4name4value414
5name5value515
6name6value616
7name7value717
8name8value818
9name9value919
10name10value1020

使用filter、map、reduce函数的代码如下;

运行结果:

 

map、reduce、filter函数的特点

  • filter函数:以列表为参数,返回满足条件的元素组成的列表;类似于SQL中的where a=1
  • map函数:以列表为参数,对每个元素做处理,返回这些处理后元素组成的列表;类似于sql中的select a*2
  • reduce函数:以列表为参数,对列表进行累计、汇总、平均等聚合函数;类似于sql中的select sum(a),average(b)

这些函数官方的解释

map(function, iterable, …)

Apply function to every item of iterable and return a list of the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. If one iterable is shorter than another it is assumed to be extended with None items. If function is None, the identity function is assumed; if there are multiple arguments, map() returns a list consisting of tuples containing the corresponding items from all iterables (a kind of transpose operation). The iterable arguments may be a sequence or any iterable object; the result is always a list.

reduce(function, iterable[, initializer])

Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the iterable. If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. If initializer is not given and iterable contains only one item, the first item is returned. Roughly equivalent to:

def reduce(function, iterable, initializer=None):
it = iter(iterable)
if initializer is None:
try:
initializer = next(it)
except StopIteration:
raise TypeError(‘reduce() of empty sequence with no initial value’)
accum_value = initializer
for x in it:
accum_value = function(accum_value, x)
return accum_value​

filter(function, iterable)

Construct a list from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator. If iterable is a string or a tuple, the result also has that type; otherwise it is always a list. If function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.

Note that filter(function, iterable) is equivalent to [item for item in iterable if function(item)] if function is not None and [item for item in iterable if item] if function is None.

See itertools.ifilter() and itertools.ifilterfalse() for iterator versions of this function, including a variation that filters for elements where the function returns false.

参考资料:

http://docs.python.org/2/library/functions.html

http://www.oschina.net/code/snippet_111708_16145

转载请注明来源: http://www.crazyant.net/1390.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

蚂蚁学Python

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值