Lecture 25: Map, Filter, Reduce

1 Definition of Map/Filter/Reduce

Map/Filter/Reduce: A design pattern that substantially simplifies the implementation of functions that operate over sequences of elements.
The pattern relies on functions being “first-class” data values: they can be stored in variables, passed as arguments to functions, and created dynamically, just like other values.

2 Abstracting out control flow

2.1 Iterator Abstraction

  • An Iterator gives you a sequence of elements from a data structure without your having to worry about whether the data structure is a set, a token stream, a list, or an array: the Iterator looks the same no matter what the data structure is.
  • Any Iterable can be used with Java’s enhanced for statement, for (File f : files), and under the hood that statement uses an iterator.
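Python’s for loop relies on the same idea as Java’s enhanced for: it asks the iterable for an iterator and calls next() until StopIteration is raised. The sketch below, using a made-up list of file names, spells that protocol out by hand; it mirrors the protocol rather than the interpreter’s exact internals.

```python
files = ['a.txt', 'b.txt', 'c.txt']  # hypothetical file names, just strings here

# 1. The ordinary for loop.
seen_by_for = []
for f in files:
    seen_by_for.append(f)

# 2. Roughly what the for loop does under the hood.
seen_by_hand = []
it = iter(files)           # ask the iterable for an iterator
while True:
    try:
        f = next(it)       # fetch the next element
    except StopIteration:  # raised once the sequence is exhausted
        break
    seen_by_hand.append(f)

print(seen_by_for == seen_by_hand)  # True
```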

2.2 Map/filter/reduce abstraction

  • The map/filter/reduce patterns in this reading do something similar to Iterator, but at an even higher level: they treat the entire sequence of elements as a unit, so that the programmer doesn’t have to name and work with the elements individually.
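To make the contrast concrete, here is a small made-up task, summing the squares of the even numbers in a list, written once with an explicit loop and once as a filter/map/reduce pipeline. This is a sketch in Python 3, where reduce lives in functools.

```python
from functools import reduce

nums = [1, 2, 3, 4, 5, 6]

# Loop version: we name each element and manage the accumulator ourselves.
total = 0
for n in nums:
    if n % 2 == 0:
        total += n * n

# Map/filter/reduce version: the list is treated as a unit.
evens = filter(lambda n: n % 2 == 0, nums)
squares = map(lambda n: n * n, evens)
total2 = reduce(lambda acc, x: acc + x, squares, 0)

print(total, total2)  # 56 56
```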

3 Map

Definition: Map applies a unary function to each element in the sequence and returns a new sequence containing the results, in the same order:

$$\mathrm{map} : (E \to F) \times \mathrm{Seq}\langle E \rangle \to \mathrm{Seq}\langle F \rangle$$

For example, in Python 3 (where map returns a lazy iterator, so list() is used to show the results):

>>> from math import sqrt
>>> list(map(sqrt, [1, 4, 9, 16]))
[1.0, 2.0, 3.0, 4.0]
>>> list(map(str.lower, ['A', 'b', 'C']))
['a', 'b', 'c']

map is built-in, but it is also straightforward to implement in Python:

def map(f, seq):
    """Return the new list [f(seq[0]), f(seq[1]), ...] (shadows the built-in map)."""
    result = []
    for elt in seq:
        result.append(f(elt))
    return result
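The implementation above is eager: it builds the whole result list at once. Python 3’s built-in map is lazy instead and returns an iterator that computes each result on demand. A minimal sketch of a lazy version using a generator (the name lazy_map is hypothetical):

```python
from math import sqrt

def lazy_map(f, seq):
    """Yield f(elt) for each elt in seq, computing each result only when requested."""
    for elt in seq:
        yield f(elt)

results = lazy_map(sqrt, [1, 4, 9, 16])  # nothing computed yet
print(next(results))  # 1.0 (only the first element has been processed)
print(list(results))  # [2.0, 3.0, 4.0] (the remaining elements)
```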

4 Functions as values

Functions are first-class in Python, meaning that they can be assigned to variables, passed as parameters, used as return values, and stored in data structures.

A lambda expression creates a function without giving it a name:

lambda k: 2**k

This expression denotes the same function as:

def powerOfTwo(k):
    return 2**k

A lambda can be called directly or passed to another function such as map:

>>> (lambda k: 2**k)(5)
32
>>> list(map(lambda k: 2**k, [1, 2, 3, 4]))
[2, 4, 8, 16]
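The other uses of first-class functions listed above, returning them from functions and storing them in data structures, can be sketched the same way; make_power and ops are hypothetical names chosen for illustration:

```python
def make_power(base):
    """Return a new function k -> base**k (a function used as a return value)."""
    return lambda k: base ** k

power_of_two = make_power(2)    # function stored in a variable
power_of_three = make_power(3)

# Functions stored in a data structure and looked up by name.
ops = {'double': lambda x: 2 * x, 'square': lambda x: x * x}

print(power_of_two(5))                       # 32
print(list(map(power_of_three, [1, 2, 3])))  # [3, 9, 27]
print(ops['square'](7))                      # 49
```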

4.1 More ways to use map

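Map is useful even when you don’t care about the function’s return values. Given a sequence of mutable objects, for example, you can map a mutator operation over them:

from io import IOBase
from threading import Thread

list(map(IOBase.close, streams))  # closes each stream (list() forces the lazy map)
list(map(Thread.join, threads))   # waits for each thread to finish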

Some versions of map (including Python’s built-in map) also support mapping functions with multiple arguments. For example, you can add two lists of numbers element-wise:

>>> import operator
>>> list(map(operator.add, [1, 2, 3], [4, 5, 6]))
[5, 7, 9]
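One way to see how multi-argument mapping works is to pair up the sequences with zip and unpack each tuple into the function. A minimal sketch (map_n is a hypothetical name); like Python 3’s built-in map, it stops at the shortest input because zip does:

```python
import operator

def map_n(f, *seqs):
    """Apply an n-argument function f element-wise across n sequences."""
    # zip yields tuples (seqs[0][i], seqs[1][i], ...) and stops at the shortest sequence.
    return [f(*args) for args in zip(*seqs)]

print(map_n(operator.add, [1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
print(map_n(operator.mul, [1, 2, 3], [10, 20]))   # [10, 40]
```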

5 Filter

Filter tests each element with a unary predicate. Elements that satisfy the predicate are kept; those that don’t are removed. Filter returns a new sequence and doesn’t modify its input.

$$\mathrm{filter} : (E \to \mathrm{boolean}) \times \mathrm{Seq}\langle E \rangle \to \mathrm{Seq}\langle E \rangle$$

For example:

>>> list(filter(lambda s: len(s) > 0, ['abc', '', 'd']))
['abc', 'd']
>>> list(filter(str.isalpha, ['x', 'y', '2', '3', 'a']))
['x', 'y', 'a']
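Like map, filter is easy to implement directly. A minimal eager sketch (filter_list is a hypothetical name, chosen so as not to shadow the built-in):

```python
def filter_list(pred, seq):
    """Return a new list of the elements of seq for which pred is true.

    The input sequence is not modified.
    """
    result = []
    for elt in seq:
        if pred(elt):
            result.append(elt)
    return result

words = ['abc', '', 'd']
print(filter_list(lambda s: len(s) > 0, words))  # ['abc', 'd']
print(words)                                     # ['abc', '', 'd'] (unchanged)
```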

6 Reduce

Reduce combines the elements of the sequence together, using a binary function.
In addition to the function and the list, it also takes an initial value that initializes the reduction, and that ends up being the return value if the list is empty.

$$\mathrm{reduce} : (F \times E \to F) \times \mathrm{Seq}\langle E \rangle \times F \to F$$

reduce(f, list, init) combines the elements of the list from left to right, as follows:

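$$
\begin{aligned}
\mathit{result}_0 &= \mathit{init} \\
\mathit{result}_1 &= f(\mathit{result}_0, \mathit{list}[0]) \\
\mathit{result}_2 &= f(\mathit{result}_1, \mathit{list}[1]) \\
&\;\;\vdots \\
\mathit{result}_n &= f(\mathit{result}_{n-1}, \mathit{list}[n-1])
\end{aligned}
$$

$\mathit{result}_n$ is the final result for an $n$-element list.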
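The recurrence translates directly into a loop that carries the running result. A minimal sketch (reduce_left is a hypothetical name; Python’s own version is functools.reduce):

```python
def reduce_left(f, seq, init):
    """Combine seq from left to right with the binary function f, starting from init.

    result_0 = init, result_i = f(result_{i-1}, seq[i-1]); returns result_n.
    """
    result = init
    for elt in seq:
        result = f(result, elt)
    return result

print(reduce_left(lambda s, x: s + str(x), [1, 2, 3, 4], ''))  # 1234
print(reduce_left(lambda s, x: s + x, [], 100))               # 100 (empty list returns init)
```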

For example, to glue together a sequence into a string:

>>> from functools import reduce
>>> reduce(lambda s, x: s + str(x), [1, 2, 3, 4], '')
'1234'

Flatten out nested sublists into a single list:

>>> reduce(operator.concat, [[1, 2], [3, 4], [], [5]], [])
[1, 2, 3, 4, 5]

This is a useful enough sequence operation that we’ll define it as flatten, although it’s just a reduce step inside:

import operator
from functools import reduce

def flatten(lists):
    return reduce(operator.concat, lists, [])
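A common use of flatten is right after a map whose function returns a list for each element. A small sketch with made-up input:

```python
import operator
from functools import reduce

def flatten(lists):
    return reduce(operator.concat, lists, [])

words = ['ab', '', 'cde']
# map turns each word into a list of its characters; flatten joins those lists.
chars = flatten(map(list, words))
print(chars)  # ['a', 'b', 'c', 'd', 'e']
```

Each concat step builds a new list that copies everything accumulated so far, so for very long inputs itertools.chain.from_iterable is the usual linear-time alternative.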

7 Benefits of abstracting out control flow

  • Map/filter/reduce can often make code shorter and simpler, and allow the programmer to focus on the heart of the computation rather than on the details of loops, branches, and control flow.
  • By arranging our program in terms of map, filter, and reduce, and in particular by using immutable datatypes and pure functions (functions that do not mutate data) as much as possible, we’ve created more opportunities for safe concurrency. MapReduce, for example, is a pattern for parallelizing large computations.
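As a small illustration of the concurrency point (not of MapReduce itself), Python’s concurrent.futures lets a pure function be mapped over a sequence by a pool of worker threads, followed by an ordinary sequential reduction. This sketch shows the structure only; for CPU-bound Python code, threads will not necessarily run faster because of the global interpreter lock. The function word_length is a hypothetical example.

```python
from concurrent.futures import ThreadPoolExecutor

def word_length(word):
    """A pure function: it reads no shared state and mutates nothing."""
    return len(word)

words = ['map', 'filter', 'reduce', 'stream']

# Because word_length is pure, the calls can run on different worker threads
# in any order; Executor.map still returns the results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    lengths = list(pool.map(word_length, words))

total = sum(lengths)  # the reduce step, done sequentially
print(lengths, total)  # [3, 6, 6, 6] 21
```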

8 First-class functions in Java

  • In Java, the only first-class values are primitive values (ints, booleans, characters, etc.) and object references.
  • The way to implement a first-class function, in an object-oriented programming language like Java that doesn’t support first-class functions directly, is to use an object with a method representing the function.
    • The Runnable object that you pass to a Thread constructor is a first-class function, void run().
    • The Comparator<T> object that you pass to a sorted collection (e.g. SortedSet) is a first-class function, int compare(T o1, T o2).
    • The KeyListener object that you register with the graphical user interface toolkit to get keyboard events is a bundle of several functions: keyPressed(KeyEvent), keyReleased(KeyEvent), etc.

8.1 Lambda expressions in Java

Java’s lambda expression syntax provides a succinct way to create instances of functional objects. For example, this code starts a thread using an anonymous Runnable class:

new Thread(new Runnable() {
    public void run() {
        System.out.println("Hello!");
    }
}).start();

With a lambda expression, the same code becomes much shorter:

new Thread(() -> {
    System.out.println("Hello!");
}).start();

Java provides some standard functional interfaces we can use to write code in the map/filter/reduce pattern, e.g.:

  • Function<T,R> represents unary functions from T to R.
  • BiFunction<T,U,R> represents binary functions from T × U to R.
  • Predicate<T> represents functions from T to boolean.

Using these interfaces, we can implement map() in Java:

/**
 * Apply a function to every element of a list.
 * @param f function to apply
 * @param list list to iterate over
 * @return [f(list[0]), f(list[1]), ..., f(list[n-1])]
 */
public static <T,R> List<R> map(Function<T,R> f, List<T> list) {
    List<R> result = new ArrayList<>();
    for (T t : list) {
        result.add(f.apply(t));
    }
    return result;
}

8.2 Map/filter/reduce in Java

  • The abstract sequence type Seq<E> used above exists in Java as Stream, which defines map, filter, reduce, and many other operations.
  • Collection types like List and Set provide a stream() method that returns a Stream for the collection, and there’s an Arrays.stream method for creating a Stream from an array.

8.3 Higher-order functions in Java

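Using Function, BiFunction, and Predicate, we can build higher-order functions (functions that take functions as arguments or return them as results), which work naturally with Streams and lambda expressions. For example, compose takes two functions and returns their composition: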

/**
 * Compose two functions.
 * @param f function A->B
 * @param g function B->C
 * @return new function A->C formed by composing f with g
 */
public static <A,B,C> Function<A,C> compose(Function<A,B> f,
                                            Function<B,C> g) {
    return t -> g.apply(f.apply(t));
    // --or--
    // return new Function<A,C>() {
    //     public C apply(A t) { return g.apply(f.apply(t)); }
    // };
}
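For comparison with the Python examples in earlier sections, the same compose can be sketched in Python, where a lambda plays the role of the Function object. The type hints are optional and only mirror the Java signature; length_then_double is a hypothetical example.

```python
from typing import Callable, TypeVar

A = TypeVar('A')
B = TypeVar('B')
C = TypeVar('C')

def compose(f: Callable[[A], B], g: Callable[[B], C]) -> Callable[[A], C]:
    """Return the function t -> g(f(t)): apply f first, then g."""
    return lambda t: g(f(t))

length_then_double = compose(len, lambda n: 2 * n)
print(length_then_double('abc'))                                # 6
print(list(map(compose(str.strip, str.upper), [' a ', 'b '])))  # ['A', 'B']
```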

Reference

[1] 6.005 Software Construction, MIT OpenCourseWare. OCW 6.005 homepage: https://ocw.mit.edu/ans7870/6/6.005/s16/
