1 Definition of MapReduce
Map/Filter/Reduce: A design pattern that substantially simplifies the implementation of functions that operate over sequences of elements.
Functions can be seen as “first-class” data values, meaning that they can be stored in variables, passed as arguments to functions, and created dynamically like other values.
2 Abstracting out control flow
2.1 Iterator Abstraction
- Iterator gives you a sequence of elements from a data structure, without you having to worry about whether the data structure is a set or a token stream or a list or an array; the Iterator looks the same no matter what the data structure is.
- Any Iterable can be used with Java’s enhanced for statement, for (File f : files), and under the hood, it uses an iterator.
2.2 Map/filter/reduce abstraction
- The map/filter/reduce patterns in this reading do something similar to Iterator, but at an even higher level: they treat the entire sequence of elements as a unit, so that the programmer doesn’t have to name and work with the elements individually.
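To make the contrast concrete, here is a minimal Python sketch that computes the same result twice: once by naming and handling each element in a loop, and once by treating the sequence as a unit. The function names and the sample data are illustrative assumptions, not part of the original reading.

```python
from functools import reduce

def sum_of_even_squares_loop(numbers):
    """Element-by-element version: the loop names and handles each element."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total

def sum_of_even_squares_pipeline(numbers):
    """Whole-sequence version: filter, then map, then reduce."""
    evens = filter(lambda n: n % 2 == 0, numbers)
    squares = map(lambda n: n * n, evens)
    return reduce(lambda acc, x: acc + x, squares, 0)

sample = [1, 2, 3, 4, 5, 6]  # hypothetical input
print(sum_of_even_squares_loop(sample))      # 56
print(sum_of_even_squares_pipeline(sample))  # 56
```

The second version never mentions an index or a loop variable for the sequence as a whole; each step only says what to do with one element.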
3 Map
Definition: Map applies a unary function to each element in the sequence and returns a new sequence containing the results, in the same order.
For example, in Python:
>>> from math import sqrt
>>> # In Python 3, map returns a lazy iterator; list() collects the results
>>> list(map(sqrt, [1, 4, 9, 16]))
[1.0, 2.0, 3.0, 4.0]
>>> list(map(str.lower, ['A', 'b', 'C']))
['a', 'b', 'c']
map is built-in, but it is also straightforward to implement in Python:
def map(f, seq):
    result = []
    for elt in seq:
        result.append(f(elt))
    return result
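In Python 3 the built-in map returns a lazy iterator rather than a list. A generator gives a rough sketch of that behavior; the name lazy_map is a hypothetical one chosen here to avoid shadowing the built-in, and this is not how the built-in is actually implemented.

```python
from math import sqrt

def lazy_map(f, seq):
    """Yield f(elt) for each element, computing results only on demand."""
    for elt in seq:
        yield f(elt)

roots = lazy_map(sqrt, [1, 4, 9, 16])  # nothing is computed yet
print(next(roots))   # 1.0
print(list(roots))   # [2.0, 3.0, 4.0]  (the remaining elements)
```

The list-building version above does all the work up front; the lazy version does it only as results are requested, which matters for long or unbounded sequences.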
4 Functions as values
Functions are first-class in Python, meaning that they can be assigned to variables, passed as parameters, used as return values, and stored in data structures.
A lambda expression creates a function without giving it a name:
lambda k: 2**k
This represents the same function as:
def powerOfTwo(k):
    return 2**k
A lambda expression can be called directly or passed to map:
>>> (lambda k: 2**k)(5)
32
>>> list(map(lambda k: 2**k, [1, 2, 3, 4]))
[2, 4, 8, 16]
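The four uses of first-class functions listed at the start of this section (assigned to variables, passed as parameters, used as return values, stored in data structures) can be sketched in a few lines. All names below are hypothetical and chosen only for illustration.

```python
def power_of(base):
    """Create and return a new function dynamically (a closure over base)."""
    return lambda k: base ** k

# Assigned to a variable
power_of_two = power_of(2)

# Stored in a data structure
operations = {"double": lambda x: 2 * x, "square": lambda x: x * x}

def apply_twice(f, x):
    """Take a function as a parameter and call it twice."""
    return f(f(x))

print(power_of_two(5))                       # 32
print(operations["square"](7))               # 49
print(apply_twice(operations["double"], 3))  # 12
```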
4.1 More ways to use map
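When you have a sequence of mutable objects, you can map a mutator operation over them. With Python 3’s lazy built-in map, wrap the call in list() so that the operation actually runs:
list(map(IOBase.close, streams)) # closes each stream in the list
list(map(Thread.join, threads)) # waits for each thread to finish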
Some versions of map (including Python’s built-in map) also support mapping functions with multiple arguments. For example, you can add two lists of numbers element-wise:
>>> import operator
>>> list(map(operator.add, [1, 2, 3], [4, 5, 6]))
[5, 7, 9]
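A multi-argument map can be sketched with zip, which pairs up the i-th elements of each sequence. The name map_n is a hypothetical one; like Python 3's built-in map, this sketch stops at the shortest input.

```python
import operator

def map_n(f, *seqs):
    """Apply f to the i-th elements of every sequence, stopping at the shortest."""
    return [f(*args) for args in zip(*seqs)]

print(map_n(operator.add, [1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
print(map_n(max, [1, 9], [5, 2], [3, 3]))         # [5, 9]
```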
5 Filter
Filter tests each element with a unary predicate. Elements that satisfy the predicate are kept; those that don’t are removed. A new list is returned; filter doesn’t modify its input list.
For example:
>>> list(filter(lambda s: len(s)>0, ['abc', '', 'd']))
['abc', 'd']
>>> list(filter(str.isalpha, ['x', 'y', '2', '3', 'a']))
['x', 'y', 'a']
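Like map, filter is straightforward to write by hand. The sketch below uses the hypothetical name my_filter so that it does not shadow the built-in, and it shows that the input list is left untouched.

```python
def my_filter(predicate, seq):
    """Return a new list of the elements for which predicate(elt) is true."""
    result = []
    for elt in seq:
        if predicate(elt):
            result.append(elt)
    return result

words = ['abc', '', 'd']
print(my_filter(lambda s: len(s) > 0, words))  # ['abc', 'd']
print(words)                                   # ['abc', '', 'd']  (input unchanged)
```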
6 Reduce
Reduce combines the elements of the sequence together, using a binary function.
In addition to the function and the list, it also takes an initial value that initializes the reduction, and that ends up being the return value if the list is empty.
reduce(f, list, init) combines the elements of the list from left to right, as follows:
result_0 = init
result_1 = f(result_0, list[0])
result_2 = f(result_1, list[1])
…
result_n = f(result_{n-1}, list[n-1])
result_n is the final result for an n-element list.
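The left-to-right definition above translates almost line for line into a loop. This is a minimal sketch under the hypothetical name my_reduce, not the actual implementation of Python's reduce.

```python
def my_reduce(f, seq, init):
    """Fold seq from left to right, starting from init."""
    result = init                # the starting value
    for elt in seq:
        result = f(result, elt)  # combine the running result with the next element
    return result                # equals init when seq is empty

print(my_reduce(lambda a, b: a + b, [1, 2, 3, 4], 0))  # 10
print(my_reduce(lambda a, b: a - b, [1, 2, 3], 10))    # 4, i.e. ((10 - 1) - 2) - 3
print(my_reduce(max, [], -1))                          # -1
```

The subtraction case shows why the direction matters: a non-associative function gives a different answer if the elements are combined in another order.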
For example:
>>> from functools import reduce # in Python 3, reduce lives in functools
Glue together a sequence into a string:
>>> reduce(lambda s, x: s+str(x), [1, 2, 3, 4], '')
'1234'
Flatten out nested sublists into a single list:
>>> reduce(operator.concat, [[1, 2], [3, 4], [], [5]], [])
[1, 2, 3, 4, 5]
This is a useful enough sequence operation that we’ll define it as flatten, although it’s just a reduce step inside:
def flatten(list):
    return reduce(operator.concat, list, [])
7 Benefits of abstracting out control
- Map/filter/reduce can often make code shorter and simpler, and allow the programmer to focus on the heart of the computation rather than on the details of loops, branches, and control flow.
- By arranging our program in terms of map, filter, and reduce, and in particular using immutable datatypes and pure functions (functions that do not mutate data) as much as possible, we’ve created more opportunities for safe concurrency. MapReduce is a **pattern for parallelizing large computations**.
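As a rough illustration of the concurrency point, the sketch below counts words with a map step that runs on a thread pool and a reduce step that merges the partial results. The helper names and the input chunks are hypothetical, and threads are used only to keep the example short; a real MapReduce system distributes the work across processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def count_words(chunk):
    """Map step: a pure function from one chunk of text to its word counts."""
    return Counter(chunk.split())

def merge_counts(a, b):
    """Reduce step: combine two partial results without mutating either."""
    return a + b

chunks = ["a b a", "b c", "a c c"]  # hypothetical input, already split into chunks

with ThreadPoolExecutor(max_workers=3) as pool:
    # Each chunk is processed independently, so the calls can safely overlap
    partial_counts = list(pool.map(count_words, chunks))

total = reduce(merge_counts, partial_counts, Counter())
print(sorted(total.items()))  # [('a', 3), ('b', 2), ('c', 3)]
```

Because count_words touches no shared state and merge_counts returns a new Counter, the order in which the chunks finish does not affect the result.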
8 First-class functions in Java
- In Java, the only first-class values are primitive values (ints, booleans, characters, etc.) and object references.
- The way to implement a first-class function, in an object-oriented programming language like Java that doesn’t support first-class functions directly, is to use an object with a method representing the function.
- The Runnable object that you pass to a Thread constructor is a first-class function, void run().
- The Comparator<T> object that you pass to a sorted collection (e.g. SortedSet) is a first-class function, int compare(T o1, T o2).
- The KeyListener object that you register with the graphical user interface toolkit to get keyboard events is a bundle of several functions, keyPressed(KeyEvent), keyReleased(KeyEvent), etc.
8.1 Lambda expressions in Java
Java’s lambda expression syntax provides a succinct way to create instances of functional objects. For example, instead of writing:
new Thread(new Runnable() {
    public void run() {
        System.out.println("Hello!");
    }
}).start();
we can use a lambda expression:
new Thread(() -> {
    System.out.println("Hello!");
}).start();
Java provides some standard functional interfaces we can use to write code in the map/filter/reduce pattern, e.g.:
- Function<T,R> represents unary functions from T to R.
- BiFunction<T,U,R> represents binary functions from T × U to R.
- Predicate<T> represents functions from T to boolean.
Using these, we could implement map():
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Apply a function to every element of a list.
 * @param f function to apply
 * @param list list to iterate over
 * @return [f(list[0]), f(list[1]), ..., f(list[n-1])]
 */
public static <T,R> List<R> map(Function<T,R> f, List<T> list) {
    List<R> result = new ArrayList<>();
    for (T t : list) {
        result.add(f.apply(t));
    }
    return result;
}
8.2 Map/filter/reduce in Java
- The abstract sequence type we defined above exists in Java as Stream, which defines map, filter, reduce, and many other operations.
- Collection types like List and Set provide a stream() operation that returns a Stream for the collection, and there’s an Arrays.stream function for creating a Stream from an array.
8.3 Higher-order functions in Java
Using Function, BiFunction, and Predicate, we can build higher-order functions that can be used with Stream operations and lambda expressions:
/**
 * Compose two functions.
 * @param f function A->B
 * @param g function B->C
 * @return new function A->C formed by composing f with g
 */
public static <A,B,C> Function<A,C> compose(Function<A,B> f,
                                            Function<B,C> g) {
    return t -> g.apply(f.apply(t));
    // --or--
    // return new Function<A,C>() {
    //     public C apply(A t) { return g.apply(f.apply(t)); }
    // };
}
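For comparison with the Python sections earlier in these notes, the same compose idea can be sketched in Python, where functions are already first-class and no interface types are needed. The name length_is_even is a hypothetical example.

```python
def compose(f, g):
    """Return a new function that applies f first, then g."""
    return lambda t: g(f(t))

# str -> int -> bool
length_is_even = compose(len, lambda n: n % 2 == 0)

print(length_is_even("abcd"))  # True
print(length_is_even("abc"))   # False
print(list(filter(length_is_even, ["ab", "abc", ""])))  # ['ab', '']
```

The composed function is an ordinary value, so it can be handed straight to filter as a predicate.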
Reference
[1] 6.005 Software Construction, MIT OpenCourseWare. https://ocw.mit.edu/ans7870/6/6.005/s16/