As we know, the collection operation collect in Groovy is serial. See org.codehaus.groovy.runtime.DefaultGroovyMethods in the Groovy (1.8.6) source code:
/**
* Iterates through this aggregate Object transforming each item into a new value using the
* <code>transform</code> closure, returning a list of transformed values.
* Example:
* <pre class="groovyTestCase">def list = [1, 'a', 1.23, true ]
* def types = list.collect { it.class }
* assert types == [Integer, String, BigDecimal, Boolean]</pre>
*
* @param self an aggregate Object with an Iterator returning its items
* @param transform the closure used to transform each item of the aggregate object
* @return a List of the transformed values
* @since 1.0
*/
public static <T> List<T> collect(Object self, Closure<T> transform) {
    return (List<T>) collect(self, new ArrayList<T>(), transform);
}
collect ultimately relies on a Java Iterator:
/**
* Iterates through this aggregate Object transforming each item into a new value using the <code>transform</code> closure
* and adding it to the supplied <code>collector</code>.
*
* @param self an aggregate Object with an Iterator returning its items
* @param collector the Collection to which the transformed values are added
* @param transform the closure used to transform each item of the aggregate object
* @return the collector with all transformed values added to it
* @since 1.0
*/
public static <T> Collection<T> collect(Object self, Collection<T> collector, Closure<? extends T> transform) {
    for (Iterator iter = InvokerHelper.asIterator(self); iter.hasNext(); ) {
        collector.add(transform.call(iter.next()));
    }
    return collector;
}
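There is nothing special here: the loop calls transform.call() once per element on the calling thread, so execution is naturally serial.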
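The serial behaviour is easy to observe. The following is a small sketch (the variable name threadNames is only illustrative) that records which thread runs each transform call:

```groovy
// Record the name of the thread that executes each transform call.
def threadNames = [1, 2, 3].collect { Thread.currentThread().name }

// All three calls ran on one and the same thread, namely the caller's.
assert (threadNames as Set).size() == 1
assert threadNames[0] == Thread.currentThread().name
```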
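How can parallel processing be added to a Collection? One approach rests on a simple idea: wrap the original closure in tasks that run on separate threads, and treat the whole iteration as complete only after all of those tasks have finished.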
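import java.util.concurrent.*

class ParallelFeature {
    static POOL_SIZE = 10

    // Convenience overload with a default timeout of 60 seconds.
    static def collectParallel(collections, block) {
        return collectParallel(collections, 60, block)
    }

    // Runs block on every element in a thread pool. Returns the results in
    // iteration order, or null if the timeout (in seconds) expires first.
    static def collectParallel(collections, timeout, block) {
        def exec = Executors.newFixedThreadPool(POOL_SIZE)
        def latch = new CountDownLatch(collections.size())
        try {
            def futures = collections.collect { item ->
                exec.submit(new Callable() {
                    def call() {
                        try {
                            return block(item)
                        } finally {
                            // Count down even if block throws, so await() is not left waiting for the timeout.
                            latch.countDown()
                        }
                    }
                })
            }
            return latch.await(timeout, TimeUnit.SECONDS) ? futures.collect { it.get() } : null
        } finally {
            // Let the worker threads terminate; an unreleased pool keeps the JVM alive.
            exec.shutdown()
        }
    }
}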
For simplicity, this code does very little exception handling.
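If more explicit handling of timeouts and failures is wanted, one option is the standard ExecutorService.invokeAll, which waits with a timeout and cancels whatever has not finished. The following is only a sketch under assumptions, not the approach used above: the helper name collectParallelSafe and the pool size of 4 are made up for illustration.

```groovy
import java.util.concurrent.Callable
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical helper; the name and the pool size of 4 are illustrative only.
def collectParallelSafe(Collection items, long timeoutSeconds, Closure block) {
    def pool = Executors.newFixedThreadPool(4)
    try {
        // One Callable per element; invokeAll returns futures in input order.
        def tasks = items.collect { item ->
            def task = { block(item) } as Callable
            task
        }
        def futures = pool.invokeAll(tasks, timeoutSeconds, TimeUnit.SECONDS)
        // Tasks that were still unfinished at the timeout have been cancelled.
        if (futures.any { it.cancelled }) {
            return null
        }
        // get() rethrows a failure inside block as an ExecutionException.
        return futures.collect { it.get() }
    } finally {
        // Always release the worker threads.
        pool.shutdownNow()
    }
}

assert collectParallelSafe([1, 2, 3], 5, { it * 2 }) == [2, 4, 6]
```

Like the version above, this sketch returns null on timeout; an exception thrown by block is not swallowed but reaches the caller wrapped in an ExecutionException.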
In addition, to make the method convenient to call, it has to be injected through Groovy's metaClass before first use.
java.util.Collection.metaClass.collectParallel = { block ->
    ParallelFeature.collectParallel(delegate, block)
}
java.util.Collection.metaClass.collectParallel = { timeout, block ->
    ParallelFeature.collectParallel(delegate, timeout, block)
}
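Where a global metaClass change is undesirable, a Groovy category is a possible alternative, since it makes the extra method visible only inside a use block. This is a hedged sketch rather than part of the approach above; the class name ParallelCategory and the pool size are assumptions.

```groovy
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical category class; its name and the pool size of 4 are illustrative.
class ParallelCategory {
    // The first parameter is the receiver the method is added to.
    static List collectParallel(Collection self, Closure block) {
        def pool = Executors.newFixedThreadPool(4)
        try {
            def tasks = self.collect { item ->
                def task = { block(item) } as Callable
                task
            }
            // invokeAll blocks until every task has finished.
            return pool.invokeAll(tasks).collect { it.get() }
        } finally {
            pool.shutdown()
        }
    }
}

// collectParallel exists on collections only inside this block.
use(ParallelCategory) {
    assert [1, 2, 3].collectParallel { it * 10 } == [10, 20, 30]
}
```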
With the method injected, the collect calls in the original code can be replaced directly.
files.collectParallel { file ->
    download(file)
}
Simple, isn't it?