MapReduce and Mobile Algorithms

MapReduce Algorithms:

Introductory slides:
http://code.google.com/edu/submissions/mapreduce-minilecture/lec2-mapred.ppt

Talk videos:
http://code.google.com/edu/submissions/mapreduce-minilecture/listing.html

Other tutorials:
http://www.cloudera.com/wp-content/uploads/2010/01/5-MapReduceAlgorithms.pdf
http://www.cloudera.com/videos/mapreduce_algorithms

http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/session3-slides.pdf

UMD Class from Spring 2010:
http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/syllabus.html
Accompanying text:
http://www.umiacs.umd.edu/~jimmylin/book.html
http://www.umiacs.umd.edu/~jimmylin/MapReduce-book-final.pdf

Papers to read/discuss in class:

Models

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
OSDI 2004
http://labs.google.com/papers/mapreduce.html
Communications of the ACM, 2010
http://cacm.acm.org/magazines/2010/1/55744-mapreduce-a-flexible-data-processing-tool/fulltext

On the Complexity of Processing Massive, Unordered, Distributed Data.
J. Feldman, S. Muthukrishnan, A. Sidiropoulos, C. Stein and Z. Svitkina, 
SODA 2008 
http://arxiv.org/abs/cs/0611108

A Model of Computation for MapReduce
H. Karloff, S. Suri, and S. Vassilvitskii
SODA 2010
http://www.sidsuri.com/About_Me_files/mrc2.pdf

Sorting, Searching, and Simulation in the MapReduce Framework
Michael T. Goodrich, Nodari Sitchinava, Qin Zhang
Under submission, 2011
http://arxiv.org/abs/1101.1902

Systems on top of MR:

Interpreting the Data: Parallel Analysis with Sawzall
Scientific Programming Journal 2005
Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan
http://research.google.com/archive/sawzall.html

Pig Latin: a not-so-foreign language for data processing
SIGMOD 2008
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins     
https://portal.acm.org//citation.cfm?id=1376616.1376726&coll=DL&dl=GUIDE&CFID=5697894&CFTOKEN=14842407

Hive: a warehousing solution over a map-reduce framework
VLDB 2009
Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, Raghotham Murthy
http://portal.acm.org/ft_gateway.cfm?id=1687609&type=pdf&coll=DL&dl=GUIDE&CFID=5697894&CFTOKEN=14842407

Hive - A Petabyte Scale Data Warehouse Using Hadoop
ICDE 2010
http://infolab.stanford.edu/~ragho/hive-icde2010.pdf

Alternatives/Extensions

Dryad: Distributed data-parallel programs from sequential building blocks.
EuroSys 2007
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
http://research.microsoft.com/pubs/63785/eurosys07.pdf

MapReduce Online
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy and Russell Sears
NSDI 2010, SIGMOD 2010
neilconway.org/docs/sigmod2010_hop_demo.pdf
http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-136.pdf

Algorithms

Simulating Parallel Algorithms in the MapReduce Framework with Applications to Parallel Computational Geometry
Michael T. Goodrich
Massive 2010
http://arxiv.org/abs/1004.4708

A New Computation Model for Cluster Computing
Foto Afrati, Jeff Ullman
http://infolab.stanford.edu/%7Eullman/pub/mapred-model-report.pdf

Max-cover in map-reduce
Flavio Chierichetti, Ravi Kumar, Andrew Tomkins
WWW 2010
http://portal.acm.org/citation.cfm?id=1772715     

Graphs and Matrices

Graph Twiddling in a MapReduce World
Jonathan Cohen
Computing in Science and Engineering, vol. 11, no. 4, pp. 29-41, July/August, 2009.
http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.120
           
Parallelizing Random Walk with Restart for large-scale query recommendation
Meng-Fen Chiang, Tsung-Wei Wang, Wen-Chih Peng
2010 Workshop on Massive Data Analytics on the Cloud 
http://portal.acm.org/citation.cfm?id=1779599.1779607

Distributed non-negative matrix factorization for dyadic data analysis on MapReduce
Chao Liu, Hung-chih Yang, Jinliang Fan, Li-Wei He, Yi-Min Wang
WWW 2010
http://research.microsoft.com/pubs/119077/DNMF.pdf

DOULION: Counting Triangles in Massive Graphs with a Coin 
Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, Christos Faloutsos
KDD 2009
Fast counting of triangles in real-world networks: proofs, algorithms and observations
http://reports-archive.adm.cs.cmu.edu/anon/ml2008/CMU-ML-08-103.pdf

PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations
U Kang, Charalampos E. Tsourakakis, Christos Faloutsos
IEEE International Conference on Data Mining (ICDM 2009)
http://www.cs.cmu.edu/~ctsourak/pegasusICDM09.pdf

Database

Optimizing Joins in a Map-Reduce Environment
Foto Afrati, Jeff Ullman
http://infolab.stanford.edu/%7Eullman/pub/join-mr.pdf

Efficient parallel set-similarity joins using MapReduce
Rares Vernica, Michael J. Carey, Chen Li
SIGMOD 2010
http://portal.acm.org/citation.cfm?id=1807222

Applications


Scaling Up Classifiers to Cloud Computers  
Christopher Moretti, Karsten Steinhaeuser, Douglas Thain, Nitesh V. Chawla
ICDM 2008
http://www.cse.nd.edu/~dthain/papers/classify-icdm08.pdf

Large-Scale Behavioral Targeting
KDD 2009
Ye Chen, Dmitriy Pavlov, John Canny
http://www.cc.gatech.edu/~zha/CSE8801/ad/p209-chen.pdf

MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees
BMC Bioinformatics 2010
Suzanne J Matthews and Tiffani L Williams
http://www.biomedcentral.com/1471-2105/11/S1/S15

A novel approach to multiple sequence alignment using hadoop data grids
2010 Workshop on Massive Data Analytics on the Cloud
G. Sudha Sadasivam, G. Baktavatchalam
http://portal.acm.org/citation.cfm?id=1779599.1779601

Experiences on Processing Spatial Data with MapReduce
Ariel Cary, Zhengguo Sun, Vagelis Hristidis, Naphtali Rishe
21st International Conference on Scientific and Statistical Database Management 
http://users.cis.fiu.edu/~vagelis/publications/Spatial-MapReduce-SSDBM2009.pdf

Web-Scale Distributional Similarity and Entity Set Expansion
Patrick Pantel, Eric Crestan, Arkady Borkovsky, Ana-Maria Popescu, Vishnu Vyas
2009 Conference on Empirical Methods in Natural Language Processing
http://www.aclweb.org/anthology/D/D09/D09-1098.pdf

Other lists of papers:

http://atbrox.com/2010/05/08/mapreduce-hadoop-algorithms-in-academic-papers-may-2010-update/
http://www.columbia.edu/~ak2834/mapreduce.html

-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Schedule (Mobile Apps)

Jan 20: First meeting and planning projects.  