MapReduce Algorithms:
Introductory slides:
http://code.google.com/edu/submissions/mapreduce-minilecture/lec2-mapred.ppt
Talk videos:
http://code.google.com/edu/submissions/mapreduce-minilecture/listing.html
Other tutorials:
http://www.cloudera.com/wp-content/uploads/2010/01/5-MapReduceAlgorithms.pdf
http://www.cloudera.com/videos/mapreduce_algorithms
http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/session3-slides.pdf
UMD Class from Spring 2010:
http://www.umiacs.umd.edu/~jimmylin/cloud-2010-Spring/syllabus.html
Accompanying text:
http://www.umiacs.umd.edu/~jimmylin/book.html
http://www.umiacs.umd.edu/~jimmylin/MapReduce-book-final.pdf
Papers to read/discuss in class:
Models
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat
OSDI 2004
http://labs.google.com/papers/mapreduce.html
Communications of the ACM, 2010
http://cacm.acm.org/magazines/2010/1/55744-mapreduce-a-flexible-data-processing-tool/fulltext
On the Complexity of Processing Massive, Unordered, Distributed Data.
J. Feldman, S. Muthukrishnan, A. Sidiropoulos, C. Stein and Z. Svitkina,
SODA 2008
http://arxiv.org/abs/cs/0611108
A Model of Computation for MapReduce
H. Karloff, S. Suri, and S. Vassilvitskii
SODA 2010
http://www.sidsuri.com/About_Me_files/mrc2.pdf
Sorting, Searching, and Simulation in the MapReduce Framework
Michael T. Goodrich, Nodari Sitchinava, Qin Zhang
Under submission, 2011
http://arxiv.org/abs/1101.1902
Systems on top of MR:
Interpreting the Data: Parallel Analysis with Sawzall
Scientific Programming Journal 2005
Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan
http://research.google.com/archive/sawzall.html
Pig Latin: a not-so-foreign language for data processing
SIGMOD 2008
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins
https://portal.acm.org//citation.cfm?id=1376616.1376726&coll=DL&dl=GUIDE&CFID=5697894&CFTOKEN=14842407
Hive: a warehousing solution over a map-reduce framework
VLDB 2009
Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, Raghotham Murthy
http://portal.acm.org/ft_gateway.cfm?id=1687609&type=pdf&coll=DL&dl=GUIDE&CFID=5697894&CFTOKEN=14842407
Hive - A Petabyte Scale Data Warehouse Using Hadoop
ICDE 2010
http://infolab.stanford.edu/~ragho/hive-icde2010.pdf
Alternatives/Extensions
Dryad: Distributed data-parallel programs from sequential building blocks
Eurosys 2007
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
http://research.microsoft.com/pubs/63785/eurosys07.pdf
MapReduce Online
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy and Russell Sears
NSDI 2010, SIGMOD 2010
neilconway.org/docs/sigmod2010_hop_demo.pdf
http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-136.pdf
Algorithms
Simulating Parallel Algorithms in the MapReduce Framework with Applications to Parallel Computational Geometry
Michael T. Goodrich
Massive 2010
http://arxiv.org/abs/1004.4708
A New Computation Model for Cluster Computing
Foto Afrati, Jeff Ullman
http://infolab.stanford.edu/%7Eullman/pub/mapred-model-report.pdf
Max-cover in map-reduce
Flavio Chierichetti, Ravi Kumar, Andrew Tomkins
WWW 2010
http://portal.acm.org/citation.cfm?id=1772715
Graphs and Matrices
Graph Twiddling in a MapReduce World
Jonathan Cohen
Computing in Science and Engineering, vol. 11, no. 4, pp. 29-41, July/August, 2009.
http://www.computer.org/portal/web/csdl/doi/10.1109/MCSE.2009.120
Parallelizing Random Walk with Restart for large-scale query recommendation
Meng-Fen Chiang, Tsung-Wei Wang, Wen-Chih Peng
2010 Workshop on Massive Data Analytics on the Cloud
http://portal.acm.org/citation.cfm?id=1779599.1779607
Distributed non-negative matrix factorization for dyadic data analysis on mapreduce
Chao Liu, Hung-chih Yang, Jinliang Fan, Li-Wei He, Yi-Min Wang
WWW 2010
http://research.microsoft.com/pubs/119077/DNMF.pdf
DOULION: Counting Triangles in Massive Graphs with a Coin
Charalampos E. Tsourakakis, U. Kang, Gary L. Miller, Christos Faloutsos
KDD 2009
Fast counting of triangles in real-world networks: proofs, algorithms and observations
http://reports-archive.adm.cs.cmu.edu/anon/ml2008/CMU-ML-08-103.pdf
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations
U Kang, Charalampos E. Tsourakakis, Christos Faloutsos
IEEE International Conference on Data Mining (ICDM 2009)
http://www.cs.cmu.edu/~ctsourak/pegasusICDM09.pdf
Database
Optimizing Joins in a Map-Reduce Environment
Foto Afrati, Jeff Ullman
http://infolab.stanford.edu/%7Eullman/pub/join-mr.pdf
Efficient parallel set-similarity joins using MapReduce
Rares Vernica, Michael J. Carey, Chen Li
SIGMOD 2010
http://portal.acm.org/citation.cfm?id=1807222
Applications
Scaling Up Classifiers to Cloud Computers
Christopher Moretti, Karsten Steinhaeuser, Douglas Thain, Nitesh V. Chawla
ICDM 08
http://www.cse.nd.edu/~dthain/papers/classify-icdm08.pdf
Large-Scale Behavioral Targeting
KDD 09
Ye Chen, Dmitriy Pavlov, John Canny
http://www.cc.gatech.edu/~zha/CSE8801/ad/p209-chen.pdf
MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees
BMC Bioinformatics 2010
Suzanne J Matthews and Tiffani L Williams
http://www.biomedcentral.com/1471-2105/11/S1/S15
A novel approach to multiple sequence alignment using hadoop data grids
2010 Workshop on Massive Data Analytics on the Cloud
G. Sudha Sadasivam, G. Baktavatchalam
http://portal.acm.org/citation.cfm?id=1779599.1779601
Experiences on Processing Spatial Data with MapReduce
Ariel Cary, Zhengguo Sun, Vagelis Hristidis, Naphtali Rishe
21st International Conference on Scientific and Statistical Database Management
http://users.cis.fiu.edu/~vagelis/publications/Spatial-MapReduce-SSDBM2009.pdf
Web-Scale Distributional Similarity and Entity Set Expansion
Patrick Pantel, Eric Crestan, Arkady Borkovsky, Ana-Maria Popescu, Vishnu Vyas
2009 Conference on Empirical Methods in Natural Language Processing
http://www.aclweb.org/anthology/D/D09/D09-1098.pdf
Other lists of papers:
http://atbrox.com/2010/05/08/mapreduce-hadoop-algorithms-in-academic-papers-may-2010-update/
http://www.columbia.edu/~ak2834/mapreduce.html
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Schedule (Mobile Apps)
Jan 20: First meeting and planning projects.