7 Important Data Science Papers

Reposted from: http://datascience101.wordpress.com/2013/08/26/7-important-data-science-papers/

It is back-to-school time, and here are some papers to keep you busy this school year. All the papers are free. This list is far from exhaustive, but these are some important papers in data science and big data.

Google Search

  • PageRank – This is the paper that explains the algorithm behind Google search.
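To give a feel for the idea before reading the paper, here is a minimal sketch of PageRank computed by repeated iteration on a toy link graph. It is not taken from the paper's text: the function name `pagerank`, the `links` dictionary format, the damping value of 0.85, the fixed iteration count, and the even redistribution of rank from pages with no outgoing links are all illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by repeated iteration.

    links maps each page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Assumption: rank held by pages with no outgoing links
        # is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1.0 - damping) / n + damping * dangling / n
                    for page in pages}
        # Every page passes an equal share of its rank to each page it links to.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_graph)
```

In this toy graph, page "c" is linked to by both other pages, so it ends up with the largest score. A real search engine works on a vastly larger graph and combines the score with many other signals.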

Hadoop

  • MapReduce – This paper explains a programming model for processing large datasets. In particular, it is the programming model used in Hadoop.
  • Google File System – HDFS, the storage layer of Hadoop, is an open-source implementation of the distributed file system design described in this paper.
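The map and reduce programming model can be sketched with the classic word-count example. This is a single-process illustration only, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this sketch, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for document in documents for pair in map_phase(document))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "big papers"])
```

The point of the model is that the programmer writes only the map and reduce functions, and the framework takes care of distributing the work and grouping the intermediate pairs.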

NoSQL

These are two of the papers that started and drove the NoSQL debate. Each paper describes a different type of storage system intended to be massively scalable.

Machine Learning

Bonus Paper

  • Random Forests – One of the most popular machine learning techniques. It is heavily used in Kaggle competitions, even by the winners.

Are there any other papers you feel should be on the list?

