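Reposted from: http://blog.sina.com.cn/s/blog_5420e0000101i4a7.html
I. Websites of leading experts in data mining. They contain a lot of valuable material that can broaden a researcher's thinking, so they are shared here: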
1.Rakesh Agrawal
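Homepage: http://research.microsoft.com/en-us/people/rakesha/ The founder of association rule research, a topic unique to data mining; his Apriori algorithm opened up this field. He previously worked at IBM Research and now works on search at Microsoft Research. Besides association rules, he has done pioneering work on Hippocratic Databases, Sovereign Information Sharing, and Privacy-Preserving Data Mining.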
2.Jiawei Han(韩家炜)
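Homepage: http://www.cs.uiuc.edu/~hanj/
Author of the well-known textbook Data Mining: Concepts and Techniques, with a long-standing reputation in the data mining community. His homepage hosts many of his papers, all classics, as well as the courses he teaches, with slides available for download. His most notable work includes association rule mining with FP-trees and heterogeneous network mining.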
3.Jon Kleinberg
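Homepage: http://www.cs.cornell.edu/home/kleinber/ A computer scientist at Cornell University and inventor of the famous HITS algorithm. (An anecdote: at a talk, someone asked Jon's advisor which was proposed first, HITS or PageRank. The advisor boasted, "HITS, of course, and well before PageRank." Someone then asked how much earlier, and the advisor answered, "One week." Ha.) His main research interest is currently community analysis.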
4.Philip S. Yu
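Homepage: http://www.cs.uic.edu/PSYu/ A highly influential figure in databases and data mining, and one of the few people with real influence in both industry (IBM Watson Research Center) and academia... For more, see http://www.guzili.com/?p=131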
5.Jian Pei
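A leading data mining researcher who often comes to China to teach data mining courses. His homepage lists his data mining papers and course information, along with some recommended books and source code.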
6.Mohammed J. Zaki
http://www.cs.rpi.edu/~zaki/index.php
A leading data mining researcher. His homepage has many insightful papers, plus courses and related source code, all classics. My idol O(∩_∩)O~
7.Qiang Yang
8.Wei Wang
http://www.cs.unc.edu/~weiwang/
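A leading data mining researcher. Her homepage has papers as well as slides for the data mining and bioinformatics courses she teaches. The slides are very good and well suited to self-study; I enjoy reading them.
9.Zhi-Hua Zhou(周志华)
A leading data mining researcher at Nanjing University. His homepage has many data mining resources and collects data mining courses from many universities abroad.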
II. Some learning resources (mainly websites)
1.Statistical Learning Theory from Berkeley
This course will provide an introduction to probabilistic and computational methods for the statistical modeling of complex, multivariate data. It will concentrate on graphical models, a flexible and powerful approach to capturing statistical dependencies in complex, multivariate data. In particular, the course will focus on the key theoretical and methodological issues of representation, estimation, and inference.
2.Data Mining from Stanford
This will also be helpful.
3.The Lasso Page (a bit dated)
The Lasso is a shrinkage and selection method for linear regression. It minimizes the usual sum of squared errors, with a bound on the sum of the absolute values of the coefficients. It has connections to soft-thresholding of wavelet coefficients, forward stagewise regression, and boosting methods.
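The link to soft-thresholding mentioned above can be sketched in a few lines. This is a minimal illustration, not code from the Lasso page: it assumes the penalized (Lagrangian) form of the problem, 1/2 * ||y - Xb||^2 + lam * ||b||_1, and a design matrix with orthonormal columns, in which case each Lasso coefficient is the soft-thresholded least-squares coefficient. The function names and the sample numbers are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Lasso solution under the orthonormal-design assumption.

    ols_coefs: ordinary least-squares coefficients (hypothetical inputs).
    lam: penalty weight on the sum of absolute coefficient values.
    """
    return [soft_threshold(b, lam) for b in ols_coefs]


# Hypothetical least-squares coefficients, for illustration only.
coefs = lasso_orthonormal([3.0, -0.5, 1.2, -2.0], lam=1.0)
print(coefs)
```

Large coefficients are shrunk by lam and small ones are set exactly to zero, which is why the Lasso performs both shrinkage and variable selection. For a general (non-orthonormal) design there is no closed form like this, and an iterative solver is needed.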
4.Data Mining Tutorials
This is a really informative website with tutorials on statistical data mining. They were written by Andrew Moore, an employee at Google. He covers the foundations of data analysis, including decision trees, Bayesian classifiers and many other techniques we've been learning in class. A great website to check out if you're having trouble with any topics or would simply like to learn more.
5.Data Mining Research
This is a comprehensive blog about the latest developments in data mining research. It provides a great overview of what scholars and professionals are talking about with regard to the discipline. The individual who started this blog is a working professional in the field, working for FinScore, a Swiss provider of software and professional services focusing on data mining and customer intelligence. A couple of very interesting and insightful posts from the blog include: “10 Very Interesting People in Data Mining,” “Data Mining: A New Weapon in the Fight Against Medicaid Fraud,” and “Worst practices in Data Mining.” Stephanie Santoso
6.Statistical Learning Article
An article on the elements of statistical learning and how data mining is used to make predictions. Azai Ighadaro
7.Kernel-Machines.Org
This page is devoted to learning methods building on kernels, such as the support vector machine. It grew out of earlier pages at the Max Planck Institute for Biological Cybernetics and at GMD FIRST, snapshots of which can be found here and here. In those days, information about kernel methods was sparse and nontrivial to find, and the kernel machines web site acted as a central repository for the field. It included a list of people working in the field, and online preprints of most publications.
8.Welcome to Boosting.org
We are pleased to announce a new website on Boosting and related ensemble learning methods, e.g. Boosting, Arcing, Bagging, the connection to mathematical programming and large margin classifiers, and model selection. The aim is to serve as a central information source by providing links to papers, upcoming events, datasets, code, etc.
9.Perfectly Random Sampling with Markov Chains
Random sampling has found numerous applications in physics, statistics, and computer science. Perhaps the most versatile method of generating random samples from a probability space is to run a Markov chain. This site provides a comprehensive collection of resources in this area.
10.Independent Component Analysis
A Tutorial
11.Self Organizing Maps
An excellent short introduction
12.Reversible Markov Chains and Random Walks on Graphs
Early drafts of chapters are available as PDF files
13.A Brief Introduction to Graphical Models and Bayesian Networks
"Graphical models are a marriage between probability theory and graph theory. They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering -- uncertainty and complexity -- and in particular they are playing an increasingly important role in the design and analysis of machine learning algorithms. Fundamental to the idea of a graphical model is the notion of modularity -- a complex system is built by combining simpler parts. Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data. The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data structure that lends itself naturally to the design of efficient general-purpose algorithms."
14.Gaussian Processes for Machine Learning
The Bayesian approach to data mining.