Top 10 Algorithms in Data Mining
- [April 22, 2009:] A companion book, The Top Ten Algorithms in Data Mining, was published in April 2009
- [December 24, 2007:] A companion article in PDF for this top-10 algorithm initiative:
Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand and Dan Steinberg, Top 10 Algorithms in Data Mining, Knowledge and Information Systems, 14(2008), 1: 1-37.
In an effort to identify some of the most influential algorithms that have been widely used in the data mining community, the IEEE International Conference on Data Mining (ICDM) identified the top 10 algorithms in data mining for presentation at ICDM '06 in Hong Kong.
As the first step in the identification process, in September 2006 we invited the ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners to each nominate up to 10 best-known algorithms in data mining. All except one in this distinguished set of award winners responded to our invitation. We asked that each nomination include the following information: (a) the algorithm name, (b) a brief justification, and (c) a representative publication reference. We also advised that each nominated algorithm should have been widely cited and used by other researchers in the field, and that the nominations from each nominator, taken as a group, should reasonably represent the different areas of data mining.
After the nominations in Step 1, we checked the citation count of each nomination on Google Scholar in late October 2006 and removed those with fewer than 50 citations. The remaining 18 nominations are given in the candidate list below, organized into 10 topics. Please note that for some of these algorithms, such as K-means, the reference cited is not the original paper that introduced the algorithm but a recent paper that highlights the importance of the technique.
In the third step of the identification process, we involved the wider research community. We invited the Program Committee members of KDD-06, ICDM '06, and SDM '06, as well as the ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners, to each vote for up to 10 well-known algorithms from the above candidate list. The voting results of this step were presented at ICDM '06 and are given in the slides below.
We hope the identification of the top 10 algorithms can promote the use of data mining in a wider range of real-world applications and inspire more data mining researchers to explore these 10 algorithms further, including their impact and the new research issues they raise.
Xindong Wu and Vipin Kumar
December 25, 2006