KDD Cup Competition History

Introduction to KDD Cup

KDD Cup is the annual Data Mining and Knowledge Discovery competition organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), the leading professional organization of data miners.
(The competition is held once a year, concurrently with the SIGKDD international conference, and is open to both academia and industry.)

The KDD Cup Center is here:

http://www.sigkdd.org/kddcup/index.php

Topics of past KDD Cups:

2015: Predicting from big data whether MOOC learners will drop out

2014: Helping a charity website identify exceptionally exciting projects

2013:Determine whether an author has written a given paper

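2012: (1) Personalized recommendation in a social network, (2) pCTR (predicted click-through rate) estimation for a search advertising system

2011: (1) Music rating prediction, (2) Identifying whether a track was rated by the user

2010: Predicting student performance on math problems from logs of interaction between students and an intelligent tutoring system

2009: Customer behavior prediction for a telecom operator

2008: Early detection of breast cancer

2007: Movie rating prediction

2006: Medical data mining

2005: Internet user search query categorization

2004: Multiple performance measures for supervised classification
2003: Network mining and usage log analysis
2002: Bioinformatics and text mining (molecular biology domain)
2001: Bioinformatics and pharmaceuticals (predicting bioactivity for drug design; predicting gene/protein function and localization)
2000: Web mining (based on clickstream and transaction data)
1999: Network intrusion detection and reporting
1998: Selecting the best direct-mail list
1997: Predicting the most likely charitable donors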

KDD Cup 1997

http://www-aig.jpl.nasa.gov/public/kdd97/kdd_cup.html
Task
given data on past responders to fund-raising, predict most likely responders for new campaign
Dataset
321 fields/variables, Significant effort on data preprocessing
Participants
45 companies/institutions participated
16 contestants turned in their results
Tied for 1st-2nd place
Charles Elkan, Ph.D. from University of California, San Diego (BNB, Boosted Naive Bayesian Classifier)
Urban Science Applications, Inc. (Gain, Direct Marketing Selection System)
3rd Place
Silicon Graphics, Inc (MineSet)

KDD Cup 1998

http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html
Task: the goal was to select the best list to mail a solicitation
Dataset: 95412 records and 481 fields
Participants: 21 teams completed the challenge and submitted results
1st place: Urban Science Applications, Inc. (Software GainSmarts)
2nd place: SAS Institute, Inc. (Software Enterprise Miner)
3rd place: Quadstone Limited (Software Decisionhouse)

KDD Cup 1999

1.Classifier learning contest
http://www-cse.ucsd.edu/users/elkan/clresults.html
The goal was to build a predictive model for identifying network intrusions.
24 entries were submitted.
2.Knowledge discovery "report" contest
http://www.cse.ucsd.edu/users/elkan/kdresults.html
The goal was to apply a range of knowledge discovery techniques to the same data used in the 1998 competition, and discover higher-level knowledge from the data.
Co-winners
J. Georges and A. Milley (SAS)
S. Rosset and A. Inger (Amdocs, Israel).
Honorable mention
Paola Sebastiani, Marco Ramoni, and Alexander Crea of Bayesware Ltd.

KDD Cup 2000

http://www.ecn.purdue.edu/KDDCUP/
Task: Five questions relating to clickstream and purchase data from an e-tailer.
Dataset: Obtained from Gazelle.com, a legwear and legcare Web retailer
Over 150 teams requested data, 30 teams submitted the answers.
Questions 1 & 5 Winner: Amdocs
Exploratory Data Analysis – SAS, S Plus
Classification Tree, Rules Extraction – Amdocs Business Insight Tool
Questions 2 & 3 Winner: Salford Systems
Question 4 Winner: e-steam

KDD Cup 2001

http://www.cs.wisc.edu/~dpage/kddcup2001/
Problems from bioinformatics
Data set 1
Prediction of Molecular Bioactivity for Drug Design -- Binding to Thrombin (task 1)
Data set 2
Prediction of Gene/Protein Function (task 2) and Localization (task 3)
136 groups, 200 submissions
Task 1 winner (Thrombin)
Jie Cheng (Canadian Imperial Bank of Commerce).
Bayesian network learner and classifier
Task 2 winner (Function)
Mark-A. Krogel (University of Magdeburg).
Inductive logic programming
Task 3 winner (Localization)
Hisashi Hayashi, Jun Sese, and Shinichi Morishita (University of Tokyo).
K nearest neighbor

KDD Cup 2002

http://www.biostat.wisc.edu/~craven/kddcup/
Two tasks from molecular biology domains
Task 1: construct models that can assist genome annotators by automatically extracting information from scientific articles
Task 2: learn models that characterize the behavior of individual genes in a hidden experimental setting.
Task 1 winner
Yizhar Regev and Michal Finkelstein
ClearForest and Celera, USA
Task 2 winner
Adam Kowalczyk and Bhavani Raskutti
Telstra Research Laboratories, Australia
Single Class SVM

KDD Cup 2003

http://www.cs.cornell.edu/projects/kddcup/
Data set
A very large archive of research papers
Citation structure and (partial) data on the downloading of papers by users
Task
Task 1: predict how many citations each paper will receive during the three months leading up to the KDD 2003 conference
Task 2: reconstruct a citation graph of a large subset of the archive from only the LaTeX sources
Task 3: estimate each paper's popularity based on partial download logs
Task 4: participants devise their own questions
Task 1 winner
Claudia Perlich, Foster Provost, Sofus Macskassy
New York University
Task 2 winner
David Vogel
AI Insight Inc.
Task 3 winner
Janez Brank and Jure Leskovec
Jozef Stefan Institute, Slovenia
Task 4 winner
Amy McGovern, Lisa Friedland, Michael Hay, Brian Gallagher, Andrew Fast, Jennifer Neville, and David Jensen
University of Massachusetts, Amherst, USA

KDD Cup 2004

http://kodiak.cs.cornell.edu/kddcup/
April 28 --- July 14, 2004
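Two problems, with data drawn from
bioinformatics
quantum physics
Data mining problems under different performance metrics
Registrations came from 49 countries (including .com)
Winners came from China, Germany, India, New Zealand, USA
Half of the winners came from companies, half from universities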
Protein Winners:
Bernhard Pfahringer
University of Waikato, Computer Science Department
1st Place Overall

Yan Fu, RuiXiang Sun, Qiang Yang, Simin He, Chunli Wang, Haipeng Wang, Shiguang Shan, Junfa Liu, Wen Gao
Institute of Computing Technology, Chinese Academy of Sciences
Tied for 1st Place Overall
Honorable Mention for Squared Error
Honorable Mention for Average Precision

David S. Vogel, Eric Gottschalk, and Morgan C. Wang
MEDai / A.I. Insight / University of Central Florida
Tied for 1st Place Overall
Honorable Mention for Top-1 Accuracy

Dirk Dach, Holger Flick, Christophe Foussette, Marcel Gaspar, Daniel Hakenjos, Felix Jungermann, Christian Kullmann, Anna Litvina, Lars Michele, Katharina Morik, Martin Scholz, Siehyun Strobel, Marc Twiehaus, Nazif Veliu
Artificial Intelligence Unit, University of Dortmund, Germany
Honorable Mention for Rank of Last

Source: http://huzhyi21.blog.163.com/blog/static/1007396200981534952541

KDD Cup 2015

http://www.kddcup2015.com/information.html

KDD Cup 2014

https://www.kaggle.com/c/kdd-cup-2014-predicting-excitement-at-donors-choose/

http://www.datapub.cn/d/562defa1e4b05a46eeaad9ce

KDD Cup 2013

http://www.kaggle.com/c/kdd-cup-2013-author-paper-identification-challenge

http://blog.csdn.net/pf1492536/article/details/9162667

KDD Cup 2012

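Track 1 task: Personalized recommendation in a social network

Based on user attributes (User Profile) in Tencent Weibo, SNS social relationships, interaction records in the social network (retweet, comment, at), and the item recommendation history of the past 30 days, predict the list of recommended items that the user is most likely to accept next.

Track 2 task: pCTR (predicted click-through rate) estimation for a search advertising system

Given the query terms users entered in Tencent search, the displayed ad information (including ad title, description, URL, etc.), the relative position of each ad (its rank among multiple ads), the users' click behavior, and attribute information about advertisers and users, predict users' ad clicks in a subsequent period.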

Dataset: http://www.kddcup2012.org/c/kddcup2012-track1/data

Papers: http://www.kddcup2012.org/workshop

KDD Cup 2011

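Track 1 task: Music rating prediction

Based on users' historical item ratings on Yahoo! Music, predict users' ratings of other items (including songs, albums, etc.). Predictions are evaluated by the RMSE (root mean squared error) between predicted and actual ratings. The album, artist, and genre of each song are also provided.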
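As a minimal sketch of the Track 1 metric, RMSE (root mean squared error) can be computed as below. The function name `rmse` and the sample ratings are hypothetical and chosen only for illustration; this is not the organizers' official scoring code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each prediction error, average, then take the square root.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings, for illustration only.
predicted = [80.0, 50.0, 90.0]
actual = [70.0, 50.0, 100.0]
score = rmse(predicted, actual)
print(round(score, 3))
```

A lower RMSE means the predicted ratings are closer to the actual ones; a perfect prediction gives 0.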

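Track 2 task: Identifying whether a track was rated by the user

For each user, 6 candidate songs are provided: 3 were rated by the user, and the other 3 were not rated by that user but are drawn from songs rated highly across all users. Song attributes (album, artist, genre, etc.) are also provided. Participants output a binary classification (0/1), and the final ranking is computed from overall accuracy.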
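The Track 2 setup can be sketched as follows, assuming a model has already produced one score per candidate song. Labeling the 3 highest-scored candidates as 1 is one plausible strategy rather than a method stated by the organizers, and the names `label_top_three` and `accuracy` and all sample values are hypothetical.

```python
def label_top_three(scores):
    """Mark the 3 highest-scored of a user's 6 candidates as 1, the rest as 0."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:3])
    return [1 if i in chosen else 0 for i in range(len(scores))]


def accuracy(predicted, truth):
    """Fraction of candidates whose 0/1 label matches the ground truth."""
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return correct / len(truth)


# Hypothetical model scores and ground truth for one user's 6 candidates.
scores = [0.9, 0.2, 0.7, 0.4, 0.8, 0.1]
truth = [1, 0, 0, 1, 1, 0]
predicted = label_top_three(scores)
acc = accuracy(predicted, truth)
print(predicted, acc)
```

Because exactly 3 of the 6 candidates are positive, forcing 3 positive labels per user uses the known structure of the task; overall accuracy would then be averaged over all users' candidates.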

Dataset: http://kddcup.yahoo.com/datasets.php#

Papers: http://kddcup.yahoo.com/workshop.php

KDD Cup 2010

http://www.datapub.cn/d/55d6bed7e4b022099bb3e532

KDD Cup 2009

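The large-scale data of the French telecom operator Orange contains a large volume of customer behavior records. Contestants must design a good customer relationship management (CRM) system that uses fast, robust methods to predict three customer attributes: (1) loyalty: the likelihood that the customer switches operators (churn); (2) appetency: the likelihood that the customer buys a new service; (3) up-selling: the likelihood that the customer upgrades or buys additional high-margin products. Results are evaluated by AUC (area under the ROC curve).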
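AUC (area under the ROC curve) equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that pairwise definition. The function name `auc` and the sample labels and scores are hypothetical, and the quadratic loop is meant for illustration rather than for large datasets.

```python
def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical churn labels (1 = churned) and model scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
result = auc(labels, scores)
print(result)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which makes the metric suitable for heavily imbalanced targets such as churn.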

Dataset: http://www.sigkdd.org/kddcup/index.php

Papers: http://jmlr.csail.mit.edu/proceedings/papers/v7/

KDD Cup 2008

http://www.kdd.org/kdd-cup/view/kdd-cup-2008/Data