KDD Cup Themes by Year

Introduction to KDD Cup

KDD Cup is the annual Data Mining and Knowledge Discovery competition organized by ACM Special Interest Group on Knowledge Discovery and Data Mining, the leading professional organization of data miners.
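(Organized by SIGKDD, the ACM Special Interest Group on Knowledge Discovery and Data Mining, the KDD competition takes place once a year alongside the SIGKDD international conference. It is open to both academia and industry.)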

The KDD Cup Center is here:

http://www.sigkdd.org/kddcup/index.php

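Themes of past KDD Cups:

2015: Use big data to predict whether MOOC learners will "skip class" (drop out)

2014: Help a charity website identify especially exciting projects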

2013:Determine whether an author has written a given paper

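2012: (1) Personalized recommender system in a social network, (2) pCTR (predicted click-through rate) estimation for a search advertising system

2011: (1) Music rating prediction, (2) identifying whether a piece of music was rated by the user

2010: Predict student performance on math problems from logs of interaction between students and an intelligent tutoring system

2009: Customer behavior prediction for a telecom operator

2008: Early detection of breast cancer

2007: Predicting movie ratings

2006: Medical data mining

2005: Internet user search query categorization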

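2004: Multiple performance measures for supervised classification
2003: Network mining and usage log analysis
2002: Bioinformatics and text mining (molecular biology domain)
2001: Bioinformatics and pharmaceuticals (bioactivity prediction for drug design; prediction of gene/protein function and localization)
2000: Web mining tasks (based on clickstream and transaction data)
1999: Network intrusion detection and reporting
1998: Generating the best direct-marketing mailing list
1997: Predicting the most likely charitable donors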

KDD Cup 1997

http://www-aig.jpl.nasa.gov/public/kdd97/kdd_cup.html
Task
Given data on past responders to fund-raising, predict the most likely responders for a new campaign
Dataset
321 fields/variables; significant effort on data preprocessing
Participants
45 companies/institutions participated
16 contestants turned in their results
Shared 1st-2nd place
Charles Elkan, Ph.D. from University of California, San Diego (BNB, Boosted Naive Bayesian Classifier)
Urban Science Applications, Inc. (Gain, Direct Marketing Selection System)
3rd Place
Silicon Graphics, Inc (MineSet)

KDD Cup 1998

http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html
Task: the goal was to select the best list to mail a solicitation
Dataset: 95412 records and 481 fields
Participants: 21 teams completed the challenge and submitted results
1st place: Urban Science Applications, Inc. (Software GainSmarts)
2nd place: SAS Institute, Inc. (Software Enterprise Miner)
3rd place: Quadstone Limited (Software Decisionhouse)

KDD Cup 1999

1. Classifier learning contest
http://www-cse.ucsd.edu/users/elkan/clresults.html
The goal was to build a predictive model for identifying network intrusions.
24 entries were submitted.
2. Knowledge discovery "report" contest
http://www.cse.ucsd.edu/users/elkan/kdresults.html
The goal was to apply a range of knowledge discovery techniques to the same data used in the 1998 competition, and to discover higher-level knowledge from the data.
Co-winners
J. Georges and A. Milley (SAS)
S. Rosset and A. Inger (Amdocs, Israel).
Honorable mention
Paola Sebastiani, Marco Ramoni, and Alexander Crea of Bayesware Ltd.

KDD Cup 2000

http://www.ecn.purdue.edu/KDDCUP/
Task: The questions related to clickstream and purchase data from an e-tailer. Five questions.
Dataset: Obtained from Gazelle.com, a legwear and legcare Web retailer
Over 150 teams requested data, 30 teams submitted the answers.
Questions 1 & 5 Winner: Amdocs
Exploratory Data Analysis – SAS, S Plus
Classification Tree, Rules Extraction – Amdocs Business Insight Tool
Questions 2 & 3 Winner: Salford Systems
Question 4 Winner: e-steam

KDD Cup 2001

http://www.cs.wisc.edu/~dpage/kddcup2001/
Problems from bioinformatics
Data set 1
Prediction of Molecular Bioactivity for Drug Design -- Binding to Thrombin (task 1)
Data set 2
Prediction of Gene/Protein Function (task 2) and Localization (task 3)
136 groups, 200 submissions
Task 1 winner (Thrombin)
Jie Cheng (Canadian Imperial Bank of Commerce).
Bayesian network learner and classifier
Task 2 winner (Function)
Mark-A. Krogel (University of Magdeburg).
Inductive logic programming
Task 3 winner (Localization)
Hisashi Hayashi, Jun Sese, and Shinichi Morishita (University of Tokyo).
K nearest neighbor

KDD Cup 2002

http://www.biostat.wisc.edu/~craven/kddcup/
Two tasks from molecular biology domains
Task 1: construct models that can assist genome annotators by automatically extracting information from scientific articles
Task 2: learn models that characterize the behavior of individual genes in a hidden experimental setting.
Task 1 winner
Yizhar Regev and Michal Finkelstein
ClearForest and Celera, USA
Task 2 winner
Adam Kowalczyk and Bhavani Raskutti
Telstra Research Laboratories, Australia
Single Class SVM

KDD Cup 2003

http://www.cs.cornell.edu/projects/kddcup/
Data set
A very large archive of research papers
Citation structure and (partial) data on the downloading of papers by users
Task
Task 1: predict how many citations each paper will receive during the three months leading up to the KDD 2003 conference
Task 2: construct a citation graph of a large subset of the archive from only the LaTeX sources
Task 3: each paper's popularity will be estimated based on partial download logs
Task 4: devise their own questions
Task 1 winner:
Claudia Perlich, Foster Provost, Sofus Macskassy
New York University
Task 2 winner:
David Vogel
AI Insight Inc.
Task 3 winner:
Janez Brank and Jure Leskovec
Jozef Stefan Institute, Slovenia
Task 4 winner:
Amy McGovern, Lisa Friedland, Michael Hay, Brian Gallagher, Andrew Fast, Jennifer Neville, and David Jensen
University of Massachusetts, Amherst, USA

KDD Cup 2004

http://kodiak.cs.cornell.edu/kddcup/
April 28 --- July 14, 2004
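Two problems, with data drawn from:
Bioinformatics
Quantum physics
Data mining under a variety of performance metrics
Registrations from 49 countries (including .com)
Winners came from China, Germany, India, New Zealand, USA
Half of the winners came from companies, half from universities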
Protein Winners:
Bernhard Pfahringer
University of Waikato, Computer Science Department
1st Place Overall

Yan Fu, RuiXiang Sun, Qiang Yang, Simin He, Chunli Wang, Haipeng Wang, Shiguang Shan, Junfa Liu, Wen Gao
Institute of Computing Technology, Chinese Academy of Sciences
Tied for 1st Place Overall
Honorable Mention for Squared Error
Honorable Mention for Average Precision

David S. Vogel, Eric Gottschalk, and Morgan C. Wang
MEDai / A.I. Insight / University of Central Florida
Tied for 1st Place Overall
Honorable Mention for Top-1 Accuracy

Dirk Dach, Holger Flick, Christophe Foussette, Marcel Gaspar, Daniel Hakenjos, Felix Jungermann, Christian Kullmann, Anna Litvina, Lars Michele, Katharina Morik, Martin Scholz, Siehyun Strobel, Marc Twiehaus, Nazif Veliu
Artificial Intelligence Unit, University of Dortmund, Germany
Honorable Mention for Rank of Last

Source: http://huzhyi21.blog.163.com/blog/static/1007396200981534952541


KDD Cup 2015

http://www.kddcup2015.com/information.html


KDD Cup 2014

https://www.kaggle.com/c/kdd-cup-2014-predicting-excitement-at-donors-choose/

http://www.datapub.cn/d/562defa1e4b05a46eeaad9ce


KDD Cup 2013

http://www.kaggle.com/c/kdd-cup-2013-author-paper-identification-challenge

http://blog.csdn.net/pf1492536/article/details/9162667


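KDD Cup 2012

Track 1 task: personalized recommender system in a social network

Given user attributes (User Profile) on Tencent Weibo, SNS social relations, interaction records in the social network (retweet, comment, at), and the item recommendation history of the past 30 days, predict the list of recommended items the user is most likely to accept next.

Track 2 task: pCTR (predicted click-through rate) estimation for a search advertising system

Given the user's query on Tencent search, the displayed ad information (ad title, description, URL, etc.), the ad's relative position (its rank among multiple ads), whether the user clicked, and attribute information for advertisers and users, predict user clicks on ads in a subsequent period.

Dataset: http://www.kddcup2012.org/c/kddcup2012-track1/data

Papers: http://www.kddcup2012.org/workshop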

 

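KDD Cup 2011

Track 1 task: music rating prediction

Based on users' historical ratings of items on Yahoo! Music, predict users' ratings of other items (songs, albums, etc.). Predictions are scored by the RMSE (root mean squared error) between predicted and actual ratings. Album, artist, and genre information for the songs is also provided.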
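The RMSE scoring mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the organizers' scoring code; the function name `rmse` and the sample ratings are made up for the example.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up ratings, purely for illustration (not taken from the contest data).
predicted = [80.0, 50.0, 90.0, 30.0]
actual = [90.0, 50.0, 70.0, 40.0]
score = rmse(predicted, actual)
print(round(score, 3))
```

A lower RMSE means the predicted ratings sit closer to the actual ones; large individual errors are penalized more heavily because they are squared before averaging.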

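Track 2 task: identify whether a piece of music was rated by the user

For each user, six candidate songs are provided: three that the user has rated, and three that the user has not rated but that come from songs rated highly by users overall. Song attributes (album, artist, genre, etc.) are provided as well. Participants submit a binary classification (0/1), and the final ranking is computed from overall accuracy.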
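One simple way to turn per-song scores into the required 0/1 answer is to mark the three highest-scoring candidates as rated. The sketch below assumes a hypothetical model has already produced one score per candidate; the helper names, scores, and ground-truth labels are invented for illustration and are not from the contest.

```python
def label_top_three(scores):
    """Return a 0/1 label per candidate, marking the three highest scores as 1."""
    if len(scores) != 6:
        raise ValueError("expected exactly six candidate scores")
    ranked = sorted(range(6), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:3])
    return [1 if i in top else 0 for i in range(6)]


def accuracy(predicted, truth):
    """Fraction of candidates whose predicted label matches the true label."""
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return correct / len(truth)


# Hypothetical model scores and ground truth for one user's six candidates.
scores = [0.9, 0.2, 0.6, 0.4, 0.8, 0.1]
truth = [1, 0, 0, 1, 1, 0]
predicted = label_top_three(scores)
acc = accuracy(predicted, truth)
print(predicted, acc)
```

Forcing exactly three positives per user uses the known structure of the candidate set, so a single misplaced song always costs two errors: one false positive and one false negative.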

Dataset: http://kddcup.yahoo.com/datasets.php#

Papers: http://kddcup.yahoo.com/workshop.php

 

KDD Cup 2010

http://www.datapub.cn/d/55d6bed7e4b022099bb3e532


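KDD Cup 2009

The large-scale data of the French telecom operator Orange contains a large volume of customer behavior records. Contestants need to design a good customer relationship management (CRM) system that uses fast, robust methods to predict three customer attributes: 1. loyalty: the probability that the customer switches operators (churn); 2. purchase propensity: the probability of buying new services (appetency); 3. added value: the probability that the customer upgrades or buys additional high-margin products (up-selling). Results are evaluated by AUC (area under the ROC curve).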
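AUC can be read as the probability that a randomly chosen positive example (for instance, a customer who churned) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition. It is an illustration with made-up labels and scores, not the contest's evaluation code, and its quadratic loop only suits small inputs.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up churn labels (1 = churned) and model scores, for illustration only.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.7]
value = auc(labels, scores)
print(value)
```

Because AUC depends only on how the scores are ordered, it is insensitive to the score scale and to the decision threshold, which makes it a common choice for imbalanced targets such as churn.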

Dataset: http://www.sigkdd.org/kddcup/index.php

Papers: http://jmlr.csail.mit.edu/proceedings/papers/v7/


KDD Cup 2008

http://www.kdd.org/kdd-cup/view/kdd-cup-2008/Data

