Data Mining Datasets: A Compiled List of Download Sources


 

1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b

 

2. Several useful sites for downloading test datasets

Data for MATLAB hackers (Handwritten Digits, Faces, Text)

 

http://www.cs.toronto.edu/~roweis/data.html

 

3. UCI KDD Archive (datasets of many kinds)

 

http://kdd.ics.uci.edu/summary.task.type.html

 

http://kdd.ics.uci.edu/summary.data.type.html

 

4. Machine learning datasets collected by UCI

 

ftp://pami.sjtu.edu.cn/ 

 

http://www.ics.uci.edu/~mlearn//MLRepository.htm 

 

5. Sample databases

 

http://kdd.ics.uci.edu/

 

WWW pages that were manually classified

 

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/ 

 

6. CMU World Wide Knowledge Base (Web->KB) project (classified web pages, relational data describing pages and hyperlinks)

 

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/ 

 

7. Artificial intelligence and machine learning

 

http://duch-links.wikispaces.com/

 

8. Text classification: the datasets used with rainbow

 

http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html 

 

9. StatLib: a library of mathematical statistics software

 

http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm

 

http://lib.stat.cmu.edu/

 

http://lib.stat.cmu.edu/datasets/

 

http://lib.stat.cmu.edu/modules.php?op=modload&name=Downloads&file=index&req=viewdownload&cid=2

 

10. Cancer genes:

 

http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

 

11. Financial and medical data:

 

http://lisp.vse.cz/pkdd99/Challenge/chall.htm

 

12. Time series data:

 

http://www.stat.wisc.edu/~reinsel/bjr-data/ 

 

13. KDnuggets: links to a wide range of datasets:

 

http://www.kdnuggets.com/datasets/index.html

 

14. Intelligent analysis and information systems (Germany)

 

http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html 

 

http://dctc.sjtu.edu.cn/adaptive/datasets/  

 

http://fimi.cs.helsinki.fi/data/ 

 

15. IBM intelligent information

 

http://www-958.ibm.com/software/data/cognos/manyeyes/datasets

 

http://www.almaden.ibm.com/software/quest/Resources/index.shtml

 

16. Frequent Set Counting

 

http://miles.cnuce.cnr.it/~palmeri/datam/DCI/datasets.php

 

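17. Rating datasets

    MovieLens movie rating data

    Basic description: three datasets are available:

    a. 100,000 ratings from 943 users on 1,682 movies

    b. 1 million ratings from 6,040 users on 3,900 movies

    c. 10 million ratings from 71,567 users on 10,681 movies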

    http://www.grouplens.org/ 

 

    Book-Crossing book rating data

    Basic description: 1,149,780 ratings from 278,858 users on 271,379 books. Cai-Nicolas Ziegler crawled the data from the Book-Crossing community over four weeks in August and September 2004.

   http://www.informatik.uni-freiburg.de/~cziegler/BX/

 

    Jester Joke Data Set: joke ratings

    A dataset for recommender systems released by Ken Goldberg of UC Berkeley. It contains 4.1 million continuous ratings of 100 jokes from 73,496 users.

   http://www.ieor.berkeley.edu/~goldberg/jester-data/

 

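    Netflix dataset

    Another movie rating dataset: 480,189 users, 17,770 movies, and 100,480,507 rating records. By rating count, the MovieLens datasets are one to three orders of magnitude smaller, depending on which of the three releases is compared. MovieLens will probably be displaced gradually by the Netflix data as larger datasets become the norm.

    Note: all four of the datasets above are user rating data.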
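To make the size comparison between these rating datasets concrete, the short sketch below computes two quantities from the user, item, and rating counts quoted above: the density of each user-item matrix, and the gap to Netflix in orders of magnitude. The function names `density` and `orders_of_magnitude` are illustrative, and the Jester rating count is the rounded 4.1 million figure from the description, so its result is approximate.

```python
import math

# (users, items, ratings) exactly as quoted in the descriptions above.
# The Jester rating count is the rounded "4.1 million" figure.
DATASETS = {
    "MovieLens 100K": (943, 1_682, 100_000),
    "MovieLens 1M": (6_040, 3_900, 1_000_000),
    "MovieLens 10M": (71_567, 10_681, 10_000_000),
    "Book-Crossing": (278_858, 271_379, 1_149_780),
    "Jester": (73_496, 100, 4_100_000),
    "Netflix": (480_189, 17_770, 100_480_507),
}


def density(users, items, ratings):
    """Fraction of the user-item matrix that actually holds a rating."""
    return ratings / (users * items)


def orders_of_magnitude(larger, smaller):
    """Base-10 logarithm of the ratio between two rating counts."""
    return math.log10(larger / smaller)


if __name__ == "__main__":
    netflix_ratings = DATASETS["Netflix"][2]
    for name, (users, items, ratings) in DATASETS.items():
        gap = orders_of_magnitude(netflix_ratings, ratings)
        print(f"{name:15s} density={density(users, items, ratings):.6f} "
              f"gap to Netflix=10^{gap:.1f}")
```

Density is a useful companion to raw size: a dataset with few ratings per user-item pair behaves very differently in a recommender experiment than a dense one with the same number of ratings.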

 

21. GPS trajectory data

GeoLife GPS Trajectories

http://research.microsoft.com/en-us/downloads/b16d359d-d164-469e-9fd4-daa38f2b2e13/default.aspx  

 

GPS Trajectories with transportation mode labels

http://research.microsoft.com/apps/pubs/?id=141896

 

Movebank: animal trajectories

http://www.movebank.org/

 

22. Mobile phone, WiFi, and Bluetooth data

A Community Resource for Archiving Wireless Data At Dartmouth

http://crawdad.cs.dartmouth.edu/

 

crowdflow: mobile phone and WiFi trajectories

http://crowdflow.net/

 

23. OpenStreetMap Data

planet.openstreetmap.org or http://metro.teczno.com/

24. OpenPaths: uploaded data plus an API

 

https://openpaths.cc/  

 

25. FOURSQUARE

26. GeoTime

 

http://www.geotime.com/GeoTime(s)/January-2012/Cupid-Strikes-Again--Time-Series---GIS--Together-a.aspx  

 

27. Datatang (数据堂)

http://www.datatang.com/

28. http://www.kdnuggets.com/datasets/

29. http://appsrv.cse.cuhk.edu.hk/~kdd/data_collection.html

 

 

 

 

 

 

 

IBM Almaden Research Center Data Mining Projects

Datasets:

·  Synthetic Data Generation Code for Associations and Sequential Patterns

·  Synthetic Data Generation Code for Classification

·  "Dense" Data-Sets (apriori binary format, 3.2Mb)

·  Enron Email Data Set

Demos:

·  General Visualizations for Associations

·  Visualization Demo: Market Basket Analysis

IBM Intelligent Miner:

·  IBM Intelligent Miner for Data

·  Video and image clips from IBM Data Mining T.V. Ad

IBM Data Mining Resources:

·  Business Intelligence Solutions: our colleagues offering data mining consultancy and services.

·  Data Abstraction Research Group: our colleagues in IBM Thomas J. Watson Research Center. Our colleagues in France.

·  Data Mining: Extending the Information Warehouse Framework. IBM White Paper on Data Mining.

 

The Reuters dataset can be found at the following address:

 

http://www.research.att.com/~lewis/reuters21578.html

 

Sites on data mining for investment funds:

 

http://www.gotofund.com/index.asp

 

http://lans.ece.utexas.edu/~strehl/

 

 

 

Reuters dataset

 

http://www.research.att.com/~lewis/reuters21578.html

 

http://www-2.cs.cmu.edu/webkb

 

http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf

 

 

 

Related links:

 

http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar

 

http://www.phys.uni.torun.pl/~duch/software.html

 

 

 

WEKA:

 

http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar 

 

1. A jar file containing 37 classification problems, originally obtained from the UCI repository

 

http://prdownloads.sourceforge.net/weka/datasets-UCI.jar  

 

2. A jar file containing 37 regression problems, obtained from various sources

 

http://prdownloads.sourceforge.net/weka/datasets-numeric.jar 

 

3. A jar file containing 30 regression datasets collected by Luis Torgo

 

http://prdownloads.sourceforge.net/weka/regression-datasets.jar  
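Because a jar file is an ordinary ZIP archive, the dataset bundles listed above can be inspected and unpacked without a Java toolchain. The following is a minimal sketch using Python's standard `zipfile` module. It assumes the jar has already been downloaded to a local path, and it assumes the datasets inside are stored as `.arff` files; the helper names `list_jar_members` and `extract_jar` are hypothetical and not part of WEKA.

```python
import zipfile


def list_jar_members(jar_path, suffix=".arff"):
    """Return the sorted names of archive members ending with `suffix`.

    The match is case-insensitive. The default suffix is an assumption
    about how the datasets are stored inside the jar.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name for name in jar.namelist()
            if name.lower().endswith(suffix)
        )


def extract_jar(jar_path, target_dir):
    """Unpack every member of the jar into `target_dir`."""
    with zipfile.ZipFile(jar_path) as jar:
        jar.extractall(target_dir)
```

For example, `list_jar_members("datasets-UCI.jar")` would return the dataset files found in a locally saved copy of the first archive, and `extract_jar("datasets-UCI.jar", "uci")` would unpack them into a directory named `uci`.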

 

 

 

Data mining competitions and their datasets

 

·  2005 University of California data mining contest, predicting bad accounts and their churn date using real-world CRM data, deadline June 30, 2005.

·  ILP 2005 Challenge, on the prediction of functional classes of genes.

·  KDD Cup 2005, on classifying internet user search queries, deadline July 8.

·  Data Mining Cup 2005 (Chemnitz, Germany), for students; topic: How data mining can ascertain the risk of loss of payments and reduce this risk.

·  KDD Cup 2004, focuses on data mining for several performance criteria using datasets from bioinformatics and quantum physics.

·  InfoVis 2004 Contest, The History of InfoVis.

·  DATA MINING CUP 2004 (Chemnitz, Germany), for students.

·  InfoVis 2003 Contest: Visualization and Pair Wise Comparison of Trees, results announced Sep 5, 2003.

·  KDD CUP 2003

·  http://www.cs.cornell.edu/projects/kddcup/index.html

·  KDD Cup 2003, focuses on problems motivated by network mining and the analysis of usage logs.

·  DATA MINING CUP 2003 (Chemnitz, Germany). The task is to identify spam emails before they reach the user's mailbox.

·  KDD Cup 2002, focus on data mining in molecular biology.

·  Student Data Mining Cup (2002), Chemnitz University and Prudential Systems.
