Data Mining -- Datasets

WHO Data Set: contains social, economic, health, and political indicators

http://www.exploredata.net

The following is from: http://blog.sina.com.cn/s/blog_4b700c4c0102dyjs.html

(For data mining, information retrieval, and knowledge discovery)

1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b

2. Some useful sites for downloading test datasets

http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
The Reuters dataset can be found at the following URL:
http://www.research.att.com/~lewis/reuters21578.html

Various datasets are available at:
http://kdd.ics.uci.edu/summary.data.type.html

For text classification, another usable dataset is the rainbow dataset:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

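3. I collected a lot of test datasets. Anyone writing a paper will certainly need them, at least to check how well an algorithm performs.
Some of the links may no longer be accessible, but at least some of them should still work: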

Machine learning datasets collected by UCI
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn//MLRepository.htm

StatLib
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/

Sample databases
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLRepository.html

A site on data mining for investment funds
http://www.gotofund.com/index.asp

http://lans.ece.utexas.edu/~strehl/

Reuters dataset
http://www.research.att.com/~lewis/reuters21578.html

Various datasets:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/

Text classification & Web
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html

Time series data
http://www.stat.wisc.edu/~reinsel/bjr-data/

Test data for the Apriori algorithm
http://www.almaden.ibm.com/cs/quest/syndata.html
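When you test Apriori against data like this, a small reference implementation makes it easier to check results on a tiny sample before you scale up. The sketch below is the textbook level-wise Apriori (join, prune, count). It assumes the transactions have already been parsed into Python lists of item names, because the generator's file format is not described here. It also treats `min_support` as an absolute transaction count rather than a fraction. It is not an optimized implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): count} for every
    itemset contained in at least `min_support` transactions.
    `min_support` is an absolute count here, not a fraction."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Count every candidate with one scan over the transactions.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical toy transactions standing in for generator output.
    sample = [
        ["bread", "milk"],
        ["bread", "diapers", "beer", "eggs"],
        ["milk", "diapers", "beer", "cola"],
        ["bread", "milk", "diapers", "beer"],
        ["bread", "milk", "diapers", "cola"],
    ]
    found = apriori(sample, 3)
    for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
        print(sorted(itemset), found[itemset])
```

With the toy sample and `min_support = 3`, the result holds four single items (bread, milk, diapers, beer) and four pairs. {bread, milk, diapers} occurs in only two transactions, so it is dropped.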

Links to data generators
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html
Association:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData
Original source: http://www.cnblogs.com/bobomouse/archive/2007/05/26/760513.html

WEKA:
1. A jarfile containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jarfile containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jarfile containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
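The Weka dataset bundles are distributed as .jar files. A jar is an ordinary ZIP archive, so Python's standard `zipfile` module can show what a bundle contains without installing Weka. This sketch assumes the jar has already been downloaded, and that the datasets inside are ARFF files (Weka's native format). `list_arff_entries` and `read_arff_text` are hypothetical helper names. They are not part of Weka.

```python
import zipfile


def list_arff_entries(jar_path):
    """Return the sorted names of all .arff entries in a dataset jar.
    A .jar is a ZIP archive, so zipfile can open it directly."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(n for n in jar.namelist() if n.lower().endswith(".arff"))


def read_arff_text(jar_path, entry_name, encoding="utf-8"):
    """Return the raw text of one entry (the default encoding is an assumption)."""
    with zipfile.ZipFile(jar_path) as jar:
        return jar.read(entry_name).decode(encoding)


if __name__ == "__main__":
    import sys

    # Usage, with whatever local path the download was saved to, e.g.:
    #   python list_arff.py datasets-UCI.jar
    if len(sys.argv) > 1:
        for name in list_arff_entries(sys.argv[1]):
            print(name)
```

`read_arff_text` returns the text of a single file. You can then pass that text to an ARFF reader, or load the file in Weka.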

Cancer genes:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm

Financial and medical data from the PKDD'99 challenge page:
Download the Financial Data (~17.5 MB zipped, ~67 MB unzipped)
Download the Medical Data (~2 MB zipped, ~6 MB unzipped)
http://lisp.vse.cz/pkdd99/Challenge/chall.htm

KDnuggets dataset links:
http://www.kdnuggets.com/datasets/index.html

Another good resource is http://kdd.ics.uci.edu/. It contains the following datasets, grouped by application domain:

Direct Marketing
KDD CUP 1998 Data

GIS
Forest CoverType

Indexing
Corel Image Features
Pseudo Periodic Synthetic Time Series

Intrusion Detection
KDD CUP 1999 Data

Process Control
Synthetic Control Chart Time Series

Recommendation Systems
Entree Chicago Recommendation Data

Robots
Pioneer-1 Mobile Robot Data
Robot Execution Failures

Sign Language Recognition
Australian Sign Language Data
High-quality Australian Sign Language Data

Text Categorization
20 Newsgroups Data
Reuters-21578 Text Categorization Collection
NSF Research Awards Abstracts 1990-2003

World Wide Web
Microsoft Anonymous Web Data
MSNBC Anonymous Web Data
Syskill Webert Web Data

Here is one more, found on a foreign blogger's site: http://www.fs.fed.us/fire/fuelman/

Excerpted from: http://www.shamoxia.com/html/y2009/490.html

Journals and conferences on spatio-temporal mining
Journals:
· ACM Transactions on Database Systems
· VLDB Journal
· IEEE Transactions on Knowledge and Data Engineering
· Information Systems
· Data and Knowledge Engineering
· Knowledge and Information Systems
· Data Mining and Knowledge Discovery
· International Journal of Data Warehousing and Data Mining
· Geoinformatica
· SIGKDD Explorations
Conferences:
· ACM SIGMOD conference
· International Conference on Very Large Databases (VLDB)
· International Conference on Data Engineering (ICDE)
· Extending Database Technology (EDBT)
· International Conference on Data Warehousing and Knowledge Discovery (DaWaK)
· Knowledge Discovery in Databases (KDD) conference
· Various KDD workshops
· Symposium on Spatial and Temporal Databases (SSTD)
· International Conference on Scientific and Statistical Database Management (SSDBM)
· International Conference on Data Mining (ICDM)
· SIAM International Conference on Data Mining
· ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD)
· ACM International Symposium on Advances in Geographic Information Systems (ACM-GIS)

Geographic information scientists from Google's perspective


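Google Scholar can be used to look up a scholar's publications and citations. Because it is search-based, it may be less rigorous than SCI. However, SCI covers only papers and not books, and its citation search scope is also limited, so it cannot fully reflect a scholar's contributions. Relatively speaking, then, Google Scholar is heading in the right direction, even though it is not yet mature.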

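I recently looked up the publications and citations of several "big names" in geographic information science, and the results are quite interesting.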

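The first is M.F. Goodchild, who is revered as the father of geographic information science. His paper Geographical information science was the first to propose the concept. Geographical data modeling summarized the two conceptual models of GIS, fields and features. Both have been highly influential and are among his most cited works. The next few entries are books, which show his attention to many GIS problems, such as uncertainty. I personally recommend Geographic Information Systems and Science, which now also has a fairly high citation count.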

Geographical information science 267
Geographical data modeling 214
The Accuracy of Spatial Databases 218
Environmental Modeling with GIS 219
Geographic Information Systems and Science 490

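The second is M.J. Egenhofer, who is known for the 9-intersection (9-I) model. Point-set topological spatial relations has been cited more than 700 times, which is very high for this field. His work is relatively concrete and addresses fundamental problems in GIS, so it gets cited easily. However, he has no highly cited book, which may have limited his overall influence.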
Point-set topological spatial relations 703
Reasoning about Binary Topological Relations 360
Spatial SQL: a query and presentation language 300
Naive Geography 301

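The third is P. A. Burrough, who is even more impressive. Both of his books have been cited more than 1,000 times and have had a lasting influence on GIS, and last year IJGIS published a special issue in his honor. His papers, however, are cited relatively less. It seems that once a scholar reaches a certain level, writing books is how they further increase their influence, while writing papers is mainly what scholars in the "climbing" stage do.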
Principles of geographical information systems for land resources assessment 1277
Principles of geographical information systems 1196
Fractal dimensions of landscapes and other environmental data 251
Multiscale sources of spatial variation in soil. I. The application of fractal concepts to nested levels of soil variation 222

M.F. Worboys: GIS: A Computing Perspective is a good book. A Unified Model for Spatial and Temporal Information is frequently cited in papers on spatio-temporal data models.

GIS: A Computing Perspective 588
A Unified Model for Spatial and Temporal Information 237

So, to sum up: write pioneering papers that attract wide interest, and once you reach a certain level, write books. Of course, this is easier said than done.