WHO Data Set: includes social, economic, health, and political indicators
The following is taken from: http://blog.sina.com.cn/s/blog_4b700c4c0102dyjs.html
(For data mining, information retrieval, and knowledge discovery)
1. Climate monitoring dataset: http://cdiac.ornl.gov/ftp/ndp026b
2. Several useful websites for downloading test datasets:
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
The Reuters dataset can be found at: http://www.research.att.com/~lewis/reuters21578.html
Various datasets are available at:
http://kdd.ics.uci.edu/summary.data.type.html
For text classification, another usable dataset is the Rainbow dataset:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
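The Reuters-21578 collection linked above is distributed as SGML files. Below is a minimal Python sketch for pulling labelled stories out of one such file. It assumes the layout of the reut2-*.sgm files: each story sits in a `<REUTERS ...>` element carrying `LEWISSPLIT` and `NEWID` attributes, topic labels are `<D>` items inside `<TOPICS>`, and the text is inside `<BODY>`. The function name `iter_reuters` is hypothetical, and a regex scan is used for brevity instead of a full SGML parser.

```python
import html
import re
from typing import Dict, Iterator

# Assumed layout of a reut2-*.sgm file (regex scan, not a real SGML parser).
_DOC = re.compile(r"<REUTERS\b(.*?)>(.*?)</REUTERS>", re.S)
_ATTR = re.compile(r'(\w+)="([^"]*)"')
_TOPICS = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.S)
_D = re.compile(r"<D>(.*?)</D>", re.S)
_BODY = re.compile(r"<BODY>(.*?)</BODY>", re.S)


def iter_reuters(sgml: str) -> Iterator[Dict[str, object]]:
    """Yield one dict per story: id, train/test split, topic labels, body text."""
    for attrs, inner in _DOC.findall(sgml):
        meta = dict(_ATTR.findall(attrs))  # e.g. {"LEWISSPLIT": "TRAIN", ...}
        topics_block = _TOPICS.search(inner)
        topics = _D.findall(topics_block.group(1)) if topics_block else []
        body = _BODY.search(inner)
        yield {
            "newid": meta.get("NEWID"),
            "split": meta.get("LEWISSPLIT"),
            "topics": topics,
            # Bodies contain SGML entities such as &lt;, so unescape them.
            "body": html.unescape(body.group(1)).strip() if body else "",
        }
```

For example, the text of `reut2-000.sgm` (read with `encoding="latin-1"`, an assumption about the file encoding) could be passed to `iter_reuters`, keeping stories whose split is `"TRAIN"` and whose topic list is non-empty as training data for a text classifier.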
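3. I collected many test datasets. Anyone writing a paper will certainly need them, at least to check how well an algorithm performs.
Some links may no longer be accessible, but surely some still work:
Machine learning datasets collected by UCI: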
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn//MLRepository.htm
StatLib:
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/
Sample databases:
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLRepository.html
Websites on data mining for investment fund data:
http://www.gotofund.com/index.asp
http://lans.ece.utexas.edu/~strehl/
Various datasets:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/
Text classification & Web:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html
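Several of the Web links above concern server access logs (the W3C logfile pages and the web-caching trace collection). As a sketch of a first preprocessing step for Web usage mining, the Python function below parses one line in the Common Log Format (`host ident authuser [date] "request" status bytes`). It assumes the traces you download use that format; many real logs add extra fields, which this simplified regex ignores. The name `parse_clf` is hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# Common Log Format: host ident authuser [date] "request" status bytes
# Trailing fields (e.g. referrer/user-agent in extended formats) are ignored.
_CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str) -> Optional[Dict[str, object]]:
    """Parse one access-log line; return None if it does not match."""
    m = _CLF.match(line)
    if m is None:
        return None
    rec: Dict[str, object] = dict(m.groupdict())
    # Timestamps look like 10/Oct/2000:13:55:36 -0700
    rec["time"] = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(m.group("status"))
    # "-" means no byte count was logged
    rec["size"] = 0 if m.group("size") == "-" else int(m.group("size"))
    return rec
```

Parsed records can then be grouped by `host` and sorted by `time` to reconstruct per-visitor click sequences, a common starting point for usage mining.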
Time series data:
http://www.stat.wisc.edu/~reinsel/bjr-data/
Test data for the Apriori algorithm:
http://www.almaden.ibm.com/cs/quest/syndata.html
Data generator links:
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html
Association:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData
Original source: http://www.cnblogs.com/bobomouse/archive/2007/05/26/760513.html
WEKA:
1. A jar file containing 37 classification problems, originally obtained from the UCI repository
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jar file containing 37 regression problems, obtained from various sources
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jar file containing 30 regression datasets collected by Luis Torgo
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
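A .jar file is a ZIP archive, so the WEKA dataset bundles above can be inspected without WEKA itself. The Python sketch below lists the ARFF files in a jar and reads one file's header (`@relation` and `@attribute` lines), assuming the archive stores plain-text .arff files. The helper names are hypothetical, and the header parser is simplified: it assumes attribute names contain no spaces.

```python
import io
import zipfile
from typing import Dict, List


def list_arff_members(jar_path) -> List[str]:
    """Return the names of all .arff files inside a jar (a ZIP archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(n for n in jar.namelist() if n.lower().endswith(".arff"))


def read_arff_header(jar_path, member: str) -> Dict[str, object]:
    """Read the relation name and attribute names of one ARFF member.

    Simplified: assumes attribute names without embedded spaces.
    """
    relation = None
    attributes: List[str] = []
    with zipfile.ZipFile(jar_path) as jar:
        with jar.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8", errors="replace"):
                stripped = line.strip()
                lowered = stripped.lower()
                if lowered.startswith("@relation"):
                    relation = stripped.split(None, 1)[1].strip("'\"")
                elif lowered.startswith("@attribute"):
                    # Second token is the attribute name.
                    attributes.append(stripped.split(None, 2)[1].strip("'\""))
                elif lowered.startswith("@data"):
                    break  # header finished; skip the data rows
    return {"relation": relation, "attributes": attributes}
```

For instance, after downloading `datasets-UCI.jar`, looping over `list_arff_members("datasets-UCI.jar")` and calling `read_arff_header` on each name gives a quick overview of which problems the bundle contains.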
Cancer gene data:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
Provided by another person:
Download the Financial Data (~17.5M zipped file, ~67M unzipped data)
Download the Medical Data (~2M zipped file, ~6M unzipped data)
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
KDnuggets dataset links:
http://www.kdnuggets.com/datasets/index.html
Another very good resource is http://kdd.ics.uci.edu/, which contains the following data resources (grouped by application area):
Direct Marketing
GIS
Indexing
Intrusion Detection
Process Control
Recommendation Systems
Robots
Sign Language Recognition
Text Categorization
World Wide Web
Here is one more, found on a foreign blogger's blog: http://www.fs.fed.us/fire/fuelman/
Excerpted from: http://www.shamoxia.com/html/y2009/490.html
Journals and conferences on spatio-temporal data mining
Journals:
· ACM Transactions on Database Systems
· VLDB Journal
· IEEE Transactions on Knowledge and Data Engineering
· Information Systems
· Data and Knowledge Engineering
· Knowledge and Information Systems
· Data Mining and Knowledge Discovery
· International Journal of Data Warehousing and Data Mining
· Geoinformatica
· SIGKDD Explorations
Conferences:
· ACM SIGMOD conference
· International Conference on Very Large Data Bases (VLDB)
· International Conference on Data Engineering (ICDE)
· Extending Database Technology (EDBT)
· International Conference on Data Warehousing and Knowledge Discovery (DaWaK)
· Knowledge Discovery in Databases (KDD) conference
· Various KDD workshops
· Symposium on Spatial and Temporal Databases (SSTD)
· International Conference on Scientific and Statistical Database Management (SSDBM)
· International Conference on Data Mining (ICDM)
· SIAM International Conference on Data Mining
· ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD)
· ACM International Symposium on Advances in Geographic Information Systems (ACM-GIS)
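Geographic information scientists as seen through Google Scholar
Google Scholar lets you look up scholars' publications and citations. Because it is search-based, it may be less rigorous than SCI. However, SCI covers only papers, not books, and the range of citations it indexes is limited, so it cannot fully reflect a scholar's contribution. On balance, although Google Scholar is not yet mature, it is heading in the right direction.
Recently I looked up the publications and citations of several "big names" in geographic information science, and the results were quite interesting.
The first is M.F. Goodchild, revered as the father of geographic information science. His paper Geographical information science first proposed the concept, and Geographical data modeling summarized the two conceptual models of GIS: fields and features. Both have been highly influential and are among his most cited works. The remaining entries are books, which reflect his attention to many GIS problems, such as uncertainty. I personally recommend Geographic Information Systems and Science, which is now also highly cited.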
Geographical information science 267
Geographical data modeling 214
The Accuracy of Spatial Databases 218
Environmental Modeling with GIS 219
Geographic Information Systems and Science 490
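The second is M.J. Egenhofer, known for the 9-intersection model. Point-set topological spatial relations has been cited more than 700 times, which is very high for this field. His work is relatively concrete and addresses fundamental problems in GIS, so it is cited easily. However, he has no highly cited book, which may have limited his influence.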
Point-set topological spatial relations 703
Reasoning about Binary Topological Relations 360
Spatial SQL: a query and presentation language 300
Naive Geography 301
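The third is P. A. Burrough, who is even more impressive: both of his books have been cited more than 1,000 times and have had a profound influence on GIS, and last year IJGIS published a special issue in his honor. His papers, however, are cited relatively less. It seems that once a scholar reaches a certain level, writing books is the way to extend their influence, while writing papers is mainly what scholars still "climbing the slope" do.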
Principles of geographical information systems for land resources assessment 1277
Principles of geographical information systems 1196
Fractal dimensions of landscapes and other environmental data 251
Multiscale sources of spatial variation in soil. I. The application of fractal concepts to nested levels of soil variation 222
M.F. Worboys: his book GIS: A Computing Perspective is good, and A Unified Model for Spatial and Temporal Information is frequently cited in papers on spatio-temporal data models.
GIS: A Computing Perspective 588
A Unified Model for Spatial and Temporal Information 237
So, to sum up: write pioneering papers that spark wide interest, and once you reach a certain level, write books. Of course, this is easier said than done.