13 Machine Learning Data Set Collections

Here are 13 resources on Machine Learning data sets.

Landsat on AWS

Landsat 8 data is available for anyone to use via Amazon S3. All Landsat 8 scenes from 2015 are available, along with a selection of cloud-free scenes from 2013 and 2014. New Landsat 8 scenes are made available each day, often within hours of production. MathWorks has created a freely downloadable tool for accessing, processing, and visualizing Landsat on AWS data in MATLAB. With this tool, you can create a map display of scene locations with markers that show each scene’s metadata.

Category: GIS, Sensor Data, Satellite Imagery, Natural Resource

 

NASA NEX

NASA NEX is a collaboration and analytical platform that combines state-of-the-art supercomputing, Earth system modeling, workflow management and NASA remote-sensing data. Through NEX, users can explore and analyze large Earth science data sets, run and share modeling algorithms, collaborate on new or existing projects and exchange workflows and results within and among other science communities.

 

Common Crawl Corpus

A corpus of web crawl data composed of over 5 billion web pages. This data set is freely available on Amazon S3 and is released under the Common Crawl Terms of Use.

 

1000 Genomes Project and AWS

The 1000 Genomes Project is an international collaboration which has established the most detailed catalogue of human genetic variation, including SNPs, structural variants, and their haplotype context. The final phase of the project sequenced more than 2500 individuals from 26 different populations around the world and produced an integrated set of phased haplotypes with more than 80 million variants for these individuals. The Amazon mirror contains the complete data set from the project and the data can be found at: s3.amazonaws.com/1000genomes.

 

MNIST database of handwritten digits

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.
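Since little preprocessing is needed, reading the files is most of the work. The sketch below assumes the files are in the IDX binary format used by the original MNIST distribution (a 4-byte header, then one big-endian 32-bit size per dimension, then raw unsigned bytes). The function names `parse_idx` and `load_idx_gz` are hypothetical helpers written for this example, not part of any library.

```python
import gzip
import struct


def parse_idx(data: bytes):
    """Parse an IDX-format byte string into (dims, payload).

    Assumed layout: two zero bytes, one data-type byte, one byte giving
    the number of dimensions, then one big-endian uint32 per dimension,
    then the raw values.
    """
    zero, dtype_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0:
        raise ValueError("not an IDX file: first two bytes must be zero")
    if dtype_code != 0x08:
        # 0x08 means unsigned byte; this sketch handles nothing else.
        raise ValueError("this sketch only handles unsigned-byte data")
    header_end = 4 + 4 * ndim
    dims = struct.unpack(">" + "I" * ndim, data[4:header_end])
    payload = data[header_end:]
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload size does not match the header")
    return dims, payload


def load_idx_gz(path):
    """Read a gzip-compressed IDX file from a local path (hypothetical helper)."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


# Synthetic example: two 2x3 "images" with pixel values 0..11.
header = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 3)
dims, payload = parse_idx(header + bytes(range(12)))
count, rows, cols = dims
first_image = payload[: rows * cols]
```

For a downloaded training-image file, `dims` should come back as `(60000, 28, 28)` if the file matches the description above; the local path passed to `load_idx_gz` is up to you.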

 

UCI Machine Learning Repository

The UC Irvine Machine Learning Repository currently maintains 333 datasets as a service to the machine learning community.

 

Delve Datasets

The Delve datasets and families are available from this page. Every dataset (or family) has a brief overview page and many also have detailed documentation. You can download gzipped-tar files of the datasets, but you will require the delve software environment to get maximum benefit from them. Datasets are categorized as primarily assessment, development or historical according to their recommended use. Within each category we have distinguished datasets as regression or classification according to how their prototasks have been created.

 

Data sets for nonlinear dimensionality reduction

Data sets for nonlinear dimensionality reduction provides datasets for Swiss roll and Faces.
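To get a feel for what a Swiss roll looks like before downloading anything, a synthetic one can be generated in a few lines. This is only a sketch: it does not reproduce the files offered on that page, and the constants (angles between 1.5π and 4.5π, height up to 21) are a common convention assumed here. The function name `make_swiss_roll_points` is hypothetical.

```python
import math
import random


def make_swiss_roll_points(n_points, seed=0):
    """Generate 3-D points lying on a Swiss roll surface.

    Returns (points, angles): points is a list of (x, y, z) tuples and
    angles holds the position of each point along the spiral.
    """
    rng = random.Random(seed)
    points, angles = [], []
    for _ in range(n_points):
        # Angle along the spiral, in [1.5*pi, 4.5*pi].
        t = 1.5 * math.pi * (1.0 + 2.0 * rng.random())
        # Position along the axis of the roll, in [0, 21].
        height = 21.0 * rng.random()
        points.append((t * math.cos(t), height, t * math.sin(t)))
        angles.append(t)
    return points, angles


points, angles = make_swiss_roll_points(500)
```

The points are three-dimensional but lie on a two-dimensional sheet, which is why the shape is a standard test for nonlinear dimensionality reduction: a good method should recover `angles` and the height as the two underlying coordinates.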

 

mldata

mldata is a machine learning dataset repository. It contains more than 800 publicly archived data sets, each with ratings, view counts, download counts, and comments.

 

Mammographic Image Analysis

When benchmarking an algorithm, it is advisable to use a standard test database (data set) so that researchers can compare results directly. Most mammographic databases are not publicly available. The most easily accessed, and therefore the most commonly used, are the Mammographic Image Analysis Society (MIAS) database and the Digital Database for Screening Mammography (DDSM).

 

Mulan

Mulan, a Java library for multi-label learning, provides multi-label classification datasets and multi-target regression datasets.

 

Auton Lab Datasets

The Auton Lab encourages researchers to examine and replicate their findings. To facilitate this goal, they provide datasets identical to those used in their published works.

 

Datasets for "The Elements of Statistical Learning"

Datasets for "The Elements of Statistical Learning" provides datasets in a range of categories, such as Bone Mineral Density, Countries, Galaxy, and many more.


http://www.datasciencecentral.com/m/blogpost?id=6448529:BlogPost:341263

Reposted from: https://my.oschina.net/CaptainA/blog/1483792
