Open-Source Datasets

Latest recommended article published on 2024-08-27 10:55:01

五道口纳什

Article link: https://blog.csdn.net/lanchunhui/article/details/51353446

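In a classic machine learning workflow, the dataset collections below serve as training sets, while the standard test images can serve as test sets.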

Index of /pub/machine-learning-databases
Standard test images: Lena, Barbara, Boat, Pepper, Cameraman …

0. Image segmentation

The Berkeley Segmentation Dataset and Benchmark
- images
- human segmentations

1. Image classification

CVG - UGR - Image database
CIFAR-10: The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
10 labels: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
Classification datasets results: experimental results of various algorithms and models on a range of classification tasks
- MNIST
- CIFAR-10, CIFAR-100
- STL-10, SVHN
- ILSVRC2012 task 1
MNIST (handwritten digit recognition)
MNIST database
Parsing the MNIST dataset with MATLAB
ImageNet
- top-5 error rate
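  In the ImageNet classification task each image belongs to one of 1000 possible classes. For each image the model gets 5 guesses, that is, it predicts 5 class labels at once. If any one of the 5 matches the true label, the prediction counts as correct (an image has only one ground-truth category). Only when all 5 guesses are wrong is the image counted as misclassified. The error rate measured this way is the top-5 error rate.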
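
The top-5 rule described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the official ILSVRC evaluation code: the function name `top_k_error` and the input layout (one row of class scores per image, plus one integer label per image) are assumptions made for the example.

```python
import numpy as np


def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest scores.

    scores: array of shape (n_samples, n_classes), higher means more likely.
    labels: array of shape (n_samples,), integer class indices.
    Hypothetical helper for illustration; requires k <= n_classes.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Column indices of the k largest scores in each row.
    # argpartition does not sort them, but order inside the top k is irrelevant.
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    # A sample is a hit if its true label appears anywhere in its top-k set.
    hit = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()
```

With `k=1` the same function gives the ordinary (top-1) classification error, so the two metrics can be reported from one set of scores.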

2. Scene classification and object recognition

SUN Database
- Scene Recognition Benchmark
- Object Detection Benchmark

3. Reading and writing the major datasets

MNIST

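import gzip
import pickle  # the original snippet used cPickle (Python 2)

import numpy as np
import theano

# mnist.pkl.gz was pickled under Python 2, so Python 3 needs encoding='latin1'
with gzip.open('./mnist.pkl.gz', 'rb') as f:
	trainset, validset, testset = pickle.load(f, encoding='latin1')

# If you need symbolic programming (e.g. Theano), wrap the data one step further

def shared_data(data, borrow=True):
	data_x, data_y = data
	shared_x = theano.shared(np.asarray(data_x, dtype=theano.config.floatX), borrow=borrow)
	# theano.config only defines floatX (there is no floatY), so labels are stored as floatX too
	shared_y = theano.shared(np.asarray(data_y, dtype=theano.config.floatX), borrow=borrow)
	return shared_x, shared_y

trainset_x, trainset_y = shared_data(trainset)
...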
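
CIFAR-10

For the CIFAR-10 dataset listed in section 1, a similar loader can be sketched. This assumes the "python version" of the dataset, where each batch file is a pickled dict whose `data` entry holds one 3072-value row per image (1024 red, then 1024 green, then 1024 blue values, each channel in row-major order) and whose `labels` entry holds the class indices. The function names `load_cifar10_batch` and `rows_to_images` are made up for this example.

```python
import pickle

import numpy as np


def rows_to_images(data):
    """Turn flat CIFAR-10 rows of shape (n, 3072) into images of shape (n, 32, 32, 3).

    Each row is assumed to store the red, green and blue planes one after
    another, so we first split into (channel, row, col) and then move the
    channel axis last.
    """
    data = np.asarray(data, dtype=np.uint8)
    return data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)


def load_cifar10_batch(path):
    """Load one batch file (hypothetical helper); returns (images, labels)."""
    with open(path, 'rb') as f:
        # The files were written by Python 2; encoding='bytes' keeps the
        # dict keys as bytes objects such as b'data' and b'labels'.
        batch = pickle.load(f, encoding='bytes')
    images = rows_to_images(batch[b'data'])
    labels = np.asarray(batch[b'labels'])
    return images, labels
```

A call such as `images, labels = load_cifar10_batch('./cifar-10-batches-py/data_batch_1')` would then give arrays ready for plotting or for further wrapping; the path shown is only an assumed location of the extracted archive.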