【Image Steganography Datasets】A Roundup of Image Steganography Datasets

Article link: https://blog.csdn.net/qq_40859587/article/details/123256588

1 Websites related to information hiding research

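Jessica Fridrich's well-known group at Binghamton University (USA): http://dde.binghamton.edu/download/feature_extractors,
http://dde.binghamton.edu/download/ensemble
My GitHub account, where I have starred some steganography-related code
Prof. Jian Zhang's group at the Visual Information Intelligent Learning Lab (VILLA), Peking University. The authors of RIIS and CROSS are here, and one more paper has open-source code (a two-stage separable deep learning framework for practical blind watermarking)
Fudan University: 隐者联盟 (WeChat official account)
daniellerch's website
CSDN bloggers: Hard Coder, 科研苟Gamber, 凌峰 (reversible data hiding), csq7; there are a few more in my following list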
————————————————The ones below are used less often————————————————
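Chin-Chen Chang (张真诚), Feng Chia University, Taiwan: https://www.iecs.fcu.edu.tw/alan3c lists his latest papers, but his personal website does not open
Multimedia and Human-Computer Communication Lab, National Cheng Kung University, Taiwan: not updated since 2020
JJTC: http://www.jjtc.com (Dr. Neil F. Johnson; not updated since 2012. It lists many steganography tools, but most of them no longer work)
WetStone: https://www.wetstonetech.com/products/stegohunt/ a commercial steganalysis product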

If you need the BOSSBase, BOWS2, or UCID datasets, send me a private message or search for the user “焱小轩魔方” on Xianyu (闲鱼)

2 Image steganography datasets

Datasets compiled in the review 【Image Steganography: A Review of the Recent Advances】


Datasets from another survey, 【Digital image steganography survey and investigation (goal, assessment, method, development, and dataset)】; the first three appear most often in papers

BossBase Image Dataset
BOWS2 Image Dataset
ALASKA v2 Image Dataset
IStego100K Image Dataset
This dataset was first published in Yang et al. [265]. It consists of 104,052 cover images and 104,052 stego images, 208,104 images in total. The stego images are generated with the J-UNIWARD, nsF5 and UERD algorithms, with a payload chosen at random in the range 0.1-0.4 bpnzAC. All images are 1024×1024 pixels; 200,000 of them are training images and the rest are test images. Like BossBase and ALASKA, this dataset is designed for universal steganalysis and security testing in steganographic research. It can be downloaded from https://github.com/YangzlTHU/IStego100K.
RGB-BMP Steganalysis Dataset (I have not yet seen it used in the papers I have read)
The RGB-BMP Steganalysis Dataset consists of 3000 images of 512×512 pixels that can be used for steganography, steganalysis, and image processing research. It is much smaller than ALASKA, BossBase, or IStego, but it can still serve as an alternative. One of the studies using this dataset is [36]. It can be downloaded at https://data.mendeley.com/datasets/sp4g8h7v8k/1.
Other Image Datasets (these are less common)
Not all datasets are specifically designed for steganography or steganalysis, but the following are also used in steganographic research. The 【UCID】 image dataset was first published in Schaefer and Stich [266]; studies using it include [26,43,89,110,115,119,121,134,252]. It can be downloaded at http://jasoncantarella.com/downloads/ucid.v2.tar.gz. The 【Corel Image dataset】 is used in several studies such as [41,87,267] and can be downloaded from https://sites.google.com/site/dctresearch/Home/content-based-image-retrieval. The 【Kodak Image Dataset】, used in [85], can be downloaded at http://r0k.us/graphics/kodak/. The 【Washington image database】 can be downloaded from http://www.cs.washington.edu/research/imagedatabase and is used in several steganographic studies such as [32,253]. Finally, the 【McGill Database】 consists of 1034 uncompressed images of 768 × 576 or 576 × 768 pixels. It was first published in [268] and is used in [121,123].
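The IStego100K numbers quoted above (image counts and a payload given in bpnzAC, bits per non-zero AC DCT coefficient) can be checked with a little arithmetic. The following is only a rough sketch: `sample_payload` and `message_bits` are hypothetical helper names, uniform sampling of the payload is an assumption (the survey only says "at random"), and the coefficient count of 200,000 is a made-up example value.

```python
import random

# Counts quoted in the survey text for IStego100K.
cover_images = 104_052
stego_images = 104_052
total_images = cover_images + stego_images
train_images = 200_000
test_images = total_images - train_images


def sample_payload(rng, low=0.1, high=0.4):
    """Draw an embedding rate in bpnzAC (uniform sampling is an assumption)."""
    return rng.uniform(low, high)


def message_bits(nonzero_ac_count, payload_bpnzac):
    """Message length in bits: payload times the number of non-zero AC coefficients."""
    return int(nonzero_ac_count * payload_bpnzac)


rng = random.Random(0)                 # fixed seed so the sketch is repeatable
payload = sample_payload(rng)
bits = message_bits(200_000, payload)  # 200,000 is a hypothetical coefficient count

print(f"total={total_images}, test={test_images}")
print(f"payload={payload:.3f} bpnzAC -> {bits} message bits")
```

The point is simply that a JPEG-domain payload scales with the number of non-zero AC coefficients in the cover image, not with the number of pixels.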

2.1 BOSSbase dataset

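Steganalysis papers, especially those that use deep learning, usually validate and compare their results on the BOSSbase dataset. The embedding algorithms used for comparison are typically HUGO, WOW, and S-UNIWARD.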

BOSSbase 1.01 is a dataset of 10,000 images, captured with 7 digital cameras and then processed to 512×512.

1–1354 Canon EOS 400D
1355–1415 Canon EOS 40D
1416–2769 Canon EOS 7D
2770–4811 Canon EOS DIGITAL REBEL XSi
4812–6209 PENTAX K20D
6210–7242 NIKON D70
7243–10212 M9 Digital Camera
The list above gives the image index range for each camera.
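The index ranges above can be turned into a small lookup table. This is a minimal sketch with the ranges copied from the list; `CAMERA_RANGES` and `camera_for_index` are hypothetical names introduced only for illustration.

```python
from bisect import bisect_right

# (first index, last index, camera model), copied from the list above.
CAMERA_RANGES = [
    (1, 1354, "Canon EOS 400D"),
    (1355, 1415, "Canon EOS 40D"),
    (1416, 2769, "Canon EOS 7D"),
    (2770, 4811, "Canon EOS DIGITAL REBEL XSi"),
    (4812, 6209, "PENTAX K20D"),
    (6210, 7242, "NIKON D70"),
    (7243, 10212, "M9 Digital Camera"),
]
_STARTS = [first for first, _, _ in CAMERA_RANGES]


def camera_for_index(index):
    """Return the camera model whose index range contains `index`."""
    if not CAMERA_RANGES[0][0] <= index <= CAMERA_RANGES[-1][1]:
        raise ValueError(f"index {index} is outside the listed ranges")
    # bisect_right finds the first range starting after `index`;
    # the range just before it is the one that contains `index`.
    return CAMERA_RANGES[bisect_right(_STARTS, index) - 1][2]


print(camera_for_index(1355))
print(camera_for_index(7000))
```

A lookup like this is handy for splitting images by camera, for example to test whether a steganalysis model generalizes to a camera it has not seen.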

http://agents.fel.cvut.cz/boss/index.php?mode=view&tmpl=materials The original website for the data.
http://agents.fel.cvut.cz/stegodata/ Other related material and data

http://dde.binghamton.edu/download/ offers the dataset for download, along with a good deal of C++ and MATLAB source code for reproducing different feature extraction methods, classification methods, and so on.

2.2 BOWS-2 dataset

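From the competition Break Our Watermarking System - 2nd Ed.
Dataset link: http://bows2.ec-lille.fr/
The link is dead. I can share the data for a fee (10,000 original 512×512 PGM images); contact me by private message or on GitHub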
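Since these images come as PGM files, here is a minimal sketch of a reader for the binary (P5) variant with 8-bit samples, using only the standard library. `parse_pgm` is a hypothetical helper written for illustration and is checked here only on a tiny synthetic image, not on the real dataset; in practice an image library would normally do this job.

```python
def parse_pgm(data):
    """Parse binary PGM (P5, maxval <= 255) bytes into (width, height, rows)."""
    tokens = []
    pos = 0
    # The header holds four whitespace-separated tokens: magic, width,
    # height, maxval. Lines starting with '#' are comments.
    while len(tokens) < 4:
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        if start == pos:
            raise ValueError("truncated PGM header")
        tokens.append(data[start:pos])

    magic = tokens[0]
    width, height, maxval = (int(t) for t in tokens[1:])
    if magic != b"P5" or not 0 < maxval <= 255:
        raise ValueError("only 8-bit binary PGM (P5) is handled in this sketch")

    pos += 1  # exactly one whitespace byte separates the header from the pixels
    pixels = data[pos:pos + width * height]
    if len(pixels) != width * height:
        raise ValueError("truncated pixel data")
    rows = [list(pixels[r * width:(r + 1) * width]) for r in range(height)]
    return width, height, rows


# Tiny synthetic 3x2 image, used only to exercise the parser.
sample = b"P5\n# synthetic example\n3 2\n255\n" + bytes([0, 1, 2, 3, 4, 5])
print(parse_pgm(sample))
```

For a real file, the bytes would be read with `open(path, "rb").read()` and passed to the same function.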

3 Steganalysis datasets

Dataset	Year	Download link	Number of images
ALASKA	2019	https://alaska.utt.fr/#material	80,000 (dataset also provided on Kaggle: https://www.kaggle.com/competitions/alaska2-image-steganalysis/data?select=Cover)
DIV2K	2017	https://data.vision.ee.ethz.ch/cvl/DIV2K/	1,000 (800 training, 100 validation, 100 test)
UCID	2003	http://vision.doc.ntu.ac.uk/	1,338 (from 2003, so somewhat dated)
MIR Flickr 25K and 1M	2008	https://press.liacs.nl/mirflickr/mirdownload.html/	25,000
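The table lists an 800/100/100 partition for DIV2K. For datasets that ship without a fixed partition, a reproducible split of the same shape could be sketched as follows. `split_dataset` and the file names are hypothetical and used only for illustration; a dataset with an official split should of course be used with that split.

```python
import random


def split_dataset(items, n_train, n_val, seed=0):
    """Shuffle with a fixed seed, then cut into train / validation / test lists."""
    items = list(items)
    if n_train + n_val > len(items):
        raise ValueError("requested split is larger than the dataset")
    random.Random(seed).shuffle(items)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical file names for a 1000-image dataset.
names = [f"{i:04d}.png" for i in range(1, 1001)]
train, val, test = split_dataset(names, n_train=800, n_val=100)
print(len(train), len(val), len(test))
```

For steganalysis experiments it is usually sensible to split by cover image, so that a cover and the stego images derived from it land in the same subset.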

4 Computer vision datasets

4.1 CIFAR-10 dataset overview (link to original article)

CIFAR-10 is a small dataset for recognizing everyday objects, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The images are 32×32.
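It contains RGB color images in 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Each class has 6,000 images, for 60,000 images in total, of which 50,000 are training images and 10,000 are test images.
Compared with the MNIST dataset, CIFAR-10 differs in the following ways:
• CIFAR-10 images are 3-channel RGB color images, whereas MNIST images are grayscale.
• CIFAR-10 images are 32×32, slightly larger than the 28×28 images in MNIST.
• CIFAR-10 contains real-world objects: the images are noisy, and the objects vary in scale and appearance, which makes recognition much harder.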
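The size difference in the comparison above can be made concrete with a short calculation. This sketch only uses the image shapes and counts stated in this section; the variable names are illustrative.

```python
from math import prod

CIFAR10_SHAPE = (32, 32, 3)  # height, width, RGB channels
MNIST_SHAPE = (28, 28, 1)    # height, width, single grayscale channel

cifar_values = prod(CIFAR10_SHAPE)  # values per CIFAR-10 image
mnist_values = prod(MNIST_SHAPE)    # values per MNIST image
ratio = cifar_values / mnist_values

classes, images_per_class = 10, 6_000
total_images = classes * images_per_class
train_images, test_images = 50_000, 10_000

print(cifar_values, mnist_values, round(ratio, 2))
print(total_images, train_images + test_images)
```

So although the images are only 4 pixels wider and taller, each CIFAR-10 image carries roughly four times as many input values as an MNIST image, mostly because of the three color channels.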

4.2 Scene datasets

4.2.1 COCO dataset

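Description: contains objects in many different scenes, including complex ones; used for object recognition, detection, and segmentation tasks.
Download link: COCO Dataset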

4.2.2 ImageNet

Download link: opendata

4.2.3 Places365

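Places365 is the latest subset of the Places2 database.
Places365 comes in two versions: Places365-Standard and Places365-Challenge.
The training set of Places365-Standard has about 1.8 million images from 365 scene categories, with at most 5,000 images per category.
The training set of Places365-Challenge adds 6.2 million extra images to all the images of Places365-Standard (about 8 million images in total), with at most 40,000 images per category.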

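This dataset is very large. I found a low-resolution 256×256 version, but downloading it costs site points. TensorFlow also has an official page at https://tensorflow.google.cn/datasets/catalog/places365_small; judging from that page, it can only be loaded through the TF data loader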
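For the TensorFlow route, loading might look roughly like the sketch below. It assumes the `tensorflow-datasets` package is installed and that the catalog name `places365_small` from the URL above is passed to `tfds.load`. The wrapper `load_places365_small` and its `loader` parameter are hypothetical, added only so the call can be tried out without triggering the very large download.

```python
def load_places365_small(split="validation", loader=None):
    """Load a split of Places365 (256x256 version) through TensorFlow Datasets.

    `loader` defaults to `tfds.load`; passing another callable is only
    meant for dry runs that should not download anything.
    """
    if loader is None:
        # Imported lazily so the function can be defined without TensorFlow.
        import tensorflow_datasets as tfds
        loader = tfds.load
    # as_supervised=True yields (image, label) pairs instead of feature dicts.
    return loader("places365_small", split=split, as_supervised=True)


# Dry run with a stand-in loader that just echoes its arguments.
def echo_loader(name, **kwargs):
    return name, kwargs


print(load_places365_small("train", loader=echo_loader))
```

Calling `load_places365_small()` without a `loader` would start the real download on first use, so enough disk space and time should be set aside for it.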

opendatalab link