Overview
Train an image-to-image model that translates visible-light (RGB) images into thermal infrared (Thermal) images. Challenges:
- Temperature cannot be represented by RGB pixel values, so identical color information may correspond to different thermal images;
- Infrared image quality varies greatly across capture devices with different parameters, and different authors preprocess their data differently, so the datasets differ widely;
- Information that is entirely missing essentially cannot be transferred, e.g. a human body behind dense fog;
Training Data
The data below comes from publicly shared online sources, with provenance noted for each; please contact us if anything infringes copyright.
- LLVIP (paired)
https://bupt-ai-cz.github.io/LLVIP/
https://blog.csdn.net/qq_29562209/article/details/126665611
https://pan.baidu.com/s/1yxbnLUiK8xa0mAt5cDNKig (password: b88p)
- M3FD (paired)
https://github.com/dlut-dimt/TarDAL
https://pan.baidu.com/s/1GoJrrl_mn2HNQVDSUdPCrw?pwd=M3FD
- FLIR (paired, unaligned)
https://www.flir.com/oem/adas/adas-dataset-form/
https://avoid.overfit.cn/post/cb714527964e49bd9858eb2a4b2a1e62
v1: https://pan.baidu.com/s/11GJe4MdM_NH6fuENCQ2MtQ (password: 019b)
v2: https://pan.baidu.com/s/1ooLmEm39Y_LSinU860Zj1w?pwd=3cp3#list/path=%2F
- KAIST (paired)
https://pan.baidu.com/s/1V6qOIUIo2yojy-se_oWjtQ (password: 9yhh)
- VOC
https://github.com/vlkniaz/ThermalGAN
- OTCBVS
http://vcipl-okstate.org/pbvs/bench/
- BU-TIV (Thermal Infrared Video) Benchmark
http://csr.bu.edu/BU-TIV/BUTIV.html
- VTUAV
https://zhang-pengyu.github.io/DUT-VTUAV/
FLIR Data Alignment
The dataset downloaded from the official site is not aligned and cannot be used directly, so the RGB images must be aligned to the Thermal images (the RGB field of view is larger than the Thermal one). Method:
- Pick corresponding keypoints in the RGB and Thermal images, either by selecting a set manually or by generating one with an algorithm such as SIFT; the code below uses the manual approach;
- Compute the affine matrix from the point pairs;
- Apply the affine transform to the RGB image;
#!/usr/bin/env python
# coding=utf-8
import os

import cv2
import numpy as np
import tqdm

IMG_EXTENSIONS = [
    '.jpg', '.JPG', '.jpeg', '.JPEG',
    '.png', '.PNG', '.ppm', '.PPM', '.bmp', '.BMP', '.tiff',
]


def is_image_file(filename):
    return any(filename.endswith(extension) for extension in IMG_EXTENSIONS)


def make_dataset(dir):
    images = []
    assert os.path.isdir(dir), '%s is not a valid directory' % dir
    for root, _, fnames in sorted(os.walk(dir)):
        for fname in fnames:
            if is_image_file(fname):
                images.append(os.path.join(root, fname))
    return images


def main():
    rgb_image = cv2.imread('train/RGB/FLIR_00088.jpg')
    thermal_image = cv2.imread('train/thermal_8_bit/FLIR_00088.jpeg')
    # Manually selected corresponding points in one RGB/thermal pair.
    rgb_point = np.array(
        [
            [308, 741],
            [579, 606],
            [1328, 754],
        ],
        dtype=np.float32,
    )
    thermal_point = np.array(
        [
            [63, 238],
            [172, 182],
            [478, 244],
        ],
        dtype=np.float32,
    )
    # Estimate the affine matrix from the three point pairs.
    M = cv2.getAffineTransform(rgb_point, thermal_point)
    aligned_rgb_img = cv2.warpAffine(
        rgb_image, M, (thermal_image.shape[1], thermal_image.shape[0]))
    # cv2.imshow("affine", aligned_rgb_img)
    # cv2.waitKey()

    dataroot = './train'
    phase = 'train'
    # Input A (RGB label maps).
    dir_A = os.path.join(dataroot, phase + 'A')
    A_paths = sorted(make_dataset(dir_A))
    # Input B (real thermal images).
    dir_B = os.path.join(dataroot, phase + 'B')
    B_paths = sorted(make_dataset(dir_B))
    # Keep only the A images whose thermal counterpart exists in B.
    A_paths_aligned = []
    B_paths_aligned = []
    for line in A_paths:
        _, filename = os.path.split(line)
        B_path = os.path.join(dir_B, filename.replace('jpg', 'jpeg'))
        if os.path.exists(B_path):
            A_paths_aligned.append(line)
            B_paths_aligned.append(B_path)
    A_paths = A_paths_aligned
    B_paths = B_paths_aligned
    for idx in tqdm.trange(len(A_paths)):
        A = A_paths[idx]
        img_A = cv2.imread(A)
        img_B = cv2.imread(B_paths[idx])
        # Since the camera rig is fixed, the same affine matrix is
        # reused for every RGB image in the set.
        aligned_A = cv2.warpAffine(img_A, M, (img_B.shape[1], img_B.shape[0]))
        cv2.imwrite(A.replace('trainA', 'trainA_Aligned'), aligned_A)
        # Side-by-side A|B pair for pix2pix-style training.
        im_AB = np.concatenate([aligned_A, img_B], 1)
        cv2.imwrite(A.replace('trainA', 'train_AB'), im_AB)


if __name__ == '__main__':
    main()
Algorithm Frameworks
These fall into two classes: paired and unpaired.
paired
setA and setB must correspond one-to-one, and the image contents should ideally be aligned.
Pros: preserves content consistency after translation as much as possible;
Cons: strict data requirements, so suitable data is hard to obtain;
unpaired
setA and setB do not need to correspond.
Pros: higher data utilization;
Cons: the algorithm may hallucinate and corrupt the original image content;
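In the paired setting, the alignment script above stores each training pair as a single side-by-side A|B image. A minimal sketch of how a pix2pix-style loader might split such an image back into its two halves (purely illustrative, not the actual training code):

```python
import numpy as np


def split_ab(im_ab):
    """Split a side-by-side A|B image of shape (H, 2W, C) back into
    the RGB half A and the thermal half B."""
    width = im_ab.shape[1]
    assert width % 2 == 0, 'A|B image width must be even'
    half = width // 2
    return im_ab[:, :half], im_ab[:, half:]
```

Storing the pair as one file guarantees A and B can never get out of sync on disk, which matters for a paired method where content alignment is the whole point.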
Technical Details
Based on pix2pixHD. Given the characteristics of thermal images and the real data actually available (dominated by low-frequency information), and under time pressure, only a few simple modifications were made; more may be added as new ideas come up:
- num_D=1, keeping only the final discriminator scale;
- reweighted the VGG loss to focus more on global feature consistency;
- only the RGB image is fed in as the label;
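The VGG-loss reweighting can be illustrated as follows: pix2pixHD takes an L1 distance between VGG features of the generated and real images at several layers, and shifting weight toward the deeper layers emphasizes global (low-frequency) structure over fine texture. This is a sketch only; the skewed weights are illustrative assumptions, and NumPy arrays stand in for the actual VGG feature maps:

```python
import numpy as np


def weighted_feature_loss(feats_fake, feats_real, weights):
    """Weighted L1 distance over lists of per-layer feature maps."""
    assert len(feats_fake) == len(feats_real) == len(weights)
    return sum(
        w * np.mean(np.abs(f - r))
        for w, f, r in zip(weights, feats_fake, feats_real)
    )


# pix2pixHD's default per-layer weights already favor deeper layers;
# skewing them further (illustrative values) stresses global structure:
default_w = [1 / 32, 1 / 16, 1 / 8, 1 / 4, 1.0]
global_w = [1 / 64, 1 / 32, 1 / 16, 1 / 4, 2.0]
```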
Results are shown in the figure below (left: RGB, middle: generator output, right: GT).
In practice, the following cases are beyond what the algorithm can handle:
- shadows;
- complete occlusion, e.g. a human body behind smoke;
- low-light (night) scenes, where results are noticeably worse;