Image Data Analysis
Exploratory data analysis (EDA) comprises brief analyses that describe a dataset, guide the modeling process, and answer preliminary questions. For classification problems, this might include looking at the distributions of variables or checking for meaningful patterns of predictors across classes. The same applies to classifying image data: we want to see what meaningful information simple operations can reveal. Here, I outline a few methods for doing this using the Chest X-Ray dataset [source], which consists of X-ray images of pneumonia patients and healthy controls.
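For image data, the "distribution" check usually starts with how many images each class has, and a crude "pattern across classes" is whether overall pixel intensity differs between them. The sketch below computes both. It assumes the images have already been loaded and resized to a common shape as grayscale NumPy arrays. The function name `summarize_classes` and the dictionary layout are hypothetical and not part of the original walkthrough.

```python
import numpy as np


def summarize_classes(images_by_class):
    """Per-class image count, class share, and pixel-intensity summary.

    Hypothetical helper. `images_by_class` maps a class label
    (e.g. 'NORMAL', 'PNEUMONIA') to a non-empty array of shape
    (n_images, height, width) holding grayscale images.
    """
    total = sum(len(imgs) for imgs in images_by_class.values())
    summary = {}
    for label, imgs in images_by_class.items():
        imgs = np.asarray(imgs, dtype=np.float64)
        # Average each image over all of its pixels -> one value per image
        per_image_mean = imgs.reshape(len(imgs), -1).mean(axis=1)
        summary[label] = {
            'count': len(imgs),
            'share': len(imgs) / total,  # fraction of all images in this class
            'mean_intensity': float(per_image_mean.mean()),
            'std_intensity': float(per_image_mean.std()),
        }
    return summary


# Usage with synthetic stand-in data (random arrays, not real X-rays)
rng = np.random.default_rng(0)
fake = {
    'NORMAL': rng.uniform(0.0, 1.0, size=(4, 8, 8)),
    'PNEUMONIA': rng.uniform(0.5, 1.0, size=(6, 8, 8)),
}
for label, stats in summarize_classes(fake).items():
    print(label, stats)
```

On the real dataset, the `count` and `share` fields show at a glance whether the two classes are balanced. A difference in mean intensity is only a rough first signal, not proof of a useful predictor.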
Raw Comparison
First, we can simply look at a few randomly sampled images.
import os
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image
# Jupyter/IPython magic: render plots inline (omit when running as a plain script)
%matplotlib inline
train_dir = 'DATA/train'  # training images, one subfolder per class
# list the .jpeg files in each class subfolder
normal_imgs = [fn for fn in os.listdir(f'{train_dir}/NORMAL') if fn.endswith('.jpeg')]
pneumo_imgs = [fn for fn in os.listdir(f'{train_dir}/PNEUMONIA') if fn.endswith('.jpeg')]
# randomly select 3 images from each class, without repeats
select_norm = np.random.choice(normal_imgs, 3, replace=False)
select_pneu = np.random.choice(pneumo_imgs, 3, replace=False)
# plot a 2 x 3 grid: normal images first, then pneumonia images
fig = plt.figure(figsize=(8, 6))
for i in range(6):
    if i < 3:
        fp = f'{train_dir}/NORMAL/{select_norm[i]}'
        label = 'NORMAL'
    else:
        fp = f'{trai