[Deep Learning] Some Attempts at Unsupervised Image Classification



Preface

Driven by some project requirements, I recently spent time on unsupervised image classification; after many attempts I ended up with two workable approaches.


I. First Attempt

The initial idea was to extract image features with classical digital image processing and then run k-means clustering for the unsupervised classification. The feature vector combined merged RGB histograms, contours, and the like, which were then clustered with k-means. The results turned out poor: besides color and contour, an object has many deeper features, and extracting those calls for a CNN. That raises a problem, though: since this is an unsupervised classification task, there is no label data against which to compute a loss. Two directions came to mind (a sketch of the first histogram-based attempt follows the list):
1. Use a pretrained resnet50 for transfer learning.
2. Use the classic network of unsupervised learning, the autoencoder.
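
For reference, here is a minimal sketch of that first attempt: hand-crafted features (concatenated, normalized RGB histograms) clustered with scikit-learn's KMeans. The helper name, bin count, and cluster count are illustrative assumptions, not the original project code:

import cv2
import numpy as np
from sklearn.cluster import KMeans

def rgb_histogram(image, bins=32):
    # concatenate the per-channel histograms into one feature vector
    chans = []
    for c in range(3):
        hist = cv2.calcHist([image], [c], None, [bins], [0, 256])
        chans.append(cv2.normalize(hist, hist).flatten())
    return np.concatenate(chans)

# images: a list of RGB uint8 arrays, e.g. from get_image_data() in part II
# feats = np.stack([rgb_histogram(img) for img in images])
# labels = KMeans(n_clusters=5, n_init=10).fit_predict(feats)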

II. Using Transfer Learning to Extract Features

The reasons for choosing resnet50:
1. resnet50 is deep enough to serve as a feature extractor.
2. resnet50 is pretrained on the large-scale ImageNet dataset, and this kind of transfer learning usually brings better performance, especially on limited data.

1. Importing resnet50

The code is as follows (example):

import cv2
import numpy as np
import os
import warnings
import time
from tqdm import tqdm
import shutil
import torch
import torch.nn as nn
from torchvision.models import resnet50
from torchvision import transforms

model = resnet50(pretrained=True).cuda()
model.fc = nn.Identity()  # drop the 1000-class head so the model outputs 2048-d pooled features
model.eval()              # inference mode: freezes BatchNorm statistics

2. Loading the image data

def get_image_data(filepath: str) -> list:
    '''
    Load every image under a directory tree.
    :param filepath: root directory of the image files
    :return: a list of RGB images as numpy arrays
    '''
    image_data = []
    for root, dirs, files in os.walk(filepath):
        for file in files:
            if file.lower().endswith(('.png', '.jpg', '.jpeg')):
                # OpenCV loads BGR; convert to RGB for the torchvision pipeline
                image_data.append(cv2.cvtColor(cv2.imread(os.path.join(root, file)), cv2.COLOR_BGR2RGB))
    return image_data
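
The clustering step in part IV also needs each image's file path, in the same order as the features. get_image_data only returns the pixel data, so here is a small companion helper (my addition, not from the original post) that walks the same directory tree:

def get_image_paths(filepath: str) -> list:
    '''
    Collect image file paths in the same traversal order as get_image_data.
    '''
    image_paths = []
    for root, dirs, files in os.walk(filepath):
        for file in files:
            if file.lower().endswith(('.png', '.jpg', '.jpeg')):
                image_paths.append(os.path.join(root, file))
    return image_paths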

3. Processing the image data

def extract_features(image_data, model) -> list:
    '''
    Use resnet50 to extract a feature vector for every image.
    :param image_data: a list of RGB images
    :param model: the pretrained feature-extraction model
    :return: a list of numpy feature vectors
    '''
    features = []
    transform = transforms.Compose([
        transforms.ToPILImage(),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    for image in image_data:
        img = transform(image)
        img = img.unsqueeze(0).cuda()  # add a batch dimension and move to the GPU, where the model lives
        with torch.no_grad():
            feature = model(img)
        features.append(feature.squeeze().cpu().numpy())  # back to the CPU for numpy
    return features
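
Putting the two steps together looks like this; the './images' directory is a placeholder:

image_data = get_image_data('./images')         # './images' is a placeholder path
features = extract_features(image_data, model)  # one 2048-d vector per image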

III. Using an Autoencoder

The most basic idea is the autoencoder's encoder-decoder structure: several convolution + BatchNorm layers in the encoder, and several transposed-convolution + BatchNorm layers in the decoder. Training the whole network to reconstruct its input images trains the encoder in turn, and the trained encoder then serves as the feature extractor.

1. Autoencoder network structure

The code is as follows:

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()

        # 3 input channels to match the RGB images loaded earlier
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
        )

        # mirror of the encoder; no BatchNorm before the output so the
        # Sigmoid can use its full [0, 1] range for the reconstruction
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

    def encode(self, x):
        return self.encoder(x)

    def decode(self, x):
        return self.decoder(x)
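
A quick shape sanity check, assuming the same 224x224 input size as the resnet50 transform above:

autoencoder = Autoencoder().cuda()
autoencoder.eval()  # avoid updating BatchNorm statistics during the check
x = torch.rand(1, 3, 224, 224).cuda()
print(autoencoder(x).shape)         # torch.Size([1, 3, 224, 224]): reconstruction matches the input
print(autoencoder.encode(x).shape)  # torch.Size([1, 32, 56, 56]): the latent feature map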

2. Training procedure

The code is as follows:

# Setup: the criterion and optimizer were not shown in the original post;
# MSE reconstruction loss and Adam are typical choices. train_loader and
# val_loader are assumed to yield batches of image tensors.
autoencoder = Autoencoder().cuda()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

num_epochs = 10
for epoch in range(num_epochs):
    # training
    autoencoder.train()
    start = time.time()
    for batch in train_loader:
        images = batch.cuda()

        # forward pass
        outputs = autoencoder(images)

        # reconstruction loss against the input itself
        loss = criterion(outputs, images)

        # backpropagation and optimizer step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # evaluation on the validation set
    autoencoder.eval()  # freeze BatchNorm statistics during validation
    val_loss = 0.0
    with torch.no_grad():
        for data in val_loader:
            val_images = data.cuda()
            val_outputs = autoencoder(val_images)
            val_loss += criterion(val_outputs, val_images).item()

    val_loss /= len(val_loader)
    print(f'Epoch [{epoch + 1}/{num_epochs}], Training Loss: {loss.item():.4f}, Validation Loss: {val_loss:.4f}, Time: {time.time() - start:.4f} seconds')
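
The post never shows the step where the trained encoder actually replaces resnet50 as the feature extractor, so here is a minimal sketch, assuming the images come from get_image_data and flattening the latent map into one vector per image:

def extract_features_ae(image_data, autoencoder) -> list:
    '''
    Encode each image with the trained encoder and flatten the latent
    map into a single feature vector.
    '''
    transform = transforms.Compose([
        transforms.ToPILImage(),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    autoencoder.eval()
    features = []
    for image in image_data:
        img = transform(image).unsqueeze(0).cuda()
        with torch.no_grad():
            latent = autoencoder.encode(img)             # (1, 32, 56, 56)
        features.append(latent.flatten().cpu().numpy())  # 32*56*56 = 100352-d vector
    return features

The flattened latent is very high-dimensional; average-pooling the latent map before flattening is a cheap way to shrink it if memory becomes an issue.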

IV. Clustering

1. Kmeans-GPU

Since a neural network already handles the feature extraction, it is natural to want the k-means clustering on the GPU as well. I found a GPU implementation of k-means on GitHub (original link below) and, for my own convenience, consolidated it into a single class of static methods.

GitHub: Kmeans-GPU

The consolidated class code is below:

class Kmeans_GPU:
    @staticmethod
    def cosine_distance(obs, centers):
        obs_norm = obs / obs.norm(dim=1, keepdim=True)
        centers_norm = centers / centers.norm(dim=1, keepdim=True)
        cos = torch.matmul(obs_norm, centers_norm.transpose(1, 0))
        return 1 - cos

    @staticmethod
    def l2_distance(obs, centers):
        dis = ((obs.unsqueeze(dim=1) - centers.unsqueeze(dim=0)) ** 2.0).sum(dim=-1).squeeze()
        return dis

    @staticmethod
    def _kmeans_batch(obs: torch.Tensor,
                      k: int,
                      distance_function,
                      batch_size=0,
                      thresh=1e-5,
                      norm_center=False):
        # k x D
        centers = obs[torch.randperm(obs.size(0))[:k]].clone()
        history_distances = [float('inf')]
        if batch_size == 0:
            batch_size = obs.shape[0]
        while True:
            # (N x D, k x D) -> N x k
            segs = torch.split(obs, batch_size)
            seg_center_dis = []
            seg_center_ids = []
            for seg in segs:
                distances = distance_function(seg, centers)
                center_dis, center_ids = distances.min(dim=1)
                seg_center_ids.append(center_ids)
                seg_center_dis.append(center_dis)

            obs_center_dis_mean = torch.cat(seg_center_dis).mean()
            obs_center_ids = torch.cat(seg_center_ids)
            history_distances.append(obs_center_dis_mean.item())
            diff = history_distances[-2] - history_distances[-1]
            if diff < thresh:
                if diff < 0:
                    warnings.warn("Distance diff < 0, distances: " + ", ".join(map(str, history_distances)))
                break
            for i in range(k):
                obs_id_in_cluster_i = obs_center_ids == i
                if obs_id_in_cluster_i.sum() == 0:
                    continue
                obs_in_cluster = obs.index_select(0, obs_id_in_cluster_i.nonzero().squeeze())
                c = obs_in_cluster.mean(dim=0)
                if norm_center:
                    c /= c.norm()
                centers[i] = c
        return centers, history_distances[-1]

    @staticmethod
    def kmeans(obs: torch.Tensor, k: int,
               distance_function=None,
               iter=20,
               batch_size=0,
               thresh=1e-5,
               norm_center=False):
        """
               Performs k-means on a set of observation vectors forming k clusters.

               Parameters
               ----------
               obs : torch.Tensor
                  Each row of the M by N array is an observation vector.

               k : int
                  The number of centroids to generate. A code is assigned to
                  each centroid, which is also the row index of the centroid
                  in the code_book matrix generated.

                  The initial k centroids are chosen by randomly selecting
                  observations from the observation matrix.

               distance_function : function, optional
                  The function to calculate distances between observations and centroids.
                  Default value: l2_distance

               iter : int, optional
                  The number of times to run k-means, returning the codebook
                  with the lowest distortion. This parameter does not represent the
                  number of iterations of the k-means algorithm.

               batch_size : int, optional
                  Batch size of observations to calculate distances, if your GPU memory can NOT handle all observations.
                  Default value is 0, which will send all observations into distance_function.

               thresh : float, optional
                  Terminates the k-means algorithm if the change in
                  distortion since the last k-means iteration is less than
                  or equal to thresh.

               norm_center : False, optional
                  Whether to normalize the centroids while updating every centroid.

               Returns
               -------
               best_centers : torch.Tensor
                  A k by N array of k centroids. The i'th centroid
                  codebook[i] is represented with the code i. The centroids
                  and codes generated represent the lowest distortion seen,
                  not necessarily the globally minimal distortion.

               best_distance : float
                  The mean distance between the observations passed and the best centroids generated.
               """
        best_distance = float("inf")
        best_centers = None
        # resolve the default here: a bare staticmethod object used as a
        # default argument is not callable on older Python versions
        if distance_function is None:
            distance_function = Kmeans_GPU.l2_distance
        if batch_size == 0:
            batch_size = obs.shape[0]
        for i in range(iter):
            centers, distance = Kmeans_GPU._kmeans_batch(obs, k,
                                                         norm_center=norm_center,
                                                         distance_function=distance_function,
                                                         batch_size=batch_size,
                                                         thresh=thresh)
            if distance < best_distance:
                best_centers = centers
                best_distance = distance
        return best_centers, best_distance

    @staticmethod
    def product_quantization(data, sub_vector_size, k, **kwargs):
        centers = []
        for i in range(0, data.shape[1], sub_vector_size):
            sub_data = data[:, i:i + sub_vector_size]
            sub_centers, _ = Kmeans_GPU.kmeans(sub_data, k=k, **kwargs)
            centers.append(sub_centers)
        return centers

    @staticmethod
    def data_to_pq(data, centers):
        assert (len(centers) > 0)
        assert (data.shape[1] == sum([cb.shape[1] for cb in centers]))

        m = len(centers)
        sub_size = centers[0].shape[1]
        ret = torch.zeros(data.shape[0], m,
                          dtype=torch.uint8,
                          device=data.device)
        for idx, sub_vec in enumerate(torch.split(data, sub_size, dim=1)):
            dis = Kmeans_GPU.l2_distance(sub_vec, centers[idx])
            ret[:, idx] = dis.argmin(dim=1).to(dtype=torch.uint8)
        return ret

    @staticmethod
    def train_product_quantization(data, sub_vector_size, k, **kwargs):
        center_list = Kmeans_GPU.product_quantization(data, sub_vector_size, k, **kwargs)
        pq_data = Kmeans_GPU.data_to_pq(data, center_list)
        return pq_data, center_list

    @staticmethod
    def _gram(x):
        (bs, ch, h, w) = x.size()
        f = x.view(bs, ch, w * h)
        f_T = f.transpose(1, 2)
        G = f.bmm(f_T) / (ch * h * w)
        return G

    @staticmethod
    def pq_distance_book(pq_centers):
        assert (len(pq_centers) > 0)

        pq = torch.zeros(len(pq_centers),
                         len(pq_centers[0]),
                         len(pq_centers[0]),
                         device=pq_centers[0].device)
        for ci, center in enumerate(pq_centers):
            for i in range(len(center)):
                dis = Kmeans_GPU.l2_distance(center[i:i + 1, :], center)
                pq[ci, i] = dis
        return pq

    @staticmethod
    def asymmetric_table(query, centers):
        m = len(centers)
        sub_size = centers[0].shape[1]
        ret = torch.zeros(
            query.shape[0], m, centers[0].shape[0],
            device=query.device)
        assert (query.shape[1] == sum([cb.shape[1] for cb in centers]))
        for i, offset in enumerate(range(0, query.shape[1], sub_size)):
            sub_query = query[:, offset: offset + sub_size]
            ret[:, i, :] = Kmeans_GPU.l2_distance(sub_query, centers[i])
        return ret

    @staticmethod
    def asymmetric_distance_slow(asymmetric_tab, pq_data):
        ret = torch.zeros(asymmetric_tab.shape[0], pq_data.shape[0])
        for i in range(asymmetric_tab.shape[0]):
            for j in range(pq_data.shape[0]):
                dis = 0
                for k in range(pq_data.shape[1]):
                    sub_dis = asymmetric_tab[i, k, pq_data[j, k].item()]
                    dis += sub_dis
                ret[i, j] = dis
        return ret

    @staticmethod
    def asymmetric_distance(asymmetric_tab, pq_data):
        pq_db = pq_data.long()
        dd = [torch.index_select(asymmetric_tab[:, i, :], 1, pq_db[:, i]) for i in range(pq_data.shape[1])]
        return sum(dd)

    @staticmethod
    def pq_distance(obj, centers, pq_disbook):
        ret = torch.zeros(obj.shape[0], centers.shape[0])
        for obj_idx, o in enumerate(obj):
            for ct_idx, c in enumerate(centers):
                for i, (oi, ci) in enumerate(zip(o, c)):
                    ret[obj_idx, ct_idx] += pq_disbook[i, oi.item(), ci.item()]
        return ret
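
A quick sanity check of the wrapper on random data; the shapes and k are illustrative:

obs = torch.randn(1000, 128).cuda()  # 1000 observations, 128-d features
centers, distance = Kmeans_GPU.kmeans(obs, k=5, distance_function=Kmeans_GPU.l2_distance, iter=5)
print(centers.shape, distance)       # torch.Size([5, 128]) and the mean distortion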

2. Clustering procedure

tqdm and shutil are used to copy the images directly into per-cluster folders.
The code is as follows:

def cluster_GPU(image_paths: list, features: list, k: int, savepath: str):
    '''
    :param image_paths: the images' file paths, in the same order as features
    :param features: a list of image feature vectors
    :param k: number of cluster centers
    :param savepath: directory where the clustered copies are saved
    :return:
    '''
    features_gpu = torch.from_numpy(np.array(features)).to('cuda')
    # one output folder per cluster
    for i in range(k):
        cluster_folder = os.path.join(savepath, f'Cluster_{i}')
        os.makedirs(cluster_folder, exist_ok=True)
    centers, _ = Kmeans_GPU.kmeans(features_gpu.half(), k, distance_function=Kmeans_GPU.cosine_distance, iter=10)
    for i, image_path in tqdm(enumerate(image_paths), desc="Copying images"):
        # assign each image to its nearest cluster center
        distances = Kmeans_GPU.cosine_distance(features_gpu[i].unsqueeze(0).half(), centers)
        cluster_label = torch.argmin(distances).item()
        cluster_folder = os.path.join(savepath, f'Cluster_{cluster_label}')
        # build the destination path
        new_image_path = os.path.join(cluster_folder, os.path.basename(image_path))
        # copy the image into its cluster folder
        shutil.copy(image_path, new_image_path)
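
End to end, the pipeline then looks like this; the paths and k are placeholders, and get_image_paths is the helper added in part II:

image_data = get_image_data('./images')
image_paths = get_image_paths('./images')
features = extract_features(image_data, model)
cluster_GPU(image_paths, features, k=5, savepath='./clusters')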

Summary

This project tried two different approaches to unsupervised image classification. The core idea in both is to use a CNN for deep feature extraction and then run k-means for the cluster analysis. As for how many classes to split into, the number of clusters k can be found by a grid-search-style sweep; a code block for finding the best k is attached here as well.
The code is as follows:

def get_best_k(features, k_min: int, k_max: int):
    '''
    Sweep the number of clusters over [k_min, k_max] and pick the elbow
    of the distortion curve.
    '''
    distortions = []  # mean distortion for every candidate K
    features_gpu = torch.from_numpy(np.array(features)).to('cuda')
    for n_clusters in range(k_min, k_max + 1):
        print(f"Running K-means for K = {n_clusters}")
        # GPU-accelerated k-means
        centers, _ = Kmeans_GPU.kmeans(features_gpu.half(), n_clusters, distance_function=Kmeans_GPU.cosine_distance, iter=10)

        # compute the distances in single precision
        distances = Kmeans_GPU.cosine_distance(features_gpu.float(), centers.float())

        # distortion: mean distance of every observation to its nearest center
        distortion = distances.min(dim=1).values.mean().item()
        distortions.append(distortion)

    # pick the K where the distortion curve flattens out: the smallest
    # gradient magnitude is a rough elbow heuristic
    best_k = k_min + int(np.argmin(np.abs(np.gradient(distortions))))

    print(f"The best K value is: {best_k}")
    return best_k
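
Combining the sweep with the clustering step, a short usage sketch (the 2 to 15 search range is an illustrative choice):

best_k = get_best_k(features, 2, 15)
cluster_GPU(image_paths, features, k=best_k, savepath='./clusters')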