Guangzhou University Computer Vision Lab 4: Image Segmentation

wujiekd

Article link: https://blog.csdn.net/weixin_43999137/article/details/119320530

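I. Objectives

This lab course is a specialized course for students majoring in computer science, intelligent systems, the Internet of Things, and related fields. The experiments help students better grasp the concepts, techniques, principles, and applications of computer vision; improve their ability to write lab reports and summarize experimental results; and give them a deeper understanding of how computer vision and pattern recognition are implemented.
1. Master the concepts and algorithms involved in pattern recognition.
2. Become familiar with concrete programming methods in computer vision.
3. Master problem representation, problem solving, and their implementation in code.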

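II. Basic Requirements

1. Before the lab, review the relevant material from the course "Computer Vision and Pattern Recognition".
2. Prepare the experimental data.
3. Complete the programming independently, and comment the code appropriately.
4. Complete the lab report.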

III. Software

Implemented in Python.

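IV. Lab Content

Choose any image and segment it (image segmentation) with each of the following techniques:

Segmentation using texture features extracted with a filter bank
Segmentation by k-means clustering on pixel values combined with coordinates
Segmentation by mean shift clustering on pixel values combined with coordinates
Segmentation by graph partition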

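V. Procedure

1. Image segmentation using texture features from a filter bank

1) Background
filter bank
Reference: Contour and Texture Analysis for Image Segmentation
As described in the paper, the filter bank is a combination of 2D Gaussian filters under different transformations and rotations.

The segmentation procedure can be divided into three steps:

① Convolve the image with a bank of filters.
② Cluster the filter-bank response vectors to find the textons. This step alone already yields a segmented image.
③ Finally, use the cluster centers from step ② to compute texton histograms, and apply graph cut to obtain the final segmentation. This step is omitted here because it is relatively complex to implement.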
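
Step ③ is not implemented in this report. Purely to illustrate what a texton histogram is, here is a minimal sketch, assuming a texton label map like the one k-means produces in step ②. The function name `texton_histograms` and the `radius` parameter are hypothetical and not part of the original experiment, and the referenced paper may define the window differently.

```python
import numpy as np


def texton_histograms(labels, n_textons, radius=2):
    """Per-pixel histogram of texton labels in a (2*radius+1)^2 window.

    Hypothetical helper for illustration only.
    """
    labels = np.asarray(labels, dtype=int)
    h, w = labels.shape
    # One-hot encode the label map: shape (h, w, n_textons)
    one_hot = np.eye(n_textons)[labels]
    # Replicate border pixels so every window has the same size
    padded = np.pad(
        one_hot, ((radius, radius), (radius, radius), (0, 0)), mode="edge"
    )
    hist = np.zeros((h, w, n_textons))
    # Sum the one-hot maps over all window offsets
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            hist += padded[dy:dy + h, dx:dx + w, :]
    # Normalize so each pixel's histogram sums to 1
    return hist / hist.sum(axis=2, keepdims=True)


# Toy label map with two textons: left half is texton 0, right half texton 1
demo_labels = np.array([[0, 0, 1, 1],
                        [0, 0, 1, 1]])
demo_hist = texton_histograms(demo_labels, n_textons=2, radius=1)
```

Pixels deep inside a region get a histogram concentrated on one texton, while pixels near a boundary get a mixture; comparing such histograms is what a later graph-cut stage would build on.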

2) Importing the libraries

import numpy as np
import cv2
import matplotlib.pyplot as plt
import scipy
from skimage import data, segmentation, color
from skimage.future import graph

3) Building the filter bank
There are 48 filters in total.
The filter bank is built mainly from combinations of transformed and rotated 2D Gaussian filters.
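
For reference, the functions in the code below evaluate a 1D Gaussian and its first two derivatives, and each oriented filter is the product of two such profiles along rotated axes. Written out (the symbols $\theta$, $x'$, $y'$ and $k$ are notation introduced here; they correspond to `angle`, `rotpts` and the derivative order in the code):

```latex
g_\sigma(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{x^2}{2\sigma^2}\right), \qquad
g_\sigma'(x) = -\frac{x}{\sigma^2}\, g_\sigma(x), \qquad
g_\sigma''(x) = \frac{x^2 - \sigma^2}{\sigma^4}\, g_\sigma(x)

f^{(k)}_{\sigma,\theta}(x, y) = g_{3\sigma}(x')\, g^{(k)}_{\sigma}(y'), \qquad
x' = x\cos\theta - y\sin\theta, \quad
y' = x\sin\theta + y\cos\theta, \quad k \in \{1, 2\}
```

With the three scales $\sigma \in \sqrt{2}\cdot\{1, 2, 3\}$ and six orientations $\theta = \pi m / 6$, $m = 0, \dots, 5$, this gives 18 first-derivative and 18 second-derivative filters; the remaining 12 are 4 Gaussians and 8 Laplacian-of-Gaussian filters, for 48 in total.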

def gaussian1d(sigma, mean, x, ord):
    x = np.array(x)
    x_ = x - mean
    var = sigma ** 2

    # Gaussian Function
    g1 = (1 / np.sqrt(2 * np.pi * var)) * (np.exp((-1 * x_ * x_) / (2 * var)))

    if ord == 0:
        g = g1
        return g
    elif ord == 1:
        g = -g1 * ((x_) / (var))
        return g
    else:
        g = g1 * (((x_ * x_) - var) / (var ** 2))
        return g


def gaussian2d(sup, scales):
    var = scales * scales
    shape = (sup, sup)
    n, m = [(i - 1) / 2 for i in shape]
    x, y = np.ogrid[-m:m + 1, -n:n + 1]
    g = (1 / np.sqrt(2 * np.pi * var)) * np.exp(-(x * x + y * y) / (2 * var))
    return g


def log2d(sup, scales):
    var = scales * scales
    shape = (sup, sup)
    n, m = [(i - 1) / 2 for i in shape]
    x, y = np.ogrid[-m:m + 1, -n:n + 1]
    g = (1 / np.sqrt(2 * np.pi * var)) * np.exp(-(x * x + y * y) / (2 * var))
    h = g * ((x * x + y * y) - var) / (var ** 2)
    return h


def makefilter(scale, phasex, phasey, pts, sup):
    gx = gaussian1d(3 * scale, 0, pts[0, ...], phasex)
    gy = gaussian1d(scale, 0, pts[1, ...], phasey)

    image = gx * gy

    image = np.reshape(image, (sup, sup))
    return image


def makeLMfilters():
    sup = 49
    scalex = np.sqrt(2) * np.array([1, 2, 3])
    norient = 6
    nrotinv = 12

    nbar = len(scalex) * norient
    nedge = len(scalex) * norient
    nf = nbar + nedge + nrotinv
    F = np.zeros([sup, sup, nf])
    hsup = (sup - 1) / 2

    x = [np.arange(-hsup, hsup + 1)]
    y = [np.arange(-hsup, hsup + 1)]

    [x, y] = np.meshgrid(x, y)

    orgpts = [x.flatten(), y.flatten()]
    orgpts = np.array(orgpts)

    count = 0
    for scale in range(len(scalex)):
        for orient in range(norient):
            angle = (np.pi * orient) / norient
            c = np.cos(angle)
            s = np.sin(angle)
            rotpts = [[c + 0, -s + 0], [s + 0, c + 0]]
            rotpts = np.array(rotpts)
            rotpts = np.dot(rotpts, orgpts)
            F[:, :, count] = makefilter(scalex[scale], 0, 1, rotpts, sup)
            F[:, :, count + nedge] = makefilter(scalex[scale], 0, 2, rotpts, sup)
            count = count + 1

    count = nbar + nedge
    scales = np.sqrt(2) * np.array([1, 2, 3, 4])

    for i in range(len(scales)):
        F[:, :, count] = gaussian2d(sup, scales[i])
        count = count + 1

    for i in range(len(scales)):
        F[:, :, count] = log2d(sup, scales[i])
        count = count + 1

    for i in range(len(scales)):
        F[:, :, count] = log2d(sup, 3 * scales[i])
        count = count + 1

    return F


# Build the filter bank once; F has shape (49, 49, 48)
F = makeLMfilters()

for i in range(0,18):
    plt.subplot(3,6,i+1)
    plt.axis('off')
    plt.imshow(F[:,:,i], cmap = 'gray')

[Figure: the 18 first-derivative filters (3 scales × 6 orientations)]

for i in range(0,18):
    plt.subplot(3,6,i+1)
    plt.axis('off')
    plt.imshow(F[:,:,i+18], cmap = 'gray')

[Figure: the 18 second-derivative filters (3 scales × 6 orientations)]

for i in range(0,12):
    plt.subplot(4,4,i+1)
    plt.axis('off')
    plt.imshow(F[:,:,i+36], cmap = 'gray')

[Figure: the 12 rotation-invariant filters (4 Gaussian and 8 Laplacian-of-Gaussian)]

4) Loading the image
Load a picture of a puppy from the ImageNet dataset and convert it to grayscale.

img = cv2.imread("./images/771.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, dsize=(100, 100), interpolation=cv2.INTER_CUBIC)
img_org = img.copy()
print(img_org.shape)
plt.imshow(img_org, cmap='gray')
plt.show()

[Figure: the 100 × 100 grayscale input image]

5) Convolving the image with the filter bank

plt.figure(figsize=(100, 200))
hyper_col = np.empty([img.shape[0],img.shape[1],48])
img= np.float32(img)
for i in range(0,48):    
    plt.subplot(16,3,i+1)
    plt.axis('off')
    kernel = F[:,:,i]
    hyper_col[:,:,i] = cv2.filter2D(img,-1,kernel)
    plt.imshow(hyper_col[:,:,i], cmap = 'gray')

Some of the responses:
[Figure: some of the 48 filter responses]

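6) Clustering the filter-bank responses to obtain the segmented image
K-means clustering is used. The main goal is to separate the dog (foreground) from the background; since there is also a small toy next to the dog in the image, the number of clusters is set to 3.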
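
Before clustering, the code below rescales each pixel's 48-dimensional response vector $F$ by a logarithmic function of its L2 norm. The constant 0.03 and the base-10 logarithm are taken directly from the code; the symbol $\hat{F}$ is notation introduced here:

```latex
\hat{F} = F \cdot \frac{\log_{10}\!\left(1 + \dfrac{\lVert F \rVert_2}{0.03}\right)}{\lVert F \rVert_2}
```

After this step the norm of $\hat{F}$ equals $\log_{10}(1 + \lVert F \rVert_2 / 0.03)$, so very strong responses are compressed and do not dominate the Euclidean distances that k-means relies on.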

# Flatten the filter responses: one 48-dimensional vector per pixel
hyper_col_data = np.float32(hyper_col.reshape(-1, 48))
# L2 norm of each response vector
L2_norm_hcd = np.sqrt(np.sum(hyper_col_data ** 2, axis=1))
# Contrast normalization factor: log10(1 + ||F|| / 0.03)
norm_factor = np.log10(1 + (L2_norm_hcd / 0.03))
# Avoid division by zero for all-zero response vectors (their factor is 0 anyway)
safe_norm = np.where(L2_norm_hcd > 0, L2_norm_hcd, 1.0)
norm_hyper_col_data = np.float32(hyper_col_data * (norm_factor / safe_norm)[:, None])

# Stop after 10 iterations or when the centers move by less than 1.0
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 3
# K-means clustering into K clusters (10 random restarts)
ret,label,center=cv2.kmeans(norm_hyper_col_data,K,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)


res_img = label.reshape(img.shape[0],img.shape[1])
plt.axis('off')
plt.imshow(res_img, cmap='gray')

[Figure: segmentation result from k-means on the filter-bank responses]

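2. Image segmentation by k-means clustering on pixel values and coordinates

Using the same dog image as before, the grayscale image gives three features per pixel: X coordinate, Y coordinate, and gray value. The clustering result is visualized below: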

import numpy as np
import cv2
import matplotlib.pyplot as plt
import scipy
from skimage import data, segmentation, color
from skimage.future import graph
img = cv2.imread("./771.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, dsize=(100, 100), interpolation=cv2.INTER_CUBIC)
img_org = img.copy()


# Extract three features: X coordinate, Y coordinate, gray value
img_fea = np.empty([img.shape[0],img.shape[1],3])
for i in range(100):
    for j in range(100):
        img_fea[i][j][0] = i
        img_fea[i][j][1] = j
        img_fea[i][j][2] = img[i][j]

img_fea_data = img_fea.copy().reshape(-1,3)
img_fea_data = np.float32(img_fea_data)

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 3
# K-means clustering into K clusters
ret,label,center=cv2.kmeans(img_fea_data,K,None,criteria,10,cv2.KMEANS_RANDOM_CENTERS)

res_img = label.reshape(img.shape[0],img.shape[1])
plt.imshow(res_img, cmap='gray')

[Figure: k-means result on the grayscale image]
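
The nested loops above can also be written without explicit Python loops, which works for any image size rather than the hard-coded 100 × 100. The following is a minimal sketch; the function name `pixel_features` and the `spatial_weight` parameter are hypothetical additions for illustration. Because coordinates and intensities live on different scales, such a weight is one simple way to control how much spatial proximity influences the clustering; the original code effectively uses a weight of 1.

```python
import numpy as np


def pixel_features(img, spatial_weight=1.0):
    """Return one feature row per pixel: (row, col, channel values...).

    Hypothetical helper; accepts a grayscale (H, W) or color (H, W, C) array.
    """
    img = np.asarray(img, dtype=np.float32)
    if img.ndim == 2:
        # Treat a grayscale image as having a single channel
        img = img[:, :, None]
    h, w = img.shape[:2]
    # Coordinate grids: rows[i, j] == i and cols[i, j] == j
    rows, cols = np.mgrid[0:h, 0:w]
    coords = np.stack([rows, cols], axis=-1).astype(np.float32) * spatial_weight
    feats = np.concatenate([coords, img], axis=-1)
    # Flatten to (H * W, 2 + C), the layout cv2.kmeans expects
    return feats.reshape(-1, feats.shape[-1])


demo_gray = np.arange(6, dtype=np.uint8).reshape(2, 3)
demo_gray_feats = pixel_features(demo_gray)                       # shape (6, 3)
demo_rgb_feats = pixel_features(np.zeros((2, 2, 3), np.uint8))    # shape (4, 5)
```

The returned array is already `float32`, so it can be passed to `cv2.kmeans` in place of `img_fea_data`.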

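For the RGB image there are five features per pixel: X coordinate, Y coordinate, and the three RGB channel values. The clustering result is visualized below: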

import numpy as np
import cv2
import matplotlib.pyplot as plt
import scipy
from skimage import data, segmentation, color
from skimage.future import graph
img = cv2.imread("./771.jpg")
img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
img = cv2.resize(img, dsize=(100, 100), interpolation=cv2.INTER_CUBIC)
img_org = img.copy()


# Extract five features: X coordinate, Y coordinate, and the three RGB channels


img_fea = np.empty([img.shape[0], img.shape[1], 5])
for i in range(100):
    for j in range(100):
        img_fea[i][j][0] = i
        img_fea[i][j][1] = j
        img_fea[i][j][2] = img[i][j][0]
        img_fea[i][j][3] = img[i][j][1]
        img_fea[i][j][4] = img[i][j][2]

img_fea_data = img_fea.copy().reshape(-1, 5)
img_fea_data = np.float32(img_fea_data)

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 3
# K-means clustering into K clusters
ret, label, center = cv2.kmeans(img_fea_data, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

res_img = label.reshape(img.shape[0],img.shape[1])
plt.imshow(res_img)

[Figure: k-means result on the RGB image]

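3. Image segmentation by mean shift clustering on pixel values and coordinates

For the grayscale image there are three features per pixel: X coordinate, Y coordinate, and gray value. The clustering result is visualized below: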
import numpy as np
import cv2
import matplotlib.pyplot as plt
import scipy
from skimage import data, segmentation, color
from skimage.future import graph

img = cv2.imread("./771.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, dsize=(100, 100), interpolation=cv2.INTER_CUBIC)
img_org = img.copy()

# Extract three features: X coordinate, Y coordinate, gray value
img_fea = np.empty([img.shape[0], img.shape[1], 3])
for i in range(100):
    for j in range(100):
        img_fea[i][j][0] = i
        img_fea[i][j][1] = j
        img_fea[i][j][2] = img[i][j]

img_fea_data = img_fea.copy().reshape(-1, 3)
img_fea_data = np.float32(img_fea_data)

from sklearn.cluster import MeanShift, estimate_bandwidth

bandwidth2 = estimate_bandwidth(img_fea_data, quantile=0.1, n_samples=100)

# bandwidth must be passed by keyword in recent scikit-learn versions
ms = MeanShift(bandwidth=bandwidth2, bin_seeding=True)
ms.fit(img_fea_data)
label = ms.labels_
res_img = label.reshape(img.shape[0], img.shape[1])
plt.imshow(res_img, cmap='gray')

[Figure: mean shift result on the grayscale image]
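
Unlike k-means, mean shift does not take the number of clusters as input: each point is moved repeatedly to the mean of the data inside a window of radius `bandwidth` until it settles on a density mode, and points that reach the same mode form one cluster. The sketch below illustrates this idea with a flat kernel on a handful of made-up `(row, col, gray)` points. It is a simplified illustration, not scikit-learn's implementation; the function name `mean_shift_flat` and the mode-merging rule (`bandwidth / 2`) are assumptions.

```python
import numpy as np


def mean_shift_flat(points, bandwidth, n_iter=50, tol=1e-3):
    """Minimal flat-kernel mean shift (illustrative, O(n^2) per iteration)."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        # Distances between every current mode estimate and every data point
        dists = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=2)
        inside = dists <= bandwidth  # flat kernel: 1 inside the window, else 0
        counts = inside.sum(axis=1, keepdims=True)
        sums = inside.astype(float) @ points
        # Move each mode to the mean of the points in its window
        new_modes = np.where(counts > 0, sums / np.maximum(counts, 1), modes)
        shift = np.linalg.norm(new_modes - modes, axis=1).max()
        modes = new_modes
        if shift < tol:
            break
    # Merge modes that ended up close together into one cluster center
    centers = []
    labels = np.empty(len(points), dtype=int)
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < bandwidth / 2:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)


# Made-up features: three dark pixels near (0, 0), three bright ones near (10, 10)
demo_points = [[0, 0, 10], [0, 1, 12], [1, 0, 11],
               [10, 10, 200], [10, 11, 198], [11, 10, 202]]
demo_ms_labels, demo_ms_centers = mean_shift_flat(demo_points, bandwidth=5.0)
```

The number of clusters found depends on the bandwidth, which is why the code above estimates it from the data with `estimate_bandwidth`.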

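For the RGB image there are five features per pixel: X coordinate, Y coordinate, and the three RGB channel values. The clustering result is visualized below: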

import numpy as np
import cv2
import matplotlib.pyplot as plt
import scipy
from skimage import data, segmentation, color
from skimage.future import graph
img = cv2.imread("./771.jpg")
img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
img = cv2.resize(img, dsize=(100, 100), interpolation=cv2.INTER_CUBIC)
img_org = img.copy()

# Extract five features: X coordinate, Y coordinate, and the three RGB channels
img_fea = np.empty([img.shape[0], img.shape[1], 5])
for i in range(100):
    for j in range(100):
        img_fea[i][j][0] = i
        img_fea[i][j][1] = j
        img_fea[i][j][2] = img[i][j][0]
        img_fea[i][j][3] = img[i][j][1]
        img_fea[i][j][4] = img[i][j][2]

img_fea_data = img_fea.copy().reshape(-1, 5)
img_fea_data = np.float32(img_fea_data)

from sklearn.cluster import MeanShift, estimate_bandwidth

bandwidth2 = estimate_bandwidth(img_fea_data, quantile=0.1, n_samples=100)

# bandwidth must be passed by keyword in recent scikit-learn versions
ms = MeanShift(bandwidth=bandwidth2, bin_seeding=True)
ms.fit(img_fea_data)
label = ms.labels_
res_img = label.reshape(img.shape[0], img.shape[1])
plt.imshow(res_img)

[Figure: mean shift result on the RGB image]

4. Image segmentation by graph partition

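Graph cut is computationally very expensive, so the original image is resized to 30 × 30 and only a binary segmentation into foreground and background is performed.
The maximum flow is found with the Ford-Fulkerson algorithm (the implementation below searches for augmenting paths with BFS, which is the Edmonds-Karp variant).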
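
To make the max-flow/min-cut idea concrete, here is a small pure-Python sketch on a hypothetical four-node graph: node 0 is the source, nodes 1 and 2 stand for two pixels, and node 3 is the sink. The capacities and the function name `max_flow_min_cut` are made up for illustration and are not taken from the experiment.

```python
from collections import deque


def max_flow_min_cut(capacity, s, t):
    """Edmonds-Karp: Ford-Fulkerson with BFS augmenting paths.

    Returns the max-flow value and the set of nodes on the source side
    of a minimum cut.
    """
    n = len(capacity)
    residual = [row[:] for row in capacity]
    flow = 0
    while True:
        # BFS from s over edges that still have residual capacity
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no augmenting path left
        # Bottleneck capacity along the path found
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the flow and update the residual graph
        v = t
        while v != s:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    # The last BFS did not reach t, so it visited exactly the source side
    source_side = {v for v in range(n) if parent[v] != -1}
    return flow, source_side


# Pixel 1 is strongly tied to the source, pixel 2 to the sink,
# and the two pixels are weakly tied to each other.
demo_capacity = [
    [0, 9, 1, 0],
    [0, 0, 2, 1],
    [0, 2, 0, 9],
    [0, 0, 0, 0],
]
demo_flow, demo_source_side = max_flow_min_cut(demo_capacity, s=0, t=3)
```

In this toy graph the cheapest cut separates pixel 1 (kept with the source) from pixel 2 (kept with the sink), which is the same mechanism that labels each image pixel as foreground or background.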

The resulting segmentation is shown below:

[Figure: binary graph-cut segmentation result]

The main code is as follows:

import cv2
import numpy as np


class GraphEmbedding:
    def __init__(self, path_img=None, array_input=None, sigma=20, resize=30):

        self.resizing_factor = resize
        if path_img is None:
            self.img_array = array_input
            # NumPy image arrays are indexed (row, col), i.e. (height, width)
            self.height, self.width = self.img_array.shape[:2]
        else:
            self.img_array = self.OpenImg(path_img)
            self.width = self.height = self.resizing_factor
        self.embeddings_matrix = np.zeros(
            (self.height * self.width + 2, self.height * self.width + 2)
        )
        self.sigma = sigma

    def OpenImg(self, path):
        image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        self.original_size = (image.shape[0], image.shape[1])
        image = cv2.resize(image, (self.resizing_factor, self.resizing_factor))

        return image

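    def compute_weight(self, pixel1, pixel2):
        # Cast to float first: uint8 subtraction would wrap around (e.g. 5 - 10 -> 251)
        diff = float(pixel1) - float(pixel2)
        penalty = 100 * np.exp(-(diff ** 2) / (2 * (self.sigma ** 2)))
        return penalty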

    def compute_edges(self):
        self.max_capacity = -np.inf
        for i in range(self.height):
            for j in range(self.width):
                l = i * self.width + j
                if i < self.height - 1:
                    k = (i + 1) * self.width + j
                    self.embeddings_matrix[k, l] = self.compute_weight(
                        self.img_array[i, j], self.img_array[i + 1, j]
                    )
                    self.embeddings_matrix[l, k] = self.embeddings_matrix[k, l]
                    self.max_capacity = max(
                        self.max_capacity, self.embeddings_matrix[k, l]
                    )
                if j < self.width - 1:
                    k = i * self.width + j + 1
                    self.embeddings_matrix[k, l] = self.compute_weight(
                        self.img_array[i, j], self.img_array[i, j + 1]
                    )
                    self.embeddings_matrix[l, k] = self.embeddings_matrix[k, l]
                    self.max_capacity = max(
                        self.max_capacity, self.embeddings_matrix[k, l]
                    )

    def compute_edges_source_sink(self, clusters_centers):

        for i in range(self.height):
            for j in range(self.width):
                l = i * self.width + j
                self.embeddings_matrix[-2][l] = self.compute_weight(
                    self.img_array[i, j], clusters_centers[0]
                )
        for i in range(self.height):
            for j in range(self.width):
                l = i * self.width + j
                self.embeddings_matrix[l][-1] = self.compute_weight(
                    self.img_array[i, j], clusters_centers[1]
                )

    def compute_graph(self, clusters_centers):
        self.compute_edges()
        self.compute_edges_source_sink(clusters_centers)
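
The capacities assigned by the `GraphEmbedding` class above can be summarized as follows. The symbols $I_p$ (gray value of pixel $p$), $s$, $t$ (source and sink) and $c_0$, $c_1$ (the two entries of `clusters_centers`) are notation introduced here; the factor 100 and the default $\sigma = 20$ come from the code:

```latex
w(p, q) = 100 \exp\!\left(-\frac{(I_p - I_q)^2}{2\sigma^2}\right)
\quad \text{for 4-connected neighbours } p, q

w(s, p) = 100 \exp\!\left(-\frac{(I_p - c_0)^2}{2\sigma^2}\right), \qquad
w(p, t) = 100 \exp\!\left(-\frac{(I_p - c_1)^2}{2\sigma^2}\right)
```

Similar neighbouring pixels are therefore joined by high-capacity edges that are expensive to cut, while each pixel is tied more strongly to whichever terminal has the closer reference intensity.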



from queue import Queue
import numpy as np
import maxflow
from PIL import Image
import cv2


def BFS(ResGraph, V, s, t, parent):
    """
    Breadth first search algo.
    """
    q = Queue()
    VISITED = np.zeros(V, dtype=bool)
    q.put(s)
    VISITED[s] = True
    parent[s] = -1

    while not q.empty():
        p = q.get()
        for vertex in range(V):
            if (not VISITED[vertex]) and ResGraph[p][vertex] > 0:
                q.put(vertex)
                parent[vertex] = p
                VISITED[vertex] = True
    # True if the sink t is reachable from s in the residual graph
    return VISITED[t]


def DFS(ResGraph, V, s, VISITED):
    """
    depth first search
    """
    current = [s]
    while current:
        v = current.pop()
        if not VISITED[v]:
            VISITED[v] = True
            current.extend([u for u in range(V) if ResGraph[v][u]])


def FordFulkerson(graph, s, t):
    print("Running Ford-Fulkerson algorithm")
    ResGraph = graph.copy()
    V = len(graph)
    parent = np.zeros(V, dtype="int32")

    while BFS(ResGraph, V, s, t, parent):
        pathFlow = float("inf")
        v = t
        while v != s:
            u = parent[v]
            pathFlow = min(pathFlow, ResGraph[u][v])
            v = parent[v]

        v = t
        while v != s:
            u = parent[v]
            ResGraph[u][v] -= pathFlow
            ResGraph[v][u] += pathFlow
            v = parent[v]

    VISITED = np.zeros(V, dtype=bool)
    DFS(ResGraph, V, s, VISITED)

    all_cuts = []

    for i in range(V):
        for j in range(V):
            if VISITED[i] and not VISITED[j] and graph[i][j]:
                all_cuts.append((i, j))
    return all_cuts


def boykov_kolmog(img_path, lbda, sigma, fore_grnd_sample, back_grnd_sample):
    """
    Implements Kolmogorov Boykov graph cut algorithm for image segmentation
    params:
    img_path : path to the input image
    lbda : hyperparameter of the cost function, defines similarity between pixels
    sigma : hyperparameter of the cost function, decay parameter.
    fore_grnd_sample : bounding box of the manually selected foreground area
    back_grnd_sample : bounding box of the manually selected background area
    """
    img = Image.open(img_path).convert("L")
    img_foreground = img.crop(fore_grnd_sample)
    img_background = img.crop(back_grnd_sample)
    img, img_foreground, img_background = (
        np.array(img),
        np.array(img_foreground),
        np.array(img_background),
    )
    # Mean intensity of each sample region; the per-pixel costs below
    # compare pixel intensities against these two values
    fore_mean = np.mean(img_foreground)
    back_mean = np.mean(img_background)

    # initalizing foreground and background probabilities
    Foreground = np.ones(img.shape)
    Background = np.ones(img.shape)
    img_vec = img.reshape(-1, 1)
    H, W = img.shape[:2]

    # Initialize Graph
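    # The edge weights below are floats, so use a float graph;
    # the arguments are only estimates of the node and edge counts
    graph = maxflow.Graph[float](H * W, 4 * H * W)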
    tree = maxflow.Graph[int]()

    # Construct Trees
    nodes, nodeids = graph.add_nodes(H * W), tree.add_grid_nodes(img.shape)
    tree.add_grid_edges(nodeids, 0), tree.add_grid_tedges(
        nodeids, img, 255 - img
    )
    gr = tree.maxflow()
    segments = tree.get_grid_segments(nodeids)

    for i in range(H):
        for j in range(W):
            Foreground[i, j] = -np.log(
                abs(img[i, j] - fore_mean)
                / (abs(img[i, j] - fore_mean) + abs(img[i, j] - back_mean))
            )
            Background[i, j] = -np.log(
                abs(img[i, j] - back_mean)
                / (abs(img[i, j] - back_mean) + abs(img[i, j] - fore_mean))
            )
    Foreground = Foreground.reshape(-1, 1)
    Background = Background.reshape(-1, 1)

    # Normalizing intensities to [0, 1]; the division creates a float copy,
    # so `img` is not modified and the differences below cannot wrap around
    img_vec = img_vec / 255.0

    for i in range(H * W):
        ws = Foreground[i] / (
            Foreground[i] + Background[i]
        )  # Calculating source weight
        wt = Background[i] / (
            Foreground[i] + Background[i]
        )  # Calculating sink weight
        graph.add_tedge(i, ws[0], wt[0])

        # Dealing with pixels on the border of the image
        if i % W != 0:
            w = lbda * np.exp(-(abs(img_vec[i] - img_vec[i - 1]) ** 2) / sigma)
            graph.add_edge(i, i - 1, w[0], lbda - w[0])

        if (i + 1) % W != 0:
            w = lbda * np.exp(-(abs(img_vec[i] - img_vec[i + 1]) ** 2) / sigma)
            graph.add_edge(i, i + 1, w[0], lbda - w[0])
        if i // W != 0:
            w = lbda * np.exp(-(abs(img_vec[i] - img_vec[i - W]) ** 2) / sigma)
            graph.add_edge(i, i - W, w[0], lbda - w[0])
        if i // W != H - 1:
            w = lbda * np.exp(-(abs(img_vec[i] - img_vec[i + W]) ** 2) / sigma)
            graph.add_edge(i, i + W, w[0], lbda - w[0])

    # Run max-flow on the weighted graph built above
    flow = graph.maxflow()
    print("Maximum Flow: {}".format(flow))

    # Get binary labels and return mask
    segments_ = np.zeros(nodes.shape)
    for i in range(len(nodes)):
        segments_[i] = graph.get_segment(nodes[i])  # 0 = source side, 1 = sink side
    segments_ = segments_.reshape(img.shape[0], img.shape[1])
    # Source-side pixels become 0, sink-side pixels become 255
    mask = np.where(segments_ == 0, 0.0, 255.0)
    return mask
