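Related materials
Guangzhou University Computer Vision Lab 1: Introduction to Image Processing
Guangzhou University Computer Vision Lab 2: Camera Geometry
Guangzhou University Computer Vision Lab 3: Image Filtering
Guangzhou University Computer Vision Lab 4: Image Segmentation
Guangzhou University Computer Vision Lab 5: Simple Digit Recognition
Guangzhou University Computer Vision Lab 6: License Plate Recognition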
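Lab 4: Image Segmentation
I. Objectives
This lab course is a specialized course for students majoring in computer science, intelligent science, the Internet of Things, and related fields. The experiments help students better grasp the concepts, techniques, principles, and applications of computer vision, improve their ability to write lab reports and summarize experimental results, and give them a deeper understanding of computer vision and of how pattern recognition is implemented.
1. Master the concepts and algorithms involved in pattern recognition.
2. Become familiar with concrete programming methods in computer vision.
3. Master problem representation, problem solving, and programming implementation.
II. Basic requirements
1. Before the lab, review the relevant material from the course "Computer Vision and Pattern Recognition".
2. Prepare the experimental data.
3. Write the code independently and add appropriate comments.
4. Complete the lab report.
III. Software
Implemented in Python.
IV. Tasks
Choose any image and segment it (image segmentation) with each of the following techniques:
- Image segmentation using texture features extracted by a filter bank
- Image segmentation using k-means clustering on pixel values combined with coordinates
- Image segmentation using mean shift clustering on pixel values combined with coordinates
- Image segmentation via graph partition
V. Procedure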
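1. Image segmentation using texture features extracted by a filter bank
(1) Background
filter bank
Reference: Contour and Texture Analysis for Image Segmentation
From the original paper: a filter bank is a set of filters obtained by applying different transformations and rotations to 2D Gaussian filters.
The segmentation can be divided into three steps:
① Convolve the image with a set of filters.
② Cluster the vectors of filter bank outputs to find the textons. This step already yields a segmented image.
③ Finally, use the cluster centers from step ② to compute texton histograms, then apply graph cut to obtain the final segmentation. This step is omitted here because it is complex to implement.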
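Step ③ is not implemented in this report, but its first half (the texton histogram) can be illustrated briefly. The following is a minimal sketch, assuming a texton label map is already available from step ②. The function name `texton_histogram`, the square window, and the toy label map are assumptions made for illustration and do not come from the reference paper. The graph cut part is not shown.

```python
import numpy as np

def texton_histogram(labels, n_textons, row, col, radius):
    """Normalized histogram of texton labels in a square window around (row, col)."""
    # Clip the window to the image borders.
    r0, r1 = max(row - radius, 0), min(row + radius + 1, labels.shape[0])
    c0, c1 = max(col - radius, 0), min(col + radius + 1, labels.shape[1])
    window = labels[r0:r1, c0:c1].ravel()
    # Count how often each texton id occurs inside the window.
    hist = np.bincount(window, minlength=n_textons).astype(np.float64)
    return hist / hist.sum()

# Toy label map (hypothetical): left half is texton 0, right half is texton 1.
labels = np.zeros((6, 6), dtype=np.int64)
labels[:, 3:] = 1

hist_left = texton_histogram(labels, 2, row=2, col=0, radius=1)
hist_edge = texton_histogram(labels, 2, row=2, col=3, radius=1)
```

A pixel well inside one texture gets a histogram concentrated on one texton, while a pixel near the boundary gets a mixed histogram. Comparing such histograms between neighboring pixels is what a later graph cut stage could use as a similarity measure.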
(2) Import libraries
import numpy as np
import cv2
import matplotlib.pyplot as plt
import scipy
from skimage import data, segmentation, color
from skimage.future import graph
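(3) Build the filter bank
There are 48 filters in total.
The filter bank is built mainly by transforming and rotating 2D Gaussian filters and combining the results.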
def gaussian1d(sigma, mean, x, ord):
    # 1D Gaussian (ord == 0), its first derivative (ord == 1),
    # or its second derivative (any other ord)
    x = np.array(x)
    x_ = x - mean
    var = sigma ** 2
    # Gaussian Function
    g1 = (1 / np.sqrt(2 * np.pi * var)) * (np.exp((-1 * x_ * x_) / (2 * var)))
    if ord == 0:
        g = g1
        return g
    elif ord == 1:
        g = -g1 * ((x_) / (var))
        return g
    else:
        g = g1 * (((x_ * x_) - var) / (var ** 2))
        return g
def gaussian2d(sup, scales):
    # Isotropic 2D Gaussian on a sup x sup grid
    var = scales * scales
    shape = (sup, sup)
    n, m = [(i - 1) / 2 for i in shape]
    x, y = np.ogrid[-m:m + 1, -n:n + 1]
    g = (1 / np.sqrt(2 * np.pi * var)) * np.exp(-(x * x + y * y) / (2 * var))
    return g
def log2d(sup, scales):
    # Laplacian of Gaussian (LoG) on a sup x sup grid
    var = scales * scales
    shape = (sup, sup)
    n, m = [(i - 1) / 2 for i in shape]
    x, y = np.ogrid[-m:m + 1, -n:n + 1]
    g = (1 / np.sqrt(2 * np.pi * var)) * np.exp(-(x * x + y * y) / (2 * var))
    h = g * ((x * x + y * y) - var) / (var ** 2)
    return h
def makefilter(scale, phasex, phasey, pts, sup):
    # Anisotropic filter: Gaussian (or derivative) along each rotated axis
    gx = gaussian1d(3 * scale, 0, pts[0, ...], phasex)
    gy = gaussian1d(scale, 0, pts[1, ...], phasey)
    image = gx * gy
    image = np.reshape(image, (sup, sup))
    return image
def makeLMfilters():
    sup = 49
    scalex = np.sqrt(2) * np.array([1, 2, 3])
    norient = 6
    nrotinv = 12
    nbar = len(scalex) * norient
    nedge = len(scalex) * norient
    nf = nbar + nedge + nrotinv
    F = np.zeros([sup, sup, nf])
    hsup = (sup - 1) / 2
    x = [np.arange(-hsup, hsup + 1)]
    y = [np.arange(-hsup, hsup + 1)]
    [x, y] = np.meshgrid(x, y)
    orgpts = [x.flatten(), y.flatten()]
    orgpts = np.array(orgpts)
    count = 0
    # Oriented first- and second-derivative filters: 3 scales x 6 orientations each
    for scale in range(len(scalex)):
        for orient in range(norient):
            angle = (np.pi * orient) / norient
            c = np.cos(angle)
            s = np.sin(angle)
            rotpts = [[c + 0, -s + 0], [s + 0, c + 0]]
            rotpts = np.array(rotpts)
            rotpts = np.dot(rotpts, orgpts)
            F[:, :, count] = makefilter(scalex[scale], 0, 1, rotpts, sup)
            F[:, :, count + nedge] = makefilter(scalex[scale], 0, 2, rotpts, sup)
            count = count + 1
    # Rotation-invariant filters: 4 Gaussians and 8 LoG filters
    count = nbar + nedge
    scales = np.sqrt(2) * np.array([1, 2, 3, 4])
    for i in range(len(scales)):
        F[:, :, count] = gaussian2d(sup, scales[i])
        count = count + 1
    for i in range(len(scales)):
        F[:, :, count] = log2d(sup, scales[i])
        count = count + 1
    for i in range(len(scales)):
        F[:, :, count] = log2d(sup, 3 * scales[i])
        count = count + 1
    return F
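# Build the filter bank; F has shape (49, 49, 48)
F = makeLMfilters()
# First-derivative filters (indices 0-17)
for i in range(0, 18):
    plt.subplot(3, 6, i + 1)
    plt.axis('off')
    plt.imshow(F[:, :, i], cmap='gray')
plt.show()
# Second-derivative filters (indices 18-35)
for i in range(0, 18):
    plt.subplot(3, 6, i + 1)
    plt.axis('off')
    plt.imshow(F[:, :, i + 18], cmap='gray')
plt.show()
# Rotation-invariant filters (indices 36-47)
for i in range(0, 12):
    plt.subplot(4, 4, i + 1)
    plt.axis('off')
    plt.imshow(F[:, :, i + 36], cmap='gray')
plt.show()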
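The total of 48 filters follows directly from the constants in `makeLMfilters`. The short sketch below only redoes that arithmetic and records which index range of the last axis of `F` holds which filter family. The dictionary `index_ranges` and its key names are assumptions introduced for illustration.

```python
# Constants mirrored from makeLMfilters
n_scales = 3   # len(scalex)
n_orient = 6   # norient

n_first_deriv = n_scales * n_orient    # makefilter(..., 0, 1, ...): 18 filters
n_second_deriv = n_scales * n_orient   # makefilter(..., 0, 2, ...): 18 filters
n_gauss = 4                            # gaussian2d at 4 scales
n_log = 4 + 4                          # log2d at sigma and at 3 * sigma

total = n_first_deriv + n_second_deriv + n_gauss + n_log

# Half-open index ranges along the last axis of F (hypothetical helper table)
index_ranges = {
    "first_derivative": (0, 18),
    "second_derivative": (18, 36),
    "gaussian": (36, 40),
    "log": (40, 48),
}
```

These ranges match the three plotting loops above, which show filters 0 to 17, 18 to 35, and 36 to 47.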
(4) Load the image
Load a picture of a puppy from the ImageNet dataset and convert it to grayscale.
img = cv2.imread("./images/771.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, dsize=(100, 100), interpolation=cv2.INTER_CUBIC)
img_org = img.copy()
print(img_org.shape)
plt.imshow(img_org, cmap='gray')
plt.show()
(5) Convolve the image with the filter bank
plt.figure(figsize=(100, 200))
hyper_col = np.empty([img.shape[0], img.shape[1], 48])
img = np.float32(img)
for i in range(0, 48):
    plt.subplot(16, 3, i + 1)
    plt.axis('off')
    kernel = F[:, :, i]
    # One response map per filter, stacked along the last axis
    hyper_col[:, :, i] = cv2.filter2D(img, -1, kernel)
    plt.imshow(hyper_col[:, :, i], cmap='gray')
Part of the output:
(6) Cluster the filter bank outputs to obtain the segmented image
K-means clustering is used. The main goal is to separate the puppy (foreground) from the background. Since there is also a small toy next to the puppy in the image, the number of clusters is set to 3.
# Flatten the data: one 48-dimensional response vector per pixel
hyper_col_data = hyper_col.copy().reshape(-1, 48)
hyper_col_data = np.float32(hyper_col_data)
# L2 norm of each response vector
sq_hcd = np.power(hyper_col_data.copy(), 2)
sum_sq_hcd = np.sum(sq_hcd, 1)
L2_norm_hcd = np.power(sum_sq_hcd, 1 / 2)
# Contrast normalization: rescale each vector to length log10(1 + norm / 0.03)
norm_factor = np.log10(1 + (L2_norm_hcd / 0.03))
norm_hyper_col_data = np.empty([hyper_col_data.shape[0], hyper_col_data.shape[1]])
for hi in range(0, hyper_col_data.shape[0]):
    norm_hyper_col_data[hi, :] = (hyper_col_data[hi, :] * norm_factor[hi]) / L2_norm_hcd[hi]
norm_hyper_col_data = np.float32(norm_hyper_col_data)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 3
# K-means clustering into K clusters
ret, label, center = cv2.kmeans(norm_hyper_col_data, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
res_img = label.reshape(img.shape[0], img.shape[1])
plt.axis('off')
plt.imshow(res_img, cmap='gray')
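The per-pixel normalization loop above can also be written without an explicit Python loop. The following is a minimal sketch of the same formula in vectorized NumPy. The function name `normalize_responses` and the `eps` guard against all-zero response vectors are assumptions added here; the original loop has no such guard and would divide by zero for a pixel whose 48 responses are all zero.

```python
import numpy as np

def normalize_responses(responses, eps=1e-12):
    """Rescale each row to length log10(1 + ||row|| / 0.03), as in the loop above."""
    responses = np.asarray(responses, dtype=np.float64)
    # L2 norm of every row, kept as a column so it broadcasts over the features.
    l2 = np.sqrt(np.sum(responses ** 2, axis=1, keepdims=True))
    factor = np.log10(1.0 + l2 / 0.03)
    # eps (assumption) keeps all-zero rows at zero instead of producing NaN.
    return responses * factor / np.maximum(l2, eps)

# Hypothetical stand-in for hyper_col_data: 5 pixels with 48 responses each.
rng = np.random.default_rng(0)
demo = rng.normal(size=(5, 48))
demo[0] = 0.0  # a pixel with no filter response at all
normalized = normalize_responses(demo)
```

After normalization, each non-zero row has length `log10(1 + norm / 0.03)`, so very strong responses are compressed while their direction in feature space is preserved.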
2. Image segmentation using k-means clustering on pixel values combined with coordinates
The same puppy image is used again. For the grayscale image, each pixel has three features: X coordinate, Y coordinate, and gray value. The clustering result is visualized as follows:
import numpy as np
import cv2
import matplotlib.pyplot as plt
import scipy
from skimage import data, segmentation, color
from skimage.future import graph
img = cv2.imread("./771.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, dsize=(100, 100), interpolation=cv2.INTER_CUBIC)
img_org = img.copy()
# Extract three features: X coordinate, Y coordinate, gray value
img_fea = np.empty([img.shape[0], img.shape[1], 3])
for i in range(100):
    for j in range(100):
        img_fea[i][j][0] = i
        img_fea[i][j][1] = j
        img_fea[i][j][2] = img[i][j]
img_fea_data = img_fea.copy().reshape(-1, 3)
img_fea_data = np.float32(img_fea_data)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 3
# K-means clustering into K clusters
ret, label, center = cv2.kmeans(img_fea_data, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
res_img = label.reshape(img.shape[0], img.shape[1])
plt.imshow(res_img, cmap='gray')
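The double loop that fills the feature array can be replaced by array operations. The sketch below builds the same (coordinate, coordinate, gray value) rows with `np.indices`. The function name `position_intensity_features` and the tiny test image are assumptions for illustration. As in the loop above, the first feature is the row index and the second is the column index.

```python
import numpy as np

def position_intensity_features(gray):
    """Return one (row, col, gray value) feature row per pixel, as float32."""
    rows, cols = np.indices(gray.shape)
    feats = np.stack([rows, cols, gray], axis=-1)
    # Flatten in row-major order, the same order reshape(-1, 3) gives above.
    return feats.reshape(-1, 3).astype(np.float32)

# Hypothetical 3 x 4 grayscale image with values 0..11
gray = np.arange(12, dtype=np.uint8).reshape(3, 4)
features = position_intensity_features(gray)
```

This version also works for images that are not 100×100, because it reads the size from `gray.shape` instead of hard-coding the loop bounds.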
For the RGB image, each pixel has five features: X coordinate, Y coordinate, and the three RGB channel values. The clustering result is visualized as follows:
import numpy as np
import cv2
import matplotlib.pyplot as plt
import scipy
from skimage import data, segmentation, color
from skimage.future import graph
img = cv2.imread("./771.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, dsize=(100, 100), interpolation=cv2.INTER_CUBIC)
img_org = img.copy()
# Extract five features: X coordinate, Y coordinate, and the three RGB channels
img_fea = np.empty([img.shape[0], img.shape[1], 5])
for i in range(100):
    for j in range(100):
        img_fea[i][j][0] = i
        img_fea[i][j][1] = j
        img_fea[i][j][2] = img[i][j][0]
        img_fea[i][j][3] = img[i][j][1]
        img_fea[i][j][4] = img[i][j][2]
img_fea_data = img_fea.copy().reshape(-1, 5)
img_fea_data = np.float32(img_fea_data)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 3
# K-means clustering into K clusters
ret, label, center = cv2.kmeans(img_fea_data, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
res_img = label.reshape(img.shape[0], img.shape[1])
plt.imshow(res_img)
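To make clear what the clustering step does with these feature rows, here is a minimal NumPy sketch of the standard k-means iteration (assign each point to the nearest center, then move each center to the mean of its points). It is not the OpenCV implementation. The function name `simple_kmeans`, the fixed iteration count, and the deterministic initialization from evenly spaced rows are assumptions chosen so the example is reproducible, whereas the code above uses random initial centers.

```python
import numpy as np

def simple_kmeans(points, k, n_iter=10):
    """Plain k-means sketch: returns (labels, centers)."""
    points = np.asarray(points, dtype=np.float64)
    # Assumption: deterministic start from evenly spaced rows.
    init_idx = np.linspace(0, len(points) - 1, k).astype(int)
    centers = points[init_idx].copy()
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    # Final assignment with the last centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(dists, axis=1), centers

# Hypothetical (row, col, gray) rows: a dark patch on the left, a bright one on the right.
points = np.array([
    [0, 0, 10], [0, 1, 12], [1, 0, 11], [1, 1, 13],
    [0, 8, 200], [0, 9, 205], [1, 8, 198], [1, 9, 201],
])
labels, centers = simple_kmeans(points, k=2)
```

Because the gray value spans 0 to 255 while coordinates in a 100×100 image span 0 to 99, the features are on comparable scales here. For larger images or other color spaces, rescaling the features before clustering may be worth considering.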
3. Image segmentation using mean shift clustering on pixel values combined with coordinates
For the grayscale image, each pixel has three features: X coordinate, Y coordinate, and gray value. The clustering result is visualized as follows:
import numpy as np
import cv2
import matplotlib.pyplot as plt
import scipy
from skimage import data, segmentation, color
from skimage.future import graph
img = cv2.imread("./771.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, dsize=(100, 100), interpolation=cv2.INTER_CUBIC)
img_org = img.copy()
# Extract three features: X coordinate, Y coordinate, gray value
img_fea = np.empty([img.shape[0], img.shape[1], 3])
for i in range(100):
    for j in range(100):
        img_fea[i][j][0] = i
        img_fea[i][j][1] = j
        img_fea[i][j][2] = img[i][j]
img_fea_data = img_fea.copy().reshape(-1, 3)
img_fea_data = np.float32(img_fea_data)
from sklearn.cluster import MeanShift, estimate_bandwidth
# Estimate the kernel bandwidth from a sample of the data
bandwidth2 = estimate_bandwidth(img_fea_data, quantile=0.1, n_samples=100)
# bandwidth is passed by keyword; recent scikit-learn versions reject it as a positional argument
ms = MeanShift(bandwidth=bandwidth2, bin_seeding=True)
ms.fit(img_fea_data)
label = ms.labels_
res_img = label.reshape(img.shape[0], img.shape[1])
plt.imshow(res_img, cmap='gray')
For the RGB image, each pixel has five features: X coordinate, Y coordinate, and the three RGB channel values. The clustering result is visualized as follows:
import numpy as np
import cv2
import matplotlib.pyplot as plt
import scipy
from skimage import data, segmentation, color
from skimage.future import graph
img = cv2.imread("./771.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, dsize=(100, 100), interpolation=cv2.INTER_CUBIC)
img_org = img.copy()
# Extract five features: X coordinate, Y coordinate, and the three RGB channels
img_fea = np.empty([img.shape[0], img.shape[1], 5])
for i in range(100):
    for j in range(100):
        img_fea[i][j][0] = i
        img_fea[i][j][1] = j
        img_fea[i][j][2] = img[i][j][0]
        img_fea[i][j][3] = img[i][j][1]
        img_fea[i][j][4] = img[i][j][2]
img_fea_data = img_fea.copy().reshape(-1, 5)
img_fea_data = np.float32(img_fea_data)
from sklearn.cluster import MeanShift, estimate_bandwidth
# Estimate the kernel bandwidth from a sample of the data
bandwidth2 = estimate_bandwidth(img_fea_data, quantile=0.1, n_samples=100)
# bandwidth is passed by keyword; recent scikit-learn versions reject it as a positional argument
ms = MeanShift(bandwidth=bandwidth2, bin_seeding=True)
ms.fit(img_fea_data)
label = ms.labels_
res_img = label.reshape(img.shape[0], img.shape[1])
plt.imshow(res_img)
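Unlike k-means, mean shift does not need the number of clusters in advance: every point is moved repeatedly to the mean of the data points within the bandwidth around it until it settles on a density mode. The following is a minimal sketch of that idea with a flat kernel in plain NumPy. It is not the scikit-learn implementation (no bin seeding, no neighbor index), and the names `mean_shift_modes`, `label_modes`, `n_iter`, and `tol` are assumptions for illustration.

```python
import numpy as np

def mean_shift_modes(points, bandwidth, n_iter=50, tol=1e-3):
    """Shift a copy of every point to the local mean until it stops moving."""
    points = np.asarray(points, dtype=np.float64)
    modes = points.copy()
    for _ in range(n_iter):
        # Distance from every current estimate to every original data point.
        dists = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=2)
        weights = (dists <= bandwidth).astype(np.float64)  # flat kernel
        new_modes = weights @ points / weights.sum(axis=1, keepdims=True)
        shift = np.max(np.linalg.norm(new_modes - modes, axis=1))
        modes = new_modes
        if shift < tol:
            break
    return modes

def label_modes(modes, bandwidth):
    """Merge converged estimates closer than the bandwidth into one cluster."""
    centers = []
    labels = np.empty(len(modes), dtype=np.int64)
    for i, mode in enumerate(modes):
        for c_idx, center in enumerate(centers):
            if np.linalg.norm(mode - center) < bandwidth:
                labels[i] = c_idx
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)

# Hypothetical 2D data: two compact groups far apart.
group = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
points = np.vstack([group, group + 10.0])
modes = mean_shift_modes(points, bandwidth=3.0)
labels, centers = label_modes(modes, bandwidth=3.0)
```

The bandwidth plays the role that K plays in k-means: a larger bandwidth merges more points into the same mode and produces fewer segments, which is why the code above estimates it from the data with `estimate_bandwidth`.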
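4. Image segmentation via graph partition
Graph partitioning is computationally very expensive, so the original image is resized to 30×30 and only a binary segmentation into foreground and background is performed.
The maximum flow is found with the Ford-Fulkerson algorithm.
The segmentation result is as follows:
The main code is as follows: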
import cv2
import numpy as np
class GraphEmbedding:
    def __init__(self, path_img=None, array_input=None, sigma=20, resize=30):
        self.resizing_factor = resize
        if path_img is None:
            self.img_array = array_input
            self.width, self.height = (
                self.img_array.shape[0],
                self.img_array.shape[1],
            )
        else:
            self.img_array = self.OpenImg(path_img)
            self.width = self.height = self.resizing_factor
        # Capacity matrix: one node per pixel plus source (index -2) and sink (index -1)
        self.embeddings_matrix = np.zeros(
            (self.height * self.width + 2, self.height * self.width + 2)
        )
        self.sigma = sigma
    def OpenImg(self, path):
        image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        self.original_size = (image.shape[0], image.shape[1])
        image = cv2.resize(image, (self.resizing_factor, self.resizing_factor))
        return image
    def compute_weight(self, pixel1, pixel2):
        # Cast to float first: uint8 pixel values would wrap around on subtraction
        diff = float(pixel1) - float(pixel2)
        penalty = 100 * np.exp(
            (-(diff ** 2)) / (2 * (self.sigma ** 2))
        )
        return penalty
    def compute_edges(self):
        # Edges between 4-connected neighboring pixels (symmetric capacities)
        self.max_capacity = -np.inf
        for i in range(self.height):
            for j in range(self.width):
                l = i * self.width + j
                if i < self.height - 1:
                    k = (i + 1) * self.width + j
                    self.embeddings_matrix[k, l] = self.compute_weight(
                        self.img_array[i, j], self.img_array[i + 1, j]
                    )
                    self.embeddings_matrix[l, k] = self.embeddings_matrix[k, l]
                    self.max_capacity = max(
                        self.max_capacity, self.embeddings_matrix[k, l]
                    )
                if j < self.width - 1:
                    k = i * self.width + j + 1
                    self.embeddings_matrix[k, l] = self.compute_weight(
                        self.img_array[i, j], self.img_array[i, j + 1]
                    )
                    self.embeddings_matrix[l, k] = self.embeddings_matrix[k, l]
                    self.max_capacity = max(
                        self.max_capacity, self.embeddings_matrix[k, l]
                    )
    def compute_edges_source_sink(self, clusters_centers):
        # Source -> pixel edges, weighted by similarity to the first cluster center
        for i in range(self.height):
            for j in range(self.width):
                l = i * self.width + j
                self.embeddings_matrix[-2][l] = self.compute_weight(
                    self.img_array[i, j], clusters_centers[0]
                )
        # Pixel -> sink edges, weighted by similarity to the second cluster center
        for i in range(self.height):
            for j in range(self.width):
                l = i * self.width + j
                self.embeddings_matrix[l][-1] = self.compute_weight(
                    self.img_array[i, j], clusters_centers[1]
                )
    def compute_graph(self, clusters_centers):
        self.compute_edges()
        self.compute_edges_source_sink(clusters_centers)
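The capacity assigned to an edge in `compute_weight` is 100·exp(−(p1 − p2)² / (2σ²)): similar pixels get a capacity near 100, so cutting between them is expensive, while very different pixels get a capacity near 0. The sketch below evaluates this formula for a few pixel pairs with the default σ = 20. The standalone function `edge_weight` is an assumption written for illustration, and it converts the pixels to float before subtracting because `uint8` values loaded by OpenCV would otherwise wrap around.

```python
import numpy as np

def edge_weight(pixel1, pixel2, sigma=20.0):
    """Gaussian similarity of two gray values, scaled to the range (0, 100]."""
    diff = float(pixel1) - float(pixel2)  # avoid uint8 wrap-around
    return 100.0 * np.exp(-(diff ** 2) / (2.0 * sigma ** 2))

same = edge_weight(np.uint8(120), np.uint8(120))   # identical pixels
near = edge_weight(np.uint8(120), np.uint8(130))   # small difference
far = edge_weight(np.uint8(10), np.uint8(250))     # large difference
```

With σ = 20, a difference of 10 gray levels still gives a capacity of about 88, while a difference of 240 gives a capacity that is practically zero, so the minimum cut prefers to pass along strong intensity edges.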
from queue import Queue
import numpy as np
import maxflow
from PIL import Image
import cv2
def BFS(ResGraph, V, s, t, parent):
    """
    Breadth first search algo.
    Returns True if the sink t is reachable from the source s in the
    residual graph and records the path in parent.
    """
    q = Queue()
    VISITED = np.zeros(V, dtype=bool)
    q.put(s)
    VISITED[s] = True
    parent[s] = -1
    while not q.empty():
        p = q.get()
        for vertex in range(V):
            if (not VISITED[vertex]) and ResGraph[p][vertex] > 0:
                q.put(vertex)
                parent[vertex] = p
                VISITED[vertex] = True
    # Report whether the sink was reached (not merely the last vertex index)
    return VISITED[t]
def DFS(ResGraph, V, s, VISITED):
    """
    depth first search
    Marks every vertex reachable from s in the residual graph.
    """
    current = [s]
    while current:
        v = current.pop()
        if not VISITED[v]:
            VISITED[v] = True
            current.extend([u for u in range(V) if ResGraph[v][u]])
def FordFulkerson(graph, s, t):
    print("Running Ford-Fulkerson algorithm")
    ResGraph = graph.copy()
    V = len(graph)
    parent = np.zeros(V, dtype="int32")
    # Augment along BFS paths until the sink is no longer reachable
    while BFS(ResGraph, V, s, t, parent):
        # Bottleneck capacity along the path found by BFS
        pathFlow = float("inf")
        v = t
        while v != s:
            u = parent[v]
            pathFlow = min(pathFlow, ResGraph[u][v])
            v = parent[v]
        # Update residual capacities along the path
        v = t
        while v != s:
            u = parent[v]
            ResGraph[u][v] -= pathFlow
            ResGraph[v][u] += pathFlow
            v = parent[v]
    # Vertices still reachable from s form the source side of the minimum cut
    VISITED = np.zeros(V, dtype=bool)
    DFS(ResGraph, V, s, VISITED)
    all_cuts = []
    for i in range(V):
        for j in range(V):
            if VISITED[i] and not VISITED[j] and graph[i][j]:
                all_cuts.append((i, j))
    return all_cuts
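The relationship between maximum flow and the resulting segmentation is easiest to see on a tiny graph. The following self-contained sketch runs the same augmenting-path idea (with breadth-first search) on a hypothetical four-node graph: node 0 is the source, nodes 1 and 2 stand for two pixels, and node 3 is the sink. The function name `max_flow_min_cut` and the capacity values are assumptions made for illustration.

```python
from collections import deque
import numpy as np

def max_flow_min_cut(capacity, s, t):
    """Return (max flow value, list of nodes on the source side of the min cut)."""
    residual = np.array(capacity, dtype=np.float64)
    n = len(residual)
    flow = 0.0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u, v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # sink unreachable: the flow is maximal
        # Bottleneck capacity along the path.
        bottleneck = np.inf
        v = t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v], v])
            v = parent[v]
        # Push the bottleneck flow and update the residual capacities.
        v = t
        while v != s:
            u = parent[v]
            residual[u, v] -= bottleneck
            residual[v, u] += bottleneck
            v = u
        flow += bottleneck
    # Nodes reached by the last (failed) search lie on the source side.
    source_side = [v for v in range(n) if parent[v] != -1]
    return flow, source_side

# Hypothetical capacities: pixel 1 is tied to the source, pixel 2 to the sink.
capacity = [
    [0, 3, 1, 0],  # source -> pixel 1 (3), source -> pixel 2 (1)
    [0, 0, 1, 1],  # pixel 1 -> pixel 2 (1), pixel 1 -> sink (1)
    [0, 1, 0, 3],  # pixel 2 -> pixel 1 (1), pixel 2 -> sink (3)
    [0, 0, 0, 0],
]
flow, source_side = max_flow_min_cut(capacity, s=0, t=3)
```

Pixel 1 ends up on the source side (foreground) and pixel 2 on the sink side (background). The edges crossing the cut have total capacity 1 + 1 + 1 = 3, which equals the maximum flow.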
def boykov_kolmog(img_path, lbda, sigma, fore_grnd_sample, back_grnd_sample):
    """
    Implements Kolmogorov Boykov graph cut algorithm for image segmentation
    params:
        img_path : path to the input image
        lbda : hyperparameter of the cost function, defines similarity between pixels
        sigma : hyperparameter of the cost function, decay parameter.
        fore_grnd_sample : bounding box of the manually selected foreground area
        back_grnd_sample : bounding box of the manually selected background area
    """
    img = Image.open(img_path).convert("L")
    img_foreground = img.crop(fore_grnd_sample)
    img_background = img.crop(back_grnd_sample)
    img, img_foreground, img_background = (
        np.array(img),
        np.array(img_foreground),
        np.array(img_background),
    )
    # NOTE: the mean over the 256 histogram bins equals (pixel count / 256),
    # not the mean gray level of the sample.
    fore_mean = np.mean(
        cv2.calcHist([img_foreground], [0], None, [256], [0, 256])
    )
    back_mean = np.mean(
        cv2.calcHist([img_background], [0], None, [256], [0, 256])
    )
    # Initializing foreground and background probabilities
    Foreground = np.ones(img.shape)
    Background = np.ones(img.shape)
    img_vec = img.reshape(-1, 1)
    H, W = img.shape[:2]
    # Initialize Graph
    graph = maxflow.Graph[int](H, W)
    tree = maxflow.Graph[int]()
    # Construct Trees
    nodes, nodeids = graph.add_nodes(H * W), tree.add_grid_nodes(img.shape)
    tree.add_grid_edges(nodeids, 0)
    tree.add_grid_tedges(nodeids, img, 255 - img)
    gr = tree.maxflow()
    segments = tree.get_grid_segments(nodeids)
    for i in range(H):
        for j in range(W):
            Foreground[i, j] = -np.log(
                abs(img[i, j] - fore_mean)
                / (abs(img[i, j] - fore_mean) + abs(img[i, j] - back_mean))
            )
            Background[i, j] = -np.log(
                abs(img[i, j] - back_mean)
                / (abs(img[i, j] - back_mean) + abs(img[i, j] - fore_mean))
            )
    Foreground = Foreground.reshape(-1, 1)
    Background = Background.reshape(-1, 1)
    # Normalizing
    for i in range(img_vec.shape[0]):
        img_vec[i] = img_vec[i] / np.linalg.norm(img_vec[i])
    for i in range(H * W):
        ws = Foreground[i] / (
            Foreground[i] + Background[i]
        )  # Calculating source weight
        wt = Background[i] / (
            Foreground[i] + Background[i]
        )  # Calculating sink weight
        graph.add_tedge(i, ws[0], wt[0])
        # Dealing with pixels on the border of the image
        if i % W != 0:
            w = lbda * np.exp(-(abs(img_vec[i] - img_vec[i - 1]) ** 2) / sigma)
            graph.add_edge(i, i - 1, w[0], lbda - w[0])
        if (i + 1) % W != 0:
            w = lbda * np.exp(-(abs(img_vec[i] - img_vec[i + 1]) ** 2) / sigma)
            graph.add_edge(i, i + 1, w[0], lbda - w[0])
        if i // W != 0:
            w = lbda * np.exp(-(abs(img_vec[i] - img_vec[i - W]) ** 2) / sigma)
            graph.add_edge(i, i - W, w[0], lbda - w[0])
        if i // W != H - 1:
            w = lbda * np.exp(-(abs(img_vec[i] - img_vec[i + W]) ** 2) / sigma)
            graph.add_edge(i, i + W, w[0], lbda - w[0])
    print("Maximum Flow: {}".format(gr))
    # Get binary labels and return mask
    segments_ = np.zeros(nodes.shape)
    for i in range(len(nodes)):
        segments_[i] = graph.get_segment(nodes[i])  # Get binary classification
    segments_ = segments_.reshape(img.shape[0], img.shape[1])
    # NOTE: the returned mask is built from `segments` (the grid graph `tree`);
    # `segments_` from `graph` is computed but not used.
    mask = 255 * np.ones((img.shape[0], img.shape[1]))
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            if not segments[i, j]:
                mask[i, j] = 0
    return mask