Machine Vision Fundamentals (1)

Article link: https://blog.csdn.net/upr_rom/article/details/122683641

This post summarizes the basic concepts and related MATLAB code from ME5405 (Machine Vision).

Book Reference:

Image Processing, Analysis, and Machine Vision, M. Sonka, V. Hlavac, R. Boyle.

Digital Image Processing, 4/e, R. C. Gonzalez, R. E. Woods.

Scope of Machine Vision

Human vision covers only a small part of the images that exist in nature. We use instruments to capture images across the entire electromagnetic spectrum and use computer vision (including machine vision) to process the data and extract the information we want. In machine vision, the data is processed for applications in mechanical engineering, computer science, and other fields.

In healthcare, gamma rays are used for bone scans, X-rays for head CT, and so on.

In manufacturing, a complete camera system consisting of cameras, lenses, illumination, an image processing system, and mechanical parts is used to inspect the quality of objects.

Vision is one of the most important human senses: we use it to process information and make decisions. Similarly, machine vision can extract features from an image and make decisions based on them, which makes it crucial for automation.

Difficulty of Machine Vision

1. Loss of information from 3D to 2D

Unless a depth camera is used, a camera projects 3D objects onto a single image plane (the pinhole model), so the distance to an object is lost. With two cameras, however, the distance and therefore the size of an object can be calculated (stereo vision, as used in some SLAM techniques).

2. Interpretation

Humans extract features in the visual cortex and use them to make decisions. For a computer to analyze images the way a human does, we have to design algorithms explicitly.

3. Noise

Images may contain many noisy pixels that affect subsequent processing.

4. Too much data

An image is actually a set of several large matrices containing many pixels. Processing many images at once requires a huge amount of computation.

5. Brightness

We cannot control the brightness of an image, which depends on the illumination, the position of the camera, the surface geometry and reflectance properties of the objects, and so on. The absolute brightness is therefore not useful in many scenarios, whereas the relative brightness is.

6. Need for a global view

Objects should be recognized from the whole picture rather than from a few pixels or local features.
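
The first and fourth difficulties can be made concrete with a few lines of MATLAB. This is only a minimal sketch: the focal length, the two 3D points, and the 1920 × 1080 RGB stream at 30 frames per second are assumed values chosen for illustration.

```matlab
% Difficulty 1: under the pinhole model a 3D point (X, Y, Z) projects to
% (f*X/Z, f*Y/Z), so every point on the same viewing ray hits the same pixel.
f  = 1.0;                    % assumed focal length (arbitrary units)
P1 = [1; 2; 4];              % a 3D point in camera coordinates
P2 = 2.5 * P1;               % a farther point on the same viewing ray
p1 = f * P1(1:2) / P1(3);    % image coordinates of P1
p2 = f * P2(1:2) / P2(3);    % image coordinates of P2
sameProjection = max(abs(p1 - p2)) < 1e-12;   % true: the depth is lost

% Difficulty 4: raw data volume of uint8 RGB video (1 byte per sample).
bytesPerFrame  = 1080 * 1920 * 3;      % bytes in one frame
bytesPerSecond = bytesPerFrame * 30;   % bytes per second at 30 frames/s
fprintf('%d bytes per frame, %d bytes per second\n', bytesPerFrame, bytesPerSecond);
```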

Basic MATLAB code

% MATLAB

% Create/open a file named create_image.m in the current folder
edit create_image


%{
Working with image matrices
%}


% Read an image into a matrix
% imread usually returns uint8 values in [0,255]
picture_model = imread('model.jpg');


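% Extract the R, G and B channels from the matrix
modelR = picture_model(:,:,1);
modelG = picture_model(:,:,2);
modelB = picture_model(:,:,3);


% Convert the RGB matrix to a grayscale matrix
modelGray = rgb2gray(picture_model);


% Show information about the matrix (image): dimensions, bytes, class
whos picture_model
size(modelGray)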

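%{
Displaying images
%}


% Create figure 1
figure(1);


% Display the matrix as an image (three alternatives)
image(picture_model)
subimage(picture_model)
imshow(picture_model)
%{
For an RGB image, the matrix is m*n*3.
image scales the image to fit the axes.
imshow displays the image at its original size.


For a grayscale image, the matrix is m*n.
A grayscale image is handled like an indexed image: every value in the matrix still has to be mapped to an RGB or gray value.
image maps the values of a grayscale image directly to entries of the current colormap (direct mapping); with a 64-entry colormap the values are clipped to [1,64].
This happens when the default colormap has 64 entries (as in older MATLAB releases). Setting the current colormap changes the mapping used by image:
colormap( gray(256) )


imshow maps the values of a uint8 grayscale image to gray(256) (direct mapping), with values in [0,255].
imshow can also use linear scaling, imshow(modelGray,[]), which is similar to imagesc(modelGray).
Reference: https://blog.csdn.net/paoxungan5156/article/details/103319237
%}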


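% Show a colorbar
colorbar;
% Fit the axes to the image with an equal aspect ratio
axis image
% Show the axes (ticks and labels)
axis on
% Hide the axes
axis off
% Add a title
title('the image of the model')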

Image Representation

For each pixel in the image, we typically use 8, 16, 24 (true color) or 32 bits to store the color. With 8 bits, a pixel can take 256 different values.

We can consider the image as a function of intensity. For 8 bits, the intensity range is [0, 255]; for 16 bits, it is [0, 2^16-1]. Viewed this way, we do not need to consider whether the image is RGB or gray. For a gray image, we can use gray(256) or another precision.

The image is modeled as the function $f(x,y,t)$. For a single static image this reduces to $f(x,y)$, which is the product of two components:

$f(x,y) = i(x,y) \times r(x,y)$

$i(x,y)$ - illumination component (the amount of light falling on the scene), $0 \le i(x,y) < \infty$

$r(x,y)$ - reflectance component (the fraction of light reflected by the object), $0 \le r(x,y) \le 1$

For example, the reflectance of black velvet is about 0.01, and that of silver-plated metal is about 0.90.

When an object in the real world is projected onto the image plane of the camera, the light is continuous, so the brightness function is continuous as well. To store the image in memory, however, we have to convert it into discrete numbers in a matrix by image sampling and quantization (a part of digital signal processing).

Image sampling turns the continuous function into a matrix (discrete coordinates).

Quantization turns the intensity into integer values.
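
As a rough illustration of the two steps, the sketch below samples a made-up continuous brightness function on a small grid and then quantizes the samples to 8 bits. The function `f`, the grid size and the number of levels are assumptions chosen only for the example.

```matlab
% A hypothetical continuous brightness function with values in [0, 1]
f = @(x, y) 0.5 + 0.5 * sin(2*pi*x) .* cos(2*pi*y);

% Sampling: evaluate f on an M-by-N grid of discrete coordinates
M = 4; N = 6;
[x, y]  = meshgrid(linspace(0, 1, N), linspace(0, 1, M));
samples = f(x, y);                        % M-by-N matrix of real values

% Quantization: map the real values onto L integer levels 0 .. L-1
L   = 256;
img = uint8(round(samples * (L - 1)));    % M-by-N uint8 image
```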

For an M*N image whose indices start at 0, the center of the image is $(\lfloor \frac{M}{2} \rfloor, \lfloor \frac{N}{2} \rfloor)$

Because indexing in MATLAB starts at 1, the center of the image there is $(\lfloor \frac{M}{2} \rfloor + 1, \lfloor \frac{N}{2} \rfloor + 1)$
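
A quick numeric check of the two conventions, using an assumed 5 × 6 image:

```matlab
M = 5; N = 6;                                   % assumed image size
centerZeroBased = [floor(M/2), floor(N/2)];     % center when indices start at 0
centerMatlab    = centerZeroBased + 1;          % center with MATLAB's 1-based indices
disp(centerZeroBased);
disp(centerMatlab);
```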

Image resolution is a measure of sampling density: it relates pixel dimensions to physical dimensions, and it is commonly expressed in pixels per inch (ppi).

$pixel\ size = \frac{1}{ppi}\ inch$

Raster dimension: the number of horizontal and vertical samples in the pixel grid, such as 1080i (1920 × 1080) or 2K (2048 × 1536, for a 4:3 aspect ratio).

Scaling an image down reduces its raster dimension, and scaling it up increases it.

Color depth: the number of bits used to store each pixel, which determines how many distinct values a pixel can take.

For monochrome images, $f(i,j)=0,1,2,...,K-1$

For binary images, K=2; for grayscale images, K can be $2^8$, so f(i,j) lies in [0,255].

For RGB color images, every pixel is described by three channels: $f(i,j,R),f(i,j,G),f(i,j,B)$ .

With 24 bits, each RGB channel takes values in [0,255]. With 16 bits or 8 bits, fewer levels are available per channel.

Topological Properties

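Connectivity is used to decide whether two pixels (with the same property) are connected.

There are two kinds of connectivity: 4-neighborhood and 8-neighborhood.

There are three kinds of distance: Euclidean, city block, and chessboard.

$Euclidean: D_E=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$

$City\ block: D_4=|x_2-x_1|+|y_2-y_1|$ , which corresponds to the 4-neighborhood

$Chess\ board: D_8=max(|x_2-x_1|,|y_2-y_1|)$ , which corresponds to the 8-neighborhood

Histogram: describes how often each intensity occurs in the image. It reveals the contrast of the image and whether it is underexposed or overexposed.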
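
The three distances can be compared on one pair of pixels. The coordinates below are arbitrary example values.

```matlab
p = [1 2];              % (x1, y1), an assumed pixel
q = [4 6];              % (x2, y2), another assumed pixel
d = abs(q - p);         % absolute coordinate differences [3 4]

DE = sqrt(sum(d.^2));   % Euclidean distance
D4 = sum(d);            % city block distance (4-neighborhood)
D8 = max(d);            % chessboard distance (8-neighborhood)
fprintf('DE = %g, D4 = %g, D8 = %g\n', DE, D4, D8);
```

For any pair of pixels, $D_8 \le D_E \le D_4$.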

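%{
Compute the histogram data in MATLAB
%}

% im is a grayscale uint8 image; levels is the number of intensity levels
levels = 256;
H = zeros(1, levels);

for i = 1:size(im,1)
    for j = 1:size(im,2)
        % intensities start at 0 but MATLAB indices start at 1;
        % convert to double first so that 255 + 1 does not saturate
        k = double(im(i,j)) + 1;
        H(k) = H(k) + 1;
    end
end


%{
Built-in function: imhist
Called without an output argument, imhist(modelGray) plots the histogram directly
%}
h = imhist(modelGray);

% imhist(modelGray,b): b is the number of bins the intensities are divided into; the default is 256, so every intensity has its own bin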

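Image pre-processing: image enhancement (emphasizing the features we need, for example increasing the contrast of a local region) and image restoration (removing noise).

Image pre-processing is divided into spatial-domain and frequency-domain processing.

Spatial-domain processing is further divided into single-pixel processing (intensity transformation) and processing that takes the neighboring pixels into account (spatial filtering).

$g(x,y)=T[f(x,y)]$

The operator T transforms the intensities of the image.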

1. Intensity Transformation:

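Because of the shooting conditions and other factors, the intensities of an image may be concentrated in the high or the low part of the range. To make better use of the information in the image, we can apply an intensity transformation that makes the histogram more uniform. An intensity transformation considers neither the neighbors of a pixel nor its position; it only changes the intensity value of each pixel, so the transformation becomes

$s = T(r)$ , where r is the intensity of the input pixel and s is the intensity of the output pixel. Both s and r take values in [0,L-1].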

1.1 Negative transformation

Sometimes the feature pixels we are interested in are light and the background is dark, so we need to invert the intensities.

$s=T(r)=L-1-r$
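
For an 8-bit image (L = 256), the negative can be sketched as follows. The small matrix `r` is made-up test data.

```matlab
L = 256;                              % number of intensity levels for uint8
r = uint8([0 64; 128 255]);           % assumed input intensities
s = uint8((L - 1) - double(r));       % s = L - 1 - r, computed in double
disp(s);
```

Converting to double before subtracting keeps the arithmetic explicit and independent of integer saturation rules.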

1.2. Histogram Processing

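To turn an image whose intensities are densely concentrated into one whose intensities are more spread out, we can stretch the intensities as a whole, mapping a narrow intensity range onto a wider one.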

Contrast Stretching

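%{
Contrast Stretching
%}

% Opens the Image Viewer tool, whose contrast adjustment handles can be used to stretch the contrast of the current image interactively
imtool(modelGray)

This method changes the intensity function: it linearly expands the contrast of one part of the range (the middle part), which reveals information that was previously hard to see.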
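
A non-interactive version of contrast stretching can be sketched as a piecewise-linear lookup table. The control points `(r1, s1)` and `(r2, s2)` and the test intensities are assumptions chosen for illustration; they are not values from the course.

```matlab
% Piecewise-linear T(r) through (0,0), (r1,s1), (r2,s2), (255,255).
% The middle segment has slope > 1, so mid-range contrast is expanded.
r1 = 80;  s1 = 30;
r2 = 170; s2 = 220;
lut = interp1([0 r1 r2 255], [0 s1 s2 255], 0:255);   % T(r) for r = 0..255
lut = uint8(round(lut));

img = uint8([60 80 125 170 200]);     % assumed input intensities
out = lut(double(img) + 1);           % +1 because MATLAB indices start at 1
disp(out);
```

The input range [80, 170] is mapped onto [30, 220], while the dark and bright ends are compressed.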

Brightness Thresholding

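This method converts a grayscale image into a black-and-white image, showing only the pixels whose gray value is greater than T. It is used in image segmentation and object recognition.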
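
In MATLAB, this amounts to a single comparison. The matrix and the threshold below are assumed example values.

```matlab
gray = uint8([10 120; 200 90]);   % assumed grayscale intensities
T    = 100;                       % assumed threshold
bw   = gray > T;                  % logical image: true where intensity exceeds T
disp(bw);
```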

Intensity-Level Slicing

Selectively keeps or highlights only part of the intensity range.
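
One common variant highlights an intensity band and leaves the other pixels unchanged. The band limits `lo` and `hi` and the test matrix are assumptions for this sketch.

```matlab
gray = uint8([10 120; 200 90]);       % assumed grayscale intensities
lo = 80; hi = 150;                    % assumed intensity band of interest
mask = (gray >= lo) & (gray <= hi);   % pixels inside the band

sliced = gray;
sliced(mask) = 255;                   % highlight the band, keep the rest
disp(sliced);
```

Setting the pixels outside the band to 0 instead would give the other common variant, which produces a binary-like result.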

Dynamic Range Compression

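Handles images whose intensity range is not [0,L-1] and is extremely wide (spanning several orders of magnitude). It expands the low-intensity range so that the information at low intensities becomes visible.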
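
A sketch of this idea, assuming the usual log transformation $s = c \log(1+r)$ with $c$ chosen so that the largest input maps to $L-1$. The input values are made up to span several orders of magnitude.

```matlab
L = 256;
r = [0 10 1e3 1e6];                     % assumed wide-range intensities
c = (L - 1) / log(1 + max(r));          % scale so that max(r) maps to L-1
s = uint8(round(c * log(1 + r)));       % compressed intensities in [0, L-1]
disp(s);
```

After the transformation the small values occupy a much larger share of the output range than they would under linear scaling.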

Histogram Equalization

Makes the intensity distribution uniform.

The transformation $s=T(r)$ is chosen so that $p_s(s)=\frac{1}{L-1}$ .

We know that $p_s(s)\ ds = p_r(r) \ dr$

We also know that $ds=\frac{dT}{dr}dr$

So $\frac{dT}{dr}=\frac{ds}{dr}=\frac{p_r(r)}{p_s(s)}=(L-1)p_r(r)$

Therefore, $s=T(r)=(L-1) \int_{0}^{r} p_r(w)dw$

For the discrete version:

$s_k=T(r_k)=(L-1)\sum_{i=0}^{k} p_r(r_i),\ k=0,1,...,L-1$ , and the result is then rounded to the nearest integer
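
The discrete formula can be sketched directly with `accumarray` and `cumsum`. The 3-bit test image (L = 8) is made-up data; for real images the Image Processing Toolbox function `histeq` is the ready-made alternative.

```matlab
L   = 8;                                             % assumed 3-bit image
img = uint8([0 1 1 2; 2 2 3 3; 3 3 4 5; 5 6 7 7]);   % assumed intensities

counts = accumarray(double(img(:)) + 1, 1, [L 1]);   % histogram, index = r + 1
p      = counts / numel(img);                        % p_r(r_k)
T      = round((L - 1) * cumsum(p));                 % s_k = (L-1) * sum of p_r

eq = uint8(T(double(img) + 1));                      % apply s = T(r) per pixel
disp(T');
```

`T` is non-decreasing and ends at L-1, so the ordering of intensities is preserved.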

Histogram Specification

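Instead of a uniform distribution, we can specify any other $p_s(s)$ we want, which gives a different intensity distribution.

First, let $z = G(s)$ be the transformation that equalizes the desired distribution, so that $p_z(z)=\frac{1}{L-1}$

Equalizing the input image likewise gives $z=T(r)$

Hence $G(s)=T(r)$ ,

which gives the transformation $s=G^{-1}[T(r)]$

The same reasoning applies to the discrete case.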
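
A discrete sketch of $s=G^{-1}[T(r)]$ for a 3-bit image. The two histograms `pr` and `ps` are small illustrative examples, and the inverse of G is approximated by picking, for each input level, the smallest output level whose G value is closest to T(r).

```matlab
L  = 8;
pr = [0.19 0.25 0.21 0.16 0.08 0.06 0.03 0.02];   % assumed input histogram p_r
ps = [0    0    0    0.15 0.20 0.30 0.20 0.15];   % assumed desired histogram p_s

T = round((L - 1) * cumsum(pr));   % equalization of the input:  z = T(r)
G = round((L - 1) * cumsum(ps));   % equalization of the target: z = G(s)

map = zeros(1, L);                 % map(r + 1) = s
for k = 1:L
    [~, idx] = min(abs(G - T(k))); % first (smallest) s with G(s) nearest to T(r)
    map(k) = idx - 1;
end
disp(map);
```

Applying the mapping to an image `img` of class uint8 would then be `uint8(map(double(img) + 1))`.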

2. Geometric Transformations

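Geometric transformations change the spatial arrangement of the pixels. They consist of two operations: a transformation of the coordinates, and interpolation of the intensities in the transformed coordinate system.

2.1 Image Registration

Finds the transformation between a reference image and a distorted image, given the coordinates of some corresponding points in both coordinate systems.

(v,w) are the coordinates in the input image and (x,y) are the coordinates in the output image.

The transformation can be estimated from the corresponding points $[(v_1,w_1),(x_1,y_1)],...,[(v_n,w_n),(x_n,y_n)]$ , for example with a bilinear fit: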

$x=c_1 v+c_2w+c_3vw+c_4$

$y=c_5v+c_6w+c_7vw+c_8$
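
The eight coefficients can be estimated by solving two linear systems. The sketch below uses four made-up pairs of corresponding points, which determine the coefficients exactly; with more than four pairs, the same backslash operator returns the least-squares fit.

```matlab
vw = [0 0; 1 0; 0 1; 1 1];            % assumed (v, w) points in the input image
xy = [0 0; 2 0; 0 3; 2.5 3.5];        % assumed (x, y) points in the output image

v = vw(:,1); w = vw(:,2);
A = [v, w, v.*w, ones(size(v))];      % one row [v w vw 1] per point pair

cx = A \ xy(:,1);                     % [c1; c2; c3; c4]
cy = A \ xy(:,2);                     % [c5; c6; c7; c8]

% Map a new input point with the fitted model
p = [0.5 0.5];
x = [p(1), p(2), p(1)*p(2), 1] * cx;
y = [p(1), p(2), p(1)*p(2), 1] * cy;
```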

2.2 Brightness Interpolation

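Interpolation method:

% Get the height and width of the image
r = size(im, 1);  % height (number of rows, y direction)
c = size(im, 2);  % width (number of columns, x direction)

%{
Get the x and y axis coordinates of the original image, [xmin,xmax] and [ymin,ymax], which default to [1,width] and [1,height]

try
...
catch
...
end
If the statements in try fail, the statements in catch are executed instead


%}
% compute the range of original image
try
    xrange = axesoforig.x;
    yrange = axesoforig.y;
catch
    xrange = 1:c;
    yrange = 1:r;
end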

%{
meshgrid returns the coordinates of every point of the image raster; orig.xi and orig.yi both have size height-by-width
%}
[orig.xi, orig.yi] = meshgrid(xrange, yrange);

% corner points of the image
% coordinates of the four corners of the original image, size 2-by-4
orig.u = [[min(xrange) min(yrange)]',[min(xrange) max(yrange)]', ...
[max(xrange) min(yrange)]', [max(xrange) max(yrange)]'];

% make homogeneous: set the third row to ones(1,4)
orig.u(3,:)=1;

% map forward
% apply the transformation matrix T to the corner points
forward.x = T * orig.u;
% compute the limits of the output image (bounding box)
% divide by the third row, which may differ from 1 after the transformation, to get the actual coordinates
forward.x(1:2,:) = forward.x(1:2,:)./repmat(forward.x(3,:),2,1);

% bounding box of the transformed corners (expressed in the coordinate system of the original image)
maxx = max(forward.x(1,:));
minx = min(forward.x(1,:));
maxy = max(forward.x(2,:));
miny = min(forward.x(2,:));

% axes of the transformed image and the coordinates of all points of its raster
axesofnew.x = minx:step:maxx;
axesofnew.y = miny:step:maxy;
[u, v] = meshgrid( axesofnew.x, axesofnew.y);

% build a matrix whose columns are [point_x point_y 1]', one column per output pixel
x2 = [u(:) v(:) ones(size(v(:)))]';

% positions in the original image that correspond to the points of the transformed image
% (T \ x2 solves the same system as inv(T) * x2 without forming the inverse)
x1 = T \ x2;
% normalization
x1(1:2,:) = x1(1:2,:) ./ repmat(x1(3,:),2,1);

% reshape back to the size of the output raster
new.xi = reshape(x1(1,:),size(u));
new.yi = reshape(x1(2,:),size(v));

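% If the original image is a single matrix, interpolate that layer only; if it has several layers, interpolate each layer
%{
interp2( axesoforigin_x, axesoforigin_y, double(image), axesofnew_x, axesofnew_y, method)

axesoforigin_x,y: x and y coordinates of the original image
double(image): the original image converted to double so that the interpolation is accurate
axesofnew_x,y: positions in the original image of the points of the transformed image
method: interpolation method: 'linear' bilinear, 'nearest' nearest neighbor, 'spline' cubic spline, 'cubic' bicubic

interp1 interpolates a function of one variable:
interp1(x,y,xq,method)
x,y are the coordinates of the sample points, xq are the query points, method: linear, nearest, next, previous, pchip, cubic, spline
%}
layers = size(im,3);
if layers > 1
    im_out = zeros(length(axesofnew.y),length(axesofnew.x),layers);
    for i=1:layers
        % write into layer i (not layer 1) of the output
        im_out(:,:,i) = ...
            interp2(orig.xi, orig.yi, double(im(:,:,i)), ...
                new.xi, new.yi, method);
     end
else
     im_out = interp2(orig.xi, orig.yi, double(im), new.xi, new.yi, method);
end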
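
To see what the `method` argument of `interp2` changes, it can help to query a tiny image at non-integer positions. The 2 × 2 matrix and the query points below are made-up values.

```matlab
im = [10 20; 30 40];                  % assumed 2-by-2 image
[xi, yi] = meshgrid(1:2, 1:2);        % x = column index, y = row index

% Bilinear interpolation at the center of the four pixels: their average
vLinear  = interp2(xi, yi, im, 1.5, 1.5, 'linear');

% Nearest neighbor at x = 1.4, y = 1.6: column 1, row 2
vNearest = interp2(xi, yi, im, 1.4, 1.6, 'nearest');
fprintf('linear: %g, nearest: %g\n', vLinear, vNearest);
```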

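Thresholding: threshold-based segmentation converts a grayscale image into a black-and-white image. Removing the background leaves the individual objects in the image.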

model = imread('lecture0_model.png');
model_gray = rgb2gray(model);


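%{
multithresh uses Otsu's method to split the intensity values into N+1 classes by computing N thresholds
multithresh(model_gray,N)
The result is a 1*N vector of thresholds
%}

threshhold_value = multithresh(model_gray,2);

%{
imquantize uses the threshold vector to split the intensity matrix into N+1 classes, each with its own label 1,2,...,N+1
%}
seg_model = imquantize(model_gray, threshhold_value);


%{
label2rgb maps the label values to RGB values so that the labels can be displayed clearly
The mapping can be the default one or one supplied by the user
%}
% The RGB image now contains only 2+1 = 3 distinct colors
RGB_image = label2rgb(seg_model);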

For some problems we only need shape information or object segmentation, so it is convenient to work on black-and-white images.

Thresholding

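We first need to convert the grayscale image into a black-and-white image.

However, this approach does not work well when the intensity distributions of the two classes overlap considerably, or when their variances differ greatly.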

Optimum Global Thresholding (Otsu's Method)

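Thresholding can be viewed as a statistical decision problem: the goal is to reduce the overall error made when assigning pixels to classes.

Reference: "OTSU algorithm (Otsu's method, maximum between-class variance): principle and implementation", 小武的博客, CSDN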
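
A compact sketch of Otsu's method, which picks the threshold that maximizes the between-class variance computed from the normalized histogram. The small test image is made-up bimodal data; in practice the toolbox functions `graythresh` or `multithresh` do the same job.

```matlab
img = uint8([10 12 11 200; 13 210 205 9; 198 11 202 12]);   % assumed image
L   = 256;

counts = accumarray(double(img(:)) + 1, 1, [L 1]);   % histogram
p      = counts / sum(counts);                       % normalized histogram

omega = cumsum(p);                   % probability of class 0 for threshold k
mu    = cumsum((0:L-1)' .* p);       % cumulative mean up to level k
muT   = mu(end);                     % global mean

% Between-class variance for every candidate threshold k = 0 .. L-1
sigmaB2 = (muT * omega - mu).^2 ./ (omega .* (1 - omega));
sigmaB2(~isfinite(sigmaB2)) = 0;     % 0/0 where one class is empty

[~, idx] = max(sigmaB2);
T  = idx - 1;                        % threshold as an intensity value
bw = img > T;                        % pixels above T form the bright class
```

When several thresholds give the same maximum, as in the gap between the two modes here, `max` returns the first one.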

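Boundary-based thresholding

To separate the objects from the background, we first find the boundaries of the objects.

Two scans are needed, one in the X direction and one in the Y direction. In each direction every line is scanned: if a pixel and the next pixel lie on opposite sides of the threshold, the current pixel is set to $L_E$ . The two sets of $L_E$ pixels are then merged (logical OR), which gives the boundary points of the objects.

Once the boundary points are found, the object inside the boundary can be set to black and the outside to white, which completes the segmentation. The next step is to label the objects.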
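
The two scans can be written with shifted comparisons instead of explicit loops. This is a sketch on a made-up 5 × 5 image with a bright 3 × 3 square; the logical matrix `LE` marks the pixels that would be set to $L_E$.

```matlab
gray = 20 * ones(5, 5, 'uint8');      % assumed dark background
gray(2:4, 2:4) = 200;                 % assumed bright object
T = 100;                              % assumed threshold

above = gray > T;                     % side of the threshold for every pixel
LE = false(size(gray));

% Scan in X: mark a pixel if its right-hand neighbor is on the other side
LE(:, 1:end-1) = LE(:, 1:end-1) | (above(:, 1:end-1) ~= above(:, 2:end));

% Scan in Y: mark a pixel if the neighbor below is on the other side;
% OR-ing into LE merges the two scans
LE(1:end-1, :) = LE(1:end-1, :) | (above(1:end-1, :) ~= above(2:end, :));
disp(LE);
```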

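Component labelling

Multi-pass Method

Forward scan: scan along the X direction, treating each pixel in turn as the center and examining its neighbors under 4-connectivity or 8-connectivity. If no neighbor has been labeled yet, the pixel receives a new label; otherwise it receives the smallest label among its neighbors. Continue until the whole image has been scanned.

Backward scan: then scan in the reverse X direction, again treating each pixel as the center and examining its neighbors under 4-connectivity or 8-connectivity. If the center pixel and a neighbor have different labels, the two labels are recorded as equivalent. After the scan, all pairs of equivalent labels are known.

Finally, each group of equivalent labels is replaced by a single label.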
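
The sketch below is a simplified multi-pass variant, not the exact procedure above: instead of recording label equivalences, it gives every foreground pixel a provisional label and repeats forward and backward scans, propagating the smallest neighboring label until nothing changes. The binary image is made-up data and 4-connectivity is assumed.

```matlab
bw = logical([1 1 0 0 1;
              0 1 0 0 1;
              0 0 0 1 1;
              1 0 0 0 0]);            % assumed binary image
[R, C] = size(bw);

labels = zeros(R, C);
labels(bw) = 1:nnz(bw);               % provisional unique labels
offsets = [-1 0; 1 0; 0 -1; 0 1];     % 4-neighborhood

changed = true;
while changed
    changed = false;
    for pass = 1:2
        if pass == 1                  % forward scan
            rows = 1:R;    cols = 1:C;
        else                          % backward scan
            rows = R:-1:1; cols = C:-1:1;
        end
        for i = rows
            for j = cols
                if ~bw(i, j), continue; end
                for k = 1:size(offsets, 1)
                    ni = i + offsets(k, 1);
                    nj = j + offsets(k, 2);
                    if ni >= 1 && ni <= R && nj >= 1 && nj <= C && ...
                            bw(ni, nj) && labels(ni, nj) < labels(i, j)
                        labels(i, j) = labels(ni, nj);   % take smaller label
                        changed = true;
                    end
                end
            end
        end
    end
end

% Renumber the surviving labels as 1, 2, ..., numRegions
[~, ~, compact] = unique(labels(bw));
labels(bw) = compact;
numRegions = max(labels(:));
```

With the Image Processing Toolbox, `bwlabel(bw, 4)` produces an equivalent labelling directly.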

Region Analysis

We can measure the area, centroid and perimeter of each region.

% region analysis

% In MATLAB, regionprops can extract the area, centroid, perimeter and other properties
regionprops()
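
A minimal usage sketch of `regionprops` (Image Processing Toolbox) on a made-up binary image containing one 3 × 4 rectangle:

```matlab
bw = false(6, 8);                     % assumed binary image
bw(2:4, 2:5) = true;                  % one rectangular region

stats = regionprops(bw, 'Area', 'Centroid', 'Perimeter');

area     = stats(1).Area;             % number of pixels in the region
centroid = stats(1).Centroid;         % [x y], i.e. [column row]
fprintf('area = %d, centroid = (%g, %g)\n', area, centroid(1), centroid(2));
```

Note that `Centroid` is returned in (x, y) order, so the first element is the column coordinate.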

# Basic properties of color images