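1 Images
An image is the basis of human vision: it is the distribution of light reflected or transmitted by objects. We treat an image as a matrix of light in which the value at each point represents light intensity. These points are called pixels.
A color image can be viewed as a three-dimensional tensor in a color space. In the RGB color space, for example, the red, green, and blue components are stored in three separate channels.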
typedef struct {
    int w, h, c;   // width, height, number of channels
    float *data;   // pixel values, w * h * c floats
} image;
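The code later in this article calls `make_image`, `get_pixel`, and `set_pixel`, which the article does not define. The following is a minimal sketch of what they might look like. It assumes a channel-major (CHW) memory layout and clamp-to-edge behavior for out-of-range reads; both are assumptions made for illustration, not details confirmed by the article.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int w, h, c;   /* width, height, number of channels */
    float *data;   /* w * h * c floats, assumed channel-major (CHW) */
} image;

/* Allocate a zero-filled image; the caller frees im.data. */
image make_image(int w, int h, int c)
{
    image im;
    im.w = w;
    im.h = h;
    im.c = c;
    im.data = calloc((size_t)w * h * c, sizeof(float));
    return im;
}

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Read a pixel; out-of-range coordinates are clamped to the border
 * (an assumed padding strategy). */
float get_pixel(image im, int x, int y, int c)
{
    x = clamp_int(x, 0, im.w - 1);
    y = clamp_int(y, 0, im.h - 1);
    c = clamp_int(c, 0, im.c - 1);
    return im.data[c * im.w * im.h + y * im.w + x];
}

/* Write a pixel; out-of-range writes are silently ignored. */
void set_pixel(image im, int x, int y, int c, float v)
{
    if (x < 0 || x >= im.w || y < 0 || y >= im.h || c < 0 || c >= im.c)
        return;
    im.data[c * im.w * im.h + y * im.w + x] = v;
}
```

Clamping on reads matters for the resizing and convolution code, which can ask for coordinates such as -1 near the image border.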
2 Image Resizing and Interpolation
Image resizing means changing the dimensions of an image.
For example, an image can be enlarged from $4\times4$ to $7\times7$.
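The task of image resizing is to determine the value of every pixel in the new image. This takes two steps:
- Match each new pixel's coordinates to a position in the original image.
- Use an interpolation algorithm to compute the new pixel's value.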
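2.1 Matching pixel coordinates
Nearest-neighbor and bilinear interpolation both work with the 4 original pixels surrounding the transformed coordinate: nearest-neighbor picks the closest one, while bilinear interpolation blends all four. The coordinates are matched with a linear map, where $X$ is a coordinate in the new image and $Y$ is the matching coordinate in the original image:
$$aX + b = Y$$
Bicubic interpolation uses the 16 pixels surrounding the transformed coordinate and interpolates with a cubic polynomial rather than a linear one:
$$f(x) = a + bx + cx^2 + dx^3$$
Bicubic interpolation is more involved, so this section only covers coordinate matching for nearest-neighbor and bilinear interpolation: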
$$
\begin{aligned}
(-0.5, -0.5) &\rightarrow (-0.5, -0.5) \\
(6.5, 6.5) &\rightarrow (3.5, 3.5) \\
-0.5\times a + b &= -0.5 \\
6.5\times a + b &= 3.5 \\
\Rightarrow\ a &= \frac{4}{7},\quad b = -\frac{3}{14}
\end{aligned}
$$
For a pixel $(1, 3)$ in the new image, the matching coordinate in the original image is:
$$
\begin{aligned}
i &= \frac{4}{7}\times1 - \frac{3}{14} = \frac{5}{14} \\
j &= \frac{4}{7}\times3 - \frac{3}{14} = \frac{21}{14}
\end{aligned}
$$
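The same mapping works for any pair of sizes. Solving the two corner equations in general gives $a = \text{old}/\text{new}$ and $b = 0.5\,(a - 1)$. The sketch below wraps this in a helper; the name `map_coord` is hypothetical and used only for illustration.

```c
#include <assert.h>

/* Map pixel index i on an axis of the new image (new_size pixels)
 * to a coordinate on the same axis of the original image (old_size pixels).
 * Pixel centers sit at integer coordinates, so each axis spans
 * [-0.5, size - 0.5]. */
float map_coord(int i, int new_size, int old_size)
{
    float a = (float)old_size / (float)new_size;  /* scale */
    float b = 0.5f * (a - 1.0f);                  /* offset */
    return a * (float)i + b;
}
```

For the $7\times7$ to $4\times4$ case, `map_coord(1, 7, 4)` gives 5/14 and `map_coord(3, 7, 4)` gives 1.5, matching the worked values above.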
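2.2 Nearest-neighbor interpolation
After mapping a new pixel back to the original image, nearest-neighbor interpolation takes the value of the original pixel closest to that coordinate as the value of the new pixel.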
Example code in C:
#include <math.h>

float nn_interpolate(image im, float x, float y, int c)
{
    // round to the nearest original pixel
    int i = (int)roundf(x);
    int j = (int)roundf(y);
    return get_pixel(im, i, j, c);
}

image nn_resize(image im, int w, int h)
{
    if (!w || !h)
        return make_image(1, 1, 1);
    // map new coordinates (w, h) to original coordinates (im.w, im.h)
    // using Y = a * X + b on each axis
    float w_a, w_b, h_a, h_b;
    w_a = (float)im.w / w;
    w_b = 0.5f * (w_a - 1);
    h_a = (float)im.h / h;
    h_b = 0.5f * (h_a - 1);
    image ret = make_image(w, h, im.c);
    for (int k = 0; k < im.c; k++) {
        for (int j = 0; j < h; j++) {
            for (int i = 0; i < w; i++) {
                // map coordinates to the original image
                float x = w_a * i + w_b;
                float y = h_a * j + h_b;
                float val = nn_interpolate(im, x, y, k);
                set_pixel(ret, i, j, k, val);
            }
        }
    }
    return ret;
}
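To see the behavior on concrete numbers, here is a minimal self-contained sketch of the same idea for a single-channel image stored as a flat row-major array. The function name `nn_resize_gray` and the flat-array interface are assumptions for illustration; they are not part of the code above.

```c
#include <assert.h>

/* Nearest-neighbor resize of a single-channel, row-major image.
 * src holds sw * sh floats; dst must have room for dw * dh floats. */
void nn_resize_gray(const float *src, int sw, int sh,
                    float *dst, int dw, int dh)
{
    float ax = (float)sw / (float)dw, bx = 0.5f * (ax - 1.0f);
    float ay = (float)sh / (float)dh, by = 0.5f * (ay - 1.0f);
    for (int j = 0; j < dh; j++) {
        for (int i = 0; i < dw; i++) {
            /* The mapped coordinate is never below -0.5, so adding 0.5
             * and truncating rounds to the nearest source pixel. */
            int x = (int)(ax * i + bx + 0.5f);
            int y = (int)(ay * j + by + 0.5f);
            /* Guard against floating-point overshoot at the far edge. */
            if (x > sw - 1) x = sw - 1;
            if (y > sh - 1) y = sh - 1;
            dst[j * dw + i] = src[y * sw + x];
        }
    }
}
```

Enlarging the $2\times2$ image `{1, 2, 3, 4}` to $4\times4$ this way turns every source pixel into a $2\times2$ block, which is the blocky look typical of nearest-neighbor upscaling.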
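2.3 Bilinear interpolation
Bilinear interpolation computes the value at the matched coordinate by linearly interpolating between the 4 neighboring pixels.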
$$
\begin{aligned}
q1 &= V1\times d2 + V2\times d1 \\
q2 &= V3\times d2 + V4\times d1 \\
q &= q1\times d4 + q2\times d3 \\
  &= V1\times A1 + V2\times A2 + V3\times A3 + V4\times A4
\end{aligned}
$$
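The weights $A1$ to $A4$ follow from expanding the product. The short derivation below assumes that the distances are normalized to the pixel spacing, so that $d1 + d2 = 1$ and $d3 + d4 = 1$; this normalization is an assumption here, although it matches the `h2 = 1 - h1` and `w2 = 1 - w1` lines in the C code.

```latex
\begin{aligned}
q &= (V1\times d2 + V2\times d1)\times d4 + (V3\times d2 + V4\times d1)\times d3 \\
  &= V1\,(d2\,d4) + V2\,(d1\,d4) + V3\,(d2\,d3) + V4\,(d1\,d3) \\
\Rightarrow\ & A1 = d2\,d4,\quad A2 = d1\,d4,\quad A3 = d2\,d3,\quad A4 = d1\,d3 \\
& A1 + A2 + A3 + A4 = (d1 + d2)(d3 + d4) = 1
\end{aligned}
```

Each weight is a product of two distances, so it can be read as the area of a rectangle, and because the weights sum to 1 the result is a weighted average of the four pixel values.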
Example code in C:
#include <math.h>

float bilinear_interpolate(image im, float x, float y, int c)
{
    // the 4 pixels surrounding (x, y)
    int w_floor = (int)floorf(x);
    int w_ceil = (int)ceilf(x);
    int h_floor = (int)floorf(y);
    int h_ceil = (int)ceilf(y);
    float val_left_up = get_pixel(im, w_floor, h_floor, c);
    float val_right_up = get_pixel(im, w_ceil, h_floor, c);
    float val_left_down = get_pixel(im, w_floor, h_ceil, c);
    float val_right_down = get_pixel(im, w_ceil, h_ceil, c);
    // interpolate vertically along the left and right columns
    float h1 = y - h_floor;
    float h2 = 1 - h1;
    float q1 = val_left_up * h2 + val_left_down * h1;
    float q2 = val_right_up * h2 + val_right_down * h1;
    // then interpolate horizontally between the two results
    float w1 = x - w_floor;
    float w2 = 1 - w1;
    float val = q1 * w2 + q2 * w1;
    return val;
}

image bilinear_resize(image im, int w, int h)
{
    if (!w || !h)
        return make_image(1, 1, 1);
    // map new coordinates (w, h) to original coordinates (im.w, im.h)
    // using Y = a * X + b on each axis
    float w_a, w_b, h_a, h_b;
    w_a = (float)im.w / w;
    w_b = 0.5f * (w_a - 1);
    h_a = (float)im.h / h;
    h_b = 0.5f * (h_a - 1);
    image ret = make_image(w, h, im.c);
    for (int k = 0; k < im.c; k++) {
        for (int j = 0; j < h; j++) {
            for (int i = 0; i < w; i++) {
                // map coordinates to the original image
                float x = w_a * i + w_b;
                float y = h_a * j + h_b;
                float val = bilinear_interpolate(im, x, y, k);
                set_pixel(ret, i, j, k, val);
            }
        }
    }
    return ret;
}
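The core blend can be checked by hand on four numbers. The sketch below isolates it in a hypothetical helper, `bilerp`, that takes the four corner values and the fractional offsets directly. It interpolates horizontally first and vertically second, the opposite order from the code above; the result is the same.

```c
#include <assert.h>

/* Blend four corner values of a unit cell.
 * v00 = top-left, v10 = top-right, v01 = bottom-left, v11 = bottom-right.
 * fx and fy in [0, 1] are the offsets from the top-left corner. */
float bilerp(float v00, float v10, float v01, float v11, float fx, float fy)
{
    float top    = v00 * (1.0f - fx) + v10 * fx;
    float bottom = v01 * (1.0f - fx) + v11 * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```

With corners 0, 10, 20, 30, the center of the cell (`fx = fy = 0.5`) evaluates to 15, the average of the four values, and an offset of (0, 0) returns the top-left value unchanged.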
3 Convolution
Convolution is a powerful and widely used technique in image processing. The convolution of two functions $f(x)$ and $g(x)$ is defined as the integral:
$$f(x)*g(x)=\int_{-\infty}^{\infty}f(u)g(x-u)\,du$$
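This integral can be described as the accumulated effect of the point spread function (PSF) $g(x)$ acting on every point of $f(x)$.
Applying convolution to digital images requires two changes to the formula above:
- A double integral is needed, because each channel of a digital image is two-dimensional.
- The integral must be replaced by a discrete sum.
The convolution formula for digital images is: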
$$F(x,y)=f(x,y)*g(x,y)=\sum_i\sum_jf(i,j)g(x-i,y-j)$$
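Here $g$ is called the convolution template (a function), also known as the convolution kernel or filter. By the definition of convolution, the template must be rotated by $180^\circ$ before it is applied, so we can flip it in advance. From here on, every template in the code and formulas is assumed to be already flipped:
$$h(x,y)=g(-x,-y)$$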
Substituting $h$ gives the convolution formula:
$$F(x,y)=\sum_i\sum_jf(x+i,y+j)h(i,j)$$
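The two forms of the formula can be compared directly in code. The sketch below is an illustration only: the function names, the flat row-major storage, and the zero padding outside the image are all assumptions. It flips a centered $n\times n$ kernel $g$ into $h$ and evaluates $F(x,y)$ both ways.

```c
#include <assert.h>

/* Rotate an n*n kernel by 180 degrees: h(x, y) = g(-x, -y).
 * For a centered, row-major kernel this reverses the flat array. */
void flip_kernel_180(const float *g, float *hk, int n)
{
    for (int k = 0; k < n * n; k++)
        hk[k] = g[n * n - 1 - k];
}

/* Read f(x, y) from an fw*fh row-major image; outside pixels count as 0. */
static float at(const float *f, int fw, int fh, int x, int y)
{
    return (x < 0 || x >= fw || y < 0 || y >= fh) ? 0.0f : f[y * fw + x];
}

/* Definition: F(x,y) = sum_i sum_j f(i,j) g(x-i, y-j), with n odd. */
float conv_by_definition(const float *f, int fw, int fh,
                         const float *g, int n, int x, int y)
{
    int r = n / 2;
    float sum = 0.0f;
    for (int j = y - r; j <= y + r; j++)
        for (int i = x - r; i <= x + r; i++)
            sum += at(f, fw, fh, i, j) * g[(y - j + r) * n + (x - i + r)];
    return sum;
}

/* Pre-flipped form: F(x,y) = sum_i sum_j f(x+i, y+j) h(i,j). */
float conv_with_flipped(const float *f, int fw, int fh,
                        const float *hk, int n, int x, int y)
{
    int r = n / 2;
    float sum = 0.0f;
    for (int j = -r; j <= r; j++)
        for (int i = -r; i <= r; i++)
            sum += at(f, fw, fh, x + i, y + j) * hk[(j + r) * n + (i + r)];
    return sum;
}
```

For example, with a $3\times3$ kernel $g$ whose only nonzero entry is $g(1,0)=1$, both functions return $f(x-1,y)$: the definition shifts the image by the kernel offset, and the pre-flipped form reaches the same pixel through $h(-1,0)=1$.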
In other words, multiply each entry of the flipped template by the corresponding neighboring pixel value and sum the products. For a $3\times3$ neighborhood, the template can be written as:
$$\begin{bmatrix} h4 & h3 & h2 \\ h5 & h0 & h1 \\ h6 & h7 & h8 \end{bmatrix}$$
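With P0 the current pixel and P1 to P8 its neighbors, numbered the same way as the template entries, the convolution algorithm is:
for all pixels in image do {
    Q0 = P0*h0 + P1*h1 + P2*h2 + P3*h3 + P4*h4
       + P5*h5 + P6*h6 + P7*h7 + P8*h8;
}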
Example code in C:
#include <assert.h>

/*
 * Basic convolution: the output has the same width and height as the
 * original image, and the stride is 1.
 * preserve == 1 keeps the input's channel count; otherwise all channels
 * are summed into a single output channel.
 */
image convolve_image(image im, image filter, int preserve)
{
    assert(filter.c == 1 || im.c == filter.c);
    // number of channels in the result image
    int channel;
    if (preserve == 1) {
        channel = im.c;
    } else {
        channel = 1;
    }
    // result image
    image convoluted_image = make_image(im.w, im.h, channel);
    float sum = 0;
    for (int j = 0; j < im.h; j++) {
        for (int i = 0; i < im.w; i++) {
            // top-left corner of the neighborhood covered by the filter;
            // get_pixel is expected to handle out-of-range coordinates
            int min_x = i - filter.w / 2;
            int min_y = j - filter.h / 2;
            // accumulate the weighted sum
            sum = 0;
            for (int k = 0; k < im.c; k++) {
                float filter_pix, im_pix;
                for (int jj = 0; jj < filter.h; jj++) {
                    for (int ii = 0; ii < filter.w; ii++) {
                        int im_x = min_x + ii;
                        int im_y = min_y + jj;
                        im_pix = get_pixel(im, im_x, im_y, k);
                        if (filter.c > 1) {
                            filter_pix = get_pixel(filter, ii, jj, k);
                        } else {
                            filter_pix = get_pixel(filter, ii, jj, 0);
                        }
                        sum += filter_pix * im_pix;
                    }
                }
                // multi-channel output: store one sum per channel
                if (channel > 1) {
                    set_pixel(convoluted_image, i, j, k, sum);
                    sum = 0;
                }
            }
            // single-channel output: the sum covers all input channels
            if (channel == 1) {
                set_pixel(convoluted_image, i, j, 0, sum);
            }
        }
    }
    // clamp image
    // clamp_image(convoluted_image);
    return convoluted_image;
}
Common simple convolution kernels include:
Name | Effect |
Highpass Kernel | Edge detection |
Sharpen Kernel | Image sharpening |
Emboss Kernel | Stylization ("stylin'") |
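The kernel matrices themselves are not shown in the table. The sketch below uses widely seen $3\times3$ variants of these three kernels; the specific values are an assumption and may differ from the ones the author had in mind. The helper `apply3x3` is hypothetical and computes the weighted sum for a single $3\times3$ neighborhood.

```c
#include <assert.h>

/* Commonly used 3x3 kernels, row-major (assumed values, see lead-in). */
static const float HIGHPASS[9] = {  0, -1,  0,
                                   -1,  4, -1,
                                    0, -1,  0 };
static const float SHARPEN[9]  = {  0, -1,  0,
                                   -1,  5, -1,
                                    0, -1,  0 };
static const float EMBOSS[9]   = { -2, -1,  0,
                                   -1,  1,  1,
                                    0,  1,  2 };

/* Weighted sum of a 3x3 neighborhood (row-major) with a 3x3 kernel. */
float apply3x3(const float *patch, const float *kernel)
{
    float sum = 0.0f;
    for (int k = 0; k < 9; k++)
        sum += patch[k] * kernel[k];
    return sum;
}
```

The entries of the highpass kernel sum to 0, so a flat region maps to 0 and only intensity changes survive, which is why it highlights edges. The sharpen and emboss kernels sum to 1, so flat regions keep their brightness while local differences are exaggerated.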