Thresholding Images with OpenCV

longwinyang

Overview: this section shows how to use OpenCV's thresholding function, threshold.

threshold

Applies a fixed-level threshold to each array element.

C++: double threshold(InputArray src, OutputArray dst, double thresh, double maxval, int type)

Python: cv2.threshold(src, thresh, maxval, type[, dst]) → retval, dst

C: double cvThreshold(const CvArr* src, CvArr* dst, double threshold, double max_value, int threshold_type)

Parameters:

src – input array (single-channel, 8-bit or 32-bit floating point).
dst – output array of the same size and type as src.
thresh – threshold value.
maxval – maximum value to use with the THRESH_BINARY and THRESH_BINARY_INV thresholding types.
type – thresholding type (see the details below).

The function applies fixed-level thresholding to a single-channel array. It is typically used to get a bi-level (binary) image out of a grayscale image (compare() could also be used for this purpose) or to remove noise, that is, to filter out pixels with values that are too small or too large. The function supports several types of thresholding, selected by type:

THRESH_BINARY

$\texttt{dst}(x,y) = \begin{cases} \texttt{maxval} & \text{if } \texttt{src}(x,y) > \texttt{thresh} \\ 0 & \text{otherwise} \end{cases}$

THRESH_BINARY_INV

$\texttt{dst}(x,y) = \begin{cases} 0 & \text{if } \texttt{src}(x,y) > \texttt{thresh} \\ \texttt{maxval} & \text{otherwise} \end{cases}$

THRESH_TRUNC

$\texttt{dst}(x,y) = \begin{cases} \texttt{thresh} & \text{if } \texttt{src}(x,y) > \texttt{thresh} \\ \texttt{src}(x,y) & \text{otherwise} \end{cases}$

THRESH_TOZERO

$\texttt{dst}(x,y) = \begin{cases} \texttt{src}(x,y) & \text{if } \texttt{src}(x,y) > \texttt{thresh} \\ 0 & \text{otherwise} \end{cases}$

THRESH_TOZERO_INV

$\texttt{dst}(x,y) = \begin{cases} 0 & \text{if } \texttt{src}(x,y) > \texttt{thresh} \\ \texttt{src}(x,y) & \text{otherwise} \end{cases}$

Also, the special values THRESH_OTSU or THRESH_TRIANGLE may be combined with one of the above values. In these cases, the function determines the optimal threshold value using the Otsu’s or Triangle algorithm and uses it instead of the specified thresh . The function returns the computed threshold value. Currently, the Otsu’s and Triangle methods are implemented only for 8-bit images.
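
To give a rough idea of how a threshold can be chosen automatically, the sketch below implements the textbook form of Otsu's method on a 256-bin histogram: it returns the level that maximizes the between-class variance. The function name `otsuThreshold` is hypothetical and the code uses only the C++ standard library. It is not OpenCV's implementation, so the value it returns may differ from what threshold reports with THRESH_OTSU.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV): returns the level t in 0..255
// that maximizes the between-class variance when pixels <= t form one class
// and pixels > t form the other. On ties the lowest such level is returned.
int otsuThreshold(const std::vector<std::uint8_t>& pixels)
{
    assert(!pixels.empty());

    // Build the 256-bin histogram of gray values.
    std::array<long long, 256> hist{};
    for (std::uint8_t p : pixels) { ++hist[p]; }

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;
    for (int i = 0; i < 256; ++i) { sumAll += i * static_cast<double>(hist[i]); }

    double weightLow = 0.0;      // number of pixels with value <= t
    double sumLow = 0.0;         // sum of the values of those pixels
    double bestVariance = -1.0;
    int bestLevel = 0;

    for (int t = 0; t < 256; ++t) {
        weightLow += static_cast<double>(hist[t]);
        sumLow += t * static_cast<double>(hist[t]);
        const double weightHigh = total - weightLow;
        if (weightLow == 0.0 || weightHigh == 0.0) { continue; }  // one class is empty

        const double meanLow = sumLow / weightLow;
        const double meanHigh = (sumAll - sumLow) / weightHigh;
        const double diff = meanLow - meanHigh;
        // Proportional to the between-class variance (constant factor dropped).
        const double variance = weightLow * weightHigh * diff * diff;
        if (variance > bestVariance) {
            bestVariance = variance;
            bestLevel = t;
        }
    }
    return bestLevel;
}
```

For an image with two well-separated groups of gray values, any level between the two groups gives the same score, and this sketch reports the lowest one.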

[Figure: threshold.png]

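What is thresholding?

The simplest method of image segmentation.
Example application: use a threshold to separate the object we need from the rest of an image (the object may be a part or the whole of something). This kind of segmentation relies on the difference in gray level between the object and the background, and it works at the pixel level.
To extract the part we need, the gray value of every pixel is compared with the chosen threshold and a decision is made accordingly. (Note: the choice of threshold depends on the specific problem, because the same object may have different gray values in different images.)
Once the pixels that belong to the object have been identified, we can assign them a specific value to mark them. (For example, set the gray value of the object's pixels to 0 (black) and that of all other pixels to 255 (white). Any pair of values may be used, but two strongly contrasting values make the result easier to inspect.)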
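
The comparison described above can be sketched in a few lines of standard C++. This is only an illustration: the name `segmentObject` is hypothetical, and the code assumes that the object is brighter than the threshold, which the text leaves open.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: marks "object" pixels (assumed here to be brighter
// than thresh) as 0 (black) and every other pixel as 255 (white).
std::vector<std::uint8_t> segmentObject(const std::vector<std::uint8_t>& gray, int thresh)
{
    assert(thresh >= 0 && thresh <= 255);

    std::vector<std::uint8_t> mask;
    mask.reserve(gray.size());
    for (std::uint8_t value : gray) {
        // Compare each pixel with the threshold and assign one of two values.
        mask.push_back(static_cast<std::uint8_t>(value > thresh ? 0 : 255));
    }
    return mask;
}
```

With a threshold of 125, the gray values 20, 130, 125, 240 map to 255, 0, 255, 0.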

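Types of thresholding:

OpenCV provides the function threshold for thresholding.
The function supports five thresholding types, which are described in the following sections.
To illustrate the thresholding process, consider a simple plot of pixel gray values (figure not included here), in which a blue horizontal line marks a particular threshold.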

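Threshold type 1: binary thresholding

This thresholding type is defined as:

$\texttt{dst}(x,y) = \begin{cases} \texttt{maxval} & \text{if } \texttt{src}(x,y) > \texttt{thresh} \\ 0 & \text{otherwise} \end{cases}$
Explanation: first choose a threshold, for example 125. Pixels whose gray value is greater than 125 are set to the maximum value (255 for 8-bit gray levels), and pixels whose gray value is 125 or less are set to 0.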

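Threshold type 2: inverted binary thresholding

This thresholding type is defined as:

$\texttt{dst}(x,y) = \begin{cases} 0 & \text{if } \texttt{src}(x,y) > \texttt{thresh} \\ \texttt{maxval} & \text{otherwise} \end{cases}$
Explanation: this type works like binary thresholding, with a gray value chosen as the threshold, but the assigned values are swapped. (In an 8-bit grayscale image, for example, pixels above the threshold are set to 0 and all other pixels are set to 255.)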

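Threshold type 3: truncation

This thresholding type is defined as:

$\texttt{dst}(x,y) = \begin{cases} \texttt{thresh} & \text{if } \texttt{src}(x,y) > \texttt{thresh} \\ \texttt{src}(x,y) & \text{otherwise} \end{cases}$
Explanation: again a threshold is chosen first. Pixels above the threshold are set to the threshold value, and all other pixels are left unchanged. (For example, with a threshold of 125, gray values of 125 or less do not change, while a pixel with a larger gray value, say 230, is set to 125.)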

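Threshold type 4: threshold to zero

This thresholding type is defined as:

$\texttt{dst}(x,y) = \begin{cases} \texttt{src}(x,y) & \text{if } \texttt{src}(x,y) > \texttt{thresh} \\ 0 & \text{otherwise} \end{cases}$
Explanation: choose a threshold, then process the image as follows: (1) pixels whose gray value is greater than the threshold are left unchanged; (2) all other pixels are set to 0.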

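Threshold type 5: inverted threshold to zero

This thresholding type is defined as:

$\texttt{dst}(x,y) = \begin{cases} 0 & \text{if } \texttt{src}(x,y) > \texttt{thresh} \\ \texttt{src}(x,y) & \text{otherwise} \end{cases}$
Explanation: the principle is similar to threshold to zero, but the treatment is reversed: pixels whose gray value does not exceed the threshold are left unchanged, and pixels above the threshold are set to 0.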
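
The five rules can be compared side by side in a small per-pixel sketch. The enum `ThresholdKind` and the function `thresholdPixel` are hypothetical names introduced for illustration. They mirror the formulas in this section for 8-bit values and are not OpenCV's own definitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum listing the five types in the order used in this section.
enum class ThresholdKind { Binary, BinaryInv, Trunc, ToZero, ToZeroInv };

// Applies one thresholding rule to a single 8-bit gray value.
std::uint8_t thresholdPixel(std::uint8_t src, std::uint8_t thresh,
                            std::uint8_t maxval, ThresholdKind kind)
{
    const std::uint8_t zero = 0;
    const bool above = src > thresh;  // every rule tests src(x,y) > thresh

    switch (kind) {
        case ThresholdKind::Binary:    return above ? maxval : zero;
        case ThresholdKind::BinaryInv: return above ? zero : maxval;
        case ThresholdKind::Trunc:     return above ? thresh : src;
        case ThresholdKind::ToZero:    return above ? src : zero;
        case ThresholdKind::ToZeroInv: return above ? zero : src;
    }
    assert(false && "unknown ThresholdKind");
    return src;
}
```

With thresh = 125 and maxval = 255, a pixel of value 230 becomes 255, 0, 125, 230 and 0 under the five types in order, while a pixel of value 100 becomes 0, 255, 100, 0 and 100.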

cvtColor

Converts an image from one color space to another.

C++: void cvtColor(InputArray src, OutputArray dst, int code, int dstCn=0)

Python: cv2.cvtColor(src, code[, dst[, dstCn]]) → dst

C: void cvCvtColor(const CvArr* src, CvArr* dst, int code)

Parameters:

src – input image: 8-bit unsigned, 16-bit unsigned ( CV_16UC... ), or single-precision floating-point.
dst – output image of the same size and depth as src.
code – color space conversion code (see the description below).
dstCn – number of channels in the destination image; if the parameter is 0, the number of the channels is derived automatically from src and code.

The function converts an input image from one color space to another. In case of a transformation to-from RGB color space, the order of the channels should be specified explicitly (RGB or BGR). Note that the default color format in OpenCV is often referred to as RGB but it is actually BGR (the bytes are reversed). So the first byte in a standard (24-bit) color image will be an 8-bit Blue component, the second byte will be Green, and the third byte will be Red. The fourth, fifth, and sixth bytes would then be the second pixel (Blue, then Green, then Red), and so on.
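
The byte order described above can be made concrete with a short sketch that reads one pixel from a packed 24-bit buffer. The struct `BgrPixel` and the function `readBgrPixel` are hypothetical, and the code assumes a tightly packed buffer with no padding between pixels or rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BgrPixel { std::uint8_t blue, green, red; };

// Hypothetical helper: reads pixel `index` from a tightly packed 24-bit
// buffer in which every pixel is stored as three bytes: Blue, Green, Red.
BgrPixel readBgrPixel(const std::vector<std::uint8_t>& data, std::size_t index)
{
    const std::size_t offset = index * 3;   // three bytes per pixel
    assert(offset + 2 < data.size());
    return BgrPixel{ data[offset], data[offset + 1], data[offset + 2] };
}
```

For the six bytes 255, 0, 0, 0, 0, 255 the first pixel is pure blue and the second is pure red.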

The conventional ranges for R, G, and B channel values are:

0 to 255 for CV_8U images
0 to 65535 for CV_16U images
0 to 1 for CV_32F images

In case of linear transformations, the range does not matter. But in case of a non-linear transformation, an input RGB image should be normalized to the proper value range to get the correct results, for example, for RGB $\rightarrow$ L*u*v* transformation. For example, if you have a 32-bit floating-point image directly converted from an 8-bit image without any scaling, then it will have the 0..255 value range instead of 0..1 assumed by the function. So, before calling cvtColor , you need first to scale the image down:

 
    img *= 1./255;
    cvtColor(img, img, COLOR_BGR2Luv);

If you use cvtColor with 8-bit images, the conversion will have some information lost. For many applications, this will not be noticeable but it is recommended to use 32-bit images in applications that need the full range of colors or that convert an image before an operation and then convert back.

If the conversion adds an alpha channel, its value is set to the maximum of the corresponding channel range: 255 for CV_8U, 65535 for CV_16U, 1 for CV_32F.

The function can do the following transformations:

RGB $\leftrightarrow$ GRAY ( COLOR_BGR2GRAY, COLOR_RGB2GRAY, COLOR_GRAY2BGR, COLOR_GRAY2RGB ) Transformations within RGB space like adding/removing the alpha channel, reversing the channel order, conversion to/from 16-bit RGB color (R5:G6:B5 or R5:G5:B5), as well as conversion to/from grayscale using:

$\text{RGB[A] to Gray:} \quad Y \leftarrow 0.299 \cdot R + 0.587 \cdot G + 0.114 \cdot B$

and

$\text{Gray to RGB[A]:} \quad R \leftarrow Y, G \leftarrow Y, B \leftarrow Y, A \leftarrow \max (ChannelRange)$

The conversion from a RGB image to gray is done with:
```
cvtColor(src, bwsrc, COLOR_RGB2GRAY);
```
More advanced channel reordering can also be done with mixChannels() .
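
As a small worked example of the RGB[A] to Gray formula, the sketch below computes the weighted sum for one 8-bit pixel. The function name `bgrToGray` is hypothetical, and the rounding used here (round to nearest) is an assumption, so results may differ by one gray level from what cvtColor produces.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: Y = 0.299 R + 0.587 G + 0.114 B for one 8-bit pixel.
// Arguments are taken in B, G, R order to match OpenCV's channel layout.
std::uint8_t bgrToGray(std::uint8_t blue, std::uint8_t green, std::uint8_t red)
{
    const double y = 0.299 * red + 0.587 * green + 0.114 * blue;
    const long rounded = std::lround(y);      // round to the nearest gray level
    assert(rounded >= 0 && rounded <= 255);   // the weights sum to 1
    return static_cast<std::uint8_t>(rounded);
}
```

Pure red (R = 255) maps to 76, pure green to 150 and pure blue to 29, which shows how strongly the green channel dominates the gray value.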
RGB $\leftrightarrow$ CIE XYZ.Rec 709 with D65 white point ( COLOR_BGR2XYZ, COLOR_RGB2XYZ, COLOR_XYZ2BGR, COLOR_XYZ2RGB):

$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \leftarrow \begin{bmatrix} 0.412453 & 0.357580 & 0.180423 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{bmatrix} \cdot \begin{bmatrix} R \\ G \\ B \end{bmatrix}$

$\begin{bmatrix} R \\ G \\ B \end{bmatrix} \leftarrow \begin{bmatrix} 3.240479 & -1.53715 & -0.498535 \\ -0.969256 & 1.875991 & 0.041556 \\ 0.055648 & -0.204043 & 1.057311 \end{bmatrix} \cdot \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}$

$X$ , $Y$ and $Z$ cover the whole value range (in case of floating-point images, $Z$ may exceed 1).
RGB $\leftrightarrow$ YCrCb JPEG (or YCC) ( COLOR_BGR2YCrCb, COLOR_RGB2YCrCb, COLOR_YCrCb2BGR, COLOR_YCrCb2RGB )

$Y \leftarrow 0.299 \cdot R + 0.587 \cdot G + 0.114 \cdot B$

$Cr \leftarrow (R-Y) \cdot 0.713 + delta$

$Cb \leftarrow (B-Y) \cdot 0.564 + delta$

$R \leftarrow Y + 1.403 \cdot (Cr - delta)$

$G \leftarrow Y - 0.714 \cdot (Cr - delta) - 0.344 \cdot (Cb - delta)$

$B \leftarrow Y + 1.773 \cdot (Cb - delta)$

where

$delta = \left \{ \begin{array}{l l} 128 & \mbox{for 8-bit images} \\ 32768 & \mbox{for 16-bit images} \\ 0.5 & \mbox{for floating-point images} \end{array} \right .$

Y, Cr, and Cb cover the whole value range.
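
The forward and inverse YCrCb formulas can be checked against each other with a short floating-point sketch. The struct and function names are hypothetical, and delta is passed in by the caller (0.5 for floating-point images, as listed above). Because the coefficients are rounded, a round trip reproduces the input only approximately.

```cpp
#include <cassert>

struct Rgb   { double r, g, b; };
struct YCrCb { double y, cr, cb; };

// Hypothetical helper: forward transform, following the formulas above.
YCrCb rgbToYCrCb(const Rgb& in, double delta)
{
    assert(delta >= 0.0);
    YCrCb out;
    out.y  = 0.299 * in.r + 0.587 * in.g + 0.114 * in.b;
    out.cr = (in.r - out.y) * 0.713 + delta;
    out.cb = (in.b - out.y) * 0.564 + delta;
    return out;
}

// Hypothetical helper: inverse transform, following the formulas above.
Rgb yCrCbToRgb(const YCrCb& in, double delta)
{
    assert(delta >= 0.0);
    Rgb out;
    out.r = in.y + 1.403 * (in.cr - delta);
    out.g = in.y - 0.714 * (in.cr - delta) - 0.344 * (in.cb - delta);
    out.b = in.y + 1.773 * (in.cb - delta);
    return out;
}
```

Converting (R, G, B) = (0.2, 0.4, 0.6) with delta = 0.5 gives Y = 0.363, and converting back recovers each channel to within about 0.001.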
RGB $\leftrightarrow$ HSV ( COLOR_BGR2HSV, COLOR_RGB2HSV, COLOR_HSV2BGR, COLOR_HSV2RGB )

In case of 8-bit and 16-bit images, R, G, and B are converted to the floating-point format and scaled to fit the 0 to 1 range.

$V \leftarrow max(R,G,B)$

$S \leftarrow \begin{cases} \frac{V - \min(R,G,B)}{V} & \text{if } V \neq 0 \\ 0 & \text{otherwise} \end{cases}$

$H \leftarrow \begin{cases} 60(G - B)/(V - \min(R,G,B)) & \text{if } V = R \\ 120 + 60(B - R)/(V - \min(R,G,B)) & \text{if } V = G \\ 240 + 60(R - G)/(V - \min(R,G,B)) & \text{if } V = B \end{cases}$

If $H<0$ then $H \leftarrow H+360$ . On output $0 \leq V \leq 1$ , $0 \leq S \leq 1$ , $0 \leq H \leq 360$ .

The values are then converted to the destination data type:
- 8-bit images
  
  $V \leftarrow 255 V, S \leftarrow 255 S, H \leftarrow H/2 \text{(to fit to 0 to 255)}$
- 16-bit images (currently not supported)
  
  $V \leftarrow 65535 V, S \leftarrow 65535 S, H \leftarrow H$
- 32-bit images
  
  H, S, and V are left as is
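
The HSV steps for an 8-bit pixel can be sketched as follows. The names `Hsv8` and `rgbToHsv8` are hypothetical. The hue of a gray pixel (where V equals min(R,G,B)) is undefined in the formulas, and this sketch assumes 0 for that case. Rounding to nearest is also an assumption, so values may differ slightly from cvtColor output.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Hsv8 { std::uint8_t h, s, v; };

// Hypothetical helper: RGB -> HSV for one 8-bit pixel, following the
// formulas above and the 8-bit scaling (V*255, S*255, H/2).
Hsv8 rgbToHsv8(std::uint8_t r8, std::uint8_t g8, std::uint8_t b8)
{
    // Scale the inputs to the 0..1 range.
    const double r = r8 / 255.0, g = g8 / 255.0, b = b8 / 255.0;

    const double v = std::max({r, g, b});
    const double range = v - std::min({r, g, b});
    const double s = (v != 0.0) ? range / v : 0.0;

    double h = 0.0;  // assumption: hue 0 for gray pixels (range == 0)
    if (range != 0.0) {
        if (v == r)      { h = 60.0 * (g - b) / range; }
        else if (v == g) { h = 120.0 + 60.0 * (b - r) / range; }
        else             { h = 240.0 + 60.0 * (r - g) / range; }
    }
    if (h < 0.0) { h += 360.0; }
    assert(h >= 0.0 && h <= 360.0);

    Hsv8 out;
    out.h = static_cast<std::uint8_t>(std::lround(h / 2.0));    // 0..180 fits in 8 bits
    out.s = static_cast<std::uint8_t>(std::lround(255.0 * s));
    out.v = static_cast<std::uint8_t>(std::lround(255.0 * v));
    return out;
}
```

Pure red, green and blue give H = 0, 60 and 120 respectively, each with S = 255 and V = 255.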
RGB $\leftrightarrow$ HLS ( COLOR_BGR2HLS, COLOR_RGB2HLS, COLOR_HLS2BGR, COLOR_HLS2RGB ).

In case of 8-bit and 16-bit images, R, G, and B are converted to the floating-point format and scaled to fit the 0 to 1 range.

$V_{max} \leftarrow {max}(R,G,B)$

$V_{min} \leftarrow {min}(R,G,B)$

$L \leftarrow \frac{V_{max} + V_{min}}{2}$

$S \leftarrow \begin{cases} \frac{V_{max} - V_{min}}{V_{max} + V_{min}} & \text{if } L < 0.5 \\ \frac{V_{max} - V_{min}}{2 - (V_{max} + V_{min})} & \text{if } L \ge 0.5 \end{cases}$

$H \leftarrow \begin{cases} 60(G - B)/(V_{max} - V_{min}) & \text{if } V_{max} = R \\ 120 + 60(B - R)/(V_{max} - V_{min}) & \text{if } V_{max} = G \\ 240 + 60(R - G)/(V_{max} - V_{min}) & \text{if } V_{max} = B \end{cases}$

If $H<0$ then $H \leftarrow H+360$ . On output $0 \leq L \leq 1$ , $0 \leq S \leq 1$ , $0 \leq H \leq 360$ .

The values are then converted to the destination data type:
- 8-bit images
  
  $L \leftarrow 255 \cdot L, S \leftarrow 255 \cdot S, H \leftarrow H/2 \; \text{(to fit to 0 to 255)}$
- 16-bit images (currently not supported)
  
  $L \leftarrow 65535 \cdot L, S \leftarrow 65535 \cdot S, H \leftarrow H$
- 32-bit images
  
  H, L, and S are left as is
RGB $\leftrightarrow$ CIE L*a*b* ( COLOR_BGR2Lab, COLOR_RGB2Lab, COLOR_Lab2BGR, COLOR_Lab2RGB ).

In case of 8-bit and 16-bit images, R, G, and B are converted to the floating-point format and scaled to fit the 0 to 1 range.

$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \leftarrow \begin{bmatrix} 0.412453 & 0.357580 & 0.180423 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{bmatrix} \cdot \begin{bmatrix} R \\ G \\ B \end{bmatrix}$

$X \leftarrow X/X_n, \text{ where } X_n = 0.950456$

$Z \leftarrow Z/Z_n, \text{ where } Z_n = 1.088754$

$L \leftarrow \begin{cases} 116 \cdot Y^{1/3} - 16 & \text{for } Y > 0.008856 \\ 903.3 \cdot Y & \text{for } Y \le 0.008856 \end{cases}$

$a \leftarrow 500 (f(X)-f(Y)) + delta$

$b \leftarrow 200 (f(Y)-f(Z)) + delta$

where

$f(t) = \begin{cases} t^{1/3} & \text{for } t > 0.008856 \\ 7.787 t + 16/116 & \text{for } t \le 0.008856 \end{cases}$

and

$delta = \begin{cases} 128 & \text{for 8-bit images} \\ 0 & \text{for floating-point images} \end{cases}$

This outputs $0 \leq L \leq 100$ , $-127 \leq a \leq 127$ , $-127 \leq b \leq 127$ . The values are then converted to the destination data type:
- 8-bit images
  
  $L \leftarrow L*255/100, \; a \leftarrow a + 128, \; b \leftarrow b + 128$
- 16-bit images
  
  (currently not supported)
- 32-bit images
  
  L, a, and b are left as is
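
For floating-point input (delta = 0), the L*a*b* formulas above translate almost line by line into code. The sketch below is a minimal illustration with hypothetical names (`Lab`, `labF`, `rgbToLabFloat`) and assumes R, G and B are already scaled to the 0 to 1 range.

```cpp
#include <cassert>
#include <cmath>

struct Lab { double L, a, b; };

// f(t) from the formulas above.
static double labF(double t)
{
    return t > 0.008856 ? std::cbrt(t) : 7.787 * t + 16.0 / 116.0;
}

// Hypothetical helper: RGB -> L*a*b* for floating-point values (delta = 0).
Lab rgbToLabFloat(double r, double g, double b)
{
    assert(r >= 0.0 && r <= 1.0 && g >= 0.0 && g <= 1.0 && b >= 0.0 && b <= 1.0);

    // RGB -> XYZ, then normalize X and Z by the white point.
    const double x = (0.412453 * r + 0.357580 * g + 0.180423 * b) / 0.950456;
    const double y =  0.212671 * r + 0.715160 * g + 0.072169 * b;
    const double z = (0.019334 * r + 0.119193 * g + 0.950227 * b) / 1.088754;

    Lab out;
    out.L = y > 0.008856 ? 116.0 * std::cbrt(y) - 16.0 : 903.3 * y;
    out.a = 500.0 * (labF(x) - labF(y));
    out.b = 200.0 * (labF(y) - labF(z));
    return out;
}
```

White (1, 1, 1) maps to approximately L = 100, a = 0, b = 0, and black (0, 0, 0) maps to L = 0, a = 0, b = 0, matching the stated output range.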
RGB $\leftrightarrow$ CIE L*u*v* ( COLOR_BGR2Luv, COLOR_RGB2Luv, COLOR_Luv2BGR, COLOR_Luv2RGB ).

In case of 8-bit and 16-bit images, R, G, and B are converted to the floating-point format and scaled to fit 0 to 1 range.

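$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \leftarrow \begin{bmatrix} 0.412453 & 0.357580 & 0.180423 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{bmatrix} \cdot \begin{bmatrix} R \\ G \\ B \end{bmatrix}$

$L \leftarrow \begin{cases} 116 \cdot Y^{1/3} - 16 & \text{for } Y > 0.008856 \\ 903.3 \cdot Y & \text{for } Y \le 0.008856 \end{cases}$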

$u' \leftarrow 4*X/(X + 15*Y + 3 Z)$

$v' \leftarrow 9*Y/(X + 15*Y + 3 Z)$

$u \leftarrow 13*L*(u' - u_n) \quad \text{where} \quad u_n=0.19793943$

$v \leftarrow 13*L*(v' - v_n) \quad \text{where} \quad v_n=0.46831096$

This outputs $0 \leq L \leq 100$ , $-134 \leq u \leq 220$ , $-140 \leq v \leq 122$ .

The values are then converted to the destination data type:
- 8-bit images
  
  $L \leftarrow 255/100 L, \; u \leftarrow 255/354 (u + 134), \; v \leftarrow 255/262 (v + 140)$
- 16-bit images
  
  (currently not supported)
- 32-bit images
  
  L, u, and v are left as is
The above formulae for converting RGB to/from various color spaces have been taken from multiple sources on the web, primarily from the Charles Poynton site http://www.poynton.com/ColorFAQ.html
Bayer $\rightarrow$ RGB ( COLOR_BayerBG2BGR, COLOR_BayerGB2BGR, COLOR_BayerRG2BGR, COLOR_BayerGR2BGR,COLOR_BayerBG2RGB, COLOR_BayerGB2RGB, COLOR_BayerRG2RGB, COLOR_BayerGR2RGB ). The Bayer pattern is widely used in CCD and CMOS cameras. It enables you to get color pictures from a single plane where R,G, and B pixels (sensors of a particular component) are interleaved as follows:

The output RGB components of a pixel are interpolated from 1, 2, or 4 neighbors of the pixel having the same color. There are several modifications of the above pattern that can be achieved by shifting the pattern one pixel left and/or one pixel up. The two letters $C_1$ and $C_2$ in the conversion constants CV_Bayer $C_1 C_2$ 2BGR and CV_Bayer $C_1 C_2$ 2RGB indicate the particular pattern type. These are components from the second row, second and third columns, respectively. For example, the above pattern has a very popular “BG” type.
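
The naming rule for the pattern types can be illustrated with a tiny lookup. The sketch below assumes the usual 2x2 Bayer tiling and the "BG" type as described above, where the pixel in the second row and second column is blue and its right-hand neighbor is green. The enum `BayerColor` and the function `bayerBgColorAt` are hypothetical names, and rows and columns are counted from 0.

```cpp
#include <cassert>

enum class BayerColor { Red, Green, Blue };

// Hypothetical helper: color of the sensor element at (row, col) for the
// "BG" pattern type, assuming a 2x2 tiling with zero-based indices:
//   row 0: R G R G ...
//   row 1: G B G B ...
BayerColor bayerBgColorAt(int row, int col)
{
    assert(row >= 0 && col >= 0);
    const bool oddRow = (row % 2) == 1;
    const bool oddCol = (col % 2) == 1;
    if (oddRow && oddCol)   { return BayerColor::Blue; }
    if (!oddRow && !oddCol) { return BayerColor::Red; }
    return BayerColor::Green;  // the two remaining positions of each 2x2 tile
}
```

Here (1, 1) is blue and (1, 2) is green, which is where the letters "BG" come from.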

Code example:

 
    #include "opencv2/imgproc/imgproc.hpp"
    #include "opencv2/highgui/highgui.hpp"

    using namespace cv;

    /// Global variables

    int threshold_value = 0;
    int threshold_type = 0;
    int const max_value = 255;
    int const max_type = 4;
    int const max_BINARY_value = 255;

    Mat src, src_gray, dst;
    const char* window_name = "Threshold Demo";

    const char* trackbar_type = "Type: \n 0: Binary \n 1: Binary Inverted \n 2: Truncate \n 3: To Zero \n 4: To Zero Inverted";
    const char* trackbar_value = "Value";

    /// Function declaration
    void Threshold_Demo( int, void* );

    /**
     * @function main
     */
    int main( int argc, char** argv )
    {
      /// The image path is expected as the first command-line argument
      if( argc < 2 )
        { return -1; }

      /// Load the image as a 3-channel color image (imread returns BGR order)
      src = imread( argv[1], 1 );
      if( src.empty() )
        { return -1; }

      /// Convert the image to grayscale
      cvtColor( src, src_gray, COLOR_BGR2GRAY );

      /// Create a window to display the result
      namedWindow( window_name, WINDOW_AUTOSIZE );

      /// Create trackbars to choose the thresholding type and the threshold value
      createTrackbar( trackbar_type,
                      window_name, &threshold_type,
                      max_type, Threshold_Demo );

      createTrackbar( trackbar_value,
                      window_name, &threshold_value,
                      max_value, Threshold_Demo );

      /// Call the thresholding function once to show the initial result
      Threshold_Demo( 0, 0 );

      /// Wait for a key press; exit the loop when the user presses ESC
      while(true)
      {
        int c;
        c = waitKey( 20 );
        if( (char)c == 27 )
          { break; }
       }

      return 0;
    }


    /**
     * @function Threshold_Demo
     */
    void Threshold_Demo( int, void* )
    {
      /* 0: Binary
         1: Binary Inverted
         2: Truncate
         3: To Zero
         4: To Zero Inverted
       */

      threshold( src_gray, dst, threshold_value, max_BINARY_value,threshold_type );

      imshow( window_name, dst );
    }
 
  

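Explanation:

Let us first look at the overall structure of the program:
- First, load an image. If it is a 3-channel color image, convert it to grayscale. The color conversion is done with OpenCV's cvtColor function.
```
src = imread( argv[1], 1 );

/// Convert the color image (BGR order, as returned by imread) to grayscale
cvtColor( src, src_gray, COLOR_BGR2GRAY );
```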
- Next, create a window to display the image so that the result can be checked.
```
namedWindow( window_name, WINDOW_AUTOSIZE );
```
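- The program then creates two trackbars for user input:
  - The first trackbar selects the thresholding type: binary, inverted binary, truncate, to zero, or inverted to zero.
  - The second trackbar selects the threshold value.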
```
createTrackbar( trackbar_type,
             window_name, &threshold_type,
             max_type, Threshold_Demo );

createTrackbar( trackbar_value,
             window_name, &threshold_value,
             max_value, Threshold_Demo );
```
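- The program now waits until the user moves a trackbar to change the thresholding type or the threshold value, or presses ESC to exit.
- Whenever a trackbar is moved, the user-defined function Threshold_Demo is called.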
```
/**
 * @function Threshold_Demo
 */
void Threshold_Demo( int, void* )
{
  /* 0: Binary
     1: Binary Inverted
     2: Truncate
     3: To Zero
     4: To Zero Inverted
   */

  threshold( src_gray, dst, threshold_value, max_BINARY_value,threshold_type );

  imshow( window_name, dst );
}
```
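  As you can see, the function threshold receives five arguments:
  - src_gray: the input grayscale image.
  - dst: the output image.
  - threshold_value: the threshold against which each pixel is compared.
  - max_BINARY_value: the maximum gray value to assign (used by the binary and inverted binary thresholding types).
  - threshold_type: the thresholding type, one of the five described above.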

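Results:

After compiling the program, run it with the path to an image. The example input used here is a photo of a dog (image not included).
First, select the inverted binary thresholding type. We expect pixels whose gray value is above the threshold to turn dark, that is, to be set to 0. The output shows this change clearly: in the original image the dog's mouth and eyes are brighter than the rest of the image, and after inverted binary thresholding these two regions are darker than everything else. (See the explanation of inverted binary thresholding earlier in this section.)
Next, select the threshold-to-zero type. In this case we expect the darkest pixels in the image to become completely black, while pixels above the threshold keep their original values. (Result image not included.)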