Computer Vision: A Collection of Common Questions

This post organizes questions from studying computer vision (CV), covering the definition of machine vision, its applications, and its relationship to artificial intelligence, as well as key areas such as image processing, pattern recognition, and deep learning. It also discusses challenges faced by machine vision, such as image ambiguity, environmental factors, and processing large amounts of data, and introduces fundamentals such as the human visual system, key techniques, and color imaging models.

Here is a set of questions compiled while studying CV, which may be helpful for review. Since these notes were made quite a while ago, they are left unrevised for now.

CV Questions

  1. What's machine vision?

Machine vision (MV) is the technology and methods used to provide imaging-based automatic inspection and analysis for such applications as automatic inspection, process control, and robot guidance in industry.

Input: image, video, Output: inspection and analysis

Goal: give computers super human-level perception

  2. Typical perception channel

Representation -> ‘fancy math’ -> output, and representation and output are the parts we are most interested in.

  3. Common Applications

Automated visual inspection, object recognition, face detection, face makeovers, vision in cars, image stitching, virtual fitting, VR, KinectFusion, 3D reconstruction.

  4. Subject connection

Image processing: digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing.

Computer Graphics: Computer graphics is the discipline of generating images with the aid of computers.

Pattern Recognition: Pattern recognition is the automated recognition of patterns and regularities in data.

Computer Vision: Computer vision is an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from digital images or videos.

Difference between Computer Vision and Machine Vision: Computer vision refers to automation of the capture and processing of images, with an emphasis on image analysis. In other words, CV's goal is not only to see, but also to process and provide useful results based on the observation. Machine vision refers to the use of computer vision in industrial environments, making it a subcategory of computer vision.

Artificial intelligence: Computer science defines AI research as the study of “intelligent agents”: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.[1] A more elaborate definition characterizes AI as “a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation.”

  5. Vision Process
  • Feature extraction and region segmentation (low level)
  • Modeling and schema representation (middle level)
  • Description and understanding (high level)
  6. Difficulties faced by Machine Vision
  • Image ambiguity: When a 3D scene is projected as a 2D image, the information about depth and invisible parts is lost. Therefore, 3D objects of different shapes projected onto the image plane may produce the same image.
  • Environment factors: Factors in the scene such as lighting, object shapes, surface colors, cameras, and changes in spatial relationships, etc.
  • Knowledge guidance: Under different knowledge guidance, the same image will produce different recognition results.
  • Large amounts of data: Gray images, color images, and depth images carry a huge amount of information. The huge data volume requires large storage space and is not easy to process quickly.
  7. Human Vision System

Physical structure: the HVS is composed of the optical system, the retina, and the visual pathway.

TODO: I don't want to study the HVS details first, so I skip them. If I have extra time, the remaining material will be filled in.

  8. Key tech in Computer Vision System

    1. Image processing (smoothing/denoising, standardization, missing/outlier value processing)
    2. Image feature extraction (shape, texture, color, spatial relations)
    3. Image recognition (GoogLeNet, ResNet, ...)
  9. Image formation

The randomness of the Imaging Process and the complexity of the imaging object determine the nature of the image with a random signal.

An image basically consists of:

  • Illumination component $i(x, y)$
  • Reflection component $r(x, y)$

So the 2D function representation of the image is:
$f(x, y) = i(x, y) \cdot r(x, y)$
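A minimal sketch of the illumination-reflectance model, with made-up toy values for both components:

```python
# Illumination-reflectance model: f(x, y) = i(x, y) * r(x, y).
# A 2x2 toy example; both component arrays below are synthetic.
illum = [[0.5, 0.75],
         [0.75, 1.0]]   # illumination component i(x, y), slowly varying
refl = [[0.6, 0.6],
        [0.6, 0.6]]     # reflectance component r(x, y), in [0, 1]

# The observed image is the pointwise product of the two components.
f = [[illum[y][x] * refl[y][x] for x in range(2)] for y in range(2)]
```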

  10. Human eye brightness perception range

Total range: $10^{-2}$ to $10^{6}$, so the contrast is $c = B_{max} / B_{min} = 10^{8}$, and the relative contrast is $c_r = 100\% \times (B - B_0) / B_0$, where $B_0$ is the background brightness and $B$ is the object brightness.

Relationship between subjective brightness $S$ and actual brightness $B$:
$S = K \ln{B} + K_0$
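A quick numeric check of the logarithmic law above (the constants $K$ and $K_0$ here are arbitrary placeholders): doubling the physical brightness adds the same amount of subjective brightness at any level.

```python
import math

# S = K * ln(B) + K0; K and K0 are made-up illustrative constants.
K, K0 = 1.0, 0.0

def subjective_brightness(b):
    return K * math.log(b) + K0

# The same doubling of B yields the same increment in S at any level.
low = subjective_brightness(2.0) - subjective_brightness(1.0)
high = subjective_brightness(200.0) - subjective_brightness(100.0)
```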

  11. Brightness adaptability

Vision is sensitive to contrast, not to the absolute brightness value itself.

Weber theorem:

If the brightness of an object differs from the surrounding background brightness $I$ by $\Delta I$, their ratio is approximately constant within a certain range of brightness, with a value of about 0.02, called the Weber ratio:
$\frac{\Delta I}{I} \approx 0.02$
Mach effect: The visual system is less sensitive to high and low spatial frequencies and more sensitive to intermediate spatial frequencies. Therefore, a brightness overshoot occurs at sudden brightness changes; this overshoot enhances the outlines of the scene seen by the human eye.

  12. Color imaging model

Light energy itself is colorless. Color is a physiological and psychological phenomenon that occurs when people’s eyes perceive light.

Light wave: light is an electromagnetic wave, characterized by its wavelength.

Young–Helmholtz theory(trichromatic theory): the three types of cone photoreceptors could be classified as short-preferring (violet), middle-preferring (green), and long-preferring (red).

  13. Color property

Hue: the degree to which a stimulus can be described as similar to or different from stimuli that are described as red, green, blue, and yellow.

Saturation: colorfulness of an area judged in proportion to its brightness.

Intensity: Refers to the degree of light and darkness that the human eye feels due to color stimuli.

Grassman Laws:

First law: Two colored lights appear different if they differ in either dominant wavelength, luminance or purity. Corollary: For every colored light there exists a light with a complementary color such that a mixture of both lights either desaturates the more intense component or gives uncolored (grey/white) light.
Second law: The appearance of a mixture of light made from two components changes if either component changes. Corollary: A mixture of two colored lights that are non-complementary result in a mixture that varies in hue with relative intensities of each light and in saturation according to the distance between the hues of each light.
Third law: There are lights with different spectral power distributions that nevertheless appear identical. First corollary: such identical-appearing lights must have identical effects when added to a mixture of light. Second corollary: such identical-appearing lights must have identical effects when subtracted (i.e., filtered) from a mixture of light.
Fourth law: The intensity of a mixture of lights is the sum of the intensities of the components.
  14. Color

The result of interaction between physical light in the environment and our visual system

  15. Color Space
  • Linear color space
  • RGB color space
  • HSV color space
  • CIE XYZ
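Conversions between these spaces are routine; for example, RGB to HSV can be done with Python's standard-library `colorsys` module (it works on floats in [0, 1]):

```python
import colorsys

# Pure red in normalized RGB.
r, g, b = 1.0, 0.0, 0.0
h, s, v = colorsys.rgb_to_hsv(r, g, b)
# Pure red has hue 0.0, full saturation, and full value.

# The conversion round-trips back to the original RGB triple.
assert colorsys.hsv_to_rgb(h, s, v) == (r, g, b)
```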
  16. White Balance

White balance (WB) is the process of removing unrealistic color casts, so that objects which appear white in person are rendered white in your photo.

Color temperature describes the spectrum of light which is radiated from a “blackbody” with that surface temperature.

Von Kries adaptation:

  • Multiply each channel by a gain factor
  • A more general transformation would correspond to an arbitrary 3x3 matrix

Best way: gray card:

  • Take a picture of a neutral object
  • Deduce the weight of each channel

Brightest pixel assumption (non-saturated)

  • Highlights usually have the color of the light source
  • Use weights inversely proportional to the values of the brightest pixels

Gamut mapping

  • Gamut: convex hull of all pixel colors in an image
  • Find the transformation that matches the gamut of the image to the gamut of a “typical” image under white light
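The gray-card approach above can be sketched as a diagonal (von Kries) transform; the pixel and card values below are synthetic:

```python
# Von Kries-style white balance: one gain per RGB channel, deduced
# from a reference patch that should appear neutral gray.
pixel = (200.0, 100.0, 50.0)       # a pixel carrying a warm color cast
gray_card = (200.0, 100.0, 50.0)   # measured color of the neutral card

target = sum(gray_card) / 3.0      # the neutral level to map the card to
gains = tuple(target / c for c in gray_card)

balanced = tuple(v * g for v, g in zip(pixel, gains))
# All channels of the (formerly neutral) card pixel become equal.
```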
  17. Mathematical representation of an image

The optical radiation power of wavelength $\lambda$ received on the imaging target surface of the camera:
$I = f(x, y, \lambda, t)$
Common image types:

  • Binary image
  • Grayscale image
  • Index image
  • RGB image
  18. Common concepts

pixel neighborhood: 4-neighborhood ($N_4(p)$), 8-neighborhood ($N_8(p)$);
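The two neighborhoods written out as small helper functions:

```python
# 4-neighborhood N4(p): the edge-adjacent pixels of p = (x, y).
def n4(x, y):
    return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}

# 8-neighborhood N8(p): N4(p) plus the four diagonal neighbors.
def n8(x, y):
    return n4(x, y) | {(x + 1, y + 1), (x + 1, y - 1),
                       (x - 1, y + 1), (x - 1, y - 1)}
```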

pixel adjacency ===> pixel connectivity;

Template (filter, mask) + convolution ===> filtering, smoothing, sharpening;

Convolution operation properties:

  • Smoothness: Make the fine structure of each function smooth
  • Diffusivity: Interval expansion, Diffusion of energy distribution

Application of convolution:

  • Deconvolution
  • Remove noise
  • Feature enhancement
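Template-based filtering can be sketched as a plain 2D convolution; here a 3x3 averaging mask smooths a single bright pixel (valid-region output only, borders ignored for brevity):

```python
# 2D convolution over the valid region (no padding), with the kernel
# flipped as in true convolution (irrelevant for symmetric masks).
def convolve_valid(img, kernel):
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            acc = sum(kernel[kh - 1 - j][kw - 1 - i] * img[y + j][x + i]
                      for j in range(kh) for i in range(kw))
            row.append(acc)
        out.append(row)
    return out

mean3 = [[1.0 / 9.0] * 3 for _ in range(3)]   # 3x3 smoothing template
img = [[0, 0, 0],
       [0, 9, 0],
       [0, 0, 0]]
smoothed = convolve_valid(img, mean3)         # single output value, about 1.0
```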
  19. Pixel distance

Distance metric function properties: non-negativity ($D(p, q) \ge 0$, with equality iff $p = q$), symmetry ($D(p, q) = D(q, p)$), and the triangle inequality ($D(p, z) \le D(p, q) + D(q, z)$).

Common distance metric functions:

  • Euclidean distance: $D_E(p, q) = [(x - s)^2 + (y - t)^2]^{\frac{1}{2}}$
  • City-block distance: $D_4(p, q) = |x - s| + |y - t|$
  • Chessboard distance: $D_8(p, q) = \max(|x - s|, |y - t|)$
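The three metrics in code, for pixels p = (x, y) and q = (s, t):

```python
import math

def d_euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def d_cityblock(p, q):   # D4
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d_chessboard(p, q):  # D8
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (0, 0), (3, 4)
# Euclidean: 5.0, city-block: 7, chessboard: 4
```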

p-norm: $\|x\|_p = (\sum_i |x_i|^p)^{\frac{1}{p}}$

Frobenius norm: $\|A\|_F = \sqrt{\sum_i \sum_j |a_{ij}|^2}$

In images, the L-2 norm constraint does not distinguish between the edge tangent direction and the gradient direction, and does not reflect the difference between textured areas and flat areas; as a result, image edges are blurred during restoration.

The L-1 norm constraint diffuses only along the edge tangent direction, not along the gradient direction, with the goal of preserving image edges as much as possible; however, it can lead to weaker noise suppression and staircase artifacts in the restoration results.

  20. Statistical characteristics of images
  • Information entropy: $H = -\sum_{i=1}^{k} p_i \log_2{p_i}$

  • Gray average: $\bar{f} = \frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} f(i, j)$
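Both statistics computed on a tiny synthetic grayscale image:

```python
import math
from collections import Counter

img = [[0, 0, 1],
       [1, 2, 2]]        # 2x3 image with gray levels 0..2
pixels = [v for row in img for v in row]
n = len(pixels)

# Gray average: mean of all pixel values.
gray_average = sum(pixels) / n

# Information entropy over the gray-level histogram.
counts = Counter(pixels)
entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
# Three equally likely gray levels give entropy = log2(3), about 1.585 bits.
```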
