Computer Vision and Pattern Recognition Review

parry64

Article link: https://blog.csdn.net/parry64/article/details/135205441

CVPR Review

Image Processing

Find 3D edges and planes.

Convolution flips the kernel about its center (inverted left-right and up-down) before sliding it over the image.
Cross-correlation slides the kernel as-is, without flipping.
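The difference between the two operations can be checked on a tiny example. This is a minimal sketch in plain Python, assuming "valid" mode (no padding) and images stored as nested lists; the function names are illustrative and do not come from any library.

```python
def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel as-is."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v]
                for u in range(kh) for v in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def flip_kernel(kernel):
    """Flip the kernel up-down and left-right (a 180-degree rotation)."""
    return [row[::-1] for row in kernel[::-1]]


def convolve2d(image, kernel):
    """Valid-mode 2D convolution: cross-correlation with the flipped kernel."""
    return cross_correlate2d(image, flip_kernel(kernel))


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]  # asymmetric, so flipping changes the result

correlated = cross_correlate2d(image, kernel)  # [[-4, -4], [-4, -4]]
convolved = convolve2d(image, kernel)          # [[4, 4], [4, 4]]
```

For a kernel that is symmetric under a 180-degree rotation (a box or Gaussian kernel, for example), the two operations give identical results.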

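Convolution can be rewritten as a matrix multiplication (for example, by building a Toeplitz matrix from the kernel, or by unrolling image patches into columns).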

IDFT - 2D

Box filter blur

Viewed up close, the high-pass content dominates; viewed from a distance, the low-pass content dominates.

Box filters are simple and fast but may result in blocky effects.
Mean filters average each neighborhood (the box filter is the uniform case), so they reduce noise but blur edges; median filters preserve edges better.
Gaussian filters are commonly used for smoothing and noise reduction, offering a more natural blur that preserves more image detail.

Different scales / sizes of filter? Padding if necessary. Why use different scales, and how does the content affect the result?
Ans: Different filter sizes extract features at different levels of detail. Small filters capture local, fine-grained features such as edges and textures; large filters capture more global features such as shapes and objects.
Padding: keeps the output feature map at the same spatial dimensions as the input. Without padding, the output shrinks and important information at the image borders is lost.
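Separability property of a filter / convolution? 2D conv -> 2 * 1D conv
How can a filter be separated? Step by step.
Ans: A K*K kernel is separable when it can be written as the outer product of a column vector and a row vector. The 2D convolution can then be computed as two 1D convolutions, which costs 2K instead of K*K multiplications per output pixel.
Input image: W*H
Kernel: K*K
Stride: S*S
Output image: [(W - K) / S + 1] * [(H - K) / S + 1]
The same output is obtained in two steps:
1. Kernel (1,K) and stride (1,S), giving (W, [(H - K) / S + 1])
2. Kernel (K,1) and stride (S,1), giving the same result as the 2D convolution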
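The two-step procedure can be verified numerically. The sketch below is plain Python and assumes a sliding-window filter without padding; `filter2d` and its `stride_y` / `stride_x` parameters are hypothetical names chosen for this example. The kernel is a Sobel-like outer product, so it is separable by construction.

```python
def filter2d(image, kernel, stride_y=1, stride_x=1):
    """Valid-mode sliding-window filter with independent strides per axis."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride_y + 1
    out_w = (len(image[0]) - kw) // stride_x + 1
    return [
        [
            sum(image[i * stride_y + u][j * stride_x + v] * kernel[u][v]
                for u in range(kh) for v in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


col = [1, 2, 1]    # vertical 1D factor (smoothing)
row = [1, 0, -1]   # horizontal 1D factor (difference)
kernel_2d = [[c * r for r in row] for c in col]  # outer product, 3 x 3

# Small integer test image with 5 rows and 6 columns.
image = [[(3 * y + x * x) % 7 for x in range(6)] for y in range(5)]

for s in (1, 2):
    direct = filter2d(image, kernel_2d, stride_y=s, stride_x=s)
    # Step 1: filter along the height only (column kernel, vertical stride).
    vertical = filter2d(image, [[c] for c in col], stride_y=s, stride_x=1)
    # Step 2: filter along the width only (row kernel, horizontal stride).
    two_pass = filter2d(vertical, [row], stride_y=1, stride_x=s)
    assert direct == two_pass
```

A kernel that is not rank 1 cannot be split this way exactly; in that case it can only be approximated by a sum of separable kernels.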
What is the Fourier Transform? What is it used for? How is it calculated in 1D? In 2D?
Why is it important? Give the 1D equation and the 2D equation; no calculation needed.
Ans: It decomposes a complex signal into its constituent frequencies, representing the signal in the frequency domain rather than the time (or spatial) domain.
The Fourier Transform is important because it allows us to analyze complex signals and understand their frequency content. It helps in filtering out noise, extracting meaningful information, compressing data, and understanding the behavior of signals in different domains. In image processing, the Fourier Transform is used for tasks such as image enhancement, denoising, compression, and pattern recognition.
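For reference, the discrete 1D and 2D equations can be written as follows. The notes do not fix a normalization convention, so the placement of the 1/N and 1/(MN) factors on the inverse transform is an assumption (it is the most common choice).

```latex
% 1D DFT of a length-N signal f(x), and its inverse
F(u) = \sum_{x=0}^{N-1} f(x)\, e^{-j 2\pi u x / N}
f(x) = \frac{1}{N} \sum_{u=0}^{N-1} F(u)\, e^{\,j 2\pi u x / N}

% 2D DFT of an M x N image f(x, y), and its inverse (2D IDFT)
F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi (u x / M + v y / N)}
f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v)\, e^{\,j 2\pi (u x / M + v y / N)}
```

The 2D transform is itself separable: it can be computed as 1D transforms along the rows followed by 1D transforms along the columns.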
How to build a kernel approximating a 1st or 2nd derivative?
Use gradient operators, which estimate the local gradient, i.e. the rate of change of the function at each point. Steps: 1. Choose a suitable kernel, such as the Sobel operator or the Prewitt operator. 2. Convolve the kernel with the image or signal. 3. The result of the convolution is an approximation of the local derivative. For the first derivative, applying the x and y kernels gives a vector with the gradient in the x and y directions. For the second derivative (for example with a Laplacian kernel), the result is a scalar at each point. 4. A larger kernel and a higher sampling rate increase accuracy.
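A 1D example makes the idea concrete. This sketch assumes unit sample spacing and uses the central-difference kernel [-1/2, 0, 1/2] for the first derivative and [1, -2, 1] for the second; the kernels are applied as sliding windows (cross-correlation), so for true convolution the first-derivative kernel would be flipped.

```python
def correlate1d(signal, kernel):
    """Valid-mode 1D sliding-window filter (no kernel flip)."""
    k = len(kernel)
    return [
        sum(signal[i + u] * kernel[u] for u in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Samples of f(x) = x^2 at x = 0..6, so f'(x) = 2x and f''(x) = 2.
signal = [x * x for x in range(7)]

first_derivative = correlate1d(signal, [-0.5, 0.0, 0.5])   # (f(x+1) - f(x-1)) / 2
second_derivative = correlate1d(signal, [1, -2, 1])        # f(x+1) - 2 f(x) + f(x-1)

# Outputs are aligned with x = 1..5 because the borders are dropped.
assert first_derivative == [2.0, 4.0, 6.0, 8.0, 10.0]
assert second_derivative == [2, 2, 2, 2, 2]
```

The 2D Sobel kernel follows the same pattern: it is the outer product of a difference kernel such as [1, 0, -1] along one axis and a smoothing kernel [1, 2, 1] along the other.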
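Convolution in the image domain is equivalent to multiplication in the frequency domain. Why? How to verify?
Ans: By the convolution theorem, the Fourier transform of a convolution is the pointwise product of the individual Fourier transforms, F{f * g} = F{f} · F{g}, so convolution in the spatial domain corresponds to multiplication in the frequency domain. To verify, compare the direct convolution with the inverse transform of the product of the two transforms.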
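One way to verify the property numerically is a small 1D check. This is a sketch using only the standard library, with a direct O(N^2) DFT rather than an FFT. Note the assumption: for the discrete transform the equivalence holds for circular convolution, so a linear convolution would need zero-padding first.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, with the 1/N factor on the inverse."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def circular_convolve(f, g):
    """Circular convolution computed directly in the signal domain."""
    n = len(f)
    return [sum(f[m] * g[(t - m) % n] for m in range(n)) for t in range(n)]


f = [1.0, 2.0, 3.0, 4.0]
g = [0.5, 0.25, 0.0, 0.25]

spatial = circular_convolve(f, g)
product = [a * b for a, b in zip(dft(f), dft(g))]   # multiply in frequency domain
via_frequency = [value.real for value in idft(product)]

max_error = max(abs(s - v) for s, v in zip(spatial, via_frequency))
assert max_error < 1e-9
```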

Steerable Pyramid: It is an extension of the Laplacian pyramid that allows for multi-directional decomposition. It uses a set of steerable filters to compute the image representation at each level.

What is histogram matching? How is it done? Applications?
Histogram matching transforms an image so that its histogram matches a specified reference histogram, by redistributing the pixel values to achieve the desired histogram shape. Steps: 1. Compute the cumulative distribution function (CDF) of the source image and of the reference. 2. Normalize both CDFs. 3. For each pixel value, find the closest CDF value in the reference histogram and replace the pixel value with the corresponding intensity value. Applications: image enhancement, image registration, color transfer, image recognition.
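The CDF-based mapping can be sketched in a few lines. This example assumes grayscale images given as flat lists of integer intensities in the range 0..levels-1; the function names and the 8-level toy data are illustrative only.

```python
def normalized_cdf(values, levels):
    """Normalized cumulative distribution over intensities 0..levels-1."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    running, cdf = 0, []
    for count in hist:
        running += count
        cdf.append(running / len(values))
    return cdf


def match_histogram(source, reference, levels=256):
    """Map each source intensity to the reference intensity with the closest CDF."""
    src_cdf = normalized_cdf(source, levels)
    ref_cdf = normalized_cdf(reference, levels)
    mapping = [
        min(range(levels), key=lambda r: abs(ref_cdf[r] - src_cdf[s]))
        for s in range(levels)
    ]
    return [mapping[v] for v in source]


source = [0, 0, 1, 1, 2, 2, 3, 3]       # dark image
reference = [4, 4, 5, 5, 6, 6, 7, 7]    # bright reference
matched = match_histogram(source, reference, levels=8)

assert matched == [4, 4, 5, 5, 6, 6, 7, 7]
```

Histogram equalization is the special case in which the reference histogram is uniform.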
Non-maximum suppression? How? Applications?
Non-maximum suppression (NMS) is used in object detection algorithms to eliminate multiple overlapping bounding boxes and keep only the one with the highest confidence score, which represents the most probable location and size of the object. Steps: 1. Sort the detections by confidence score. 2. Select the highest-scoring detection. 3. Calculate its overlap (IoU, intersection over union) with the remaining detections. 4. Remove the detections whose overlap exceeds a threshold. 5. Repeat on the remaining detections. Applications: object, text, face, and edge detection.
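The five steps translate almost directly into code. The sketch below is a greedy NMS in plain Python, assuming boxes in (x1, y1, x2, y2) format and an IoU threshold of 0.5; both the format and the threshold are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy NMS."""
    # Step 1: sort detections by score, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # Step 2: take the highest-scoring remaining detection.
        best = order.pop(0)
        keep.append(best)
        # Steps 3-4: drop detections that overlap it too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
        # Step 5: the loop repeats on whatever is left.
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)

assert kept == [0, 2]  # box 1 overlaps box 0 with IoU 81/119 and is suppressed
```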

Neural Network

Cross-entropy and its usage?
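Cross-entropy is used as a loss function. It measures the difference between two probability distributions, the true distribution p and the predicted distribution q:
H(p, q) = -\sum_i p_i \log(q_i)
Minimizing the cross-entropy loss pulls the predicted distribution toward the true distribution. The loss is continuous, differentiable, and convex in the predicted probabilities.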
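A quick numerical check of the formula, as a sketch with the natural logarithm and a one-hot target. The `eps` clamp is an added safeguard against log(0), not part of the definition, and the probability values are made up for the example.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i), clamping q_i to avoid log(0)."""
    return -sum(p_i * math.log(max(q_i, eps)) for p_i, q_i in zip(p, q))


target = [0.0, 1.0, 0.0]            # one-hot true distribution
good_prediction = [0.05, 0.90, 0.05]
bad_prediction = [0.60, 0.20, 0.20]

loss_good = cross_entropy(target, good_prediction)  # equals -log(0.9)
loss_bad = cross_entropy(target, bad_prediction)    # equals -log(0.2)

assert loss_good < loss_bad
```

With a one-hot target, the sum reduces to the negative log of the probability assigned to the correct class, which is why a confident correct prediction gives a loss near zero.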

The transformer model utilizes self-attention mechanisms to capture the relationships between different words (or tokens) in a sentence. This attention mechanism allows the model to focus on relevant words while processing the input, enabling it to capture long-range dependencies effectively. Unlike traditional recurrent or convolutional neural networks, transformers do not have sequential or local dependencies, making them highly parallelizable and efficient.
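The core of self-attention is scaled dot-product attention, softmax(Q K^T / sqrt(d)) V. The sketch below is a deliberately simplified version that assumes Q = K = V = the input tokens: it omits the learned projection matrices, multiple heads, and positional encodings that a real transformer uses.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens."""
    d = len(tokens[0])
    # Every token is scored against every other token in one step.
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        for q in tokens
    ]
    weights = [softmax(row) for row in scores]
    # Each output is a weighted average of all token vectors.
    outputs = [
        [sum(w * v[c] for w, v in zip(w_row, tokens)) for c in range(d)]
        for w_row in weights
    ]
    return weights, outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2D embeddings
weights, outputs = self_attention(tokens)

# Each row of attention weights is a probability distribution over tokens.
assert all(abs(sum(row) - 1.0) < 1e-9 for row in weights)
```

Because the score matrix covers all token pairs at once, nothing in the computation depends on processing the tokens in order, which is what makes it parallelizable.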

One of the main advantages of transformers in computer vision is their ability to capture global contextual information effectively. They can model interactions between all image regions simultaneously, enabling the model to understand the relationships between objects and their context in a scene.

Transformers excel at capturing long-range dependencies and modeling complex relationships in visual data. This makes them well-suited for tasks that require understanding the context and semantic relationships between objects in images. Additionally, transformers have shown great potential in tasks such as image captioning, visual question answering, and image generation, where understanding and generating coherent and contextually relevant output is crucial.

Geometry

Camera coordinates to image coordinates: use the left matrix.

Focal Length as a Function of FOV
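For a pinhole camera, the standard relation is f = (W / 2) / tan(FOV / 2), where W is the sensor or image width along the axis in which the field of view is measured. The sketch below assumes W is given in pixels, so f also comes out in pixels; the 640-pixel width and 90-degree FOV are example values.

```python
import math


def focal_length_from_fov(width, fov_degrees):
    """Pinhole focal length, in the same units as width, from the field of view."""
    return (width / 2.0) / math.tan(math.radians(fov_degrees) / 2.0)


def fov_from_focal_length(width, focal_length):
    """Inverse relation: field of view in degrees."""
    return math.degrees(2.0 * math.atan((width / 2.0) / focal_length))


f_pixels = focal_length_from_fov(640, 90)   # about 320 pixels
assert abs(f_pixels - 320.0) < 1e-9
assert abs(fov_from_focal_length(640, f_pixels) - 90.0) < 1e-9
```

A wider field of view therefore means a shorter focal length for the same sensor width.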

World coordinates to camera coordinates

Disparity is used to estimate the depth information of the scene.
A depth map is obtained from the disparity information. It represents the distance of objects in the scene from the camera. Higher disparity values mean objects are closer to the camera, while lower values indicate objects are farther away.
Collision detection for autonomous driving can use a stereo camera setup and the resulting depth information. By continuously analyzing the depth map, the system can determine the distance of objects in the scene and detect potential collision risks. Depth-based algorithms identify obstacles and calculate their proximity to the vehicle, enabling the collision detection system to react and take the necessary measures to avoid accidents.
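The inverse relationship between disparity and depth follows from similar triangles: Z = f * B / d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the disparity in pixels. The sketch below assumes a rectified stereo pair; the focal length, baseline, and warning distance are made-up example values.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters from disparity, for a rectified stereo pair."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px


FOCAL_PX = 700.0     # hypothetical focal length in pixels
BASELINE_M = 0.5     # hypothetical distance between the two cameras
WARNING_M = 10.0     # hypothetical collision-warning distance

near = depth_from_disparity(70.0, FOCAL_PX, BASELINE_M)  # 5.0 m
far = depth_from_disparity(7.0, FOCAL_PX, BASELINE_M)    # 50.0 m

assert near < far                          # larger disparity means closer
assert near < WARNING_M and far > WARNING_M
```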
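The fundamental matrix F is a 3*3 matrix that encodes the epipolar geometry between two views. Given a point x in one image, the product F x gives the epipolar line in the second image on which the matching point x' must lie, so x'^T F x = 0. F can be estimated with the eight-point algorithm: each correspondence (x, y) <-> (x', y') gives one linear equation in the nine entries F_ij, and stacking at least eight of them gives a homogeneous linear system with rows of the form
[x'x, x'y, x', y'x, y'y, y', x, y, 1] * [F_ij] = 0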
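The epipolar constraint can be checked on the simplest possible geometry. This sketch assumes an idealized rectified stereo pair (identity intrinsics, pure translation along the x axis), for which F is the skew-symmetric matrix of the translation; the point coordinates are made up. It also shows the row that one correspondence contributes to the eight-point system, with F flattened row by row.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def eight_point_row(x, x_prime):
    """Coefficients of F11..F33 (row-major) in the equation x'^T F x = 0."""
    return [xp * xc for xp in x_prime for xc in x]


# F for a pure translation along x with identity intrinsics (assumption).
F = [[0, 0, 0],
     [0, 0, -1],
     [0, 1, 0]]

x_left = [120, 80, 1]              # homogeneous point in the first image
line_right = mat_vec(F, x_left)    # epipolar line (a, b, c): a*x + b*y + c = 0

x_match = [95, 80, 1]              # same row, so it lies on the epipolar line
x_wrong = [95, 60, 1]              # different row, so it violates the constraint

assert line_right == [0, -1, 80]   # the line y = 80
assert dot(x_match, line_right) == 0
assert dot(x_wrong, line_right) != 0

# The same constraint written as one row of the eight-point system.
f_flat = [entry for row in F for entry in row]
assert dot(eight_point_row(x_left, x_match), f_flat) == 0
```

In a real estimate, eight or more such rows are stacked into a matrix, the solution is taken as its null vector (usually via SVD), and the result is projected to rank 2.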