Image analysis
Image analysis is a broad field; these notes only cover the theory relevant to the exam.
Image analysis
– Refers to the representation, processing, and modelling of visual data to derive useful insights.
– Suffers from the semantic gap: the gap between low-level pixel data and the high-level meaning a human perceives in it.
– Visual data (image, video, …) is unstructured.
Pixel colors can be:
– Full color
– Grayscale
– Black/white
• A richer color scheme requires more storage and retains more information.
For example, grayscale is sufficient to recognize a person in a picture.
A black/white scheme is not enough to recognize a person, but it may be suitable for tasks such as OCR.
– Too high a resolution or too rich a color scheme increases storage requirements and processing time.
– Too low a resolution or too restricted a color scheme makes the image analytics task harder and may leave too little information to solve it.
So we must choose the resolution and color scheme that best fit the task.
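To make the storage trade-off concrete, here is a minimal Python sketch (not from the course material) that converts one full-color pixel to grayscale and then to black/white, and computes the uncompressed size of an image under each scheme. It assumes 8 bits per color channel, 1 bit per black/white pixel, the ITU-R BT.601 luma weights for the grayscale conversion and a binarization threshold of 128; these are common choices rather than the only ones, and all function names are hypothetical.

```python
# Bits per pixel under each color scheme, assuming 8 bits per channel
# for full color and grayscale, and 1 bit per pixel for black/white.
BITS_PER_PIXEL = {"full_color": 24, "grayscale": 8, "black_white": 1}


def to_grayscale(rgb):
    """Map an (R, G, B) pixel with 0-255 channels to one 0-255 intensity.

    Uses the ITU-R BT.601 luma weights (an assumption; other weightings exist).
    """
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_black_white(gray, threshold=128):
    """Binarize an intensity: 1 (white) if gray >= threshold, else 0 (black)."""
    return 1 if gray >= threshold else 0


def storage_bytes(width, height, scheme):
    """Uncompressed size in bytes of a width x height image, rounded up."""
    return (width * height * BITS_PER_PIXEL[scheme] + 7) // 8


if __name__ == "__main__":
    gray = to_grayscale((200, 120, 40))
    print("gray:", gray, "b/w:", to_black_white(gray))
    for scheme in BITS_PER_PIXEL:
        print(scheme, storage_bytes(640, 480, scheme), "bytes")
```

For a 640 x 480 image this gives 921,600 bytes in full color, 307,200 bytes in grayscale and 38,400 bytes in black/white, before any compression.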
Image Analysis Steps
• Collection and labelling
– Collect representative images from a given task and label the ground truth.
• Image representation
– Select and/or design appropriate image representations (invariant and discriminative).
• Image analysis techniques
– Apply and/or design appropriate analysis techniques for the given tasks (classification, detection, tracking, segmentation, etc.)
Bag-of-Visual-Words Model
Detect interest points in the images (interest point detection) –> K-means clustering –> generated “Visual Words” –> from an image to a histogram –> classifying images.
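The K-means step that turns local descriptors into visual words can be sketched in plain Python as below. This is an illustrative sketch under simplifying assumptions: descriptors are short tuples of numbers (real SIFT descriptors have 128 dimensions), distances are Euclidean, and `kmeans` is a hypothetical helper implementing standard Lloyd iterations with random initialization.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to point (the first one wins on ties)."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's K-means: returns k cluster centers, i.e. the visual words."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    for _ in range(iterations):
        # Assignment step: put every descriptor in its nearest cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_center(p, centers)].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster becomes empty
                centers[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers


if __name__ == "__main__":
    # Toy 2-D "descriptors" pooled from all training images.
    descriptors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(sorted(kmeans(descriptors, k=2)))
```

Each returned center is one visual word; the number of clusters k is the vocabulary size and is chosen by the user.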
Procedure of the BoVW model based Image Recognition
Image dataset–>Local feature extraction–>Visual word creation–>Histogram generation–>Image Recognition.
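Once a vocabulary exists, the histogram generation and recognition steps can be sketched as follows. Assumptions: the vocabulary is given (for example from K-means), each descriptor is assigned to its nearest visual word, histograms are normalized by the number of descriptors, and a 1-nearest-neighbour rule is used as a simple stand-in for the classifier. All names and the toy data are hypothetical.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to descriptor (Euclidean distance)."""
    return min(range(len(vocabulary)), key=lambda i: math.dist(descriptor, vocabulary[i]))


def bovw_histogram(descriptors, vocabulary):
    """Normalized visual-word histogram of one image's local descriptors."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = len(descriptors) or 1  # avoid division by zero for an empty image
    return [c / total for c in counts]


def classify(histogram, training_set):
    """1-nearest-neighbour: label of the training histogram closest to histogram."""
    label, _ = min(training_set, key=lambda item: math.dist(histogram, item[1]))
    return label


if __name__ == "__main__":
    vocabulary = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical visual words
    training_set = [
        ("cat", bovw_histogram([(0, 1), (1, 0), (9, 9)], vocabulary)),
        ("car", bovw_histogram([(10, 11), (11, 10), (1, 1)], vocabulary)),
    ]
    query = bovw_histogram([(0, 0), (0, 2), (2, 0), (10, 10)], vocabulary)
    print(query, classify(query, training_set))
```

Here the query image contains mostly descriptors near the first visual word, so its histogram is closest to the "cat" training histogram.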
Deep Learning Model
• Image Recognition
– Faces, objects, poses, scenes, …
• Video content analysis
– Action, activities, events, summarization, …
• Visual information management
– Search, retrieval, indexing, browsing, …
• Potential Outcome: AI
– Computers can see and understand visual information
– Robotics, self-driving cars, surveillance
Neural Networks
It has been proven:
– Three layers are enough (if neurons are linear)
– Two layers are enough (if neurons are non-linear).
Reasons why deep CNNs are better:
- Previous (fully connected) NNs need an extremely large number of trainable parameters when the input is an image.
- Previous NNs are sensitive to image distortions (shift, scale, …).
- Previous NNs completely ignore the topology of the input data (the 2D arrangement of pixels).
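A quick parameter count shows why the first point matters. The sketch below compares a fully connected layer with a convolutional layer on the same grayscale image; the layer sizes (1000 hidden units, six 5 x 5 kernels) are hypothetical choices for illustration, and biases are counted as one per output unit or feature map.

```python
def dense_params(n_inputs, n_outputs):
    """Weights plus one bias per output unit of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus one bias per feature map of a convolutional layer.

    The count does not depend on the image size because each kernel is
    shared across all image positions.
    """
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


if __name__ == "__main__":
    for side in (32, 256):
        n_pixels = side * side  # grayscale image, one input per pixel
        print(side, dense_params(n_pixels, 1000), conv_params(5, 1, 6))
```

For a 32 x 32 image the fully connected layer needs 1,025,000 parameters and the convolutional layer 156; at 256 x 256 the first grows to 65,537,000 while the second stays at 156, because the kernels are shared.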
Deep Learning Model
• Inspired by the way that human brain processes information.
• Many layers of non-linear processing stages.
• Designed to implicitly extract relevant features
• A CNN is a feed-forward network that can extract topological properties from an image.
• Like almost every other neural network, CNNs are trained with a version of the back-propagation algorithm.
• Convolutional Neural Networks are designed to recognize visual patterns directly from pixel images with minimal pre-processing.
• They can recognize patterns with extreme variability (such as handwritten characters).
Convolutional Neural Networks (CNNs)
• A special multi-stage architecture inspired by the visual system
– Higher stages compute more global, more invariant features.
• Convolution layer: feature extraction
• Subsampling layer: shift and distortion invariance
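The two layer types can be sketched on small lists of numbers. This is a minimal illustration, not an efficient implementation: `conv2d_valid` slides the kernel without padding and without flipping it (cross-correlation, which is what CNN layers compute in most deep-learning libraries), and `max_pool` uses non-overlapping 2 x 2 windows, one common form of subsampling (averaging is another). The kernel `[[-1, 1]]` is a hypothetical vertical-edge detector.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution as computed by CNN layers.

    No padding and no kernel flip (i.e. cross-correlation, the operation most
    deep-learning libraries call convolution).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling (one form of subsampling)."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, width - size + 1, size)
        ]
        for i in range(0, height - size + 1, size)
    ]


if __name__ == "__main__":
    edge_kernel = [[-1, 1]]  # hypothetical vertical-edge detector
    image = [[0, 0, 1, 1, 1] for _ in range(4)]
    shifted = [[0, 1, 1, 1, 1] for _ in range(4)]  # same edge, one pixel left
    for img in (image, shifted):
        fmap = conv2d_valid(img, edge_kernel)
        print(fmap[0], max_pool(fmap))
```

Shifting the edge by one pixel (within a pooling window) changes the convolution output but not the pooled output, which is the small-shift invariance the subsampling layer provides.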
Disadvantages:
• From a memory and capacity standpoint, the CNN is not much bigger than a regular two-layer network.
• However, convolution operations are computationally expensive and take up about 67% of the running time.
– CNNs are about 3x slower than fully connected NNs of the same size.
• Small kernel sizes make the inner loops inefficient (frequent JMP instructions).
• Cache-unfriendly memory access.
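The gap between memory and compute can be made concrete by counting multiply-accumulate operations (MACs) for one convolutional layer. The sketch assumes stride 1, no padding and a hypothetical layer of six 5 x 5 kernels on a 32 x 32 grayscale input; it illustrates why the layer is small in memory but expensive to run, and does not attempt to reproduce the 67% or 3x figures.

```python
def conv_layer_cost(in_size, in_channels, out_channels, kernel_size):
    """Parameters, MACs and per-weight reuse of one square conv layer.

    Assumes stride 1 and no padding, so the output side is in_size - k + 1.
    """
    out_size = in_size - kernel_size + 1
    weights = kernel_size * kernel_size * in_channels * out_channels
    params = weights + out_channels  # one bias per feature map
    # Every weight is applied once at every output position.
    macs = out_size * out_size * weights
    return params, macs, macs // weights


if __name__ == "__main__":
    params, macs, reuse = conv_layer_cost(32, 1, 6, 5)
    print("params:", params, "MACs:", macs, "uses per weight:", reuse)
```

The layer stores only 156 parameters but performs 117,600 MACs per image, because each weight is reused at all 28 x 28 = 784 output positions; a fully connected layer, by contrast, uses each weight once per image.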
Image Representation: From SIFT to CNNs
• Three main approaches
– Directly use pre-trained CNN models
• to extract image feature representations
– Fine-tune pre-trained CNN models
• with the images from the target recognition task
– BoVW model based on CNN features
• “Deep SIFT”
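For the “Deep SIFT” idea, one common way to obtain local descriptors from a CNN is to treat the activation vector at each spatial position of a convolutional feature map as one descriptor; these descriptors then replace SIFT in the usual BoVW pipeline (vocabulary by K-means, one histogram per image). The sketch below shows only that reshaping step, assuming the feature map is stored channel-first as nested lists; the function name is hypothetical and no particular network is implied.

```python
def feature_map_to_descriptors(feature_map):
    """Turn a channel-first conv feature map, indexed [channel][row][col],
    into one C-dimensional local descriptor per spatial position."""
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        tuple(feature_map[c][i][j] for c in range(channels))
        for i in range(height)
        for j in range(width)
    ]


if __name__ == "__main__":
    # Hypothetical 2-channel, 2 x 2 feature map.
    fmap = [
        [[1, 2], [3, 4]],  # channel 0
        [[5, 6], [7, 8]],  # channel 1
    ]
    print(feature_map_to_descriptors(fmap))
```

An H x W x C feature map thus yields H * W descriptors of dimension C, playing the same role as the SIFT descriptors of one image.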
Figures are from the course slides and my own notes.
Figures with Chinese text are from the internet.