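Data Science and Big Data Analytics Study Notes, Part 10: Image Analysis

Image analysis is a very broad topic; these notes cover only the exam-related theory.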

Image analysis
– Refers to the representation, processing, and modelling of visual data to derive useful insights.
– Suffers from the semantic gap: the mismatch between low-level pixel data and the high-level meaning a human sees in an image.
– Visual data (image, video, …) is unstructured.

Pixel colors can be:
– Full color
– Grayscale
– Black/white

• A richer color scheme requires more storage and retains more information.

For example, grayscale is sufficient to recognize a person in a picture.
A black/white scheme is not enough to recognize a person, but it may be suitable for tasks such as OCR.
– Too high a resolution or too rich a color scheme increases storage requirements and processing time.
– Too low a resolution or too restricted a color scheme makes the image analysis task harder and may leave too little information to solve it.

So we must choose the resolution and color scheme that best fit the task.
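
The storage side of this trade-off can be made concrete with a small sketch. It assumes uncompressed images with 24 bits per pixel for full color, 8 for grayscale, and 1 for black/white. The random 4x4 image, the luma weights, and the threshold of 128 are illustrative choices, not part of the course material.

```python
import numpy as np

# Hypothetical 4x4 full-color image: 3 channels (R, G, B), 8 bits each.
rng = np.random.default_rng(0)
full_color = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Grayscale: one 8-bit channel, here a weighted sum of R, G, B
# (commonly used luma weights; other weightings exist).
weights = np.array([0.299, 0.587, 0.114])
grayscale = (full_color @ weights).round().astype(np.uint8)

# Black/white: one bit per pixel, here from a fixed threshold of 128
# (the threshold is an arbitrary choice for this sketch).
black_white = grayscale >= 128

def storage_bits(height, width, bits_per_pixel):
    """Uncompressed storage in bits for an image of the given size."""
    return height * width * bits_per_pixel

h, w = grayscale.shape
sizes = {
    "full color": storage_bits(h, w, 24),
    "grayscale": storage_bits(h, w, 8),
    "black/white": storage_bits(h, w, 1),
}
for name, bits in sizes.items():
    print(f"{name}: {bits} bits")
# full color: 384 bits
# grayscale: 128 bits
# black/white: 16 bits
```

Each step down discards information (color, then intensity levels) in exchange for a smaller representation.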

Image Analysis Steps

• Collection and labelling
– Collect representative images for a given task and label the ground truth.

• Image representation
– Select and/or design appropriate image representations (invariant and discriminative).

• Image analysis techniques
– Apply and/or design appropriate analysis techniques for the given tasks (classification, detection, tracking, segmentation, etc.)

Bag-of-Visual-Words Model

Interest point detection on the images –> K-means clustering of the local features –> generated “visual words” –> each image converted to a histogram –> image classification.
Procedure of the BoVW model based Image Recognition
Image dataset–>Local feature extraction–>Visual word creation–>Histogram generation–>Image Recognition.
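
The procedure above can be sketched end to end with NumPy alone. This is a toy illustration under several assumptions: flattened image patches stand in for real local descriptors such as SIFT, the striped images are synthetic, k-means uses a deterministic farthest-first initialization, and recognition is a nearest-neighbor match on histograms. All function names are hypothetical.

```python
import numpy as np

def extract_patches(image, size=4, stride=4):
    """Stand-in for local feature extraction: flattened image patches.
    A real system would use interest-point descriptors such as SIFT."""
    h, w = image.shape
    patches = [
        image[r:r + size, c:c + size].ravel()
        for r in range(0, h - size + 1, stride)
        for c in range(0, w - size + 1, stride)
    ]
    return np.asarray(patches, dtype=float)

def assign(points, centroids):
    """Index of the nearest centroid (visual word) for each point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, iters=20):
    """Plain k-means; the centroids play the role of visual words.
    Farthest-first initialization is an assumption made for determinism."""
    centroids = [points[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def bovw_histogram(image, vocabulary):
    """Represent one image as a normalized histogram of visual words."""
    words = assign(extract_patches(image), vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

def make_image(kind, rng, size=16):
    """Toy image: horizontal or vertical stripes plus a little noise."""
    stripes = (np.arange(size) // 2) % 2  # 0, 0, 1, 1, 0, 0, ...
    if kind == "horizontal":
        base = np.tile(stripes[:, None], (1, size))
    else:
        base = np.tile(stripes[None, :], (size, 1))
    return base + 0.05 * rng.standard_normal((size, size))

rng = np.random.default_rng(1)

# Image dataset with ground-truth labels.
train = [(make_image(kind, rng), kind) for kind in ["horizontal", "vertical"] * 5]

# Local feature extraction, then visual word creation.
all_patches = np.vstack([extract_patches(img) for img, _ in train])
vocabulary = kmeans(all_patches, k=2)

# Histogram generation for every training image.
train_hists = np.array([bovw_histogram(img, vocabulary) for img, _ in train])

def classify(image):
    """Image recognition: label of the training image with the closest histogram."""
    hist = bovw_histogram(image, vocabulary)
    nearest = np.linalg.norm(train_hists - hist, axis=1).argmin()
    return train[nearest][1]

print(classify(make_image("horizontal", rng)), classify(make_image("vertical", rng)))
```

In practice the vocabulary has hundreds or thousands of words and the histograms feed a trained classifier rather than a nearest-neighbor lookup.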

Deep Learning Model

• Image Recognition
– Faces, objects, poses, scenes, …

• Video content analysis
– Action, activities, events, summarization, …

• Visual information management
– Search, retrieval, indexing, browsing, …

• Potential Outcome: AI
– Computers can see and understand visual information
– Robotics, self-driving cars, surveillance

Neural Networks

It has been proven that, in theory, a shallow network suffices:
– Three layers are enough (if neurons are linear)
– Two layers are enough (if neurons are non-linear).
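
As one small illustration of the second point (not a proof), the sketch below hand-sets the weights of a two-layer network with hard-threshold neurons so that it computes XOR, a function no single linear threshold neuron can represent. The weights and the step activation are choices made for this example.

```python
import numpy as np

def step(x):
    """Hard-threshold activation: 1 where x > 0, else 0."""
    return (x > 0).astype(int)

# Hidden layer: two neurons computing OR and AND of the two inputs.
W_hidden = np.array([[1, 1],   # weights into the OR neuron
                     [1, 1]])  # weights into the AND neuron
b_hidden = np.array([-0.5, -1.5])

# Output layer: fires when OR is on and AND is off, which is XOR.
w_out = np.array([1, -1])
b_out = -0.5

def two_layer_xor(x):
    hidden = step(W_hidden @ x + b_hidden)
    return int(step(w_out @ hidden + b_out))

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [two_layer_xor(np.array(x)) for x in inputs]
print(outputs)  # [0, 1, 1, 0]
```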

Reasons why deep CNNs are better:

  1. The number of trainable parameters of previous (fully connected) NNs becomes extremely large on image inputs.
  2. Previous NNs are sensitive to image distortions (shift, scale, …).
  3. Previous NNs completely ignore the topology of the input data.
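
The first reason can be checked with simple arithmetic. The sketch below compares a fully connected layer on raw pixels with a convolutional layer that shares its kernels across the image. The image size, number of hidden units, and kernel settings are hypothetical values picked for illustration.

```python
def fully_connected_params(n_inputs, n_outputs):
    """One weight per input-output pair, plus one bias per output neuron."""
    return n_inputs * n_outputs + n_outputs

def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """One shared kernel per (input, output) channel pair, plus one bias per output channel."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels

# Hypothetical sizes chosen for illustration only.
height, width, channels = 256, 256, 3
hidden_units = 1000

fc = fully_connected_params(height * width * channels, hidden_units)
conv = conv_params(5, 5, channels, 64)
print(fc)    # 196609000
print(conv)  # 4864
```

The convolutional count does not depend on the image size at all, because the same kernels are reused at every position.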

Deep Learning Model

• Inspired by the way the human brain processes information.
• Many layers of non-linear processing stages.
• Designed to implicitly extract relevant features.
• CNN is a feed-forward network that can extract topological properties from an image.
• Like almost every other neural network, CNNs are trained with a version of the back-propagation algorithm.
• Convolutional Neural Networks are designed to recognize visual patterns directly from pixel images with minimal pre-processing.
• They can recognize patterns with extreme variability (such as handwritten characters).

Convolutional Neural Networks (CNNs)

• A special multi-stage architecture inspired by the visual system
– Higher stages compute more global, more invariant features.
– Feature extraction layer, or convolution layer
– Shift and distortion invariance layer, or subsampling layer
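
A minimal NumPy sketch of these two layer types follows. It assumes a stride-1 convolution without padding and uses max pooling as the subsampling step (averaging is another common choice). The edge kernel and the 6x6 test images are made up for the example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).
    As in most CNN libraries, this is cross-correlation: the kernel is not flipped."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Subsample by taking the maximum over non-overlapping size x size blocks."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

edge_kernel = np.array([[-1.0, 1.0]])  # responds to a dark-to-bright step

image = np.zeros((6, 6))
image[:, 3:] = 1.0      # vertical edge between columns 2 and 3
shifted = np.zeros((6, 6))
shifted[:, 4:] = 1.0    # same edge, moved one pixel to the right

fmap = conv2d_valid(image, edge_kernel)
fmap_shifted = conv2d_valid(shifted, edge_kernel)
print(np.array_equal(fmap, fmap_shifted))                      # False
print(np.array_equal(max_pool(fmap), max_pool(fmap_shifted)))  # True
```

The convolution output moves with the edge, while the pooled output stays the same here because both edge positions fall inside the same pooling block. A larger shift that crosses a block boundary would change the pooled output, so the invariance is only local.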

Disadvantages:
• From a memory and capacity standpoint, a CNN is not much bigger than a regular two-layer network.
• Convolution operations are computationally expensive and take up about 67% of the time.
– CNNs are about 3X slower than a fully connected NN of the same size.
• Small kernel sizes make the inner loops inefficient (frequent JMP instructions).
• Memory access is cache-unfriendly.
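
One way to see why convolution is expensive despite its small memory footprint is to count multiply-accumulate operations against stored weights. The sketch below does this for a single hypothetical layer; it does not reproduce the 67% or 3X figures above, which depend on the specific network and hardware.

```python
def conv_macs(in_h, in_w, kernel, in_channels, out_channels):
    """Multiply-accumulates for a stride-1, unpadded convolution layer."""
    out_h, out_w = in_h - kernel + 1, in_w - kernel + 1
    return out_h * out_w * kernel * kernel * in_channels * out_channels

def conv_weights(kernel, in_channels, out_channels):
    """Number of stored kernel weights (biases ignored)."""
    return kernel * kernel * in_channels * out_channels

# Hypothetical layer: 32x32 single-channel input, six 5x5 kernels.
macs = conv_macs(32, 32, 5, 1, 6)
weights = conv_weights(5, 1, 6)
print(macs, weights, macs // weights)  # 117600 150 784
```

Every stored weight is reused at each of the 784 output positions, whereas a fully connected layer uses each weight exactly once per input.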

Image Representation: From SIFT to CNNs

• Three main approaches
– Directly use pre-trained CNN models
• to extract image feature representations

– Fine-tune pre-trained CNN models
• with the images from recognition tasks

– BoVW model based on CNN features
• “Deep SIFT”


Images come from the course slides and my own notes.
Images with Chinese text come from the web.
