Markov Model for Image Context Understanding

weixin_26704853

Original article: https://medium.com/@yefengxia/markov-model-for-image-context-understanding-f319e6a3aa2f

Objective: model the probability distribution of image context, e.g. on the PASCAL VOC2012 dataset.

Data preprocessing

The PASCAL VOC challenge provides a very popular dataset for building and evaluating algorithms for image classification, object detection, and segmentation. Image segmentation is a dense classification of every pixel in an image, while image classification usually focuses on a single object and tells us what the image shows. Object detection associates each object with its location in the image, usually expressed as a bounding box drawn on the original image. Image captioning summarizes the content of the whole image.

Here we combine ideas from these tasks into something I will call segment labeling: we divide a pre-segmented image into m x n equal-size blocks and give each block a label. The block label is determined by the principal segment in the block, i.e. the segment class that covers the largest area of the block represents it. For example, suppose we partition a segmentation image from PASCAL VOC2012 into 3x3 equal blocks, as illustrated below.

Figure: Left: a segmentation image. Middle: the segmentation image divided into 3x3 blocks. Right: the 3x3 label matrix given by the main segment class in each block.

We apply the same preprocessing to all segmentation images in PASCAL VOC 2012. The resulting label matrices can be saved in a NumPy array and loaded by our model later.
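A minimal sketch of segment labeling, assuming the masks are PASCAL VOC2012 `SegmentationClass` PNGs (pixel values are class numbers 0 to 20, with 255 marking void borders). The function names are hypothetical, and skipping void pixels, labeling an all-void block as 0 and breaking ties toward the lower class number are choices made here, not details given in the article.

```python
import numpy as np

VOID = 255  # VOC SegmentationClass masks use 255 for object borders / "difficult" regions


def block_labels(mask, m=3, n=3):
    """Label each of the m x n blocks of an (H, W) class-index mask with its majority class."""
    h, w = mask.shape
    # Block edges; np.linspace also copes with sizes that are not divisible by m or n.
    rows = np.linspace(0, h, m + 1).astype(int)
    cols = np.linspace(0, w, n + 1).astype(int)
    labels = np.zeros((m, n), dtype=np.int64)
    for i in range(m):
        for j in range(n):
            block = mask[rows[i]:rows[i + 1], cols[j]:cols[j + 1]].ravel()
            block = block[block != VOID]  # ignore void pixels when comparing areas
            if block.size:  # an all-void block keeps label 0 (background), an assumption
                labels[i, j] = np.bincount(block).argmax()  # class with the largest area
    return labels


def load_voc_mask(path):
    """Read a VOC2012 SegmentationClass PNG; its palette indices are the class numbers."""
    from PIL import Image  # Pillow, imported lazily so block_labels works without it
    return np.array(Image.open(path))


def build_label_matrices(masks, m=3, n=3):
    """Stack the label matrices of many masks into one (N, m, n) NumPy array."""
    return np.stack([block_labels(mask, m, n) for mask in masks])
```

For example, `np.save("label_matrices.npy", build_label_matrices(load_voc_mask(p) for p in sorted(pathlib.Path("VOCdevkit/VOC2012/SegmentationClass").glob("*.png"))))` would write all label matrices to one (hypothetically named) file that can be reloaded with `np.load`.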

Markov model

“In probability theory, a Markov model is a stochastic model used to model randomly changing systems. It is assumed that future states depend only on the current state, not on the events that occurred before it.” (Wikipedia)

First-order Markov models, the simplest kind of Markov model, have been used successfully in many sequence-modeling and control tasks.

A fundamental property of all Markov models is their memorylessness. A process satisfies the first-order Markov property if the probability of moving to a new state $s_{t+1}$ depends only on the current state $s_t$ and not on any earlier state, where $t$ is the current time. In other words, given the present state, the future and past states are independent. Formally, a stochastic process has the first-order Markov property if the conditional probability distribution of its future states, given both past and present values, depends only on the present state:

$$P(s_{t+1} \mid s_1, s_2, \ldots, s_t) = P(s_{t+1} \mid s_t)$$
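The article does not say how the probabilities are obtained from the frequency counts. Assuming the usual maximum-likelihood estimate, each transition probability is a normalized count, where $N(a, b)$ (notation introduced here) is the number of times state $a$ is immediately followed by state $b$:

$$\hat{P}(s_{t+1} = b \mid s_t = a) = \frac{N(a, b)}{\sum_{b'} N(a, b')}$$

Under the Markov property, probabilities over two steps follow by summing out the intermediate state (the Chapman-Kolmogorov equation), which amounts to multiplying transition matrices:

$$P(s_{t+2} = c \mid s_t = a) = \sum_{b} P(s_{t+1} = b \mid s_t = a)\, P(s_{t+2} = c \mid s_{t+1} = b)$$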

Based on this equation, we treat an ordinary image as a pixel-level sequence of events: the value of the next pixel $t+1$ (its intensity, or in our case its block label) depends only on the value of the current pixel $t$. However, a 2-D image has 4 directions, and we do not know in advance the spatial relationship between the next pixel and the current one. A sequence could run from top to bottom, from bottom to top, from right to left, or from left to right, so each pixel belongs to 4 possible sequences. We therefore need 4 Markov models for a 2-D image problem, each recording the probability relationships for one direction of pixel sequence.

Concretely, one Markov model records the probability distribution of the right neighbor pixel given the known center pixel, one records that of the left neighbor, one that of the top neighbor, and the last one that of the bottom neighbor.
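The four directional tables can be sketched with NumPy as follows. This is a minimal sketch under assumptions not stated in the article: label matrices are integer NumPy arrays with class numbers 0 to 20, the probabilities are row-normalized counts, and the names `directional_counts` and `to_probabilities` are hypothetical.

```python
import numpy as np

NUM_CLASSES = 21  # assumption: VOC class numbers 0 (background) to 20

# Offset (row, col) from a centre block to the neighbour each directional model predicts.
DIRECTIONS = {"right": (0, 1), "left": (0, -1), "top": (-1, 0), "bottom": (1, 0)}


def directional_counts(label_mats, num_classes=NUM_CLASSES):
    """Return {direction: C}, where C[a, b] counts centre label a next to neighbour label b."""
    counts = {d: np.zeros((num_classes, num_classes), dtype=np.int64) for d in DIRECTIONS}
    for mat in label_mats:
        rows, cols = mat.shape
        for d, (dr, dc) in DIRECTIONS.items():
            # Aligned slices: centre[k] and neigh[k] always form a (block, neighbour) pair.
            centre = mat[max(0, -dr):rows - max(0, dr), max(0, -dc):cols - max(0, dc)]
            neigh = mat[max(0, dr):rows - max(0, -dr), max(0, dc):cols - max(0, -dc)]
            np.add.at(counts[d], (centre.ravel(), neigh.ravel()), 1)
    return counts


def to_probabilities(counts):
    """Row-normalise a count table into P(neighbour = b | centre = a); unseen rows stay zero."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros(counts.shape), where=totals > 0)
```

Note that the left count table is the transpose of the right one (and top of bottom), since every horizontal pair is seen from both ends; the normalized conditional distributions still differ because each row is divided by a different total.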

To visualize the frequency distributions, we can use a Python NLP tool, nltk, given that we have encoded all classes in PASCAL VOC2012 as class numbers 1 to 20. The following screenshot shows our input data (left) and the frequency distribution table of the left-neighbor pixel (right). Besides this table, there are 3 more frequency distribution tables, one for each remaining direction.
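A dependency-free sketch of the left-neighbor frequency table. `nltk.ConditionalFreqDist` accepts the same kind of (condition, sample) pairs and provides a `tabulate()` method; here `collections.Counter` stands in for it so the example runs without nltk. The helper names are hypothetical, and label matrices are assumed to be lists of rows or 2-D arrays.

```python
from collections import Counter, defaultdict


def left_neighbour_pairs(mat):
    """Yield (centre label, left-neighbour label) for every block that has a left neighbour."""
    for row in mat:
        for j in range(1, len(row)):
            yield row[j], row[j - 1]


def freq_table(pairs):
    """Conditional frequency table: table[centre][neighbour] = count."""
    table = defaultdict(Counter)
    for centre, neighbour in pairs:
        table[centre][neighbour] += 1
    return table


def tabulate(table):
    """Print centre labels as rows and neighbour labels as columns."""
    cols = sorted({n for counter in table.values() for n in counter})
    print("    " + "".join(f"{c:>4}" for c in cols))
    for centre in sorted(table):
        print(f"{centre:>4}" + "".join(f"{table[centre][c]:>4}" for c in cols))
```

For example, `tabulate(freq_table(left_neighbour_pairs([[15, 15, 0], [15, 0, 0], [0, 0, 0]])))` prints how often each class (rows) has each class as its left neighbor (columns).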

The Markov models give us the spatial probability distribution between any two neighboring pixels or blocks. This is the foundation for the whole image, since an image can be viewed as a collection of neighboring pixel pairs. So far, we have successfully modeled the probability distribution of pixel pairs in the label matrices.

Furthermore, we can fill in the blanks of an incomplete matrix using the chain rule. Using the 4 frequency tables of pixel pairs, we calculate the probability that one class follows another. Probabilities propagate along chains of neighbors, so even the farthest known pixel has a small influence on the next unknown pixel; the closer the pixel, the greater its influence. The following figures demonstrate how we predict an unknown block by making full use of the known image context. The left sub-figure shows a hypothetical example in which the bottom-right block is unlabeled. The middle sub-figure shows all influences on the block being predicted: a dashed arrow means the influence can travel along several paths, while a solid arrow marks a single direct path. In the case illustrated on the right, all candidate paths are already shortest paths of the same length, so we average their influences. Influences from different pixels are then combined with a weighted average according to their distance.
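One possible reading of this procedure, sketched in Python: for each known block, chain the directional transition matrices along every shortest grid path to the blank block, average over the paths, and combine blocks with distance weights. The article does not give the weights; the `1 / distance` weighting, the marginalization over intermediate labels, and the names `path_orders` and `predict_block` are all assumptions made for illustration.

```python
import itertools

import numpy as np

# Unit steps (row, col); "right" moves from a block to its right-hand neighbour.
STEPS = {"right": (0, 1), "left": (0, -1), "top": (-1, 0), "bottom": (1, 0)}


def path_orders(dr, dc):
    """Distinct step orderings of every shortest grid path covering the offset (dr, dc)."""
    moves = ["bottom" if dr > 0 else "top"] * abs(dr) + ["right" if dc > 0 else "left"] * abs(dc)
    return set(itertools.permutations(moves))


def predict_block(mat, blank, probs, num_classes):
    """Return a distribution over the label of mat[blank] given all other (known) blocks.

    probs[d][a, b] = P(neighbour in direction d has label b | block has label a).
    """
    total = np.zeros(num_classes)
    for (i, j), label in np.ndenumerate(mat):
        if (i, j) == tuple(blank):
            continue  # the unknown block itself carries no evidence
        dr, dc = blank[0] - i, blank[1] - j
        per_path = []
        for order in path_orders(dr, dc):
            dist = np.eye(num_classes)[label]  # one-hot: this block's label is known
            for step in order:
                dist = dist @ probs[step]  # chain rule, marginalising intermediate labels
            per_path.append(dist)
        weight = 1.0 / (abs(dr) + abs(dc))  # hypothetical weighting: closer blocks count more
        total += weight * np.mean(per_path, axis=0)  # equal-length paths are averaged
    s = total.sum()
    return total / s if s > 0 else total
```

If paths are enumerated like this, their number grows as binomial(|dr| + |dc|, |dr|), e.g. 48,620 orderings from corner to corner of a 10x10 matrix, which would be consistent with the slowdown on larger matrices noted in the test section. Summing over all monotone paths by dynamic programming over the grid is one way to avoid explicit enumeration.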

Test

After implementing all of the above processes in Python, we can test our model on a test set of 200 image label matrices. For each 3x3 test label matrix, we blank out one of the 9 blocks, predict its label from the remaining 8 known blocks, and compare the prediction with the ground truth. In the following screenshot, the number 0 in the “original img” marks the blank block, and we print all probability distributions produced by the Markov models. We take the classes with the highest and second-highest probability as our predictions and compare them with the ground truth. Over all 200 test matrices, our approach for one unknown block in a 3x3 PASCAL VOC2012 label matrix reaches a top-1 accuracy of 76.0% and a top-2 accuracy of 95.0%. That is not bad at all.

In summary, segment labeling predicts unknown segments from the other, truly labeled segments, which provide contextual information about the image. There are also shortcomings, however: computing the probability chains becomes slow as the matrix grows, e.g. to 5x5 or 10x10, where it takes a few minutes to obtain results for a single image.
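The evaluation loop can be sketched as below. This is a hedged sketch: the article does not say which block is blanked, so a seeded random position is an assumption here, and `predict` is any callable returning one probability per class (the name `topk_accuracy` is hypothetical). Because 0 may also be a real class number, the predictor should ignore the blank position rather than read its value.

```python
import numpy as np


def topk_accuracy(test_mats, predict, ks=(1, 2), seed=0):
    """Blank one block per label matrix, predict it from the rest, and score top-k hits.

    predict(masked_mat, blank) must return one probability per class.
    """
    rng = np.random.default_rng(seed)  # assumption: the blank position is drawn at random
    hits = {k: 0 for k in ks}
    for mat in test_mats:
        blank = (int(rng.integers(mat.shape[0])), int(rng.integers(mat.shape[1])))
        truth = mat[blank]
        masked = mat.copy()
        masked[blank] = 0  # the article's printout marks the blank block with 0
        ranked = np.argsort(predict(masked, blank))[::-1]  # most probable class first
        for k in ks:
            hits[k] += int(truth in ranked[:k])
    return {k: hits[k] / len(test_mats) for k in ks}
```

The returned dictionary maps each k to its top-k accuracy, matching the top-1 and top-2 figures reported above.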

As in our case, this approach can help with real-world vision problems, e.g. filling in blind spots in an image or recovering lost information in a scene.

Thanks for reading my story. The corresponding code will be uploaded to GitHub later. In this post, we used Markov models for image context understanding on PASCAL VOC2012. If you have more computational resources, you can try tiling the segmentation images into more blocks so that the label matrix represents the image scene more precisely.
