How Does the Human Brain Recognize Moving Objects?

Author: Owl of Minerva
Link: https://www.zhihu.com/question/26430414/answer/32936529
Source: Zhihu
Copyright belongs to the author. For reprints, please contact the author for permission.

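How the human brain recognizes motion is a big question, and it has not yet been fully answered. Studying the brain's motion-detection function at the cognitive level alone probably cannot fully explain why human motion recognition is so reliable. We also need to understand the eye's gaze ability and the neural basis that implements it. This is exactly what computer motion tracking today rarely takes into account.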

1. Visual Pathway and Retinotopy
[Image: https://i-blog.csdnimg.cn/blog_migrate/c816c7dd3f525cc3a526742d36dbf325.png]
The figure above is a schematic cross-section of the human visual pathway [1]. Each eye has a nasal and a temporal visual field. After the retina senses the light signal, the left and right optic nerves carry it backward. At the optic chiasm the nasal and temporal signals of each eye are split and continue backward. After the split, the left-side fibers carry only signals from the right side of visual space (the nasal field of the left eye and the temporal field of the right eye), and the right-side fibers carry only signals from the left side of visual space (the temporal field of the left eye and the nasal field of the right eye). Each side passes through its lateral geniculate nucleus (LGN) and continues to the primary visual cortex (V1) in the occipital lobe and on to higher visual areas. As one would expect, an object sensed by the retina is mapped through this pathway onto the visual cortex with its spatial relationships preserved [2]. This relationship is called retinotopy [3].
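The routing at the optic chiasm described above can be written as a small lookup. This is only a sketch of the wiring rule stated in the text. The function name `receiving_hemisphere` and the string labels are made up for illustration.

```python
def receiving_hemisphere(eye: str, visual_field: str) -> str:
    """Return the hemisphere ('left' or 'right') whose LGN and V1 receive
    the given half-field ('nasal' or 'temporal') of the given eye."""
    if eye not in ("left", "right"):
        raise ValueError("eye must be 'left' or 'right'")
    if visual_field not in ("nasal", "temporal"):
        raise ValueError("visual_field must be 'nasal' or 'temporal'")

    # Step 1: convert the eye-relative half-field into a side of visual space.
    # The nasal field of an eye lies toward the nose, i.e. on the side of
    # space opposite to that eye; the temporal field lies on the eye's own side.
    if visual_field == "nasal":
        side_of_space = "right" if eye == "left" else "left"
    else:
        side_of_space = eye

    # Step 2: after the chiasm, each side of visual space is carried to the
    # opposite hemisphere.
    return "left" if side_of_space == "right" else "right"


for eye in ("left", "right"):
    for field in ("nasal", "temporal"):
        print(eye, field, "->", receiving_hemisphere(eye, field))
```

The left hemisphere ends up with the left eye's nasal field and the right eye's temporal field, which is the right side of visual space, matching the description above.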
[Image: https://i-blog.csdnimg.cn/blog_migrate/642379ab50e7386ed33cb9e40c85d143.jpeg]
[Image: https://i-blog.csdnimg.cn/blog_migrate/0148ce550f786e5aae1e853405c47633.jpeg]
Based on this topographic relationship, researchers can even reconstruct the text or images a person is looking at from fMRI signals collected in the visual cortex, the so-called "mind reading" [4-7].
[Image: https://i-blog.csdnimg.cn/blog_migrate/dd36b72bbdd8e38860a4b31f406ea06b.jpeg]

2. Hierarchical and Parallel Organization of the Visual Cortex
After visual signals reach V1, they continue on to higher visual areas. The hierarchy among the visual areas was first reconstructed from anatomical studies of the macaque monkey visual cortex [7]:
[Image: https://i-blog.csdnimg.cn/blog_migrate/106a835b6fcd4d7452c2b94de78caf7b.png]
Building on this, the hierarchical organization of the human visual cortex was gradually worked out:
[Image: https://i-blog.csdnimg.cn/blog_migrate/cfa8b976348e085dbaac96a9793a2acd.png]
Visual information is passed step by step from the primary to the higher visual areas. What the brain represents becomes increasingly complex and abstract along the way: from "patterns" to concrete "objects", and then to the properties of objects and the relationships between them. Researchers also noticed that this step-by-step transmission through the cortex can be divided roughly into two pathways, the ventral pathway (ventral stream) and the dorsal pathway (dorsal stream) [8].
[Image: https://i-blog.csdnimg.cn/blog_migrate/43f75176d45540215bfa41fd59a1eb33.png]
By function, they are also called the "What" and "Where" pathways:
[Image: https://i-blog.csdnimg.cn/blog_migrate/2378efc1625fc75bd719dbcb96c6bb11.png]
The "Where" pathway handles the position and motion of objects, and the "What" pathway handles object recognition [9]. This model is still widely criticized, however.
[Image: https://i-blog.csdnimg.cn/blog_migrate/03046ab4b51df75a9c4d63cacf570c67.png]

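3. Recognition and Tracking
Given the hierarchy described above, it follows that damage to any area in the hierarchy will affect the brain's recognition and tracking of motion, and various studies have confirmed this [10-14]. The perception and understanding of motion is not confined to one or a few brain areas; it is an activity in which the whole brain takes part [15]. Besides passive motion perception, the brain also performs active motion tracking, which relies on the gaze ability (conjugate gaze).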

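Humans have four systems that handle conjugate gaze:
The saccadic system, the most frequently used one, is engaged when a person actively shifts the direction of gaze [16];
The pursuit system tracks moving objects;
The optokinetic reflex system: when an object appears in the visual field the eyes follow it, and when the object disappears the eyes reflexively return to where they were fixating when the object first appeared;
The vestibulo-ocular reflex system compensates for head movement to keep the image stable; it is the eye's "three-axis stabilized platform".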
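Of the four systems, the vestibulo-ocular reflex is the easiest to caricature in code. The sketch below is a toy model rather than a physiological one. It assumes a single rotation axis and angles in degrees, and it uses a hypothetical `gain` parameter (1.0 means the eye counter-rotates by exactly the head displacement). It shows only why counter-rotation keeps the gaze direction, and therefore the image, stable.

```python
import math


def simulate_vor(head_angles, gain=1.0):
    """Return the gaze angle (head angle + eye-in-head angle) at each step
    when the eye counter-rotates against every change in head angle."""
    eye_in_head = 0.0
    previous_head = head_angles[0]
    gaze = []
    for head in head_angles:
        # Counter-rotate the eye by the head displacement since the last step.
        eye_in_head -= gain * (head - previous_head)
        previous_head = head
        gaze.append(head + eye_in_head)
    return gaze


# Head oscillating by +/-10 degrees.
head = [10.0 * math.sin(0.1 * t) for t in range(100)]

with_reflex = simulate_vor(head, gain=1.0)
without_reflex = simulate_vor(head, gain=0.0)

print("largest gaze deviation with reflex:   ", max(abs(g) for g in with_reflex))
print("largest gaze deviation without reflex:", max(abs(g) for g in without_reflex))
```

With `gain=1.0` the gaze stays at its starting direction up to floating-point rounding. With `gain=0.0` the gaze simply follows the head.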

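The human eye is therefore like a camera that is highly sensitive, fast to focus, fast to respond, and stabilized on three axes. We cannot yet build a camera that matches the eye's focusing ability, let alone one of the same size. One important way in which the brain currently differs from computer motion recognition and tracking is that the brain's recognition and tracking run in real time and feed back to control active pursuit by the eyes. Under this mechanism, information processed by the "where" and "what" pathways is passed to eye-movement control centers such as the frontal eye fields (FEF), which produce the eye-movement response. The mechanism therefore has the following advantages: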

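It can always keep the target at the center of the visual field and at the focal point, where acuity is highest
If one attempt at recognition and tracking fails, the eye can go back and look again
It can distinguish objects by their motion relative to one another
It can be linked with memory, which helps recognition and tracking
It can anticipate motion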
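The value of feedback and motion anticipation can be illustrated with a one-dimensional toy controller. This is a sketch under simplifying assumptions, not a model of the oculomotor system. The function `track`, the proportional `gain`, and the one-step velocity estimate are all hypothetical choices made for illustration.

```python
def track(target_positions, gain=0.5, predict=True):
    """Simulate a 1-D gaze controller.

    Returns the retinal error (target position minus gaze position)
    observed at each step, before the correction for that step is applied.
    """
    gaze = 0.0
    previous_target = None
    errors = []
    for target in target_positions:
        error = target - gaze  # how far the target is from the center of gaze
        errors.append(error)

        # Motion anticipation: estimate velocity from the last two observations.
        velocity = 0.0
        if predict and previous_target is not None:
            velocity = target - previous_target

        # Feedback correction plus feed-forward prediction.
        gaze += gain * error + velocity
        previous_target = target
    return errors


# A target moving at a constant 2 units per step.
target = [2.0 * t for t in range(50)]

passive = track(target, gain=0.0, predict=False)       # camera never moves
feedback_only = track(target, gain=0.5, predict=False)
with_prediction = track(target, gain=0.5, predict=True)

print("final error, passive camera: ", passive[-1])
print("final error, feedback only:  ", feedback_only[-1])
print("final error, with prediction:", with_prediction[-1])
```

In this toy setting the passive camera's error grows with the target's position. Feedback alone settles at a constant lag of velocity divided by gain, and adding the velocity estimate drives the error toward zero.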

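Since no camera with such capabilities exists at present, current computer motion recognition and tracking is mostly based on processing passively captured images; that is, focus and camera direction are not adjusted according to the processing results. Even under this constraint, adding learning ability to the tracking algorithm can still greatly improve tracking speed and accuracy, as in the Tracking-Learning-Detection (TLD) method [17]:
[Image: https://i-blog.csdnimg.cn/blog_migrate/ce283b0e3967d1598bafb38699c4292e.png]
In image understanding, the brain has both bottom-up and top-down mechanisms, and the two reinforce each other.
In imaging modalities, several modalities can be combined to make up for the camera's inability to look back.
In computation, distributed computing can improve real-time performance.
If the processing results could be fed back to the camera control, that would probably help a great deal.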
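The idea behind combining tracking, learning and detection can be shown with a deliberately simplified loop. This is not the TLD algorithm of [17], which operates on image patches with its own tracker, detector and learning rules. The sketch works on 1-D "frames", uses sum-of-absolute-differences template matching, and all names and parameters (`tld_like`, `radius`, `threshold`) are assumptions made for illustration.

```python
def patch_distance(a, b):
    """Sum of absolute differences between two equally long patches."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(frame, template, positions):
    """Return (position, distance) of the window most similar to template,
    or None if no candidate position fits inside the frame."""
    width = len(template)
    best = None
    for p in positions:
        if 0 <= p <= len(frame) - width:
            d = patch_distance(frame[p:p + width], template)
            if best is None or d < best[1]:
                best = (p, d)
    return best


def tld_like(frames, init_pos, width=3, radius=2, threshold=2.0):
    """Return the estimated object position per frame (None when lost)."""
    templates = [frames[0][init_pos:init_pos + width]]  # learned object model
    pos = init_pos
    estimates = [pos]
    for frame in frames[1:]:
        result = None
        # Tracking: local search around the previous position.
        if pos is not None:
            nearby = range(pos - radius, pos + radius + 1)
            local = best_match(frame, templates[-1], nearby)
            if local is not None and local[1] <= threshold:
                result = local[0]
        # Detection: if tracking failed, scan the whole frame with every
        # learned template and re-initialize from the best hit.
        if result is None:
            hits = [best_match(frame, t, range(len(frame))) for t in templates]
            hits = [h for h in hits if h is not None and h[1] <= threshold]
            if hits:
                result = min(hits, key=lambda h: h[1])[0]
        # Learning: remember appearances that are not yet in the model.
        if result is not None:
            patch = frame[result:result + width]
            if all(patch_distance(patch, t) > 0 for t in templates):
                templates.append(patch)
        pos = result
        estimates.append(pos)
    return estimates


def make_frame(position, pattern, length=12):
    """Build a zero background with the pattern at position (None = hidden)."""
    frame = [0] * length
    if position is not None:
        frame[position:position + len(pattern)] = pattern
    return frame


frames = [
    make_frame(2, [5, 9, 5]),
    make_frame(3, [5, 9, 5]),
    make_frame(4, [5, 8, 5]),     # appearance changes slightly
    make_frame(None, [5, 8, 5]),  # object hidden
    make_frame(9, [5, 8, 5]),     # reappears far outside the search radius
    make_frame(8, [5, 8, 5]),
]
print(tld_like(frames, init_pos=2))
```

The local tracker loses the object when it disappears and could not find it again on its own, because it reappears outside the search radius. The full-frame detector, using the appearance learned while tracking, recovers it.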

That is all.
--------
[1] Standring, Susan. "Gray's anatomy." The anatomical basis of clinical practice 39 (2008).
[2] Tootell, R. B. H., et al. "Functional analysis of primary visual cortex (V1) in humans." Proceedings of the National Academy of Sciences 95.3 (1998): 811-817.
[3] Engel, S. A., G. H. Glover, and B. A. Wandell. "Retinotopic organization in human visual cortex and the spatial precision of functional MRI." Cerebral cortex 7.2 (1997): 181-192.
[4] Miyawaki, Yoichi, et al. "Visual image reconstruction from human brain activity using a combination of multiscale local image decoders." Neuron 60.5 (2008): 915-929.
[5] Kay, Kendrick N., and Jack L. Gallant. "I can see what you see." Nature neuroscience 12.3 (2009): 245-245.
[6] Stanley, Garrett B. "Reading and writing the neural code." Nature neuroscience 16.3 (2013): 259-263.
[7] Van Essen, David C., and John H. R. Maunsell. "Hierarchical organization and functional streams in the visual cortex." Trends in neurosciences 6 (1983): 370-375.
[8] Kandel, Eric R., James H. Schwartz, and Thomas M. Jessell, eds. Principles of neural science. Vol. 4. New York: McGraw-Hill, 2000.
[9] Ungerleider, Leslie G., and James V. Haxby. "'What' and 'where' in the human brain." Current opinion in neurobiology 4.2 (1994): 157-165.
[10] Grossman, Emily, et al. "Brain areas involved in perception of biological motion." Journal of cognitive neuroscience 12.5 (2000): 711-720.
[11] Vaina, Lucia M., et al. "Functional neuroanatomy of biological motion perception in humans." Proceedings of the National Academy of Sciences 98.20 (2001): 11656-11661.
[12] Grossman, Emily D., and Randolph Blake. "Brain areas active during visual perception of biological motion." Neuron 35.6 (2002): 1167-1175.
[13] Grezes, Julie, et al. "Does perception of biological motion rely on specific brain regions?" Neuroimage 13.5 (2001): 775-785.
[14] Saygin, Ayse Pinar. "Superior temporal and premotor brain areas necessary for biological motion perception." Brain 130.9 (2007): 2452-2461.
[15] Rokszin, Alice, et al. "Visual pathways serving motion detection in the mammalian brain." Sensors 10.4 (2010): 3218-3242.
[16] Robinson, D. A. "The mechanics of human saccadic eye movement." The Journal of physiology 174.2 (1964): 245-264.
[17] Kalal, Zdenek, Krystian Mikolajczyk, and Jiri Matas. "Tracking-learning-detection." Pattern Analysis and Machine Intelligence, IEEE Transactions on 34.7 (2012): 1409-1422.
