Computer Vision: How Can Our Computers See and Make Sense Out of What They See?

Definition: Computer vision is the field of computer science that enables computers to extract high-level information from digital images and videos, typically by analyzing them with machine learning algorithms. From an engineering point of view, its main objective is to automate tasks that the human visual system can do, and so reduce human effort.

As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video frames, views from multiple cameras, or multidimensional data from a medical scanner.

History:

Computer vision began in the late 1960s at universities that were pioneering artificial intelligence. It was meant to mimic the human visual system, as a stepping stone towards robots with human-like intelligence. An early attempt was made in 1966, when a camera was attached to a computer with the goal of having the computer describe what it saw.

At that time, the goal was to go beyond digital image processing and recover the three-dimensional structure of a scene, as a step towards full scene understanding. Studies from the 1970s formed the early basis of many modern computer vision algorithms, including edge extraction from images, line labelling, polyhedral and non-polyhedral modelling, optical flow, and motion estimation.

The next decade saw further work on the theory and concepts of computer vision, such as scale-space, the inference of shape from cues such as shading, texture and focus, and contour models known as snakes.

By the 1990s, some of these research topics were receiving more attention and began to yield practical results. Research on projective 3D reconstruction led to a better understanding of camera calibration, and variations of graph cut were used to solve image segmentation. This decade also marked a milestone: statistical learning techniques were used in practice to recognize faces in images.

Towards the end of the decade, the technology took a significant leap, with developments in image morphing, panoramic image stitching and image-based rendering.

The transition from classical machine learning to deep learning brought more complex optimization frameworks into practice, which helped realize more of the technology's potential.

Technical terms used here:

1. Optical flow: The pattern of apparent motion of objects, surfaces and edges in a visual scene, caused by relative motion between an observer and the scene.

Figure: Optical flow experienced by a rotating observer. The magnitude and direction of the optical flow at each point are represented by the length and direction of the arrow.

2. Optical flow sensor: A vision sensor in which an image sensor chip is connected to a processor that measures optical flow and produces an output based on it.

3. Motion estimation: The process of determining motion vectors that describe how one 2D image transforms into another, usually by comparing adjacent frames of a video.

4. Polyhedron modelling: The physical construction of a polyhedron from solid materials. “Polyhedron” comes from Greek and means “many bases”. In computer vision, polyhedral models can be used to build 3D models that help in training machine learning algorithms.

5. Camera Calibration: The estimation of a camera's intrinsic and extrinsic parameters, which makes it possible to correct lens distortion and take accurate measurements from its images.

6. 3D reconstruction from multiple images: The creation of three-dimensional models from a set of images.

7. Contour models called snakes: A framework in computer vision for delineating the outline of an object in a possibly noisy 2D image.
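To make the idea of motion estimation from the list above more concrete, here is a minimal sketch of block matching, one common way to estimate a motion vector between two adjacent frames. It is an illustration only: the function names, the tiny 6 x 6 frames and the search radius are assumptions made for this example, and real systems use far more refined methods.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def estimate_motion(prev, curr, x, y, size=2, search=2):
    """Find the (dx, dy) shift that best matches the block at (x, y) in prev
    to a block in curr, searching within +/- `search` pixels."""
    height, width = len(curr), len(curr[0])
    best_cost, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            nx, ny = x + dx, y + dy
            # Skip candidate blocks that fall outside the frame.
            if nx < 0 or ny < 0 or nx + size > width or ny + size > height:
                continue
            cost = sad(prev, curr, x, y, nx, ny, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dx, dy)
    return best_vec


def blank(width, height):
    return [[0] * width for _ in range(height)]


# Hypothetical frames: a bright 2 x 2 patch moves from (1, 1) to (3, 2).
patch = [[200, 210], [220, 230]]
prev_frame, curr_frame = blank(6, 6), blank(6, 6)
for j in range(2):
    for i in range(2):
        prev_frame[1 + j][1 + i] = patch[j][i]
        curr_frame[2 + j][3 + i] = patch[j][i]

print(estimate_motion(prev_frame, curr_frame, 1, 1))  # (2, 1)
```

Repeating this for every block of a frame gives a field of motion vectors, which is a coarse approximation of the optical flow described in item 1.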

Related Fields:

1. Artificial Intelligence: The field of computer science that deals with giving machines human-like capabilities for reasoning and for carrying out specific tasks.

2. Information Engineering: This field came into the limelight in the 21st century. It deals with the generation, distribution, analysis and use of data, information and knowledge in systems.

3. Neurobiology: Neurobiology, and specifically the study of the human visual system, plays a vital role in the development of computer vision. It has helped us understand how a biological vision system works using neurons, and to mimic it using concepts from artificial intelligence and machine learning.

4. Solid State Physics: Computer vision systems depend on image sensors, which detect electromagnetic radiation, typically visible or infrared light. These sensors are designed using quantum physics, and the way light interacts with surfaces and passes through lenses is explained by optics, a core part of most imaging systems.

5. Signal Processing: Many methods for processing one-variable signals, typically signals over time, extend naturally to the two-variable or multi-variable signals found in images. Image processing is therefore also closely related to computer vision.

Working:

To understand how computer vision works, take the example of this greyscale image buffer of Abraham Lincoln. The brightness of each pixel is represented by a single 8-bit number. Each bit can take one of two values, 0 or 1, so 8 bits give 2^8 = 256 possible values, ranging from 0 (black) to 255 (white).
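The arithmetic behind the 0 to 255 range can be checked with a few lines of Python. This is only a small sketch; the helper name `gray_levels` is made up for the example.

```python
def gray_levels(bits):
    """Number of distinct values an unsigned integer of `bits` bits can hold."""
    return 2 ** bits


levels = gray_levels(8)
darkest, brightest = 0, levels - 1
print(levels, darkest, brightest)  # 256 0 255

# Python's bytes type holds exactly such 8-bit values (0 to 255).
row = bytes([0, 128, 255])
print(list(row))  # [0, 128, 255]
```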

Figure: Pixel data diagram

Pixel values are almost universally stored, at the hardware level, in a one-dimensional array. The image is therefore a long list of unsigned chars, as shown below:

{157, 153, 174, 168, 150, 152, 129, 151, 172, 161, 155, 156,
155, 182, 163, 74, 75, 62, 33, 17, 110, 210, 180, 154,
180, 180, 50, 14, 34, 6, 10, 33, 48, 106, 159, 181,
206, 109, 5, 124, 131, 111, 120, 204, 166, 15, 56, 180,
194, 68, 137, 251, 237, 239, 239, 228, 227, 87, 71, 201,
172, 105, 207, 233, 233, 214, 220, 239, 228, 98, 74, 206,
188, 88, 179, 209, 185, 215, 211, 158, 139, 75, 20, 169,
189, 97, 165, 84, 10, 168, 134, 11, 31, 62, 22, 148,
199, 168, 191, 193, 158, 227, 178, 143, 182, 106, 36, 190,
205, 174, 155, 252, 236, 231, 149, 178, 228, 43, 95, 234,
190, 216, 116, 149, 236, 187, 86, 150, 79, 38, 218, 241,
190, 224, 147, 108, 227, 210, 127, 102, 36, 101, 255, 224,
190, 214, 173, 66, 103, 143, 96, 50, 2, 109, 249, 215,
187, 196, 235, 75, 1, 81, 47, 0, 6, 217, 255, 211,
183, 202, 237, 145, 0, 0, 12, 108, 200, 138, 243, 236,
195, 206, 123, 207, 177, 121, 123, 200, 175, 13, 96, 218};

Laid out this way, the data may appear two-dimensional, but inside the computer it is stored as a one-dimensional array, since computer memory is simply a linear sequence of addresses. The figure below shows how the pixels of a two-dimensional image map to a one-dimensional array in memory.

Figure: How pixels are stored in memory
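The mapping from a two-dimensional pixel position to a position in the one-dimensional array can be sketched as follows, assuming the usual row-major layout (rows stored one after another). The values are the first two rows of the buffer listed above, which is 12 pixels wide; the helper names `offset` and `pixel_at` are invented for this example.

```python
WIDTH = 12  # pixels per row in the Lincoln buffer

# First two rows of the buffer, stored as one flat sequence of 8-bit values.
pixels = bytes([
    157, 153, 174, 168, 150, 152, 129, 151, 172, 161, 155, 156,
    155, 182, 163,  74,  75,  62,  33,  17, 110, 210, 180, 154,
])


def offset(x, y, width=WIDTH):
    """Position in the flat array of the pixel at column x, row y."""
    return y * width + x


def pixel_at(buf, x, y, width=WIDTH):
    """Brightness of the pixel at column x, row y."""
    return buf[offset(x, y, width)]


print(offset(3, 1))            # 15
print(pixel_at(pixels, 3, 1))  # 74
```

Moving one pixel to the right adds 1 to the offset, while moving one row down adds the full row width.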

Coming back to the Abraham Lincoln example, adding color makes things more complicated. Computers usually represent the color of each pixel as a combination of three values:

Red, Green and Blue (RGB color series)

Each of these is stored on the same 0–255 scale, so the computer has to store three values for each pixel instead of one. To store the 12 x 16 Lincoln image above in color, we need 12 x 16 x 3 = 576 values.

Figure: RGB colors
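As a rough sketch of what this means for storage, the snippet below allocates a buffer for a 12 x 16 color image and addresses one channel of one pixel. It assumes an interleaved layout (R, G, B stored next to each other for each pixel), which is common but not the only option; the helper names are invented for this example.

```python
WIDTH, HEIGHT, CHANNELS = 12, 16, 3  # 3 channels: red, green, blue

total_values = WIDTH * HEIGHT * CHANNELS
print(total_values)  # 576

# One byte per channel value, all initialised to 0 (black).
image = bytearray(total_values)


def rgb_offset(x, y, channel, width=WIDTH):
    """Flat index of one channel (0=R, 1=G, 2=B) of the pixel at (x, y)."""
    return (y * width + x) * CHANNELS + channel


def get_rgb(buf, x, y):
    """Return the (R, G, B) tuple of the pixel at (x, y)."""
    start = rgb_offset(x, y, 0)
    return tuple(buf[start:start + CHANNELS])


# Paint the pixel at column 2, row 1 pure red.
image[rgb_offset(2, 1, 0)] = 255
print(get_rgb(image, 2, 1))  # (255, 0, 0)
```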

We now have a basic understanding of how images are stored and how pixels work. Every image involves a large number of pixel values that must be stored and iterated over, which calls for efficient algorithms.

Computer vision used to require a lot of manual coding by developers, but machine learning and deep learning algorithms have since automated many of these tasks.

Evolution of Computer Vision:

Much of computer vision comes down to finding specific shapes and patterns in the image data provided to the computer.

Earlier, most of this work had to be done by the developers themselves: they had to take all the measurements, find the patterns, and then enter the data into the machine. The accuracy with which computers identified objects and patterns was low back then, partly because it was difficult to get access to enough data to train the machine.

Then came machine learning algorithms, which automated much of the process. They offered a different approach to computer vision problems: smaller, purpose-built applications could be used to identify specific patterns in images.

These were statistical learning algorithms, including linear regression, logistic regression, decision trees and Support Vector Machines (SVM).
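To give a feel for how such a statistical learning algorithm is applied to image data, here is a minimal sketch of logistic regression that learns to separate "dark" from "bright" image patches using a single hand-chosen feature, the mean brightness of the patch. The training data, feature choice, learning rate and function names are all assumptions made up for this illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_brightness(patch):
    """Feature: average pixel value of a patch, scaled to the range 0..1."""
    return sum(patch) / (len(patch) * 255)


def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (bright) or 0 (dark) for feature value x."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Hypothetical training set: feature values with labels 0 = dark, 1 = bright.
features = [0.05, 0.10, 0.20, 0.30, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(features, labels)

dark_patch = [10, 20, 30, 40]
bright_patch = [200, 220, 240, 250]
print(predict(w, b, mean_brightness(dark_patch)))    # 0
print(predict(w, b, mean_brightness(bright_patch)))  # 1
```

The developer still has to choose the feature by hand here; only the decision threshold is learned from the data.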

The industry really started to boom with the introduction of deep learning, which has helped computers identify objects and patterns with very high accuracy, with figures as high as 99% reported for some tasks. Deep learning uses artificial neural networks to pick out features such as edges and vertices and to identify the patterns and objects in an image. With suitable hardware, neural networks can process huge amounts of data in a short time, and training on more data generally improves recognition accuracy. One type of neural network that has led to great advances in computer vision is the Convolutional Neural Network (CNN).
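The basic operation inside a convolutional layer can be sketched in plain Python: a small kernel slides over the image, and at each position the overlapping values are multiplied and summed. The example below uses a fixed, hand-written Sobel kernel that responds to vertical edges; in a real CNN the kernel values are learned during training rather than written by hand. The function name and the tiny test image are assumptions for this illustration.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image without padding and return the result.

    Like most deep learning libraries, this does not flip the kernel, so
    strictly speaking it computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0
            for j in range(kh):
                for i in range(kw):
                    acc += image[y + j][x + i] * kernel[j][i]
            row.append(acc)
        output.append(row)
    return output


# Sobel kernel that responds to left-to-right changes in brightness.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Hypothetical 4 x 4 image: dark left half, bright right half.
edge_image = [[0, 0, 255, 255] for _ in range(4)]
flat_image = [[128, 128, 128, 128] for _ in range(4)]

print(convolve2d(edge_image, SOBEL_X))  # [[1020, 1020], [1020, 1020]]
print(convolve2d(flat_image, SOBEL_X))  # [[0, 0], [0, 0]]
```

The strong response on the image with an edge, and the zero response on the uniform image, is what lets early layers of a CNN act as edge detectors.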

Translated from: https://medium.com/swlh/computer-vision-how-can-our-computers-see-and-make-sense-out-of-what-they-see-f6dd777aed07
