Designing safer cities through simulations

Every year, 1.25 million people are killed in traffic accidents. In 2016, road crashes resulted in 40,000 deaths and 4.6 million injuries in the United States alone. Unity has partnered with the City of Bellevue in Washington State to work towards reducing those numbers through technology.

By using machine learning and simulations, we believe we can identify unsafe intersections and mitigate the risks before actual accidents happen. In this blog post, we’ll talk about the initial steps of this collaboration: we’ll introduce the project, discuss the basic computer vision concepts behind it, and showcase how the Unity Engine is used to create simulated environments for training the machine learning models used in the solution.

The initiative

This partnership between Unity and the City of Bellevue is part of a project called Video Analytics towards Vision Zero.

It’s a long name, so let’s break it down to understand its scope.

  • Vision Zero is a multi-national, multi-entity road safety project focused on building a highway system with no fatalities or serious injuries involving road traffic.

  • Video Analytics towards Vision Zero is the technology initiative driven by the City of Bellevue to find and fix unsafe intersections by identifying “near miss” situations through video analysis. In other words, to automatically detect would-be accidents that were averted, and modify the intersection to prevent future occurrences.

The overarching goal of the initiative is to create a computer vision system that can leverage cameras spread across the city’s many intersections to identify the unsafe ones. By recognizing objects in the video streams – cars, people, bicycles, etc. – and their trajectories, the system is expected to give traffic planners input about the frequency and nature of incidents, allow dangerous situations to be reviewed, and generate more accurate general reports about those intersections. All this data will allow cities to make sure safety measures are put in place – such as redesigning crossing lanes, adjusting stop light timing, and introducing clearer signage – getting us closer to the zero-fatalities goal of Vision Zero.

This article focuses on the object recognition task, which is the initial piece of the safety system planned for the initiative. The Unity Engine’s ability to create synthetic representations of the real world is a valuable resource for computer vision applications, as we’ll explain below.

Training computer vision models

Computer vision is a subfield of artificial intelligence that aims to extract information from images and videos. There has been significant progress in this area over the last five years, and it has been widely applied to automatically making sense of real images. In particular, a computer vision model can learn to understand the video feeds from multiple intersections and what goes on around them.

This understanding requires detecting, classifying and tracking different elements of the image. By detecting, we mean locating each object or piece of scenery in the image. Classification provides information about the type of each detected part. Additionally, each object is tracked as an individual instance in the image – which is particularly important when dealing with sequences of images or videos. All this metadata allows the machine learning model to understand the image as a whole.

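To make these three kinds of metadata concrete, here is a minimal Python sketch of a per-object record; the field names and shapes are our own illustration, not a format used by the project:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TrackedObject:
    """One detected, classified and tracked element in a single video frame."""
    frame: int                       # index in the video sequence
    track_id: int                    # stays constant for the same instance across frames
    label: str                       # classification, e.g. "car", "pedestrian", "bicycle"
    bbox: Tuple[int, int, int, int]  # detection: (x_min, y_min, x_max, y_max) in pixels

# Example: the same pedestrian (track_id=7) observed in two consecutive frames.
observations = [
    TrackedObject(frame=0, track_id=7, label="pedestrian", bbox=(412, 370, 448, 465)),
    TrackedObject(frame=1, track_id=7, label="pedestrian", bbox=(415, 371, 451, 466)),
]
```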

Using real-world images is a common approach to providing bootstrapping data for computer vision models. The initial identification and classification metadata is produced through manual work, using specialized tools, in a process called annotation.

Here’s an example taken from the Vision Zero annotation tool:

Notice the bounding boxes created manually by the tool operators around each pedestrian in the street, as well as the “area of interest” defined by the dashed red line.

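As a hypothetical illustration of what one annotated frame from such a tool might produce (the field names and coordinates below are assumptions for illustration, not the Vision Zero tool’s actual format):

```python
# Hypothetical annotated-frame record; field names and values are illustrative.
annotation = {
    "image": "cam02_frame_000451.jpg",
    # The dashed red "area of interest", as a polygon of pixel coordinates.
    "area_of_interest": [(102, 340), (880, 335), (910, 690), (60, 700)],
    # Manually drawn bounding boxes: (x_min, y_min, x_max, y_max) plus a class label.
    "objects": [
        {"label": "pedestrian", "bbox": (412, 370, 448, 465)},
        {"label": "pedestrian", "bbox": (530, 362, 561, 450)},
        {"label": "car",        "bbox": (700, 390, 850, 470)},
    ],
}
```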

Once the images are annotated, part of the set is used for training the machine learning model, while a smaller hold-out portion of the dataset is used to evaluate the performance of the trained model on scenes it has never seen before.

On an abstract level, this is how the data is used for training supervised models:

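A minimal sketch of that flow, with the train/hold-out split from the previous paragraph made explicit (the 80/20 ratio is an assumption, and the training step itself is framework-specific and therefore only outlined):

```python
import random

def split_dataset(annotated_frames, holdout_ratio=0.2, seed=42):
    """Shuffle the annotated data, keeping most of it for training and holding
    out a smaller part to evaluate the model on scenes it has never seen."""
    rng = random.Random(seed)
    frames = list(annotated_frames)
    rng.shuffle(frames)
    cut = int(len(frames) * (1 - holdout_ratio))
    return frames[:cut], frames[cut:]

train_set, holdout_set = split_dataset([f"frame_{i:05d}" for i in range(1000)])
# A supervised trainer would now fit on train_set and report its metrics on
# holdout_set; the model itself is omitted here.
print(len(train_set), len(holdout_set))  # -> 800 200
```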

A common challenge with computer vision applications is that finding enough meaningful training and evaluation data is hard. Manual annotation of real-world images and videos is the norm, and was a big focus of the initial phase of the Video Analytics project. But it is a costly process, and the quality of the labels can be affected by operator fatigue, inconsistencies in procedures and other human factors.

Even discounting the costs and time, the data obtained through capture is still limited by what the real world can provide. If you need to train your model on frames with both bicycles and buses, you have to wait for them to show up in the same frame. If you want to see what your model does when there’s snow, or rain, or fog, then you need to befriend a meteorologist… and be ready to fire off the cameras when they say so.

Simulation is a good way to overcome these limitations. By providing full control of the contents – including full understanding of the nature of each element of the scene – a simulation environment can produce virtually infinite sets of training and evaluation data with absolutely accurate annotations, in a multitude of situations that can be either designed for specific cases or generated procedurally to cover as many scenarios as possible.

Scenes and Episodes

Let’s go over a couple of concepts at the core of simulations for computer vision: Scenes and Episodes.

Scenes comprise all the static and dynamic elements, as well as the parameters, that are modeled in a simulation environment. They include buildings, streets and vegetation (static elements); cars, pedestrians and bicycles (dynamic elements); and weather conditions, time of day (sun position) and fog (parameters).

Episodes are specific configurations of the scene elements – for example, a certain placement of pedestrians, a certain route for cars, the presence of rain, road conditions, etc. One can think of an episode as an instance of a scene.

In this picture, the boxes on the top row represent the individual elements of the scene: static assets, dynamic assets and parameters – in this case, weather. In the bottom box, all the pieces from the scene are combined into a complete episode.

When creating simulated data, we typically refer to episode variation as the process of generating different episodes for a given scene. This process can be tailored to create a comprehensive set of the situations expected to be found in the real-world application of the machine learning model being trained.

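As a minimal sketch of what episode variation could look like in code (the parameters, categories and value ranges below are illustrative assumptions, not the project’s actual configuration):

```python
import random
from dataclasses import dataclass

@dataclass
class Episode:
    """One concrete configuration (instance) of a scene's variable elements."""
    num_pedestrians: int
    num_cars: int
    num_bicycles: int
    weather: str               # assumed categories: clear, rain, fog, snow
    sun_elevation_deg: float   # stands in for time of day

def generate_episodes(n, seed=0):
    """Episode variation: procedurally sample many episodes of one scene."""
    rng = random.Random(seed)
    return [
        Episode(
            num_pedestrians=rng.randint(0, 20),
            num_cars=rng.randint(0, 15),
            num_bicycles=rng.randint(0, 5),
            weather=rng.choice(["clear", "rain", "fog", "snow"]),
            sun_elevation_deg=rng.uniform(0.0, 90.0),
        )
        for _ in range(n)
    ]

episodes = generate_episodes(1000)  # a virtually unlimited supply of labeled variations
```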

Using Unity for real-world simulations

Given the costs and limitations of gathering real-world data, it is natural to consider replacing or augmenting it with synthetic data generated by a game engine such as Unity. Thanks to recent advances in graphics hardware and rendering, and the advent of virtual and augmented reality, the Unity Engine has evolved into a complete 3D modeling tool able to generate highly photo-realistic simulations. This has been noticed by both industry and academia, and many projects have been developed to take advantage of Unity’s simulation capabilities. One of the most recognized projects is SYNTHIA.

Developed entirely on Unity by the Computer Vision Center (CVC) at the Universitat Autònoma de Barcelona (UAB), the project focuses on creating a collection of synthetic images and videos depicting street scenes in a diverse range of episode variations.

CVC has been pioneering computer vision research for the past 20 years, and its SYNTHIA dataset has become a seminal resource for those working on autonomous vehicle perception systems.

For Vision Zero, Unity joined forces once again with CVC to provide the City of Bellevue with the best technology and expertise available.

Using imagery and 3D models provided by the City of Bellevue, and leveraging the integration of Otoy’s OctaneRender with the Unity Engine, CVC took on creating a set of scenes that can be leveraged to improve both the training and the evaluation of the computer vision models built by Microsoft.

The first simulations

Vision Zero focuses on vehicle and pedestrian interactions at intersections. This means that a proper simulation needs to represent a few city areas in a high level of detail, from challenging camera angles, and in a multitude of situations – i.e. with variation of the objects in the scenes – in order to generate the data coverage required by the computer vision models.

The video below shows an intersection (116th Ave NE and NE 12th Street) in the City of Bellevue. The real camera picture at the 16-second mark is a good baseline for appreciating the amazing level of photorealism achieved in this simulation.

[Video demo: simulated intersection at 116th Ave NE and NE 12th Street]

Not only are the images impressively realistic, but because they are Unity assets, we have all the metadata needed for 100% error-free segmentation – i.e. pixel-level classification of everything in the scene, fundamental data for computer vision model training. There is also precise information about distances, depth and materials, eliminating the need for human annotation.

Here’s an example of depth metadata, taken from the video above:

The shades of gray represent the distance of each object from the camera: the darker, the closer. The sky has no data and is represented as pure black.

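As an illustration of this convention (assuming the simulation exports depth as per-pixel distances in metres; the clipping range and the NaN encoding for “no data” are our assumptions):

```python
import numpy as np

def depth_to_grayscale(depth_m, max_depth_m=200.0):
    """Map per-pixel camera distances to grayscale: the darker, the closer.
    Pixels with no data (e.g. the sky) are assumed to arrive as NaN and are
    rendered as pure black."""
    d = np.asarray(depth_m, dtype=float)
    gray = np.clip(d / max_depth_m, 0.0, 1.0) * 255.0
    gray[np.isnan(d)] = 0.0  # sky / no data -> black
    return gray.astype(np.uint8)

# Usage on a tiny 1x3 "image": a near object, a far object, and sky.
print(depth_to_grayscale(np.array([[2.0, 150.0, np.nan]])))  # -> [[  2 191   0]]
```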

Notice that, because we have very fine-grained information about the image, it’s possible to distinguish individual leaves in the trees and different elements of the building facades, for example.

Here is another snapshot from the video, showing semantic segmentation:

Compared to the manual image annotation tool shown earlier in this post, the difference in quality and precision is clear. Here, it’s possible to have pixel-level labeling for full semantic segmentation instead of just a bounding box for coarse object detection. There are many classes of segments, represented in the picture by the different colors. Notice the ability to correctly differentiate between cars and buses, and between streets and sidewalks. This is powerful metadata that the model can leverage to predict the overlap between different objects in the scene much more accurately than is possible with manually annotated data.

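As a small illustration of how such pixel-level labels can be consumed downstream (the class-ID palette below is our own assumption, not the one used by CVC):

```python
import numpy as np

# Assumed class-ID convention, for illustration only.
CLASSES = {0: "road", 1: "sidewalk", 2: "car", 3: "bus", 4: "pedestrian"}

def class_coverage(seg_mask):
    """Given a per-pixel class-ID mask (H x W integer array) exported from the
    simulation, return the fraction of the image covered by each class."""
    mask = np.asarray(seg_mask)
    return {name: np.count_nonzero(mask == cid) / mask.size
            for cid, name in CLASSES.items()}

# Usage on a toy 2x4 mask: half road, a quarter sidewalk, a quarter car.
print(class_coverage(np.array([[0, 0, 1, 2],
                               [0, 0, 1, 2]])))
```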

The next steps in our efforts are to start experimenting with episode variation strategies – different cars, more near misses, weather variation, etc. – and to generate a comprehensive dataset to be fed into the training pipeline for the computer vision model. We are also looking at new street intersections to generate as we scale out the project.

Initially, the models will target high accuracy on semantic segmentation. Eventually, trajectories will be detected and a complete “near miss” model will be developed to provide the automatic analytics that are the project’s goal.

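To give a flavor of what a trajectory-based “near miss” signal might look like, here is a deliberately simple heuristic; it is our own illustration, not the model the project will actually develop:

```python
import math

def min_separation(track_a, track_b):
    """Minimum distance between two tracked objects across the frames where
    both are visible. Tracks are assumed to map frame index -> (x, y)
    ground-plane position in metres."""
    shared = track_a.keys() & track_b.keys()
    if not shared:
        return float("inf")
    return min(math.dist(track_a[f], track_b[f]) for f in shared)

# Usage: a car and a pedestrian whose paths nearly cross.
car = {0: (0.0, 10.0), 1: (0.0, 6.0), 2: (0.0, 2.0)}
pedestrian = {1: (1.0, 5.5), 2: (0.5, 2.5)}
if min_separation(car, pedestrian) < 1.5:  # the 1.5 m threshold is an assumption
    print("potential near miss: flag for review")
```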

Unity will be supporting Microsoft in the process of evaluating improvements to the computer vision models’ performance, tweaking the simulation as needed.

Based on research from different teams – including CVC itself – we expect that the approach of mixing real and simulated data will ensure the best results for the models. We’ll be posting concrete results once available.

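A sketch of one simple way to mix the two data sources during training (the 50/50 ratio is a tunable placeholder, not a result):

```python
import random

def mixed_batches(real_samples, synthetic_samples,
                  real_fraction=0.5, batch_size=32, seed=0):
    """Yield training batches that blend real and simulated samples;
    real_fraction controls the expected share of real data per batch."""
    rng = random.Random(seed)
    while True:
        yield [rng.choice(real_samples) if rng.random() < real_fraction
               else rng.choice(synthetic_samples)
               for _ in range(batch_size)]

# Usage: draw one mixed batch of 32 samples.
batch = next(mixed_batches(["real_0", "real_1"], ["synth_0", "synth_1"]))
```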

A future of simulations with Unity

While this project tackles a problem that is in itself technically complex, its end goal – to improve safety in our cities and save lives – works as a natural catalyst to bring all these great teams together. Unity, CVC, Microsoft and the City of Bellevue – industry, academia and government working towards a common goal.

It’s only natural for Unity to be in the middle of all this, empowering and enabling its partners. After all, it aligns perfectly with Unity’s core values: democratize development, solve hard problems, and enable success.

We have been collecting invaluable knowledge as the project evolves, and those learnings will get incorporated back into the engine so that everybody can benefit.

You can expect simulation on Unity to get even easier and more powerful as we go, and we are glad that we can do our part towards a safer world in the process.

Contact

If you’d like to know more about this project, please contact Jose De Oliveira (josed@unity3d.com).

Acknowledgements

The project is a collaboration with:

Franz Loewenherz, principal transportation planner for the City of Bellevue and head of the Video Analytics initiative

Prof. Antonio M. López, Principal Investigator at the Computer Vision Center (CVC) and Associate Professor in the Computer Science Department, both at the Universitat Autònoma de Barcelona (UAB); as well as Dr. Jose A. Iglesias, Research Scientist at CVC.

Ganesh Ananthanarayanan, researcher in the Mobility and Networking group at Microsoft Research.

Translated from: https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/
