unity dots_在DOTS上:C ++和C#

unity dots

This is a brief introduction to our new Data-Oriented Tech Stack (DOTS), sharing some insights in how and why we got to where we are today, and where we’re going next. We’re planning on posting more about DOTS on this blog in the near future.

这是对我们新的面向数据的技术堆栈(DOTS)的简要介绍,分享了一些有关如何以及为什么我们到达今天的位置以及下一步去向的见解。 我们计划在不久的将来在此博客上发布有关DOTS的更多信息。

Let’s talk about C++. The language Unity is written in today.

让我们谈谈C ++。 今天使用Unity语言编写。

One of many advanced game programmers’ problems at the end of the day is that they need to provide an executable with instructions the target processor can understand, that when executed will run the game.

最终,许多高级游戏程序员遇到的问题之一是,他们需要向可执行文件提供目标处理器可以理解的指令,执行该指令后即可运行游戏。

For the performance critical part of our code, we know what we want the final instructions to be. We just want an easy way to describe our logic in a reasonable way, and then trust and verify that the generated instructions are the ones we want.

对于代码的性能至关重要的部分,我们知道最终指令的含义。 我们只希望有一种简单的方法来以合理的方式描述我们的逻辑,然后信任并验证所生成的指令就是我们想要的指令。

In our opinion, C++ is not great at this task. I want my loop to be vectorized, but a million things can happen that might make the compiler not vectorize it. It might be vectorized today, but not tomorrow if a new seemingly innocent change happens. Just convincing all my C/C++ compilers to vectorize my code at all is hard.

我们认为,C ++在此任务上并不出色。 我希望对循环进行矢量化处理,但是可能发生一百万件事,可能会使编译器无法对其进行矢量化处理。 如果发生新的看似无害的变化,今天可能会矢量化,但明天可能不会。 仅仅说服我所有的C / C ++编译器完全矢量化我的代码是很难的。

We decided to make our own “reasonably comfortable way to generate machine code”, that checks all the boxes that we care about. We could spend a lot of energy trying to bend the C++ design train a little bit more in a direction it would work a little bit better for us, but we’d much rather spend that energy on a toolchain where we can do all of the design, and that we design exactly for the problem that game developers have.

我们决定采用自己的“合理的方式来生成机器代码”,以检查我们关心的所有内容。 我们可能会花很多精力尝试将C ++设计训练更多地朝着对我们更好的方向发展,但是我们宁愿将精力花在可以完成所有工作的工具链上设计,而我们正是针对游戏开发人员所遇到的问题进行设计的。

What checkboxes do we care about?

我们关心哪些复选框?

    Ok, so now that we know what things we care about, the next step is to decide on what the input language for this machine code generator is. Let’s say we have the following options:

    好的,现在我们知道我们关心的是什么,下一步就是确定此机器代码生成器的输入语言是什么。 假设我们有以下选项:

      Say What C#? For our most performance critical inner loops? Yes. C# is a very natural choice that comes with a lot of nice benefits for Unity:

      说什么C#? 对于我们最重要的性能内循环? 是。 C#是非常自然的选择,它为Unity带来很多好处:

      • A C#->intermediate IL compiler already exists (the Roslyn C# compiler from Microsoft), and we can just use it instead of having to write our own.

        AC#->中间IL编译器已经存在(Microsoft的Roslyn C#编译器),我们可以使用它,而不必自己编写。

      • We have a lot of experience modifying intermediate-IL, so it’s easy to do codegen and postprocessing on the actual program.

        我们在修改中间IL方面具有丰富的经验,因此可以很容易地在实际程序上进行代码生成和后处理。

      • Avoids many of C++’s problems (header inclusion hell, PIMPL patterns, long compile times)

        避免了许多C ++的问题(标头包含地狱,PIMPL模式,较长的编译时间)

      I quite enjoy writing code in C# myself. However, traditional C# is not an amazing language from a performance perspective. The C# language team, standard library team, and runtime team have been making great progress in the last two years. Still, when using C# language, you have no control over where/how your data is laid out in memory. And that is exactly what we need to improve performance.

      我非常喜欢自己用C#编写代码。 但是,从性能角度来看,传统的C#并不是一种令人惊奇的语言。 在过去的两年中,C#语言团队,标准库团队和运行时团队取得了长足的进步。 但是,当使用C#语言时,您无法控制数据在内存中的放置位置/方式。 而这正是我们提高性能所需要的。

      On top of that, the standard library is oriented around “objects on the heap”, and “objects having pointer references to other objects”.

      最重要的是,标准库面向“堆上的对象”和“具有指向其他对象的指针引用的对象”。

      That said, when working on a piece of performance critical code, we can give up on most of the standard library, (bye Linq, StringFormatter, List, Dictionary), disallow allocations (=no classes, only structs), reflection, the garbage collector and virtual calls, and add a few new containers that you are allowed to use (NativeArray and friends). Then, the remaining pieces of the C# language are looking really good. Check out Aras’s blog for some examples from his path tracer toy project for some examples.

      就是说,在处理一段性能关键代码时,我们可以放弃大多数标准库(再见Linq,StringFormatter,List,Dictionary),禁止分配(=无类,仅结构),反射,垃圾收集器和虚拟调用,并添加一些允许使用的新容器(NativeArray和朋友)。 然后,其余的C#语言看起来真的很棒。 请查看 Aras的博客 ,获取他的 路径追踪器玩具项目 中的一些示例。

      This subset lets us comfortably do everything we need in our hot loops. Because it’s a valid subset of C#, we can also run it as a regular C#. We can get errors on out of bounds access, with great error messages, debugger support and compilation speeds you forgot were possible when working in C++. We often refer to this subset as High-Performance C# or HPC#.

      该子集使我们可以轻松地执行热循环中需要的所有操作。 因为它是C#的有效子集,所以我们也可以将其作为常规C#运行。 在C ++中工作时,我们会发现错误信息,强大的错误消息,调试器支持和编译速度,这些都是您无法访问的。 我们通常将此子集称为高性能C#或HPC#。

      突发编译器:今天我们在哪里? (Burst compiler: Where are we today?)

      We’ve built a code generator/compiler called Burst. It’s been available since Unity 2018.1 as a preview package. We have a lot of work ahead, but we’re already happy with it today.

      我们已经构建了一个称为Burst的代码生成器/编译器。 自Unity 2018.1起作为预览包提供。 我们还有很多工作要做,但是今天我们已经对此感到满意。

      We’re sometimes faster than C++, also still sometimes slower than C++. The latter case we consider performance bugs we’re confident we can resolve.

      我们有时比C ++快,有时也比C ++慢。 对于后一种情况,我们认为我们可以解决性能错误。

      Only comparing performance is not enough though. What matters equally is what you had to do to get that performance. Example: we took the C++ culling code of our current C++ renderer and ported it to Burst. The performance was the same, but the C++ version had to do incredible gymnastics to convince our C++ compilers to actually vectorize. The Burst version was about 4x smaller.

      但是,仅比较性能是不够的。 同样重要的是,您必须执行该操作才能获得该性能。 示例:我们采用了当前C ++渲染器的C ++剔除代码并将其移植到Burst。 性能是相同的,但是C ++版本必须做令人难以置信的体操才能说服我们的C ++编译器实际进行矢量化。 爆裂版的尺寸缩小了约4倍。

      To be honest, the whole “you should move your most performance critical code to C#” story also didn’t result in everybody internally at Unity immediately buying it. For most of us, it feels like “you’re closer to the metal” when you use C++. But that won’t be true for much longer. When we use C# we have complete control over the entire process from source compilation down to machine code generation, and if there’s something we don’t like, we just go in and fix it.

      老实说,整个“您应该将对性能最关键的代码转移到C#”的故事并没有导致Unity内部的每个人都立即购买它。 对于我们大多数人来说,当您使用C ++时,感觉就像“您更接近金属”。 但这不会持续太久。 当我们使用C#时,我们可以完全控制从源代码编译到机器代码生成的整个过程,并且如果我们不喜欢某些内容,我们只需进行修复即可。

      We will slowly but surely port every piece of performance critical code that we have in C++ to HPC#. It’s easier to get the performance we want, harder to write bugs, and easier to work with.

      我们将缓慢而可靠地将C ++中的每条性能关键代码移植到HPC#。 获得我们想要的性能更容易,编写错误也更容易,并且更易于使用。

      Here’s a screenshot of Burst Inspector, allowing you to easily see what assembly instructions were generated for your different burst hot loops:

      这是Burst Inspector的屏幕截图,可让您轻松查看针对不同的突发热循环生成的汇编指令:

      Unity has a lot of different users. Some can enumerate the entire arm64 instruction set from memory, others are happy to create things without getting a PhD in computer science.

      Unity有很多不同的用户。 一些人可以从内存中枚举整个arm64指令集,而另一些人乐于创建东西而无需获得计算机科学博士学位。

      All users benefit as the parts of their frame time that are spent running engine code (usually 90%+) get faster. The parts that are running Asset Store package runtime code gets faster as Asset Store package authors adopt HPC#.

      所有用户都受益于运行引擎代码(通常超过90%)所花费的部分帧时间变得更快。 随着Asset Store软件包作者采用HPC#,正在运行Asset Store软件包运行时代码的部件变得更快。

      Advanced users will benefit on top of that by also being able to also write their own high-performance code in HPC#.

      高级用户还将能够在HPC#中编写自己的高性能代码,从而从中受益。

      优化粒度 (Optimization granularity)

      In C++, it’s very hard to ask the compiler to make different optimization tradeoffs for different parts of your project. The best you have is per file granularity on specifying optimization level.

      在C ++中,很难要求编译器对项目的不同部分进行不同的优化折衷。 您所拥有的最好的选择就是指定优化级别时每个文件的粒度。

      Burst is designed to take a single method in that program as input: the entry point to a hot loop. It will compile that function and everything that it invokes (which is guaranteed to be known: we don’t allow virtual functions or function pointers).

      Burst设计为采用该程序中的单个方法作为输入:热循环的入口点。 它将编译该函数及其调用的所有内容(保证是已知的:我们不允许使用虚函数或函数指针)。

      Because Burst only operates on a relatively small part of the program, we set optimization level to 11. Burst inlines pretty much every call site. Remove if checks that otherwise would not be removed, because in inlined form we have more information about the arguments of the function.

      由于Burst仅在程序的相对较小的部分上运行,因此我们将优化级别设置为11。Burst几乎内联每个调用站点。 如果检查否则将不会被删除,则删除,因为以内联形式,我们具有有关函数参数的更多信息。

      如何解决常见的多线程问题 (How that helps with common multithreading problems)

      C++ (nor C#) doesn’t do much to help developers to write thread-safe code.

      C ++(也不是C#)在帮助开发人员编写线程安全代码方面无济于事。

      Even today, more than a decade since game consumer hardware has >1 core, it is very hard to ship programs that use multiple cores effectively.

      即使到了今天,自游戏消费类硬件的内核数量超过1个世纪以来,要交付有效使用多个内核的程序还是非常困难的。

      Data races, nondeterminism and deadlocks are all challenges that make shipping multithreaded code difficult. What we want is features like “make sure that this function and everything that it calls never read or write global state”. We want violations of that rule to be compiler errors, not “guidelines we hope all programmers adhere to”. Burst gives a compiler error.

      数据争夺,不确定性和死锁都是使多线程代码难以交付的挑战。 我们需要的功能是“确保此功能及其调用的所有内容都不会读取或写入全局状态”。 我们希望违反该规则的是编译器错误,而不是“我们希望所有程序员都遵守的准则”。 突发会产生编译器错误。

      We encourage both Unity users and ourselves to write “jobified” code: splitting up all data transformations that need to happen into jobs. Each job is “functional”, as in side-effect free. It explicitly specifies the read-only buffers and read/write buffers it operates on. Any attempt to access other data results in a compiler error.

      我们鼓励Unity用户和我们自己编写“作业化”的代码:将所有需要进行的数据转换分解为作业。 每个工作都是“功能性”的,无副作用。 它显式指定了其操作的只读缓冲区和读/写缓冲区。 任何尝试访问其他数据的操作都会导致编译器错误。

      The job scheduler will guarantee that nobody is writing to your read-only buffer while your job is running. And we’ll guarantee that nobody is reading from your read/write buffer while your job is running.

      作业调度程序将确保在作业运行时,没有人向您的只读缓冲区写入数据。 而且,我们将保证在您的作业运行时,没有人从您的读/写缓冲区中读取数据。

      If you schedule a job that violates these rules, you get a runtime error every time. Not just in your unlucky race condition case. The error message will explain that you’re trying to schedule a job that wants to read from buffer A, but that you already scheduled a job before that will write to A, so if you want to do this, you need to specify that previous job as a dependency.

      如果安排的作业违反这些规则,则每次都会出现运行时错误。 不只是在您不幸的种族情况下。 该错误消息将说明您正在尝试调度要从缓冲区A读取的作业,但是您已经在调度该作业之前将其写入A,因此,如果要执行此操作,则需要指定前一个工作作为依赖。

      We find this safety mechanism catches a lot of bugs before they get committed and results in efficient use of all cores. It becomes impossible to code a deadlock or a race condition. Results are guaranteed to be deterministic regardless of how many threads are running, or how many time a thread gets interrupted by some other process.

      我们发现,这种安全机制在错误被提交之前就捕获了许多错误,并导致对所有内核的有效利用。 不可能编写死锁或竞争条件。 无论运行多少线程,或某个线程被其他进程中断多少次,都保证结果是确定的。

      破解整个堆栈 (Hacking the whole stack)

      By being able to hack on all these components, we can make them be aware of each other. For example, a common case for a vectorization not happening is that the compiler cannot guarantee that two pointers do not point to the same memory (aliasing). We know two NativeArray’s will never alias because we wrote the collection library, and we can use that knowledge in Burst, so it won’t have to give up on optimization because it’s afraid two array pointers might point to the same memory.

      通过能够破解所有这些组件,我们可以使它们彼此了解。 例如,向量化未发生的常见情况是编译器无法保证两个指针不会指向同一内存(别名)。 我们知道两个NativeArray永远不会别名,因为我们编写了集合库,并且可以在Burst中使用该知识,因此它不必放弃优化,因为它担心两个数组指针可能指向同一内存。

      Similarly, we wrote the Unity.Mathemetics math library. Burst has intimate knowledge of it. It will (in the future) be able to do accuracy sacrificing optimizations for things like math.sin(). Because to Burst math.sin() is not just any C# method to compile, it will understand the trigonometric properties of sin(), understand that sin(x) == x for small values of x (which Burst might be able to prove), understand it can be replaced by a Taylor series expansion for a certain accuracy sacrifice. Cross platform & architecture floating point determinism is also a future goal of burst that we believe is possible to achieve.

      同样,我们编写了Unity.Mathemetics数学库。 爆裂对此有很深的了解。 (将来)它将能够为诸如math.sin()之类的事情牺牲准确性进行优化。 因为对于Burst math.sin()而言,不仅要编译任何C#方法,它都将理解sin()的三角性质,了解对于小x值,sin(x)== x(Burst可以证明),请理解它可以用泰勒级数展开式代替,以牺牲一定的精度。 跨平台和架构浮点确定性也是我们认为有可能实现的爆发的未来目标。

      引擎代码和游戏代码之间的区别消失了 (The distinction between engine code and game code disappears)

      By writing Unity’s runtime code in HPC#, the engine and the game are written in the same language. We will distribute runtime systems that we have converted to HPC# as source code. Everyone will be able to learn from them, improve them, tailor them. We’ll have a level playing field, where nothing is stopping users from writing a better particle system, physics system or renderer than we write. I expect many people will. By having our internal development process be much more like our users’ development process, we’ll also feel our users pain more directly, and we can focus all our efforts into improving a single workflow, instead of two different ones.

      通过用HPC#编写Unity的运行时代码,引擎和游戏可以用相同的语言编写。 我们将分发已转换为源代码的HPC#运行时系统。 每个人都可以向他们学习,改进它们,定制它们。 我们将拥有一个公平的竞争环境,没有什么可以阻止用户编写比我们编写的更好的粒子系统,物理系统或渲染器。 我希望很多人会。 通过使我们的内部开发过程更像用户的开发过程,我们还将感到用户更加痛苦,并且我们可以将所有精力集中在改进单个工作流上,而不是两个不同的工作流上。

      In my next post, I’ll cover a different part of DOTS: the entity component system.

      在我的下一篇文章中,我将介绍DOTS的另一部分:实体组件系统。

      翻译自: https://blogs.unity3d.com/2019/02/26/on-dots-c-c/

      unity dots

      • 0
        点赞
      • 0
        收藏
        觉得还不错? 一键收藏
      • 0
        评论

      “相关推荐”对你有帮助么?

      • 非常没帮助
      • 没帮助
      • 一般
      • 有帮助
      • 非常有帮助
      提交
      评论
      添加红包

      请填写红包祝福语或标题

      红包个数最小为10个

      红包金额最低5元

      当前余额3.43前往充值 >
      需支付:10.00
      成就一亿技术人!
      领取后你会自动成为博主和红包主的粉丝 规则
      hope_wisdom
      发出的红包
      实付
      使用余额支付
      点击重新获取
      扫码支付
      钱包余额 0

      抵扣说明:

      1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
      2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

      余额充值