Qt 6.6 and 6.7 Make QML Faster than Ever: A New Benchmark and Analysis

December 18, 2023 by Ulf Hermann

It has been a while since the last time I've posted here and a lot has happened to the Qt Quick Compiler infrastructure. It's time to show some updated numbers. The benchmark discussed in my previous post was heavily focused on value types and lists of value types. It applied some rather complex tricks to eke out the maximum speedup between the interpreted and compiled versions of the same program.

Today I'm going to work with something more familiar to most people. I've written a new benchmark that's mostly based on object types (and lists of those), and refrains from underhandedly instructing the compiler's type propagation using secret knowledge about the quirks of JavaScript operators. It also does something useful this time around. I've re-implemented the DeltaBlue constraint solver found in the V8 benchmarks in typed QML.

In and of itself this is a somewhat foolish endeavour. Since I want to use object types, I use a separate QObject for each variable and each constraint. QObjects, as we all know, have a rather significant static overhead. Allocating a QObject just to store a few integers is quite a waste. The original implementation uses JavaScript objects. While still not ideal, those are somewhat more lightweight. Furthermore, in order to run the actual algorithm, we have to call a lot of functions, and calling functions in QML contexts is generally more expensive than doing the same in JavaScript contexts. This is because the context and scope hierarchy in QML is much more complex, and we often have to perform extra type conversions.

So, why did I do this? Most of you will want to deal with QObjects in a lot of places, since all of Qt Quick is built on QObjects. You cannot avoid allocating a QObject if you need an Item. So, in a way, the implementation that uses QObjects as storage for everything, while slower, is also more realistic. You may argue that I should have written the benchmark with Qt Quick itself to make it even more realistic. I decided not to do so because as soon as you add actual graphics to the mix, you have to deal with a lot more noisy data. Qt Quick itself often adds unpredictable overhead you don't want to deal with in a benchmark. For example, if you happen to have any text in your application, it has to create the font database at some point. Or, the scenegraph performs complex operations in the background to put the pixels on the screen. Those operations may or may not happen in a separate thread, and if they do, there is still a synchronization phase for each frame. Finally, the graphics driver itself kicks in and performs its own calculations. This is all very interesting if you're benchmarking Qt Quick. However, I want to benchmark the QML language here, and for me this is all just noise. Therefore, I've written a non-graphical application built with QObjects. You can find the code in this repository.

And here is the good news: The performance numbers for dealing with QObjects and calling typed functions on them have improved massively in Qt 6.6 and Qt 6.7.

Time taken to run the DeltaBlue benchmark with different versions of Qt

On the Y axis you see the milliseconds it took to run one iteration of the benchmark. Lower is better. The benchmark was run with:

  • Qt 5.15, the last version of the Qt5 series. This is our baseline. In Qt 5.15 the Qt Quick Compiler didn't generate any C++ code for functions and bindings. It only produced byte code to be interpreted or JIT-compiled.
  • Qt 6.2, since that is when the new Qt Quick Compiler was introduced as a technology preview.
  • Qt 6.5, the last LTS version.
  • Qt 6.6, the most recent release, highlighted where appropriate.
  • A recent snapshot of Qt 6.7, highlighted where appropriate.

The setup

The benchmarked program takes as input a number of variables and constraints between them. The variables are effectively numbers, and each constraint holds:

1. An input variable
2. An output variable
3. A scale variable
4. An offset variable

Any of these can be null. The constraint solver then manipulates the variable values, trying to reach a state where, for each constraint:

output == input * scale + offset

There are more details to it, but this is the gist. Suffice it to say that it's a somewhat demanding computational problem and, as such, well suited to our purposes.
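
To make the relation concrete, here is a minimal JavaScript sketch of a single constraint of this shape. It is not the benchmark's code: the Variable and Constraint classes are hypothetical, a null scale is assumed to act as 1 and a null offset as 0, and the planning step in which DeltaBlue picks a direction based on constraint strengths is left out entirely.

```javascript
// Hypothetical model of one constraint: output == input * scale + offset.
// Assumption: a null scale behaves like 1 and a null offset like 0.
class Variable {
  constructor(value) {
    this.value = value;
  }
}

class Constraint {
  constructor(input, output, scale = null, offset = null) {
    this.input = input;
    this.output = output;
    this.scale = scale;
    this.offset = offset;
  }

  scaleValue() {
    return this.scale === null ? 1 : this.scale.value;
  }

  offsetValue() {
    return this.offset === null ? 0 : this.offset.value;
  }

  // Forward direction: derive the output from the input.
  executeForward() {
    this.output.value = this.input.value * this.scaleValue() + this.offsetValue();
  }

  // Backward direction: derive the input from the output.
  executeBackward() {
    this.input.value = (this.output.value - this.offsetValue()) / this.scaleValue();
  }

  isSatisfied() {
    return this.output.value === this.input.value * this.scaleValue() + this.offsetValue();
  }
}

const input = new Variable(3);
const output = new Variable(0);
const constraint = new Constraint(input, output, new Variable(10), new Variable(1000));
constraint.executeForward();
console.log(output.value); // 1030
```

Deciding which direction each constraint runs in, and in which order the constraints fire, is the solver's actual job; the sketch only shows what "satisfied" means for one constraint.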

We run this on two sets of inputs:
1. A chain of alternating variables and constraints, 100 variables long.
2. A projection where 100 inputs are scaled and offset into 100 outputs.
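
The two input shapes can be sketched as plain JavaScript object graphs. This is an illustration under assumptions, not the benchmark's setup code: the helper and field names are hypothetical, the chain links neighbouring variables with constraints that have no scale and no offset, and the projection shares one scale and one offset variable across all constraints, with arbitrary initial values.

```javascript
// Hypothetical builders for the two input shapes. Plain objects stand in
// for the variables and constraints; field names are illustrative only.
function makeVariable(name, value) {
  return { name, value };
}

function makeConstraint(input, output, scale = null, offset = null) {
  return { input, output, scale, offset };
}

// Shape 1: v0 - c - v1 - c - ... - v(n-1), alternating variables and constraints.
function makeChain(n) {
  const variables = [];
  const constraints = [];
  for (let i = 0; i < n; i++) {
    variables.push(makeVariable("v" + i, 0));
    if (i > 0) {
      constraints.push(makeConstraint(variables[i - 1], variables[i]));
    }
  }
  return { variables, constraints };
}

// Shape 2: n inputs projected onto n outputs through one shared scale and offset.
function makeProjection(n) {
  const scale = makeVariable("scale", 10);     // arbitrary illustrative value
  const offset = makeVariable("offset", 1000); // arbitrary illustrative value
  const inputs = [];
  const outputs = [];
  const constraints = [];
  for (let i = 0; i < n; i++) {
    const src = makeVariable("src" + i, i);
    const dst = makeVariable("dst" + i, 0);
    inputs.push(src);
    outputs.push(dst);
    constraints.push(makeConstraint(src, dst, scale, offset));
  }
  return { scale, offset, inputs, outputs, constraints };
}

const chain = makeChain(100);
const projection = makeProjection(100);
console.log(chain.variables.length, chain.constraints.length);        // 100 99
console.log(projection.inputs.length, projection.constraints.length); // 100 100
```

A chain of 100 variables needs 99 constraints between them, while the projection has one constraint per input/output pair.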

The split between those two inputs is not very interesting, so I'm presenting both together as a single data point in all the discussions below.

Collected data points

As mentioned before, there are two implementations of the actual algorithm:

1. The JavaScript version, nearly identical to the one in the V8 benchmark suite.
2. The QML version I've written.

Finally, I've split the execution into two phases for both implementations:

1. A setup phase, in which all the objects that hold the variables and constraints are created.
2. The actual execution of the DeltaBlue algorithm.

Combining all this, a single run of the benchmark produces 4 data points:

1. The total time for the QML version
2. The object creation time for the QML version
3. The total time for the JavaScript version
4. The object creation time for the JavaScript version

It has to be said that we cannot run exactly the same code on all the tested versions of Qt:

  • Qt 6.2 and Qt 5.15 cannot declare and initialize a list property in one QML binding/declaration, so those had to be split into two lines.
  • Qt 6.2 and Qt 5.15 do not support pragma ComponentBehavior, so it had to be dropped, causing some IDs to become invisible to the compiler.
  • Qt 6.2 and Qt 5.15 do not treat ':/qt/qml' as a default import path, so it's added manually.
  • Qt 6.2 and Qt 5.15 cannot construct a QQmlComponent from URI and name. They have to load by URL instead.
  • Qt 5.15 has no proper build system API for QML modules. We build using qmake and CONFIG+=qtquickcompiler instead.
  • Qt 5.15 does not understand imports without versions, so we add version numbers where needed.

I could have avoided some of those differences, but I intentionally used the new features. They lead to improved performance where they are available, and the improved performance is what we are after.

No disk cache mode

Now, since we don't have enough dimensions in our data yet, we're adding another one. By default, the benchmark uses the Qt Quick Compiler to compile bindings and functions to C++. Comparing the numbers produced by the compiled code should give us the speedup caused by compilation for each version of Qt, right? Well, not if the performance of the interpreter has also changed. To control for this, we also run the benchmark with QML_DISABLE_DISK_CACHE=1 for each version of Qt. This makes it ignore the compiled artifacts and work with the QML source code instead.
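
The reasoning behind the extra mode can be written down as arithmetic. Assuming you have, per Qt version, one time measured with compiled code and one measured with QML_DISABLE_DISK_CACHE=1, the sketch below splits the overall change between two versions into an interpreter part and a compilation part. The function and field names are hypothetical, and no measured numbers are implied.

```javascript
// Hypothetical helper: `times` maps a Qt version string to
// { compiled, noDiskCache }, both in milliseconds per iteration.
function decompose(times, baseline, version) {
  const base = times[baseline];
  const cur = times[version];
  return {
    // Gain from compiling to C++, measured within a single Qt version.
    compilationSpeedup: cur.noDiskCache / cur.compiled,
    // Change in the interpreter/JIT path alone, between the two versions.
    interpreterSpeedup: base.noDiskCache / cur.noDiskCache,
    // Change in the default (compiled) configuration between the two versions.
    overallSpeedup: base.compiled / cur.compiled,
  };
}
```

The three ratios are linked: overallSpeedup equals interpreterSpeedup times the newer version's compilation speedup, divided by the baseline's compilation speedup. Looking at the compiled numbers alone would fold an interpreter improvement into what looks like a compiler improvement.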

Finally, the Qt Quick Compiler Extensions have an extra feature that comes in very handy here:

Static mode

Consider three files A.qml, B.qml, and C.qml:

// A.qml
import QtQml
QtObject { property int v: 11 }

// B.qml
import QtQml
A { property string v: "foos" }

// C.qml
import QtQml
QtObject { 
    property A a: A {}
    function evil(b: B) { a = b }
    function bark() { console.log(a.v) }
}

If you instantiate C and play with the evil and bark functions a bit, you will discover a feature of the QML language you didn't want to know about. It's called property shadowing. Once evil() has stored a B in a, bark() prints the string "foos", even though a is declared as an A, whose v is an int. For a great many properties and methods, we cannot know in advance what types they will have at run time. This is a nasty problem for the Qt Quick Compiler. In Qt 6.6, it has learned to deal with it by wrapping the affected values in QVariant and checking their types where necessary. This comes at a performance cost, though. qmlsc has an extra option, --static, that tells it to ignore any shadowing. You can use it at your own risk. Some properties are intentionally shadowed. For example, we're moving the focusReason property to QQuickItem, leaving a property of the same name in QQuickControl for backwards compatibility. Most shadowing, however, is a mistake.
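
The trade-off between the default mode and --static can be illustrated with a plain-JavaScript analogy. This is not QML and not the code qmlsc generates: the class names only mirror the QML example above, and the thrown TypeError is a hypothetical stand-in for whatever fallback the checked path takes.

```javascript
// Plain-JavaScript analogy: B reuses the name v with a different type,
// as B.qml does above.
class A {
  constructor() {
    this.v = 11; // declared as a number in the base type
  }
}

class B extends A {
  constructor() {
    super();
    this.v = "foos"; // same name, now a string
  }
}

// Shadowable mode (analogy): inspect the runtime type before using the value,
// roughly like carrying it in a QVariant and checking it.
function vPlusOneChecked(obj) {
  const value = obj.v;
  if (typeof value !== "number") {
    throw new TypeError(`v is a ${typeof value}, not a number`);
  }
  return value + 1;
}

// Static mode (analogy): trust the declared type and skip the check.
function vPlusOneStatic(obj) {
  return obj.v + 1; // silently wrong when v is shadowed
}

console.log(vPlusOneChecked(new A())); // 12
console.log(vPlusOneStatic(new B()));  // foos1
```

The checked variant pays for a type test on every access; the static variant is cheaper but only correct when nothing is shadowed, which is exactly the promise --static asks you to make.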

The --static option was not available in Qt 6.2 and only takes effect with Qt 6.5, 6.6, and 6.7.

In our benchmark, we know we haven't shadowed anything, and we don't want to pay the performance price of checking. Therefore, we add a third, static, mode to each benchmark run to see how much we can gain in comparison to the normal, shadowable mode.

The results

I've tried very hard to produce stable, comparable data. The benchmarks are run on a Linux machine booted directly into a shell, without an init system. For each program run, I first try to warm the caches by performing a dry run, and then run 1000 iterations of the benchmark. For each benchmark function, the program is restarted from scratch so that the functions cannot interfere with each other. So let's go back to the graph above.
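
The measurement loop described above could look roughly like this in JavaScript. runBenchmark and mean are hypothetical helpers; the per-function process restart is not modelled, since it happens outside the program, and performance.now() is assumed to be available as a global, as it is in current browsers and Node.js.

```javascript
// Hypothetical harness: one untimed dry run to warm caches, then timed iterations.
function runBenchmark(fn, iterations = 1000) {
  fn(); // dry run: result and timing are discarded
  const samples = new Array(iterations);
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    samples[i] = performance.now() - start;
  }
  return samples;
}

// Average time per iteration in milliseconds.
function mean(samples) {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

let calls = 0;
const samples = runBenchmark(() => {
  calls++;
}, 1000);
console.log(calls, samples.length); // 1001 1000
```

The function is called once more than there are samples, because the dry run is executed but never recorded.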

The first thing to note here is that I was not fully successful in my attempts to produce clean data. The JavaScript numbers should all be the same, especially within a single version of Qt. The way the QML code is compiled should not have any effect on the JavaScript execution. All the JavaScript at play here lives in a separate deltablue.js file that cannot be compiled to C++. Realizing this, I advise you to take all of the data with a roughly 5%-sized grain of salt.
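
Numbers that should be identical make a convenient noise gauge. A minimal sketch, assuming you collect the JavaScript-version timings of one Qt version across the compiled, no-disk-cache, and static runs: their relative spread gives a rough indication of how much of a difference elsewhere may be noise. The function name is hypothetical.

```javascript
// Relative spread of measurements that should all be equal:
// (max - min) / mean. A result of 0.05 means a 5% spread.
function relativeSpread(values) {
  if (values.length === 0) {
    throw new RangeError("need at least one value");
  }
  const max = Math.max(...values);
  const min = Math.min(...values);
  const mean = values.reduce((sum, x) => sum + x, 0) / values.length;
  return (max - min) / mean;
}
```

Differences between configurations that are not clearly larger than this spread are better treated as inconclusive.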

Another thing you can immediately see is that the QML version of the algorithm is generally much slower than the JavaScript version. As noted above, this is because it is built on QObjects rather than JavaScript objects.

On top of this, there is a noticeable drop in performance for the QML version between Qt 5.15 and Qt 6.2. If you look at the code, you'll notice a lot of 'as' casts in there that tell the compiler what type to expect for some potentially shadowed value. In 5.15, 'as' is a no-op. It was originally meant as a compile-time-only construct. Later, however, we noticed that this leads to behavior differences between compiled and interpreted/JIT-compiled code. To avoid those, we introduced type checks for both the compiled code and the interpreter and JIT. So, the later versions of Qt do more work here, but for Qt 6.2 and 6.5 it does not pay off yet. Qt 6.2 and 6.5 still have to interpret or JIT most of the code here, since their compilers' language coverage is rather limited.
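
The change in the meaning of 'as' can be modelled in a few lines of JavaScript. This is an analogy under an assumption: for object types, a failed run-time check is taken to yield null, and instanceof stands in for the QML type check. The function names are hypothetical.

```javascript
// Analogy only: instanceof stands in for the QML type check.
class A {}
class B extends A {}
class Unrelated {}

// Qt 5.15 (analogy): 'as' is a compile-time hint; the value passes through
// unchanged and the type argument is ignored.
function asNoOp(value, _Type) {
  return value;
}

// Later Qt versions (analogy): 'as' is checked at run time.
// Assumption: a failed check on an object type yields null.
function asChecked(value, Type) {
  return value instanceof Type ? value : null;
}

const u = new Unrelated();
console.log(asNoOp(u, A) === u);             // true: the wrong type slips through
console.log(asChecked(u, A));                // null
console.log(asChecked(new B(), A) !== null); // true
```

Because both the compiled code and the interpreter now perform the check, they agree on the result, at the price of doing more work than the no-op did.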

With that out of the way, let's look at the happy side of things, which I've highlighted in orange and red. Qt 6.6 takes about half the time Qt 6.5 takes to run the QML version of the benchmark, and Qt 6.7 improves on this further. In static mode, we get down to about a third of the Qt 6.5 numbers. At this point, the object creation overhead starts to dominate the benchmark. With Qt 6.7 in static mode, the whole benchmark took less time to run than object creation alone took with Qt 6.2.

Object creation also includes initial binding evaluation, which is why it also benefits from compiling bindings and expressions to C++. Once it's ready, qmltc will provide a complementary solution to the object creation overhead.