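Unity DOTS Burst Runtime Analysis
Unity version: Unity 2020.1.0f1
Burst version: 1.39
Sources:
Tools: a Unity project, IDA Pro, x32dbg (32-bit builds are used throughout)
Goal: to understand how Unity uses Burst under both mono and il2cpp, and whether Burst can be put to further use.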
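Preface
I had long wanted to understand how Burst works, so I spent some time studying what it does at runtime.
My initial understanding is that Burst is built on LLVM. LLVM is a highly customizable compiler framework, and Unity uses it to turn IR into machine code.
For a deeper treatment, see the video "ECS-深入解析Burst Compiler" (ECS: A Deep Dive into the Burst Compiler).
Burst may eventually replace il2cpp as Unity's cross-platform tool.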
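Questions
Before starting, I set myself a few questions:
- How is Burst-compiled code embedded in a Unity project?
- How does Burst work with both mono and il2cpp?
- Is the code shown in the Editor the same code that runs in the final build?
Analysis
A simple BurstCompile example; the code speaks for itself: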
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

public class MyBurst2Behavior : MonoBehaviour
{
    void Start()
    {
        var input = new NativeArray<float>(10, Allocator.Persistent);
        var output = new NativeArray<float>(1, Allocator.Persistent);
        for (int i = 0; i < input.Length; i++)
            input[i] = 1.0f * i;
        var job = new MyJob
        {
            Input = input,
            Output = output
        };
        // Schedule the job and block until it has finished
        job.Schedule().Complete();
        Debug.Log("The result of the sum is: " + output[0]);
        input.Dispose();
        output.Dispose();
    }

    // Using BurstCompile to compile a Job with Burst
    // Set CompileSynchronously to true to make sure that the method will not be compiled asynchronously
    // but on the first schedule
    [BurstCompile(CompileSynchronously = true)]
    private struct MyJob : IJob
    {
        [ReadOnly]
        public NativeArray<float> Input;
        [WriteOnly]
        public NativeArray<float> Output;

        public void Execute()
        {
            float result = 0.0f;
            for (int i = 0; i < Input.Length; i++)
            {
                result += Input[i];
            }
            Output[0] = result;
        }
    }
}
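The Burst Inspector shows the code Burst generates for MyJob.Execute().
The target selected there is X86_SSE2, for reasons explained below.
Build the project as a 32-bit il2cpp executable (.exe); the pdb files need to be copied next to it.
Then debug it with x32dbg, pressing F9 (run) repeatedly until a dynamic-link library named lib_burst_generated.dll is loaded.
This is the library that the Burst compiler actually produces. The directory containing lib_burst_generated.dll also holds a descriptive text file, lib_burst_generated.txt.
Its contents are: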
Library: lib_burst_generated
--platform=Windows
--backend=burst-llvm-9
--target=X86_SSE2
--dump=Function
--float-precision=Standard
--format=Coff
--compilation-defines=UNITY_2020_1_0
--compilation-defines=UNITY_2020_1
--compilation-defines=UNITY_2020
…
This is why X86_SSE2 was selected in the inspector above: it matches the target of the build and allows a like-for-like comparison.
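Since the text file is a plain list of options, it can be read mechanically. The following is a minimal sketch, assuming every option sits on its own line in the form `--key=value` (as the `--method=` entry quoted further down does). The names `parseBurstOptions` and `firstValue` are my own helpers for illustration and are not part of Burst.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One "--key=value" option from the text file.
using BurstOption = std::pair<std::string, std::string>;

// Collects every line of the form "--key=value". Lines that do not start
// with "--" or contain no '=' (for example the "Library: ..." header) are
// skipped. A vector is used instead of a map because keys such as
// "compilation-defines" appear more than once.
std::vector<BurstOption> parseBurstOptions(const std::string& text) {
    std::vector<BurstOption> options;
    std::istringstream stream(text);
    std::string line;
    while (std::getline(stream, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.rfind("--", 0) != 0) continue;                     // must start with "--"
        const std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        assert(eq >= 2);  // holds because the line starts with "--"
        options.emplace_back(line.substr(2, eq - 2), line.substr(eq + 1));
    }
    return options;
}

// Returns the first value stored under `key`, or an empty string if absent.
std::string firstValue(const std::vector<BurstOption>& options, const std::string& key) {
    for (const BurstOption& option : options)
        if (option.first == key) return option.second;
    return std::string();
}
```

Calling `firstValue(parseBurstOptions(text), "target")` on the contents above would give the target the library was built for, which is the value to select in the inspector.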
## Locating MyJob.Execute() in lib_burst_generated.dll
1. Find the following entry in lib_burst_generated.txt:
--method=Unity.Jobs.IJobExtensions+JobStruct`1[[MyBurst2Behavior+MyJob, Assembly-CSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]], UnityEngine.CoreModule, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null::Execute(MyBurst2Behavior+MyJob&, Assembly-CSharp, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null|System.IntPtr, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089|System.IntPtr, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089|Unity.Jobs.LowLevel.Unsafe.JobRanges&, UnityEngine.CoreModule, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null|System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)--0d8e3c4830c6f75b280da41a7a03428f
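The trailing 0d8e3c4830c6f75b280da41a7a03428f is the name of the exported entry point for MyJob.Execute() in lib_burst_generated.dll. Strictly speaking, the entry belongs to the JobStruct`1.Execute wrapper named in the signature, which in turn runs MyJob.Execute().
2. Open lib_burst_generated.dll in IDA Pro.
The function 0d8e3c4830c6f75b280da41a7a03428f can be found under Function name; click it to jump there.
The listing shows: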
.text:101039A0 public _0d8e3c4830c6f75b280da41a7a03428f
.text:101039A0 _0d8e3c4830c6f75b280da41a7a03428f proc near
.text:101039A0 ; DATA XREF: .rdata:off_1010E7B0↓o
.text:101039A0 mov eax, dword_10110700
.text:101039A5 jmp eax
.text:101039A5 _0d8e3c4830c6f75b280da41a7a03428f endp
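The symbol name can be pulled out of the `--method=` entry mechanically rather than by eye. This is a small sketch, assuming the entry always has the shape `--method=<signature>--<hash>` seen above; `extractBurstSymbol` is a hypothetical helper of mine, not a Burst API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the text after the last "--" of a "--method=<signature>--<hash>"
// line, or an empty string if the line does not have that shape.
std::string extractBurstSymbol(const std::string& methodLine) {
    const std::string prefix = "--method=";
    if (methodLine.compare(0, prefix.size(), prefix) != 0) return std::string();
    // Search from the end so that any "--" inside the signature is ignored.
    const std::size_t separator = methodLine.rfind("--");
    if (separator == std::string::npos || separator < prefix.size()) return std::string();
    assert(separator + 2 <= methodLine.size());
    return methodLine.substr(separator + 2);
}
```

The returned string is what to search for in IDA. Note that IDA displays the name with a leading underscore, as in the listing above.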
Following the jmp eax in that stub leads to the machine code that Burst compiled for MyJob.Execute():
.text:100C5480 MyJob__Execute__ proc near ; DATA XREF: burst_initialize+509↓o
.text:100C5480
.text:100C5480 arg_0 = dword ptr 4
.text:100C5480
.text:100C5480 mov eax, [esp+arg_0]
.text:100C5484 mov ecx, [eax+4]
.text:100C5487 test ecx, ecx
.text:100C5489 jle short loc_100C54A2
.text:100C548B mov edx, [eax]
.text:100C548D xorps xmm0, xmm0
.text:100C5490
.text:100C5490 loc_100C5490: ; CODE XREF: MyJob__Execute__+18↓j
.text:100C5490 addss xmm0, dword ptr [edx]
.text:100C5494 add edx, 4
.text:100C5497 dec ecx
.text:100C5498 jnz short loc_100C5490
.text:100C549A mov eax, [eax+0Ch]
.text:100C549D movss dword ptr [eax], xmm0
.text:100C54A1 retn
.text:100C54A2 ; ---------------------------------------------------------------------------
.text:100C54A2
.text:100C54A2 loc_100C54A2: ; CODE XREF: MyJob__Execute__+9↑j
.text:100C54A2 xorps xmm0, xmm0
.text:100C54A5 mov eax, [eax+0Ch]
.text:100C54A8 movss dword ptr [eax], xmm0
.text:100C54AC retn
.text:100C54AC MyJob__Execute__ endp
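Compared with the machine code shown in the inspector above, the two are not identical. Burst fully optimizes the final machine code, so the inspector output is of limited value as a reference.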
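To make the listing easier to follow, here is how I read it when written back as C++. It is only a sketch: the struct names and the `unreadTag` field are my own guesses at the layout, inferred from the offsets the code touches ([eax], [eax+4], [eax+0Ch]). The idea that burst_initialize fills the pointer slot is likewise an assumption, based on the DATA XREF in the listing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the job data as the disassembly reads it. The
// offsets in the comments are those of the 32-bit build; on a 64-bit host
// the pointers are wider, so only the field order carries over.
struct NativeFloatArrayView {
    float*       buffer;     // [eax+0] for Input, [eax+0Ch] for Output
    std::int32_t length;     // [eax+4] for Input
    std::int32_t unreadTag;  // [eax+8], never read by this function
};

struct MyJobData {
    NativeFloatArrayView input;   // starts at +0
    NativeFloatArrayView output;  // starts at +0Ch in the 32-bit build
};

// Scalar sum, one addss per element, as in loc_100C5490.
void myJobExecute(MyJobData* job) {
    assert(job != nullptr && job->output.buffer != nullptr);
    float sum = 0.0f;                        // xorps xmm0, xmm0
    for (std::int32_t i = 0; i < job->input.length; ++i)
        sum += job->input.buffer[i];         // addss xmm0, [edx]; add edx, 4
    job->output.buffer[0] = sum;             // movss [eax], xmm0
}

// Pointer slot standing in for dword_10110700, and a function standing in
// for the exported stub that jumps through it.
using JobEntry = void (*)(MyJobData*);
static JobEntry g_entrySlot = nullptr;

// Plays the role I assume burst_initialize has: filling the slot.
void initializeEntrySlot() { g_entrySlot = &myJobExecute; }

void exportedStub(MyJobData* job) {
    assert(g_entrySlot != nullptr);
    g_entrySlot(job);                        // mov eax, slot; jmp eax
}
```

The loop stays scalar in the listing (addss, not addps), and the empty-input branch at loc_100C54A2 simply stores 0.0f, which the C++ loop reproduces when `length` is zero.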
Checking whether MyJob.Execute() is really called
Again using x32dbg, set a breakpoint at the function 0d8e3c4830c6f75b280da41a7a03428f:
1496399F | 90 | nop |
149639A0 | A1 00079714 | mov eax,dword ptr ds:[14970700] |
149639A5 | FFE0 | jmp eax |
149639A7 | 90 | nop |
Single-step into MyJob.Execute():
1492547F | 90 | nop |
14925480 | 8B4424 04 | mov eax,dword ptr ss:[esp+4] |
14925484 | 8B48 04 | mov ecx,dword ptr ds:[eax+4] |
14925487 | 85C9 | test ecx,ecx |
14925489 | 7E 17 | jle lib_burst_generated.149254A2 |
1492548B | 8B10 | mov edx,dword ptr ds:[eax] |
1492548D | 0F57C0 | xorps xmm0,xmm0 |
14925490 | F3:0F5802 | addss xmm0,dword ptr ds:[edx] |
14925494 | 83C2 04 | add edx,4 |
14925497 | 49 | dec ecx |
14925498 | 75 F6 | jne lib_burst_generated.14925490 |
1492549A | 8B40 0C | mov eax,dword ptr ds:[eax+C] |
1492549D | F3:0F1100 | movss dword ptr ds:[eax],xmm0 |
149254A1 | C3 | ret |
149254A2 | 0F57C0 | xorps xmm0,xmm0 |
149254A5 | 8B40 0C | mov eax,dword ptr ds:[eax+C] |
149254A8 | F3:0F1100 | movss dword ptr ds:[eax],xmm0 |
149254AC | C3 | ret |
149254AD | 90 | nop |
This proves that the MyJob.Execute() in lib_burst_generated.dll is what actually gets called.
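The addresses in x32dbg differ from those in IDA because the DLL was loaded at a different base than the one IDA assumed. The arithmetic below shows how the two views line up, using only addresses that appear in the listings above; the function names are mine.

```cpp
#include <cassert>
#include <cstdint>

// Difference between where the module was loaded and where IDA assumed it
// would be, derived from one address known in both views.
constexpr std::uint32_t relocationDelta(std::uint32_t staticAddr, std::uint32_t runtimeAddr) {
    return runtimeAddr - staticAddr;
}

// Maps an address from the IDA listing to the matching address in the debugger.
constexpr std::uint32_t toRuntime(std::uint32_t staticAddr, std::uint32_t delta) {
    return staticAddr + delta;
}

// MyJob__Execute__ is at 0x100C5480 in IDA and at 0x14925480 in x32dbg.
constexpr std::uint32_t kDelta = relocationDelta(0x100C5480u, 0x14925480u);
static_assert(kDelta == 0x04860000u, "delta derived from MyJob__Execute__");
```

Applying the same delta to the stub (0x101039A0) gives 0x149639A0, and to the pointer slot (0x10110700) gives 0x14970700. Both match the x32dbg output, which confirms that the debugger is showing the same code IDA disassembled.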
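How MyJob.Execute() gets called
The following are needed:
- the il2cpp-generated sources in the IL2CPP_Test_BackUpThisFolder_ButDontShipItWithYourGame folder
- the il2cpp sources under Editor\Data\il2cpp
- UnityPlayer.dll and UnityPlayer.pdb from the executable's directory
Watching the call stack in x32dbg gives the following chain of return addresses: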
return to unityplayer.ExecuteJob+92 from ???
return to unityplayer.ForwardJobToManaged+28 from unityplayer.ExecuteJob
return to unityplayer.private: int __thiscall JobQueue::Exec(struct JobInfo *,int,int,bool)+4A from ???
return to unityplayer.private: int __thiscall JobQueue::Steal(class JobGroup *,struct JobInfo *,int,int,bool,bool)+96 from unityplayer.private: int __thiscall JobQueue::Exec(struct JobInfo *,int,int,bool)
return to unityplayer.public: void __thiscall JobQueue::WaitForJobGroupID(struct JobGroupID,enum JobQueue::JobQueueWorkStealMode)+5C from unityplayer.private: int __thiscall JobQueue::Steal(class JobGroup *,struct JobInfo *,int,int,bool,bool)
return to unityplayer.void __cdecl CompleteFenceInternal(struct JobFence &,enum WorkStealMode)+1B from unityplayer.public: void __thiscall JobQueue::WaitForJobGroupID(struct JobGroupID,enum JobQueue::JobQueueWorkStealMode)
return to unityplayer.void __cdecl JobHandle_CUSTOM_ScheduleBatchedJobsAndComplete(struct JobFence &)+24 from unityplayer.void __cdecl CompleteFenceInternal(struct JobFence &,enum WorkStealMode)
gameassembly._MyBurst2Behavior_Start_m6F1D1D4F7850B34F19BDDB0118E4059E76B36DD3
Everything starts from gameassembly._MyBurst2Behavior_Start, which is the MyBurst2Behavior.Start() function in the source.
In UnityEngine.CoreModule.cpp, the relevant functions are:
JobHandle_Complete_m947DF01E0F87C3B0A24AECEBF72D245A6CDBE148
JobHandle_ScheduleBatchedJobsAndComplete_m9D762E10C5648909D15E56C8E099410300F691A0
// System.Void Unity.Jobs.JobHandle::ScheduleBatchedJobsAndComplete(Unity.Jobs.JobHandle&)
IL2CPP_EXTERN_C IL2CPP_METHOD_ATTR void JobHandle_ScheduleBatchedJobsAndComplete_m9D762E10C5648909D15E56C8E099410300F691A0 (JobHandle_t8AEB8D31C25D7774C71D62B0C662525E6E36D847 * ___job0, const RuntimeMethod* method)
{
    typedef void (*JobHandle_ScheduleBatchedJobsAndComplete_m9D762E10C5648909D15E56C8E099410300F691A0_ftn) (JobHandle_t8AEB8D31C25D7774C71D62B0C662525E6E36D847 *);
    static JobHandle_ScheduleBatchedJobsAndComplete_m9D762E10C5648909D15E56C8E099410300F691A0_ftn _il2cpp_icall_func;
    if (!_il2cpp_icall_func)
        _il2cpp_icall_func = (JobHandle_ScheduleBatchedJobsAndComplete_m9D762E10C5648909D15E56C8E099410300F691A0_ftn)il2cpp_codegen_resolve_icall ("Unity.Jobs.JobHandle::ScheduleBatchedJobsAndComplete(Unity.Jobs.JobHandle&)");
    _il2cpp_icall_func(___job0);
}
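The generated wrapper follows a resolve-once-then-forward pattern: it looks up the native function by name on the first call, caches the pointer in a static, and forwards every call through it. Here is a stripped-down sketch of that pattern. The table, `resolveIcall`, and the `Example::Complete(int&)` name are hypothetical stand-ins of mine; the internals of il2cpp_codegen_resolve_icall are not shown here and may well differ.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using IcallFn = void (*)(int*);

// Hypothetical stand-in for the runtime's name-to-function table.
std::unordered_map<std::string, IcallFn>& icallTable() {
    static std::unordered_map<std::string, IcallFn> table;
    return table;
}

// Counts lookups so the caching behaviour can be observed.
static int g_resolveCount = 0;

// Hypothetical resolver: returns the registered function or nullptr.
IcallFn resolveIcall(const std::string& name) {
    ++g_resolveCount;
    auto it = icallTable().find(name);
    return it == icallTable().end() ? nullptr : it->second;
}

// Native implementation, standing in for the engine-side function.
void nativeComplete(int* fence) { *fence = 0; }

// Same shape as the generated wrapper: resolve once, cache in a static,
// then forward the argument.
void managedComplete(int* fence) {
    static IcallFn cached = nullptr;
    if (!cached)
        cached = resolveIcall("Example::Complete(int&)");
    assert(cached != nullptr);
    cached(fence);
}
```

After the first call the lookup is skipped entirely, so the managed-to-native hop costs a null check and an indirect call.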
The generated wrapper forwards to the ScheduleBatchedJobsAndComplete function, which appears in UnityPlayer.dll as:
.text:106DA6C0 ?JobHandle_CUSTOM_ScheduleBatchedJobsAndComplete@@YAXAAUJobFence@@@Z proc near
.text:106DA6C0 ; DATA XREF: .data:11375488↓o
.text:106DA6C0
.text:106DA6C0 arg_0 = dword ptr 8
.text:106DA6C0
.text:106DA6C0 push ebp
.text:106DA6C1 mov ebp, esp
.text:106DA6C3 push esi
.text:106DA6C4 mov esi, [ebp+arg_0]
.text:106DA6C7 cmp dword ptr [esi], 0
.text:106DA6CA jz short loc_106DA6ED
.text:106DA6CC mov ecx, ?gBatchScheduler@@3PAVJobBatchDispatcher@@A ; this
.text:106DA6D2 call ??1JobBatchDispatcher@@QAE@XZ ; JobBatchDispatcher::~JobBatchDispatcher(void)
.text:106DA6D7 cmp dword ptr [esi], 0
.text:106DA6DA jz short loc_106DA6ED
.text:106DA6DC push 0
.text:106DA6DE push esi
.text:106DA6DF call ?CompleteFenceInternal@@YAXAAUJobFence@@W4WorkStealMode@@@Z ; CompleteFenceInternal(JobFence &,WorkStealMode)
.text:106DA6E4 push esi
.text:106DA6E5 call ?Empty@AbortShim@@SA?AV1@XZ ; AbortShim::Empty(void)
.text:106DA6EA add esp, 0Ch
.text:106DA6ED
.text:106DA6ED loc_106DA6ED: ; CODE XREF: JobHandle_CUSTOM_ScheduleBatchedJobsAndComplete(JobFence &)+A↑j
.text:106DA6ED ; JobHandle_CUSTOM_ScheduleBatchedJobsAndComplete(JobFence &)+1A↑j
.text:106DA6ED pop esi
.text:106DA6EE pop ebp
.text:106DA6EF retn
.text:106DA6EF ?JobHandle_CUSTOM_ScheduleBatchedJobsAndComplete@@YAXAAUJobFence@@@Z endp
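From there the call passes through CompleteFenceInternal, JobQueue::WaitForJobGroupID, JobQueue::Steal, and JobQueue::Exec, then ForwardJobToManaged and ExecuteJob, matching the call stack above.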
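Answers
1. How is Burst-compiled code embedded in a Unity project?
Unity loads lib_burst_generated.dll from outside and uses it directly as a dynamic-link library.
2. How does Burst work with both mono and il2cpp?
Much as in the first answer: both mono and il2cpp can call into lib_burst_generated.dll from outside, so no second mechanism is needed.
3. Is the code shown in the Editor the same code that runs in the final build?
Judging by its structure, no. The Editor code was probably generated in Debug mode, whereas the code in lib_burst_generated.dll appears to be optimized.
Summary
Burst feels more like a bolt-on. It does improve runtime performance, but it does not look particularly elegant.
There is, of course, much more in Burst worth analyzing; I will share more as my research gets there.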