ATS Architectural Overview
DMA access time can be significantly lengthened due to the time required to resolve the actual physical address.
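On the host side, the IOMMU translates device addresses to physical memory addresses and checks device access permissions. However, if every PCIe device is performing DMA at the same time, the TA (Translation Agent) and the ATPT (Address Translation and Protection Table) become a bottleneck and increase system latency.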
To mitigate these impacts, designs often include address translation caches in the entity that performs the address translation. In a CPU, the address translation cache is most commonly referred to as a translation look-aside buffer (TLB). For an I/O TA, the term address translation cache or ATC is used to differentiate it from the translation cache used by the CPU.
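To address this translation performance problem, designs typically add an address translation cache. On the CPU side it is usually called a translation look-aside buffer (TLB); on the I/O side it is called an address translation cache (ATC).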
While there are some similarities between TLB and ATC, there are important differences. A TLB serves the needs of a CPU that is nominally running one thread at a time. The ATC, however, is generally processing requests from multiple I/O Functions, each of which can be considered a separate thread.
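Unlike a TLB, an ATC typically handles requests from many threads at once.
Because of this difference, it is hard to size a host-side ATC for the number of I/O Functions in the system.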
The benefits of having an ATC within a Device include:
- Ability to alleviate TA resource pressure by distributing address translation caching responsibility (reduced probability of “thrashing” within the TA)
- Enable ATC Devices to have less performance dependency on a system’s ATC size
- Potential to ensure optimal access latency by sending pretranslated requests to central complex
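The device-side ATC caches translations obtained from the TA and ATPT, which reduces how much the device's performance depends on the size of the system's cache.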
There are a number of considerations a Function or software can use in making such a determination; for example:
- Memory address ranges that will be frequently accessed over an extended period of time or whose associated buffer content is subject to a significant update rate
- Memory address ranges, such as work and completion queue structures, data buffers for low-latency communications, graphics frame buffers, host memory that is used to cache Function-specific content, and so forth
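When a device Function accesses an address whose contents are updated frequently, or when its accesses are especially latency-sensitive, an ATC can significantly improve software performance.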
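As a rough illustration of the caching idea above, here is a minimal Python sketch of a device-side ATC. It is a toy model, not how any real device implements one: the class name `DeviceATC`, the `translate` callback (a stand-in for an ATS round trip to the TA), the fixed 4096-byte page size, and the LRU eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed fixed page size; real translations may cover larger ranges


class DeviceATC:
    """Toy device-side address translation cache with LRU eviction."""

    def __init__(self, capacity, translate):
        self.capacity = capacity
        self.translate = translate    # hypothetical stand-in for an ATS request to the TA
        self.entries = OrderedDict()  # untranslated page base -> translated page base
        self.hits = 0
        self.misses = 0

    def lookup(self, addr):
        page = addr & ~(PAGE_SIZE - 1)
        offset = addr & (PAGE_SIZE - 1)
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)        # mark as most recently used
            translated = self.entries[page]
        else:
            self.misses += 1
            translated = self.translate(page)     # the slow path the ATC tries to avoid
            self.entries[page] = translated
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used page
        return translated + offset


# Repeated accesses to one hot page (for example a work queue) mostly hit the cache.
atc = DeviceATC(capacity=2, translate=lambda page: page + 0x1000_0000)
for addr in (0x2000, 0x2008, 0x2010, 0x5000, 0x2018):
    atc.lookup(addr)
print(atc.hits, atc.misses)
```

Only the first access to each page pays for a translation. That is why frequently reused ranges such as queues and frame buffers are the ones that benefit most.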
Address Translation Services (ATS) Overview
ATS uses a request-completion protocol between a Device and a Root Complex (RC) to provide translation services.
- ATS Requests follow the same routing and ordering rules as Non-Posted Memory Reads
- A Function may have ATS Requests outstanding on one or more TCs
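After receiving an ATS Translation Request, the TA:
- Checks whether the Function has ATS enabled
- Checks whether the Function is permitted to access the requested address range
- Checks whether a translation can be provided to the Function
- Checks whether the translation request is well-formed
  - The page size must be a power of two, and the address must be aligned to the page size
  - The smallest page size is 4096 bytes
  - To use system resources efficiently, the Function must be told the minimum translation or invalidate size, and the Function must support that size. The smallest minimum translation size must be 4096 bytes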
- The TA tells the RC whether the request succeeded or failed; the RC generates an ATS Translation Completion and returns it to the Device Function through the RP
- The RC returns at least one ATS Translation Completion for each ATS Translation Request
  - A successful translation can result in one or two ATS Translation Completion TLPs per request.
  - An RC may pipeline multiple ATS Translation Completions, and the Completions may be returned in any order relative to one another
  - The RC must return the Completion on the same TC as the ATS Translation Request
  - If the requested address is not valid, the RC must return a Completion indicating that the address is not accessible
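The checks listed above can be sketched in Python as follows. This only illustrates the rules stated in these notes. `FunctionState`, `is_valid_range`, `check_request`, and the result strings are hypothetical names, not the spec's completion status encodings, and the "can a translation be provided" check is folded into the access-range check for brevity.

```python
from dataclasses import dataclass

MIN_PAGE_SIZE = 4096  # minimum page / translation size stated in the notes


@dataclass
class FunctionState:      # hypothetical per-Function record kept by the TA
    ats_enabled: bool
    allowed_ranges: list  # [(start, end)] untranslated ranges, end exclusive


def is_valid_range(addr, size):
    """Size must be a power of two >= 4096, and addr must be aligned to size."""
    power_of_two = size > 0 and (size & (size - 1)) == 0
    return power_of_two and size >= MIN_PAGE_SIZE and addr % size == 0


def check_request(fn, addr, size):
    """Apply the checks in the order listed and return a result string."""
    if not fn.ats_enabled:
        return "ats-disabled"
    if not any(lo <= addr and addr + size <= hi for lo, hi in fn.allowed_ranges):
        return "no-access"
    if not is_valid_range(addr, size):
        return "malformed"
    return "success"


fn = FunctionState(ats_enabled=True, allowed_ranges=[(0x10000, 0x20000)])
print(check_request(fn, 0x10000, 8192))  # aligned, power of two, inside the range
print(check_request(fn, 0x10800, 4096))  # address not aligned to the page size
```

The power-of-two test uses the usual `size & (size - 1) == 0` bit trick. The alignment test then reduces to a single modulo against the requested size.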
Dev