The ARMv8-A architecture introduces a number of changes, which enable significantly higher
performance processor implementations to be designed.
This enables the processor to access beyond 4GB of physical memory.
The physical address space is larger. ARMv7 had 4GB; how large is it in ARMv8? Is it 2^64 bytes?
This enables virtual memory beyond the 4GB limit. This is important for modern
desktop and server software using memory mapped file I/O or sparse addressing.
The virtual address space is larger too. ARMv7 had 4GB; does ARMv8 go up to 2^64 bytes?
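A rough way to put numbers on the two questions above. As far as I understand it, ARMv8.0-A does not expose a full 2^64-byte space: implementations support up to 48 bits of virtual address and up to 48 bits of physical address, with later architecture extensions raising the limit. The 48-bit figure is my own reading and is not stated in the quoted text; the helper names below are made up for these notes.

```c
#include <stdint.h>

/* Size in bytes of an address space that is `bits` wide.
 * Valid for 0 <= bits <= 63. The 48-bit case used with it in these
 * notes is an assumption about ARMv8.0-A, not a quote from the guide. */
static uint64_t address_space_bytes(unsigned bits)
{
    return (uint64_t)1 << bits;
}

/* Convenience: express a byte count in whole GiB. */
static uint64_t bytes_to_gib(uint64_t bytes)
{
    return bytes >> 30;
}
```

With these, a 32-bit space comes out as 4 GiB and a 48-bit space as 262144 GiB (256 TiB), which is far beyond ARMv7's 4GB but well short of 2^64 bytes.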
This enables power-efficient, high-performance spinlocks.
Spinlocks become high-performance and low-power. How is this designed in hardware, and how does it show up in the instruction set?
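A minimal sketch of the software side, assuming a plain test-and-set lock written with C11 atomics. The type and function names are hypothetical. My understanding is that on ARMv8-A a core waiting for a lock can sleep in WFE and is woken by an event that the hardware generates automatically when the lock word it is monitoring gets written, so the unlocking core needs no explicit SEV. That wait is only marked by a comment here, because getting a WFE loop right needs hand-written exclusive loads that portable C cannot express.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock, for illustration only. */
typedef struct {
    atomic_flag locked;
} demo_spinlock;

static void demo_spin_init(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_relaxed);
}

/* Returns true if the lock was free and is now held by the caller. */
static bool demo_spin_trylock(demo_spinlock *l)
{
    /* test_and_set returns the previous value: false means we took it. */
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

static void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l)) {
        /* Busy-wait. A hand-written AArch64 lock would instead wait
         * here in a low-power state (WFE) until the lock word changes;
         * this portable sketch simply spins. */
    }
}

static void demo_spin_unlock(demo_spinlock *l)
{
    /* Release ordering: writes made inside the critical section are
     * visible before the lock is seen as free. */
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The power saving in the real thing comes from replacing the empty loop body with a wait that the hardware ends on its own; the lock and unlock interface stays the same.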
Thirty-one 64-bit general-purpose registers increase performance and reduce
stack use.
Thirty-one 64-bit registers improve performance and reduce stack use. How is this designed in hardware, and how does it show up in the instruction set?
There is less need for literal pools.
I don't follow this one. Is it related to XZR and WZR?
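My current understanding, which the quoted sentence does not spell out, is that this is not about the zero registers XZR/WZR but about the MOVZ and MOVK instructions: a 64-bit constant can be built in a register from up to four 16-bit immediates, so it does not have to be loaded from a literal pool in memory. The sketch below models that in C; the function names are made up.

```c
#include <stdint.h>

/* Split a 64-bit constant into the four 16-bit chunks that a
 * MOVZ + up to three MOVK sequence would carry as immediates.
 * chunks[0] holds bits 15:0, chunks[3] holds bits 63:48. */
static void split_imm64(uint64_t value, uint16_t chunks[4])
{
    for (int i = 0; i < 4; i++)
        chunks[i] = (uint16_t)(value >> (16 * i));
}

/* Rebuild the constant the way the instruction sequence would:
 * MOVZ writes one chunk and zeroes the rest of the register,
 * each MOVK then replaces one 16-bit field and keeps the others. */
static uint64_t build_imm64(const uint16_t chunks[4])
{
    uint64_t reg = (uint64_t)chunks[0];            /* MOVZ Xd, #c0 */
    for (int i = 1; i < 4; i++) {
        uint64_t mask = (uint64_t)0xFFFF << (16 * i);
        /* MOVK Xd, #ci, LSL #(16*i) */
        reg = (reg & ~mask) | ((uint64_t)chunks[i] << (16 * i));
    }
    return reg;
}
```

Constants with all-zero chunks need fewer instructions, since MOVZ already clears the fields it does not write.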
A +/-4GB addressing range for efficient data addressing within shared libraries
and position-independent executables.
I don't follow this one either.
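As far as I can tell, the +/-4GB figure comes from the ADRP instruction, which adds a signed 21-bit count of 4KB pages to the page containing the current PC; a following ADD (or a load/store offset) supplies the low 12 bits. Code can therefore reach its data relative to wherever it was loaded, without absolute addresses. The model below is my own sketch of that arithmetic, with hypothetical function names.

```c
#include <stdint.h>

/* Model of the address ADRP produces: the 4KB page containing `pc`,
 * plus a signed 21-bit page count (range -2^20 .. 2^20 - 1). */
static uint64_t adrp_target(uint64_t pc, int32_t imm21_pages)
{
    uint64_t page_base = pc & ~(uint64_t)0xFFF;
    return page_base + (uint64_t)((int64_t)imm21_pages * 4096);
}

/* Full address: ADRP for the page, then the low 12 bits from an ADD. */
static uint64_t adrp_add_target(uint64_t pc, int32_t imm21_pages,
                                uint32_t lo12)
{
    return adrp_target(pc, imm21_pages) + (lo12 & 0xFFF);
}

/* Largest forward and backward reach of the page offset, in bytes. */
static int64_t adrp_max_forward(void)
{
    return (int64_t)((1 << 20) - 1) * 4096;
}

static int64_t adrp_max_backward(void)
{
    return -(int64_t)(1 << 20) * 4096;
}
```

The backward reach is exactly 2^32 bytes and the forward reach is one page short of that, which matches the +/-4GB in the quoted text.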
Additional 16KB and 64KB translation granules
This reduces Translation Lookaside Buffer (TLB) miss rates and depth of page
walks.
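Does this mean the Translation Lookaside Buffer granule becomes finer (smaller), so that the hit rate, or reuse rate, of translation results goes up and the number of walks, or the time spent walking, goes down?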
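Working through the arithmetic suggests the opposite of "finer": 16KB and 64KB granules are larger than the usual 4KB page, so one TLB entry covers more memory and each table level resolves more address bits. The sketch below assumes 8-byte descriptors and tables that fill exactly one granule, which is my understanding of the AArch64 format; the function names are made up.

```c
#include <stdint.h>

/* Number of translation table levels needed to resolve a virtual address.
 * granule_shift: log2 of the page size (12 = 4KB, 14 = 16KB, 16 = 64KB).
 * Each table fills one granule with 8-byte descriptors, so one level
 * resolves (granule_shift - 3) address bits. */
static unsigned walk_levels(unsigned va_bits, unsigned granule_shift)
{
    unsigned bits_per_level  = granule_shift - 3;
    unsigned bits_to_resolve = va_bits - granule_shift;
    /* Round up: a partly used top level still counts as a level. */
    return (bits_to_resolve + bits_per_level - 1) / bits_per_level;
}

/* Memory covered by a single page-sized TLB entry, in bytes. */
static uint64_t tlb_entry_reach(unsigned granule_shift)
{
    return (uint64_t)1 << granule_shift;
}
```

For a 48-bit virtual address this gives 4 levels with a 4KB granule and 3 levels with a 64KB granule, and a 64KB entry covers 16 times the memory of a 4KB one. Both effects point toward fewer TLB misses and shallower walks.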
New exception model
This reduces OS and hypervisor software complexity.
This presumably refers to the exception levels EL0 to EL3.
Efficient cache management
User space cache operations improve dynamic code generation efficiency. Fast
Data cache clear using a Data Cache Zero instruction.
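Does user space now get cache operations as well?
Did ARMv7 not have a data cache zero instruction? What is the use case and design intent of this instruction?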
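On the second question, my understanding is that the instruction is DC ZVA, which zeroes a whole naturally aligned block of memory at once and is typically used for jobs such as clearing pages or a large memset. The block size is not fixed; software reads it from the DCZID_EL0 register. Reading that register needs an AArch64 `mrs` instruction, so the sketch below only decodes a value that has already been read. The field layout (BS in bits 3:0, DZP in bit 4) is from my memory of the architecture manual, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a DCZID_EL0 value (names made up for these notes). */
typedef struct {
    bool     prohibited;   /* DZP, bit 4: DC ZVA must not be used */
    uint32_t block_bytes;  /* bytes zeroed by one DC ZVA */
} dczid_info;

static dczid_info decode_dczid(uint64_t dczid)
{
    dczid_info info;
    /* BS, bits 3:0: log2 of the block size in 4-byte words. */
    unsigned bs = (unsigned)(dczid & 0xF);

    info.prohibited  = ((dczid >> 4) & 1) != 0;
    info.block_bytes = 4u << bs;
    return info;
}
```

A register value of 4, for example, decodes to a 64-byte block, the size of a typical cache line.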
Hardware-accelerated cryptography
Provides 3× to 10× better software encryption performance. This is useful for
small-granule decryption and encryption that is too small to offload to a
hardware accelerator efficiently, for example HTTPS.
So encryption and decryption now have hardware acceleration?
Load-Acquire, Store-Release instructions
Designed for C++11, C11, Java memory models. They improve performance of
thread-safe code by eliminating explicit memory barrier instructions.
New memory access instructions, so that explicit memory barriers no longer have to be used?
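A small sketch of what this looks like from C11, assuming a one-slot mailbox that I made up for illustration. The ordering is attached to the load and the store themselves rather than to a separate barrier. I believe AArch64 compilers typically turn the release store into STLR and the acquire load into LDAR, but that mapping is the compiler's choice and not something the C code guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox shared between two threads. */
typedef struct {
    int         payload;  /* ordinary, non-atomic data */
    atomic_bool ready;    /* flag that publishes the payload */
} mailbox;

/* Producer side: write the data, then publish it. */
static void mailbox_publish(mailbox *m, int value)
{
    m->payload = value;
    /* Release store: the payload write above cannot be observed
     * after this flag write by a thread that acquires the flag. */
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Consumer side: returns true and fills *out once data is published. */
static bool mailbox_try_read(mailbox *m, int *out)
{
    /* Acquire load: if we see ready == true, we also see the payload. */
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->payload;
    return true;
}
```

On ARMv7 the same pattern needed a separate DMB barrier next to each access; here the one-way ordering is a property of the access itself.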
NEON double-precision floating-point advanced SIMD
This enables SIMD vectorization to be applied to a much wider set of algorithms,
for example, scientific computing, High Performance Computing (HPC) and
supercomputers.
So SIMD has been developed further?
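As I understand it, ARMv7 NEON handled floating-point vectors only in single precision, while AArch64 Advanced SIMD can also hold two doubles in each 128-bit register. A loop like the one below, a plain double-precision "a times x plus y" kernel of the kind found in scientific code, is the sort of thing that becomes vectorizable. Whether a given compiler actually vectorizes it depends on its options; the code itself is ordinary portable C.

```c
#include <stddef.h>

/* y[i] += a * x[i] over double-precision data.
 * `restrict` tells the compiler that x and y do not overlap, which
 * makes it easier for it to process several elements per instruction. */
static void daxpy(size_t n, double a,
                  const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```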