1. Detecting whether the platform supports DMAR devices
./drivers/pci/dmar.c:
int __init early_dmar_detect(void)
{
    acpi_status status = AE_OK;

    /* if we could find DMAR table, then there are DMAR devices */
    status = acpi_get_table(ACPI_SIG_DMAR, 0,
                (struct acpi_table_header **)&dmar_tbl);

    if (ACPI_SUCCESS(status) && !dmar_tbl) {
        printk (KERN_WARNING PREFIX "Unable to map DMAR\n");
        status = AE_NOT_FOUND;
    }

    return (ACPI_SUCCESS(status) ? 1 : 0);
}
This function is called during memory initialization:
./arch/x86_64/mm/init.c:528:pci_iommu_alloc();
It decides whether DMAR devices are supported by looking up the DMA Remapping table.
./include/acpi/actbl1.h:64:#define ACPI_SIG_DMAR "DMAR" /* DMA Remapping table */
/*******************************************************************************
 *
 * FUNCTION:    acpi_get_table
 *
 * PARAMETERS:  table_type      - one of the defined table types
 *              Instance        - the non zero instance of the table, allows
 *                                support for multiple tables of the same type
 *                                see acpi_gbl_acpi_table_flag
 *              ret_buffer      - pointer to a structure containing a buffer to
 *                                receive the table
 *
 * RETURN:      Status
 *
 * DESCRIPTION: This function is called to get an ACPI table.  The caller
 *              supplies an out_buffer large enough to contain the entire ACPI
 *              table.  The caller should call the acpi_get_table_header function
 *              first to determine the buffer size needed.  Upon completion
 *              the out_buffer->Length field will indicate the number of bytes
 *              copied into the out_buffer->buf_ptr buffer.  This table will be
 *              a complete table including the header.
 *
 ******************************************************************************/
2. Initializing the Intel IOMMU device
./drivers/pci/intel-iommu.c:
int __init intel_iommu_init(void)
{
    int ret = 0;

    if (no_iommu || swiotlb || dmar_disabled)
        return -ENODEV;

    if (dmar_table_init())
        return -ENODEV;

    iommu_init_mempool();
    dmar_init_reserved_ranges();

    init_no_remapping_devices();

    ret = init_dmars();
    if (ret) {
        printk(KERN_ERR "IOMMU: dmar init failed\n");
        put_iova_domain(&reserved_iova_list);
        iommu_exit_mempool();
        return ret;
    }
    printk(KERN_INFO
        "PCI-DMA: Intel(R) Virtualization Technology for Directed I/O\n");

    force_iommu = 1;
    dma_ops = &intel_dma_ops;
    return 0;
}
This function is called from pci_iommu_init() in arch/x86_64/kernel/pci-dma.c:
static int __init pci_iommu_init(void)
{
#ifdef CONFIG_CALGARY_IOMMU
    calgary_iommu_init();
#endif

    intel_iommu_init();

#ifdef CONFIG_IOMMU
    gart_iommu_init();
#endif

    no_iommu_init();
    return 0;
}
The same file registers pci_iommu_init() as an initcall:
/* Must execute after PCI subsystem */
fs_initcall(pci_iommu_init);
2.1 dmar_table_init
Parses the DMAR table. Each DMAR entry is printed in turn by
dmar_table_print_dmar_entry(entry_header);
so messages like the following appear in dmesg:
ACPI DMAR:Host address width 36
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000
ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000
ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff
ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff
Each entry is then dispatched on its type:
switch (entry_header->type) {
case ACPI_DMAR_TYPE_HARDWARE_UNIT:
    ret = dmar_parse_one_drhd(entry_header);
    break;
case ACPI_DMAR_TYPE_RESERVED_MEMORY:
    ret = dmar_parse_one_rmrr(entry_header);
    break;
default:
    printk(KERN_WARNING PREFIX
        "Unknown DMAR structure type\n");
    ret = 0; /* for forward compatibility */
    break;
}
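Two kinds of entries are parsed:
DRHD - DMA Remapping Hardware Unit Definition Structure
RMRR - Reserved Memory Region Reporting Structure
For each DRHD entry, a register function adds the physical DMA remapping unit to a list. Each RMRR is likewise added to a global list.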
2.2 iommu_init_mempool
Creates slab caches for several frequently used structures:
struct iova
struct iommu_domain
struct device_domain_info
2.3 dmar_init_reserved_ranges
Initializes the reserved regions. Two kinds of ranges have to be reserved:
1. IOAPIC ranges shouldn't be accessed by DMA
2. Reserve all PCI MMIO to avoid peer-to-peer access
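The effect of a reserved range on address allocation can be sketched in user space as a simple overlap test. This is only an illustration, not the kernel's iova code: `struct iova_range`, `iova_range_overlaps()` and `iova_is_reserved()` are hypothetical names, and any concrete range fed to them (an IOAPIC window, a PCI MMIO BAR) is an assumption of the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive address range [start, end]. */
struct iova_range {
    uint64_t start;
    uint64_t end;
};

/* Two inclusive ranges overlap when each one starts before the other ends. */
static int iova_range_overlaps(struct iova_range a, struct iova_range b)
{
    return a.start <= b.end && b.start <= a.end;
}

/* Returns 1 if the candidate range touches any reserved range, else 0. */
static int iova_is_reserved(const struct iova_range *reserved, size_t count,
                            struct iova_range candidate)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (iova_range_overlaps(reserved[i], candidate))
            return 1;
    }
    return 0;
}
```

An allocator built this way would reject any candidate for which iova_is_reserved() returns 1 and try the next free range instead.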
2.4 init_no_remapping_devices
Graphics driver workarounds to provide unity map
Most GFX drivers don't call the standard PCI DMA APIs to allocate DMA buffers.
Such drivers will be broken with the IOMMU enabled. To work around this issue,
we added two options.
Once graphics devices are converted over to use the DMA APIs, this entire
patch can be removed...
a. intel_iommu=igfx_off. With this option, a DMAR that has only gfx devices
under it will be ignored. This mostly affects integrated gfx devices.
If the DMAR is ignored, the gfx devices under it will get physical addresses
for DMA.
b. intel_iommu=gfx_workaround. With this option, we set up a 1:1 mapping
of the whole memory for gfx devices, that is, the physical address equals the
virtual address. In this way, gfx will use physical addresses for DMA. This
is primarily for add-in card GFX devices.
2.5 init_dmars
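Initializes the DMAR data structures.
TBD: diagram of the relationships between the data structures
Once init_dmars() succeeds, intel_iommu_init() installs the Intel DMA operations:
dma_ops = &intel_dma_ops;
static struct dma_mapping_ops intel_dma_ops = {
    .alloc_coherent = intel_alloc_coherent,
    .free_coherent = intel_free_coherent,
    .map_single = intel_map_single,
    .unmap_single = intel_unmap_single,
    .map_sg = intel_map_sg,
    .unmap_sg = intel_unmap_sg,
};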
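The ops-table pattern used here can be tried out in user space. The sketch below is not kernel code: every `demo_` name is hypothetical, the table carries only two of the hooks, and the "remapping" backend is a toy bump allocator that hands out fresh I/O virtual addresses. It only shows how repointing one global at a different table changes what every caller of the mapping wrapper gets back.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t demo_dma_addr_t;

/* Reduced stand-in for struct dma_mapping_ops: only two of the hooks. */
struct demo_dma_ops {
    demo_dma_addr_t (*map_single)(uint64_t phys, size_t size);
    void (*unmap_single)(demo_dma_addr_t addr, size_t size);
};

/* Without an IOMMU the device simply sees the physical address. */
static demo_dma_addr_t demo_nommu_map(uint64_t phys, size_t size)
{
    (void)size;
    return phys;
}

static void demo_nommu_unmap(demo_dma_addr_t addr, size_t size)
{
    (void)addr;
    (void)size;
}

#define DEMO_PAGE_SIZE 4096u

/* Arbitrary starting point for the toy I/O virtual address allocator. */
static uint64_t demo_next_iova = 0x100000;

/* With remapping the device gets a freshly allocated I/O virtual address;
 * the offset within the page is preserved. */
static demo_dma_addr_t demo_remap_map(uint64_t phys, size_t size)
{
    uint64_t offset = phys & (DEMO_PAGE_SIZE - 1);
    uint64_t pages = (offset + size + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
    uint64_t iova = demo_next_iova;

    demo_next_iova += pages * DEMO_PAGE_SIZE;
    return iova + offset;
}

static void demo_remap_unmap(demo_dma_addr_t addr, size_t size)
{
    /* A real implementation would release the range and flush the IOTLB. */
    (void)addr;
    (void)size;
}

static const struct demo_dma_ops demo_nommu_ops = {
    .map_single = demo_nommu_map,
    .unmap_single = demo_nommu_unmap,
};

static const struct demo_dma_ops demo_remap_ops = {
    .map_single = demo_remap_map,
    .unmap_single = demo_remap_unmap,
};

/* Plays the role of the global dma_ops pointer. */
static const struct demo_dma_ops *demo_ops = &demo_nommu_ops;

/* Callers never name a backend; they always go through the table. */
static demo_dma_addr_t demo_dma_map_single(uint64_t phys, size_t size)
{
    return demo_ops->map_single(phys, size);
}
```

Assigning `demo_ops = &demo_remap_ops;` mirrors the `dma_ops = &intel_dma_ops;` assignment above: drivers that go through the wrapper pick up the new behaviour without being changed.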
3. DMAR ACPI table structure
The system BIOS is responsible for detecting the remapping hardware functions in the platform and for locating the memory-mapped remapping hardware registers in the host system address space. The BIOS reports the remapping hardware units in a platform to system software through the DMA Remapping Reporting (DMAR) ACPI table described below.
3.1 DMA Remapping Reporting Structure
Field                   Byte Length  Byte Offset  Description
Signature               4            0            “DMAR”. Signature for the DMA Remapping Description table.
Length                  4            4            Length, in bytes, of the description table, including the
                                                  length of the associated DMA remapping structures.
Revision                1            8            1
Checksum                1            9            Entire table must sum to zero.
OEMID                   6            10           OEM ID
OEM Table ID            8            16           For the DMAR description table, the Table ID is the
                                                  manufacturer model ID.
OEM Revision            4            24           OEM Revision of the DMAR table for the OEM Table ID.
Creator ID              4            28           Vendor ID of the utility that created the table.
Creator Revision        4            32           Revision of the utility that created the table.
Host Address Width      1            36           Indicates the maximum DMA physical addressability
                                                  supported by this platform. The system address map
                                                  reported by the BIOS indicates which portions of this
                                                  address range are populated. The Host Address Width
                                                  (HAW) of the platform is computed as (N+1), where N is
                                                  the value reported in this field. For example, for a
                                                  platform supporting 40 bits of physical addressability,
                                                  the value 100111b is reported in this field.
Flags                   1            37           - Bit 0: INTR_REMAP - If Clear, the platform does not
                                                    support interrupt remapping. If Set, the platform
                                                    supports interrupt remapping.
                                                  - Bits 1-7: Reserved.
Reserved                10           38           Reserved (0).
Remapping Structures[]  -            48           A list of structures. The list will contain one or more
                                                  DMA Remapping Hardware Unit Definition (DRHD)
                                                  structures, and zero or more Reserved Memory Region
                                                  Reporting (RMRR) and Root Port ATS Capability Reporting
                                                  (ATSR) structures. These structures are described below.
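The offsets in this table can be exercised against a raw byte buffer. The following is a minimal user-space sketch, not the kernel's parser: the `dmar_` helper names are made up for illustration, and it assumes the table bytes have already been copied into memory (ACPI tables are little-endian). It checks the signature and checksum, then derives the Host Address Width as N+1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets taken from the DMA Remapping Reporting Structure layout. */
enum {
    DMAR_OFF_SIGNATURE = 0,
    DMAR_OFF_LENGTH    = 4,
    DMAR_OFF_HAW       = 36,
    DMAR_OFF_FLAGS     = 37,
    DMAR_OFF_REMAP     = 48   /* first remapping structure */
};

/* Read a little-endian 32-bit value without assuming alignment. */
static uint32_t dmar_read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The Host Address Width is N + 1, where N is the raw field value. */
static unsigned dmar_host_address_width(const uint8_t *tbl)
{
    return (unsigned)tbl[DMAR_OFF_HAW] + 1u;
}

/* Bit 0 of Flags is INTR_REMAP. */
static int dmar_intr_remap(const uint8_t *tbl)
{
    return tbl[DMAR_OFF_FLAGS] & 0x01;
}

/* Returns 1 if the buffer carries the "DMAR" signature, a plausible Length,
 * and bytes that sum to zero modulo 256; returns 0 otherwise. */
static int dmar_header_valid(const uint8_t *tbl, size_t buf_len)
{
    uint32_t len;
    uint8_t sum = 0;
    size_t i;

    if (buf_len < DMAR_OFF_REMAP)
        return 0;
    if (memcmp(tbl + DMAR_OFF_SIGNATURE, "DMAR", 4) != 0)
        return 0;
    len = dmar_read_le32(tbl + DMAR_OFF_LENGTH);
    if (len < DMAR_OFF_REMAP || len > buf_len)
        return 0;
    for (i = 0; i < len; i++)
        sum = (uint8_t)(sum + tbl[i]);
    return sum == 0;
}
```

With the raw field value 100111b (0x27), dmar_host_address_width() yields 40, matching the example in the Host Address Width row.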
3.2 Remapping Structure Types
Every Remapping Structure starts with two fields, type and length. The type field identifies the kind of DMA-remapping structure, and the length field gives the size of the structure. The table below defines the possible values of type:
Value  Description
0      DMA Remapping Hardware Unit Definition (DRHD) Structure
1      Reserved Memory Region Reporting (RMRR) Structure
2      Root Port ATS Capability Reporting (ATSR) Structure
>2     Reserved for future use. For forward compatibility, software skips structures it does not
       comprehend by skipping the appropriate number of bytes indicated by the Length field.
Note: BIOS implementations must report these remapping structure types in numerical order, i.e., all remapping structures of type 0 (DRHD) are enumerated before remapping structures of type 1 (RMRR), and so forth.
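The type/length framing makes the list of remapping structures easy to walk, including the rule that unknown types are skipped by their Length. Below is a small user-space sketch under the assumption that the bytes following offset 48 of the DMAR table are available in a buffer; `remap_walk()` and `struct remap_counts` are hypothetical names, and the walker only counts structures instead of parsing them.

```c
#include <stddef.h>
#include <stdint.h>

enum { REMAP_DRHD = 0, REMAP_RMRR = 1, REMAP_ATSR = 2 };

struct remap_counts {
    unsigned drhd;
    unsigned rmrr;
    unsigned atsr;
    unsigned unknown;
};

/* Read a little-endian 16-bit value without assuming alignment. */
static uint16_t remap_read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Walks the remapping structures in buf[0..len). Returns 0 on success and
 * -1 if a structure is truncated or reports an impossible Length. */
static int remap_walk(const uint8_t *buf, size_t len, struct remap_counts *out)
{
    size_t off = 0;

    out->drhd = out->rmrr = out->atsr = out->unknown = 0;
    while (off < len) {
        uint16_t type, slen;

        if (len - off < 4)
            return -1;              /* no room for Type + Length */
        type = remap_read_le16(buf + off);
        slen = remap_read_le16(buf + off + 2);
        if (slen < 4 || slen > len - off)
            return -1;              /* would loop forever or overrun */

        switch (type) {
        case REMAP_DRHD: out->drhd++; break;
        case REMAP_RMRR: out->rmrr++; break;
        case REMAP_ATSR: out->atsr++; break;
        default:         out->unknown++; break; /* skipped via Length */
        }
        off += slen;
    }
    return 0;
}
```

Rejecting a Length smaller than 4 matters in practice: a zero Length from a buggy BIOS would otherwise keep the loop on the same entry forever.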
3.3 DMA Remapping Hardware Unit Definition Structure
A DMA-remapping hardware unit definition (DRHD) structure uniquely represents a remapping hardware unit present in the platform. There must be at least one instance of this structure for each PCI segment in the platform.
Field                  Byte Length  Byte Offset  Description
Type                   2            0            0 - DMA Remapping Hardware Unit Definition (DRHD) structure
Length                 2            2            Varies (16 + size of Device Scope Structure)
Flags                  1            4            Bit 0: INCLUDE_PCI_ALL
                                                 - If Set, this remapping hardware unit has under its scope
                                                   all PCI compatible devices in the specified Segment,
                                                   except devices reported under the scope of other
                                                   remapping hardware units for the same Segment. If a
                                                   DRHD structure with INCLUDE_PCI_ALL flag Set is
                                                   reported for a Segment, it must be enumerated by BIOS
                                                   after all other DRHD structures for the same Segment.
                                                   A DRHD structure with INCLUDE_PCI_ALL flag Set may use
                                                   the ‘Device Scope’ field to enumerate I/OxAPIC and
                                                   HPET devices under its scope.
                                                 - If Clear, this remapping hardware unit has under its
                                                   scope only devices in the specified Segment that are
                                                   explicitly identified through the ‘Device Scope’ field.
                                                 Bits 1-7: Reserved.
Reserved               1            5            Reserved (0).
Segment Number         2            6            The PCI Segment associated with this unit.
Register Base Address  8            8            Base address of remapping hardware register-set for this unit.
Device Scope []        -            16           The Device Scope structure contains one or more Device
                                                 Scope Entries that identify devices in the specified
                                                 segment and under the scope of this remapping hardware
                                                 unit.
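As a rough illustration of the DRHD layout, the sketch below decodes the fixed 16-byte part of one structure from a byte buffer. It is a user-space approximation rather than the kernel's dmar_parse_one_drhd(): `struct drhd_info` and the `drhd_` helpers are hypothetical names, and little-endian encoding is assumed as for all ACPI tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the fixed part of a DRHD structure. */
struct drhd_info {
    int include_pci_all;   /* Flags bit 0 */
    uint16_t segment;      /* PCI Segment Number */
    uint64_t reg_base;     /* Register Base Address */
    size_t scope_len;      /* bytes of Device Scope following the header */
};

static uint16_t drhd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint64_t drhd_le64(const uint8_t *p)
{
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 and fills *out on success, -1 if p does not hold a DRHD. */
static int drhd_parse(const uint8_t *p, size_t avail, struct drhd_info *out)
{
    uint16_t len;

    if (avail < 16 || drhd_le16(p) != 0)    /* Type must be 0 */
        return -1;
    len = drhd_le16(p + 2);
    if (len < 16 || len > avail)
        return -1;

    out->include_pci_all = p[4] & 0x01;
    out->segment = drhd_le16(p + 6);
    out->reg_base = drhd_le64(p + 8);
    out->scope_len = (size_t)len - 16;      /* Length = 16 + Device Scope */
    return 0;
}
```

Fed the third DRHD from the dmesg excerpt in section 2.1 (flags 0x1, base 0xfed93000), the decoder reports include_pci_all set and reg_base 0xfed93000.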
3.3.1 Device Scope Structure
The Device Scope Structure is made up of one or more Device Scope Entries. Each Device Scope Entry may be used to indicate a PCI endpoint device, a PCI sub-hierarchy, or devices such as I/OxAPICs or HPET (High Precision Event Timer). In this section, the generic term ‘PCI’ is used to describe conventional PCI, PCI-X, and PCI-Express devices. Similarly, the term ‘PCI-PCI bridge’ is used to refer to conventional PCI bridges, PCI-X bridges, PCI Express root ports, or downstream ports of a PCI Express switch. A PCI sub-hierarchy is defined as the collection of PCI controllers that are downstream to a specific PCI-PCI bridge. To identify a PCI sub-hierarchy, the Device Scope Entry needs to identify only the parent PCI-PCI bridge of the sub-hierarchy.
Field             Byte Length  Byte Offset  Description
Type              1            0            The following values are defined for this field.
                                            - 0x01: PCI Endpoint Device - The device identified by the
                                              ‘Path’ field is a PCI endpoint device. This type must not
                                              be used in Device Scope of DRHD structures with
                                              INCLUDE_PCI_ALL flag Set.
                                            - 0x02: PCI Sub-hierarchy - The device identified by the
                                              ‘Path’ field is a PCI-PCI bridge. In this case, the
                                              specified bridge device and all its downstream devices
                                              are included in the scope. This type must not be used in
                                              Device Scope of DRHD structures with INCLUDE_PCI_ALL
                                              flag Set.
                                            - 0x03: IOAPIC - The device identified by the ‘Path’ field
                                              is an I/O APIC (or I/O SAPIC) device, enumerated through
                                              the ACPI MADT I/O APIC (or I/O SAPIC) structure.
                                            - 0x04: MSI_CAPABLE_HPET - The device identified by the
                                              ‘Path’ field is an HPET device capable of generating MSI
                                              (Message Signaled Interrupts). HPET hardware is reported
                                              through the ACPI HPET structure.
                                            Other values for this field are reserved for future use.
Length            1            1            Length of this Entry in Bytes. (6 + X), where X is the size
                                            in bytes of the ‘Path’ field.
Reserved          2            2            Reserved (0).
Enumeration ID    1            4            When the ‘Type’ field indicates ‘IOAPIC’, this field
                                            provides the I/O APIC ID as provided in the I/O APIC (or
                                            I/O SAPIC) structure in the ACPI MADT (Multiple APIC
                                            Descriptor Table). This field is treated as reserved (0)
                                            for all other ‘Type’ values.
Start Bus Number  1            5            This field describes the bus number (bus number of the
                                            first PCI Bus produced by the PCI Host Bridge) under which
                                            the device identified by this Device Scope resides.
Path              2 * N        6            Describes the hierarchical path from the Host Bridge to the
                                            device specified by the Device Scope Entry.
                                            For example, a device in an N-deep hierarchy is identified
                                            by N {PCI Device Number, PCI Function Number} pairs, where
                                            N is a positive integer. Even offsets contain the Device
                                            numbers, and odd offsets contain the Function numbers.
                                            The first {Device, Function} pair resides on the bus
                                            identified by the ‘Start Bus Number’ field. Each subsequent
                                            pair resides on the bus directly behind the bus of the
                                            device identified by the previous pair. The identity (Bus,
                                            Device, Function) of the target device is obtained by
                                            recursively walking down these N {Device, Function} pairs.
                                            If the ‘Path’ field length is 2 bytes (N=1), the Device
                                            Scope Entry identifies a ‘Root-Complex Integrated Device’.
                                            The requester-ids of ‘Root-Complex Integrated Devices’ are
                                            static and not impacted by system software bus rebalancing
                                            actions.
                                            If the ‘Path’ field length is more than 2 bytes (N > 1),
                                            the Device Scope Entry identifies a device behind one or
                                            more system software visible PCI-PCI bridges. Bus
                                            rebalancing actions by system software that modify the bus
                                            assignments of the device’s parent bridge impact the bus
                                            number portion of the device’s requester-id.
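The Path walk described in the last row can be sketched as a loop over the {Device, Function} pairs. This is an illustrative user-space version under stated assumptions: `scope_resolve()` is a hypothetical name, the secondary bus number of each bridge comes from a caller-supplied callback (in a real system it would be read from the bridge's configuration space), and `scope_demo_secondary_bus()` is a made-up one-bridge topology that exists only so the sketch can be exercised.

```c
#include <stddef.h>
#include <stdint.h>

struct scope_target {
    uint8_t bus;
    uint8_t dev;
    uint8_t fn;
};

/* Hypothetical callback: returns the bus number directly behind the
 * PCI-PCI bridge at (bus, dev, fn), or -1 if it is not a known bridge. */
typedef int (*scope_secondary_bus_fn)(uint8_t bus, uint8_t dev, uint8_t fn);

/* Made-up topology for demonstration: one bridge at 00:1c.0 leading to bus 2. */
static int scope_demo_secondary_bus(uint8_t bus, uint8_t dev, uint8_t fn)
{
    if (bus == 0 && dev == 0x1c && fn == 0)
        return 2;
    return -1;
}

/* Resolves one Device Scope Entry to (bus, dev, fn). Returns 0 on success,
 * -1 on a malformed entry or an unresolvable bridge. */
static int scope_resolve(const uint8_t *entry, size_t avail,
                         scope_secondary_bus_fn lookup,
                         struct scope_target *out)
{
    uint8_t len, bus;
    size_t n, i;

    if (avail < 6)
        return -1;
    len = entry[1];
    /* Length is 6 + 2*N with N >= 1. */
    if (len < 8 || len > avail || ((len - 6) % 2) != 0)
        return -1;
    n = (size_t)(len - 6) / 2;

    bus = entry[5];                         /* Start Bus Number */
    for (i = 0; i < n; i++) {
        uint8_t dev = entry[6 + 2 * i];     /* even offset: Device */
        uint8_t fn = entry[7 + 2 * i];      /* odd offset: Function */
        int next;

        if (i + 1 == n) {                   /* last pair is the target */
            out->bus = bus;
            out->dev = dev;
            out->fn = fn;
            return 0;
        }
        if (lookup == NULL)
            return -1;
        next = lookup(bus, dev, fn);        /* step behind the bridge */
        if (next < 0 || next > 255)
            return -1;
        bus = (uint8_t)next;
    }
    return -1;
}
```

For N=1 the callback is never consulted, which matches the remark that Root-Complex Integrated Devices are not affected by bus renumbering; for N > 1 the result depends on the bridges' current bus assignments.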
3.4 Reserved Memory Region Reporting Structure
BIOS may report each reserved memory region through the RMRR structures, along with the devices that require access to the specified reserved memory region. Reserved memory ranges that are either not DMA targets, or memory ranges that may be the target of BIOS-initiated DMA only during the pre-boot phase (such as from a boot disk drive), must not be included in the reserved memory region reporting. The base address of each RMRR region must be 4KB aligned and the size must be an integer multiple of 4KB. BIOS must report the RMRR reported memory addresses as reserved in the system memory map returned through methods such as INT15, EFI GetMemoryMap etc. The reserved memory region reporting structures are optional. If there are no RMRR structures, the system software concludes that the platform does not have any reserved memory ranges that are DMA targets.
The RMRR regions are expected to be used only for USB and UMA Graphics legacy usages for reserved memory. Platform designers must avoid or limit reserved memory regions since these require system software to create holes in the DMA virtual address range available to system software and its drivers.
Field                                 Byte Length  Byte Offset  Description
Type                                  2            0            1 - Reserved Memory Region Reporting Structure
Length                                2            2            Varies (24 + size of Device Scope structure)
Reserved                              2            4            Reserved.
Segment Number                        2            6            PCI Segment Number associated with devices identified
                                                                through the Device Scope field.
Reserved Memory Region Base Address   8            8            Base address of 4KB-aligned reserved memory region.
Reserved Memory Region Limit Address  8            16           Last address of the reserved memory region. The
                                                                reserved memory region size (Limit - Base + 1) must
                                                                be an integer multiple of 4KB.
Device Scope[]                        -            24           The Device Scope structure contains one or more Device
                                                                Scope entries that identify devices requiring access
                                                                to the specified reserved memory region. The devices
                                                                identified in this structure must be devices under the
                                                                scope of one of the remapping hardware units reported
                                                                in DRHD.
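The two RMRR constraints (4KB-aligned base, size a multiple of 4KB) are easy to check mechanically. The helpers below are a hypothetical user-space sketch, not taken from the kernel; the size check is written on the low bits of Limit so that Limit - Base + 1 cannot overflow for a region ending at the top of the 64-bit address space.

```c
#include <stdint.h>

#define RMRR_PAGE_MASK 0xfffULL   /* low 12 bits of a 4KB page */

/* Returns 1 if [base, limit] is a well-formed RMRR region, else 0. */
static int rmrr_region_valid(uint64_t base, uint64_t limit)
{
    if (limit < base)
        return 0;
    if (base & RMRR_PAGE_MASK)
        return 0;                 /* base must be 4KB aligned */
    /* With an aligned base, (limit - base + 1) is a multiple of 4KB
     * exactly when limit is the last byte of a page. */
    return (limit & RMRR_PAGE_MASK) == RMRR_PAGE_MASK;
}

/* Number of 4KB pages covered by a valid region. */
static uint64_t rmrr_region_pages(uint64_t base, uint64_t limit)
{
    return (limit - base) / 4096 + 1;
}
```

The two RMRR lines in the dmesg excerpt of section 2.1 both pass this check: 0xed000 to 0xeffff covers 3 pages, and 0x7f600000 to 0x7fffffff covers 2560 pages.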
3.5 Root Port ATS Capability Reporting Structure
This structure is applicable only for platforms supporting Device-IOTLBs as reported through the Extended Capability register. For each PCI Segment in the platform that supports Device-IOTLBs, BIOS provides an ATSR structure. The ATSR structures identify PCI Express Root-Ports supporting Address Translation Services (ATS) transactions. Software must enable ATS on endpoint devices behind a Root Port only if the Root Port is reported as supporting ATS transactions.
Field            Byte Length  Byte Offset  Description
Type             2            0            2 - Root Port ATS Capability Reporting Structure
Length           2            2            Varies (8 + size of Device Scope Structure)
Flags            1            4            - Bit 0: ALL_PORTS: If Set, indicates all PCI Express Root
                                             Ports in the specified PCI Segment support ATS
                                             transactions. If Clear, indicates ATS transactions are
                                             supported only on Root Ports identified through the
                                             Device Scope field.
                                           - Bits 1-7: Reserved.
Reserved         1            5            Reserved (0).
Segment Number   2            6            The PCI Segment associated with this ATSR structure.
Device Scope []  -            8            If the ALL_PORTS flag is Set, the Device Scope structure is
                                           omitted. If the ALL_PORTS flag is Clear, the Device Scope
                                           structure contains Device Scope Entries that identify Root
                                           Ports supporting ATS transactions. All Device Scope Entries
                                           in this structure must have a Device Scope Entry Type of
                                           02h.
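The ATSR rules can be checked with a short validator. This is a user-space sketch under the same assumptions as before (raw little-endian bytes in a buffer); `atsr_check()` is a hypothetical name and the kernel sources quoted in section 2 do not parse ATSR at all. It reads the ALL_PORTS flag and, when the flag is Clear, verifies that every Device Scope Entry has type 02h.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t atsr_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Validates one ATSR structure. On success returns 0 and stores the
 * ALL_PORTS flag in *all_ports; returns -1 on a malformed structure. */
static int atsr_check(const uint8_t *p, size_t avail, int *all_ports)
{
    uint16_t len;
    size_t off;

    if (avail < 8 || atsr_le16(p) != 2)     /* Type must be 2 */
        return -1;
    len = atsr_le16(p + 2);
    if (len < 8 || len > avail)
        return -1;

    *all_ports = p[4] & 0x01;
    if (*all_ports)
        return 0;                           /* Device Scope is omitted */

    /* ALL_PORTS Clear: each entry must be a PCI sub-hierarchy (02h). */
    off = 8;
    while (off < len) {
        uint8_t etype, elen;

        if ((size_t)len - off < 6)
            return -1;
        etype = p[off];
        elen = p[off + 1];
        if (elen < 8 || elen > (size_t)len - off)
            return -1;
        if (etype != 0x02)
            return -1;
        off += elen;
    }
    return 0;
}
```

A caller would enable ATS on an endpoint only when the check succeeds and either ALL_PORTS is set or the endpoint's Root Port appears in the Device Scope.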