Device Tree Usage - eLinux.org
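https://elinux.org/Device_Tree_Usage#PCI_Address_Translation PCIe device tree introduction
Expert write-ups
[Original] Linux PCI Driver Framework Analysis (Part 3) - LoyenWang - cnblogs
[Original] Linux PCI Driver Framework Analysis (Part 2) - LoyenWang - cnblogs
[Original] Linux PCI Driver Framework Analysis (Part 1) - LoyenWang - cnblogs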
2.1 Device Tree
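- The device tree describes the hardware. It consists of nodes and their properties, is defined in dts files, and is ultimately compiled into a dtb file that is loaded into memory;
- During boot, the kernel parses the dtb file into a Device Tree described by device_node structures;
- From the device_node nodes, the kernel creates platform_device structures and eventually registers them with the system; this is how the PCIe Host device is created;
Let's look at the device tree content for the PCIe Host: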
pcie: pcie@fd0e0000 {
    compatible = "xlnx,nwl-pcie-2.11";
    status = "disabled";
    #address-cells = <3>;
    #size-cells = <2>;
    #interrupt-cells = <1>;
    msi-controller;
    device_type = "pci";
    interrupt-parent = <&gic>;
    interrupts = <0 118 4>,
                 <0 117 4>,
                 <0 116 4>,
                 <0 115 4>, /* MSI_1 [63...32] */
                 <0 114 4>; /* MSI_0 [31...0] */
    interrupt-names = "misc", "dummy", "intx", "msi1", "msi0";
    msi-parent = <&pcie>;
    reg = <0x0 0xfd0e0000 0x0 0x1000>,
          <0x0 0xfd480000 0x0 0x1000>,
          <0x80 0x00000000 0x0 0x1000000>;
    reg-names = "breg", "pcireg", "cfg";
    ranges = <0x02000000 0x00000000 0xe0000000 0x00000000 0xe0000000 0x00000000 0x10000000 /* non-prefetchable memory */
              0x43000000 0x00000006 0x00000000 0x00000006 0x00000000 0x00000002 0x00000000>; /* prefetchable memory */
    bus-range = <0x00 0xff>;
    interrupt-map-mask = <0x0 0x0 0x0 0x7>;
    interrupt-map = <0x0 0x0 0x0 0x1 &pcie_intc 0x1>,
                    <0x0 0x0 0x0 0x2 &pcie_intc 0x2>,
                    <0x0 0x0 0x0 0x3 &pcie_intc 0x3>,
                    <0x0 0x0 0x0 0x4 &pcie_intc 0x4>;
    pcie_intc: legacy-interrupt-controller {
        interrupt-controller;
        #address-cells = <0>;
        #interrupt-cells = <1>;
    };
};
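The key fields are described below:
- compatible: used to match the PCIe Host driver;
- msi-controller: marks the node as an MSI (Message Signaled Interrupt) controller. Note that some SoCs use a GICv2 interrupt controller, and GICv2 does not support MSI, so the feature is missing on those SoCs;
- device_type: must be "pci";
- interrupts: the interrupt numbers of the NWL PCIe controller;
- interrupt-names: msi1 and msi0 are used for MSI interrupts and intx for legacy interrupts; the names correspond to the entries in interrupts;
- reg: the physical addresses and sizes of the registers used to access and operate the PCIe controller;
- reg-names: Bridge registers, PCIe Controller registers, and Configuration space region, respectively, corresponding to the entries in reg;
- ranges: the ranges that translate the PCIe address space into the CPU address space;
- bus-range: the range of PCIe bus numbers;
- interrupt-map-mask and interrupt-map: standard PCI properties that define the mapping from the PCI interface to interrupt numbers;
- legacy-interrupt-controller: the legacy interrupt controller;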
Advanced Topics
Advanced Sample Machine
Now that we've got the basics defined, let's add some hardware to the sample machine to discuss some of the more complicated use cases.
The advanced sample machine adds a PCI host bridge with control registers memory mapped to 0x10180000, and BARs programmed to start above the address 0x80000000.
Given what we already know about the device tree, we can start with the addition of the following node to describe the PCI host bridge.
pci@10180000 {
    compatible = "arm,versatile-pci-hostbridge", "pci";
    reg = <0x10180000 0x1000>;
    interrupts = <8 0>;
};
PCI Host Bridge
This section describes the Host/PCI bridge node.
Note, some basic knowledge of PCI is assumed in this section. This is NOT a tutorial about PCI; if you need more in-depth information, please read [1]. You can also refer to either ePAPR v1.1 or the PCI Bus Binding to Open Firmware. A complete working example for a Freescale MPC5200 can be found here.
PCI Bus numbering
Each PCI bus segment is uniquely numbered, and the bus numbering is exposed in the pci node by using the bus-range property, which contains two cells. The first cell gives the bus number assigned to this node, and the second cell gives the maximum bus number of any of the subordinate PCI busses.
The sample machine has a single pci bus, so both cells are 0.
pci@10180000 {
    compatible = "arm,versatile-pci-hostbridge", "pci";
    reg = <0x10180000 0x1000>;
    interrupts = <8 0>;
    bus-range = <0 0>;
};
PCI Address Translation
Similar to the local bus described earlier, the PCI address space is completely separate from the CPU address space, so address translation is needed to get from a PCI address to a CPU address. As always, this is done using the ranges, #address-cells, and #size-cells properties.
pci@10180000 {
    compatible = "arm,versatile-pci-hostbridge", "pci";
    reg = <0x10180000 0x1000>;
    interrupts = <8 0>;
    bus-range = <0 0>;

    #address-cells = <3>;
    #size-cells = <2>;
    ranges = <0x42000000 0 0x80000000 0x80000000 0 0x20000000
              0x02000000 0 0xa0000000 0xa0000000 0 0x10000000
              0x01000000 0 0x00000000 0xb0000000 0 0x01000000>;
};
As you can see, child addresses (PCI addresses) use 3 cells, and PCI range sizes are encoded into 2 cells. The first question might be: why do we need three 32-bit cells to specify a PCI address? The three cells are labeled phys.hi, phys.mid and phys.low [2].
phys.hi cell: npt000ss bbbbbbbb dddddfff rrrrrrrr
phys.mid cell: hhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh
phys.low cell: llllllll llllllll llllllll llllllll
PCI addresses are 64 bits wide, and are encoded into phys.mid and phys.low. However, the really interesting things are in phys.hi, which is a bit field:
- n: relocatable region flag (doesn't play a role here)
- p: prefetchable (cacheable) region flag
- t: aliased address flag (doesn't play a role here)
- ss: space code
  - 00: configuration space
  - 01: I/O space
  - 10: 32 bit memory space
  - 11: 64 bit memory space
- bbbbbbbb: The PCI bus number. PCI may be structured hierarchically. So we may have PCI/PCI bridges which will define sub busses.
- ddddd: The device number, typically associated with IDSEL signal connections.
- fff: The function number. Used for multifunction PCI devices.
- rrrrrrrr: Register number; used for configuration cycles.
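The bit layout above can be made concrete with a few shifts and masks. The following is a minimal Python sketch, not kernel code; the function name decode_phys_hi and the dictionary keys are made up for illustration, while the bit positions follow the npt000ss bbbbbbbb dddddfff rrrrrrrr layout shown earlier.

```python
# Illustrative helper (hypothetical name): split a phys.hi cell into its fields.
SPACE_NAMES = {
    0b00: "configuration",
    0b01: "I/O",
    0b10: "32-bit memory",
    0b11: "64-bit memory",
}

def decode_phys_hi(phys_hi):
    """Decode a 32-bit phys.hi cell laid out as npt000ss bbbbbbbb dddddfff rrrrrrrr."""
    ss = (phys_hi >> 24) & 0x3
    return {
        "n": (phys_hi >> 31) & 0x1,         # relocatable flag
        "p": (phys_hi >> 30) & 0x1,         # prefetchable flag
        "t": (phys_hi >> 29) & 0x1,         # aliased flag
        "ss": ss,                           # space code
        "space": SPACE_NAMES[ss],
        "bus": (phys_hi >> 16) & 0xFF,      # bbbbbbbb
        "device": (phys_hi >> 11) & 0x1F,   # ddddd
        "function": (phys_hi >> 8) & 0x7,   # fff
        "register": phys_hi & 0xFF,         # rrrrrrrr
    }

# The three phys.hi values used in the ranges property of the sample machine.
for cell in (0x42000000, 0x02000000, 0x01000000):
    fields = decode_phys_hi(cell)
    kind = "prefetchable" if fields["p"] else "non-prefetchable"
    print(hex(cell), fields["space"], kind)
```

Decoding 0x42000000 this way gives ss = 10 with the p bit set, which is why the first ranges entry is read as prefetchable 32 bit memory.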
For the purpose of PCI address translation, the important fields are p and ss. The value of p and ss in phys.hi determines which PCI address space is being accessed. So looking at our ranges property, we have three regions:
- a 32 bit prefetchable memory region beginning on PCI address 0x80000000 of 512 MByte size which will be mapped onto address 0x80000000 on the host CPU.
- a 32 bit non-prefetchable memory region beginning on PCI address 0xa0000000 of 256 MByte size which will be mapped onto address 0xa0000000 on the host CPU.
- an I/O region beginning on PCI address 0x00000000 of 16 MByte size which will be mapped onto address 0xb0000000 on the host CPU.
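The three regions can also be read as a small lookup table. The sketch below is a simplified illustration in Python, not how any particular OS implements it: the names RANGES and pci_to_cpu are hypothetical, and it assumes that matching on the ss space code alone is enough to select a region (prefetchable and non-prefetchable memory share the same PCI memory space).

```python
# Each entry is (phys.hi, PCI address, CPU address, size), copied from the
# ranges property of the sample machine. Names here are illustrative only.
RANGES = [
    (0x42000000, 0x80000000, 0x80000000, 0x20000000),  # 512 MB prefetchable memory
    (0x02000000, 0xA0000000, 0xA0000000, 0x10000000),  # 256 MB non-prefetchable memory
    (0x01000000, 0x00000000, 0xB0000000, 0x01000000),  # 16 MB I/O
]

SPACE_MASK = 0x03000000  # the ss bits of phys.hi

def pci_to_cpu(phys_hi, pci_addr):
    """Translate a PCI address in the space selected by phys_hi to a CPU address.

    Returns None when no range covers the address.
    """
    for hi, pci_base, cpu_base, size in RANGES:
        same_space = (hi & SPACE_MASK) == (phys_hi & SPACE_MASK)
        if same_space and pci_base <= pci_addr < pci_base + size:
            return cpu_base + (pci_addr - pci_base)
    return None

# I/O port 0x1000 on the PCI bus appears at CPU address 0xb0001000.
print(hex(pci_to_cpu(0x01000000, 0x1000)))
```

An address outside every window, such as 0xc0000000 in memory space, is not translatable and yields None in this sketch.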
To throw a wrench into the works, the presence of the phys.hi bitfield means that an operating system needs to know that the node represents a PCI bridge so that it can ignore the irrelevant fields for the purpose of translation. An OS will look for the string "pci" in the PCI bus nodes to determine whether it needs to mask off the extra fields.
PCI DMA Address Translation
The above ranges define how the CPU sees the PCI memory, and help the CPU to set up the right memory windows and write the right parameters into various PCI device registers. This is sometimes referred to as outbound memory.
A special case of address translation concerns how the PCI host hardware sees the core memory of the system. This happens when the PCI host controller will act as master and independently access the core memory of the system. As this is often a different view than that of the CPU (due to how the memory lines have been wired) this may need to be programmed into the PCI host controller on initialization. This is seen as a kind of DMA as the PCI bus independently performs direct memory access, and for this reason the mappings are named dma-ranges. This type of memory mapping is sometimes referred to as inbound memory and is not part of the PCI device tree specification.
In some cases, a ROM (BIOS) or similar will set up these registers on boot, but in other cases, the PCI controller is completely uninitialized and these translations need to be set up from the device tree. The PCI host driver will then typically parse the dma-ranges property and set up some registers in the host controller accordingly.
Expanding on the example above:
pci@10180000 {
    compatible = "arm,versatile-pci-hostbridge", "pci";
    reg = <0x10180000 0x1000>;
    interrupts = <8 0>;
    bus-range = <0 0>;

    #address-cells = <3>;
    #size-cells = <2>;
    ranges = <0x42000000 0 0x80000000 0x80000000 0 0x20000000
              0x02000000 0 0xa0000000 0xa0000000 0 0x10000000
              0x01000000 0 0x00000000 0xb0000000 0 0x01000000>;
    dma-ranges = <0x02000000 0 0x00000000 0x80000000 0 0x20000000>;
};
This dma-ranges entry indicates that from the PCI host controller's point of view, the 512 MB at PCI address 0x00000000 will appear in the main core memory at address 0x80000000. As you can see, we just set the ss address type to 0x02, indicating this is some 32-bit memory.
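For the inbound direction, the useful question is usually the reverse one: given a buffer at some CPU address, which address must a PCI device use to reach it? The Python sketch below illustrates that arithmetic for the single dma-ranges entry above; DMA_RANGES and cpu_to_pci_dma are hypothetical names, and a real host driver would program controller registers rather than compute addresses like this.

```python
# (PCI address, CPU address, size) taken from the dma-ranges entry above.
# The phys.hi cell is omitted because the entry describes plain 32-bit memory.
DMA_RANGES = [
    (0x00000000, 0x80000000, 0x20000000),
]

def cpu_to_pci_dma(cpu_addr):
    """Return the PCI bus address a device would use to reach cpu_addr.

    Returns None when the CPU address is not covered by any inbound window.
    """
    for pci_base, cpu_base, size in DMA_RANGES:
        if cpu_base <= cpu_addr < cpu_base + size:
            return pci_base + (cpu_addr - cpu_base)
    return None

# A buffer at CPU address 0x80001000 is seen by PCI devices at 0x1000.
print(hex(cpu_to_pci_dma(0x80001000)))
```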
Advanced Interrupt Mapping
Now we come to the most interesting part, PCI interrupt mapping. A PCI device can trigger interrupts using the wires #INTA, #INTB, #INTC and #INTD. The # hash sign in front of the interrupt names means it is active low; this is a common convention, and PCI interrupt lines are always active low. A single-function device is obligated to use #INTA for interrupts. A multi-function device must use #INTA if it uses a single interrupt pin, #INTA and #INTB if it uses two interrupt pins, etc. Due to these rules, #INTA is normally used by more functions than #INTB, #INTC, and #INTD.
To distribute the load across the four IRQ lines backing #INTA through #INTD, each PCI slot or device is typically wired to different inputs on the interrupt controller in a rotating manner, so as to avoid having all #INTA clients connected to the same incoming interrupt line. This procedure is referred to as swizzling the interrupts. So, the device tree needs a way of mapping each PCI interrupt signal to the inputs of the interrupt controller. The #interrupt-cells, interrupt-map and interrupt-map-mask properties are used to describe the interrupt mapping.
Actually, the interrupt mapping described here isn't limited to PCI busses; any node can specify complex interrupt maps, but the PCI case is by far the most common.
pci@10180000 {
    compatible = "arm,versatile-pci-hostbridge", "pci";
    reg = <0x10180000 0x1000>;
    interrupts = <8 0>;
    bus-range = <0 0>;

    #address-cells = <3>;
    #size-cells = <2>;
    ranges = <0x42000000 0 0x80000000 0x80000000 0 0x20000000
              0x02000000 0 0xa0000000 0xa0000000 0 0x10000000
              0x01000000 0 0x00000000 0xb0000000 0 0x01000000>;

    #interrupt-cells = <1>;
    interrupt-map-mask = <0xf800 0 0 7>;
    interrupt-map = <0xc000 0 0 1 &intc  9 3 // 1st slot
                     0xc000 0 0 2 &intc 10 3
                     0xc000 0 0 3 &intc 11 3
                     0xc000 0 0 4 &intc 12 3

                     0xc800 0 0 1 &intc 10 3 // 2nd slot
                     0xc800 0 0 2 &intc 11 3
                     0xc800 0 0 3 &intc 12 3
                     0xc800 0 0 4 &intc  9 3>;
};
First you'll notice that PCI interrupt numbers use only one cell, unlike the system interrupt controller which uses 2 cells; one for the irq number, and one for flags. PCI only needs one cell for interrupts because PCI interrupts are specified to always be level-low sensitive.
In our example board, we have 2 PCI slots with 4 interrupt lines each, so we have to map 8 interrupt lines to the interrupt controller. This is done using the interrupt-map property. The exact procedure for interrupt mapping is described in [3].
Because the interrupt number (#INTA etc.) is not sufficient to distinguish between several PCI devices on a single PCI bus, we also have to denote which PCI device triggered the interrupt line. Fortunately, every PCI device has a unique device number that we can use for this. To distinguish between interrupts of several PCI devices we need a tuple consisting of the PCI device number and the PCI interrupt number. Speaking more generally, we construct a unit interrupt specifier which has four cells:
- three #address-cells consisting of phys.hi, phys.mid, phys.low, and
- one #interrupt-cell (#INTA, #INTB, #INTC, #INTD).
Because we only need the device number part of the PCI address, the interrupt-map-mask property comes into play. interrupt-map-mask is also a 4-tuple like the unit interrupt specifier. The 1's in the mask denote which part of the unit interrupt specifier should be taken into account. In our example we can see that only the device number part of phys.hi is required, and we need 3 bits to distinguish between the four interrupt lines (PCI interrupt line numbering starts at 1, not at 0!).
Now we can construct the interrupt-map property. This property is a table, and each entry in this table consists of a child (PCI bus) unit interrupt specifier, a parent handle (the interrupt controller which is responsible for serving the interrupts) and a parent unit interrupt specifier. So in the first line we can read that the PCI interrupt #INTA is mapped onto IRQ 9, level low sensitive, of our interrupt controller [4].
The only missing part for now is the weird numbers in the PCI bus unit interrupt specifier. The important part of the unit interrupt specifier is the device number from the phys.hi bit field. The device number is board specific, and it depends on how each PCI host controller activates the IDSEL pin on each device. In this example, PCI slot 1 is assigned device id 24 (0x18), and PCI slot 2 is assigned device id 25 (0x19). The value of phys.hi for each slot is determined by shifting the device number up by 11 bits into the ddddd section of the bitfield as follows:
- phys.hi for slot 1 is 0xC000, and
- phys.hi for slot 2 is 0xC800.
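The shift can be checked with a couple of lines of Python. This is only an arithmetic illustration; phys_hi_for_device is a made-up helper name, and the field positions are those of the phys.hi layout given earlier.

```python
def phys_hi_for_device(device, function=0, bus=0):
    """Build the bus/device/function part of phys.hi (bbbbbbbb dddddfff)."""
    return (bus << 16) | (device << 11) | (function << 8)

DEVICE_MASK = 0xF800  # selects only the ddddd bits, as in interrupt-map-mask

print(hex(phys_hi_for_device(24)))  # slot 1: 0xc000
print(hex(phys_hi_for_device(25)))  # slot 2: 0xc800

# Applying the mask discards the function number, so every function of the
# device in slot 2 matches the same 0xc800 rows of the map.
print(hex(phys_hi_for_device(25, function=3) & DEVICE_MASK))  # 0xc800
```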
Putting it all together, the interrupt-map property shows:
- #INTA of slot 1 is IRQ9, level low sensitive on the primary interrupt controller
- #INTB of slot 1 is IRQ10, level low sensitive on the primary interrupt controller
- #INTC of slot 1 is IRQ11, level low sensitive on the primary interrupt controller
- #INTD of slot 1 is IRQ12, level low sensitive on the primary interrupt controller
and
- #INTA of slot 2 is IRQ10, level low sensitive on the primary interrupt controller
- #INTB of slot 2 is IRQ11, level low sensitive on the primary interrupt controller
- #INTC of slot 2 is IRQ12, level low sensitive on the primary interrupt controller
- #INTD of slot 2 is IRQ9, level low sensitive on the primary interrupt controller
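The lookup that produces this list can be sketched as a masked table match. The Python below is a simplified model of the procedure under stated assumptions, not the algorithm of any specific OS: the string "intc" stands in for the &intc phandle, and the names INTERRUPT_MAP, MAP_MASK and map_interrupt are hypothetical.

```python
INTC = "intc"  # placeholder for the &intc phandle

# interrupt-map-mask = <0xf800 0 0 7>
MAP_MASK = (0xF800, 0, 0, 7)

# Each row: (child unit interrupt specifier, parent, parent interrupt specifier),
# copied from the interrupt-map property of the sample machine.
INTERRUPT_MAP = [
    ((0xC000, 0, 0, 1), INTC, (9, 3)),   # 1st slot
    ((0xC000, 0, 0, 2), INTC, (10, 3)),
    ((0xC000, 0, 0, 3), INTC, (11, 3)),
    ((0xC000, 0, 0, 4), INTC, (12, 3)),
    ((0xC800, 0, 0, 1), INTC, (10, 3)),  # 2nd slot
    ((0xC800, 0, 0, 2), INTC, (11, 3)),
    ((0xC800, 0, 0, 3), INTC, (12, 3)),
    ((0xC800, 0, 0, 4), INTC, (9, 3)),
]

def map_interrupt(phys_hi, phys_mid, phys_low, pin):
    """Mask the child specifier, then return (parent, parent specifier) or None."""
    child = (phys_hi, phys_mid, phys_low, pin)
    masked = tuple(cell & mask for cell, mask in zip(child, MAP_MASK))
    for entry, parent, parent_spec in INTERRUPT_MAP:
        if entry == masked:
            return parent, parent_spec
    return None

# #INTA (pin 1) of the device in slot 1 (device number 24).
print(map_interrupt(24 << 11, 0, 0, 1))
# #INTD (pin 4) of function 2 of the device in slot 2; the mask drops the function bits.
print(map_interrupt((25 << 11) | (2 << 8), 0, 0, 4))
```

Both calls resolve to IRQ 9 on the primary interrupt controller, which shows the swizzling: #INTA of slot 1 and #INTD of slot 2 share one incoming line.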
The interrupts = <8 0>; property describes the interrupts the host/PCI-bridge controller itself may trigger. Don't mix up these interrupts with interrupts PCI devices might trigger (using INTA, INTB, ...).
One final thing to note. Just like with the interrupt-parent property, the presence of an interrupt-map property on a node will change the default interrupt controller for all child and grandchild nodes. In this PCI example, that means that the PCI host bridge becomes the default interrupt controller. If a device attached via the PCI bus has a direct connection to another interrupt controller, then it also needs to specify its own interrupt-parent property.