		The MSI Driver Guide HOWTO
	Tom L Nguyen	tom.l.nguyen[AT]intel[DOT]com
			10/03/2003
	Revised Feb 12, 2004 by Martine Silbermann
		email: Martine.Silbermann[AT]hp[DOT]com
	Revised Jun 25, 2004 by Tom L Nguyen

1. About this guide

This guide describes the basics of Message Signaled Interrupts (MSI),
the advantages of using MSI over traditional interrupt mechanisms, and
how to enable your driver to use MSI or MSI-X. Also included is a
Frequently Asked Questions (FAQ) section.

1.1 Terminology

PCI devices can be single-function or multi-function. In either case,
when this text talks about enabling or disabling MSI on a "device
function," it is referring to one specific PCI device and function and
not to all functions on a PCI device (unless the PCI device has only
one function).

2. Copyright 2003 Intel Corporation

3. What is MSI/MSI-X?

Message Signaled Interrupt (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, is an optional feature, and a
required feature for PCI Express devices. MSI enables a device
function to request service by sending an Inbound Memory Write on its
PCI bus to the FSB as a Message Signaled Interrupt transaction.
Because MSI is generated in the form of a Memory Write, all
transaction conditions, such as a Retry, Master-Abort, Target-Abort or
normal completion, are supported.

A PCI device that supports MSI must also support the pin IRQ assertion
interrupt mechanism to provide backward compatibility for systems that
do not support MSI. In systems which support MSI, the bus driver is
responsible for initializing the message address and message data of
the device function's MSI/MSI-X capability structure during device
initial configuration.
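As an illustration of where these capability structures live, the
sketch below (not part of the original HOWTO; the function name
report_msi_caps is hypothetical) walks a function's PCI capability
list. Drivers do not need to do this by hand -- the enable functions
described later perform the lookup internally -- it only shows where
the structures sit in configuration space.

```c
#include <linux/pci.h>

static void report_msi_caps(struct pci_dev *dev)
{
	int pos;
	u16 control;

	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
	if (pos) {
		/* The Message Control register follows the capability header */
		pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &control);
		printk(KERN_INFO "MSI capability at 0x%x, control 0x%04x\n",
		       pos, control);
	}

	if (pci_find_capability(dev, PCI_CAP_ID_MSIX))
		printk(KERN_INFO "MSI-X capability present\n");
}
```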
An MSI-capable device function indicates MSI support by implementing
the MSI/MSI-X capability structure in its PCI capability list. The
device function may implement both the MSI capability structure and
the MSI-X capability structure; however, the bus driver should not
enable both.

The MSI capability structure contains the Message Control register,
the Message Address register and the Message Data register. These
registers provide the bus driver control over MSI. The Message
Control register indicates the MSI capability supported by the
device. The Message Address register specifies the target address and
the Message Data register specifies the characteristics of the
message. To request service, the device function writes the content
of the Message Data register to the target address. The device and
its software driver are prohibited from writing to these registers.

The MSI-X capability structure is an optional extension to MSI. It
uses an independent and separate capability structure. There are some
key advantages to implementing the MSI-X capability structure over the
MSI capability structure, as described below:

   - It supports a larger maximum number of vectors per function.

   - It provides the ability for system software to configure each
     vector with an independent message address and message data,
     specified by a table that resides in Memory Space.

   - MSI and MSI-X both support per-vector masking. Per-vector
     masking is an optional extension of MSI but a required feature
     for MSI-X. Per-vector masking provides the kernel the ability
     to mask/unmask a single MSI while running its interrupt service
     routine. If per-vector masking is not supported, then the
     device driver should provide the hardware/software
     synchronization to ensure that the device generates MSI only
     when the driver wants it to do so.

4. Why use MSI?
As a benefit to the simplification of board design, MSI allows board
designers to remove out-of-band interrupt routing. MSI is another
step towards a legacy-free environment.

Due to increasing pressure on chipset and processor packages to reduce
pin count, the need for interrupt pins is expected to diminish over
time. Devices, due to pin constraints, may implement messages to
increase performance.

PCI Express endpoints use INTx emulation (in-band messages) instead of
IRQ pin assertion. Using INTx emulation requires interrupt sharing
among devices connected to the same node (PCI bridge), while MSI is
unique (non-shared) and does not require BIOS configuration support.
As a result, PCI Express technology requires MSI support for better
interrupt performance.

Using MSI enables the device functions to support two or more vectors,
which can be configured to target different CPUs to increase
scalability.

5. Configuring a driver to use MSI/MSI-X

By default, the kernel will not enable MSI/MSI-X on all devices that
support this capability. The CONFIG_PCI_MSI kernel option must be
selected to enable MSI/MSI-X support.

5.1 Including MSI/MSI-X support into the kernel

To allow MSI/MSI-X capable device drivers to selectively enable
MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
below), the VECTOR-based scheme needs to be enabled by setting
CONFIG_PCI_MSI during kernel configuration.

Since the target of the inbound message is the local APIC,
CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.

5.2 Configuring for MSI support

Because the existing Linux kernel assigns vectors in a non-contiguous
fashion, this version does not support multiple messages, regardless
of whether a device function is capable of supporting more than one
vector.
To enable MSI on a device function's MSI capability structure, a
device driver is required to call the function pci_enable_msi()
explicitly.

5.2.1 API pci_enable_msi

int pci_enable_msi(struct pci_dev *dev)

A device driver that wants to have MSI enabled on its device function
must call this API. A successful call will initialize the MSI
capability structure with ONE vector, regardless of whether the device
function is capable of supporting multiple messages. This vector
replaces the pre-assigned dev->irq with a new MSI vector. To avoid a
conflict between the newly assigned vector and the existing
pre-assigned vector, a device driver is required to call this API
before calling request_irq().

5.2.2 API pci_disable_msi

void pci_disable_msi(struct pci_dev *dev)

This API should always be used to undo the effect of pci_enable_msi()
when a device driver is unloading. This API restores dev->irq to the
pre-assigned IOAPIC vector and switches the device's interrupt mode to
PCI pin-irq assertion/INTx emulation mode.

Note that a device driver should always call free_irq() on the MSI
vector that it has done request_irq() on before calling this API.
Failure to do so results in a BUG_ON() and the device will be left
with MSI enabled, leaking its vector.

5.2.3 MSI mode vs. legacy mode diagram

The diagram below shows the events which switch the interrupt mode on
an MSI-capable device function between MSI mode and PIN-IRQ assertion
mode.

	 ------------   pci_enable_msi    ------------------------
	|            | <===============  |                        |
	|  MSI MODE  |                   | PIN-IRQ ASSERTION MODE |
	|            |  ===============> |                        |
	 ------------   pci_disable_msi   ------------------------

	Figure 1. MSI Mode vs. Legacy Mode

In Figure 1, a device operates by default in legacy mode.
Legacy in this context means PCI pin-irq assertion or PCI Express INTx
emulation. A successful MSI request (using pci_enable_msi()) switches
the device's interrupt mode to MSI mode. The pre-assigned IOAPIC
vector stored in dev->irq will be saved by the PCI subsystem and a
newly assigned MSI vector will replace dev->irq.

To return to its default mode, a device driver should always call
pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
device driver should always call free_irq() on the MSI vector it has
done request_irq() on before calling pci_disable_msi(). Failure to do
so results in a BUG_ON() and the device will be left with MSI enabled,
leaking its vector. Otherwise, the PCI subsystem restores the
device's dev->irq to the pre-assigned IOAPIC vector and marks the
released MSI vector as unused.

Once marked as unused, there is no guarantee that the PCI subsystem
will reserve this MSI vector for the device. Depending on the
availability of current PCI vector resources and the number of
MSI/MSI-X requests from other drivers, this MSI vector may be
re-assigned.

In the case where the PCI subsystem re-assigns this MSI vector to
another driver, a request to switch back to MSI mode may result in
being assigned a different MSI vector, or in a failure if no more
vectors are available.

5.3 Configuring for MSI-X support

Because system software can configure each vector of the MSI-X
capability structure with an independent message address and message
data, the non-contiguous fashion in which the existing Linux kernel
assigns vectors has no impact on supporting multiple messages on an
MSI-X capable device function. To enable MSI-X on a device function's
MSI-X capability structure, its device driver is required to call the
function pci_enable_msix() explicitly.
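As a concrete sketch of this call sequence (names our_setup_msix,
our_handler and the vector count are hypothetical; error handling is
abbreviated, and the handler signature follows kernels of this
document's era), a driver wanting four MSI-X vectors might do:

```c
#include <linux/pci.h>
#include <linux/interrupt.h>

#define OUR_NVEC 4	/* hypothetical: this driver wants 4 vectors */

static struct msix_entry our_entries[OUR_NVEC];

/* our_handler is an ordinary interrupt service routine (not shown) */
extern irqreturn_t our_handler(int irq, void *dev_id, struct pt_regs *regs);

static int our_setup_msix(struct pci_dev *dev)
{
	int i, err;

	/* Ask for entries 0..3 of the device's MSI-X table */
	for (i = 0; i < OUR_NVEC; i++)
		our_entries[i].entry = i;

	err = pci_enable_msix(dev, our_entries, OUR_NVEC);
	if (err)
		return err;	/* all or nothing: fall back to MSI or INTx */

	/*
	 * On success the PCI subsystem has filled in the 'vector'
	 * field of each element; hook a handler on each one.
	 */
	for (i = 0; i < OUR_NVEC; i++) {
		err = request_irq(our_entries[i].vector, our_handler,
				  0, "our_device", dev);
		if (err) {
			while (--i >= 0)
				free_irq(our_entries[i].vector, dev);
			pci_disable_msix(dev);
			return err;
		}
	}
	return 0;
}
```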
The function pci_enable_msix(), once invoked, enables either all or
none of the requested vectors, depending on the current availability
of PCI vector resources. If PCI vector resources are available for
the number of vectors requested by a device driver, this function will
configure the MSI-X table of the device's MSI-X capability structure
with the requested messages. For example, a device may be capable of
supporting a maximum of 32 vectors while its software driver may
request only 4 vectors. It is recommended that the device driver call
this function once during its initialization phase.

Unlike the function pci_enable_msi(), the function pci_enable_msix()
does not replace the pre-assigned IOAPIC dev->irq with a new MSI
vector, because the PCI subsystem writes the 1:1 vector-to-entry
mapping into the 'vector' field of each element contained in the
second argument. Note that the pre-assigned IOAPIC dev->irq is valid
only if the device operates in PIN-IRQ assertion mode. In MSI-X mode,
any attempt by the device driver to use dev->irq to request interrupt
service may result in unpredictable behavior.

For each MSI-X vector granted, a device driver is responsible for
calling other functions like request_irq(), enable_irq(), etc. to
enable this vector with its corresponding interrupt service handler.
It is a device driver's choice to assign all vectors the same
interrupt service handler or each vector a unique interrupt service
handler.

5.3.1 Handling MMIO address space of MSI-X Table

The PCI 3.0 specification has implementation notes stating that MMIO
address space for a device's MSI-X structure should be isolated so
that the software system can set different pages for controlling
accesses to the MSI-X structure.
The implementation of MSI support requires the PCI subsystem, not a
device driver, to maintain full control of the MSI-X table/MSI-X PBA
(Pending Bit Array) and the MMIO address space of the MSI-X
table/MSI-X PBA. A device driver is prohibited from requesting the
MMIO address space of the MSI-X table/MSI-X PBA. Otherwise, the PCI
subsystem will fail to enable MSI-X on its hardware device when the
driver calls the function pci_enable_msix().

5.3.2 API pci_enable_msix

int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)

This API enables a device driver to request that the PCI subsystem
enable MSI-X messages on its hardware device. Depending on the
availability of PCI vector resources, the PCI subsystem enables either
all or none of the requested vectors.

Argument 'dev' points to the device (pci_dev) structure.

Argument 'entries' is a pointer to an array of msix_entry structs.
The number of entries is indicated in argument 'nvec'.
struct msix_entry is defined in drivers/pci/msi.h:

struct msix_entry {
	u16	vector; /* kernel uses to write allocated vector */
	u16	entry;  /* driver uses to specify entry */
};

A device driver is responsible for initializing the 'entry' field of
each element with a unique entry number supported by the MSI-X table.
Otherwise, -EINVAL will be returned as a result. A successful return
of zero indicates the PCI subsystem has completed initializing each of
the requested entries of the MSI-X table with the message address and
message data. Last but not least, the PCI subsystem will write the
1:1 vector-to-entry mapping into the 'vector' field of each element.
A device driver is responsible for keeping track of allocated MSI-X
vectors in its internal data structure.

A return of zero indicates that the number of MSI-X vectors was
successfully allocated.
A return of greater than zero indicates an MSI-X vector shortage. A
return of less than zero indicates a failure. This failure may be a
result of duplicate entries specified in the second argument, of no
available vectors, or of failing to initialize the MSI-X table
entries.

5.3.3 API pci_disable_msix

void pci_disable_msix(struct pci_dev *dev)

This API should always be used to undo the effect of pci_enable_msix()
when a device driver is unloading. Note that a device driver should
always call free_irq() on all MSI-X vectors it has done request_irq()
on before calling this API. Failure to do so results in a BUG_ON()
and the device will be left with MSI-X enabled, leaking its vectors.

5.3.4 MSI-X mode vs. legacy mode diagram

The diagram below shows the events which switch the interrupt mode on
an MSI-X capable device function between MSI-X mode and PIN-IRQ
assertion mode (legacy).

	 ------------   pci_enable_msix(,,n)   ------------------------
	|            | <===============       |                        |
	| MSI-X MODE |                        | PIN-IRQ ASSERTION MODE |
	|            |  ===============>      |                        |
	 ------------   pci_disable_msix       ------------------------

	Figure 2. MSI-X Mode vs. Legacy Mode

In Figure 2, a device operates by default in legacy mode. A
successful MSI-X request (using pci_enable_msix()) switches the
device's interrupt mode to MSI-X mode. The pre-assigned IOAPIC vector
stored in dev->irq will be saved by the PCI subsystem; however, unlike
MSI mode, the PCI subsystem will not replace dev->irq with an assigned
MSI-X vector, because the PCI subsystem already writes the 1:1
vector-to-entry mapping into the 'vector' field of each element
specified in the second argument.

To return to its default mode, a device driver should always call
pci_disable_msix() to undo the effect of pci_enable_msix().
Note that a device driver should always call free_irq() on all MSI-X
vectors it has done request_irq() on before calling
pci_disable_msix(). Failure to do so results in a BUG_ON() and the
device will be left with MSI-X enabled, leaking its vectors.
Otherwise, the PCI subsystem switches the device function's interrupt
mode from MSI-X mode to legacy mode and marks all allocated MSI-X
vectors as unused.

Once marked as unused, there is no guarantee that the PCI subsystem
will reserve these MSI-X vectors for the device. Depending on the
availability of current PCI vector resources and the number of
MSI/MSI-X requests from other drivers, these MSI-X vectors may be
re-assigned.

In the case where the PCI subsystem re-assigns these MSI-X vectors to
other drivers, a request to switch back to MSI-X mode may result in
being assigned another set of MSI-X vectors, or in a failure if no
more vectors are available.

5.4 Handling functions implementing both MSI and MSI-X capabilities

In the case where a function implements both MSI and MSI-X
capabilities, the PCI subsystem enables a device to run either in MSI
mode or MSI-X mode, but not both. A device driver determines whether
it wants MSI or MSI-X enabled on its hardware device. Once a device
driver requests MSI, for example, it is prohibited from requesting
MSI-X; in other words, a device driver is not permitted to ping-pong
between MSI mode and MSI-X mode at run-time.

5.5 Hardware requirements for MSI/MSI-X support

MSI/MSI-X support requires support from both system hardware and
individual hardware device functions.

5.5.1 Required x86 hardware support

Since the target of the MSI address is the local APIC, enabling
MSI/MSI-X support in the Linux kernel depends on whether the existing
system hardware supports a local APIC.
Users should verify that their system supports local APIC operation by
testing that it runs when CONFIG_X86_LOCAL_APIC=y.

In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
however, in a UP environment, users must manually set
CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
CONFIG_PCI_MSI enables the VECTOR-based scheme and the option for
MSI-capable device drivers to selectively enable MSI/MSI-X.

Note that the CONFIG_X86_IO_APIC setting is irrelevant because the
MSI/MSI-X vector is allocated anew at runtime and MSI/MSI-X support
does not depend on BIOS support. This key independence enables
MSI/MSI-X support on future IOxAPIC-free platforms.

5.5.2 Device hardware support

A hardware device function indicates MSI support by implementing the
MSI/MSI-X capability structure in its PCI capability list. By
default, this capability structure will not be initialized by the
kernel to enable MSI during system boot. In other words, the device
function runs in its default pin assertion mode. Note that in many
cases hardware supporting MSI has bugs, which may result in system
hangs. The software driver of specific MSI-capable hardware is
responsible for deciding whether to call pci_enable_msi() or not. A
return of zero indicates the kernel successfully initialized the
MSI/MSI-X capability structure of the device function. The device
function now runs in MSI/MSI-X mode.

5.6 How to tell whether MSI/MSI-X is enabled on a device function

At the driver level, a return of zero from a call to
pci_enable_msi()/pci_enable_msix() indicates to a device driver that
its device function has been initialized successfully and is ready to
run in MSI/MSI-X mode.
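The probe/remove pair below sketches how a driver might act on that
return value, falling back to INTx when MSI is unavailable and
honoring the teardown ordering stressed in section 5.2.2. It is an
illustration only: our_probe, our_remove, our_handler and "our_device"
are hypothetical names, and error handling is abbreviated.

```c
#include <linux/pci.h>
#include <linux/interrupt.h>

extern irqreturn_t our_handler(int irq, void *dev_id, struct pt_regs *regs);

static int our_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
	int err = pci_enable_device(dev);
	if (err)
		return err;

	/*
	 * Try MSI first; on failure dev->irq still holds the
	 * pre-assigned IOAPIC vector, so the driver transparently
	 * falls back to legacy INTx.  Either way dev->irq is the
	 * vector to hook.
	 */
	if (pci_enable_msi(dev))
		printk(KERN_INFO "our_device: MSI unavailable, using INTx\n");

	/* SA_SHIRQ in case we ended up on a shared INTx line */
	err = request_irq(dev->irq, our_handler, SA_SHIRQ, "our_device", dev);
	if (err)
		pci_disable_msi(dev);
	return err;
}

static void our_remove(struct pci_dev *dev)
{
	/* Ordering matters: free_irq() first, then pci_disable_msi() */
	free_irq(dev->irq, dev);
	pci_disable_msi(dev);
	pci_disable_device(dev);
}
```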
At the user level, users can use the command 'cat /proc/interrupts' to
display the vectors allocated for devices and their interrupt
MSI/MSI-X modes ("PCI-MSI"/"PCI-MSI-X"). The output below shows that
MSI mode is enabled on an Adaptec 39320D Ultra320 SCSI controller:

           CPU0       CPU1
  0:     324639          0    IO-APIC-edge   timer
  1:       1186          0    IO-APIC-edge   i8042
  2:          0          0          XT-PIC   cascade
 12:       2797          0    IO-APIC-edge   i8042
 14:       6543          0    IO-APIC-edge   ide0
 15:          1          0    IO-APIC-edge   ide1
169:          0          0   IO-APIC-level   uhci-hcd
185:          0          0   IO-APIC-level   uhci-hcd
193:        138         10         PCI-MSI   aic79xx
201:         30          0         PCI-MSI   aic79xx
225:         30          0   IO-APIC-level   aic7xxx
233:         30          0   IO-APIC-level   aic7xxx
NMI:          0          0
LOC:     324553     325068
ERR:          0
MIS:          0

6. MSI quirks

Several PCI chipsets or devices are known not to support MSI. The PCI
stack provides three possible levels of MSI disabling:
	* on a single device
	* on all devices behind a specific bridge
	* globally

6.1. Disabling MSI on a single device

Under some circumstances it might be required to disable MSI on a
single device. This may be achieved either by not calling
pci_enable_msi() at all, or by setting the pci_dev->no_msi flag before
it is called (most of the time in a quirk).

6.2. Disabling MSI below a bridge

The vast majority of MSI quirks are required by PCI bridges not being
able to route MSI between busses. In this case, MSI has to be
disabled on all devices behind the bridge. This is achieved by
setting the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the
bridge's subordinate bus. There is no need to set the same flag on
bridges that are below the broken bridge. When pci_enable_msi() is
called to enable MSI on a device, pci_msi_supported() takes care of
checking the NO_MSI flag in all parent busses of the device.
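Such a quirk can be written as a PCI fixup. The sketch below follows
the pattern used by the quirks in drivers/pci/quirks.c; the vendor and
device IDs (0x1234/0x5678) are placeholders, not a real broken bridge.

```c
#include <linux/pci.h>

/* Disable MSI on everything behind a bridge that cannot route it */
static void quirk_no_msi_bridge(struct pci_dev *dev)
{
	/* Flag the subordinate bus; pci_enable_msi() checks it later */
	if (dev->subordinate)
		dev->subordinate->bus_flags |= PCI_BUS_FLAGS_NO_MSI;
}
DECLARE_PCI_FIXUP_FINAL(0x1234, 0x5678, quirk_no_msi_bridge);
```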
Some bridges actually support enabling/disabling MSI support
dynamically by changing some bits in their PCI configuration space
(especially the HyperTransport chipsets such as the nVidia nForce and
Serverworks HT2000). It may then be required to update the NO_MSI
flag on the corresponding devices in the sysfs hierarchy. To enable
MSI support on device "0000:00:0e", do:

	echo 1 > /sys/bus/pci/devices/0000:00:0e/msi_bus

To disable MSI support, echo 0 instead of 1. Note that this should be
used with caution since changing this value might break interrupts.

6.3. Disabling MSI globally

Some extreme cases may require disabling MSI globally on the system.
For now, the only known case is a Serverworks PCI-X chipset (MSI is
not supported on several busses that are not all connected to the
chipset in the Linux PCI hierarchy). In the vast majority of other
cases, disabling MSI only behind a specific bridge is enough.

For debugging purposes, the user may also pass pci=nomsi on the kernel
command line to explicitly disable MSI globally. But, once the
appropriate quirks are added to the kernel, this option should not be
required anymore.

6.4. Finding why MSI cannot be enabled on a device

Assuming that MSI is not enabled on a device, you should look at dmesg
to find messages that quirks may output when disabling MSI on some
devices, some bridges, or even globally. Then, lspci -t gives the
list of bridges above a device. Reading
/sys/bus/pci/devices/0000:00:0e/msi_bus will tell you whether MSI is
enabled (1) or disabled (0). If 0 is found in a single bridge's
msi_bus file above the device, MSI cannot be enabled.

7. FAQ

Q1. Are there any limitations on using MSI?

A1. If the PCI device supports MSI and conforms to the specification
and the platform supports the APIC local bus, then using MSI should
work.

Q2.
Will it work on all the Pentium processors (P3, P4, Xeon, AMD
processors)? In P3, IPIs are transmitted on the APIC local bus, and
in P4 and Xeon they are transmitted on the system bus. Are there any
implications with this?

A2. MSI support enables a PCI device to send an inbound memory write
(0xfeexxxxx as target address) on its PCI bus directly to the FSB.
Since the message address has the redirection hint bit cleared, it
should work.

Q3. The target address 0xfeexxxxx will be translated by the Host
Bridge into an interrupt message. Are there any limitations on
chipsets such as Intel 8xx, Intel e7xxx, or VIA?

A3. If these chipsets support an inbound memory write with the target
address set to 0xfeexxxxx, conformant with the PCI specification 2.3
or later, then it should work.

Q4. From the driver's point of view, if the MSI is lost because of
errors occurring during the inbound memory write, then it may wait
forever. Is there a mechanism for it to recover?

A4. Since the target of the transaction is an inbound memory write,
all transaction termination conditions (Retry, Master-Abort,
Target-Abort, or normal completion) are supported. A device sending
an MSI must abide by all the PCI rules and conditions regarding that
inbound memory write. So, if a retry is signaled it must retry, etc.
We believe that the recommendation for Abort is also a retry (refer to
the PCI specification 2.3 or later).