SUN xVM direct IO one pager

Template Version: @(#)onepager.txt 1.35 07/11/07 SMI
Copyright 2008 Sun Microsystems

1. Introduction
1.1. Project/Component Working Name:
Dixieland: Direct IO for xVM Hypervisor and Logical Domains

1.2. Name of Document Author/Supplier:
David Edmondson (dme@sun.com)

1.3. Date of This Document:
2008-10-15

1.3.1. Date this project was conceived:
2008

1.4. Name of Major Document Customer(s)/Consumer(s):
1.4.1. The PAC or CPT you expect to review your project:
Solaris PAC
1.4.2. The ARC(s) you expect to review your project:
PSARC, FWARC
1.4.3. The Director/VP who is "Sponsoring" this project:
Greg Lavender (greg.lavender@sun.com)
Mike Sanfratello (mike.sanfratello@sun.com)
Jerri-Ann Meyer (Jerriann.Meyer@Sun.COM)
1.4.4. The name of your business unit:
Software

1.5. Email Aliases:
1.5.1. Responsible Manager:
Rich Zatorski (razl@sun.com)
1.5.2. Responsible Engineer:
David Edmondson (dme@sun.com)
1.5.3. Marketing Manager:
Joost Pronk Van Hoogeveen (Joost.Pronk@sun.com)
1.5.4. Interest List:
pcie-iov-arch-interest@sun.com

2. Project Summary
2.1. Project Description:

Direct IO will allow xVM Hypervisor and Logical Domains guest
domains direct access to PCI devices. This is used to improve
the throughput and latency of device access for guest domains
so configured.

2.2. Risks and Assumptions:

- Networking performance will show the most improvement from
Direct IO, and so will be a focus of the development and
testing.

- The performance gain of Direct IO may not be sufficient to
justify the increased complexity of configuration.

- Serviceability may suffer if fault reporting is more
confusing than in a non-direct IO environment.

- All significant x64 server platforms entering the market
will have either Intel VT-d or AMD IOMMU capabilities.

- Updating the xVM Hypervisor implementation to Xen 3.3 is a
pre-requisite for the x64 component of this project, and any
delay in that work will impact this project.

- Support for MSI/MSI-X interrupts under xVM Hypervisor is a
pre-requisite for the x64 component of this project, and any
delay in that work will impact this project.

3. Business Summary

3.1. Problem Area:

Current virtualization solutions allow physical I/O devices to
be assigned to guest domains at the PCI root port level. That
is, a PCI root port and all its child PCI devices are assigned
to a single guest domain. Most (all?) x86 systems have just
one root port, while most sun4v SPARC systems have one or
two. In order to provide I/O services to additional guest
domains, we must rely upon virtual I/O devices.

A virtual I/O device provides I/O services to a guest domain
by establishing a communications channel to a domain with a
physical device via the hypervisor. There are various
techniques for doing this, but all involve CPU intervention
for DMA transfers. They therefore tend to limit the bandwidth
of high-performance I/O devices and put additional load on the
CPUs.

Direct I/O is an attempt to address this issue by assigning
I/O devices to guest domains at a PCI device or function
level. That is, it allows one domain to own the root port, and
others to own the individual devices or functions under that
root port. We refer to a guest domain which owns one of these
devices as an I/O domain. The I/O and memory space of these
devices is mapped directly into the I/O domain. The I/O domain
can then attach the normal leaf device driver and can directly
access its control registers and DMA engines. Access to the
PCI config space registers of the device is still redirected
to a communications channel and handled either by the
hypervisor or the domain which owns the root port.
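
The ownership model described above can be illustrated with a short
sketch. This is not project code: the class names, the domain names
and the bus/device/function values are hypothetical and are chosen
only to show one domain owning the root port while individual
functions beneath it are owned by I/O domains.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class PciFunction:
    """A PCI function identified by bus/device/function (hypothetical model)."""
    bus: int
    device: int
    function: int

    def __str__(self) -> str:
        return f"{self.bus:02x}:{self.device:02x}.{self.function:x}"


@dataclass
class RootPort:
    """A root port owned by one domain; functions may be assigned elsewhere."""
    owner: str
    assigned: Dict[PciFunction, str] = field(default_factory=dict)

    def assign(self, fn: PciFunction, io_domain: str) -> None:
        # A function can be owned by at most one I/O domain at a time.
        if fn in self.assigned:
            raise ValueError(f"{fn} is already assigned to {self.assigned[fn]}")
        self.assigned[fn] = io_domain

    def owner_of(self, fn: PciFunction) -> str:
        # Functions not assigned to an I/O domain stay with the root port owner.
        return self.assigned.get(fn, self.owner)


port = RootPort(owner="root-domain")
nic = PciFunction(bus=3, device=0, function=1)
port.assign(nic, "guest-1")
print(nic, "is owned by", port.owner_of(nic))
```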

This may add some complexity in configuration and fault
diagnosis, but should allow I/O performance very close to that
of bare-metal systems running a single domain.

3.2. Market/Requester:
// Who is the customer or client that needs the project?
// Include names or description of companies or groups
// outside Sun that want the project.
// NOTE: If this is for an Open Exposure project, DO NOT
// include any Sun Proprietary info, Customer names, etc!

3.3. Business Justification:
// Why is it important or valuable to do this project?
// Include monetary estimate and precision.
// NOTE: If this is for an Open Exposure project, DO NOT
// include any Sun Proprietary info - it is OK to leave this
// section blank.

3.4. Competitive Analysis:
// Who are the current and anticipated players in this market?
// How/why will we succeed when competing with them?
// NOTE: If this is for an Open Exposure project, DO NOT
// include any Sun Proprietary info - it is OK to leave this
// section blank.

3.5. Opportunity Window/Exposure:
// Time-to-market window, if any, and precision.
// NOTE: If this is for an Open Exposure project, DO NOT
// include any Sun Proprietary info - it is OK to leave this
// section blank.

3.6. How will you know when you are done?:

- A defined set of PCI devices can be marked as reserved in
the root domain and then made available to a guest domain,
wherein the guest domain can attach/load driver(s) and
perform IO directly using the device.

- Network performance of a guest domain using direct IO with a
10G NIC will be within 5% of that of the root domain for a
typical suite of benchmarks.

- Disk performance of a guest domain using direct IO will be
within 5% of that of the root domain for a typical suite of
benchmarks.
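
The 5% criteria above could be checked with a helper along the
following lines. This is a sketch only. It assumes that each
benchmark yields a single figure where higher is better (for
example throughput), and that "within 5%" means the guest domain
achieves at least 95% of the root domain figure. The function name
and the default tolerance parameter are illustrative.

```python
def within_tolerance(guest: float, root: float, tolerance: float = 0.05) -> bool:
    """Return True if the guest result is at most `tolerance` below the root result.

    Assumes a higher-is-better metric such as throughput.
    """
    if root <= 0:
        raise ValueError("root domain result must be positive")
    return guest >= root * (1.0 - tolerance)


# Hypothetical figures in Gbit/s, for illustration only.
print(within_tolerance(guest=9.6, root=10.0))
print(within_tolerance(guest=9.4, root=10.0))
```

For latency-style metrics, where lower is better, the comparison
would need to be inverted.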

4. Technical Description:
4.1. Details:

Solaris will be updated such that, when owning the root port
of a PCI bus ("the root domain"), it is able to mark a PCI
device or function as reserved. Any device marked as reserved
will not have a traditional Solaris device driver bound to it,
and is therefore not usable by the root domain directly.

The xVM and Logical Domains hypervisors will be updated to
allow the root domain to declare a set of PCI devices or
functions as available for use by a domain other than the root
domain ("the IO domain").

The guest will be updated such that, when running as an IO
domain, it will enumerate the PCI devices or functions made
available to it by a root domain and bind drivers to those
devices in the traditional Solaris manner.
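
The three steps above (reserve in the root domain, make available
to an IO domain, enumerate in the IO domain) can be sketched as a
small model. This is an illustration under stated assumptions, not
the planned interface: the class and method names are hypothetical,
devices are represented as plain strings, and the rule that a
device must be reserved before it can be made available is an
assumption drawn from the ordering described in section 3.6.

```python
from typing import Dict, Iterable, List, Set


class RootDomain:
    """Hypothetical model of device reservation in the root domain."""

    def __init__(self, devices: Iterable[str]) -> None:
        self.devices: Set[str] = set(devices)
        self.reserved: Set[str] = set()
        self.exported: Dict[str, str] = {}  # device -> IO domain name

    def reserve(self, dev: str) -> None:
        if dev not in self.devices:
            raise KeyError(dev)
        self.reserved.add(dev)

    def bindable(self) -> List[str]:
        # Reserved devices do not get a driver bound in the root domain.
        return sorted(self.devices - self.reserved)

    def export(self, dev: str, io_domain: str) -> None:
        # Assumption: only reserved devices may be handed to an IO domain.
        if dev not in self.reserved:
            raise ValueError(f"{dev} must be reserved before it is exported")
        self.exported[dev] = io_domain

    def visible_to(self, io_domain: str) -> List[str]:
        # What the IO domain would find when it enumerates its devices.
        return sorted(d for d, dom in self.exported.items() if dom == io_domain)


root = RootDomain(["03:00.0", "03:00.1"])
root.reserve("03:00.1")
root.export("03:00.1", "guest-1")
print(root.bindable(), root.visible_to("guest-1"))
```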

As far as possible, the implementation will be shared between
the xVM Hypervisor and Logical Domains platforms. However, the
platforms differ, and so some differences will be observed:

- Access to PCI configuration space will be done by
three different mechanisms; two for the xVM
Hypervisor (determined by whether the IO domain is
paravirtualised or fully virtualised) and one for
Logical Domains.

- The Logical Domains implementation will support IO
domains running un-modified versions of Solaris
(presuming that such versions of Solaris are already
supported with Logical Domains).

The xVM Hypervisor implementation will support fully
virtualised IO domains running un-modified operating
systems (e.g. Solaris 10, Linux and Microsoft
Windows). Operating systems that are
paravirtualised (e.g. OpenSolaris, Linux) will need
to be updated to act as an IO domain.

- Enumeration of PCI devices is performed differently
between xVM Hypervisor (via Solaris' pci_autoconfig
module) and Logical Domains (via OBP). This will not
change.
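
The choice between the three PCI configuration space access
mechanisms can be pictured as a simple dispatch on platform and
guest type. The sketch below is illustrative only. The enum names
and the returned labels are placeholders and do not correspond to
real module or protocol names.

```python
from enum import Enum
from typing import Optional


class Platform(Enum):
    XVM = "xVM Hypervisor"
    LDOMS = "Logical Domains"


class GuestType(Enum):
    PARAVIRTUALISED = "paravirtualised"
    FULLY_VIRTUALISED = "fully virtualised"


def config_access_mechanism(platform: Platform,
                            guest_type: Optional[GuestType] = None) -> str:
    """Return a placeholder label for the config space access mechanism."""
    if platform is Platform.LDOMS:
        # Logical Domains has a single mechanism.
        return "ldoms"
    if guest_type is None:
        raise ValueError("guest type is required for the xVM Hypervisor")
    # The xVM Hypervisor has one mechanism per guest type.
    if guest_type is GuestType.PARAVIRTUALISED:
        return "xvm-paravirtualised"
    return "xvm-fully-virtualised"


print(config_access_mechanism(Platform.XVM, GuestType.FULLY_VIRTUALISED))
```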

For the xVM Hypervisor the majority of the work in the
underlying implementation (based on the work of the Xen open
source community) has been completed externally. The project
will assist in updating the Xen implementation used by Solaris
to Xen 3.3 to incorporate the required changes.

The Logical Domains hypervisor will be updated to support
Direct IO.

Fully virtualised IO domains running on the xVM Hypervisor
will require either Intel VT-d or AMD IOMMU capability in the
underlying platform. Paravirtualised IO domains will not
require such capability, though running without it renders the
platform insecure and is therefore not recommended.
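
The IOMMU rule for xVM Hypervisor IO domains can be written as a
small check. This is a sketch of the policy as stated above, not of
any real tool. The function name, its parameters and the message
strings are hypothetical.

```python
from typing import Tuple


def check_io_domain(fully_virtualised: bool, has_iommu: bool) -> Tuple[bool, str]:
    """Return (permitted, note) for an xVM Hypervisor IO domain."""
    if has_iommu:
        return True, "ok"
    if fully_virtualised:
        # Fully virtualised IO domains need VT-d or AMD IOMMU.
        return False, "fully virtualised IO domains require an IOMMU"
    # Paravirtualised IO domains work without an IOMMU, but insecurely.
    return True, "permitted but insecure without an IOMMU; not recommended"


print(check_io_domain(fully_virtualised=False, has_iommu=False))
```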

4.2. Bug/RFE Number(s):

4.3. In Scope:

- Modifying the Logical Domains hypervisor to support Direct
IO.

- Modifying Solaris to reserve PCI devices or functions in the
root domain.

- Adapting the open source components of xVM Hypervisor that
deal with PCI devices to work correctly on Solaris.

- Implementing the inter-domain protocols which allow xVM
Hypervisor guest domains to access PCI configuration space.

- Adapting the Solaris x64 PCI enumeration code to run in
guest domains.

4.4. Out of Scope:

- Hybrid IO support in guest domains will not be implemented
as part of this project.

- The assignment of distinct PCI devices or functions behind a
PCI-E to PCI bridge to different IO domains will not be
possible.

- The Logical Domains implementation will not support Niagara1
based systems.

- PCI-X and legacy PCI devices cannot be assigned to an I/O
domain.

- On SPARC, only PCI-E devices with drivers which support MSI
or MSI-X will be supported.

- Only selected PCI-E cards will be qualified. The list of
cards is to be determined and may differ between xVM
Hypervisor and Logical Domains.

- Migration of guest domains with assigned PCI devices will
not be supported in xVM Hypervisor (Logical Domains does not
currently support migration of guest domains).

- Updating xVM Ops Centre or the xVM Server BUI to support PCI
device assignment (though the project team is working with
the parties responsible for those components).

- Forwarding of attributable PCI fault reports to guest
domains.

- Support for the management of PCI SR-IOV devices. In
particular, this project includes no support for enabling
SR-IOV mode. Should another project provide this capability,
any VFs created should be usable with the Direct IO
functionality.

4.5. Interfaces:
// What interfaces are introduced, modified or deleted by this
// proposal? What interfaces does it import and export? What are
// the new exported interface's stability levels?
// See http://sac.eng/cgi-bin/bp.cgi?NAME=interface_taxonomy.bp
// (Think of Files/directories, Ports, DTD/Schema, admin tools and
// config files, APIs, CLIs, etc, as well as incompatible or
// unexpected changes that may affect consumers and/or customers)

4.6. Doc Impact:

Logical Domains Admin Guide, Logical Domains manager Man
pages. May also want to update the Logical Domains
Beginner's Guide and/or produce a new white paper.

xVM Server Admin Guide.

4.7. Admin/Config Impact:

Changes to the Logical Domains manager CLI and the xVM
Hypervisor CLI will be required.

4.8. HA Impact:
Not applicable.

4.9. I18N/L10N Impact:
No.

4.10. Packaging & Delivery:
// What packages, clusters or metaclusters does this proposal
// impact? What is its impact on install/upgrade?

4.11. Security Impact:
// How does this proposal interact with security-related APIs
// or interfaces? Does it rely on any Java policy or platform
// user/permissions implication? If the feature exposes any
// new ports, listeners, or any similar communication points
// which may have security implications, note these here.

4.12. Dependencies:

- Update of xVM Hypervisor to Xen 3.3.

- Support for MSI/MSI-X in xVM Hypervisor.

5. Reference Documents:
Direct I/O Glossary of terms -
http://cpubringup.sfbay.sun.com/twiki/bin/view/CSWIO/PCIeIOV/Glossary
Project TWiki -
http://cpubringup.sfbay.sun.com/twiki/bin/view/CSWIO/PCIeIOV/DirectIO
PSARC/2006/260: Solaris on Xen
PSARC/2005/633: LDoms: Project Q Logical Domaining Umbrella
Intel VT-d
http://software.intel.com/en-us/articles/intel-virtualization-technology-for-directed-io-vt-d-enhancing-intel-platforms-for-efficient-virtualization-of-io-devices
AMD-V
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/AMD_WP_Virtualizing_Server_Workloads-PID.pdf
PSARC/2008/561: AMD IOMMU

6. Resources and Schedule:
6.1. Projected Availability:
CQ2 2009

6.2. Cost of Effort:
Design/Development: 9 people for 6 months
Test: XXX dme: todo

6.3. Cost of Capital Resources:
XXX dme: todo
- x86 hardware for development,
- x86 hardware for QT,
- sparc hardware for development,
- sparc hardware for QT.
- option cards for development.
- option cards for QT.

6.4. Product Approval Committee requested information:
6.4.1. Consolidation or Component Name:
6.4.3. Type of CPT Review and Approval expected:
Standard

6.4.4. Project Boundary Conditions:
// Give the document's URL http://....

6.4.5. Is this a necessary project for OEM agreements:
No.

6.4.6. Notes:

6.4.7. Target RTI Date/Release:
RTI for ON consolidation: June 2009

6.4.8. Target Code Design Review Date:
March 2009

6.4.9. Update approval addition:
Not applicable.

6.5. ARC review type:
Standard.

6.6. ARC Exposure:
Open.

6.6.1. Rationale:
Not applicable.

7. Prototype Availability:
7.1. Prototype Availability:
// Functional subset expected to be needed to leave "prototype"
// stage.

7.2. Prototype Cost:
// Subset of Cost of Effort to leave "prototype" stage.
