The grant table is Xen's shared-memory mechanism for communication between domains; it works only through the cooperation of both the domain and Xen.
* Xen's grant tables provide a generic mechanism to memory sharing
* between domains. This shared memory interface underpins the split
* device drivers for block and network IO.
*
* Each domain has its own grant table. This is a data structure that
* is shared with Xen; it allows the domain to tell Xen what kind of
* permissions other domains have on its pages. Entries in the grant
* table are identified by grant references. A grant reference is an
* integer, which indexes into the grant table. It acts as a
* capability which the grantee can use to perform operations on the
* granter’s memory.
*
* This capability-based system allows shared-memory communications
* between unprivileged domains. A grant reference also encapsulates
* the details of a shared page, removing the need for a domain to
* know the real machine address of a page it is sharing. This makes
* it possible to share memory correctly with domains running in
* fully virtualised memory.
Let's first look at how a domain operates on its grant table.
The comment on grant-table operations in include/xen/interface/grant_table.h:
/* Some rough guidelines on accessing and updating grant-table entries
* in a concurrency-safe manner. For more information, Linux contains a
* reference implementation for guest OSes (arch/xen/kernel/grant_table.c).
*
* NB. WMB is a no-op on current-generation x86 processors. However, a
* compiler barrier will still be required.
*
* Introducing a valid entry into the grant table:
* 1. Write ent->domid.
* 2. Write ent->frame:
* GTF_permit_access: Frame to which access is permitted.
* GTF_accept_transfer: Pseudo-phys frame slot being filled by new
* frame, or zero if none.
* 3. Write memory barrier (WMB).
* 4. Write ent->flags, inc. valid type.
*
* Invalidating an unused GTF_permit_access entry:
* 1. flags = ent->flags.
* 2. Observe that !(flags & (GTF_reading|GTF_writing)).
* 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
* NB. No need for WMB as reuse of entry is control-dependent on success of
* step 3, and all architectures guarantee ordering of ctrl-dep writes.
*
* Invalidating an in-use GTF_permit_access entry:
* This cannot be done directly. Request assistance from the domain controller
* which can set a timeout on the use of a grant entry and take necessary
* action. (NB. This is not yet implemented!).
*
* Invalidating an unused GTF_accept_transfer entry:
* 1. flags = ent->flags.
* 2. Observe that !(flags & GTF_transfer_committed). [*]
* 3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
* NB. No need for WMB as reuse of entry is control-dependent on success of
* step 3, and all architectures guarantee ordering of ctrl-dep writes.
* [*] If GTF_transfer_committed is set then the grant entry is 'committed'.
* The guest must /not/ modify the grant entry until the address of the
* transferred frame is written. It is safe for the guest to spin waiting
* for this to occur (detect by observing GTF_transfer_completed in
* ent->flags).
*
* Invalidating a committed GTF_accept_transfer entry:
* 1. Wait for (ent->flags & GTF_transfer_completed).
*
* Changing a GTF_permit_access from writable to read-only:
* Use SMP-safe CMPXCHG to set GTF_readonly, while checking !GTF_writing.
*
* Changing a GTF_permit_access from read-only to writable:
* Use SMP-safe bit-setting instruction.
*/
grant_entry is a struct that describes how one page is shared; we only analyze the v1 grant_entry here. A domain's grant table is an array of grant entries, and the index of an entry in that array, a uint32_t, is called its grant reference (GR).
/*
* Reference to a grant entry in a specified domain's grant table.
*/
typedef uint32_t grant_ref_t;
/*
* A grant table comprises a packed array of grant entries in one or more
* page frames shared between Xen and a guest.
* [XEN]: This field is written by Xen and read by the sharing guest.
* [GST]: This field is written by the guest and read by Xen.
*/
/*
* Version 1 of the grant table entry structure is maintained purely
* for backwards compatibility. New guests should use version 2.
*/
struct grant_entry_v1 {
/* GTF_xxx: various type and flag information. [XEN,GST] */
uint16_t flags;
/* The domain being granted foreign privileges. [GST] */
domid_t domid;
/*
* GTF_permit_access: Frame that @domid is allowed to map and access. [GST]
* GTF_accept_transfer: Frame whose ownership transferred by @domid. [XEN]
*/
uint32_t frame;
};
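The flags field of grant_entry records the entry type; the two most common types are GTF_permit_access and GTF_accept_transfer. With GTF_permit_access, the domain sharing the page specifies which domain (domid) may access it, read-only or read/write, and which page frame (frame) is shared. With GTF_accept_transfer, the domain that owns the grant table declares that it is willing to receive a page transferred to it by domid.
The flags field also records the current state of the grant entry, e.g.: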
/*
* Subflags for GTF_permit_access.
* GTF_readonly: Restrict @domid to read-only mappings and accesses. [GST]
* GTF_reading: Grant entry is currently mapped for reading by @domid. [XEN]
* GTF_writing: Grant entry is currently mapped for writing by @domid. [XEN]
* GTF_sub_page: Grant access to only a subrange of the page. @domid
* will only be allowed to copy from the grant, and not
* map it. [GST]
*/
#define _GTF_readonly (2)
#define GTF_readonly (1U<<_GTF_readonly)
#define _GTF_reading (3)
#define GTF_reading (1U<<_GTF_reading)
#define _GTF_writing (4)
#define GTF_writing (1U<<_GTF_writing)
#define _GTF_sub_page (8)
#define GTF_sub_page (1U<<_GTF_sub_page)
/*
* Subflags for GTF_accept_transfer:
* GTF_transfer_committed: Xen sets this flag to indicate that it is committed
* to transferring ownership of a page frame. When a guest sees this flag
* it must /not/ modify the grant entry until GTF_transfer_completed is
* set by Xen.
* GTF_transfer_completed: It is safe for the guest to spin-wait on this flag
* after reading GTF_transfer_committed. Xen will always write the frame
* address, followed by ORing this flag, in a timely manner.
*/
#define _GTF_transfer_committed (2)
#define GTF_transfer_committed (1U<<_GTF_transfer_committed)
#define _GTF_transfer_completed (3)
#define GTF_transfer_completed (1U<<_GTF_transfer_completed)
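Because the subflag bits overlap (GTF_readonly and GTF_transfer_committed are both bit 2; GTF_reading and GTF_transfer_completed are both bit 3), the type in the low two bits of flags must be decoded before any subflag is interpreted. Below is a minimal decoder. It assumes the type values GTF_invalid = 0, GTF_permit_access = 1, GTF_accept_transfer = 2 and GTF_type_mask = 3 from Xen's public grant_table.h, since they are not shown above. `decode_entry` and `enum entry_state` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Type field values, as in Xen's public grant_table.h. */
#define GTF_invalid            (0U << 0)
#define GTF_permit_access      (1U << 0)
#define GTF_accept_transfer    (2U << 0)
#define GTF_type_mask          (3U << 0)
/* Subflags, copied from the definitions above. */
#define GTF_readonly           (1U << 2)
#define GTF_reading            (1U << 3)
#define GTF_writing            (1U << 4)
#define GTF_transfer_committed (1U << 2)
#define GTF_transfer_completed (1U << 3)

enum entry_state {
    ENTRY_INVALID,
    ACCESS_WRITABLE,      /* granted read/write, not mapped */
    ACCESS_READONLY,      /* granted read-only, not mapped */
    ACCESS_IN_USE,        /* grantee holds a mapping (reading/writing) */
    TRANSFER_WAITING,     /* slot offered, Xen has not committed */
    TRANSFER_COMMITTED,   /* guest must not touch the entry */
    TRANSFER_COMPLETED,   /* frame field now holds the new page */
    ENTRY_OTHER
};

/* Hypothetical decoder: bits 2 and 3 mean different things for the two
 * types, so the type field is examined first. */
static enum entry_state decode_entry(uint16_t flags)
{
    switch (flags & GTF_type_mask) {
    case GTF_permit_access:
        if (flags & (GTF_reading | GTF_writing))
            return ACCESS_IN_USE;
        return (flags & GTF_readonly) ? ACCESS_READONLY : ACCESS_WRITABLE;
    case GTF_accept_transfer:
        if (flags & GTF_transfer_completed)
            return TRANSFER_COMPLETED;
        return (flags & GTF_transfer_committed) ? TRANSFER_COMMITTED
                                                : TRANSFER_WAITING;
    case GTF_invalid:
        return ENTRY_INVALID;
    default:
        return ENTRY_OTHER;
    }
}
```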
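Xen defines struct grant_table to hold each domain's grant table. GTF_permit_access entries that are in use are shadowed by an active_grant_entry, which Xen uses to track changes to the mapping. The guest domain has no grant_table structure of its own; it obtains its grant table by mapping the frames that Xen shares with it.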
/* Per-domain grant information. */
struct grant_table {
/* Table size. Number of frames shared with guest */
unsigned int nr_grant_frames;
/* Shared grant table (see include/public/grant_table.h). */
union {
void **shared_raw;
struct grant_entry_v1 **shared_v1;
union grant_entry_v2 **shared_v2;
};
/* Number of grant status frames shared with guest (for version 2) */
unsigned int nr_status_frames;
/* State grant table (see include/public/grant_table.h). */
grant_status_t **status;
/* Active grant table. */
struct active_grant_entry **active;
/* Mapping tracking table. */
struct grant_mapping **maptrack;
unsigned int maptrack_head;
unsigned int maptrack_limit;
/* Lock protecting updates to active and shared grant tables. */
spinlock_t lock;
/* The defined versions are 1 and 2. Set to 0 if we don't know
what version to use yet. */
unsigned gt_version;
};
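Since shared_v1 is an array of pointers to frames rather than one flat array, a grant reference has to be split into a frame index and a slot within that frame. The sketch below assumes 4 KiB frames, so each frame holds 4096 / 8 = 512 v1 entries. `sketch_grant_table` and `sketch_shared_entry` are hypothetical names; Xen's grant_table.c performs the same arithmetic with its own helper macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB frames */

typedef uint16_t domid_t;
typedef uint32_t grant_ref_t;

struct grant_entry_v1 {
    uint16_t flags;
    domid_t  domid;
    uint32_t frame;
};

/* 4096 / 8 = 512 v1 entries per shared frame. */
#define ENTRIES_PER_FRAME (SKETCH_PAGE_SIZE / sizeof(struct grant_entry_v1))

/* Trimmed-down grant_table: only the v1 view of the shared frames. */
struct sketch_grant_table {
    unsigned int nr_grant_frames;
    struct grant_entry_v1 **shared_v1;   /* one pointer per shared frame */
};

/* Return the entry for @ref, or NULL if @ref lies beyond the table. */
static struct grant_entry_v1 *
sketch_shared_entry(struct sketch_grant_table *t, grant_ref_t ref)
{
    if (ref >= t->nr_grant_frames * ENTRIES_PER_FRAME)
        return NULL;
    return &t->shared_v1[ref / ENTRIES_PER_FRAME][ref % ENTRIES_PER_FRAME];
}
```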
/* Count of writable host-CPU mappings. */
#define GNTPIN_hstw_shift (0)
#define GNTPIN_hstw_inc (1 << GNTPIN_hstw_shift)
#define GNTPIN_hstw_mask (0xFFU << GNTPIN_hstw_shift)
/* Count of read-only host-CPU mappings. */
#define GNTPIN_hstr_shift (8)
#define GNTPIN_hstr_inc (1 << GNTPIN_hstr_shift)
#define GNTPIN_hstr_mask (0xFFU << GNTPIN_hstr_shift)
/* Count of writable device-bus mappings. */
#define GNTPIN_devw_shift (16)
#define GNTPIN_devw_inc (1 << GNTPIN_devw_shift)
#define GNTPIN_devw_mask (0xFFU << GNTPIN_devw_shift)
/* Count of read-only device-bus mappings. */
#define GNTPIN_devr_shift (24)
#define GNTPIN_devr_inc (1 << GNTPIN_devr_shift)
#define GNTPIN_devr_mask (0xFFU << GNTPIN_devr_shift)
/* Active grant entry - used for shadowing GTF_permit_access grants. */
struct active_grant_entry {
u32 pin; /* Reference count information. */
domid_t domid; /* Domain being granted access. */
struct domain *trans_domain;
uint32_t trans_gref;
unsigned long frame; /* Frame being granted. */
unsigned long gfn; /* Guest's idea of the frame being granted. */
unsigned is_sub_page:1; /* True if this is a sub-page grant. */
unsigned start:15; /* For sub-page grants, the start offset
in the page. */
unsigned length:16; /* For sub-page grants, the length of the
grant. */
};
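The pin field of active_grant_entry packs the four 8-bit reference counts defined by the GNTPIN_* macros into one u32, so a new mapping is accounted for by adding the matching *_inc constant. The sketch below reads the packed counters back; `pin_count` and `pin_has_writers` are hypothetical helpers. A writers check of this kind is presumably what lets Xen clear GTF_writing in the shared entry once the last writable mapping is gone, but treat that use as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the GNTPIN_* definitions above. */
#define GNTPIN_hstw_shift (0)
#define GNTPIN_hstw_inc   (1 << GNTPIN_hstw_shift)
#define GNTPIN_hstw_mask  (0xFFU << GNTPIN_hstw_shift)
#define GNTPIN_hstr_shift (8)
#define GNTPIN_hstr_inc   (1 << GNTPIN_hstr_shift)
#define GNTPIN_hstr_mask  (0xFFU << GNTPIN_hstr_shift)
#define GNTPIN_devw_shift (16)
#define GNTPIN_devw_inc   (1 << GNTPIN_devw_shift)
#define GNTPIN_devw_mask  (0xFFU << GNTPIN_devw_shift)
#define GNTPIN_devr_shift (24)
#define GNTPIN_devr_inc   (1 << GNTPIN_devr_shift)
#define GNTPIN_devr_mask  (0xFFU << GNTPIN_devr_shift)

/* Extract one 8-bit counter from a packed pin word. */
static unsigned int pin_count(uint32_t pin, uint32_t mask, unsigned int shift)
{
    return (pin & mask) >> shift;
}

/* Does any host or device mapping still hold the frame writable? */
static int pin_has_writers(uint32_t pin)
{
    return (pin & (GNTPIN_hstw_mask | GNTPIN_devw_mask)) != 0;
}
```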
/*
* Tracks a mapping of another domain's grant reference. Each domain has a
* table of these, indexes into which are returned as a 'mapping handle'.
*/
struct grant_mapping {
u32 ref; /* grant ref */
u16 flags; /* 0-4: GNTMAP_* ; 5-15: unused */
domid_t domid; /* granting domain */
};
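The maptrack table turns each mapping into a small integer handle, and the maptrack_head/maptrack_limit fields suggest a free list. Below is a minimal sketch, assuming free entries are chained through their `ref` field with maptrack_head pointing at the first free slot. Older Xen versions appear to work this way, but treat it as an assumption. The table size and the names `get_handle`/`put_handle` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t domid_t;

struct grant_mapping {
    uint32_t ref;     /* grant ref, or next free handle while unused */
    uint16_t flags;   /* GNTMAP_*; 0 for a free slot in this sketch */
    domid_t  domid;   /* granting domain */
};

#define SKETCH_MAPTRACK_LIMIT 4u
#define SKETCH_NO_HANDLE      (~0u)

struct sketch_maptrack {
    struct grant_mapping entry[SKETCH_MAPTRACK_LIMIT];
    unsigned int head;    /* first free handle */
};

static void maptrack_init(struct sketch_maptrack *m)
{
    /* Chain every slot to the next; the last points past the end. */
    for (unsigned int i = 0; i < SKETCH_MAPTRACK_LIMIT; i++)
        m->entry[i] = (struct grant_mapping){ .ref = i + 1 };
    m->head = 0;
}

/* Pop a free handle; returns SKETCH_NO_HANDLE if the table is full. */
static unsigned int get_handle(struct sketch_maptrack *m)
{
    unsigned int h = m->head;
    if (h >= SKETCH_MAPTRACK_LIMIT)
        return SKETCH_NO_HANDLE;
    m->head = m->entry[h].ref;
    return h;
}

/* Push a handle back onto the front of the free list. */
static void put_handle(struct sketch_maptrack *m, unsigned int h)
{
    assert(h < SKETCH_MAPTRACK_LIMIT);
    m->entry[h].flags = 0;
    m->entry[h].ref = m->head;
    m->head = h;
}
```

Reusing `ref` as the "next" pointer means the free list costs no extra memory, and both operations are O(1).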
Xen dispatches grant-table hypercalls through do_grant_table_op. We focus on the following operations: GNTTABOP_map_grant_ref, GNTTABOP_unmap_grant_ref, GNTTABOP_transfer and GNTTABOP_copy.
GNTTABOP_map_grant_ref and GNTTABOP_unmap_grant_ref map and unmap a GR, respectively.
/*
* GNTTABOP_map_grant_ref: Map the grant entry (<dom>,<ref>) for access
* by devices and/or host CPUs. If successful, <handle> is a tracking number
* that must be presented later to destroy the mapping(s). On error, <handle>
* is a negative status code.
* NOTES:
* 1. If GNTMAP_device_map is specified then <dev_bus_addr> is the address
* via which I/O devices may access the granted frame.
* 2. If GNTMAP_host_map is specified then a mapping will be added at
* either a host virtual address in the current address space, or at
* a PTE at the specified machine address. The type of mapping to
* perform is selected through the GNTMAP_contains_pte flag, and the
* address is specified in <host_addr>.
* 3. Mappings should only be destroyed via GNTTABOP_unmap_grant_ref. If a
* host mapping is destroyed by other means then it is *NOT* guaranteed
* to be accounted to the correct grant reference!
*/
struct gnttab_map_grant_ref {
/* IN parameters. */
uint64_t host_addr;
uint32_t flags; /* GNTMAP_* */
grant_ref_t ref;
domid_t dom; /* remote domain */
/* OUT parameters. */
int16_t status; /* => enum grant_status */
grant_handle_t handle;
uint64_t dev_bus_addr;
};
typedef struct gnttab_map_grant_ref gnttab_map_grant_ref_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_map_grant_ref_t);
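The flags field combines several GNTMAP_* bits. GNTMAP_device_map and GNTMAP_host_map select whether the mapping is for device I/O (e.g. DMA, via the returned dev_bus_addr) or for ordinary CPU memory access. GNTMAP_application_map indicates whether the mapped page may be accessed by user-space programs in the mapping domain. GNTMAP_contains_pte indicates that host_addr is the machine address of a PTE to update rather than a host virtual address (see note 2 in the comment above).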
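A guest fills in one gnttab_map_grant_ref per grant it wants to map before issuing the hypercall. The helper below is hypothetical, loosely modelled on Linux's `gnttab_set_map_op`. The GNTMAP_* bit values are assumed from Xen's public grant_table.h, and the struct is the one quoted above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t domid_t;
typedef uint32_t grant_ref_t;
typedef uint32_t grant_handle_t;

/* GNTMAP_* bits as in Xen's public grant_table.h. */
#define GNTMAP_device_map      (1U << 0)
#define GNTMAP_host_map        (1U << 1)
#define GNTMAP_readonly        (1U << 2)
#define GNTMAP_application_map (1U << 3)
#define GNTMAP_contains_pte    (1U << 4)

struct gnttab_map_grant_ref {
    uint64_t host_addr;       /* IN */
    uint32_t flags;           /* IN: GNTMAP_* */
    grant_ref_t ref;          /* IN */
    domid_t dom;              /* IN: remote (granting) domain */
    int16_t status;           /* OUT */
    grant_handle_t handle;    /* OUT */
    uint64_t dev_bus_addr;    /* OUT */
};

/* Hypothetical helper: fill one map request. Returns 0, or -1 if
 * neither a host nor a device mapping was requested. */
static int fill_map_op(struct gnttab_map_grant_ref *op, uint64_t addr,
                       uint32_t flags, grant_ref_t ref, domid_t dom)
{
    if (!(flags & (GNTMAP_host_map | GNTMAP_device_map)))
        return -1;
    op->host_addr = addr;   /* VA, or PTE machine address with contains_pte */
    op->flags = flags;
    op->ref = ref;
    op->dom = dom;
    op->status = 0;         /* OUT fields are written by Xen */
    op->handle = 0;
    op->dev_bus_addr = 0;
    return 0;
}
```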
Let's look at the implementation of gnttab_map_grant_ref:
static long
gnttab_map_grant_ref(
XEN_GUEST_HANDLE_PARAM(gnttab_map_grant_ref_t) uop, unsigned int count)
{
int i;
struct gnttab_map_grant_ref op;
for ( i = 0; i < count; i++ )
{
if (i && hypercall_preempt_check())
return i;
if ( unlikely(__copy_from_guest_offset(&op, uop, i, 1)) )
return -EFAULT;
__gnttab_map_grant_ref(&op);
if ( unlikely(__copy_to_guest_offset(uop, i, &op, 1)) )
return -EFAULT;
}
return 0;
}
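The `if (i && hypercall_preempt_check()) return i;` line lets a long batch be split across several entries into the hypervisor. At least one element is always processed, and the return value tells the caller how far it got. The following user-space model captures that contract. `preempt_pending` is a hypothetical stand-in that pretends work is pending every `every` elements, and resumption is done by hand; in Xen, do_grant_table_op is expected to turn a positive return value into a hypercall continuation.

```c
#include <assert.h>

struct op { int ref; int status; };

/* Stand-in for __gnttab_map_grant_ref: mark the op as processed. */
static void process_one(struct op *op)
{
    op->status = op->ref * 10;
}

/* Hypothetical preemption check: "work pending" every @every ops. */
static int preempt_pending(unsigned int i, unsigned int every)
{
    return every != 0 && i % every == 0;
}

/* Same shape as gnttab_map_grant_ref: returns the number of ops done
 * if preempted (only when i > 0, so progress is guaranteed), or 0
 * when the whole batch completed. */
static long map_batch(struct op *ops, unsigned int count, unsigned int every)
{
    for (unsigned int i = 0; i < count; i++) {
        if (i && preempt_pending(i, every))
            return i;
        process_one(&ops[i]);
    }
    return 0;
}
```

A caller resumes by advancing its array pointer by the returned count and shrinking `count` accordingly, exactly what a continuation would do on the real handle.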
The __copy_from_guest_offset and __copy_to_guest_offset macros copy arguments from the guest into Xen and back. In gnttab_map_grant_ref the guest passes an array of count gnttab_map_grant_ref structures, and each iteration copies a single element in and out by passing its offset i.
#define __copy_to_guest_offset(hnd, off, ptr, nr) ({ \
const typeof(*(ptr)) *_s = (ptr); \
char (*_d)[sizeof(*_s)] = (void *)(hnd).p; \
((void)((hnd).p == (ptr))); \
__raw_copy_to_guest(_d+(off), _s, sizeof(*_s)*(nr));\
})
#define __copy_from_guest_offset(ptr, hnd, off, nr) ({ \
const typeof(*(ptr)) *_s = (hnd).p; \
typeof(*(ptr)) *_d = (ptr); \
__raw_copy_from_guest(_d, _s+(off), sizeof(*_d)*(nr));\
})
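In __copy_to_guest_offset the destination is cast to `char (*)[sizeof(*_s)]`, a pointer to an array the size of one element, so `_d + off` advances by `off` whole elements. The otherwise useless `(hnd).p == (ptr)` comparison exists only to make the compiler complain if the handle and pointer types differ. The sketch below reproduces both macros in user space, with `memcpy` standing in for `__raw_copy_to_guest`/`__raw_copy_from_guest` and a hypothetical `req_handle_t`. Like the original, it relies on GNU C statement expressions and `__typeof__`.

```c
#include <assert.h>
#include <string.h>

struct req { int ref; short status; };
typedef struct { struct req *p; } req_handle_t;   /* like __guest_handle_X */

/* memcpy stands in for the raw copy routines; both macros evaluate to
 * 0, meaning "no bytes left uncopied". */
#define sketch_copy_to_guest_offset(hnd, off, ptr, nr) ({          \
    const __typeof__(*(ptr)) *_s = (ptr);                          \
    char (*_d)[sizeof(*_s)] = (void *)(hnd).p;                     \
    ((void)((hnd).p == (ptr)));   /* compile-time type check */    \
    memcpy(_d + (off), _s, sizeof(*_s) * (nr));                    \
    0;                                                             \
})

#define sketch_copy_from_guest_offset(ptr, hnd, off, nr) ({        \
    const __typeof__(*(ptr)) *_s = (hnd).p;                        \
    __typeof__(*(ptr)) *_d = (ptr);                                \
    memcpy(_d, _s + (off), sizeof(*_d) * (nr));                   \
    0;                                                             \
})
```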
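The XEN_GUEST_HANDLE_PARAM macro was introduced to distinguish how guest pointers reach Xen: a pointer passed as a hypercall argument is wrapped with XEN_GUEST_HANDLE_PARAM, while a pointer stored as a field of an in-memory struct is wrapped with XEN_GUEST_HANDLE. See http://lists.xen.org/archives/html/xen-devel/2012-08/msg01324.html
include/public/arch-x86/xen.h contains the definitions of XEN_GUEST_HANDLE and XEN_GUEST_HANDLE_PARAM; on x86 the two are identical: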
#define ___DEFINE_XEN_GUEST_HANDLE(name, type) \
typedef struct { type *p; } __guest_handle_ ## name
/*
* XEN_GUEST_HANDLE represents a guest pointer, when passed as a field
* in a struct in memory.
* XEN_GUEST_HANDLE_PARAM represent a guest pointer, when passed as an
* hypercall argument.
* XEN_GUEST_HANDLE_PARAM and XEN_GUEST_HANDLE are the same on X86 but
* they might not be on other architectures.
*/
#define __DEFINE_XEN_GUEST_HANDLE(name, type) \
___DEFINE_XEN_GUEST_HANDLE(name, type); \
___DEFINE_XEN_GUEST_HANDLE(const_##name, const type)
#define DEFINE_XEN_GUEST_HANDLE(name) __DEFINE_XEN_GUEST_HANDLE(name, name)
#define __XEN_GUEST_HANDLE(name) __guest_handle_ ## name
#define XEN_GUEST_HANDLE(name) __XEN_GUEST_HANDLE(name)
#define XEN_GUEST_HANDLE_PARAM(name) XEN_GUEST_HANDLE(name)
So XEN_GUEST_HANDLE_PARAM(gnttab_map_grant_ref_t) expands to the struct type __guest_handle_gnttab_map_grant_ref_t, which DEFINE_XEN_GUEST_HANDLE(gnttab_map_grant_ref_t) defines, together with a const variant, as:
typedef struct { gnttab_map_grant_ref_t *p; } __guest_handle_gnttab_map_grant_ref_t;
typedef struct { const gnttab_map_grant_ref_t *p; } __guest_handle_const_gnttab_map_grant_ref_t;
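To check that the PARAM and non-PARAM handles really are the same type on x86, the macros quoted above can be compiled outside Xen. The sketch below does that with a toy argument type, `toy_op_t` (hypothetical), in place of gnttab_map_grant_ref_t.

```c
#include <assert.h>
#include <stdint.h>

/* Macros copied from include/public/arch-x86/xen.h as quoted above. */
#define ___DEFINE_XEN_GUEST_HANDLE(name, type) \
    typedef struct { type *p; } __guest_handle_ ## name
#define __DEFINE_XEN_GUEST_HANDLE(name, type) \
    ___DEFINE_XEN_GUEST_HANDLE(name, type); \
    ___DEFINE_XEN_GUEST_HANDLE(const_##name, const type)
#define DEFINE_XEN_GUEST_HANDLE(name) __DEFINE_XEN_GUEST_HANDLE(name, name)
#define __XEN_GUEST_HANDLE(name) __guest_handle_ ## name
#define XEN_GUEST_HANDLE(name) __XEN_GUEST_HANDLE(name)
#define XEN_GUEST_HANDLE_PARAM(name) XEN_GUEST_HANDLE(name)

/* A toy argument type standing in for gnttab_map_grant_ref_t. */
typedef struct { uint32_t ref; int16_t status; } toy_op_t;
DEFINE_XEN_GUEST_HANDLE(toy_op_t);

/* A "hypercall" taking a PARAM handle; on x86 this is the same type as
 * XEN_GUEST_HANDLE(toy_op_t), i.e. __guest_handle_toy_op_t. */
static uint32_t first_ref(XEN_GUEST_HANDLE_PARAM(toy_op_t) uop)
{
    return uop.p[0].ref;
}
```

Wrapping the raw pointer in a one-member struct means a guest handle cannot be dereferenced by accident as if it were a hypervisor pointer; code has to go through `.p` or the copy macros.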
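The actual mapping is performed by __gnttab_map_grant_ref, which is analyzed later.
GNTTABOP_unmap_grant_ref tears down a mapping created earlier. Note that unmapping must be followed by a TLB flush, which gnttab_unmap_grant_ref performs with flush_tlb_mask once per batch, before calling __gnttab_unmap_common_complete on each entry of that batch.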
/*
* GNTTABOP_unmap_grant_ref: Destroy one or more grant-reference mappings
* tracked by <handle>. If <host_addr> or <dev_bus_addr> is zero, that
* field is ignored. If non-zero, they must refer to a device/host mapping
* that is tracked by <handle>
* NOTES:
* 1. The call may fail in an undefined manner if either mapping is not
* tracked by <handle>.
* 3. After executing a batch of unmaps, it is guaranteed that no stale
* mappings will remain in the device or host TLBs.
*/
struct gnttab_unmap_grant_ref {
/* IN parameters. */
uint64_t host_addr;
uint64_t dev_bus_addr;
grant_handle_t handle;
/* OUT parameters. */
int16_t status; /* => enum grant_status */
};
typedef struct gnttab_unmap_grant_ref gnttab_unmap_grant_ref_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_unmap_grant_ref_t);
static long
gnttab_unmap_grant_ref(
XEN_GUEST_HANDLE_PARAM(gnttab_unmap_grant_ref_t) uop, unsigned int count)
{
int i, c, partial_done, done = 0;
struct gnttab_unmap_grant_ref op;
struct gnttab_unmap_common common[GNTTAB_UNMAP_BATCH_SIZE];
while ( count != 0 )
{
c = min(count, (unsigned int)GNTTAB_UNMAP_BATCH_SIZE);
partial_done = 0;
for ( i = 0; i < c; i++ )
{
if ( unlikely(__copy_from_guest(&op, uop, 1)) )
goto fault;
__gnttab_unmap_grant_ref(&op, &(common[i]));
++partial_done;
if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
goto fault;
guest_handle_add_offset(uop, 1);
}
flush_tlb_mask(current->domain->domain_dirty_cpumask);
for ( i = 0; i < partial_done; i++ )
__gnttab_unmap_common_complete(&(common[i]));
count -= c;
done += c;
if (count && hypercall_preempt_check())
return done;
}
return 0;
fault:
flush_tlb_mask(current->domain->domain_dirty_cpumask);
for ( i = 0; i < partial_done; i++ )
__gnttab_unmap_common_complete(&(common[i]));
return -EFAULT;
}
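gnttab_unmap_grant_ref handles each batch in two phases: every unmap in the batch is torn down first, a single flush_tlb_mask covers the whole batch, and only then does __gnttab_unmap_common_complete finish each entry. Presumably this ordering ensures nothing is released while another CPU could still hold a stale TLB entry for the frame. The counting model below keeps just that control flow. The batch size of 16 is an assumed value for GNTTAB_UNMAP_BATCH_SIZE, and the names are hypothetical.

```c
#include <assert.h>

/* Stand-in for GNTTAB_UNMAP_BATCH_SIZE; 16 is an assumed value. */
#define SKETCH_UNMAP_BATCH 16u

struct unmap_stats {
    unsigned int unmapped;     /* phase 1 calls (__gnttab_unmap_grant_ref) */
    unsigned int flushes;      /* flush_tlb_mask calls */
    unsigned int completed;    /* phase 2 calls (__gnttab_unmap_common_complete) */
    unsigned int max_pending;  /* most unmaps ever waiting for a flush */
};

/* Same control flow as gnttab_unmap_grant_ref, minus preemption and
 * faults: tear down a batch, flush once, then complete the batch. */
static void sketch_unmap_all(unsigned int count, struct unmap_stats *s)
{
    while (count != 0) {
        unsigned int c = count < SKETCH_UNMAP_BATCH ? count : SKETCH_UNMAP_BATCH;
        for (unsigned int i = 0; i < c; i++)
            s->unmapped++;
        if (s->unmapped - s->completed > s->max_pending)
            s->max_pending = s->unmapped - s->completed;
        s->flushes++;
        for (unsigned int i = 0; i < c; i++)
            s->completed++;
        count -= c;
    }
}
```

With N unmaps and batch size B the loop performs ceil(N / B) flushes instead of N, while never deferring more than B completions.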
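GNTTABOP_transfer (called GNTTABOP_transfer_grant_ref in the header comment below) moves a page from a source domain to a destination domain. Unlike map/unmap, after a transfer the source domain loses the page permanently. First, the destination domain sets up a GR whose flags include GTF_accept_transfer and whose domid is the source domain; this GR signals that the destination domain has agreed to accept a page from the source domain. The source domain then starts the transfer with gnttab_transfer.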
/*
* GNTTABOP_transfer_grant_ref: Transfer <frame> to a foreign domain. The
* foreign domain has previously registered its interest in the transfer via
* <domid, ref>.
*
* Note that, even if the transfer fails, the specified page no longer belongs
* to the calling domain *unless* the error is GNTST_bad_page.
*/
struct gnttab_transfer {
/* IN parameters. */
xen_pfn_t mfn;
domid_t domid;
grant_ref_t ref;
/* OUT parameters. */
int16_t status;
};
typedef struct gnttab_transfer gnttab_transfer_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_transfer_t);
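The handshake can be modelled with the v1 entry and the GTF_accept_transfer subflags quoted earlier. In the sketch below, the receiving domain offers a slot whose domid is the sending domain. A hypothetical `xen_complete_transfer` plays Xen's part: it sets GTF_transfer_committed, writes the frame, then sets GTF_transfer_completed, the order given in the header comment. Finally the receiver collects the frame. The model is single-threaded, so it omits the barriers and atomic operations real code needs. The type value GTF_accept_transfer = 2 is assumed from Xen's public header.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t domid_t;

#define GTF_accept_transfer    (2U << 0)
#define GTF_transfer_committed (1U << 2)
#define GTF_transfer_completed (1U << 3)

struct grant_entry_v1 { uint16_t flags; domid_t domid; uint32_t frame; };

/* Receiver: advertise that @from may transfer a page into this slot. */
static void offer_transfer_slot(struct grant_entry_v1 *e, domid_t from)
{
    e->domid = from;
    e->frame = 0;                  /* no pseudo-phys slot supplied */
    e->flags = GTF_accept_transfer;
}

/* Simulated Xen side (hypothetical): commit, write frame, complete. */
static void xen_complete_transfer(struct grant_entry_v1 *e, uint32_t mfn)
{
    e->flags |= GTF_transfer_committed;
    e->frame = mfn;
    e->flags |= GTF_transfer_completed;
}

/* Receiver: return the transferred frame once completed, else 0.
 * A committed but incomplete entry must be left alone (spin instead). */
static uint32_t collect_transfer(struct grant_entry_v1 *e)
{
    if (!(e->flags & GTF_transfer_completed))
        return 0;
    uint32_t frame = e->frame;
    e->flags = 0;                  /* entry may now be reused */
    return frame;
}
```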
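GNTTABOP_copy copies memory contents from a source domain into a destination domain. Xen is a natural place for this, since the hypervisor can see the memory of every domain. A copy also needs no TLB flush, so it is not necessarily more expensive than a map: copy pays for traffic (and locking) on the CPU-memory bus, while map pays for TLB flushes, and it is hard to say in general which costs more. On Intel Sandy Bridge (SNB) with NUMA support, CPU-to-memory latency is lower and synchronization overhead smaller, so the author suspects copy may even be cheaper than map there.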
/*
* GNTTABOP_copy: Hypervisor based copy
* source and destinations can be eithers MFNs or, for foreign domains,
* grant references. the foreign domain has to grant read/write access
* in its grant table.
*
* The flags specify what type source and destinations are (either MFN
* or grant reference).
*
* Note that this can also be used to copy data between two domains
* via a third party if the source and destination domains had previously
* grant appropriate access to their pages to the third party.
*
* source_offset specifies an offset in the source frame, dest_offset
* the offset in the target frame and len specifies the number of
* bytes to be copied.
*/
#define _GNTCOPY_source_gref (0)
#define GNTCOPY_source_gref (1<<_GNTCOPY_source_gref)
#define _GNTCOPY_dest_gref (1)
#define GNTCOPY_dest_gref (1<<_GNTCOPY_dest_gref)
struct gnttab_copy {
/* IN parameters. */
struct {
union {
grant_ref_t ref;
xen_pfn_t gmfn;
} u;
domid_t domid;
uint16_t offset;
} source, dest;
uint16_t len;
uint16_t flags; /* GNTCOPY_* */
/* OUT parameters. */
int16_t status;
};
typedef struct gnttab_copy gnttab_copy_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_copy_t);
gnttab_copy calls __gnttab_copy, which ultimately performs the copy with memcpy; that function is analyzed in detail later.
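gnttab_copy's offsets and length are 16-bit fields relative to a single frame, so the final memcpy must stay inside both frames. The sketch below models only that last step with two ordinary buffers, assuming 4 KiB frames. The status codes and `sketch_grant_copy` are hypothetical, and we assume, without showing it here, that __gnttab_copy rejects ranges that cross the end of a frame before copying.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB frames */

/* Hypothetical status codes for this sketch only. */
enum { COPY_OK = 0, COPY_BAD_ARG = -1 };

/* Final step of a grant copy: both frames are already resolved (and,
 * in Xen, pinned); copy @len bytes between the two offsets, refusing
 * any range that would run past the end of either frame. */
static int sketch_grant_copy(uint8_t *dst_page, uint16_t dst_off,
                             const uint8_t *src_page, uint16_t src_off,
                             uint16_t len)
{
    if ((uint32_t)src_off + len > SKETCH_PAGE_SIZE ||
        (uint32_t)dst_off + len > SKETCH_PAGE_SIZE)
        return COPY_BAD_ARG;
    memcpy(dst_page + dst_off, src_page + src_off, len);
    return COPY_OK;
}
```

The casts to `uint32_t` keep `offset + len` from wrapping around, since both operands are 16-bit.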