xen grant table机制分析

grant table是xen基于共享内存的,在不同domain之间进行通信的一种机制,grant table需要domain和xen共同配合才能进行

 * Xen's grant tables provide a generic mechanism to memory sharing
 * between domains. This shared memory interface underpins the split
 * device drivers for block and network IO.
 *
 * Each domain has its own grant table. This is a data structure that
 * is shared with Xen; it allows the domain to tell Xen what kind of
 * permissions other domains have on its pages. Entries in the grant
 * table are identified by grant references. A grant reference is an
 * integer, which indexes into the grant table. It acts as a
 * capability which the grantee can use to perform operations on the
 * granter’s memory.
 *
 * This capability-based system allows shared-memory communications
 * between unprivileged domains. A grant reference also encapsulates
 * the details of a shared page, removing the need for a domain to
 * know the real machine address of a page it is sharing. This makes
 * it possible to share memory correctly with domains running in
 * fully virtualised memory.

先来看domain中对grant table的操作

include/xen/interface/grant_table.h 中对grant table的操作注释

/* Some rough guidelines on accessing and updating grant-table entries
 * in a concurrency-safe manner. For more information, Linux contains a
 * reference implementation for guest OSes (arch/xen/kernel/grant_table.c).
 *
 * NB. WMB is a no-op on current-generation x86 processors. However, a
 *     compiler barrier will still be required.
 *
 * Introducing a valid entry into the grant table:
 *  1. Write ent->domid.
 *  2. Write ent->frame:
 *      GTF_permit_access:   Frame to which access is permitted.
 *      GTF_accept_transfer: Pseudo-phys frame slot being filled by new
 *                           frame, or zero if none.
 *  3. Write memory barrier (WMB).
 *  4. Write ent->flags, inc. valid type.
 *
 * Invalidating an unused GTF_permit_access entry:
 *  1. flags = ent->flags.
 *  2. Observe that !(flags & (GTF_reading|GTF_writing)).
 *  3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
 *  NB. No need for WMB as reuse of entry is control-dependent on success of
 *      step 3, and all architectures guarantee ordering of ctrl-dep writes.
 *
 * Invalidating an in-use GTF_permit_access entry:
 *  This cannot be done directly. Request assistance from the domain controller
 *  which can set a timeout on the use of a grant entry and take necessary
 *  action. (NB. This is not yet implemented!).
 *
 * Invalidating an unused GTF_accept_transfer entry:
 *  1. flags = ent->flags.
 *  2. Observe that !(flags & GTF_transfer_committed). [*]
 *  3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
 *  NB. No need for WMB as reuse of entry is control-dependent on success of
 *      step 3, and all architectures guarantee ordering of ctrl-dep writes.
 *  [*] If GTF_transfer_committed is set then the grant entry is 'committed'.
 *      The guest must /not/ modify the grant entry until the address of the
 *      transferred frame is written. It is safe for the guest to spin waiting
 *      for this to occur (detect by observing GTF_transfer_completed in
 *      ent->flags).
 *
 * Invalidating a committed GTF_accept_transfer entry:
 *  1. Wait for (ent->flags & GTF_transfer_completed).
 *
 * Changing a GTF_permit_access from writable to read-only:
 *  Use SMP-safe CMPXCHG to set GTF_readonly, while checking !GTF_writing.
 *
 * Changing a GTF_permit_access from read-only to writable:
 *  Use SMP-safe bit-setting instruction.
 */

grant_entry是一个结构体,代表某个page的共享信息,我们只分析v1版本的grant_entry结构体。domain的grant table由多个grant entry的数组组成,每个grant entry在数组中的索引用一个uint32_t来表示,作为一个grant reference,又称为GR

/*
 * Reference to a grant entry in a specified domain's grant table.
 */
typedef uint32_t grant_ref_t;

/*
 * A grant table comprises a packed array of grant entries in one or more
 * page frames shared between Xen and a guest.
 * [XEN]: This field is written by Xen and read by the sharing guest.
 * [GST]: This field is written by the guest and read by Xen.
 */

/*
 * Version 1 of the grant table entry structure is maintained purely
 * for backwards compatibility.  New guests should use version 2.
 *
struct grant_entry_v1 {
    /* GTF_xxx: various type and flag information.  [XEN,GST] */
    uint16_t flags;
    /* The domain being granted foreign privileges. [GST] */
    domid_t  domid;
    /*
     * GTF_permit_access: Frame that @domid is allowed to map and access. [GST]
     * GTF_accept_transfer: Frame whose ownership transferred by @domid. [XEN]
     */
    uint32_t frame;
};
grant_entry中的flags记录了grant entry的类型,最常用的是GTF_permit_access, GTP_accept_transfer两种:GTF_permit_access由共享page的domain指定授权给哪个domain(domid)来访问,包括读和写,以及访问哪个page frame(frame)。GTF_accept_transfer表示domid接收其他domain转移给自己的page。

grant_entry的flags还记录着当前grant entry的状态,e.g.

/*
 * Subflags for GTF_permit_access.
 *  GTF_readonly: Restrict @domid to read-only mappings and accesses. [GST]
 *  GTF_reading: Grant entry is currently mapped for reading by @domid. [XEN]
 *  GTF_writing: Grant entry is currently mapped for writing by @domid. [XEN]
 *  GTF_sub_page: Grant access to only a subrange of the page.  @domid
 *                will only be allowed to copy from the grant, and not
 *                map it. [GST]
 */
#define _GTF_readonly       (2)
#define GTF_readonly        (1U<<_GTF_readonly)
#define _GTF_reading        (3)
#define GTF_reading         (1U<<_GTF_reading)
#define _GTF_writing        (4)
#define GTF_writing         (1U<<_GTF_writing)
#define _GTF_sub_page       (8)
#define GTF_sub_page        (1U<<_GTF_sub_page)

/*
 * Subflags for GTF_accept_transfer:
 *  GTF_transfer_committed: Xen sets this flag to indicate that it is committed
 *      to transferring ownership of a page frame. When a guest sees this flag
 *      it must /not/ modify the grant entry until GTF_transfer_completed is
 *      set by Xen.
 *  GTF_transfer_completed: It is safe for the guest to spin-wait on this flag
 *      after reading GTF_transfer_committed. Xen will always write the frame
 *      address, followed by ORing this flag, in a timely manner.
 */
#define _GTF_transfer_committed (2)
#define GTF_transfer_committed  (1U<<_GTF_transfer_committed)
#define _GTF_transfer_completed (3)
#define GTF_transfer_completed  (1U<<_GTF_transfer_completed)

xen中定义了结构体grant_table,用来保存每个domain内部的grant table表,对于映射类型的grant entry,xen中用一个active_grant_entry来跟踪映射的变化,domain内部是没有这个grant_table结构体的,通过映射xen的内存页得到自己的grant table

/* Per-domain grant information. */
struct grant_table {
    /* Table size. Number of frames shared with guest */
    unsigned int          nr_grant_frames;
    /* Shared grant table (see include/public/grant_table.h). */
    union {
        void **shared_raw;
        struct grant_entry_v1 **shared_v1;
        union grant_entry_v2 **shared_v2;
    };
    /* Number of grant status frames shared with guest (for version 2) */
    unsigned int          nr_status_frames;
    /* State grant table (see include/public/grant_table.h). */
    grant_status_t       **status;
    /* Active grant table. */
    struct active_grant_entry **active;
    /* Mapping tracking table. */
    struct grant_mapping **maptrack;
    unsigned int          maptrack_head;
    unsigned int          maptrack_limit;
    /* Lock protecting updates to active and shared grant tables. */
    spinlock_t            lock;
    /* The defined versions are 1 and 2.  Set to 0 if we don't know
       what version to use yet. */
    unsigned              gt_version;
};
 /* Count of writable host-CPU mappings. */
#define GNTPIN_hstw_shift    (0)
#define GNTPIN_hstw_inc      (1 << GNTPIN_hstw_shift)
#define GNTPIN_hstw_mask     (0xFFU << GNTPIN_hstw_shift)
 /* Count of read-only host-CPU mappings. */
#define GNTPIN_hstr_shift    (8)
#define GNTPIN_hstr_inc      (1 << GNTPIN_hstr_shift)
#define GNTPIN_hstr_mask     (0xFFU << GNTPIN_hstr_shift)
 /* Count of writable device-bus mappings. */
#define GNTPIN_devw_shift    (16)
#define GNTPIN_devw_inc      (1 << GNTPIN_devw_shift)
#define GNTPIN_devw_mask     (0xFFU << GNTPIN_devw_shift)
 /* Count of read-only device-bus mappings. */
#define GNTPIN_devr_shift    (24)
#define GNTPIN_devr_inc      (1 << GNTPIN_devr_shift)
#define GNTPIN_devr_mask     (0xFFU << GNTPIN_devr_shift)
/* Active grant entry - used for shadowing GTF_permit_access grants. */
struct active_grant_entry {
    u32           pin;    /* Reference count information.             */
    domid_t       domid;  /* Domain being granted access.             */
    struct domain *trans_domain;
    uint32_t      trans_gref;
    unsigned long frame;  /* Frame being granted.                     */
    unsigned long gfn;    /* Guest's idea of the frame being granted. */
    unsigned      is_sub_page:1; /* True if this is a sub-page grant. */
    unsigned      start:15; /* For sub-page grants, the start offset
                               in the page.                           */
    unsigned      length:16; /* For sub-page grants, the length of the
                                grant.                                */
};
/*
 * Tracks a mapping of another domain's grant reference. Each domain has a
 * table of these, indexes into which are returned as a 'mapping handle'.
 */
struct grant_mapping {
    u32      ref;           /* grant ref */
    u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
    domid_t  domid;         /* granting domain */
};

xen通过do_grant_table_op来执行grant table相关的hypercall,我们重点关注如下几个操作:GNTTABOP_map_grant_ref, GNTTABOP_unmap_grant_ref, GNTTABOP_transfer, GNTTABOP_copy

GNTTABOP_map_grant_ref和GNTTABOP_unmap_grant_ref用来映射/撤销映射一个GR

/*
 * GNTTABOP_map_grant_ref: Map the grant entry (<dom>,<ref>) for access
 * by devices and/or host CPUs. If successful, <handle> is a tracking number
 * that must be presented later to destroy the mapping(s). On error, <handle>
 * is a negative status code.
 * NOTES:
 *  1. If GNTMAP_device_map is specified then <dev_bus_addr> is the address
 *     via which I/O devices may access the granted frame.
 *  2. If GNTMAP_host_map is specified then a mapping will be added at
 *     either a host virtual address in the current address space, or at
 *     a PTE at the specified machine address.  The type of mapping to
 *     perform is selected through the GNTMAP_contains_pte flag, and the
 *     address is specified in <host_addr>.
 *  3. Mappings should only be destroyed via GNTTABOP_unmap_grant_ref. If a
 *     host mapping is destroyed by other means then it is *NOT* guaranteed
 *     to be accounted to the correct grant reference!
 */
struct gnttab_map_grant_ref {
    /* IN parameters. */
    uint64_t host_addr;
    uint32_t flags;               /* GNTMAP_* */
    grant_ref_t ref;
    domid_t  dom;                 /* remote domain */
    /* OUT parameters. */
    int16_t  status;              /* => enum grant_status */
    grant_handle_t handle;
    uint64_t dev_bus_addr;
};
typedef struct gnttab_map_grant_ref gnttab_map_grant_ref_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_map_grant_ref_t);

其中flags有两个维度的定义,GNTMAP_device_map, GNTMAP_host_map用来表示这种映射是用于IO操作,e.g. mmio, dma这种,还是一般的内存操作。GNTMAP_application_map用于表示被映射的page是否可以由目标domain的用户态程序访问,GNTMAP_contains_pte表明被映射的page包含源domain的页表

我们来看gnttab_map_grant_ref的实现

static long
gnttab_map_grant_ref(
    XEN_GUEST_HANDLE_PARAM(gnttab_map_grant_ref_t) uop, unsigned int count)
{
    int i;
    struct gnttab_map_grant_ref op;


    for ( i = 0; i < count; i++ )
    {
        if (i && hypercall_preempt_check())
            return i;
        if ( unlikely(__copy_from_guest_offset(&op, uop, i, 1)) )
            return -EFAULT;
        __gnttab_map_grant_ref(&op);
        if ( unlikely(__copy_to_guest_offset(uop, i, &op, 1)) )
            return -EFAULT;
    }


    return 0;
}
其中__copy_from_guest_offset和__copy_to_guest_offset宏用来把参数从guest拷贝到xen以及从xen拷贝回guest,在gnttab_map_grant_ref的实现中,guest传递了一组共count个数的gnttab_map_grant_ref,每次通过传递offset依次拷贝一个gnttab_map_grant_ref

#define __copy_to_guest_offset(hnd, off, ptr, nr) ({    \
    const typeof(*(ptr)) *_s = (ptr);                   \
    char (*_d)[sizeof(*_s)] = (void *)(hnd).p;          \
    ((void)((hnd).p == (ptr)));                         \
    __raw_copy_to_guest(_d+(off), _s, sizeof(*_s)*(nr));\
})

#define __copy_from_guest_offset(ptr, hnd, off, nr) ({  \
    const typeof(*(ptr)) *_s = (hnd).p;                 \
    typeof(*(ptr)) *_d = (ptr);                         \
    __raw_copy_from_guest(_d, _s+(off), sizeof(*_d)*(nr));\
})
XEN_GUEST_HANDLE_PARAM宏是引入用来区分guest传递给xen的指针,用于hypercall参数的指针用XEN_GUEST_HANDLE_PARAM宏封装,否则用XEN_GUEST_HANDLE封装,请参考 http://lists.xen.org/archives/html/xen-devel/2012-08/msg01324.html

在include/public/arch-x86/xen.h里有关于XEN_GUEST_HANDLE, XEN_GUEST_HANDLE_PARAM的宏定义,在x86架构下两者没有区别

#define ___DEFINE_XEN_GUEST_HANDLE(name, type) \
    typedef struct { type *p; } __guest_handle_ ## name
/*
 * XEN_GUEST_HANDLE represents a guest pointer, when passed as a field
 * in a struct in memory.
 * XEN_GUEST_HANDLE_PARAM represent a guest pointer, when passed as an
 * hypercall argument.
 * XEN_GUEST_HANDLE_PARAM and XEN_GUEST_HANDLE are the same on X86 but
 * they might not be on other architectures.
 */
#define __DEFINE_XEN_GUEST_HANDLE(name, type) \
    ___DEFINE_XEN_GUEST_HANDLE(name, type);   \
    ___DEFINE_XEN_GUEST_HANDLE(const_##name, const type)
#define DEFINE_XEN_GUEST_HANDLE(name)   __DEFINE_XEN_GUEST_HANDLE(name, name)
#define __XEN_GUEST_HANDLE(name)        __guest_handle_ ## name
#define XEN_GUEST_HANDLE(name)          __XEN_GUEST_HANDLE(name)
#define XEN_GUEST_HANDLE_PARAM(name)    XEN_GUEST_HANDLE(name)
那么XEN_GUEST_HANDLE_PARAM(gnttab_map_grant_ref_t)实际指向的是结构体__guest_handle_gnttab_map_grant_ref_t,定义为

typedef struct { gnttab_map_grant_ref_t* p } __guest_handle_gnttab_map_grant_ref_t
typedef struct { gnttab_map_grant_ref_t* p } __guest_handle_const_gnttab_map_grant_ref_t

最终映射通过__gnttab_map_grant_ref完成,该函数后续分析


GNTTABOP_unmap_grant_ref则用于撤销之前创建的map,注意撤销之后需要有个flush TLB的动作,通过调用flush_tlb_mask来完成

/*
 * GNTTABOP_unmap_grant_ref: Destroy one or more grant-reference mappings
 * tracked by <handle>. If <host_addr> or <dev_bus_addr> is zero, that
 * field is ignored. If non-zero, they must refer to a device/host mapping
 * that is tracked by <handle>
 * NOTES:
 *  1. The call may fail in an undefined manner if either mapping is not
 *     tracked by <handle>.
 *  3. After executing a batch of unmaps, it is guaranteed that no stale
 *     mappings will remain in the device or host TLBs.
 */
struct gnttab_unmap_grant_ref {
    /* IN parameters. */
    uint64_t host_addr;
    uint64_t dev_bus_addr;
    grant_handle_t handle;
    /* OUT parameters. */
    int16_t  status;              /* => enum grant_status */
};
typedef struct gnttab_unmap_grant_ref gnttab_unmap_grant_ref_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_unmap_grant_ref_t);

static long
gnttab_unmap_grant_ref(
    XEN_GUEST_HANDLE_PARAM(gnttab_unmap_grant_ref_t) uop, unsigned int count)
{
    int i, c, partial_done, done = 0;
    struct gnttab_unmap_grant_ref op;
    struct gnttab_unmap_common common[GNTTAB_UNMAP_BATCH_SIZE];

    while ( count != 0 )
    {
        c = min(count, (unsigned int)GNTTAB_UNMAP_BATCH_SIZE);
        partial_done = 0;

        for ( i = 0; i < c; i++ )
        {
            if ( unlikely(__copy_from_guest(&op, uop, 1)) )
                goto fault;
            __gnttab_unmap_grant_ref(&op, &(common[i]));
            ++partial_done;
            if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
                goto fault;
            guest_handle_add_offset(uop, 1);
        }

        flush_tlb_mask(current->domain->domain_dirty_cpumask);

        for ( i = 0; i < partial_done; i++ )
            __gnttab_unmap_common_complete(&(common[i]));

        count -= c;
        done += c;

        if (count && hypercall_preempt_check())
            return done;
    }
    return 0;

fault:
    flush_tlb_mask(current->domain->domain_dirty_cpumask);

    for ( i = 0; i < partial_done; i++ )
        __gnttab_unmap_common_complete(&(common[i]));
    return -EFAULT;
}

GNTTABOP_transfer_grant_ref用于把page从源domain传递给目标domain,和map/unmap不同的是,transfer之后,源domain就永远丧失这个page了。首先由目标domain发起一个GR,该GR的flag包含GTF_accept_transfer,domid为源domain,该GR表明目标domain已经同意接收源domain的page transfer了。之后源domain通过gnttab_transfer开始传递

/*
 * GNTTABOP_transfer_grant_ref: Transfer <frame> to a foreign domain. The
 * foreign domain has previously registered its interest in the transfer via
 * <domid, ref>.
 *
 * Note that, even if the transfer fails, the specified page no longer belongs
 * to the calling domain *unless* the error is GNTST_bad_page.
 */
struct gnttab_transfer {
    /* IN parameters. */
    xen_pfn_t     mfn;
    domid_t       domid;
    grant_ref_t   ref;
    /* OUT parameters. */
    int16_t       status;
};
typedef struct gnttab_transfer gnttab_transfer_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_transfer_t);

GNTTABOP_copy用于把源domain的内存内容拷贝到目标domain中,显而易见的是xen很适合做这类操作因为hypervisor能看到所有domain的内存分布,同时这种操作不用刷新TLB因此代价不一定比map更高(一个是cpu内存总线的锁,一个是cpu TLB cache的刷,很难说谁的代价更高,在intel SNB下有NUMA的支持,cpu和内存之间的延迟更低,同步开销更小,笔者觉得copy的代价甚至还要低于map)

/*
 * GNTTABOP_copy: Hypervisor based copy
 * source and destinations can be eithers MFNs or, for foreign domains,
 * grant references. the foreign domain has to grant read/write access
 * in its grant table.
 *
 * The flags specify what type source and destinations are (either MFN
 * or grant reference).
 *
 * Note that this can also be used to copy data between two domains
 * via a third party if the source and destination domains had previously
 * grant appropriate access to their pages to the third party.
 *
 * source_offset specifies an offset in the source frame, dest_offset
 * the offset in the target frame and  len specifies the number of
 * bytes to be copied.
 */

#define _GNTCOPY_source_gref      (0)
#define GNTCOPY_source_gref       (1<<_GNTCOPY_source_gref)
#define _GNTCOPY_dest_gref        (1)
#define GNTCOPY_dest_gref         (1<<_GNTCOPY_dest_gref)

struct gnttab_copy {
    /* IN parameters. */
    struct {
        union {
            grant_ref_t ref;
            xen_pfn_t   gmfn;
        } u;
        domid_t  domid;
        uint16_t offset;
    } source, dest;
    uint16_t      len;
    uint16_t      flags;          /* GNTCOPY_* */
    /* OUT parameters. */
    int16_t       status;
};
typedef struct gnttab_copy  gnttab_copy_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_copy_t);

gnttab_copy调用了__gnttab_copy,最终是通过memcpy来完成整个内容的拷贝的,后续详细分析该函数



评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值