NOTE: Buffer creation, reading and writing are currently implemented by driver-specific ioctls, so functions like "drm_gem_create" do not appear in the kernel sources.
That is also why DRM provides "dumb buffer" ioctls (dumb_create, dumb_destroy, ...).
The Graphics Execution Manager
Part of the Direct Rendering Manager
==============================
Keith Packard <keithp@keithp.com>
Eric Anholt <eric@anholt.net>
2008-5-9
Contents:
1. GEM Overview
2. API overview and conventions
3. Object Creation/Destruction
4. Reading/writing contents
5. Mapping objects to userspace
6. Memory Domains
7. Execution (Intel specific)
8. Other misc Intel-specific functions
1. Graphics Execution Manager Overview
Gem is designed to manage graphics memory, control access to the graphics
device execution context and handle the essentially NUMA environment unique
to modern graphics hardware. Gem allows multiple applications to share
graphics device resources without the need to constantly reload the entire
graphics card. Data may be shared between multiple applications with gem
ensuring that the correct memory synchronization occurs.
Graphics data can consume arbitrary amounts of memory, with 3D applications
constructing ever larger sets of textures and vertices. With graphics cards
memory space growing larger every year, and graphics APIs growing more
complex, we can no longer insist that each application save a complete copy
of their graphics state so that the card can be re-initialized from user
space at each context switch. Ensuring that graphics data remains persistent
across context switches allows applications significant new functionality
while also improving performance for existing APIs.
Modern linux desktops include significant 3D rendering as a fundamental
component of the desktop image construction process. 2D and 3D applications
paint their content to offscreen storage and the central 'compositing
manager' constructs the final screen image from those window contents. This
means that pixel image data from these applications must move within reach
of the compositing manager and be used as source operands for screen image
rendering operations.
Gem provides simple mechanisms to manage graphics data and control execution
flow within the linux operating system. Using many existing kernel
subsystems, it does this with a modest amount of code.
2. API Overview and Conventions
All APIs here are defined in terms of ioctls applied to the DRM file
descriptor. To create and manipulate objects, an application must be
'authorized' using the DRI or DRI2 protocols with the X server. To relax
that, we will need to implement some better access control mechanisms within
the hardware portion of the driver to prevent inappropriate
cross-application data access.
Any DRM driver which does not support GEM will return -ENODEV for all of
these ioctls. Invalid object handles return -EINVAL. Invalid object names
return -ENOENT. Other errors are as documented in the specific API below.
To avoid the need to translate ioctl contents on mixed-size systems (with
32-bit user space running on a 64-bit kernel), the ioctl data structures
contain explicitly sized objects, using 64 bits for all size and pointer
data and 32 bits for identifiers. In addition, the 64-bit objects are all
carefully aligned on 64-bit boundaries. Because of this, all pointers in the
ioctl data structures are passed as uint64_t values. Suitable casts will
be necessary.
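The casts mentioned above might look like the following minimal sketch. The structure and helper names are hypothetical and exist only for illustration; the point is the fixed layout (no implicit padding, 64-bit fields on 64-bit boundaries) and the round trip of a pointer through a uint64_t field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl argument block laid out as the text describes:
 * 64-bit fields for sizes and pointers, 32-bit fields for identifiers. */
struct example_gem_args {
    uint32_t handle;    /* 32-bit identifier */
    uint32_t pad;       /* keeps the next field on a 64-bit boundary */
    uint64_t size;      /* 64-bit size */
    uint64_t data_ptr;  /* user pointer carried as an integer */
};

/* The layout should be identical for 32-bit and 64-bit user space. */
_Static_assert(sizeof(struct example_gem_args) == 24, "unexpected padding");
_Static_assert(offsetof(struct example_gem_args, size) % 8 == 0,
               "size is not 64-bit aligned");
_Static_assert(offsetof(struct example_gem_args, data_ptr) % 8 == 0,
               "data_ptr is not 64-bit aligned");

/* Widen a user pointer into the uint64_t slot of an ioctl structure. */
static inline uint64_t example_ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Narrow the stored value back to a pointer. */
static inline void *example_u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}
```

Going through uintptr_t first avoids compiler warnings about casting between a pointer and an integer of a different size on 32-bit builds.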
One significant operation which is explicitly left out of this API is object
locking. Applications are expected to perform locking of shared objects
outside of the GEM api. This kind of locking is not necessary to safely
manipulate the graphics engine, and with multiple objects interacting in
unknown ways, per-object locking would likely introduce all kinds of
lock-order issues. Punting this to the application seems like the only
sensible plan. Given that DRM already offers a global lock on the hardware,
this doesn't change the current situation.
3. Object Creation and Destruction
Gem provides explicit memory management primitives. System pages are
allocated when the object is created, either as the fundamental storage for
hardware where system memory is used by the graphics processor directly, or
as backing store for graphics-processor resident memory.
Objects are referenced from user space using handles. These are, for all
intents and purposes, equivalent to file descriptors. We could simply use
file descriptors were it not for the small limit (1024) of file descriptors
available to applications, and for the fact that the X server (a rather
significant user of this API) uses 'select' and has a limited maximum file
descriptor for that operation. Given the ability to allocate more file
descriptors, and given the ability to place these 'higher' in the file
descriptor space, we'd love to simply use file descriptors.
Objects may be published with a name so that other applications can access
them. The name remains valid as long as the object exists. Right now, our
DRI APIs use 32-bit integer names, so that's what we expose here.
A. Creation
struct drm_gem_create {
    /**
     * Requested size for the object.
     *
     * The (page-aligned) allocated size for the object
     * will be returned.
     */
    uint64_t size;
    /**
     * Returned handle for the object.
     *
     * Object handles are nonzero.
     */
    uint32_t handle;
    uint32_t pad;
};

/* usage */
create.size = 16384;
ret = ioctl (fd, DRM_IOCTL_GEM_CREATE, &create);
if (ret == 0)
    return create.handle;
Note that the size is rounded up to a page boundary, and that
the rounded-up size is returned in 'size'. No name is assigned to
this object, making it local to this process.
If insufficient memory is available, -ENOMEM will be returned.
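The rounding described above can be sketched as follows. The 4096-byte page size is an assumption made for the example (real code would query it, for instance with sysconf), and example_page_align is a hypothetical helper, not part of the API.

```c
#include <stdint.h>

/* Assumed page size for the example; it must be a power of two. */
#define EXAMPLE_PAGE_SIZE 4096ULL

/* Round a requested size up to the next page boundary.
 * Sizes within one page of UINT64_MAX would wrap; the sketch ignores that. */
static uint64_t example_page_align(uint64_t size)
{
    return (size + EXAMPLE_PAGE_SIZE - 1) & ~(EXAMPLE_PAGE_SIZE - 1);
}
```

With these assumptions a request for 16384 bytes is already aligned and comes back unchanged, while a request for 16385 bytes would be reported as 20480.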
B. Closing
struct drm_gem_close {
    /** Handle of the object to be closed. */
    uint32_t handle;
    uint32_t pad;
};

/* usage */
close.handle = <handle>;
ret = ioctl (fd, DRM_IOCTL_GEM_CLOSE, &close);

This call makes the specified handle invalid, and if no other
applications are using the object, any necessary graphics hardware
synchronization is performed and the resources used by the object
released.
C. Naming
struct drm_gem_flink {
    /** Handle for the object being named */
    uint32_t handle;
    /** Returned global name */
    uint32_t name;
};

/* usage */
flink.handle = <handle>;
ret = ioctl (fd, DRM_IOCTL_GEM_FLINK, &flink);
if (ret == 0)
    return flink.name;

Flink creates a name for the object and returns it to the
application. This name can be used by other applications to gain
access to the same object.
D. Opening by name
struct drm_gem_open {
    /** Name of object being opened */
    uint32_t name;
    /** Returned handle for the object */
    uint32_t handle;
    /** Returned size of the object */
    uint64_t size;
};

/* usage */
open.name = <name>;
ret = ioctl (fd, DRM_IOCTL_GEM_OPEN, &open);
if (ret == 0) {
    *sizep = open.size;
    return open.handle;
}

Open accesses an existing object and returns a handle for it. If the
object doesn't exist, -ENOENT is returned. The size of the object is
also returned. This handle has all the same capabilities as the
handle used to create the object. In particular, the object is not
destroyed until all handles are closed.
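The lifetime rule for handles and names can be illustrated with a toy model. Nothing below is kernel code: the toy_* names are hypothetical, a single counter stands in for all handles held by all processes, and -1 stands in for -ENOENT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a gem object; not a kernel structure. */
struct toy_object {
    uint32_t handle_count;  /* open handles across all processes */
    uint32_t name;          /* global name, 0 until flink is called */
    bool     destroyed;
};

/* Model of create: the creator holds the first handle, no name yet. */
static void toy_create(struct toy_object *obj)
{
    obj->handle_count = 1;
    obj->name = 0;
    obj->destroyed = false;
}

/* Model of flink: publish a name without adding a handle. */
static uint32_t toy_flink(struct toy_object *obj, uint32_t next_name)
{
    if (obj->name == 0)
        obj->name = next_name;
    return obj->name;
}

/* Model of open by name: 0 on success, -1 for an unknown name. */
static int toy_open(struct toy_object *obj, uint32_t name)
{
    if (obj->destroyed || obj->name == 0 || obj->name != name)
        return -1;
    obj->handle_count++;
    return 0;
}

/* Model of close: only the last close destroys the object. */
static void toy_close(struct toy_object *obj)
{
    if (obj->handle_count > 0 && --obj->handle_count == 0)
        obj->destroyed = true;
}
```

In this model an object that was created, named and opened once survives the first close and is destroyed by the second, after which its name no longer resolves.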
4. Basic read/write operations
By default, gem objects are not mapped to the application's address space;
getting data in and out of them is done with I/O operations instead. This
allows the data to reside in otherwise unmapped pages, including pages in
video memory on an attached discrete graphics card. In addition, using
explicit I/O operations allows better control over cache contents: as
graphics devices are generally not cache coherent with the CPU, mapping
pages used for graphics into an application address space requires the use
of expensive cache flushing operations. Providing direct control over
graphics data access ensures that data are handled in the most efficient
possible fashion.
A. Reading
struct drm_gem_pread {
    /** Handle for the object being read. */
    uint32_t handle;
    uint32_t pad;
    /** Offset into the object to read from */
    uint64_t offset;
    /** Length of data to read */
    uint64_t size;
    /** Pointer to write the data into. */
    uint64_t data_ptr;    /* void * */
};

This copies data out of the specified object, starting at the
specified position, into the waiting user memory. Device
synchronization will be handled by the kernel to ensure user space
sees a consistent view of the graphics device.
B. Writing
struct drm_gem_pwrite {
    /** Handle for the object being written to. */
    uint32_t handle;
    uint32_t pad;
    /** Offset into the object to write to */
    uint64_t offset;
    /** Length of data to write */
    uint64_t size;
    /** Pointer to read the data from. */
    uint64_t data_ptr;    /* void * */
};

This copies data from user memory into the specified object at the
specified position. Again, any necessary graphics device
synchronization and flushing will be done automatically.
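The direction of the two copies can be sketched with a toy object backed by ordinary memory. The toy_* names are hypothetical, and the range check is an assumption about what an implementation would plausibly do; this document does not specify how out-of-range requests are reported, so -1 is used as a placeholder.

```c
#include <stdint.h>
#include <string.h>

/* Toy object backed by ordinary memory; a stand-in for a gem object. */
struct toy_bo {
    uint8_t *pages;
    uint64_t size;
};

/* Overflow-safe check that [offset, offset + len) lies inside the object. */
static int toy_range_ok(const struct toy_bo *bo, uint64_t offset, uint64_t len)
{
    return offset <= bo->size && len <= bo->size - offset;
}

/* pread direction: object contents -> user memory. Returns 0 or -1. */
static int toy_pread(const struct toy_bo *bo, uint64_t offset, uint64_t len,
                     void *user_dst)
{
    if (!toy_range_ok(bo, offset, len))
        return -1;
    memcpy(user_dst, bo->pages + offset, (size_t)len);
    return 0;
}

/* pwrite direction: user memory -> object contents. Returns 0 or -1. */
static int toy_pwrite(struct toy_bo *bo, uint64_t offset, uint64_t len,
                      const void *user_src)
{
    if (!toy_range_ok(bo, offset, len))
        return -1;
    memcpy(bo->pages + offset, user_src, (size_t)len);
    return 0;
}
```

The real ioctls additionally perform the device synchronization described above, which a memcpy on plain memory does not model.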
5. Mapping objects to user space
For most objects, reading/writing is the preferred interaction mode.
However, when the CPU is involved in rendering to cover deficiencies in
hardware support for particular operations, the CPU will want to directly
access the relevant objects.

Because mmap is fairly heavyweight, we allow applications to retain maps to
objects persistently and then update how they're using the memory through a
separate interface. Applications which fail to use this separate interface
may exhibit unpredictable behaviour as memory consistency will not be
preserved.
A. Mapping
struct drm_gem_mmap {
    /** Handle for the object being mapped. */
    uint32_t handle;
    uint32_t pad;
    /** Offset in the object to map. */
    uint64_t offset;
    /**
     * Length of data to map.
     *
     * The value will be page-aligned.
     */
    uint64_t size;
    /** Returned pointer the data was mapped at */
    uint64_t addr_ptr;    /* void * */
};

/* usage */
mmap.handle = <handle>;
mmap.offset = <offset>;
mmap.size = <size>;
ret = ioctl (fd, DRM_IOCTL_GEM_MMAP, &mmap);
if (ret == 0)
    return (void *) (uintptr_t) mmap.addr_ptr;
B. Unmapping
munmap (addr, length);
Nothing strange here, just use the normal munmap syscall.
6. Memory Domains
Graphics devices remain a strong bastion of non cache-coherent memory. As a
result, accessing data through one functional unit will end up loading that
cache with data which then needs to be manually synchronized when that data
is used with another functional unit.

Tracking where data are resident is done by identifying how functional units
deal with caches. Each cache is labeled as a separate memory domain. Then,
each sequence of operations is expected to load data into various read
domains and leave data in at most one write domain. Gem tracks the read and
write memory domains of each object and performs the necessary
synchronization operations when objects move from one domain set to another.
For example, if operation 'A' constructs an image that is immediately used
by operation 'B', then when the read domain for 'B' is not the same as the
write domain for 'A', the write domain must be flushed, and the read
domain invalidated. If these two operations are both executed in the same
command queue, then the flush operation can go in between them in the same
queue, avoiding any kind of CPU-based synchronization and leaving the GPU to
do the work itself.
6.1 Memory Domains (GPU-independent)
* DRM_GEM_DOMAIN_CPU.
Objects in this domain are using caches which are connected to the CPU.
Moving objects from non-CPU domains into the CPU domain can involve waiting
for the GPU to finish with operations using this object. Moving objects
from this domain to a GPU domain can involve flushing CPU caches and chipset
buffers.
6.2 GPU-independent memory domain ioctl
This ioctl is independent of the GPU in use. So far, no use other than
synchronizing objects to the CPU domain has been found; if that turns out
to be generally true, this ioctl may be simplified further.
A. Explicit domain control
struct drm_gem_set_domain {
    /** Handle for the object */
    uint32_t handle;
    /** New read domains */
    uint32_t read_domains;
    /** New write domain */
    uint32_t write_domain;
};

/* usage */
set_domain.handle = <handle>;
set_domain.read_domains = <read_domains>;
set_domain.write_domain = <write_domain>;
ret = ioctl (fd, DRM_IOCTL_GEM_SET_DOMAIN, &set_domain);
When the application wants to explicitly manage memory domains for
an object, it can use this function. Usually, this is only used
when the application wants to synchronize object contents between
the GPU and CPU-based application rendering. In that case,
the <read_domains> would be set to DRM_GEM_DOMAIN_CPU, and if the
application were going to write to the object, the <write_domain>
would also be set to DRM_GEM_DOMAIN_CPU. After the call, gem
guarantees that all previous rendering operations involving this
object are complete. The application is then free to access the
object through the address returned by the mmap call. Afterwards,
when the application again uses the object through the GPU, any
necessary CPU flushing will occur and the object will be correctly
synchronized with the GPU.

Note that this synchronization is not required for any accesses
going through the driver itself. The pread, pwrite and execbuffer
ioctls all perform the necessary domain management internally.
Explicit synchronization is only necessary when accessing the object
through the mmap'd address.
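As a small sketch of the CPU-access case described above, the helper below fills in a request before the application touches an mmap'd pointer. The structure is a local copy of the one shown earlier, the helper name is hypothetical, and the bit value chosen for the CPU domain is a placeholder because this document names the domain without giving its value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Placeholder bit value for DRM_GEM_DOMAIN_CPU (an assumption). */
#define EXAMPLE_DOMAIN_CPU 0x1u

/* Local copy of the set_domain structure for the sketch. */
struct example_set_domain {
    uint32_t handle;
    uint32_t read_domains;
    uint32_t write_domain;
};

/* Build the request an application would pass to the set_domain ioctl
 * before reading (and optionally writing) through its mapping. */
static struct example_set_domain
example_prepare_cpu_access(uint32_t handle, bool will_write)
{
    struct example_set_domain req;

    memset(&req, 0, sizeof(req));
    req.handle = handle;
    req.read_domains = EXAMPLE_DOMAIN_CPU;
    /* The write domain is only claimed when the CPU will modify the object. */
    req.write_domain = will_write ? EXAMPLE_DOMAIN_CPU : 0;
    return req;
}
```

Leaving write_domain at zero for read-only access means no CPU flush would be needed when the object goes back to the GPU.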
7. Execution (Intel specific)
Managing the command buffers is inherently chip-specific, so the core of gem
doesn't have any intrinsic functions. Rather, execution is left to the
device-specific portions of the driver.

The Intel DRM_I915_GEM_EXECBUFFER ioctl takes a list of gem objects, all of
which are mapped to the graphics device. The last object in the list is the
command buffer.
7.1. Relocations
Command buffers often refer to other objects, and to allow the kernel driver
to move objects around, a sequence of relocations is associated with each
object. Device-specific relocation operations are used to place the
target-object relative value into the object.
The Intel driver has a single relocation type:
struct drm_i915_gem_relocation_entry {
    /**
     * Handle of the buffer being pointed to by this
     * relocation entry.
     *
     * It's appealing to make this be an index into the
     * mm_validate_entry list to refer to the buffer,
     * but this allows the driver to create a relocation
     * list for state buffers and not re-write it per
     * exec using the buffer.
     */
    uint32_t target_handle;
    /**
     * Value to be added to the offset of the target
     * buffer to make up the relocation entry.
     */
    uint32_t delta;
    /**
     * Offset in the buffer the relocation entry will be
     * written into
     */
    uint64_t offset;
    /**
     * Offset value of the target buffer that the
     * relocation entry was last written as.
     *
     * If the buffer has the same offset as last time, we
     * can skip syncing and writing the relocation. This
     * value is written back out by the execbuffer ioctl
     * when the relocation is written.
     */
    uint64_t presumed_offset;
    /**
     * Target memory domains read by this operation.
     */
    uint32_t read_domains;
    /**
     * Target memory domains written by this operation.
     *
     * Note that only one domain may be written by the
     * whole execbuffer operation, so that where there are
     * conflicts, the application will get -EINVAL back.
     */
    uint32_t write_domain;
};
'target_handle' is the handle to the target object. This object must
be one of the objects listed in the execbuffer request or
bad things will happen. The kernel doesn't check for this.

'offset' is where, in the source object, the relocation data
are written. Each relocation value is a 32-bit value consisting
of the location of the target object in the GPU memory space plus
the 'delta' value included in the relocation.

'presumed_offset' is where user-space believes the target object
lies in GPU memory space. If this value matches where the object
actually is, then no relocation data are written; the kernel
assumes that user space has set up data in the source object
using this presumption. This offers a fairly important optimization
as writing relocation data requires mapping of the source object
into the kernel memory space.

'read_domains' and 'write_domain' list the usage by the source
object of the target object. The kernel unions all of the domain
information from all relocations in the execbuffer request. No more
than one write_domain is allowed, otherwise an -EINVAL error is
returned. read_domains must contain write_domain. This domain
information is used to synchronize buffer contents as described
above in the section on domains.
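The relocation rules above can be condensed into a short sketch. The structure is a reduced, hypothetical copy of the relocation entry, both helper names are invented for the example, and -1 stands in for -EINVAL. The check that a second, different write domain is rejected follows the "only one domain may be written by the whole execbuffer operation" comment; treat it as one reading of that rule rather than the driver's exact logic.

```c
#include <stdint.h>

/* Reduced copy of the relocation entry, redeclared for the sketch. */
struct example_reloc {
    uint32_t delta;
    uint64_t presumed_offset;
    uint32_t read_domains;
    uint32_t write_domain;
};

/* Returns 1 and stores the 32-bit relocation value when a write is needed,
 * or 0 when user space already presumed the correct target offset. */
static int example_reloc_value(struct example_reloc *r,
                               uint64_t target_offset, uint32_t *value)
{
    if (r->presumed_offset == target_offset)
        return 0;
    *value = (uint32_t)target_offset + r->delta;
    r->presumed_offset = target_offset;  /* written back for next time */
    return 1;
}

/* Union one relocation's domains into the running totals for the request.
 * Returns 0 on success, -1 on a domain conflict. */
static int example_merge_domains(const struct example_reloc *r,
                                 uint32_t *reads, uint32_t *write)
{
    if (r->write_domain & (r->write_domain - 1))
        return -1;  /* more than one write bit set */
    if ((r->read_domains & r->write_domain) != r->write_domain)
        return -1;  /* read_domains must contain write_domain */
    if (r->write_domain && *write && *write != r->write_domain)
        return -1;  /* a different write domain was already claimed */
    *reads |= r->read_domains;
    *write |= r->write_domain;
    return 0;
}
```

The early return in example_reloc_value is the optimization the text describes: when the presumed offset is right, the source object never has to be mapped into the kernel.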
7.1.1 Memory Domains (Intel specific)
The Intel GPU has several internal caches which are not coherent and hence
require explicit synchronization. Memory domains provide the necessary data
to synchronize what is needed while leaving other cache contents intact.
* DRM_GEM_DOMAIN_I915_RENDER.
The GPU 3D and 2D rendering operations use a unified rendering cache, so
operations doing 3D painting and 2D blts will use this domain.
* DRM_GEM_DOMAIN_I915_SAMPLER.
Textures are loaded by the sampler through a separate cache, so
any texture reading will use this domain. Note that the sampler
and renderer use different caches, so moving an object from render target
to texture source will require a domain transfer.
* DRM_GEM_DOMAIN_I915_COMMAND.
The command buffer doesn't have an explicit cache (although it does
read ahead quite a bit), so this domain just indicates that the object
needs to be flushed to the GPU.
* DRM_GEM_DOMAIN_I915_INSTRUCTION.
All of the programs on Gen4 and later chips use an instruction cache to
speed program execution. It must be explicitly flushed when new programs
are written to memory by the CPU.
* DRM_GEM_DOMAIN_I915_VERTEX.
Vertex data uses two different vertex caches, but they're
both flushed with the same instruction.
7.2 Execution object list (Intel specific)
struct drm_i915_gem_exec_object {
    /**
     * User's handle for a buffer to be bound into the GTT
     * for this operation.
     */
    uint32_t handle;
    /**
     * List of relocations to be performed on this buffer
     */
    uint32_t relocation_count;
    /* struct drm_i915_gem_relocation_entry *relocs */
    uint64_t relocs_ptr;
    /**
     * Required alignment in graphics aperture
     */
    uint64_t alignment;
    /**
     * Returned value of the updated offset of the object,
     * for future presumed_offset writes.
     */
    uint64_t offset;
};
Each object involved in a particular execution operation must be
listed using one of these structures.

'handle' references the object.

'relocs_ptr' is a user-mode pointer to an array of 'relocation_count'
drm_i915_gem_relocation_entry structs (see above) that
define the relocations necessary in this buffer. Note that all
relocations must reference other exec_object structures in the same
execbuffer ioctl and that those other buffers must come earlier
in the exec_object array. In other words, the dependencies mapped by the
exec_object relocations must form a directed acyclic graph.

'alignment' is the byte alignment necessary for this buffer. Each
object has specific alignment requirements; as the kernel doesn't
know what each object is being used for, those requirements must be
provided by user mode. If an object is used in two different ways,
it's quite possible that the alignment requirements will differ.

'offset' is a return value, receiving the location of the object
during this execbuffer operation. The application should use this
as the presumed offset in future operations; if the object does not
move, then the kernel need not write relocation data.
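The ordering requirement on relocation targets can be checked in user space before submitting. The following is a minimal sketch under simplifying assumptions: each exec object is reduced to its handle plus the list of handles its relocations point at, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced exec object: a handle plus the handles its relocations target. */
struct example_exec_object {
    uint32_t        handle;
    const uint32_t *target_handles;
    size_t          target_count;
};

/* Returns 1 when every relocation target appears earlier in the array,
 * which is the ordering the execbuffer ioctl expects; 0 otherwise. */
static int example_order_ok(const struct example_exec_object *objs,
                            size_t count)
{
    for (size_t i = 0; i < count; i++) {
        for (size_t t = 0; t < objs[i].target_count; t++) {
            int found = 0;

            /* Only entries before index i are acceptable targets. */
            for (size_t j = 0; j < i; j++) {
                if (objs[j].handle == objs[i].target_handles[t]) {
                    found = 1;
                    break;
                }
            }
            if (!found)
                return 0;
        }
    }
    return 1;
}
```

The quadratic search is fine for the handful of buffers in a typical request; a larger list would want a lookup table keyed by handle.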
7.3 Execbuffer ioctl (Intel specific)
struct drm_i915_gem_execbuffer {
    /**
     * List of buffers to be validated with their
     * relocations to be performed on them.
     *
     * These buffers must be listed in an order such that
     * all relocations a buffer is performing refer to
     * buffers that have already appeared in the validate
     * list.
     */
    /* struct drm_i915_gem_validate_entry *buffers */
    uint64_t buffers_ptr;
    uint32_t buffer_count;
    /**
     * Offset in the batchbuffer to start execution from.
     */
    uint32_t batch_start_offset;
    /**
     * Bytes used in batchbuffer from batch_start_offset
     */
    uint32_t batch_len;
    uint32_t DR1;
    uint32_t DR4;
    uint32_t num_cliprects;
    uint64_t cliprects_ptr;    /* struct drm_clip_rect *cliprects */
};
'buffers_ptr' is a user-mode pointer to an array of 'buffer_count'
drm_i915_gem_exec_object structures which contain the complete
set of objects required for this execbuffer operation. The last entry
in this array, the 'batch buffer', is the buffer of commands which will
be linked to the ring and executed.

'batch_start_offset' is the byte offset within the batch buffer which
contains the first command to execute. So far, we haven't found a
reason to use anything other than '0' here, but the thought was that
some space might be allocated for additional initialization which
could be skipped in some cases. This must be a multiple of 4.

'batch_len' is the length, in bytes, of the data to be executed
(i.e., the amount of data after batch_start_offset). This must
be a multiple of 4.

'num_cliprects' and 'cliprects_ptr' reference an array of
drm_clip_rect structures that is num_cliprects long. The entire
batch buffer will be executed multiple times, once for each
rectangle in this list. If num_cliprects is 0, then no clipping
rectangle will be set.

'DR1' and 'DR4' are portions of the 3DSTATE_DRAWING_RECTANGLE
command which will be queued when this operation is clipped
(num_cliprects != 0).
DR1 bit definition
    31      Fast Scissor Clip Disable (debug only).
            Disables a hardware optimization that
            improves performance. This should have
            no visible effect, other than reducing
            performance.
    30      Depth Buffer Coordinate Offset Disable.
            This disables the addition of the
            depth buffer offset bits which are used
            to change the location of the depth buffer
            relative to the front buffer.
    27:26   X Dither Offset. Specifies the X pixel
            offset to use when accessing the dither
            table.
    25:24   Y Dither Offset. Specifies the Y pixel
            offset to use when accessing the dither
            table.
DR4 bit definition
    31:16   Drawing Rectangle Origin Y. Specifies the Y
            origin of coordinates relative to the
            draw buffer.
    15:0    Drawing Rectangle Origin X. Specifies the X
            origin of coordinates relative to the
            draw buffer.
As you can see, these two fields are necessary for correctly
offsetting drawing within a buffer which contains multiple surfaces.
Note that DR1 is only used on Gen3 and earlier hardware and that
newer hardware sticks the dither offset elsewhere.
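The bit layouts listed above translate directly into shifts and masks. The sketch below packs only the fields this document describes; the helper names are hypothetical, and treating the origins as unsigned 16-bit values is an assumption, since the document does not say whether they are signed.

```c
#include <stdint.h>

/* DR4: drawing rectangle origin Y in bits 31:16, origin X in bits 15:0. */
static uint32_t example_pack_dr4(uint16_t origin_x, uint16_t origin_y)
{
    return ((uint32_t)origin_y << 16) | origin_x;
}

/* DR1: X dither offset in bits 27:26, Y dither offset in bits 25:24.
 * Each offset is two bits wide, so values are masked to 0..3. */
static uint32_t example_pack_dr1_dither(uint32_t x_dither, uint32_t y_dither)
{
    return ((x_dither & 0x3u) << 26) | ((y_dither & 0x3u) << 24);
}
```

An application drawing into a surface that starts at pixel (3, 5) of a larger buffer would, under these assumptions, pass example_pack_dr4(3, 5) as DR4.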
7.3.1 Detailed Execution Description
Execution of a single batch buffer requires several preparatory
steps to make the objects visible to the graphics engine and resolve
relocations to account for their current addresses.
A. Mapping and Relocation
Each exec_object structure in the array is examined in turn.

If the object is not already bound to the GTT, it is assigned a
location in the graphics address space. If no space is available in
the GTT, some other object will be evicted. This may require waiting
for previous execbuffer requests to complete before that object can
be unmapped. With the location assigned, the pages for the object
are pinned in memory using find_or_create_page and the GTT entries
updated to point at the relevant pages using drm_agp_bind_pages.

Then the array of relocations is traversed. Each relocation record
looks up the target object and, if the presumed offset does not
match the current offset (remember that this buffer has already been
assigned an address as it must have been mapped earlier), the
relocation value is computed using the current offset. If the
object is currently in use by the graphics engine, writing the data
out must be preceded by a delay while the object is still busy.
Once it is idle, then the page containing the relocation is mapped
by the CPU and the updated relocation data written out.

The read_domains and write_domain entries in each relocation are
used to compute the new read_domains and write_domain values for the
target buffers. The actual execution of the domain changes must wait
until all of the exec_object entries have been evaluated as the
complete set of domain information will not be available until then.
B. Memory Domain Resolution
After all of the new memory domain data has been pulled out of the
relocations and computed for each object, the list of objects is
again traversed and the new memory domains compared against the
current memory domains. There are two basic operations involved here:
* Flushing the current write domain. If the new read domains
are not equal to the current write domain, then the current
write domain must be flushed. Otherwise, reads will not see data
present in the write domain cache. In addition, any new read domains
other than the current write domain must be invalidated to ensure
that the flushed data are re-read into their caches.
* Invalidating new read domains. Any domains which were not currently
used for this object must be invalidated as old objects which
were mapped at the same location may have stale data in the new
domain caches.

If the CPU cache is being invalidated and some GPU cache is being
flushed, then we'll have to wait for rendering to complete so that
any pending GPU writes will be complete before we flush the GPU
cache.

If the CPU cache is being flushed, then we use 'clflush' to get data
written from the CPU.

Because the GPU caches cannot be partially flushed or invalidated,
we don't actually flush them during this traversal stage. Rather, we
gather the invalidate and flush bits up in the device structure.

Once all of the object domain changes have been evaluated, then the
gathered invalidate and flush bits are examined. For any GPU flush
operations, we emit a single MI_FLUSH command that performs all of
the necessary flushes. We then look to see if the CPU cache was
flushed. If so, we use the chipset flush magic (writing to a special
page) to get the data out of the chipset and into memory.
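The two rules for flushing and invalidating, together with the gathering of bits across objects, can be sketched as a single function called once per object. The names are hypothetical, the domain values used by callers are placeholders, and the waiting, clflush and MI_FLUSH steps are deliberately left out.

```c
#include <stdint.h>

/* Current domain state tracked for one object. */
struct example_obj_domains {
    uint32_t read_domains;
    uint32_t write_domain;
};

/* Accumulate the flush and invalidate bits one object contributes when it
 * moves to 'new_read_domains'. The bits are OR-ed into shared totals so
 * that a single flush can be issued after all objects are examined. */
static void example_gather(const struct example_obj_domains *cur,
                           uint32_t new_read_domains,
                           uint32_t *flush_bits,
                           uint32_t *invalidate_bits)
{
    /* Rule 1: a pending write must be flushed when the new readers are
     * not exactly the current write domain, and every other new read
     * domain must be invalidated so it re-reads the flushed data. */
    if (cur->write_domain && cur->write_domain != new_read_domains) {
        *flush_bits |= cur->write_domain;
        *invalidate_bits |= new_read_domains & ~cur->write_domain;
    }

    /* Rule 2: domains not currently used for this object may hold stale
     * data from whatever was mapped at this location before. */
    *invalidate_bits |= new_read_domains & ~cur->read_domains;
}
```

For an object last written through the render cache and now read by the sampler, this yields a render flush plus a sampler invalidate; an object that stays in the render domain contributes nothing.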
C. Queuing Batch Buffer to the Ring
With all of the objects resident in graphics memory space, and all
of the caches prepared with appropriate data, the batch buffer
object can be queued to the ring. If there are clip rectangles, then
the buffer is queued once per rectangle, with suitable clipping
inserted into the ring just before the batch buffer.
D. Creating an IRQ Cookie
Right after the batch buffer is placed in the ring, a request to
generate an IRQ is added to the ring along with a command to write a
marker into memory. When the IRQ fires, the driver can look at the
memory location to see where in the ring the GPU has passed. This
magic cookie value is stored in each object used in this execbuffer
command; it is used wherever you saw 'wait for rendering' above in this
document.
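Deciding whether rendering for an object has finished then comes down to comparing the marker the GPU last wrote with the cookie stored in the object. The sketch below assumes the cookie is a 32-bit counter that increases with each request and may wrap around; the document does not state this, and the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 when the marker last written by the GPU has reached or passed
 * 'cookie'. The subtraction is done modulo 2^32 and then read as a signed
 * value, so the comparison stays correct across counter wraparound as long
 * as the two values are less than 2^31 apart. This relies on the usual
 * two's-complement conversion to int32_t. */
static int example_cookie_passed(uint32_t gpu_marker, uint32_t cookie)
{
    return (int32_t)(gpu_marker - cookie) >= 0;
}
```

A 'wait for rendering' would then sleep until the IRQ handler observes a marker for which this test succeeds.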
E. Writing back the new object offsets
So that the application has a better idea what to use for
'presumed_offset' values later, the current object offsets are
written back to the exec_object structures.
8. Other misc Intel-specific functions.
To complete the driver, a few other functions were necessary.
8.1 Initialization from the X server
As the X server is currently responsible for apportioning memory between 2D
and 3D, it must tell the kernel which region of the GTT aperture is
available for 3D objects to be mapped into.
struct drm_i915_gem_init {
    /**
     * Beginning offset in the GTT to be managed by the
     * DRM memory manager.
     */
    uint64_t gtt_start;
    /**
     * Ending offset in the GTT to be managed by the DRM
     * memory manager.
     */
    uint64_t gtt_end;
};

/* usage */
init.gtt_start = <gtt_start>;
init.gtt_end = <gtt_end>;
ret = ioctl (fd, DRM_IOCTL_I915_GEM_INIT, &init);
The GTT aperture between gtt_start and gtt_end will be used to map
objects. This also tells the kernel that the ring can be used,
pulling the ring addresses from the device registers.
8.2 Pinning objects in the GTT
For scan-out buffers and the current shared depth and back buffers, we need
to have them always available in the GTT, at least for now. Pinning means to
lock their pages in memory along with keeping them at a fixed offset in the
graphics aperture. These operations are available only to root.
struct drm_i915_gem_pin {
    /** Handle of the buffer to be pinned. */
    uint32_t handle;
    uint32_t pad;
    /** alignment required within the aperture */
    uint64_t alignment;
    /** Returned GTT offset of the buffer. */
    uint64_t offset;
};

/* usage */
pin.handle = <handle>;
pin.alignment = <alignment>;
ret = ioctl (fd, DRM_IOCTL_I915_GEM_PIN, &pin);
if (ret == 0)
    return pin.offset;

Pinning an object ensures that it will not be evicted from the GTT
or moved. It will stay resident until destroyed or unpinned.
struct drm_i915_gem_unpin {
    /** Handle of the buffer to be unpinned. */
    uint32_t handle;
    uint32_t pad;
};

/* usage */
unpin.handle = <handle>;
ret = ioctl (fd, DRM_IOCTL_I915_GEM_UNPIN, &unpin);

Unpinning an object makes it possible to evict this object from the
GTT. It doesn't ensure that it will be evicted, just that it may.