The Four TLS Models: Global Dynamic, Local Dynamic, Initial Exec, and Local Exec


1. The IPP3 compiler now implements the four TLS models: Global Dynamic, Local Dynamic, Initial Exec, and Local Exec.


2. clang supports the -fpic/-fPIC options.


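3. clang does not support the following attributes:
  __attribute__((tls_model("global-dynamic"))),
  __attribute__((tls_model("local-dynamic"))),
  __attribute__((tls_model("initial-exec"))),
  __attribute__((tls_model("local-exec"))).
  This issue is still unresolved. Fixing it requires changes to the clang front end and to the target-independent part of LLVM.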
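For reference, the sketch below shows how these attributes are written in compilers that do accept them (mainstream GCC and Clang). It is not IPP3 code: the variable and function names are hypothetical, and "local-exec" is only valid when the code is linked into an executable rather than a shared object.

```c
/* Hypothetical thread-local counters, one per TLS model.
   __thread is the GCC/Clang spelling of a thread-local storage class. */
static __thread int tls_gd __attribute__((tls_model("global-dynamic")));
static __thread int tls_ld __attribute__((tls_model("local-dynamic")));
static __thread int tls_ie __attribute__((tls_model("initial-exec")));
static __thread int tls_le __attribute__((tls_model("local-exec")));

/* Increment every counter in the calling thread and return their sum. */
int bump_all(void)
{
    tls_gd += 1;
    tls_ld += 1;
    tls_ie += 1;
    tls_le += 1;
    return tls_gd + tls_ld + tls_ie + tls_le;
}
```

Each thread sees its own copy of the four counters; the attribute only changes the instruction sequence the compiler emits to locate them.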


4. The library provided by PSC has no thread support, so the TLS code cannot be tested at present.


5. In LLVM, the TLS model is selected according to whether the "relocation model" is PIC.
  The code is as follows:
    TLSModel::Model getTLSModel(const GlobalValue *GV, Reloc::Model reloc) {
        ……
        if (reloc == Reloc::PIC_) {
            if (isLocal || isHidden)
                return TLSModel::LocalDynamic;
            else
                return TLSModel::GeneralDynamic;
        } else {
            if (!isDeclaration || isHidden)
                return TLSModel::LocalExec;
            else
                return TLSModel::InitialExec;
        }
    }
  As the code above shows, the "Global Dynamic" and "Local Dynamic" TLS models are selected only in PIC mode, while "Local Exec" and "Initial Exec" are selected only in non-PIC mode. In other words, the non-PIC "Global Dynamic" and "Local Dynamic" models and the PIC "Local Exec" and "Initial Exec" models described in ABI_for_IPP3_v1.17_add_TLS.ppt are implemented in the code but never take effect.
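The selection rule can be restated as a small self-contained C sketch. This is not LLVM code: the enum, the struct, and the function name are hypothetical, and the three symbol properties are taken as inputs because the original snippet elides how they are computed.

```c
#include <stdbool.h>

typedef enum {
    TLS_GENERAL_DYNAMIC,
    TLS_LOCAL_DYNAMIC,
    TLS_INITIAL_EXEC,
    TLS_LOCAL_EXEC
} tls_model;

/* Properties of the thread-local symbol (assumed to be known by the caller). */
typedef struct {
    bool is_local;        /* local linkage */
    bool is_hidden;       /* hidden visibility */
    bool is_declaration;  /* declared here, defined elsewhere */
} symbol_info;

/* Mirrors the branch structure of the snippet above. */
tls_model pick_tls_model(symbol_info sym, bool pic)
{
    if (pic) {
        /* PIC: only the two dynamic models are reachable. */
        return (sym.is_local || sym.is_hidden) ? TLS_LOCAL_DYNAMIC
                                               : TLS_GENERAL_DYNAMIC;
    }
    /* Non-PIC: only the two exec models are reachable. */
    return (!sym.is_declaration || sym.is_hidden) ? TLS_LOCAL_EXEC
                                                  : TLS_INITIAL_EXEC;
}
```

For example, an externally visible declaration yields General Dynamic under PIC and Initial Exec otherwise; no input produces a dynamic model in non-PIC mode.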


--------------------------------------------------------------------------------------------------------------------------

https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter8-20.html


Thread-Local Storage Access Models

Each TLS reference follows one of the following access models. These models are listed from the most general, but least optimized, to the fastest, but most restrictive.

General Dynamic (GD) - dynamic TLS

This model allows reference of all TLS variables, from either a shared object or a dynamic executable. This model also supports the deferred allocation of a TLS block when the block is first referenced from a specific thread.

Local Dynamic (LD) - dynamic TLS of local symbols

This model is an optimization of the GD model. The compiler might determine that a variable is bound locally, or protected, within the object being built. In this case, the compiler instructs the link-editor to statically bind the dynamic tlsoffset and use this model. This model provides a performance benefit over the GD model. Only one call to tls_get_addr() is required per function, to determine the address of dtv_{0,m}. The dynamic TLS offset, bound at link-edit time, is added to the dtv_{0,m} address for each reference.

Initial Executable (IE) - static TLS with assigned offsets

This model can only reference TLS variables which are available as part of the initial static TLS template. This template is composed of all TLS blocks that are available at process startup, plus a small backup reservation. See Program Startup. In this model, the thread pointer-relative offset for a given variable x is stored in the GOT entry for x.

This model can reference a limited number of TLS variables from shared libraries loaded after initial process startup, such as by means of lazy loading, filters, or dlopen(3C). This access is satisfied from a fixed backup reservation. This reservation can only provide storage for uninitialized TLS data items. For maximum flexibility, shared objects should reference thread-local variables using a dynamic TLS model.


Note - Filters can be employed to dynamically select the use of static TLS. A shared object can be built to use dynamic TLS, and act as an auxiliary filter upon a counterpart built to use static TLS. If resources allow the static TLS object to be loaded, the object is used. Otherwise, a fallback to the dynamic TLS object ensures that the functionality provided by the shared object is always available. For more information on filters, see Shared Objects as Filters.


Local Executable (LE) - static TLS

This model can only reference TLS variables which are part of the TLS block of the dynamic executable. The link-editor calculates the thread pointer-relative offsets statically, without the need for dynamic relocations, or the extra reference to the GOT. This model cannot be used to reference variables outside of the dynamic executable.
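The four models differ mainly in how much work is needed to turn a variable into an address. The toy simulation below contrasts them; it is a sketch only. The structure, the function names, and the fixed block size are all hypothetical, and a real implementation emits these sequences as machine code with relocations rather than as C calls.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_MODULES = 2, BLOCK_SIZE = 64 };

/* Toy per-thread state: one TLS block per module, plus a call counter. */
typedef struct {
    unsigned char blocks[NUM_MODULES][BLOCK_SIZE];
    unsigned calls;  /* number of simulated tls_get_addr() calls */
} toy_thread;

/* Stand-in for tls_get_addr(): resolves (module, offset) at run time. */
void *sim_tls_get_addr(toy_thread *t, size_t module, size_t offset)
{
    assert(module < NUM_MODULES && offset < BLOCK_SIZE);
    t->calls++;
    return &t->blocks[module][offset];
}

/* GD: every reference pays for one call. */
void *access_gd(toy_thread *t, size_t module, size_t offset)
{
    return sim_tls_get_addr(t, module, offset);
}

/* LD: one call per function fetches the block base; each variable is then
   reached by adding an offset that was bound at link-edit time. */
void access_ld(toy_thread *t, size_t module,
               const size_t *offsets, size_t n, void **out)
{
    unsigned char *base = sim_tls_get_addr(t, module, 0);
    for (size_t i = 0; i < n; i++) {
        assert(offsets[i] < BLOCK_SIZE);
        out[i] = base + offsets[i];
    }
}

/* IE: the thread pointer-relative offset is loaded from a GOT slot. */
void *access_ie(unsigned char *tp, const ptrdiff_t *got_slot)
{
    return tp + *got_slot;
}

/* LE: the thread pointer-relative offset is a link-time constant. */
void *access_le(unsigned char *tp, ptrdiff_t tp_offset)
{
    return tp + tp_offset;
}
```

Here the static block is assumed to end at the thread pointer, so IE and LE use negative offsets. Only GD and LD touch the call counter, which is the cost the static models avoid.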

--------------------------------------------------------------------------------------------------------------------------

https://docs.oracle.com/cd/E23824_01/html/819-0690/gentextid-22601.html#chapter8-7


Runtime Allocation of Thread-Local Storage

TLS is created at three occasions during the lifetime of a program.

  • At program startup.

  • When a new thread is created.

  • When a thread references a TLS block for the first time after a shared object is loaded following program startup.

Thread-local data storage is laid out at runtime as illustrated in Figure 14-1.

Figure 14-1 Runtime Storage Layout of Thread-Local Storage


Program Startup

At program startup, the runtime system creates TLS for the main thread.

First, the runtime linker logically combines the TLS templates for all loaded dynamic objects, including the dynamic executable, into a single static template. Each dynamic object's TLS template is assigned an offset within the combined template, tlsoffset_m, as follows.

  • tlsoffset_1 = round(tlssize_1, align_1)

  • tlsoffset_{m+1} = round(tlsoffset_m + tlssize_{m+1}, align_{m+1})

tlssize_{m+1} and align_{m+1} are the size and alignment, respectively, for the allocation template for dynamic object m, where 1 <= m <= M and M is the total number of loaded dynamic objects. The round(offset, align) function returns an offset rounded up to the next multiple of align.
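The recurrence can be worked through in a few lines of C. This is a minimal sketch: the function names are hypothetical, and the array index i stands for object i+1.

```c
#include <assert.h>
#include <stddef.h>

/* round(offset, align): round offset up to the next multiple of align. */
size_t round_up(size_t offset, size_t align)
{
    assert(align > 0);
    return (offset + align - 1) / align * align;
}

/* Fill tlsoffset[0..count-1] from the per-object sizes and alignments.
   Starting from 0 makes the first step equal to round(tlssize_1, align_1). */
void compute_tls_offsets(const size_t *tlssize, const size_t *align,
                         size_t count, size_t *tlsoffset)
{
    size_t prev = 0;
    for (size_t i = 0; i < count; i++) {
        prev = round_up(prev + tlssize[i], align[i]);
        tlsoffset[i] = prev;
    }
}
```

With sizes {10, 24, 5} and alignments {8, 16, 4}, the offsets come out as 16, 48, and 56: round(10, 8) = 16, round(16 + 24, 16) = 48, round(48 + 5, 4) = 56.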

Next, the runtime linker computes the allocation size that is required for the startup TLS, tlssize_S. This size is equal to tlsoffset_M, plus an additional 512 bytes. This addition provides a backup reservation for static TLS references. Shared objects that make static TLS references, and are loaded after process initialization, are assigned to this backup reservation. However, this reservation is a fixed, limited size. In addition, this reservation is only capable of providing storage for uninitialized TLS data items. For maximum flexibility, shared objects should reference thread-local variables using a dynamic TLS model.

The static TLS arena associated with the calculated TLS size, tlssize_S, is placed immediately preceding the thread pointer tp_t. Accesses to this TLS data are based on subtractions from tp_t.
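As a rough illustration of this layout, the sketch below computes the startup size and locates an object's block by subtracting from the thread pointer. The function names are hypothetical, and the assumption that an object's block begins exactly tlsoffset bytes below the thread pointer is this sketch's reading of the layout, not something the text states outright.

```c
#include <assert.h>
#include <stddef.h>

enum { BACKUP_RESERVATION = 512 };  /* the additional 512 bytes */

/* Startup TLS size: the last object's offset plus the backup reservation. */
size_t startup_tls_size(size_t tlsoffset_last)
{
    return tlsoffset_last + BACKUP_RESERVATION;
}

/* Assumed layout: the arena occupies [tp - arena_size, tp), and the block
   of an object with offset tlsoffset starts at tp - tlsoffset. */
unsigned char *static_block_base(unsigned char *tp, size_t tlsoffset,
                                 size_t arena_size)
{
    assert(tlsoffset <= arena_size);
    return tp - tlsoffset;
}
```

With a last offset of 56, the arena is 568 bytes, and an object with offset 16 has its block in the 16 bytes directly below the thread pointer.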

The static TLS arena is associated with a linked list of initialization records. Each record in this list describes the TLS initialization image for one loaded dynamic object. Each record contains the following fields.

  • A pointer to the TLS initialization image.

  • The size of the TLS initialization image.

  • The tlsoffset_m of the object.

  • A flag indicating whether the object uses a static TLS model.

The thread library uses this information to allocate storage for the initial thread. This storage is initialized, and a dynamic TLS vector for the initial thread is created.
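The record list and the initialization step might look roughly like this in C. It is a sketch under assumptions: the struct and function names are hypothetical, blocks are assumed to start tlsoffset bytes below the thread pointer, and bytes not covered by an image are assumed to start as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One initialization record per loaded dynamic object. */
typedef struct tls_init_record {
    const void *image;             /* pointer to the TLS initialization image */
    size_t image_size;             /* size of the TLS initialization image */
    size_t tlsoffset;              /* tlsoffset of the object */
    int uses_static_tls;           /* flag: object uses a static TLS model */
    struct tls_init_record *next;  /* next record in the linked list */
} tls_init_record;

/* Initialize the static arena [tp - arena_size, tp) for one thread.
   The flag is carried in the record but not consulted in this sketch. */
void init_static_tls(unsigned char *tp, size_t arena_size,
                     const tls_init_record *head)
{
    memset(tp - arena_size, 0, arena_size);
    for (const tls_init_record *r = head; r != NULL; r = r->next) {
        assert(r->tlsoffset <= arena_size);
        assert(r->image_size <= r->tlsoffset);
        if (r->image_size > 0)
            memcpy(tp - r->tlsoffset, r->image, r->image_size);
    }
}
```

The same walk over the list can be repeated for every new thread, since the records describe the images rather than any one thread's storage.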

Thread Creation

For the initial thread, and for each new thread created, the thread library allocates a new TLS block for each loaded dynamic object. Blocks can be allocated separately, or as a single contiguous block.

Each thread t has an associated thread pointer tp_t, which points to the thread control block, TCB. The thread pointer, tp, always contains the value of tp_t for the currently running thread.

The thread library then creates a vector of pointers, dtv_t, for the current thread t. The first element of each vector contains a generation number gen_t, which is used to determine when the vector needs to be extended. See Deferred Allocation of Thread-Local Storage Blocks.

Each remaining element in the vector, dtv_{t,m}, is a pointer to the block that is reserved for the TLS belonging to the dynamic object m.

For dynamically loaded, post-startup objects, the thread library defers the allocation of TLS blocks. Allocation occurs when the first reference is made to a TLS variable within the loaded object. For blocks whose allocation has been deferred, the pointer dtv_{t,m} is set to an implementation-defined special value.


Note - The runtime linker can group TLS templates for all startup objects so as to share a single element in the vector, dtv_{t,1}. This grouping does not affect the offset calculations described previously or the creation of the list of initialization records. For the following sections, however, the value of M, the total number of objects, starts with the value of 1.


The thread library then copies the initialization images to the corresponding locations within the new block of storage.

Post-Startup Dynamic Loading

A shared object containing only dynamic TLS can be loaded following process startup without limitations. The runtime linker extends the list of initialization records to include the initialization template of the new object. The new object is given an index of m = M + 1. The counter M is incremented by 1. However, the allocation of new TLS blocks is deferred until the blocks are actually referenced.

When a shared object that contains only dynamic TLS is unloaded, the TLS blocks used by that shared object are freed.

A shared object containing static TLS can be loaded following process startup with limitations. Static TLS references can only be satisfied from any remaining backup TLS reservation. See Program Startup. This reservation is limited in size. In addition, this reservation can only provide storage for uninitialized TLS data items.

A shared object that contains static TLS is never unloaded. The shared object is tagged as non-deletable as a consequence of processing the static TLS.

Deferred Allocation of Thread-Local Storage Blocks

In a dynamic TLS model, when a thread t needs to access a TLS block for object m, the code updates the dtv_t and performs the initial allocation of the TLS block. The thread library provides the following interface to provide for dynamic TLS allocation.

typedef struct {
    unsigned long ti_moduleid;
    unsigned long ti_tlsoffset;
} TLS_index;

extern void *__tls_get_addr(TLS_index *ti);     /* SPARC and x64 */
extern void *___tls_get_addr(TLS_index *ti);    /* 32-bit x86 */

Note - The SPARC and 64-bit x86 definitions of this function have the same function signature. However, the 32-bit x86 version does not use the default calling convention of passing arguments on the stack. Instead, the 32-bit x86 version passes its arguments by means of the %eax register, which is more efficient. To denote that this alternate calling method is used, the 32-bit x86 function name has three leading underscores in its name.


Both versions of tls_get_addr() check the per-thread generation counter, gen_t, to determine whether the vector needs to be updated. If the vector dtv_t is out of date, the routine updates the vector, possibly reallocating the vector to make room for more entries. The routine then checks to see if the TLS block corresponding to dtv_{t,m} has been allocated. If the block has not been allocated, the routine allocates and initializes the block. The routine uses the information in the list of initialization records provided by the runtime linker. The pointer dtv_{t,m} is set to point to the allocated block. The routine returns a pointer to the given offset within the block.
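The steps above can be sketched as a toy routine. This is not the Solaris implementation: the toy_ types, the explicit linker and vector arguments, and the use of NULL as the "deferred" marker are all assumptions made so the example stands alone. The real routine finds the current thread's vector through the thread pointer and uses an implementation-defined special value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned long ti_moduleid;
    unsigned long ti_tlsoffset;
} TLS_index;

/* Hypothetical per-object initialization data kept by the runtime linker. */
typedef struct {
    const void *image;   /* TLS initialization image (may be NULL) */
    size_t image_size;   /* bytes to copy from the image */
    size_t block_size;   /* total size of the TLS block */
} toy_module;

/* Hypothetical runtime-linker state; modules[0] is unused. */
typedef struct {
    const toy_module *modules;
    size_t count;              /* M, the number of loaded objects */
    unsigned long generation;  /* bumped whenever an object is loaded */
} toy_linker;

/* Hypothetical per-thread vector. */
typedef struct {
    unsigned long gen;  /* generation this vector was last synced to */
    size_t capacity;    /* number of slots in blocks */
    void **blocks;      /* blocks[m] == NULL means allocation is deferred */
} toy_dtv;

void *toy_tls_get_addr(const toy_linker *ld, toy_dtv *dtv, const TLS_index *ti)
{
    size_t m = ti->ti_moduleid;
    assert(m >= 1 && m <= ld->count);

    /* Step 1: bring the vector up to date, growing it if necessary. */
    if (dtv->gen != ld->generation || dtv->capacity <= m) {
        if (dtv->capacity < ld->count + 1) {
            void **grown = realloc(dtv->blocks,
                                   (ld->count + 1) * sizeof *grown);
            if (grown == NULL)
                return NULL;
            for (size_t i = dtv->capacity; i <= ld->count; i++)
                grown[i] = NULL;  /* new slots start out deferred */
            dtv->blocks = grown;
            dtv->capacity = ld->count + 1;
        }
        dtv->gen = ld->generation;
    }

    /* Step 2: allocate and initialize the block on first reference. */
    const toy_module *mod = &ld->modules[m];
    assert(mod->block_size > 0 && mod->image_size <= mod->block_size);
    if (dtv->blocks[m] == NULL) {
        unsigned char *block = calloc(1, mod->block_size);
        if (block == NULL)
            return NULL;
        if (mod->image_size > 0)
            memcpy(block, mod->image, mod->image_size);
        dtv->blocks[m] = block;
    }

    /* Step 3: return the address at the requested offset. */
    assert(ti->ti_tlsoffset < mod->block_size);
    return (unsigned char *)dtv->blocks[m] + ti->ti_tlsoffset;
}
```

A thread that never touches a late-loaded object never pays for its block, which is the point of deferring the allocation. Existing blocks survive a vector reallocation because only the pointer array is resized.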

