A Brief Look at Per-Thread Data in Linux

MaximusZhou

Article link: https://blog.csdn.net/MaximusZhou/article/details/41702215

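This article first outlines which data is private to a thread and which process data is shared by all threads, then examines a thread's user-space data in detail, and finally uses a multithreaded program to show how that data is laid out.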

Overview

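A thread holds the information needed to represent an execution context within a process: a thread ID that identifies it within the process, a set of register values, a stack, a scheduling priority and policy, a signal mask (each thread has its own mask, but the disposition of any given signal is shared by all threads in the process), an errno variable (each thread has its own), and thread-specific data. Everything that belongs to the process is shared by all of its threads, including the executable's program text, global memory, heap memory, and file descriptors.

Strictly speaking, a thread's private data is not truly private, because threads share one address space. "Private" only means that other threads do not touch it through normal means; any thread that obtains the address can still read or write it.

On Linux, creating a thread with default attributes reserves 8 MB of user-space memory for it (with glibc the default follows the stack size resource limit, which is commonly 8 MB), plus a 4 KB guard area. The kernel also allocates a task_struct for the thread.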
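The default stack and guard sizes mentioned above can be inspected, and overridden, through the POSIX thread attribute API. The following is a minimal sketch rather than part of the original program: the function names default_stack_size, default_guard_size, and run_with_custom_sizes are made up for this illustration, and the values reported depend on the C library and on the process's resource limits.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static void *noop(void *arg) { return arg; }

/* Stack size reported by a default-initialised attribute object. */
size_t default_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;
    int rc = pthread_attr_init(&attr);
    assert(rc == 0);
    rc = pthread_attr_getstacksize(&attr, &size);
    assert(rc == 0);
    (void)rc;
    pthread_attr_destroy(&attr);
    return size;
}

/* Guard size reported by a default-initialised attribute object. */
size_t default_guard_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;
    int rc = pthread_attr_init(&attr);
    assert(rc == 0);
    rc = pthread_attr_getguardsize(&attr, &size);
    assert(rc == 0);
    (void)rc;
    pthread_attr_destroy(&attr);
    return size;
}

/* Runs a thread with the requested stack and guard sizes and returns
   the stack size that was stored in the attribute object. */
size_t run_with_custom_sizes(size_t stack_bytes, size_t guard_bytes)
{
    pthread_attr_t attr;
    pthread_t t;
    size_t stored = 0;
    int rc = pthread_attr_init(&attr);
    assert(rc == 0);
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    assert(rc == 0);
    rc = pthread_attr_setguardsize(&attr, guard_bytes);
    assert(rc == 0);
    rc = pthread_attr_getstacksize(&attr, &stored);
    assert(rc == 0);
    rc = pthread_create(&t, &attr, noop, NULL);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_attr_destroy(&attr);
    return stored;
}
```

Calling run_with_custom_sizes(1024 * 1024, 8192), for example, starts a thread on a 1 MiB stack instead of the default. The stack size must be at least PTHREAD_STACK_MIN, and the library may round the guard size up to a multiple of the page size.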

Thread Data in User Space

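Linux threads are implemented jointly by the Linux kernel, glibc, and libpthread. In user space, three pieces are closely tied to the thread itself:

First, a data structure similar to a thread control block (TCB), which represents the thread in user space;

Second, the thread's private stack;

Third, thread-local storage (TLS).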

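In a multithreaded program most data is shared among threads, but these three pieces are specific to each thread. They live in one block of memory that is allocated with the mmap system call when the thread is created. The block is reached through a segment register: gs on 32-bit x86, and fs on x86-64 (the platform that produced the output later in this article). Within one process this register points to a different address for every thread, which keeps the threads from interfering with one another. The three pieces are laid out in memory roughly as follows: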

-----------

pthread

-----------

TLS

-----------

Stack

-----------

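As noted above, this block is reached through a segment register (gs on 32-bit x86, fs on x86-64). Where in the block does the register point? At the start of the pthread structure.

When pthread_create creates a thread it calls mmap to allocate the thread's space, but before doing so it checks whether the process already holds a suitable block. This is because the space a thread occupies is normally not released when the thread finishes and exits. So if thread A has exited and a new thread B is needed, the library checks whether A's stack block meets the requirements and reuses it if it does.

There are two ways to have a thread's resources released when it exits. One is to call pthread_join on it, for example from the main thread. The other is to create it with the detached attribute, or to call pthread_detach(pthread_self()) in the new thread after creation, so that its resources are released automatically on exit. Note that what gets released here is the kernel-side resource (such as the task_struct); the user-space block that was allocated with mmap stays mapped and is kept for reuse. The three parts contain the following data: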

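The pthread part holds a structure of type pthread. It contains the thread control block (TCB) header, mutex-related fields, cleanup fields for the work done at thread exit, cancelhandling fields for thread cancellation, fields that reference thread-specific data, the start_routine entry function, a field that stores the return value of start_routine, and other thread attributes. On Linux the thread ID returned by pthread_create is the address of this pthread structure, which is also the base value of the thread's segment register (gs on 32-bit x86, fs on x86-64).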
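The claim that the thread ID equals the thread-pointer register can be checked directly on x86-64 Linux, where the C library keeps a pointer to the thread descriptor at offset 0 from the fs base. The sketch below is an illustration under that assumption; the function names are hypothetical, and on any other target the code simply falls back to pthread_self(), so the comparison is only meaningful on x86-64.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Returns the thread pointer of the calling thread.
   Assumption: on x86-64 Linux the word at %fs:0 holds the address of the
   thread descriptor itself. Other targets fall back to pthread_self(). */
uintptr_t thread_pointer(void)
{
#if defined(__x86_64__) && defined(__LP64__) && defined(__linux__)
    uintptr_t tp;
    __asm__ volatile ("movq %%fs:0, %0" : "=r"(tp));
    return tp;
#else
    return (uintptr_t)pthread_self();
#endif
}

/* Returns 1 when pthread_self() is the same value as the thread pointer. */
int thread_id_matches_thread_pointer(void)
{
    return thread_pointer() == (uintptr_t)pthread_self();
}
```

Casting pthread_t to an integer works here because pthread_t is an integer type on Linux; POSIX treats it as opaque, so portable code should not rely on this.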

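TLS stands for thread-local storage. It mainly holds variables declared with the __thread specifier and some library-level predefined variables such as errno. Thread-specific data (the values set with pthread_setspecific) is in effect also kept in this per-thread area.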
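A small sketch of this behaviour, separate from the article's full program: a worker thread modifies a __thread variable and records where its errno lives, and the calling thread then checks that its own copies are untouched. The names tls_counter, struct report, run_tls_worker, and caller_tls_counter are invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static __thread int tls_counter = 42;   /* every thread starts with its own 42 */

struct report {
    int counter;        /* value of tls_counter seen by the worker */
    int *errno_addr;    /* address of the worker's errno */
};

static void *tls_worker(void *arg)
{
    struct report *r = arg;
    tls_counter += 1;               /* changes only this thread's copy */
    r->counter = tls_counter;
    r->errno_addr = &errno;         /* errno is a per-thread lvalue */
    return NULL;
}

/* Runs one worker thread to completion and returns what it observed. */
struct report run_tls_worker(void)
{
    struct report r = { 0, NULL };
    pthread_t t;
    int rc = pthread_create(&t, NULL, tls_worker, &r);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return r;
}

/* Value of tls_counter in the calling thread. */
int caller_tls_counter(void)
{
    return tls_counter;
}
```

The worker reports 43 while the caller still sees 42, and the two threads report different addresses for errno.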

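Stack is the stack the thread uses while it runs; the thread's local variables, for example, live here. The following multithreaded program shows how a thread's data is laid out: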

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>       /* sleep, getpid, syscall */
#include <pthread.h>
#include <sys/syscall.h>
#include <assert.h>

#define gettid() syscall(__NR_gettid)  /*get LWP ID */

pthread_key_t key;

__thread int count = 42;
__thread unsigned long long count2 ;
static __thread int count3;

/*thread key destructor*/
void keydestr(void* string)
{
    printf("destructor excuted in thread %p,address (%p) param=%s\n",(void *)pthread_self(),string,(char *)string);
    free(string);
}

void * thread1(void *arg)
{
    pthread_t tid=pthread_self();
    size_t size = 16;  /* "maximus0" needs 9 bytes including the terminating NUL */

    int autovar = 0;
    static int staticvar = 1;
    printf("In thread1, autovaraddress = %p, staticvaraddress = %p\n", (void *)&autovar, (void *)&staticvar);

    printf("In thread1, tid = %p, gettid = %ld\n",(void *)tid,(long)gettid());

    /* thread-specific data: keydestr frees this buffer when the thread exits */
    char* key_content = ( char* )malloc(size);
    if(key_content != NULL)
    {
        strcpy(key_content,"maximus0");
    }
    pthread_setspecific(key,(void *)key_content);

    /* each thread writes to its own copy of the __thread variables */
    count = 1024;
    count2 = 2048;
    count3 = 4096;
    printf("In thread1, tid=%p, count(%p) = %8d, count2(%p) = %6llu, count3(%p) = %6d\n",(void *)tid,(void *)&count,count,(void *)&count2,count2,(void *)&count3,count3);

    sleep(2);
    printf("thread1 %p keyselfaddress = %p, returns keyaddress = %p\n",(void *)tid,(void *)&key, pthread_getspecific(key));

    sleep(30);
    printf("thread1 exit\n");
    return NULL;
}

void * thread2(void *arg)
{
    pthread_t tid=pthread_self();
    size_t size = 8;  /* "ABCDEFG" plus the terminating NUL is exactly 8 bytes */

    int autovar = 0;
    static int staticvar = 1;
    printf("In thread2, autovaraddress = %p, staticvaraddress = %p\n", (void *)&autovar, (void *)&staticvar);

    printf("In thread2, tid = %p, gettid = %ld\n",(void *)tid,(long)gettid());

    /* thread-specific data: keydestr frees this buffer when the thread exits */
    char* key_content = ( char* )malloc(size);
    if(key_content != NULL)
    {
        strcpy(key_content,"ABCDEFG");
    }
    pthread_setspecific(key,(void *)key_content);

    /* each thread writes to its own copy of the __thread variables */
    count = 1025;
    count2 = 2049;
    count3 = 4097;
    printf("In thread2, tid=%p, count(%p) = %8d, count2(%p) = %6llu, count3(%p) = %6d\n",(void *)tid,(void *)&count,count,(void *)&count2,count2,(void *)&count3,count3);

    sleep(1);
    printf("thread2 %p keyselfaddress = %p, returns keyaddress = %p\n",(void *)tid,(void *)&key, pthread_getspecific(key));

    sleep(50);
    printf("thread2 exit\n");
    return NULL;
}


int main(void)
{
    int autovar = 0;
    static int staticvar = 1;

    pthread_t tid1,tid2;
    printf("start,pid=%d\n",(int)getpid());

    printf("In main, autovaraddress = %p, staticvaraddress = %p\n", (void *)&autovar, (void *)&staticvar);

    /* one key shared by all threads; each thread stores its own value under it */
    pthread_key_create(&key,keydestr);

    pthread_create(&tid1,NULL,thread1,NULL);
    pthread_create(&tid2,NULL,thread2,NULL);

    printf("In main, pthread_create tid1 = %p\n",(void *)tid1);
    printf("In main, pthread_create tid2 = %p\n",(void *)tid2);

    /* only thread2 is joined; it sleeps longer than thread1, so thread1 has already exited by then */
    if(pthread_join(tid2,NULL) == 0)
    {
        printf("In main,pthread_join thread2 success!\n");
        sleep(5);
    }

    pthread_key_delete(key);
    printf("main thread exit\n");

    return 0;
}

Compile and run the program; the output is:

$gcc -Wall -pthread -o hack_thread_data hack_thread_data.c
$./hack_thread_data
start,pid=52168
In main, autovaraddress = 0x7fffd3eceea8, staticvaraddress = 0x601650
In main, pthread_create tid1 = 0x7ffe4baee700
In main, pthread_create tid2 = 0x7ffe4b2ed700
In thread2, autovaraddress = 0x7ffe4b2eceb0, staticvaraddress = 0x601654
In thread2, tid = 0x7ffe4b2ed700, gettid = 52170
In thread2, tid=0x7ffe4b2ed700, count(0x7ffe4b2ed6e8) =     1025, count2(0x7ffe4b2ed6f0) =   2049, count3(0x7ffe4b2ed6f8) =   4097
In thread1, autovaraddress = 0x7ffe4baedeb0, staticvaraddress = 0x601658
In thread1, tid = 0x7ffe4baee700, gettid = 52169
In thread1, tid=0x7ffe4baee700, count(0x7ffe4baee6e8) =     1024, count2(0x7ffe4baee6f0) =   2048, count3(0x7ffe4baee6f8) =   4096
thread2 0x7ffe4b2ed700 returns keyaddress = 0x1598270
thread1 0x7ffe4baee700 returns keyaddress = 0x1598290
thread1 exit
destructor excuted in thread 0x7ffe4baee700,address (0x1598290) param=maximus0
thread2 exit
destructor excuted in thread 0x7ffe4b2ed700,address (0x1598270) param=ABCDEFG
In main,pthread_join thread2 success!
main thread exit

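Running cat /proc/52168/maps before thread 1 exits, after thread 1 exits, before thread 2 exits, and after thread 2 exits shows the same address space every time. This indicates that after a thread exits (even a detached one), the user-space memory it occupied is not released.

From the address space we can see the following: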

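1) Where the created threads and the main thread sit in the address space. A thread's local variables are in that thread's stack, its static variables are in the .data section, and memory it allocates dynamically is on the heap.

2) Each thread has its own independent instance of a variable declared with __thread, and the values in different threads do not affect one another. This suits variables that are global in scope and may change, but are not worth making shared globals that need lock protection.

3) Calling syscall(__NR_gettid) in a thread returns the ID of the kernel task that backs it (its LWP ID). Calling getpid in a thread instead returns the ID of the thread group (tgid) the thread belongs to. The kernel ID of each thread can also be seen with ps axj -L, where -L adds an LWP column showing the thread's actual PID, or by running top and pressing H (uppercase), which also lists all threads.

4) The thread ID is in fact an address inside the thread's stack mapping.

5) A thread's user-space region has two parts. The part with ---p permissions is the thread's guard area. Its size can be set at thread creation through the guardsize thread attribute, which controls how much extra memory past the end of the thread's stack protects against stack overflow; the default is PAGESIZE bytes. The stack size can also be specified at creation. The default stack size is 8 MB: for thread2 the stack is 7ffe4aaee000-7ffe4b2ee000, which is 8 MB. The default guard size is 4 KB: for thread2 the guard is 7ffe4aaed000-7ffe4aaee000.

6) Calling sleep in a thread blocks only that thread and does not affect the others.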
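Points 1) and 3) can be condensed into a short check. This is a sketch under the assumption of a Linux system where SYS_gettid is available through syscall(); the names shared_static, struct thread_view, current_view, and view_from_new_thread are made up for the illustration. Each thread records its getpid() value, its kernel thread ID, the address of one of its local variables, and the address of a static variable.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* for the syscall() declaration */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int shared_static = 1;       /* lives in .data, one copy per process */

struct thread_view {
    pid_t pid;                  /* getpid(): the thread group ID */
    pid_t tid;                  /* gettid: the kernel ID of this thread */
    uintptr_t local_addr;       /* address of a local variable (on this thread's stack) */
    uintptr_t static_addr;      /* address of shared_static */
};

/* Captures the IDs and addresses as seen by the calling thread. */
struct thread_view current_view(void)
{
    int local = 0;
    struct thread_view v;
    v.pid = getpid();
    v.tid = (pid_t)syscall(SYS_gettid);
    v.local_addr = (uintptr_t)&local;
    v.static_addr = (uintptr_t)&shared_static;
    return v;
}

static void *view_worker(void *arg)
{
    *(struct thread_view *)arg = current_view();
    return NULL;
}

/* Same capture, but performed inside a newly created thread. */
struct thread_view view_from_new_thread(void)
{
    struct thread_view v;
    pthread_t t;
    int rc = pthread_create(&t, NULL, view_worker, &v);
    assert(rc == 0);
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return v;
}
```

Comparing current_view() from the main thread with view_from_new_thread() shows the same pid and the same static address in both, while the new thread's tid differs from its pid and its local variable sits at a different address.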
