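Notes
- Questions about how the Android system works internally come up again and again; this column collects them in one place for discussion.
- The Thread class is Android's wrapper around thread operations. Its code is in system/core/libutils/Threads.cpp, which also contains several thread-synchronization wrappers built on pthread. This article covers the commonly used synchronization classes.
- This article draws on a number of blogs and books that are not practical to list one by one. It is intended only for study and knowledge sharing.
1 Overview of the Android Thread class
1.1 The Thread class
Although native Android code can program directly against the pthread C API, Android also has to interoperate with the Java world. The Thread class was added to make that interaction smoother, and the canCallJava parameter of its constructor is the key to it.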
/*
* This is our thread object!
*/
// canCallJava indicates whether this thread will call JNI functions
Thread::Thread(bool canCallJava)
: mCanCallJava(canCallJava),
mThread(thread_id_t(-1)),
mLock("Thread::mLock"),
mStatus(NO_ERROR),
mExitPending(false), mRunning(false)
#if defined(__ANDROID__)
, mTid(-1)
#endif
{
}
1.2 How the canCallJava variable is handled
status_t Thread::run(const char* name, int32_t priority, size_t stack)
{
if (name == nullptr) {
ALOGW("Thread name not provided to Thread::run");
name = 0;
}
Mutex::Autolock _l(mLock);
if (mRunning) {
// thread already started
return INVALID_OPERATION;
}
// reset status and exitPending to their default value, so we can
// try again after an error happened (either below, or in readyToRun())
mStatus = NO_ERROR;
mExitPending = false;
mThread = thread_id_t(-1);
// hold a strong reference on ourself
mHoldSelf = this;
mRunning = true;
bool res;
// If mCanCallJava is true, call createThreadEtc with _threadLoop as the thread function.
// _threadLoop is a function defined in Threads.cpp.
if (mCanCallJava) {
res = createThreadEtc(_threadLoop,
this, name, priority, stack, &mThread);
} else {
res = androidCreateRawThreadEtc(_threadLoop,
this, name, priority, stack, &mThread);
}
if (res == false) {
mStatus = UNKNOWN_ERROR; // something happened!
mRunning = false;
mThread = thread_id_t(-1);
mHoldSelf.clear(); // "this" may have gone away after this.
return UNKNOWN_ERROR;
}
// Do not refer to mStatus here: The thread is already running (may, in fact
// already have exited with a valid mStatus result). The NO_ERROR indication
// here merely indicates successfully starting the thread and does not
// imply successful termination/execution.
return NO_ERROR;
// Exiting scope of mLock is a memory barrier and allows new thread to run
}
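In the code above, mCanCallJava splits thread creation into two branches. Both pass _threadLoop as the thread function, but they call different creation functions. Start with the branch taken when mCanCallJava is true:
Code location: system/core/include/utils/AndroidThreads.h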
// Create thread with lots of parameters
inline bool createThreadEtc(thread_func_t entryFunction,
void *userData,
const char* threadName = "android:unnamed_thread",
int32_t threadPriority = PRIORITY_DEFAULT,
size_t threadStackSize = 0,
thread_id_t *threadId = 0)
{
// Forward to androidCreateThreadEtc
return androidCreateThreadEtc(entryFunction, userData, threadName,
threadPriority, threadStackSize, threadId) ? true : false;
}
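// The code below is from Threads.cpp.
// gCreateThreadFn is a function pointer; its initial value is the same function that is used when mCanCallJava is false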
static android_create_thread_fn gCreateThreadFn = androidCreateRawThreadEtc;
int androidCreateThreadEtc(android_thread_func_t entryFunction,
void *userData,
const char* threadName,
int32_t threadPriority,
size_t threadStackSize,
android_thread_id_t *threadId)
{
return gCreateThreadFn(entryFunction, userData, threadName,
threadPriority, threadStackSize, threadId);
}
If nothing ever changed this function pointer, mCanCallJava would be a feint with no real effect. Unfortunately for that theory, some code does redirect the pointer.
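The replaceable function pointer described above can be tried in isolation. The following is a minimal sketch, not AOSP code: every name in it (createRaw, createHooked, setCreateFunc, createViaPointer, demoHook) is a hypothetical stand-in, and the "creators" call the entry function directly instead of spawning a thread so that the dispatch is easy to observe.

```cpp
#include <cassert>
#include <string>

// All names here are hypothetical stand-ins, not AOSP symbols.
using EntryFn = int (*)(void*);
using CreateFn = int (*)(EntryFn entry, void* userData);

static std::string gTrace;  // records which creation path ran

// Default creator. A real one would spawn a thread; this one just calls entry.
static int createRaw(EntryFn entry, void* userData) {
    gTrace += "raw;";
    return entry(userData);
}

// Replacement creator that a runtime could install to add work around the raw call.
static int createHooked(EntryFn entry, void* userData) {
    gTrace += "hooked;";
    return createRaw(entry, userData);
}

static CreateFn gCreateFn = createRaw;  // starts out pointing at the raw creator

void setCreateFunc(CreateFn fn) { gCreateFn = fn; }

// Callers on the "can call Java" branch always go through the pointer.
int createViaPointer(EntryFn entry, void* userData) {
    return gCreateFn(entry, userData);
}

static int addOne(void* p) { return *static_cast<int*>(p) + 1; }

// Runs the same request before and after the pointer is redirected.
std::string demoHook() {
    gTrace.clear();
    setCreateFunc(createRaw);
    int value = 41;
    int before = createViaPointer(addOne, &value);
    setCreateFunc(createHooked);
    int after = createViaPointer(addOne, &value);
    assert(before == 42 && after == 42);  // the work itself is unchanged
    return gTrace;
}
```

Calling demoHook() yields the trace "raw;hooked;raw;": the first request takes the default path, and after the pointer is redirected the same request passes through the hook before reaching the raw creator.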
1.3 How Zygote uses it
AndroidRuntime::startReg is one place that may redirect this function pointer. Code location: frameworks/base/core/jni/AndroidRuntime.cpp
/*
* Register android native functions with the VM.
*/
/*static*/ int AndroidRuntime::startReg(JNIEnv* env)
{
ATRACE_NAME("RegisterAndroidNatives");
/*
* This hook causes all future threads created in this process to be
* attached to the JavaVM. (This needs to go away in favor of JNI
* Attach calls.)
*/
// This redirects the function pointer to javaCreateThreadEtc
androidSetCreateThreadFunc((android_create_thread_fn) javaCreateThreadEtc);
ALOGV("--- registering native functions ---\n");
/*
* Every "register" function calls one or more things that return
* a local reference (e.g. FindClass). Because we haven't really
* started the VM yet, they're all getting stored in the base frame
* and never released. Use Push/Pop to manage the storage.
*/
env->PushLocalFrame(200);
if (register_jni_procs(gRegJNI, NELEM(gRegJNI), env) < 0) {
env->PopLocalFrame(NULL);
return -1;
}
env->PopLocalFrame(NULL);
//createJavaThread("fubar", quickTest, (void*) "hello");
return 0;
}
So once startReg has run, a thread created with mCanCallJava set to true goes through javaCreateThreadEtc. What is special about this function? Here is its code:
/*
* This is invoked from androidCreateThreadEtc() via the callback
* set with androidSetCreateThreadFunc().
*
* We need to create the new thread in such a way that it gets hooked
* into the VM before it really starts executing.
*/
/*static*/ int AndroidRuntime::javaCreateThreadEtc(
android_thread_func_t entryFunction,
void* userData,
const char* threadName,
int32_t threadPriority,
size_t threadStackSize,
android_thread_id_t* threadId)
{
void** args = (void**) malloc(3 * sizeof(void*)); // javaThreadShell must free
int result;
LOG_ALWAYS_FATAL_IF(threadName == nullptr, "threadName not provided to javaCreateThreadEtc");
args[0] = (void*) entryFunction;
args[1] = userData;
args[2] = (void*) strdup(threadName); // javaThreadShell must free
// Still calls androidCreateRawThreadEtc, but the thread function is replaced with javaThreadShell
result = androidCreateRawThreadEtc(AndroidRuntime::javaThreadShell, args,
threadName, threadPriority, threadStackSize, threadId);
return result;
}
/*
* When starting a native thread that will be visible from the VM, we
* bounce through this to get the right attach/detach action.
* Note that this function calls free(args)
*/
/*static*/ int AndroidRuntime::javaThreadShell(void* args) {
void* start = ((void**)args)[0];
void* userData = ((void **)args)[1];
char* name = (char*) ((void **)args)[2]; // we own this storage
free(args);
JNIEnv* env;
int result;
/* hook us into the VM */
// Attach this thread to the JNI environment so that it can call JNI functions
if (javaAttachThread(name, &env) != JNI_OK)
return -1;
/* start the thread running */
// Call the actual thread function to do the work
result = (*(android_thread_func_t)start)(userData);
/* unhook us */
// Detach from the JNI environment.
javaDetachThread();
free(name);
return result;
}
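1.4 Benefits
Is the purpose of setting mCanCallJava to true clear now? The new thread it creates will:
- attach to the JNI environment before your thread function is called, so your thread function can use JNI functions freely.
- detach from the JNI environment after the thread function returns, releasing some resources.
The second point is especially important. Before the process exits, Dalvik/ART checks for threads that attached but never detached, and if it finds one it aborts (which is not a good thing). Turning off the JNI check option skips this check, but the check is tied to resource release, so it is better to take JNI check seriously. If you create threads directly with the POSIX thread-creation functions, every thread that has attached must eventually detach.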
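The attach, run, detach ordering that javaThreadShell provides can be illustrated without a real VM. The sketch below is only an illustration under stated assumptions: FakeVm, ShellArgs, threadShell, and demoShell are hypothetical names, and FakeVm merely counts attached threads in place of the JNI AttachCurrentThread and DetachCurrentThread calls.

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical stand-in for the VM: it only tracks attach/detach calls.
struct FakeVm {
    int attached = 0;
    std::string log;
    bool attach() { ++attached; log += "attach;"; return true; }
    void detach() { --attached; log += "detach;"; }
};

using EntryFn = int (*)(void*);

struct ShellArgs {
    FakeVm* vm;
    EntryFn entry;
    void* userData;
    int result;
};

// Same shape as javaThreadShell: attach, run the real entry function, detach.
static void threadShell(ShellArgs* args) {
    if (!args->vm->attach()) {
        args->result = -1;
        return;
    }
    args->vm->log += "run;";
    args->result = args->entry(args->userData);
    args->vm->detach();  // always paired with the attach above
}

static int doubleIt(void* p) { return 2 * *static_cast<int*>(p); }

// Starts one thread through the shell and returns the recorded event order.
std::string demoShell() {
    FakeVm vm;
    int input = 21;
    ShellArgs args{&vm, doubleIt, &input, 0};
    std::thread t(threadShell, &args);
    t.join();
    assert(args.result == 42);
    assert(vm.attached == 0);  // nothing is left attached after the thread ends
    return vm.log;
}
```

demoShell() returns "attach;run;detach;". The user's function never has to know about the VM, which is the convenience the shell function buys.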
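1.5 The thread function _threadLoop
Whichever of the two branches is taken, the thread function _threadLoop is eventually called. Why not call the user's thread function directly? Is _threadLoop doing something behind the scenes?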
int Thread::_threadLoop(void* user)
{
Thread* const self = static_cast<Thread*>(user);
sp<Thread> strong(self->mHoldSelf);
wp<Thread> weak(strong);
self->mHoldSelf.clear();
#if defined(__ANDROID__)
// this is very useful for debugging with gdb
self->mTid = gettid();
#endif
bool first = true;
do {
bool result;
if (first) {
first = false;
// self is the object that derives from Thread; the first pass calls readyToRun to check whether it is ready
self->mStatus = self->readyToRun();
result = (self->mStatus == NO_ERROR);
if (result && !self->exitPending()) {
// Binder threads (and maybe others) rely on threadLoop
// running at least once after a successful ::readyToRun()
// (unless, of course, the thread has already been asked to exit
// at that point).
// This is because threads are essentially used like this:
// (new ThreadSubclass())->run();
// The caller therefore does not retain a strong reference to
// the thread and the thread would simply disappear after the
// successful ::readyToRun() call instead of entering the
// threadLoop at least once.
result = self->threadLoop();
}
} else {
// Call the threadLoop function implemented by the subclass. Note that this code runs inside a do-while loop,
// so the thread does not necessarily exit when threadLoop returns.
result = self->threadLoop();
}
// establish a scope for mLock
/*
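Conditions under which the thread exits:
1 result is false. If the subclass returns false from threadLoop, the thread exits. This is an active exit: threadLoop itself no longer wants to keep working, so it returns false. Take care never to get the return value of threadLoop wrong in your own code.
2 mExitPending is true. This variable can be set by the requestExit function of the Thread class. This is a passive exit, because the exit condition is imposed from outside.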
*/
{
Mutex::Autolock _l(self->mLock);
if (result == false || self->mExitPending) {
self->mExitPending = true;
self->mRunning = false;
// clear thread ID so that requestExitAndWait() does not exit if
// called by a new thread using the same thread ID as this one.
self->mThread = thread_id_t(-1);
// note that interested observers blocked in requestExitAndWait are
// awoken by broadcast, but blocked on mLock until break exits scope
self->mThreadExitedCondition.broadcast();
break;
}
}
// Release our strong reference, to let a chance to the thread
// to die a peaceful death.
strong.clear();
// And immediately, re-acquire a strong reference for the next loop
strong = weak.promote();
} while(strong != 0);
return 0;
}
Note: _threadLoop calls the subclass's threadLoop inside a loop, and the return value of threadLoop decides whether the thread exits.
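The two exit conditions can be reproduced in a small standalone class. This is a sketch built on std::thread rather than libutils; MiniThread, Counter, Spinner, and the demo functions are hypothetical names, and the reference-counting and readyToRun parts of the real _threadLoop are left out on purpose.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical miniature of the pattern; not the AOSP Thread class.
// join() must be called before the object is destroyed.
class MiniThread {
public:
    virtual ~MiniThread() = default;
    void run() { mThread = std::thread([this] { loop(); }); }
    void requestExit() { mExitPending.store(true); }
    void join() { if (mThread.joinable()) mThread.join(); }

protected:
    // Return true to be called again, false to stop the thread.
    virtual bool threadLoop() = 0;

private:
    void loop() {
        for (;;) {
            bool result = threadLoop();
            // Active exit (result == false) or passive exit (exit requested).
            if (!result || mExitPending.load()) break;
        }
    }
    std::thread mThread;
    std::atomic<bool> mExitPending{false};
};

// Stops on its own after three passes.
class Counter : public MiniThread {
public:
    int iterations = 0;
private:
    bool threadLoop() override {
        ++iterations;
        return iterations < 3;
    }
};

// Never stops on its own; it has to be asked to exit.
class Spinner : public MiniThread {
public:
    std::atomic<int> passes{0};
private:
    bool threadLoop() override {
        ++passes;
        std::this_thread::yield();
        return true;
    }
};

int demoActiveExit() {
    Counter c;
    c.run();
    c.join();
    return c.iterations;
}

bool demoPassiveExit() {
    Spinner s;
    s.run();
    while (s.passes.load() == 0) std::this_thread::yield();  // wait for the first pass
    s.requestExit();
    s.join();
    return s.passes.load() >= 1;
}
```

demoActiveExit() returns 3 because Counter's threadLoop returns false on its third pass. demoPassiveExit() only terminates because requestExit is called, since Spinner's threadLoop always returns true.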
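2 Commonly used synchronization classes
Synchronization is an unavoidable topic in multithreaded programming. This section only briefly introduces the synchronization classes Android provides. These classes are nothing more than object-oriented wrappers around the system's multithreading synchronization functions (which we also call the Raw API).
Android provides two ready-made synchronization classes, Mutex and Condition. They are heavyweight synchronization techniques, usually backed by support in the kernel. In addition, the OS provides simple atomic operations, which also count as a synchronization technique. The following sections introduce these three in turn.
2.1 The mutual-exclusion class Mutex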
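Mutex is the mutual-exclusion class. When several threads access the same resource, it guarantees that only one thread at a time can do so. Its implementation is simple. Note that the listing below is the Win32 branch of Threads.cpp; on other platforms, Android included, Mutex is a thin inline wrapper around pthread_mutex_t, which is why the Condition code later refers to mutex.mMutex.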
Mutex::Mutex()
{
HANDLE hMutex;
assert(sizeof(hMutex) == sizeof(mState));
hMutex = CreateMutex(NULL, FALSE, NULL);
mState = (void*) hMutex;
}
Mutex::Mutex(const char* name)
{
// XXX: name not used for now
HANDLE hMutex;
assert(sizeof(hMutex) == sizeof(mState));
hMutex = CreateMutex(NULL, FALSE, NULL);
mState = (void*) hMutex;
}
Mutex::Mutex(int type, const char* name)
{
// XXX: type and name not used for now
HANDLE hMutex;
assert(sizeof(hMutex) == sizeof(mState));
hMutex = CreateMutex(NULL, FALSE, NULL);
mState = (void*) hMutex;
}
Mutex::~Mutex()
{
CloseHandle((HANDLE) mState);
}
status_t Mutex::lock()
{
DWORD dwWaitResult;
dwWaitResult = WaitForSingleObject((HANDLE) mState, INFINITE);
return dwWaitResult != WAIT_OBJECT_0 ? -1 : NO_ERROR;
}
void Mutex::unlock()
{
if (!ReleaseMutex((HANDLE) mState))
ALOG(LOG_WARN, "thread", "WARNING: bad result from unlocking mutex\n");
}
status_t Mutex::tryLock()
{
DWORD dwWaitResult;
dwWaitResult = WaitForSingleObject((HANDLE) mState, 0);
if (dwWaitResult != WAIT_OBJECT_0 && dwWaitResult != WAIT_TIMEOUT)
ALOG(LOG_WARN, "thread", "WARNING: bad result from try-locking mutex\n");
return (dwWaitResult == WAIT_OBJECT_0) ? 0 : -1;
}
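Apart from initialization, the most important part of using Mutex is the lock and unlock functions. They are used as follows:
- To gain exclusive access to a resource, first call the Mutex's lock function. The region is then locked. If someone else has already locked the region, lock waits until the region can be entered. The system guarantees that only one thread at a time can lock successfully.
- When you have finished accessing the resource, remember to call the Mutex's unlock to release the region. Only then can another thread's lock call return successfully.
- Mutex also provides a tryLock function, which only attempts to lock the region. The caller must check the return value of tryLock to find out whether the lock was actually acquired.
Note that all of the above relates to the Raw API; these are basic concepts you should have met when learning Linux system programming. The Mutex class is certainly more convenient than the Raw API, but it is still somewhat cumbersome, because every lock must be paired with an unlock by hand.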
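For comparison with the Win32 listing, here is a minimal sketch of what a pthread-backed wrapper with a scoped guard can look like. SimpleMutex and demoTryLock are hypothetical names chosen for this illustration; the sketch follows the general shape of a pthread_mutex_t wrapper with a nested Autolock guard and makes no claim to match the libutils header line for line.

```cpp
#include <cassert>
#include <pthread.h>
#include <thread>

// Hypothetical wrapper around pthread_mutex_t, for illustration only.
class SimpleMutex {
public:
    SimpleMutex() { pthread_mutex_init(&mMutex, nullptr); }
    ~SimpleMutex() { pthread_mutex_destroy(&mMutex); }
    SimpleMutex(const SimpleMutex&) = delete;
    SimpleMutex& operator=(const SimpleMutex&) = delete;

    // pthread functions return 0 or a positive errno; negate it so 0 means success.
    int lock() { return -pthread_mutex_lock(&mMutex); }
    void unlock() { pthread_mutex_unlock(&mMutex); }
    int tryLock() { return -pthread_mutex_trylock(&mMutex); }

    // Scoped guard: locks in the constructor, unlocks in the destructor.
    class Autolock {
    public:
        explicit Autolock(SimpleMutex& m) : mLock(m) { mLock.lock(); }
        ~Autolock() { mLock.unlock(); }
    private:
        SimpleMutex& mLock;
    };

private:
    pthread_mutex_t mMutex;
};

// Checks that tryLock fails while another thread holds the lock
// and succeeds once the guard has gone out of scope.
bool demoTryLock() {
    SimpleMutex m;
    int whileHeld = 0;
    {
        SimpleMutex::Autolock guard(m);
        std::thread other([&] { whileHeld = m.tryLock(); });
        other.join();
    }  // guard is destroyed here, which unlocks m
    int afterRelease = m.tryLock();
    if (afterRelease == 0) m.unlock();
    return whileHeld != 0 && afterRelease == 0;
}
```

The scoped guard removes the need to remember unlock on every return path, which is the design reason the Thread code in this article writes Mutex::Autolock _l(mLock) instead of calling lock and unlock directly.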
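2.2 The condition class Condition
In multithreaded synchronization, the condition class corresponds to the following usage scenario:
- Thread A does initialization work, and other threads such as B and C must wait until the initialization is finished before they can work. In other words, B and C are waiting for a condition, and we call B and C the waiters.
- When thread A finishes the initialization work, it triggers the condition, and the waiters B and C are woken up. A, which triggers the condition, is the trigger.
The scenario above is intuitive, and the functions that Condition provides are named just as intuitively. Code location: system/core/include/utils/Condition.h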
class Condition {
public:
enum {
PRIVATE = 0,
SHARED = 1
};
enum WakeUpType {
WAKE_UP_ONE = 0,
WAKE_UP_ALL = 1
};
Condition();
Condition(int type); // if type is SHARED, the condition supports synchronization across processes
~Condition();
// Wait on the condition variable. Lock the mutex before calling.
// Threads B and C wait for the event; the name "wait" says exactly that
status_t wait(Mutex& mutex);
// same with relative timeout
// Timed wait for B and C: they specify a wait time, and if the condition is still not met when it elapses, the wait ends
status_t waitRelative(Mutex& mutex, nsecs_t reltime);
// Signal the condition variable, allowing exactly one thread to continue.
// Trigger A uses this to announce that the condition is met, but only one of B and C is woken
void signal();
// Signal the condition variable, allowing one or all threads to continue.
void signal(WakeUpType type) {
if (type == WAKE_UP_ONE) {
signal();
} else {
broadcast();
}
}
// Signal the condition variable, allowing all threads to continue.
// Trigger A uses this to announce that the condition is met; all waiters are woken
void broadcast();
private:
#if !defined(_WIN32)
pthread_cond_t mCond;
#else
void* mState;
#endif
};
inline Condition::Condition() {
pthread_cond_init(&mCond, NULL);
}
inline Condition::Condition(int type) {
if (type == SHARED) { // enable cross-process synchronization
pthread_condattr_t attr;
pthread_condattr_init(&attr);
pthread_condattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
pthread_cond_init(&mCond, &attr);
pthread_condattr_destroy(&attr);
} else {
pthread_cond_init(&mCond, NULL);
}
}
inline Condition::~Condition() {
pthread_cond_destroy(&mCond);
}
inline status_t Condition::wait(Mutex& mutex) {
return -pthread_cond_wait(&mCond, &mutex.mMutex);
}
inline status_t Condition::waitRelative(Mutex& mutex, nsecs_t reltime) {
#if defined(HAVE_PTHREAD_COND_TIMEDWAIT_RELATIVE)
struct timespec ts;
ts.tv_sec = reltime/1000000000;
ts.tv_nsec = reltime%1000000000;
return -pthread_cond_timedwait_relative_np(&mCond, &mutex.mMutex, &ts);
// pthread_cond_timedwait_relative_np is a non-portable extension, so systems without it take the branch below
#else // HAVE_PTHREAD_COND_TIMEDWAIT_RELATIVE
struct timespec ts;
#if defined(__linux__)
clock_gettime(CLOCK_REALTIME, &ts);
#else // __APPLE__
// we don't support the clocks here.
struct timeval t;
gettimeofday(&t, NULL);
ts.tv_sec = t.tv_sec;
ts.tv_nsec= t.tv_usec*1000;
#endif
ts.tv_sec += reltime/1000000000;
ts.tv_nsec+= reltime%1000000000;
if (ts.tv_nsec >= 1000000000) {
ts.tv_nsec -= 1000000000;
ts.tv_sec += 1;
}
return -pthread_cond_timedwait(&mCond, &mutex.mMutex, &ts);
#endif // HAVE_PTHREAD_COND_TIMEDWAIT_RELATIVE
}
inline void Condition::signal() {
/*
* POSIX says pthread_cond_signal wakes up "one or more" waiting threads.
* However bionic follows the glibc guarantee which wakes up "exactly one"
* waiting thread.
*
* man 3 pthread_cond_signal
* pthread_cond_signal restarts one of the threads that are waiting on
* the condition variable cond. If no threads are waiting on cond,
* nothing happens. If several threads are waiting on cond, exactly one
* is restarted, but it is not specified which.
*/
pthread_cond_signal(&mCond);
}
inline void Condition::broadcast() {
pthread_cond_broadcast(&mCond);
}
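As you can see, Condition is implemented entirely by calling the Raw API's pthread_cond_xxx functions. The key point to stress here is that the Condition class must be used together with a Mutex.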
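The fallback branch of waitRelative converts a relative timeout into the absolute deadline that pthread_cond_timedwait expects. That arithmetic can be isolated as follows. This is a sketch: addRelative and demoAddRelative are hypothetical helper names, and the caller is assumed to supply the current time as the base value.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Hypothetical helper: adds a relative timeout in nanoseconds to a base time.
timespec addRelative(timespec base, int64_t reltimeNs) {
    const int64_t kNsPerSec = 1000000000;
    base.tv_sec += static_cast<time_t>(reltimeNs / kNsPerSec);
    base.tv_nsec += static_cast<long>(reltimeNs % kNsPerSec);
    if (base.tv_nsec >= kNsPerSec) {  // carry the overflow into the seconds field
        base.tv_nsec -= kNsPerSec;
        base.tv_sec += 1;
    }
    return base;
}

// 10.9 s plus a 1.2 s timeout should give a deadline of 12.1 s.
bool demoAddRelative() {
    timespec base{};
    base.tv_sec = 10;
    base.tv_nsec = 900000000;
    timespec deadline = addRelative(base, 1200000000);
    return deadline.tv_sec == 12 && deadline.tv_nsec == 100000000;
}
```

The carry step matters because tv_nsec must stay below one second; without it the example would produce 11 s and 1,100,000,000 ns, which is not a valid timespec.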
In code that uses Condition, calls to wait, waitRelative, signal, and broadcast are all placed between the lock and unlock of a Mutex. For wait and waitRelative in particular, this is mandatory.
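The A/B/C scenario can be written directly against the Raw API. The sketch below uses hypothetical names (InitGate, waitForInit, finishInit, demoInitGate) and plain pthread calls in place of the Condition and Mutex classes; the ready flag is the real condition, and the condition variable only carries the wakeup.

```cpp
#include <cassert>
#include <pthread.h>
#include <thread>

// Hypothetical shared state for the scenario; not part of libutils.
struct InitGate {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool ready = false;  // the actual condition, protected by mutex
    int woken = 0;       // how many waiters got past the gate

    InitGate() {
        pthread_mutex_init(&mutex, nullptr);
        pthread_cond_init(&cond, nullptr);
    }
    ~InitGate() {
        pthread_cond_destroy(&cond);
        pthread_mutex_destroy(&mutex);
    }
};

// Waiter (threads B and C): block until initialization is done.
static void waitForInit(InitGate* g) {
    pthread_mutex_lock(&g->mutex);
    while (!g->ready) {  // re-check the flag: wait may return spuriously
        pthread_cond_wait(&g->cond, &g->mutex);
    }
    ++g->woken;
    pthread_mutex_unlock(&g->mutex);
}

// Trigger (thread A): finish initialization and wake every waiter.
static void finishInit(InitGate* g) {
    pthread_mutex_lock(&g->mutex);
    g->ready = true;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->mutex);
}

int demoInitGate() {
    InitGate gate;
    std::thread b(waitForInit, &gate);
    std::thread c(waitForInit, &gate);
    std::thread a(finishInit, &gate);
    a.join();
    b.join();
    c.join();
    return gate.woken;
}
```

demoInitGate() returns 2 regardless of scheduling order: a waiter that arrives after the broadcast sees ready already set and never blocks, which is why the flag is checked under the mutex instead of relying on the wakeup alone.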
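Here is a real example to reinforce how Condition and Mutex are used together. It is requestExitAndWait from the Thread class, whose purpose is to wait for the worker thread to exit. The code is as follows: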
status_t Thread::requestExitAndWait()
{
Mutex::Autolock _l(mLock); // Autolock locks mLock here
if (mThread == getThreadId()) {
ALOGW(
"Thread (this=%p): don't call waitForExit() from this "
"Thread object's thread. It's a guaranteed deadlock!",
this);
return WOULD_BLOCK;
}
mExitPending = true;
while (mRunning == true) {
// Wait on the condition variable. Why re-check mRunning in a while loop? Because wait can sometimes return even though the condition has not been triggered.
mThreadExitedCondition.wait(mLock);
}
// This next line is probably not needed any more, but is being left for
// historical reference. Note that each interested party will clear flag.
mExitPending = false;
// Before returning, the destructor of the local Mutex::Autolock _l runs, so unlock is called automatically.
return mStatus;
}
So where is this condition triggered? Just before the worker thread exits, as the following code shows:
int Thread::_threadLoop(void* user)
{
Thread* const self = static_cast<Thread*>(user);
sp<Thread> strong(self->mHoldSelf);
wp<Thread> weak(strong);
self->mHoldSelf.clear();
#if defined(__ANDROID__)
// this is very useful for debugging with gdb
self->mTid = gettid();
#endif
bool first = true;
do {
bool result;
if (first) {
first = false;
self->mStatus = self->readyToRun();
result = (self->mStatus == NO_ERROR);
// call the subclass's threadLoop function
if (result && !self->exitPending()) {
// Binder threads (and maybe others) rely on threadLoop
// running at least once after a successful ::readyToRun()
// (unless, of course, the thread has already been asked to exit
// at that point).
// This is because threads are essentially used like this:
// (new ThreadSubclass())->run();
// The caller therefore does not retain a strong reference to
// the thread and the thread would simply disappear after the
// successful ::readyToRun() call instead of entering the
// threadLoop at least once.
result = self->threadLoop();
}
} else {
result = self->threadLoop();
}
// establish a scope for mLock
{
Mutex::Autolock _l(self->mLock);
// exit if threadLoop returned false or mExitPending is true
if (result == false || self->mExitPending) {
self->mExitPending = true;
// mRunning is modified while the lock is held
self->mRunning = false;
// clear thread ID so that requestExitAndWait() does not exit if
// called by a new thread using the same thread ID as this one.
self->mThread = thread_id_t(-1);
// note that interested observers blocked in requestExitAndWait are
// awoken by broadcast, but blocked on mLock until break exits scope
self->mThreadExitedCondition.broadcast();
break; // leave the loop; the thread function returns after this
}
}
// Release our strong reference, to let a chance to the thread
// to die a peaceful death.
strong.clear();
// And immediately, re-acquire a strong reference for the next loop
strong = weak.promote();
} while(strong != 0);
return 0;
}
That is all for Android's multithreading synchronization classes for now.
2.3 Atomic operation functions
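What is an atomic operation? It is an operation that cannot be interrupted by any other task or event before it completes; in other words, an atomic operation is the smallest unit of execution. Consider first the following code, which protects a shared counter with a Mutex: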
static int g_flag = 0; // global variable g_flag
static Mutex lock; // global lock
// thread 1 runs thread1
void thread1()
{
// decrement g_flag, taking the lock before each operation
lock.lock();
g_flag--;
lock.unlock();
}
// thread 2 runs thread2
void thread2()
{
lock.lock();
g_flag++; // thread 2 increments g_flag and must take the lock before each operation
lock.unlock();
}
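Why is the Mutex needed here? Because neither g_flag++ nor g_flag-- is an atomic operation. At the assembly level, a single C/C++ statement corresponds to several instructions. Taking g_flag++ as an example, it may compile to the following three instructions:
- Load the data from memory into a register.
- Increment the value in the register; the result stays in the register.
- Write the register's result back to memory.
Executed back to back in order, these three instructions cause no problem. With multiple threads that is no longer guaranteed: another thread can run between the load and the write-back, and one of the updates is lost. The usual remedy is to protect the operation with a Mutex, but using a Mutex is more complicated than the thing it protects. For example, taking a lock can cause a transition from user mode to kernel mode, which is a considerable cost. Is there a simpler way to keep additions, subtractions, and similar operations from being interrupted?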
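Android provides atomic operation functions for this. Source location: system/core/include/utils/Atomic.h (this header includes cutils/atomic.h, where the functions below are defined)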
// Every function below returns the old value from before the operation
// Atomic increment and atomic decrement
ANDROID_ATOMIC_INLINE
int32_t android_atomic_inc(volatile int32_t* addr)
{
volatile atomic_int_least32_t* a = (volatile atomic_int_least32_t*)addr;
/* Int32_t, if it exists, is the same as int_least32_t. */
return atomic_fetch_add_explicit(a, 1, memory_order_release);
}
ANDROID_ATOMIC_INLINE
int32_t android_atomic_dec(volatile int32_t* addr)
{
volatile atomic_int_least32_t* a = (volatile atomic_int_least32_t*)addr;
return atomic_fetch_sub_explicit(a, 1, memory_order_release);
}
// Atomic addition; value is the addend
ANDROID_ATOMIC_INLINE
int32_t android_atomic_add(int32_t value, volatile int32_t* addr)
{
volatile atomic_int_least32_t* a = (volatile atomic_int_least32_t*)addr;
return atomic_fetch_add_explicit(a, value, memory_order_release);
}
// Atomic AND and OR
ANDROID_ATOMIC_INLINE
int32_t android_atomic_and(int32_t value, volatile int32_t* addr)
{
volatile atomic_int_least32_t* a = (volatile atomic_int_least32_t*)addr;
return atomic_fetch_and_explicit(a, value, memory_order_release);
}
ANDROID_ATOMIC_INLINE
int32_t android_atomic_or(int32_t value, volatile int32_t* addr)
{
volatile atomic_int_least32_t* a = (volatile atomic_int_least32_t*)addr;
return atomic_fetch_or_explicit(a, value, memory_order_release);
}
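The same idea can be tried outside the Android tree with C++11 std::atomic, whose fetch_add and fetch_sub also return the value from before the operation. The sketch below is not the Android API: gFlag, kIterations, and the demo functions are hypothetical names, and it reworks the earlier g_flag example so that no Mutex is needed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical counterpart of g_flag, now an atomic instead of a locked int.
static std::atomic<int32_t> gFlag{0};
static const int kIterations = 100000;

// One thread decrements and one increments the same counter, with no lock.
int32_t demoAtomicCounter() {
    gFlag.store(0);
    std::thread dec([] {
        for (int i = 0; i < kIterations; ++i) {
            gFlag.fetch_sub(1, std::memory_order_release);
        }
    });
    std::thread inc([] {
        for (int i = 0; i < kIterations; ++i) {
            gFlag.fetch_add(1, std::memory_order_release);
        }
    });
    dec.join();
    inc.join();
    return gFlag.load();
}

// fetch_add hands back the value from before the addition.
int32_t demoReturnsOldValue() {
    std::atomic<int32_t> v{5};
    int32_t old = v.fetch_add(1, std::memory_order_release);
    assert(v.load() == 6);
    return old;
}
```

demoAtomicCounter() returns 0 because every read-modify-write is indivisible, so no increment or decrement is lost. demoReturnsOldValue() returns 5, the value before the addition, matching the "returns the old value" convention noted for the Android functions.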