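Introduction
(This article was originally published on my blog, CheapTalks; you are welcome to take a look.)
Note: all of the source code and comments in this article can be found in YogiAi/Process.start, so readers who only want to read the code can go straight there.
As is well known, Android is a mobile operating system built on the Linux kernel. Linux creates processes with fork: the fork itself only does lightweight work such as assigning a unique identifier, and memory is actually duplicated lazily through copy-on-write. In Linux, processes and threads differ only in which resources they share; the kernel implements both with the same task_struct structure, and each task_struct carries its own PID and PPID.
In Android's upper-level framework world, the notion of a process is wrapped in so many layers that it has become blurry. To ship a decent app, most developers only need to know what the four major components (Activity, Service, BroadcastReceiver, and ContentProvider) do and when to use them, plus business logic assembled from network access, data storage, UI rendering, and so on.
Even so, studying when and how Android starts a process is well worth it for advanced learners: you pick up fundamentals about processes and threads, and you get to see how skilled designers layer their abstractions. An Android application process is created by forking the Zygote process, so the PPID of every application process is Zygote's PID. Because the child is a copy of Zygote, it gets a virtual machine instance; in addition, the new process gets a message loop, a Binder thread pool for inter-process communication, and a Binder main thread.
In this article I walk through the Android process creation path in detail.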
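The system_server side
AMS.startProcessLocked
ActivityManagerService (AMS) runs in the system_server process and provides services to applications. The system may need to start a new process when it sends a broadcast, starts a Service, launches an Activity, and so on. In those cases a binder call reaches AMS, which handles it in startProcessLocked.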
final ProcessRecord startProcessLocked(String processName,
ApplicationInfo info, boolean knownToBeDead, int intentFlags,
String hostingType, ComponentName hostingName, boolean allowWhileBooting) {
...
if (app == null) {
app = newProcessRecordLocked(null, info, processName);
mProcessNames.put(processName, info.uid, app);
} else {
// If this is a new package in the process, add the package to the list
app.addPackage(info.packageName);
}
...
startProcessLocked(app, hostingType, hostingNameStr);
return (app.pid != 0) ? app : null;
}
private final void startProcessLocked(ProcessRecord app,
String hostingType, String hostingNameStr) {
...
try {
// Get the user ID and group IDs for the application process being created
int uid = app.info.uid;
int[] gids = null;
try {
gids = mContext.getPackageManager().getPackageGids(
app.info.packageName);
} catch (PackageManager.NameNotFoundException e) {
Slog.w(TAG, "Unable to retrieve gids", e);
}
...
// Create the process and specify ActivityThread's static main method as its entry point
int pid = Process.start("android.app.ActivityThread",
mSimpleProcessManagement ? app.processName : null, uid, uid,
gids, debugFlags, null);
...
}
Process.start
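Process.start is the entry point for starting a process. The start method itself only prepares the arguments; the actual process duplication happens in zygote. system_server and zygote communicate over a socket: system_server writes the arguments to the socket and then blocks until zygote replies.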
public static final int start(final String processClass,
final String niceName,
int uid, int gid, int[] gids,
int debugFlags,
String[] zygoteArgs)
{
// Check whether the platform supports multiple processes
if (supportsProcesses()) {
try {
// If it does, ask Zygote to create the application process
return startViaZygote(processClass, niceName, uid, gid, gids,
debugFlags, zygoteArgs);
} catch (ZygoteStartFailedEx ex) {
Log.e(LOG_TAG,
"Starting VM process through Zygote failed");
throw new RuntimeException(
"Starting VM process through Zygote failed", ex);
}
} else {
// Running in single-process mode
// If it does not, use a thread to simulate the process
Runnable runnable = new Runnable() {
public void run() {
Process.invokeStaticMain(processClass);
}
};
// Thread constructors must not be called with null names (see spec).
if (niceName != null) {
new Thread(runnable, niceName).start();
} else {
new Thread(runnable).start();
}
return 0;
}
}
private static int startViaZygote(final String processClass,
final String niceName,
final int uid, final int gid,
final int[] gids,
int debugFlags,
String[] extraArgs)
throws ZygoteStartFailedEx {
int pid;
synchronized(Process.class) {
// Build the argument list used to start the process
ArrayList<String> argsForZygote = new ArrayList<String>();
...
// The argument list is complete
pid = zygoteSendArgsAndGetPid(argsForZygote);
}
...
return pid;
}
private static int zygoteSendArgsAndGetPid(ArrayList<String> args)
throws ZygoteStartFailedEx {
int pid;
// Open a LocalSocket connection to Zygote if one is not already open
openZygoteSocketIfNeeded();
try {
// Write the launch arguments for the new application process to the LocalSocket
sZygoteWriter.write(Integer.toString(args.size()));
sZygoteWriter.newLine();
int sz = args.size();
for (int i = 0; i < sz; i++) {
String arg = args.get(i);
if (arg.indexOf('\n') >= 0) {
throw new ZygoteStartFailedEx(
"embedded newlines not allowed");
}
sZygoteWriter.write(arg);
sZygoteWriter.newLine();
}
sZygoteWriter.flush();
// Should there be a timeout on this?
// Read the PID of the process Zygote created from the socket
// The peer handles the request in ZygoteInit.runSelectLoopMode
// After a successful read the PID is checked; if it is valid, the method returns
pid = sZygoteInputStream.readInt();
if (pid < 0) {
throw new ZygoteStartFailedEx("fork() failed");
}
} catch (IOException ex) {
...
}
return pid;
}
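The request format that zygoteSendArgsAndGetPid writes is simple enough to sketch on its own: one line with the argument count, then one argument per line, with embedded newlines rejected. The following is a minimal illustration of that framing, not framework code. The class ZygoteWireSketch, its decode counterpart, and the placeholder arguments are assumptions made for the example; it also uses an explicit '\n' where the original calls newLine().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Android framework.
final class ZygoteWireSketch {
    // Encodes the request as a count line followed by one argument per line.
    static String encode(List<String> args) {
        StringBuilder out = new StringBuilder();
        out.append(args.size()).append('\n');
        for (String arg : args) {
            if (arg.indexOf('\n') >= 0) {
                // Same rule as the original: a newline would corrupt the framing.
                throw new IllegalArgumentException("embedded newlines not allowed");
            }
            out.append(arg).append('\n');
        }
        return out.toString();
    }

    // Decodes what encode() produced, the way a receiving side could.
    static List<String> decode(String wire) {
        String[] lines = wire.split("\n", -1);
        int count = Integer.parseInt(lines[0]);
        if (count < 0 || lines.length < count + 1) {
            throw new IllegalStateException("truncated request");
        }
        List<String> args = new ArrayList<>(count);
        for (int i = 1; i <= count; i++) {
            args.add(lines[i]);
        }
        return args;
    }

    public static void main(String[] unused) {
        // Placeholder arguments; the real argument list is elided in the article.
        String wire = encode(List.of("alpha", "beta"));
        System.out.print(wire);
        System.out.println(decode(wire));
    }
}
```

Because the count comes first, the reader knows exactly how many lines to consume before it starts parsing, which is why a single stray newline inside an argument has to be rejected up front.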
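The zygote side
ZygoteInit.main
The server side of zygote's socket is created and initialized here, and requests from socket clients are handled as they arrive. In essence, zygote runs an infinite loop that services client requests, so system_server and zygote form a typical client/server pair.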
public static void main(String argv[]) {
try {
VMRuntime.getRuntime().setMinimumHeapSize(5 * 1024 * 1024);
// Start profiling the zygote initialization.
SamplingProfilerIntegration.start();
// Register a server socket in Zygote, used to receive process creation requests
registerZygoteSocket();
...
if (ZYGOTE_FORK_MODE) {
runForkMode();
} else {
// Start handling process creation requests from the socket
runSelectLoopMode();
}
closeServerSocket();
} catch (MethodAndArgsCaller caller) {
// ActivityThread's static main method is invoked from here
// The call is made indirectly: throwing and catching an exception unwinds the earlier call stack first
caller.run();
} catch (RuntimeException ex) {
Log.e(TAG, "Zygote died with exception", ex);
closeServerSocket();
throw ex;
}
}
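The catch (MethodAndArgsCaller caller) branch above is the interesting part: the entry point is not called from deep inside the setup code, but carried up the stack inside an exception and invoked from the top frame. Below is a minimal sketch of that trick under simplified assumptions. TrampolineSketch, Target, Caller, and deepSetup are hypothetical names, Target stands in for ActivityThread, and the recursion merely simulates nested initialization frames.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical demo of the "throw to unwind, then call" pattern.
final class TrampolineSketch {
    // Set by Target.main: were any deepSetup frames still on the stack?
    static boolean setupFramesVisible;

    // Stand-in for the real entry class; it only inspects its own call stack.
    static final class Target {
        public static void main(String[] args) {
            setupFramesVisible = false;
            for (StackTraceElement frame : new Throwable().getStackTrace()) {
                if (frame.getMethodName().equals("deepSetup")) {
                    setupFramesVisible = true;
                }
            }
        }
    }

    // Plays the role of MethodAndArgsCaller: an exception that carries the call to make.
    static final class Caller extends Exception implements Runnable {
        private final Method method;
        private final String[] args;

        Caller(Method method, String[] args) {
            this.method = method;
            this.args = args;
        }

        @Override
        public void run() {
            try {
                method.invoke(null, (Object) args);
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new RuntimeException(e);
            }
        }
    }

    // Simulates nested initialization; at the bottom it either throws or calls directly.
    static void deepSetup(int depth, boolean viaException) throws Caller {
        if (depth > 0) {
            deepSetup(depth - 1, viaException);
            return;
        }
        Method main;
        try {
            main = Target.class.getMethod("main", String[].class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
        Caller caller = new Caller(main, new String[0]);
        if (viaException) {
            throw caller;   // unwind every deepSetup frame first
        }
        caller.run();       // direct call: deepSetup frames stay underneath main
    }

    // Returns true if Target.main could still see deepSetup frames on its stack.
    static boolean launch(boolean viaException) {
        try {
            deepSetup(5, viaException);
        } catch (Caller caller) {
            caller.run();   // invoked from the top-level frame
        }
        return setupFramesVisible;
    }

    public static void main(String[] args) {
        System.out.println("direct call sees setup frames: " + launch(false));
        System.out.println("via exception sees setup frames: " + launch(true));
    }
}
```

In the direct variant the entry point runs on top of all the setup frames; in the exception variant those frames are gone by the time it runs, which is the effect the comment in ZygoteInit.main describes.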
private static void runSelectLoopMode() throws MethodAndArgsCaller {
ArrayList<FileDescriptor> fds = new ArrayList();
ArrayList<ZygoteConnection> peers = new ArrayList();
FileDescriptor[] fdArray = new FileDescriptor[4];
fds.add(sServerSocket.getFileDescriptor());
peers.add(null);
int loopCount = GC_LOOP_COUNT;
while (true) {
...
if (index < 0) {
throw new RuntimeException("Error in select()");
} else if (index == 0) {
// A new connection for process creation requests
ZygoteConnection newPeer = acceptCommandPeer();
peers.add(newPeer);
fds.add(newPeer.getFileDesciptor());
} else {
boolean done;
// Handle a request from this connection
done = peers.get(index).runOnce();
if (done) {
peers.remove(index);
fds.remove(index);
}
}
}
}
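The bookkeeping in runSelectLoopMode relies on two parallel lists whose slot 0 is the listening socket (with a null peer), so the index reported by select decides between "accept a new connection" and "serve an existing one". Here is a small model of just that dispatch step. It involves no real sockets; SelectLoopSketch, Peer, and the file descriptor numbers are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the dispatch step; no real sockets or select() involved.
final class SelectLoopSketch {
    // Stand-in for ZygoteConnection.
    static final class Peer {
        final int fd;
        private int remainingRequests;

        Peer(int fd, int remainingRequests) {
            this.fd = fd;
            this.remainingRequests = remainingRequests;
        }

        // Handles one request; returns true once the connection is finished.
        boolean runOnce() {
            remainingRequests--;
            return remainingRequests <= 0;
        }
    }

    // Parallel lists: fds.get(i) belongs to peers.get(i).
    final List<Integer> fds = new ArrayList<>();
    final List<Peer> peers = new ArrayList<>();
    private final int requestsPerPeer;
    private int nextFd;

    SelectLoopSketch(int serverFd, int requestsPerPeer) {
        this.requestsPerPeer = requestsPerPeer;
        this.nextFd = serverFd + 1;
        fds.add(serverFd);   // slot 0 is the listening socket...
        peers.add(null);     // ...which has no peer object
    }

    // Dispatches one readiness event, given the index that select() reported.
    void onReady(int index) {
        if (index < 0) {
            throw new RuntimeException("Error in select()");
        } else if (index == 0) {
            Peer newPeer = new Peer(nextFd++, requestsPerPeer);
            peers.add(newPeer);
            fds.add(newPeer.fd);
        } else if (peers.get(index).runOnce()) {
            peers.remove(index);
            fds.remove(index);   // int argument: removes by position, not by value
        }
    }

    public static void main(String[] args) {
        SelectLoopSketch loop = new SelectLoopSketch(3, 1);
        loop.onReady(0);                 // accept a connection
        System.out.println(loop.fds);
        loop.onReady(1);                 // serve it; it finishes after one request
        System.out.println(loop.fds);
    }
}
```

Keeping the two lists in lockstep is what lets a single index identify both the descriptor to poll and the connection object that should handle it.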
ZygoteConnection.runOnce
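The zygote process performs three main operations here:
- Call Zygote.forkAndSpecialize to duplicate the process
- Call handleChildProc to initialize the new process's resources, for example creating the Binder thread pool and starting the main-thread message queue
- Call handleParentProc to send the new process's PID back to system_server as the result of the request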
boolean runOnce() throws ZygoteInit.MethodAndArgsCaller {
String args[];
Arguments parsedArgs = null;
FileDescriptor[] descriptors;
try {
// Read the launch arguments
args = readArgumentList();
descriptors = mSocket.getAncillaryFileDescriptors();
} catch (IOException ex) {
Log.w(TAG, "IOException on command socket " + ex.getMessage());
closeSocket();
return true;
}
...
int pid;
try {
// Wrap the String array in an Arguments object
parsedArgs = new Arguments(args);
applyUidSecurityPolicy(parsedArgs, peer);
applyDebuggerSecurityPolicy(parsedArgs);
applyRlimitSecurityPolicy(parsedArgs, peer);
applyCapabilitiesSecurityPolicy(parsedArgs, peer);
int[][] rlimits = null;
if (parsedArgs.rlimits != null) {
rlimits = parsedArgs.rlimits.toArray(intArray2d);
}
// The fork itself
// Two processes return from this call
// pid == 0 means we are in the child
pid = Zygote.forkAndSpecialize(parsedArgs.uid, parsedArgs.gid,
parsedArgs.gids, parsedArgs.debugFlags, rlimits);
} catch (IllegalArgumentException ex) {
logAndPrintError (newStderr, "Invalid zygote arguments", ex);
pid = -1;
} catch (ZygoteSecurityException ex) {
logAndPrintError(newStderr,
"Zygote security policy prevents request: ", ex);
pid = -1;
}
if (pid == 0) {
// in child
// The newly created process
handleChildProc(parsedArgs, descriptors, newStderr);
// should never happen
return true;
} else { /* pid != 0 */
// in parent...pid of < 0 means failure
// The parent process continues here
return handleParentProc(pid, descriptors, parsedArgs);
}
}
private void handleChildProc(Arguments parsedArgs,
FileDescriptor[] descriptors, PrintStream newStderr)
throws ZygoteInit.MethodAndArgsCaller {
...
if (parsedArgs.runtimeInit) {
// Initialize the runtime in the new application process and create a Binder thread pool
RuntimeInit.zygoteInit(parsedArgs.remainingArgs);
} else {
ClassLoader cloader;
// Pick the class loader
if (parsedArgs.classpath != null) {
cloader
= new PathClassLoader(parsedArgs.classpath,
ClassLoader.getSystemClassLoader());
} else {
cloader = ClassLoader.getSystemClassLoader();
}
// Read the entry class name (ActivityThread)
String className;
try {
className = parsedArgs.remainingArgs[0];
} catch (ArrayIndexOutOfBoundsException ex) {
logAndPrintError (newStderr,
"Missing required class name argument", null);
return;
}
String[] mainArgs
= new String[parsedArgs.remainingArgs.length - 1];
System.arraycopy(parsedArgs.remainingArgs, 1,
mainArgs, 0, mainArgs.length);
try {
// Invoke ActivityThread.main
ZygoteInit.invokeStaticMain(cloader, className, mainArgs);
} catch (RuntimeException ex) {
logAndPrintError (newStderr, "Error starting. ", ex);
}
}
}
private boolean handleParentProc(int pid,
FileDescriptor[] descriptors, Arguments parsedArgs) {
...
try {
// Report the result to the peer over the socket by writing the PID of the new process
mSocketOutStream.writeInt(pid);
} catch (IOException ex) {
Log.e(TAG, "Error reading from command socket", ex);
return true;
}
...
return false;
}
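The reply half of the protocol is a single int: handleParentProc writes the PID with writeInt, and zygoteSendArgsAndGetPid reads it with readInt and treats a negative value as a failed fork. A rough sketch of that round trip, using in-memory streams in place of the socket, might look like the following. PidReplySketch and its method names are invented for the example, and the unchecked exception types are a simplification.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration; byte arrays stand in for the zygote socket.
final class PidReplySketch {
    // Zygote (parent) side: report the fork result as a 4-byte big-endian int.
    static byte[] writeReply(int pid) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(pid);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Requester side: read the PID and treat a negative value as failure.
    static int readReply(byte[] reply) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(reply))) {
            int pid = in.readInt();
            if (pid < 0) {
                throw new IllegalStateException("fork() failed");
            }
            return pid;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readReply(writeReply(1234)));
    }
}
```

Note that the parent writes the PID even when it is negative, so the requester never waits forever on a failed fork; it simply turns the negative value into an error.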
Zygote.forkAndSpecialize
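Process creation here also comes down to three steps. Note that this excerpt takes more parameters than the call shown in runOnce above, which suggests the two snippets come from different Android releases.
- Call ZygoteHooks.preFork to stop the daemon threads that were started after the previous fork
- Call nativeForkAndSpecialize to create the new process in the C++ layer
- Call ZygoteHooks.postForkCommon to start the daemon threads again in both the parent and the child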
public static int forkAndSpecialize(int uid, int gid, int[] gids, int runtimeFlags,
int[][] rlimits, int mountExternal, String seInfo, String niceName, int[] fdsToClose,
int[] fdsToIgnore, String instructionSet, String appDataDir) {
VM_HOOKS.preFork();
// Resets nice priority for zygote process.
resetNicePriority();
// Enter the native layer to perform the fork
int pid = nativeForkAndSpecialize(
uid, gid, gids, runtimeFlags, rlimits, mountExternal, seInfo, niceName, fdsToClose,
fdsToIgnore, instructionSet, appDataDir);
...
VM_HOOKS.postForkCommon();
return pid;
}
The native layer
Zygote.cpp
Start with how the process is duplicated.
static jint com_android_internal_os_Zygote_nativeForkAndSpecialize(
JNIEnv* env, jclass, jint uid, jint gid, jintArray gids,
jint runtime_flags, jobjectArray rlimits,
jint mount_external, jstring se_info, jstring se_name,
jintArray fdsToClose,
jintArray fdsToIgnore,
jstring instructionSet, jstring appDataDir) {
...
return ForkAndSpecializeCommon(env, uid, gid, gids, runtime_flags,
rlimits, capabilities, capabilities, mount_external, se_info,
se_name, false, fdsToClose, fdsToIgnore, instructionSet, appDataDir);
}
// Utility routine to fork zygote and specialize the child process.
static pid_t ForkAndSpecializeCommon(JNIEnv* env, uid_t uid, gid_t gid, jintArray javaGids,
jint runtime_flags, jobjectArray javaRlimits,
jlong permittedCapabilities, jlong effectiveCapabilities,
jint mount_external,
jstring java_se_info, jstring java_se_name,
bool is_system_server, jintArray fdsToClose,
jintArray fdsToIgnore,
jstring instructionSet, jstring dataDir) {
SetSigChldHandler();
sigset_t sigchld;
sigemptyset(&sigchld);
sigaddset(&sigchld, SIGCHLD);
...
// Perform the fork
pid_t pid = fork();
if (pid == 0) {
// Child process
PreApplicationInit();
// Clean up any descriptors which must be closed immediately
DetachDescriptors(env, fdsToClose);
...
// Keep capabilities across UID change, unless we're staying root.
if (uid != 0) {
EnableKeepCapabilities(env);
}
SetInheritable(env, permittedCapabilities);
DropCapabilitiesBoundingSet(env);
...
if (!is_system_server) {
// Any process other than system_server needs its own process group
int rc = createProcessGroup(uid, getpid());
if (rc != 0) {
if (rc == -EROFS) {
ALOGW("createProcessGroup failed, kernel missing CONFIG_CGROUP_CPUACCT?");
} else {
ALOGE("createProcessGroup(%d, %d) failed: %s", uid, pid, strerror(-rc));
}
}
}
SetGids(env, javaGids);
SetRLimits(env, javaRlimits);
...
SetCapabilities(env, permittedCapabilities, effectiveCapabilities, permittedCapabilities);
// Set the process scheduling policy
SetSchedulerPolicy(env);
} else if (pid > 0) {
// The Zygote (parent) process runs this branch
...
}
return pid;
}
fork.cpp
fork calls clone to duplicate the process; once clone returns, the parent and the child each run their own atfork callbacks.
int fork() {
__bionic_atfork_run_prepare();
pthread_internal_t* self = __get_thread();
int result = clone(nullptr,
nullptr,
(CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD),
nullptr,
nullptr,
nullptr,
&(self->tid));
if (result == 0) {
// Update the cached pid, since clone() will not set it directly (as
// self->tid is updated by the kernel).
self->set_cached_pid(gettid());
// Run the child-side callbacks; see pthread_atfork.cpp for the code
__bionic_atfork_run_child();
} else {
__bionic_atfork_run_parent();
}
return result;
}
pthread_atfork.cpp
template<typename F>
void walk_forward(F f) {
for (atfork_t* it = first_; it != nullptr; it = it->next) {
f(it);
}
}
void __bionic_atfork_run_prepare() {
// We lock the atfork list here, unlock it in the parent, and reset it in the child.
// This ensures that nobody can modify the handler array between the calls
// to the prepare and parent/child handlers.
pthread_mutex_lock(&g_atfork_list_mutex);
// Call pthread_atfork() prepare handlers. POSIX states that the prepare
// handlers should be called in the reverse order of the parent/child
// handlers, so we iterate backwards.
g_atfork_list.walk_backwards([](atfork_t* it) {
if (it->prepare != nullptr) {
it->prepare();
}
});
}
void __bionic_atfork_run_child() {
g_atfork_list_mutex = PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP;
pthread_mutex_lock(&g_atfork_list_mutex);
g_atfork_list.walk_forward([](atfork_t* it) {
if (it->child != nullptr) {
it->child();
}
});
pthread_mutex_unlock(&g_atfork_list_mutex);
}
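The ordering rule in the comments above (prepare handlers run in reverse registration order, parent and child handlers in registration order) is easy to lose in the C++ template code. As a rough model in Java, assuming a hypothetical AtforkSketch registry that has nothing to do with bionic's actual data structures, it could be written like this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of pthread_atfork handler ordering; not bionic code.
final class AtforkSketch {
    private static final class Handlers {
        final Runnable prepare;
        final Runnable parent;
        final Runnable child;

        Handlers(Runnable prepare, Runnable parent, Runnable child) {
            this.prepare = prepare;
            this.parent = parent;
            this.child = child;
        }
    }

    private final List<Handlers> handlers = new ArrayList<>();

    // Any of the three callbacks may be null, as with pthread_atfork.
    void register(Runnable prepare, Runnable parent, Runnable child) {
        handlers.add(new Handlers(prepare, parent, child));
    }

    // Before the fork: newest registration first.
    void runPrepare() {
        for (int i = handlers.size() - 1; i >= 0; i--) {
            if (handlers.get(i).prepare != null) {
                handlers.get(i).prepare.run();
            }
        }
    }

    // After the fork, in the parent: oldest registration first.
    void runParent() {
        for (Handlers h : handlers) {
            if (h.parent != null) {
                h.parent.run();
            }
        }
    }

    // After the fork, in the child: oldest registration first.
    void runChild() {
        for (Handlers h : handlers) {
            if (h.child != null) {
                h.child.run();
            }
        }
    }

    public static void main(String[] args) {
        AtforkSketch atfork = new AtforkSketch();
        atfork.register(() -> System.out.println("A.prepare"), null, () -> System.out.println("A.child"));
        atfork.register(() -> System.out.println("B.prepare"), null, () -> System.out.println("B.child"));
        atfork.runPrepare();
        atfork.runChild();
    }
}
```

The reverse order matters when prepare handlers take locks: acquiring them newest-first and releasing them oldest-first mirrors the usual nesting discipline for resources that depend on each other.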
ZygoteHooks.cc
Next, look at what the virtual machine does before and after the fork.
static jlong ZygoteHooks_nativePreFork(JNIEnv* env, jclass) {
Runtime* runtime = Runtime::Current();
CHECK(runtime->IsZygote()) << "runtime instance not started with -Xzygote";
runtime->PreZygoteFork();
// Grab thread before fork potentially makes Thread::pthread_key_self_ unusable.
return reinterpret_cast<jlong>(ThreadForEnv(env));
}
static void ZygoteHooks_nativePostForkChild(JNIEnv* env, jclass, jlong token, jint debug_flags,
jstring instruction_set) {
Thread* thread = reinterpret_cast<Thread*>(token);
// Our system thread ID, etc, has changed so reset Thread state.
thread->InitAfterFork();
EnableDebugFeatures(debug_flags);
if (instruction_set != nullptr) {
ScopedUtfChars isa_string(env, instruction_set);
InstructionSet isa = GetInstructionSetFromString(isa_string.c_str());
Runtime::NativeBridgeAction action = Runtime::NativeBridgeAction::kUnload;
if (isa != kNone && isa != kRuntimeISA) {
action = Runtime::NativeBridgeAction::kInitialize;
}
Runtime::Current()->DidForkFromZygote(env, action, isa_string.c_str());
} else {
Runtime::Current()->DidForkFromZygote(env, Runtime::NativeBridgeAction::kUnload, nullptr);
}
}
runtime.cc
At this point the new process has been created and its environment has been initialized.
void Runtime::PreZygoteFork() {
heap_->PreZygoteFork();
}
void Runtime::DidForkFromZygote(JNIEnv* env, NativeBridgeAction action, const char* isa) {
is_zygote_ = false;
if (is_native_bridge_loaded_) {
switch (action) {
case NativeBridgeAction::kUnload:
UnloadNativeBridge();
is_native_bridge_loaded_ = false;
break;
case NativeBridgeAction::kInitialize:
// Initialize the native bridge (the library that bridges to another instruction set)
InitializeNativeBridge(env, isa);
break;
}
}
// Create the thread pools.
// Create the thread pool used for Java heap work
heap_->CreateThreadPool();
if (jit_options_.get() != nullptr && jit_.get() == nullptr) {
// Create the JIT if the flag is set and we haven't already created it (happens for run-tests).
CreateJit();
jit_->CreateInstrumentationCache(jit_options_->GetCompileThreshold());
jit_->CreateThreadPool();
}
// Set up signal handling by starting the signal catcher
StartSignalCatcher();
// Start the JDWP thread. If the command-line debugger flags specified "suspend=y",
// this will pause the runtime, so we probably want this to come last.
Dbg::StartJdwp();
}
ActivityThread.main
Back in the Java framework layer: once the new process is up, it runs ActivityThread's main, which starts the main thread's message queue and waits to interact with system_server.
public static final void main(String[] args) {
SamplingProfilerIntegration.start();
Process.setArgV0("<pre-initialized>");
// Create the main thread's message loop
// Every application enters this message loop automatically once it has started
Looper.prepareMainLooper();
if (sMainThreadHandler == null) {
sMainThreadHandler = new Handler();
}
// Create the ActivityThread instance
ActivityThread thread = new ActivityThread();
thread.attach(false);
if (false) {
Looper.myLooper().setMessageLogging(new
LogPrinter(Log.DEBUG, "ActivityThread"));
}
Looper.loop();
if (Process.supportsProcesses()) {
throw new RuntimeException("Main thread loop unexpectedly exited");
}
thread.detach();
String name = (thread.mInitialApplication != null)
? thread.mInitialApplication.getPackageName()
: "<unknown>";
Slog.i(TAG, "Main thread of " + name + " is now exiting");
}
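The shape of ActivityThread.main (prepare a queue, do the initial setup, then block in loop() and never expect to return) can be sketched with a plain blocking queue. The class below is only an analogy for how a main-thread message loop behaves; MainLoopSketch is a hypothetical name, and the real Looper and MessageQueue are considerably more involved.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical analogy for a main-thread message loop; not the Android Looper.
final class MainLoopSketch {
    // Sentinel message that tells loop() to return.
    private static final Runnable QUIT = () -> { };
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    // May be called from any thread.
    void post(Runnable message) {
        queue.add(message);
    }

    void quit() {
        queue.add(QUIT);
    }

    // Blocks the calling thread and runs messages in order until quit() is seen.
    // Returns the number of messages that were handled.
    int loop() {
        int handled = 0;
        while (true) {
            Runnable message;
            try {
                message = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return handled;
            }
            if (message == QUIT) {
                return handled;
            }
            message.run();
            handled++;
        }
    }

    public static void main(String[] args) {
        MainLoopSketch looper = new MainLoopSketch();
        looper.post(() -> System.out.println("first message"));
        looper.post(() -> System.out.println("second message"));
        looper.quit();
        System.out.println("handled " + looper.loop() + " messages");
    }
}
```

In the real main method there is no quit in the normal case, which is why the code after Looper.loop() throws if the loop ever exits on a platform that supports processes.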
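Summary
- The entry point for creating an Android application process is AMS.startProcessLocked, which calls Process.start to send a socket request asking the zygote process to fork a new process.
- In essence, duplicating a process only takes a call to fork, but Android wraps extra work around the new process. Three things happen when a new process is born:
- The zygote process is duplicated copy-on-write
- A Binder thread pool is started for the process's IPC
- A main-thread message loop is started
- Process creation runs while the AMS lock is held. If Zygote is slow to create the process, for example because of high CPU load or memory pressure, other Binder threads in system_server block on that lock, which can indirectly cause delays and ANRs in third-party apps.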
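The last point is a general hazard of holding a lock across a blocking call, and it can be reproduced in a few lines. The sketch below is only an analogy: a ReentrantLock stands in for the AMS lock, a CountDownLatch stands in for the blocking read on the zygote socket, and LockHoldSketch is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical analogy: holding a service-wide lock while waiting on a slow peer.
final class LockHoldSketch {
    // Returns { lockBusyWhileWaiting, lockFreeAfterReply }.
    static boolean[] demo() {
        ReentrantLock serviceLock = new ReentrantLock();      // stands in for the AMS lock
        CountDownLatch lockTaken = new CountDownLatch(1);
        CountDownLatch peerReplied = new CountDownLatch(1);   // stands in for the zygote reply

        Thread starter = new Thread(() -> {
            serviceLock.lock();
            try {
                lockTaken.countDown();
                peerReplied.await();   // blocked on the "socket read" while holding the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                serviceLock.unlock();
            }
        });
        starter.start();

        try {
            lockTaken.await();
            // Another caller arrives while the starter is still waiting for its reply.
            boolean acquired = serviceLock.tryLock();
            if (acquired) {
                serviceLock.unlock();
            }
            boolean busyWhileWaiting = !acquired;

            peerReplied.countDown();   // the slow peer finally answers
            starter.join();

            boolean freeAfterReply = serviceLock.tryLock();
            if (freeAfterReply) {
                serviceLock.unlock();
            }
            return new boolean[] { busyWhileWaiting, freeAfterReply };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("lock busy while waiting: " + result[0]);
        System.out.println("lock free after reply: " + result[1]);
    }
}
```

Every other thread that needs the same lock is stuck for as long as the peer takes to answer, so a delay in one place spreads to unrelated callers.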