These are also notes from years ago, probably written back in the Android 4.x era.
Linux Namespace
When Linux forks/clones a process, the namespace-related flags are:
#define CLONE_NEWNS 0x00020000 /* New mnt namespace group? */
#define CLONE_NEWUTS 0x04000000 /* New utsname group? */
#define CLONE_NEWIPC 0x08000000 /* New ipcs */
#define CLONE_NEWUSER 0x10000000 /* New user namespace */
#define CLONE_NEWPID 0x20000000 /* New pid namespace */
#define CLONE_NEWNET 0x40000000 /* New network namespace */
At present, Android uses only CLONE_NEWNS.
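As a quick sanity check on the flag values listed above, here is a minimal sketch that compares them against the kernel's userspace header at compile time. It assumes a Linux toolchain that ships <linux/sched.h>; the helper name namespace_flag_name is made up for this illustration, and the short names it returns are the entry names found under /proc/$PID/ns.

```c
#include <assert.h>       /* static_assert (C11) */
#include <stddef.h>       /* NULL */
#include <linux/sched.h>  /* kernel UAPI header that defines the CLONE_* flags */

/* Compile-time check that the header agrees with the values in these notes. */
static_assert(CLONE_NEWNS   == 0x00020000, "mount namespace flag");
static_assert(CLONE_NEWUTS  == 0x04000000, "UTS namespace flag");
static_assert(CLONE_NEWIPC  == 0x08000000, "IPC namespace flag");
static_assert(CLONE_NEWUSER == 0x10000000, "user namespace flag");
static_assert(CLONE_NEWPID  == 0x20000000, "PID namespace flag");
static_assert(CLONE_NEWNET  == 0x40000000, "network namespace flag");

/* Hypothetical helper: maps one namespace flag to the name of its entry
 * under /proc/$PID/ns, or returns NULL for any other value. */
const char *namespace_flag_name(unsigned long flag)
{
    switch (flag) {
    case CLONE_NEWNS:   return "mnt";
    case CLONE_NEWUTS:  return "uts";
    case CLONE_NEWIPC:  return "ipc";
    case CLONE_NEWUSER: return "user";
    case CLONE_NEWPID:  return "pid";
    case CLONE_NEWNET:  return "net";
    default:            return NULL;
    }
}
```

If the file compiles, the header and the list above agree; a mismatch would stop the build at the failing static_assert.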
If a process wants to create its own namespace, it can use the unshare system call, which is implemented as follows:
File: fork.c
Function: SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
    /* If unsharing namespace, must also unshare filesystem information. */
    if (unshare_flags & CLONE_NEWNS)
        unshare_flags |= CLONE_FS;
    ...
    unshare_fs(...)
    unshare_userns(...)
        --> create_user_ns
            --> proc_alloc_inum
    unshare_nsproxy_namespaces
        --> create_new_namespaces
            --> copy_mnt_ns
                --> dup_mnt_ns
                    --> alloc_mnt_ns
                        --> proc_alloc_inum
As the trace shows, when CLONE_NEWNS is set the unshare system call creates a new mount namespace for the current process.
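The effect of that call chain can be observed from user space. The following is a minimal sketch, not code from the kernel or Android: it assumes a Linux system with /proc mounted and a kernel recent enough to expose /proc/self/ns/mnt, and the two function names are invented for this example. It compares the inode number behind /proc/self/ns/mnt before and after unshare(CLONE_NEWNS).

```c
#define _GNU_SOURCE       /* for unshare() and CLONE_NEWNS in <sched.h> */
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns the inode number behind /proc/self/ns/mnt, which identifies the
 * caller's mount namespace, or 0 if it cannot be read. */
ino_t current_mnt_ns_inode(void)
{
    struct stat st;

    if (stat("/proc/self/ns/mnt", &st) == -1)
        return 0;
    return st.st_ino;
}

/* Moves the caller into a new mount namespace and reports the namespace
 * inode seen before and after the call. Returns 0 on success, or -1 with
 * errno set (typically EPERM when the caller lacks CAP_SYS_ADMIN), in
 * which case *after is left equal to *before. */
int unshare_mount_namespace(ino_t *before, ino_t *after)
{
    assert(before != NULL && after != NULL);

    *before = current_mnt_ns_inode();
    *after = *before;
    if (unshare(CLONE_NEWNS) == -1)
        return -1;
    *after = current_mnt_ns_inode();
    return 0;
}
```

Run with sufficient privileges, the two inode numbers should differ, because a new namespace (with a newly allocated inode number) replaces the inherited one. Without privileges the call fails and both values stay the same.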
Android's Namespace Implementation
Starting with 4.2, Android lets each user have an independent, private storage space, as described below.
Multi-user external storage
Starting in Android 4.2, devices can support multiple users, and external storage must meet the following constraints:
- Each user must have their own isolated primary external storage, and must not have access to the primary external storage of other users.
- The /sdcard path must resolve to the correct user-specific primary external storage based on the user a process is running as.
- Storage for large OBB files in the Android/obb directory may be shared between multiple users as an optimization.
- Secondary external storage must not be writable by apps, except in package-specific directories as allowed by synthesized permissions.
The default platform implementation of this feature leverages Linux kernel namespaces to create isolated mount tables for each Zygote-forked process, and then uses bind mounts to offer the correct user-specific primary external storage into that private namespace.
At boot, the system mounts a single emulated external storage FUSE daemon at EMULATED_STORAGE_SOURCE, which is hidden from apps. After the Zygote forks, it bind mounts the appropriate user-specific subdirectory from under the FUSE daemon to EMULATED_STORAGE_TARGET so that external storage paths resolve correctly for the app. Because an app lacks accessible mount points for other users' storage, they can only access storage for the user it was started as.
This implementation also uses the shared subtree kernel feature to propagate mount events from the default root namespace into app namespaces, which ensures that features like ASEC containers and OBB mounting continue working correctly. It does this by mounting the rootfs as shared, and then remounting it as slave after each Zygote namespace is created.
In other words, the isolated storage space is implemented with the mount (mnt) namespace.
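The "remount as slave" step from the quoted documentation can be sketched roughly as below. This is an illustration under assumptions rather than the Android source: the helper names are hypothetical, the code assumes the root mount was already marked shared at boot (as the quote describes), and it needs CAP_SYS_ADMIN to succeed.

```c
#define _GNU_SOURCE       /* for unshare() and CLONE_NEWNS in <sched.h> */
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical helper: marks the mount at mount_point, and every mount
 * below it, as a slave. For a propagation change, mount(2) ignores the
 * source, filesystem type and data arguments. */
int make_subtree_slave(const char *mount_point)
{
    assert(mount_point != NULL);
    return mount(NULL, mount_point, NULL, MS_SLAVE | MS_REC, NULL);
}

/* Hypothetical helper: creates a new mount namespace for the caller and
 * turns its whole tree into a slave of the namespace it was copied from.
 * Mount events then flow from the original namespace into this one, but
 * not back. Returns 0 on success, -1 with errno set on failure. */
int enter_slave_mount_namespace(void)
{
    if (unshare(CLONE_NEWNS) == -1)
        return -1;
    return make_subtree_slave("/");
}
```

The one-way direction is the point of the design: mounts that appear later in the root namespace still show up in the new namespace, while mounts made inside it stay invisible outside.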
When Zygote initializes:
File: runtime.cc
Function: Runtime::InitZygote()
    // See storage config details at http://source.android.com/tech/storage/
    // Create private mount namespace shared by all children
    unshare(CLONE_NEWNS)
When an app process is created:
File: frameworks/base/core/jni/com_android_internal_os_Zygote.cpp
Function: ForkAndSpecializeCommon
    MountEmulatedStorage
        /* Create a second private mount namespace for our process */
        unshare(CLONE_NEWNS)
        mount(..., MS_BIND, ...)
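As shown above, both the Zygote process and the Java (app) processes call unshare to create their own mount namespaces.
Also note the mount(…, MS_BIND, …) call. Once the new mount namespace exists, a directory is bind-mounted inside it, so that the directory becomes private to that process. For details on bind, see the "bind mounts" section of the Chinese mount manual, or "man 2 mount".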
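Put together, the unshare-then-bind pattern looks roughly like the sketch below. It is a simplified illustration, not Android's MountEmulatedStorage: the function name and both path parameters are made up for the example, and the calls need CAP_SYS_ADMIN. The MS_SLAVE step is included because, on systems where the root mount is shared, a bind mount made right after unshare would otherwise propagate back to the original namespace.

```c
#define _GNU_SOURCE       /* for unshare() and CLONE_NEWNS in <sched.h> */
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical helper: gives the calling process a private view in which
 * `target` shows the contents of `source`. Both must be existing
 * directories. Returns 0 on success, -1 with errno set on failure. */
int bind_into_private_namespace(const char *source, const char *target)
{
    assert(source != NULL && target != NULL);

    /* 1. Get a copy of the mount table that belongs to this process only. */
    if (unshare(CLONE_NEWNS) == -1)
        return -1;

    /* 2. Make the copy a slave so mounts made here do not propagate back
     *    to the namespace we came from. */
    if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1)
        return -1;

    /* 3. Bind `source` onto `target`; only this namespace sees the result. */
    if (mount(source, target, NULL, MS_BIND, NULL) == -1)
        return -1;

    return 0;
}
```

A per-user storage setup would call this with a user-specific directory as `source` and the common path that apps expect as `target`, so every process resolves the same path to its own user's data.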
In Android 5.x, the unshare and mount calls above were wrapped into the function MountEmulatedStorage(…).
Now let's look at the actual effect.
Looking at /proc/$PID/ns, we can see that:
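- All kernel threads have the same namespaces;
- All native daemons have the same namespaces, which are also the same as those of the kernel threads;
- The Zygote process's mnt namespace differs from that of the two groups above, because zygote creates its own namespace in Runtime::InitZygote;
- Every app process has a different mnt namespace, because Zygote creates a private namespace for each one when the app process is created.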
The mnt:[XXXX] shown in the figure above is ns->proc_inum, the number that proc_alloc_inum allocated for the namespace.
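The same comparison can be done programmatically by reading the symlink text of /proc/$PID/ns/mnt. The sketch below is only an illustration: the function names are invented, it assumes /proc is mounted, and reading another process's entry may fail without sufficient privileges.

```c
#define _POSIX_C_SOURCE 200809L  /* for readlink() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Copies the link text of /proc/<pid>/ns/mnt (for example "mnt:[...]")
 * into buf as a NUL-terminated string. Returns 0 on success, -1 on failure. */
int read_mnt_ns(pid_t pid, char *buf, size_t len)
{
    char path[64];
    ssize_t n;

    assert(buf != NULL && len > 1);

    snprintf(path, sizeof(path), "/proc/%ld/ns/mnt", (long)pid);
    n = readlink(path, buf, len - 1);
    if (n == -1)
        return -1;
    buf[n] = '\0';
    return 0;
}

/* Returns 1 if both processes are in the same mount namespace, 0 if they
 * are in different ones, and -1 if either entry cannot be read. */
int same_mnt_ns(pid_t a, pid_t b)
{
    char ns_a[64], ns_b[64];

    if (read_mnt_ns(a, ns_a, sizeof(ns_a)) == -1 ||
        read_mnt_ns(b, ns_b, sizeof(ns_b)) == -1)
        return -1;
    return strcmp(ns_a, ns_b) == 0;
}
```

On a device, comparing a native daemon's PID with zygote's PID this way should report different namespaces, while two native daemons should report the same one, matching the observations listed above.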
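For a hands-on look at the mount namespace, see the "Step 3: mount namespace" (第三步,mount名称空间) section of 宋宝华's "Linux namespace - Docker 背后的故事".
As for why the mount namespace is able to provide the behavior described above, I won't go into detail here; there should be plenty of articles on the subject.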