1 Capability
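(1) Shortcomings of the UID model
Permission granularity is too coarse
It easily leads to excess (overflowing) privilege
Excess privilege in turn causes security problems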
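(2) Introduction to capabilities
To get fine-grained privileges, Linux introduced capabilities.
The main idea of capabilities is to split the root user's privilege into distinct capabilities, each representing one class of privileged operation. For example, CAP_SYS_MODULE allows loading (or unloading) kernel modules, while CAP_SETUID allows changing a process's user identity. Under capabilities, the system performs access control for privileged operations based on the capabilities a process holds.
Under capabilities, only processes and executable files carry capabilities. Each process has three capability sets: cap_effective, cap_inheritable, and cap_permitted. An executable file also has three capability sets, corresponding to the three process sets: cap_effective, cap_allowed, and cap_forced.
In /android/kernel/include/uapi/linux we can find the file capability.h: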
/*
* This is <linux/capability.h>
*
...
*
* ftp://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/
*/
#ifndef _UAPI_LINUX_CAPABILITY_H
#define _UAPI_LINUX_CAPABILITY_H
.....
typedef struct __user_cap_header_struct {
__u32 version;
int pid;
} __user *cap_user_header_t;
typedef struct __user_cap_data_struct {
__u32 effective;
__u32 permitted;
__u32 inheritable;
} __user *cap_user_data_t;
.....
struct vfs_cap_data {
__le32 magic_etc; /* Little endian */
struct {
__le32 permitted; /* Little endian */
__le32 inheritable; /* Little endian */
} data[VFS_CAP_U32];
};
#define CAP_CHOWN 0
#define CAP_DAC_OVERRIDE 1
#define CAP_DAC_READ_SEARCH 2
#define CAP_FOWNER 3
#define CAP_FSETID 4
#define CAP_KILL 5
#define CAP_SETGID 6
#define CAP_SETUID 7
#define CAP_SETPCAP 8
#define CAP_LINUX_IMMUTABLE 9
#define CAP_NET_BIND_SERVICE 10
.....
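Each CAP_* constant above is a bit index, so a capability set can be handled as a plain bit mask. The following is a minimal sketch of that idea, not kernel code: the EX_ prefixed constants copy values from the header excerpt, and the helper names cap_to_mask and cap_is_set are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the header excerpt above; the EX_ prefix is hypothetical
 * and only avoids clashing with the real <linux/capability.h> macros. */
enum {
    EX_CAP_CHOWN = 0,
    EX_CAP_KILL = 5,
    EX_CAP_SETUID = 7,
    EX_CAP_NET_BIND_SERVICE = 10
};

/* Hypothetical helper: a 64-bit mask with only bit `cap` set. */
uint64_t cap_to_mask(int cap)
{
    assert(cap >= 0 && cap < 64);
    return UINT64_C(1) << cap;
}

/* Hypothetical helper: true if capability `cap` is present in `set`. */
bool cap_is_set(uint64_t set, int cap)
{
    return (set & cap_to_mask(cap)) != 0;
}
```

With these helpers, cap_is_set(mask, EX_CAP_SETUID) asks whether a given set contains CAP_SETUID, which is the kind of test the rest of this section reasons about.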
Running cat /proc/pid/status gives a snapshot of a process, including its current capability information:
Pid: 2225
PPid: 2217
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 256
Groups: 4 24 27 30 46 113 128 1000
NStgid: 2225
NSpid: 2225
NSpgid: 2225
NSsid: 2225
VmPeak: 27212 kB
VmSize: 27212 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 5728 kB
VmRSS: 5728 kB
VmData: 2272 kB
VmStk: 136 kB
VmExe: 976 kB
VmLib: 2304 kB
VmPTE: 72 kB
VmPMD: 12 kB
VmSwap: 0 kB
Threads: 1
SigQ: 0/7798
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000010000
SigIgn: 0000000000380004
SigCgt: 000000004b817efb
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
Seccomp: 0
Cpus_allowed: 00000000,00000000,00000000,00000001
Cpus_allowed_list: 0
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 81
nonvoluntary_ctxt_switches: 61
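The CapInh, CapPrm, CapEff and CapBnd fields are hexadecimal bit masks. As a rough sketch of how one of these lines could be decoded, the code below parses a single status line and counts the capabilities in it. The function names parse_cap_line, count_caps and caps_in_line are hypothetical, and the code assumes the usual "Name:<whitespace>hex" layout of the snapshot above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse a line such as "CapBnd:\t0000003fffffffff".
 * Returns 0 and stores the mask in *out on success, -1 otherwise. */
int parse_cap_line(const char *line, const char *field, uint64_t *out)
{
    size_t n = strlen(field);
    if (strncmp(line, field, n) != 0 || line[n] != ':')
        return -1;                      /* not the field we were asked for */

    const char *start = line + n + 1;
    char *end;
    unsigned long long v = strtoull(start, &end, 16); /* skips leading whitespace */
    if (end == start)
        return -1;                      /* no hex digits found */

    *out = (uint64_t)v;
    return 0;
}

/* Count the set bits, i.e. the number of capabilities in the mask. */
int count_caps(uint64_t set)
{
    int n = 0;
    while (set) {
        set &= set - 1;                 /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Convenience wrapper: number of capabilities in the line, or -1 on error. */
int caps_in_line(const char *line, const char *field)
{
    uint64_t mask;
    if (parse_cap_line(line, field, &mask) != 0)
        return -1;
    return count_caps(mask);
}
```

Applied to the snapshot above, the CapBnd value 0000003fffffffff decodes to 38 capabilities (bits 0 through 37), while CapInh, CapPrm and CapEff are all empty.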
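(3) Process capabilities
Permitted Capability Sets-----------P(permitted)
The fence around the process's privileges: the maximum set of capabilities it may use. It is a superset of the Effective Capability Set.
Effective Capability Sets-----------P(effective)
The capabilities the process is actually exercising. Every capability in this set must also belong to the Permitted Capability Set.
Inheritable Capability Sets--------P(inheritable)
The only capability set a newly executed program inherits directly. In capability mode, the Inheritable Capability Set of the new process equals the Inheritable Capability Set of the process that executed it.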
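(4) Executable file capabilities
Permitted Capability Sets---------F(permitted)
The capabilities the file grants to the process when it is executed, limited by the bounding set: the file contributes F(permitted) & cap_bset to the new process's P'(permitted).
Effective Capability Set---------F(effective)
A single bit (true | false) that says whether the whole of the new process's P'(permitted) is automatically added to its P'(effective).
if true, P'(effective) = P'(permitted)
It is typically used for backward compatibility with traditional root setuid executables (owner UID = root with the s bit set, mode 4755).
Inheritable Capability Sets------F(Inheritable)
ANDed with the process's Inheritable Capability Set to help determine the new process's Permitted Capability Set.
How are Inheritable and Permitted related? My guess: neither is a subset or superset of the other, though they may overlap.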
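(5) Capability bounding set
The capability bounding set is a per-process attribute: a security fence the process places around itself, so that only part of an executable's Permitted Capability Set can be turned into the process's Permitted Capability Set.
The bounding set is inherited by child processes; the init process starts with all bits set to 1.
(6) How the sets relate
In the formulas below, P is the process before it executes the file, P' is the process afterwards, and F is the executable file.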
P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & cap_bset)
P'(effective) = F(effective) ? P'(permitted) : 0
P'(inheritable) = P(inheritable)
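The three rules above translate almost directly into bit operations. The following sketch assumes 64-bit masks; the struct and function names (proc_caps, file_caps, caps_after_exec) are hypothetical and only mirror the notation of this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical containers mirroring P(...) and F(...) in the text. */
struct proc_caps {
    uint64_t permitted;
    uint64_t effective;
    uint64_t inheritable;
};

struct file_caps {
    uint64_t permitted;
    uint64_t inheritable;
    bool effective;          /* F(effective) is a single bit */
};

/* Compute P' from P, F and the bounding set, exactly as in the formulas above. */
struct proc_caps caps_after_exec(struct proc_caps p, struct file_caps f,
                                 uint64_t cap_bset)
{
    struct proc_caps n;
    n.permitted = (p.inheritable & f.inheritable) | (f.permitted & cap_bset);
    n.effective = f.effective ? n.permitted : 0;
    n.inheritable = p.inheritable;
    return n;
}
```

For example, a process whose inheritable set holds only bit 7 and which executes a file with F(inheritable) = bit 7, F(permitted) = bit 10 and F(effective) = true ends up, under a full bounding set, with P'(permitted) = P'(effective) = 0x480.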
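(7) Compatibility between setuid and capabilities
A new mechanism must stay backward compatible so that existing entities (applications) keep working.
Legacy ordinary processes:
P(Permitted) = P(Effective) = P(Inheritable) = 0x0000000000000000
The Bash snapshot above shows exactly this.
Legacy root setuid executables (owner UID = root with the s bit set, mode 4755, which lets the caller escalate so that the child process runs with UID = root) are therefore treated as having the following capabilities:
F(Effective) = True; (the child process can inherit the parent process's uid) F(Permitted) = F(Inheritable) = 0xFFFFFFFFFFFFFFFF
Legacy processes with EUID = root (ordinary processes whose EUID is root):
F(Effective) = True; F(Permitted) = F(Inheritable) = 0xFFFFFFFFFFFFFFFF
Now consider a commoner (an unprivileged user) running a root setuid executable:
P'(permitted) = P(inheritable) | cap_bset
P'(effective) = P'(permitted)
A commoner knows nothing about capabilities and never modifies them (nor did any ancestor process), so
P(inheritable) = 0x0000000000000000, cap_bset = 0xFFFFFFFFFFFFFFFF,
and therefore P'(effective) = P'(permitted) = 0xFFFFFFFFFFFFFFFF
The process still holds royal privileges!!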
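The scenario above can be checked with a few lines of arithmetic. This is only a sketch under the assumptions stated in the text (a root setuid file is treated as F(permitted) = F(inheritable) = all ones and F(effective) = true); the names effective_after_setuid_root_exec, has_cap_setuid and the EX_ constants are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_CAP_SETUID 7            /* value of CAP_SETUID from capability.h */
#define EX_ALL_CAPS   UINT64_MAX   /* all 64 bits set, as in the text */

/* P'(effective) after an ordinary process executes a root setuid file. */
uint64_t effective_after_setuid_root_exec(uint64_t p_inheritable,
                                          uint64_t cap_bset)
{
    const uint64_t f_permitted = EX_ALL_CAPS;
    const uint64_t f_inheritable = EX_ALL_CAPS;

    /* P'(permitted) = (P(inh) & F(inh)) | (F(perm) & cap_bset)
     *               = P(inh) | cap_bset */
    uint64_t permitted = (p_inheritable & f_inheritable)
                       | (f_permitted & cap_bset);

    /* F(effective) is true, so P'(effective) = P'(permitted). */
    return permitted;
}

/* True if the effective set contains CAP_SETUID. */
bool has_cap_setuid(uint64_t effective)
{
    return ((effective >> EX_CAP_SETUID) & 1) != 0;
}
```

With p_inheritable = 0 the result is simply cap_bset, so a full bounding set yields a full effective set, CAP_SETUID included. The bounding set is the only term left that can limit the outcome.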
2 SELINUX
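(1) Policy differences between DAC and MAC
DAC (Discretionary Access Control): the traditional Unix/Linux security model; a subject has full control over the objects it owns and the programs it runs.
MAC (Mandatory Access Control): the security model SELinux is built on; access control is managed by the administrator. The administrator defines the policy and users cannot change it. The policy specifies which subject may access which object. It follows least privilege: by default, applications and users have no permissions at all.
Under DAC, if a process has root privilege and malware successfully attacks and injects into it, the malware can use the process's root privilege to do anything.
Under MAC, the objects a root process may operate on, and the permissions it has on them, are listed explicitly in the security policy, for example only network access and access to specific files. Even if malware injects into a root process, it still cannot do as it pleases through that process: any action the policy does not authorize is denied.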
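The difference can be illustrated with a toy model. The code below is not SELinux's API or policy language; the rule table, the domain and type names, and the functions dac_allows and mac_allows are all hypothetical and exist only to contrast the two decision rules described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One hypothetical allow rule: `domain` may perform `perm` on objects of `type`. */
struct allow_rule {
    const char *domain;
    const char *type;
    const char *perm;
};

/* Hypothetical policy for a root daemon: network access and one config file. */
static const struct allow_rule policy[] = {
    { "netd", "net_socket", "connect" },
    { "netd", "netd_conf",  "read"    },
};

/* Simplified DAC decision: root bypasses the check, owners control their objects. */
bool dac_allows(unsigned uid, unsigned owner_uid)
{
    return uid == 0 || uid == owner_uid;
}

/* Simplified MAC decision: default deny, allowed only if a rule matches.
 * The UID plays no role at all. */
bool mac_allows(const char *domain, const char *type, const char *perm)
{
    for (size_t i = 0; i < sizeof policy / sizeof policy[0]; i++) {
        if (strcmp(policy[i].domain, domain) == 0 &&
            strcmp(policy[i].type, type) == 0 &&
            strcmp(policy[i].perm, perm) == 0)
            return true;
    }
    return false;
}
```

In this model a compromised root daemon in the "netd" domain passes dac_allows for any file, yet mac_allows("netd", "system_data", "write") is false because no rule grants it.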
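(2) SEAndroid and SELinux
SEAndroid (Security-Enhanced Android) ports SELinux, originally used on Linux, to the Android platform. Beyond the port itself, it adds many Android-specific security improvements, such as bringing Binder IPC, sockets, and properties under SEAndroid access control.
The core idea of SEAndroid is that even if a malicious app obtains root privilege, it remains confined and cannot do as it pleases.
Before Android 4.3, code inside an APK could use Java's Runtime to execute a root setuid executable and raise its effective UID to carry out privileged operations; the su binary in a typical root package works on this principle.
Android 4.3 closed this hole. In android/vm/native/dalvik_system_Zygote.cpp, each APK process proactively drops the capabilities in its own bounding set one by one, from the original 0xFFFFFFFFFFFFFFFF down to 0x0000000000000000. When an APK runs a root setuid executable under DAC alone, the setuid bit raises the process's EUID to root and gives it root privilege. With this hardening in place, however, the bounding set has been dropped to 0x0, which changes the privileges of the resulting process so that the new process ends up with RUID = EUID = the APK process's RUID. In detail:
P'(permitted) = P(inheritable) | cap_bset
P'(effective) = P'(permitted)
For an ordinary process P(inheritable) = 0x0000000000000000, and cap_bset has been cleared to all zeros as well,
so P'(effective) of the new process is empty and CAP_SETUID is of course absent; relying on a root setuid executable to raise the EUID to root is therefore no longer permitted. The new process thus has RUID = EUID = the APK process's RUID.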
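The bounding set of a running process can be inspected with the same prctl(PR_CAPBSET_READ, ...) call that the Zygote source uses as its loop condition. The sketch below only reads the set. Dropping entries with PR_CAPBSET_DROP requires CAP_SETPCAP, so it is not attempted here. The function name scan_bounding_set is hypothetical, and on non-Linux systems the function simply returns -1.

```c
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Walk capability indices 0, 1, 2, ... until the kernel rejects one.
 * If only_present is nonzero, count capabilities still in the bounding set;
 * otherwise count every capability index the kernel recognizes.
 * Returns -1 where prctl(PR_CAPBSET_READ) is unavailable. */
int scan_bounding_set(int only_present)
{
#ifdef __linux__
    int n = 0;
    for (int cap = 0; ; cap++) {
        /* Returns 1 if cap is in the bounding set, 0 if it was dropped,
         * and -1 (EINVAL) once cap is past the last valid capability. */
        int r = prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL);
        if (r < 0)
            break;
        n += only_present ? r : 1;
    }
    return n;
#else
    (void)only_present;
    return -1;
#endif
}
```

In a process whose bounding set has been fully dropped, as described above, scan_bounding_set(1) would be expected to return 0 while scan_bounding_set(0) still reports how many capabilities the kernel knows about.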
/*
* dalvik.system.Zygote
*/
#define ZYGOTE_LOG_TAG "Zygote"
/* must match values in dalvik.system.Zygote */
enum {
DEBUG_ENABLE_DEBUGGER = 1,
DEBUG_ENABLE_CHECKJNI = 1 << 1,
DEBUG_ENABLE_ASSERT = 1 << 2,
DEBUG_ENABLE_SAFEMODE = 1 << 3,
DEBUG_ENABLE_JNI_LOGGING = 1 << 4,
};
/* must match values in dalvik.system.Zygote */
enum {
MOUNT_EXTERNAL_NONE = 0,
MOUNT_EXTERNAL_SINGLEUSER = 1,
MOUNT_EXTERNAL_MULTIUSER = 2,
MOUNT_EXTERNAL_MULTIUSER_ALL = 3,
};
/*
* This signal handler is for zygote mode, since the zygote
* must reap its children
*/
static void sigchldHandler(int s)
{
pid_t pid;
int status;
while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
/* Log process-death status that we care about. In general it is not
safe to call ALOG(...) from a signal handler because of possible
reentrancy. However, we know a priori that the current implementation
of ALOG() is safe to call from a SIGCHLD handler in the zygote process.
If the ALOG() implementation changes its locking strategy or its use
of syscalls within the lazy-init critical section, its use here may
become unsafe. */
if (WIFEXITED(status)) {
if (WEXITSTATUS(status)) {
ALOG(LOG_DEBUG, ZYGOTE_LOG_TAG, "Process %d exited cleanly (%d)",
(int) pid, WEXITSTATUS(status));
} else {
IF_ALOGV(/*should use ZYGOTE_LOG_TAG*/) {
ALOG(LOG_VERBOSE, ZYGOTE_LOG_TAG,
"Process %d exited cleanly (%d)",
(int) pid, WEXITSTATUS(status));
}
}
} else if (WIFSIGNALED(status)) {
if (WTERMSIG(status) != SIGKILL) {
ALOG(LOG_DEBUG, ZYGOTE_LOG_TAG,
"Process %d terminated by signal (%d)",
(int) pid, WTERMSIG(status));
} else {
IF_ALOGV(/*should use ZYGOTE_LOG_TAG*/) {
ALOG(LOG_VERBOSE, ZYGOTE_LOG_TAG,
"Process %d terminated by signal (%d)",
(int) pid, WTERMSIG(status));
}
}
#ifdef WCOREDUMP
if (WCOREDUMP(status)) {
ALOG(LOG_INFO, ZYGOTE_LOG_TAG, "Process %d dumped core",
(int) pid);
}
#endif /* ifdef WCOREDUMP */
}
/*
* If the just-crashed process is the system_server, bring down zygote
* so that it is restarted by init and system server will be restarted
* from there.
*/
if (pid == gDvm.systemServerPid) {
ALOG(LOG_INFO, ZYGOTE_LOG_TAG,
"Exit zygote because system server (%d) has terminated",
(int) pid);
kill(getpid(), SIGKILL);
}
}
if (pid < 0) {
ALOG(LOG_WARN, ZYGOTE_LOG_TAG,
"Zygote SIGCHLD error in waitpid: %s",strerror(errno));
}
}
/*
* configure sigchld handler for the zygote process
* This is configured very late, because earlier in the dalvik lifecycle
* we can fork() and exec() for the verifier/optimizer, and we
* want to waitpid() for those rather than have them be harvested immediately.
*
* This ends up being called repeatedly before each fork(), but there's
* no real harm in that.
*/
static void setSignalHandler()
{
int err;
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = sigchldHandler;
err = sigaction (SIGCHLD, &sa, NULL);
if (err < 0) {
ALOGW("Error setting SIGCHLD handler: %s", strerror(errno));
}
}
/*
* Set the SIGCHLD handler back to default behavior in zygote children
*/
static void unsetSignalHandler()
{
int err;
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = SIG_DFL;
err = sigaction (SIGCHLD, &sa, NULL);
if (err < 0) {
ALOGW("Error unsetting SIGCHLD handler: %s", strerror(errno));
}
}
/*
* Calls POSIX setgroups() using the int[] object as an argument.
* A NULL argument is tolerated.
*/
static int setgroupsIntarray(ArrayObject* gidArray)
{
gid_t *gids;
u4 i;
s4 *contents;
if (gidArray == NULL) {
return 0;
}
/* just in case gid_t and u4 are different... */
gids = (gid_t *)alloca(sizeof(gid_t) * gidArray->length);
contents = (s4 *)(void *)gidArray->contents;
for (i = 0 ; i < gidArray->length ; i++) {
gids[i] = (gid_t) contents[i];
}
return setgroups((size_t) gidArray->length, gids);
}
/*
* Sets the resource limits via setrlimit(2) for the values in the
* two-dimensional array of integers that's passed in. The second dimension
* contains a tuple of length 3: (resource, rlim_cur, rlim_max). NULL is
* treated as an empty array.
*
* -1 is returned on error.
*/
static int setrlimitsFromArray(ArrayObject* rlimits)
{
u4 i;
struct rlimit rlim;
if (rlimits == NULL) {
return 0;
}
memset (&rlim, 0, sizeof(rlim));
ArrayObject** tuples = (ArrayObject **)(void *)rlimits->contents;
for (i = 0; i < rlimits->length; i++) {
ArrayObject * rlimit_tuple = tuples[i];
s4* contents = (s4 *)(void *)rlimit_tuple->contents;
int err;
if (rlimit_tuple->length != 3) {
ALOGE("rlimits array must have a second dimension of size 3");
return -1;
}
rlim.rlim_cur = contents[1];
rlim.rlim_max = contents[2];
err = setrlimit(contents[0], &rlim);
if (err < 0) {
return -1;
}
}
return 0;
}
/*
* Create a private mount namespace and bind mount appropriate emulated
* storage for the given user.
*/
static int mountEmulatedStorage(uid_t uid, u4 mountMode) {
// See storage config details at http://source.android.com/tech/storage/
userid_t userid = multiuser_get_user_id(uid);
// Create a second private mount namespace for our process
if (unshare(CLONE_NEWNS) == -1) {
ALOGE("Failed to unshare(): %s", strerror(errno));
return -1;
}
// Create bind mounts to expose external storage
if (mountMode == MOUNT_EXTERNAL_MULTIUSER
|| mountMode == MOUNT_EXTERNAL_MULTIUSER_ALL) {
// These paths must already be created by init.rc
const char* source = getenv("EMULATED_STORAGE_SOURCE");
const char* target = getenv("EMULATED_STORAGE_TARGET");
const char* legacy = getenv("EXTERNAL_STORAGE");
if (source == NULL || target == NULL || legacy == NULL) {
ALOGE("Storage environment undefined; unable to provide external storage");
return -1;
}
// Prepare source paths
char source_user[PATH_MAX];
char target_user[PATH_MAX];
// /mnt/shell/emulated/0
snprintf(source_user, PATH_MAX, "%s/%d", source, userid);
// /storage/emulated/0
snprintf(target_user, PATH_MAX, "%s/%d", target, userid);
if (fs_prepare_dir(source_user, 0000, 0, 0) == -1
|| fs_prepare_dir(target_user, 0000, 0, 0) == -1) {
return -1;
}
// Unfortunately bind mounts from outside ANDROID_STORAGE retain the
// recursive-shared property (kernel bug?). This means any additional bind
// mounts (e.g., /storage/emulated/0/Android/obb) will also appear, shared
// in all namespaces, at their respective source paths (e.g.,
// /mnt/shell/emulated/0/Android/obb), leading to hundreds of
// /proc/mounts-visible bind mounts. As a workaround, mark
// EMULATED_STORAGE_SOURCE (e.g., /mnt/shell/emulated) also a slave so that
// subsequent bind mounts are confined to this namespace. Note,
// EMULATED_STORAGE_SOURCE must already serve as a mountpoint, which it
// should for the "sdcard" fuse volume.
if (mount(NULL, source, NULL, (MS_SLAVE | MS_REC), NULL) == -1) {
SLOGW("Failed to mount %s as MS_SLAVE: %s", source, strerror(errno));
// Fallback: Mark rootfs as slave. All mounts under "/" will be hidden
// from other apps and users. This shouldn't happen unless the sdcard
// service is broken.
if (mount("rootfs", "/", NULL, (MS_SLAVE | MS_REC), NULL) == -1) {
SLOGE("Failed to mount rootfs as MS_SLAVE: %s", strerror(errno));
return -1;
}
}
if (mountMode == MOUNT_EXTERNAL_MULTIUSER_ALL) {
// Mount entire external storage tree for all users
if (mount(source, target, NULL, MS_BIND, NULL) == -1) {
ALOGE("Failed to mount %s to %s: %s", source, target, strerror(errno));
return -1;
}
} else {
// Only mount user-specific external storage
if (mount(source_user, target_user, NULL, MS_BIND, NULL) == -1) {
ALOGE("Failed to mount %s to %s: %s", source_user, target_user, strerror(errno));
return -1;
}
}
if (fs_prepare_dir(legacy, 0000, 0, 0) == -1) {
return -1;
}
// Finally, mount user-specific path into place for legacy users
if (mount(target_user, legacy, NULL, MS_BIND | MS_REC, NULL) == -1) {
ALOGE("Failed to mount %s to %s: %s", target_user, legacy, strerror(errno));
return -1;
}
} else {
ALOGE("Mount mode %d unsupported", mountMode);
return -1;
}
return 0;
}
/* native public static int fork(); */
static void Dalvik_dalvik_system_Zygote_fork(const u4* args, JValue* pResult)
{
pid_t pid;
if (!gDvm.zygote) {
dvmThrowIllegalStateException(
"VM instance not started with -Xzygote");
RETURN_VOID();
}
if (!dvmGcPreZygoteFork()) {
ALOGE("pre-fork heap failed");
dvmAbort();
}
setSignalHandler();
dvmDumpLoaderStats("zygote");
pid = fork();
#ifdef HAVE_ANDROID_OS
if (pid == 0) {
/* child process */
extern int gMallocLeakZygoteChild;
gMallocLeakZygoteChild = 1;
}
#endif
RETURN_INT(pid);
}
/*
* Enable/disable debug features requested by the caller.
*
* debugger
* If set, enable debugging; if not set, disable debugging. This is
* easy to handle, because the JDWP thread isn't started until we call
* dvmInitAfterZygote().
* checkjni
* If set, make sure "check JNI" is enabled.
* assert
* If set, make sure assertions are enabled. This gets fairly weird,
* because it affects the result of a method called by class initializers,
* and hence can't affect pre-loaded/initialized classes.
* safemode
* If set, operates the VM in the safe mode. The definition of "safe mode" is
* implementation dependent and currently only the JIT compiler is disabled.
* This is easy to handle because the compiler thread and associated resources
* are not requested until we call dvmInitAfterZygote().
*/
static void enableDebugFeatures(u4 debugFlags)
{
ALOGV("debugFlags is 0x%02x", debugFlags);
gDvm.jdwpAllowed = ((debugFlags & DEBUG_ENABLE_DEBUGGER) != 0);
if ((debugFlags & DEBUG_ENABLE_CHECKJNI) != 0) {
/* turn it on if it's not already enabled */
dvmLateEnableCheckedJni();
}
if ((debugFlags & DEBUG_ENABLE_JNI_LOGGING) != 0) {
gDvmJni.logThirdPartyJni = true;
}
if ((debugFlags & DEBUG_ENABLE_ASSERT) != 0) {
/* turn it on if it's not already enabled */
dvmLateEnableAssertions();
}
if ((debugFlags & DEBUG_ENABLE_SAFEMODE) != 0) {
#if defined(WITH_JIT)
/* turn off the jit if it is explicitly requested by the app */
if (gDvm.executionMode == kExecutionModeJit)
gDvm.executionMode = kExecutionModeInterpFast;
#endif
}
#ifdef HAVE_ANDROID_OS
if ((debugFlags & DEBUG_ENABLE_DEBUGGER) != 0) {
/* To let a non-privileged gdbserver attach to this
* process, we must set its dumpable bit flag. However
* we are not interested in generating a coredump in
* case of a crash, so also set the coredump size to 0
* to disable that
*/
if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) < 0) {
ALOGE("could not set dumpable bit flag for pid %d: %s",
getpid(), strerror(errno));
} else {
struct rlimit rl;
rl.rlim_cur = 0;
rl.rlim_max = RLIM_INFINITY;
if (setrlimit(RLIMIT_CORE, &rl) < 0) {
ALOGE("could not disable core file generation for pid %d: %s",
getpid(), strerror(errno));
}
}
}
#endif
}
/*
* Set Linux capability flags.
*
* Returns 0 on success, errno on failure.
*/
static int setCapabilities(int64_t permitted, int64_t effective)
{
#ifdef HAVE_ANDROID_OS
__user_cap_header_struct capheader;
memset(&capheader, 0, sizeof(capheader));
capheader.version = _LINUX_CAPABILITY_VERSION;
capheader.pid = 0;
__user_cap_data_struct capdata[2];
memset(&capdata, 0, sizeof(capdata));
capdata[0].effective = effective;
capdata[1].effective = effective >> 32;
capdata[0].permitted = permitted;
capdata[1].permitted = permitted >> 32;
if (capset(&capheader, &capdata[0]) == -1) {
ALOGE("capset(perm=%llx, eff=%llx) failed: %s", permitted, effective, strerror(errno));
return errno;
}
#endif /*HAVE_ANDROID_OS*/
return 0;
}
/*
* Set SELinux security context.
*
* Returns 0 on success, -1 on failure.
*/
static int setSELinuxContext(uid_t uid, bool isSystemServer,
const char *seInfo, const char *niceName)
{
#ifdef HAVE_ANDROID_OS
return selinux_android_setcontext(uid, isSystemServer, seInfo, niceName);
#else
return 0;
#endif
}
static bool needsNoRandomizeWorkaround() {
#if !defined(__arm__)
return false;
#else
int major;
int minor;
struct utsname uts;
if (uname(&uts) == -1) {
return false;
}
if (sscanf(uts.release, "%d.%d", &major, &minor) != 2) {
return false;
}
// Kernels before 3.4.* need the workaround.
return (major < 3) || ((major == 3) && (minor < 4));
#endif
}
/*
* Utility routine to fork zygote and specialize the child process.
*/
static pid_t forkAndSpecializeCommon(const u4* args, bool isSystemServer)
{
pid_t pid;
uid_t uid = (uid_t) args[0];
gid_t gid = (gid_t) args[1];
ArrayObject* gids = (ArrayObject *)args[2];
u4 debugFlags = args[3];
ArrayObject *rlimits = (ArrayObject *)args[4];
u4 mountMode = MOUNT_EXTERNAL_NONE;
int64_t permittedCapabilities, effectiveCapabilities;
char *seInfo = NULL;
char *niceName = NULL;
if (isSystemServer) {
/*
* Don't use GET_ARG_LONG here for now. gcc is generating code
* that uses register d8 as a temporary, and that's coming out
* scrambled in the child process. b/3138621
*/
//permittedCapabilities = GET_ARG_LONG(args, 5);
//effectiveCapabilities = GET_ARG_LONG(args, 7);
permittedCapabilities = args[5] | (int64_t) args[6] << 32;
effectiveCapabilities = args[7] | (int64_t) args[8] << 32;
} else {
mountMode = args[5];
permittedCapabilities = effectiveCapabilities = 0;
StringObject* seInfoObj = (StringObject*)args[6];
if (seInfoObj) {
seInfo = dvmCreateCstrFromString(seInfoObj);
if (!seInfo) {
ALOGE("seInfo dvmCreateCstrFromString failed");
dvmAbort();
}
}
StringObject* niceNameObj = (StringObject*)args[7];
if (niceNameObj) {
niceName = dvmCreateCstrFromString(niceNameObj);
if (!niceName) {
ALOGE("niceName dvmCreateCstrFromString failed");
dvmAbort();
}
}
}
if (!gDvm.zygote) {
dvmThrowIllegalStateException(
"VM instance not started with -Xzygote");
return -1;
}
if (!dvmGcPreZygoteFork()) {
ALOGE("pre-fork heap failed");
dvmAbort();
}
setSignalHandler();
dvmDumpLoaderStats("zygote");
pid = fork();
if (pid == 0) {
int err;
/* The child process */
#ifdef HAVE_ANDROID_OS
extern int gMallocLeakZygoteChild;
gMallocLeakZygoteChild = 1;
/* keep caps across UID change, unless we're staying root */
if (uid != 0) {
err = prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0);
if (err < 0) {
ALOGE("cannot PR_SET_KEEPCAPS: %s", strerror(errno));
dvmAbort();
}
}
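/* Drop every capability from the bounding set, one index at a time,
 * until PR_CAPBSET_READ reports an invalid capability index. */
for (int i = 0; prctl(PR_CAPBSET_READ, i, 0, 0, 0) >= 0; i++) {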
err = prctl(PR_CAPBSET_DROP, i, 0, 0, 0);
if (err < 0) {
if (errno == EINVAL) {
ALOGW("PR_CAPBSET_DROP %d failed: %s. "
"Please make sure your kernel is compiled with "
"file capabilities support enabled.",
i, strerror(errno));
} else {
ALOGE("PR_CAPBSET_DROP %d failed: %s.", i, strerror(errno));
dvmAbort();
}
}
}
#endif /* HAVE_ANDROID_OS */
if (mountMode != MOUNT_EXTERNAL_NONE) {
err = mountEmulatedStorage(uid, mountMode);
if (err < 0) {
ALOGE("cannot mountExternalStorage(): %s", strerror(errno));
if (errno == ENOTCONN || errno == EROFS) {
// When device is actively encrypting, we get ENOTCONN here
// since FUSE was mounted before the framework restarted.
// When encrypted device is booting, we get EROFS since
// FUSE hasn't been created yet by init.
// In either case, continue without external storage.
} else {
dvmAbort();
}
}
}
err = setgroupsIntarray(gids);
if (err < 0) {
ALOGE("cannot setgroups(): %s", strerror(errno));
dvmAbort();
}
err = setrlimitsFromArray(rlimits);
if (err < 0) {
ALOGE("cannot setrlimit(): %s", strerror(errno));
dvmAbort();
}
err = setresgid(gid, gid, gid);
if (err < 0) {
ALOGE("cannot setresgid(%d): %s", gid, strerror(errno));
dvmAbort();
}
err = setresuid(uid, uid, uid);
if (err < 0) {
ALOGE("cannot setresuid(%d): %s", uid, strerror(errno));
dvmAbort();
}
if (needsNoRandomizeWorkaround()) {
int current = personality(0xffffFFFF);
int success = personality((ADDR_NO_RANDOMIZE | current));
if (success == -1) {
ALOGW("Personality switch failed. current=%d error=%d\n", current, errno);
}
}
err = setCapabilities(permittedCapabilities, effectiveCapabilities);
if (err != 0) {
ALOGE("cannot set capabilities (%llx,%llx): %s",
permittedCapabilities, effectiveCapabilities, strerror(err));
dvmAbort();
}
err = set_sched_policy(0, SP_DEFAULT);
if (err < 0) {
ALOGE("cannot set_sched_policy(0, SP_DEFAULT): %s", strerror(-err));
dvmAbort();
}
err = setSELinuxContext(uid, isSystemServer, seInfo, niceName);
if (err < 0) {
ALOGE("cannot set SELinux context: %s\n", strerror(errno));
dvmAbort();
}
// Set the comm to a nicer name.
if (isSystemServer && niceName == NULL) {
dvmSetThreadName("system_server");
} else {
dvmSetThreadName(niceName);
}
// These free(3) calls are safe because we know we're only ever forking
// a single-threaded process, so we know no other thread held the heap
// lock when we forked.
free(seInfo);
free(niceName);
/*
* Our system thread ID has changed. Get the new one.
*/
Thread* thread = dvmThreadSelf();
thread->systemTid = dvmGetSysThreadId();
/* configure additional debug options */
enableDebugFeatures(debugFlags);
unsetSignalHandler();
gDvm.zygote = false;
if (!dvmInitAfterZygote()) {
ALOGE("error in post-zygote initialization");
dvmAbort();
}
} else if (pid > 0) {
/* the parent process */
free(seInfo);
free(niceName);
}
return pid;
}
/*
* native public static int nativeForkAndSpecialize(int uid, int gid,
* int[] gids, int debugFlags, int[][] rlimits, int mountExternal,
* String seInfo, String niceName);
*/
static void Dalvik_dalvik_system_Zygote_forkAndSpecialize(const u4* args,
JValue* pResult)
{
pid_t pid;
pid = forkAndSpecializeCommon(args, false);
RETURN_INT(pid);
}
/*
* native public static int nativeForkSystemServer(int uid, int gid,
* int[] gids, int debugFlags, int[][] rlimits,
* long permittedCapabilities, long effectiveCapabilities);
*/
static void Dalvik_dalvik_system_Zygote_forkSystemServer(
const u4* args, JValue* pResult)
{
pid_t pid;
pid = forkAndSpecializeCommon(args, true);
/* The zygote process checks whether the child process has died or not. */
if (pid > 0) {
int status;
ALOGI("System server process %d has been created", pid);
gDvm.systemServerPid = pid;
/* There is a slight window that the system server process has crashed
* but it went unnoticed because we haven't published its pid yet. So
* we recheck here just to make sure that all is well.
*/
if (waitpid(pid, &status, WNOHANG) == pid) {
ALOGE("System server process %d has died. Restarting Zygote!", pid);
kill(getpid(), SIGKILL);
}
}
RETURN_INT(pid);
}
const DalvikNativeMethod dvm_dalvik_system_Zygote[] = {
{ "nativeFork", "()I",
Dalvik_dalvik_system_Zygote_fork },
{ "nativeForkAndSpecialize", "(II[II[[IILjava/lang/String;Ljava/lang/String;)I",
Dalvik_dalvik_system_Zygote_forkAndSpecialize },
{ "nativeForkSystemServer", "(II[II[[IJJ)I",
Dalvik_dalvik_system_Zygote_forkSystemServer },
{ NULL, NULL, NULL },
};
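Two places in the listing above move a 64-bit capability mask through 32-bit words: forkAndSpecializeCommon rebuilds the mask from two u4 arguments, and setCapabilities splits it across capdata[0] and capdata[1]. The sketch below isolates that packing step; the names ex_cap_words, split_caps and join_caps are hypothetical and do not appear in the Dalvik source.

```c
#include <stdint.h>

/* Hypothetical pair of 32-bit words, standing in for capdata[0] / capdata[1]. */
struct ex_cap_words {
    uint32_t lo;   /* capabilities 0..31  */
    uint32_t hi;   /* capabilities 32..63 */
};

/* Split a 64-bit mask the way setCapabilities fills the two capdata entries. */
struct ex_cap_words split_caps(uint64_t caps)
{
    struct ex_cap_words w;
    w.lo = (uint32_t)caps;
    w.hi = (uint32_t)(caps >> 32);
    return w;
}

/* Rebuild the 64-bit mask from two 32-bit words, as done with args[5], args[6]. */
uint64_t join_caps(uint32_t lo, uint32_t hi)
{
    return (uint64_t)lo | ((uint64_t)hi << 32);
}
```

Using unsigned types throughout avoids any sign extension when the low word has its top bit set. The mask 0x3fffffffff, for instance, splits into lo = 0xffffffff and hi = 0x3f.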