Linux内核-容器之namespace

1. 介绍

简单玩了下Linux kernel为容器技术提供的基础设施之一namespace(另一个是cgroups),包括uts/user/pid/mnt/ipc/net六个(3.13.0的内核). 这东西主要用来做资源的隔离,我感觉本质上是全局资源的映射,映射之间独立了自然隔离了。主要涉及到的东西是:

  • clone
  • setns
  • unshare
  • /proc/pid/ns, /proc/pid/uid_map, /proc/pid/gid_map等

后面简单分析了一下内核源码里面是怎么实现这几个namespace以及以几个简单系统调用为例,看看namespace怎么产生影响的,然后简单分析下setns和unshare的实现


2. 测试流程及代码

下面是一些简单的例子,主要测试uts/pid/user/mnt四个namespace的效果,测试代码主要用到三个进程,一个是clone系统调用执行/bin/bash后的进程,也是生成新的子namespace的初始进程,然后是打开/proc/pid/ns下的namespace链接文件,用setns将第二个可执行文件的进程加入/bin/bash的进程的namespace(容器),并让其fork出一个子进程,测试pid namespace的差异。值得注意的几个点:

  • 不同版本的内核setns和unshare对namespace的支持不一样,较老的内核可能只支持ipc/net/uts三个namespace
  • 某个进程创建后其pid namespace就固定了,使用setns和unshare改变后,其本身的pid namespace不会改变,只有fork出的子进程的pid namespace改变(改变的是每个进程的nsproxy->pid_namespace_for_children)
  • 用setns添加mnt namespace应该放在其他namespace之后,否则可能出现无法打开/proc/pid/ns/…的错误
// 代码1: 开一些新的namespace(启动新容器)
#define _GNU_SOURCE
#include <sys/wait.h>
#include <sched.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define errExit(msg)  do { perror(msg); exit(EXIT_FAILURE); \
} while (0)

/* Start function for cloned child */
static int childFunc(void *arg)
{
  const char *binary = "/bin/bash";
  char *const argv[] = {
    "/bin/bash",
    NULL
  };
  char *const envp[] = { NULL };

  /* wrappers for execve */
  // has const char * as argument list
  // execl 
  // execle  => has envp
  // execlp  => need search PATH 

  // has char *const arr[] as argument list 
  // execv 
  // execvpe => need search PATH and has envp
  // execvp  => need search PATH 

  //int ret = execve(binary, argv, envp);
  int ret = execv(binary, argv);
  if (ret < 0) {
    errExit("execve error");
  }
  return ret;
}

#define STACK_SIZE (1024 * 1024)    /* Stack size for cloned child */

int main(int argc, char *argv[])
{
  char *stack; 
  char *stackTop;                 
  pid_t pid;
  stack = malloc(STACK_SIZE);
  if (stack == NULL)
    errExit("malloc");
  stackTop = stack + STACK_SIZE;  /* Assume stack grows downward */

  //pid = clone(childFunc, stackTop, CLONE_NEWUTS | CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWUSER | SIGCHLD, NULL);
  pid = clone(childFunc, stackTop, CLONE_NEWUTS | CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWUSER | CLONE_NEWIPC | SIGCHLD, NULL);
//pid = clone(childFunc, stackTop, CLONE_NEWUTS | //CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWUSER | CLONE_NEWIPC //| CLONE_NEWNET | SIGCHLD, NULL);
  if (pid == -1)
    errExit("clone");
  printf("clone() returned %ld\n", (long) pid);

  if (waitpid(pid, NULL, 0) == -1)  
    errExit("waitpid");
  printf("child has terminated\n");

  exit(EXIT_SUCCESS);
}
// 代码2: 使用setns加入新进程
#define _GNU_SOURCE  // ?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/utsname.h>
#include <unistd.h>
#include <sys/types.h>
#include <sched.h>
#include <fcntl.h>
#include <wait.h>

// mainly setns and unshare system calls

/* int setns(int fd, int nstype); */

// 不同版本内核/proc/pid/ns下namespace文件情况
/*
   CLONE_NEWCGROUP (since Linux 4.6)
   fd must refer to a cgroup namespace.

   CLONE_NEWIPC (since Linux 3.0)
   fd must refer to an IPC namespace.

   CLONE_NEWNET (since Linux 3.0)
   fd must refer to a network namespace.

   CLONE_NEWNS (since Linux 3.8)
   fd must refer to a mount namespace.

   CLONE_NEWPID (since Linux 3.8)
   fd must refer to a descendant PID namespace.

   CLONE_NEWUSER (since Linux 3.8)
   fd must refer to a user namespace.

   CLONE_NEWUTS (since Linux 3.0)
   fd must refer to a UTS namespace.
   */

/* // 特殊的pid namespace 
   CLONE_NEWPID behaves somewhat differently from the other nstype
values: reassociating the calling thread with a PID namespace changes
only the PID namespace that child processes of the caller will be
created in; it does not change the PID namespace of the caller
itself.  Reassociating with a PID namespace is allowed only if the
PID namespace specified by fd is a descendant (child, grandchild,
etc.)  of the PID namespace of the caller.  For further detail
  • 0
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值