Docker Basics: How It Works (Part 1)

cbl709

Category: Linux applications. Tags: docker, namespace, cgroup

Article link: https://blog.csdn.net/cbl709/article/details/42570161

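Note: this post comes from my personal blog, http://www.chenbiaolong.com/. The original was written in Markdown, so the formatting may be slightly off after copying it here.

I will mainly be moving future posts to www.chenbiaolong.com; please follow and support it.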

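Like many people, I suspect, I could not work out how Docker images actually function when I first started learning Docker. I spent the past few days looking into how Docker works, and this post is a brief summary.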

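Docker is built on two core technologies: namespaces and cgroups. Cgroups are mainly used to limit and isolate resources: a cgroup caps the maximum amount of resources that a group of processes may use, which is relatively easy to understand. Namespaces also provide resource isolation, but in a different way: system resources such as PIDs, IPC objects, and the network stack are no longer global and instead belong to a particular namespace. The resources inside one namespace are invisible to every other namespace. The idea is somewhat similar to the Linux multi-user mechanism.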
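As a quick way to see the cgroup side of this on a running system, the following sketch prints the cgroups that the current shell belongs to. It assumes a Linux host with /proc mounted; the function name `show_cgroups` is made up for illustration and is not part of Docker.

```shell
#!/bin/sh
# Print the cgroup membership of a process (default: the current shell).
# Assumes Linux with /proc mounted; show_cgroups is a hypothetical helper name.
show_cgroups() {
    pid="${1:-$$}"
    file="/proc/$pid/cgroup"
    if [ -r "$file" ]; then
        # Each line has the form hierarchy-ID:controller-list:cgroup-path
        cat "$file"
    else
        echo "no cgroup information available for PID $pid" >&2
        return 1
    fi
}

show_cgroups || true
```

Every process listed under the same cgroup path shares the limits configured for that cgroup.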

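For a detailed introduction to namespaces, see the article series Introduction to Linux namespaces. The series has also been translated into Chinese under the title "linux namespace简介". The description of namespaces below is mainly based on these two sources.
The namespaces currently supported by the Linux kernel are mainly: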

Mount namespaces (CLONE_NEWNS)
isolate the set of filesystem mount points seen by a group of processes. Thus, processes in different mount namespaces can have different views of the filesystem hierarchy. With the addition of mount namespaces, the mount() and umount() system calls ceased operating on a global set of mount points visible to all processes on the system and instead performed operations that affected just the mount namespace associated with the calling process.

UTS namespaces (CLONE_NEWUTS)
isolate two system identifiers—nodename and domainname—returned by the uname() system call; the names are set using the sethostname() and setdomainname() system calls. In the context of containers, the UTS namespaces feature allows each container to have its own hostname and NIS domain name. This can be useful for initialization and configuration scripts that tailor their actions based on these names. The term “UTS” derives from the name of the structure passed to the uname() system call: struct utsname. The name of that structure in turn derives from “UNIX Time-sharing System”

IPC namespaces (CLONE_NEWIPC)
isolate certain interprocess communication (IPC) resources, namely, System V IPC objects and (since Linux 2.6.30) POSIX message queues. The common characteristic of these IPC mechanisms is that IPC objects are identified by mechanisms other than filesystem pathnames. Each IPC namespace has its own set of System V IPC identifiers and its own POSIX message queue filesystem.

PID namespaces (CLONE_NEWPID, Linux 2.6.24)
isolate the process ID number space. In other words, processes in different PID namespaces can have the same PID. One of the main benefits of PID namespaces is that containers can be migrated between hosts while keeping the same process IDs for the processes inside the container. PID namespaces also allow each container to have its own init (PID 1), the “ancestor of all processes” that manages various system initialization tasks and reaps orphaned child processes when they terminate.

Network namespaces (CLONE_NEWNET, started in Linux 2.6.24 and largely completed by about Linux 2.6.29)
provide isolation of the system resources associated with networking. Thus, each network namespace has its own network devices, IP addresses, IP routing tables, /proc/net directory, port numbers, and so on. Network namespaces make containers useful from a networking perspective: each container can have its own (virtual) network device and its own applications that bind to the per-namespace port number space; suitable routing rules in the host system can direct network packets to the network device associated with a specific container. Thus, for example, it is possible to have multiple containerized web servers on the same host system, with each server bound to port 80 in its (per-container) network namespace.

User namespaces (CLONE_NEWUSER, started in Linux 2.6.23 and completed in Linux 3.8)
isolate the user and group ID number spaces. In other words, a process’s user and group IDs can be different inside and outside a user namespace. The most interesting case here is that a process can have a normal unprivileged user ID outside a user namespace while at the same time having a user ID of 0 inside the namespace. This means that the process has full root privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace.
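To check which of these namespaces a process is in, one option is to read the symlinks under /proc/PID/ns. The sketch below assumes a Linux host with /proc mounted; `list_namespaces` is a hypothetical helper name, and the set of entries printed depends on the kernel version.

```shell
#!/bin/sh
# List the namespaces a process belongs to (default: the current shell).
# Assumes Linux with /proc mounted; list_namespaces is a hypothetical helper name.
list_namespaces() {
    pid="${1:-$$}"
    for link in /proc/"$pid"/ns/*; do
        # Skip the unexpanded pattern if the directory is missing or empty.
        [ -e "$link" ] || [ -L "$link" ] || continue
        # Each entry is a symlink whose target looks like "pid:[4026531836]";
        # two processes share a namespace when these targets are identical.
        printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
    done
}

list_namespaces
```

Comparing the output for a process on the host with the output for a process inside a container shows which namespaces differ between them.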

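This post focuses on PID namespaces and mount namespaces, and uses them to imitate how Docker runs on top of a base filesystem. We will use busybox to build a minimal environment that can support a running Linux system, and use chroot to switch the root directory to achieve environment isolation.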
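As a preview of what PID and mount namespaces do, the sketch below uses the `unshare` tool from util-linux to start a shell in new PID and mount namespaces. The tool is not used in the text above, so treat this as one possible way to experiment rather than the method followed here. It also creates a user namespace with `--user --map-root-user` so that it can run without root where the kernel allows unprivileged user namespaces; the helper name is hypothetical.

```shell
#!/bin/sh
# Start a shell in fresh user, PID and mount namespaces and print its own PID.
# Assumes util-linux unshare with --map-root-user and --mount-proc support,
# and a kernel that permits unprivileged user namespaces.
# pid_in_new_namespace is a hypothetical helper name.
pid_in_new_namespace() {
    # --fork makes unshare fork, so the child becomes PID 1 of the new PID namespace;
    # --mount-proc mounts a private /proc, which implies a new mount namespace.
    unshare --user --map-root-user --pid --fork --mount-proc \
        sh -c 'echo "$$"'
}

if result=$(pid_in_new_namespace 2>/dev/null); then
    echo "PID seen inside the new namespace: $result"
else
    echo "unshare is unavailable or not permitted on this system"
fi
```

When the call succeeds, the inner shell reports PID 1, because it is the first process in its own PID namespace even though the host sees it under a different PID.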

Filesystem isolation with chroot and busybox

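chroot switches the root path, but if the new root lacks basic libraries and programs (a shell, for example), chroot cannot switch into it properly, because there is nothing inside the new root for it to run. busybox provides the basic tools needed to keep a Linux system running (a shell such as ash, plus commands such as ls), so we can use chroot + busybox to build a new sandbox system on the local machine. At this point there is no namespace isolation yet; only the root filesystem is isolated.
First download the busybox source from the official site; I used version 1.22 here. After unpacking it, use the default configuration directly, then run make and make install.

    [root@localhost busybox-1.22.1]# make defconfig   # use the default configuration
    [root@localhost busybox-1.22.1]# make
    [root@localhost busybox-1.22.1]# make install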

If everything goes well, an _install directory is created under the current path.

    [root@localhost busybox-1.22.1]# cd _install
    [root@localhost _install]# ls
    bin  linuxrc  sbin  usr

The _install directory contains many basic tools:

    [root@localhost _install]# cd bin
    [root@localhost bin]# ls
    ash      chgrp   cttyhack  dumpkmap  fgrep   hostname  kill     lsattr    more        netstat  printenv   rmdir         setserial  sync    usleep
    base64   chmod   date      echo      fsync   hush      linux32  lzop      mount       nice     ps         rpm           sh         tar     vi
    busybox  chown   dd        ed        getopt  ionice    linux64  makemime  mountpoint  pidof    pwd        run-parts     sleep      touch   watch
    cat      conspy  df        egrep     grep    iostat    ln       mkdir     mpstat      ping     reformime  scriptreplay  stat       true    zcat
    catv     cp      dmesg     false     gunzip  ipcalc    login    mknod     mt          ping6    rev        sed           stty       umount
    c
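With `_install` populated, switching into it could look roughly like the sketch below. This is an assumption about how the chroot step might be done, not a command taken from the text above. `rootfs_has_shell` and `enter_rootfs` are hypothetical helper names, and chroot itself must be run as root. Note that a busybox built with the default configuration is normally dynamically linked, so its shared libraries also have to exist inside the new root (or busybox has to be built statically) before the shell can start.

```shell
#!/bin/sh
# Sketch: enter a busybox root directory with chroot.
# Both helper names are hypothetical; chroot itself requires root privileges.

# Return 0 if the directory looks usable as a new root (it has an executable bin/sh).
rootfs_has_shell() {
    [ -d "$1" ] && [ -x "$1/bin/sh" ]
}

# Run a shell whose "/" is the given directory.
enter_rootfs() {
    rootfs="$1"
    if ! rootfs_has_shell "$rootfs"; then
        echo "no executable bin/sh under '$rootfs'; chroot would fail" >&2
        return 1
    fi
    # If busybox is dynamically linked, its shared libraries must also be
    # present inside "$rootfs", otherwise exec fails even though bin/sh exists.
    chroot "$rootfs" /bin/sh
}

# Example (as root): enter_rootfs ./_install
```

The precondition check mirrors the point made earlier: chroot only changes the root path, so whatever it is asked to run must already exist under the new root.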
