How docker Works: The Basics (Part 1)

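Note: this post originally appeared on my personal blog, http://www.chenbiaolong.com/. The original was written in markdown, so the formatting may be slightly off after copying it here.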

Future posts will mostly move to the www.chenbiaolong.com blog; feel free to follow and support it.

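Like many people, I suspect, I could not work out how docker images actually function when I first started learning docker. I spent the past few days looking into how docker works, and this post is a brief write-up.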

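docker rests on two core technologies: namespace and cgroup. cgroup mainly limits and isolates resources: it caps the maximum amount of resources that a group of processes can use, which is relatively easy to understand. namespace also isolates resources, but in a different way: system resources such as PID, IPC, and Network are no longer global and instead belong to a specific namespace. The resources inside one namespace are invisible to every other namespace. The idea is somewhat similar to the multi-user mechanism in Linux.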
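One quick way to see that these resources belong to a namespace rather than to the whole system is to look under /proc/<pid>/ns, where the kernel exposes the namespaces each process lives in. The sketch below is not from the original post: the helper names `ns_id` and `same_ns` are made up for illustration, and it assumes a Linux system with /proc mounted.

```shell
#!/bin/sh
# Print the namespace identifier of a process for one namespace type.
# usage: ns_id <pid> <type>   (type: mnt, uts, ipc, pid, net, user)
ns_id() {
    readlink "/proc/$1/ns/$2" 2>/dev/null
}

# Succeed if two processes are in the same namespace of the given type.
# Returns 2 if either link cannot be read (for example, no permission).
# usage: same_ns <pid1> <pid2> <type>
same_ns() {
    a=$(ns_id "$1" "$3") || return 2
    b=$(ns_id "$2" "$3") || return 2
    [ "$a" = "$b" ]
}

# A shell and its parent normally share every namespace.
for t in mnt uts ipc pid net user; do
    if same_ns "$$" "$PPID" "$t"; then
        echo "$t: shared with parent ($(ns_id "$$" "$t"))"
    else
        echo "$t: differs from parent, or parent not readable"
    fi
done
```

Two processes that print the same identifier for a given type see the same set of that resource; a process started in a new namespace would show a different identifier.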

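For a detailed introduction to namespaces, see the article series Introduction to Linux namespaces. The series has also been translated into Chinese as linux namespace简介. The description of namespaces in the rest of this post draws mainly on those two sources.
The namespaces currently supported by the Linux kernel are mainly: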

  • Mount namespaces (CLONE_NEWNS)
    isolate the set of filesystem mount points seen by a group of processes. Thus, processes in different mount namespaces can have different views of the filesystem hierarchy. With the addition of mount namespaces, the mount() and umount() system calls ceased operating on a global set of mount points visible to all processes on the system and instead performed operations that affected just the mount namespace associated with the calling process.
  • UTS namespaces (CLONE_NEWUTS)
    isolate two system identifiers (nodename and domainname) returned by the uname() system call; the names are set using the sethostname() and setdomainname() system calls. In the context of containers, the UTS namespaces feature allows each container to have its own hostname and NIS domain name. This can be useful for initialization and configuration scripts that tailor their actions based on these names. The term “UTS” derives from the name of the structure passed to the uname() system call: struct utsname. The name of that structure in turn derives from “UNIX Time-sharing System”.
  • IPC namespaces (CLONE_NEWIPC)
    isolate certain interprocess communication (IPC) resources, namely, System V IPC objects and (since Linux 2.6.30) POSIX message queues. The common characteristic of these IPC mechanisms is that IPC objects are identified by mechanisms other than filesystem pathnames. Each IPC namespace has its own set of System V IPC identifiers and its own POSIX message queue filesystem.
  • PID namespaces (CLONE_NEWPID, Linux 2.6.24)
    isolate the process ID number space. In other words, processes in different PID namespaces can have the same PID. One of the main benefits of PID namespaces is that containers can be migrated between hosts while keeping the same process IDs for the processes inside the container. PID namespaces also allow each container to have its own init (PID 1), the “ancestor of all processes” that manages various system initialization tasks and reaps orphaned child processes when they terminate.
  • Network namespaces (CLONE_NEWNET, started in Linux 2.6.24 and largely completed by about Linux 2.6.29)
    provide isolation of the system resources associated with networking. Thus, each network namespace has its own network devices, IP addresses, IP routing tables, /proc/net directory, port numbers, and so on. Network namespaces make containers useful from a networking perspective: each container can have its own (virtual) network device and its own applications that bind to the per-namespace port number space; suitable routing rules in the host system can direct network packets to the network device associated with a specific container. Thus, for example, it is possible to have multiple containerized web servers on the same host system, with each server bound to port 80 in its (per-container) network namespace.
  • User namespaces (CLONE_NEWUSER, started in Linux 2.6.23 and completed in Linux 3.8)
    isolate the user and group ID number spaces. In other words, a process’s user and group IDs can be different inside and outside a user namespace. The most interesting case here is that a process can have a normal unprivileged user ID outside a user namespace while at the same time having a user ID of 0 inside the namespace. This means that the process has full root privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace.
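To make two of the entries above concrete, here is a minimal sketch using the `unshare` command. It is not from the original post. It assumes `unshare` is installed and that the kernel allows unprivileged user namespaces; where that is not the case, the commands simply fail and the host is left untouched. The function names `uts_demo` and `pid_demo` and the hostname `sandbox` are made up for illustration.

```shell
#!/bin/sh
# Change the hostname inside a new UTS namespace and print it.
# --user --map-root-user gives the process root inside a new user
# namespace, which is what allows sethostname() without real root.
uts_demo() {
    new_name=${1:-sandbox}
    unshare --user --map-root-user --uts \
        sh -c 'hostname "$1" && uname -n' sh "$new_name"
}

# Print the PID that a shell sees for itself inside a new PID namespace.
# --fork is needed so that the shell is the first process created there.
pid_demo() {
    unshare --user --map-root-user --fork --pid sh -c 'echo $$'
}

echo "host name before: $(uname -n)"
if inside=$(uts_demo sandbox 2>/dev/null); then
    echo "name inside the new UTS namespace: $inside"
else
    echo "could not create a UTS namespace here"
fi
echo "host name after:  $(uname -n)"

if pid=$(pid_demo 2>/dev/null); then
    echo "shell PID inside the new PID namespace: $pid"
else
    echo "could not create a PID namespace here"
fi
```

The host name printed before and after should be identical, since the change is confined to the new UTS namespace. The shell started by `pid_demo` is the first process in its PID namespace, which is the "own init (PID 1)" situation described in the PID namespaces entry.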

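This post focuses on the PID namespace and the mount namespace, and uses the two to imitate how docker runs on top of a base filesystem. We will use busybox to build a base environment that is enough to run a Linux system, and switch the root directory with chroot to isolate that environment.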

Isolating the filesystem with chroot and busybox

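chroot switches the root path, but if the new root does not contain the basic libraries and programs (a shell, for example), chroot cannot switch into it. busybox provides the basic tools needed to keep a Linux system running (such as the sh/ash shell and ls), so we can use chroot plus busybox to build a new sandbox system inside our local system. At this point there is no namespace isolation yet; only the root filesystem is isolated.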
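The requirement that the new root must already contain a shell can be checked before calling chroot. The following is a minimal sketch, not part of the original post; the function names `has_shell` and `enter_root` are hypothetical, and the check only looks for an executable bin/sh. It does not verify that the shared libraries a dynamically linked shell needs are present.

```shell
#!/bin/sh
# Succeed only if the directory looks usable as a chroot target:
# it must contain an executable bin/sh for chroot to start.
# Note: this does not check for shared libraries the shell may need.
has_shell() {
    [ -d "$1" ] && [ -x "$1/bin/sh" ]
}

# Enter the new root if the check passes.
# chroot itself needs root privileges.
enter_root() {
    if ! has_shell "$1"; then
        echo "no executable bin/sh under '$1'; chroot would fail" >&2
        return 1
    fi
    chroot "$1" /bin/sh
}
```

Run as root, `enter_root ./_install` would start a shell whose / is the given directory, provided the check passes.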
First, download the busybox source from the official site; I am using version 1.22 here. After unpacking it, use the default configuration, then run make and make install.

[root@localhost busybox-1.22.1]# make defconfig   # use the default configuration
[root@localhost busybox-1.22.1]# make
[root@localhost busybox-1.22.1]# make install

If everything goes well, a _install folder is created under the current path.

[root@localhost busybox-1.22.1]# cd _install
[root@localhost _install]# ls
bin linuxrc sbin usr

The _install folder contains many basic tools:

[root@localhost _install]# cd bin
[root@localhost bin]# ls
ash chgrp cttyhack dumpkmap fgrep hostname kill lsattr more netstat printenv rmdir setserial sync usleep
base64 chmod date echo fsync hush linux32 lzop mount nice ps rpm sh tar vi
busybox chown dd ed getopt ionice linux64 makemime mountpoint pidof pwd run-parts sleep touch watch
cat conspy df egrep grep iostat ln mkdir mpstat ping reformime scriptreplay stat true zcat
catv cp dmesg false gunzip ipcalc login mknod mt ping6 rev sed stty umount
c