Jake Edge: Creating containers with systemd-nspawn

This article is reposted from: https://lwn.net/Articles/572957/

It introduces using systemd-nspawn to create the most lightweight kind of container.

 

Lennart Poettering typically gives his conference talks about various aspects of the systemd init replacement, and his presentation at LinuxCon Europe was in the same vein.  But, instead of the core functionality of systemd, he spoke about a mostly unknown utility that ships with it: systemd-nspawn. The tool started as a debugging aid for systemd development, but has many more uses than just that, he said.  In fact, systemd-nspawn is like the chroot command, but it is a "chroot on steroids" according to the title of his talk.


Poettering began by noting that most people think of systemd as an init system, which it is, but that's just where it started and it is more than that now.  Systemd is a set of "components needed to build up an operating system on top of the Linux kernel", he said.  As part of the development of systemd, the team looked at various kernel features to see if they were relevant to the project.

One of the features considered was containers.  Containers on Linux usually means either using LXC or libvirt LXC, he said. Those two are, he stressed, totally separate projects despite the name similarity.  Both are quite different from the well-known (and understood) chroot command (and underlying system call).  There is no configuration required for chroot, unlike the other two.  The systemd project needed a way to run inside of containers or virtual machines, but wanted a simple tool that was more like chroot than either LXC or libvirt LXC. Enter systemd-nspawn.

The idea was to write a tool that does much of what LXC and libvirt LXC do, but is easier to use.  It is targeted at "building, testing, debugging, and profiling", not at deployment.  systemd-nspawn uses the same kernel APIs that the other two tools use, but is not a competitor to them because it is not targeted at running in a production environment.

Like chroot, systemd-nspawn "just works" with "no configuration".  The latter is not quite true, Poettering said, but the configuration has been deliberately kept simple. As an example, he showed the yum command needed to create a minimal Fedora 19 installation in a directory (similar commands for multiple distributions are available in the man page).  That became the basis for his subsequent demos.
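The yum invocation he showed is not reproduced in this report. A minimal sketch along the lines of the Fedora example in the systemd-nspawn man page follows; the target path /srv/mycontainer, the exact package list, and the NSPAWN_RUN dry-run switch are assumptions made for illustration, not details from the talk.

```shell
#!/bin/sh
# Print the command by default; set NSPAWN_RUN=1 (as root) to execute it.
# NSPAWN_RUN is a hypothetical switch used only in this sketch.
run() {
    if [ "${NSPAWN_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Install a minimal Fedora 19 tree into the directory given as $1.
make_fedora_tree() {
    run yum -y --releasever=19 --nogpg --installroot="$1" \
        --disablerepo='*' --enablerepo=fedora \
        install systemd passwd yum fedora-release vim-minimal
}

make_fedora_tree /srv/mycontainer
```

Run as is, the script only prints the yum command line, so it can be inspected before anything is installed.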

After setting up a distribution directory, one can boot a container with a simple command (as root):

    systemd-nspawn -bD dir

The -D specifies the root directory for the container and -b says to boot it using systemd inside the container. Omitting -b is similar to booting a kernel with the init=/bin/bash command-line parameter, which results in a root shell. While he called it "booting" a container, there is no actual kernel boot that occurs as all of the containers are running under the host kernel. Poettering then showed that starting the container goes through the normal startup sequence for the distribution by starting various services inside the container and so on.  When complete, you get a login prompt.


Logging in as "root" with no password enters the container, which, unsurprisingly, looks like a Fedora 19 installation.  It is a "full container", Poettering said; additional software can be installed inside it using yum, for example.  He showed a ps command both inside and outside the container to show that the processes were running on the system (of course) but that they had different PIDs inside and outside.

The container will automatically get its network configuration and time from the host, but set its hostname based on the directory name (or -M name).  It also bind mounts /etc/resolv.conf from the host so that name resolution works inside the container.  As one might expect, when finished with the container you can run poweroff to shut it down or reboot to restart it.
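As a sketch of the naming behavior described above: the first call lets the container take its name from the directory, while the second sets it explicitly with -M. The path /srv/mycontainer, the name f19test, and the NSPAWN_RUN dry-run switch are illustrative assumptions.

```shell
#!/bin/sh
# Print the command by default; set NSPAWN_RUN=1 (as root) to execute it.
# NSPAWN_RUN is a hypothetical switch used only in this sketch.
run() {
    if [ "${NSPAWN_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Boot the tree at $1; if $2 is given, pass it as the machine name (-M).
boot_container() {
    if [ -n "${2:-}" ]; then
        run systemd-nspawn -b -D "$1" -M "$2"
    else
        run systemd-nspawn -b -D "$1"
    fi
}

boot_container /srv/mycontainer            # name derived from the directory
boot_container /srv/mycontainer f19test    # name set explicitly
```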

Poettering then moved on to some tools that make it easier to work with the "nspawn containers" as well as some work that the team has done to make standard tools report things like container names.  For example, cgls (which was systemd-cgls in earlier systemd releases) shows control groups and their processes in a tree-like structure similar to that of pstree.  Also, systemd-cgtop shows control groups in a top-like display, sorting them based on which are using the most CPU time.

Another addition is the machinectl command that manages "machines" (either containers or virtual machines) for systemd.  When nspawn creates a new container, it registers that machine with systemd over D-Bus.  Those machines can then be monitored and managed using machinectl.  For example:

    machinectl status mname

That will show the status of the machine called mname.  That name is also integrated with tools like ps so that one can specify machine as an output column to see which container a process is running in.  The machine name registration is also done by libvirt LXC, so those containers are treated similarly; so far, though, LXC is not using the facility.  One of the goals is to eventually allow systemctl (the systemd management program) to take a machine-name argument and have it operate on the instance inside the machine.
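A short sketch of how the registration can be inspected from the host. It assumes a ps build (procps with systemd support) that understands the machine output column; the machine name mycontainer and the NSPAWN_RUN dry-run switch are hypothetical.

```shell
#!/bin/sh
# Print the command by default; set NSPAWN_RUN=1 to execute it.
# NSPAWN_RUN is a hypothetical switch used only in this sketch.
run() {
    if [ "${NSPAWN_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Show every registered machine, then the details for the one named in $1.
inspect_machine() {
    run machinectl list
    run machinectl status "$1"
}

# List all processes with the owning machine name as the second column.
ps_by_machine() {
    run ps -e -o pid,machine,args
}

inspect_machine mycontainer
ps_by_machine
```

The PID column printed on the host is the host-side PID; the same process reports a different PID when ps is run inside the container, as the demo showed.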


The integration with machine names means that systemd-nspawn does require a system that has been booted by systemd in order to function. Earlier versions of systemd shipped with an independent nspawn, but that has fallen by the wayside.

Centralizing the system log information for the nspawn containers, while still allowing getting that information on a per-container basis, is handled by integration with the Journal. Using the -j option to nspawn will link the container's journal with that of the host.  The Journal is "a little like syslog except that it is indexed", Poettering said.  With linked journals, the system logs for multiple containers can be monitored or queried from the host.
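A sketch of how the linked journals might be used, assuming journalctl's -m (merge) option and the standard _HOSTNAME journal field; filtering on _HOSTNAME relies on the container's hostname matching its machine name. The path, the name mycontainer, and the NSPAWN_RUN dry-run switch are hypothetical.

```shell
#!/bin/sh
# Print the command by default; set NSPAWN_RUN=1 (as root) to execute it.
# NSPAWN_RUN is a hypothetical switch used only in this sketch.
run() {
    if [ "${NSPAWN_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Boot the tree at $1 with its journal linked to the host's journal.
boot_linked() {
    run systemd-nspawn -b -j -D "$1"
}

# On the host: read all available journals interleaved.
read_merged() {
    run journalctl -m
}

# On the host: restrict the merged view to entries logged under hostname $1.
read_container_log() {
    run journalctl -m _HOSTNAME="$1"
}

boot_linked /srv/mycontainer
read_merged
read_container_log mycontainer
```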

Another feature of nspawn is that it can isolate the container from the host network.  As mentioned earlier, by default nspawn inherits the network of the host, but the --private-network argument will create a container without any network devices other than loopback.  That is "ideal for build systems", Poettering said, which shouldn't need the network after the initial package retrieval.
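One way to see the isolation is to run a single command in the container instead of booting it. This sketch reads /proc/net/dev, which lists the interfaces visible in the caller's network namespace; with --private-network only lo should appear. The path and the NSPAWN_RUN dry-run switch are hypothetical.

```shell
#!/bin/sh
# Print the command by default; set NSPAWN_RUN=1 (as root) to execute it.
# NSPAWN_RUN is a hypothetical switch used only in this sketch.
run() {
    if [ "${NSPAWN_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Run cat inside the tree at $1 with networking cut off from the host.
list_isolated_interfaces() {
    run systemd-nspawn --private-network -D "$1" cat /proc/net/dev
}

list_isolated_interfaces /srv/mycontainer
```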

Nspawn is quite useful in a number of scenarios and the systemd team has used it extensively to debug systemd itself, he said.  Normally, an init system is difficult to debug, but being able to run gdb, strace, and similar tools from the host against the programs running in a container makes it much easier.  It is a tool that more people, especially in the "DevOps" community, should be aware of, he said; his talk, and articles like this, will hopefully start getting that word out.
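Because container processes are ordinary processes on the host, a debugger or tracer can attach to them by host PID. The sketch below assumes that `machinectl show NAME -p Leader` prints a line of the form Leader=<pid> for the container's init process; the PID 1234, the name mycontainer, and the NSPAWN_RUN dry-run switch are made up for illustration.

```shell
#!/bin/sh
# Print the command by default; set NSPAWN_RUN=1 (as root) to execute it.
# NSPAWN_RUN is a hypothetical switch used only in this sketch.
run() {
    if [ "${NSPAWN_RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Extract the PID from a "Leader=<pid>" line read on standard input.
parse_leader() {
    sed -n 's/^Leader=//p'
}

# Attach strace to the process with host PID $1 and follow its children.
trace_pid() {
    run strace -f -p "$1"
}

# Canned input for illustration; on a real host the line would come from
#   machinectl show mycontainer -p Leader
pid="$(printf 'Leader=1234\n' | parse_leader)"
trace_pid "$pid"
```

The same PID could be handed to `gdb -p` instead of strace.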
