Linux Namespaces

From the tutorial series on dockerone.com:

Linux Namespaces Tutorial (Part 1): UTS
Linux Namespaces Tutorial (Part 2): IPC
Linux Namespaces Tutorial (Part 3): PID
Linux Namespaces Tutorial (Part 4): NS (FS)
Linux Namespaces Tutorial (Part 5): NET


The Linux Namespaces Mechanism [reposted]

Source: http://www.cnblogs.com/lisperl/archive/2012/05/03/2480316.html

The Linux Namespaces mechanism provides a form of resource isolation. System resources such as PIDs, IPC objects, and network devices are no longer global; each belongs to a particular namespace, and the resources inside one namespace are invisible to every other namespace. To create a new namespace, you simply pass the corresponding flag to clone. This mechanism is an excellent foundation for container-based virtualization: LXC (Linux Containers) uses exactly this feature to isolate resources, so that processes in different containers live in different namespaces and can neither see nor disturb one another. Below, the individual namespaces are introduced by way of the clone flags that create them.

When clone is called with CLONE_NEWPID, a new PID namespace is created, and the cloned process becomes the first process in that namespace. A PID namespace gives its processes an independent PID space: PIDs inside the namespace start at 1, and every fork, vfork, or clone performed inside the namespace yields a PID that is unique within it. The first process of a new namespace gets PID 1 there and behaves like the init process of a standalone system: orphaned processes inside the namespace are reparented to it, and when it exits, every process in the namespace is terminated. PID namespaces are hierarchical; a new namespace becomes a child of the namespace that its creator belongs to. Processes in a child namespace are visible to the parent namespace, so a process does not have just one PID: it has one PID in its own namespace and one in each of its direct ancestor namespaces. At boot the kernel creates a default PID namespace, the ancestor of all namespaces created later, so every process in the system is visible in it.

When clone is called with CLONE_NEWIPC, a new IPC namespace is created, and the cloned process becomes its first process. An IPC namespace consists of a set of System V IPC object identifiers, which are created through the IPC-related system calls. IPC objects created inside an IPC namespace are visible to all processes in that namespace but invisible to processes in any other namespace, so processes in different namespaces cannot communicate through them directly, just as if they were running on different machines. When an IPC namespace is destroyed, the kernel automatically destroys all IPC objects that belong to it.
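
To make the isolation concrete, here is a minimal sketch (my own addition, not part of the reposted article): the parent creates a System V message queue and then clones a child into a new IPC namespace with CLONE_NEWIPC; inside the child, ipcs -q lists no queues. It assumes a Linux system with the ipcs utility installed and must be run as root.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1048576];

static int child_fn(void *arg) {
  /* Inside the new IPC namespace: the queue created by the parent
     is invisible here, so this listing should be empty. */
  printf("Queues inside the new IPC namespace:\n");
  system("ipcs -q");
  return 0;
}

int main() {
  /* Create a System V message queue in the original IPC namespace. */
  int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
  printf("Created queue %d; queues in the original IPC namespace:\n", qid);
  system("ipcs -q");

  pid_t child_pid = clone(child_fn, child_stack + 1048576,
                          CLONE_NEWIPC | SIGCHLD, NULL);
  waitpid(child_pid, NULL, 0);

  msgctl(qid, IPC_RMID, NULL);   /* remove the queue before exiting */
  return 0;
}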

PID and IPC namespaces can be combined: passing both CLONE_NEWPID and CLONE_NEWIPC to clone creates a namespace that is simultaneously an independent PID space and an independent IPC space. Processes in different namespaces can neither see nor talk to each other, which yields genuine inter-process isolation.

When clone is called with CLONE_NEWNS, a new mount namespace is created. Every process lives in some mount namespace, which provides its view of the filesystem hierarchy. Without this flag, the child shares the parent's mount namespace, and any later mount or umount by the child affects every process in that namespace. If the child is placed in its own mount namespace, it can call mount and umount to build a filesystem view of its own. Combined with the pivot_root system call, this flag lets you give a process a completely independent directory tree.

When clone is called with CLONE_NEWNET, a new network namespace is created. A network namespace gives processes a completely independent view of the network stack: network interfaces, the IPv4 and IPv6 stacks, IP routing tables, firewall rules, sockets, and so on. It is a self-contained network environment, much like a separate machine. A physical device can live in only one network namespace at a time, though it can be moved from one namespace to another. Virtual network devices provide a pipe-like abstraction for building tunnels between namespaces, and they can be bridged to physical devices that live in other namespaces. When a network namespace is destroyed, its physical devices are automatically moved back to the init network namespace, i.e. the namespace the system started with.

When clone is called with CLONE_NEWUTS, a new UTS namespace is created. A UTS namespace is simply the set of identifiers returned by uname. The identifiers of a new UTS namespace are initialized by copying those of the caller's namespace, and the cloned process can later change them through the corresponding system calls, for example calling sethostname to change the hostname of its namespace; such a change is visible to all processes in that namespace. Used together with CLONE_NEWNET, CLONE_NEWUTS lets you build an environment with its own hostname and its own network stack, just like an independent host on the network.

All of the clone flags above can be used together to give a process a fully independent runtime environment. This is exactly how LXC works: by setting these flags when calling clone, it creates a container with its own PID, IPC, filesystem, network, and UTS spaces. A container is a virtual runtime environment that is transparent to the processes inside it; as far as they can tell, they are running directly on an ordinary system.
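
As a rough sketch of how the flags combine (my own addition; run it as root, and note that the hostname "container", the printed values, and the use of system() are purely illustrative), a single clone call can request all five namespaces at once:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1048576];

static int container_fn(void *arg) {
  /* This process is PID 1 inside its own PID, IPC, mount, network,
     and UTS namespaces. */
  sethostname("container", 9);
  printf("inside: pid=%ld\n", (long)getpid());
  system("hostname; ip link");   /* only the (down) loopback device shows up */
  return 0;
}

int main() {
  pid_t pid = clone(container_fn, child_stack + 1048576,
                    CLONE_NEWPID | CLONE_NEWIPC | CLONE_NEWNS |
                    CLONE_NEWNET | CLONE_NEWUTS | SIGCHLD, NULL);
  waitpid(pid, NULL, 0);
  printf("outside: the hostname is unchanged\n");
  system("hostname");
  return 0;
}

This is only a skeleton of what LXC does; a real container runtime would additionally switch to a private root filesystem (for example with pivot_root), mount a fresh /proc, and set up virtual network devices.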

A container is like a virtual machine with a guest OS in traditional virtualization, but with far lower overhead and much simpler deployment.

Author's note: the Linux Namespaces mechanism was developed precisely to support container-based virtualization. It provides a lightweight, efficient resource-isolation scheme whose overhead is far smaller than that of traditional virtualization. It is not perfect, though: it adds complexity to kernel development, and in terms of isolation strength and fault tolerance it still falls short of traditional virtualization.



Introducing Linux Network Namespaces [reposted]

Source: http://blog.scottlowe.org/2013/09/04/introducing-linux-network-namespaces/

In this post, I’m going to introduce you to the concept of Linux network namespaces. While it might seem a bit esoteric right now, trust me that there is a reason why I’m introducing you to network namespaces—if you, like me, are on a journey to better understand OpenStack, you’ll almost certainly run into network namespaces again.

So what are network namespaces? Generally speaking, an installation of Linux shares a single set of network interfaces and routing table entries. You can modify the routing table entries using policy routing (here’s an introduction I wrote and here’s a write-up on a potential use case for policy routing), but that doesn’t fundamentally change the fact that the set of network interfaces and routing tables/entries are shared across the entire OS. Network namespaces change that fundamental assumption. With network namespaces, you can have different and separate instances of network interfaces and routing tables that operate independent of each other.

This concept is probably best illustrated through some examples. Along the way, I’ll introduce a few new ideas as well. First, though, I need to provide some assumptions.

Assumptions

Throughout these examples, I’m using Ubuntu Server 12.04.3 LTS. Please note that support for network namespaces varies between Linux distributions; Ubuntu supports them but Red Hat doesn’t. (I’m not sure about Fedora. If you know, speak up in the comments.) If you’re thinking about using network namespaces, be sure your Linux distribution includes support.

Further, I’ll assume that you’re either running as root, or that you will prepend sudo to the commands listed here as necessary.

Creating and Listing Network Namespaces

Creating a network namespace is actually quite easy. Just use this command:

ip netns add <new namespace name>

For example, let’s say you wanted to create a namespace called “blue”. You’d use this command:

ip netns add blue

To verify that the network namespace has been created, use this command:

ip netns list

You should see your network namespace listed there, ready for you to use.

Assigning Interfaces to Network Namespaces

Creating the network namespace is only the beginning; the next part is to assign interfaces to the namespaces, and then configure those interfaces for network connectivity. One thing that threw me off early in my exploration of network namespaces was how a namespace is supposed to reach the physical interfaces at all.

The usual answer is virtual Ethernet (veth) interfaces. (Physical interfaces can in fact be moved into a namespace with ip link set <device> netns <name>, but the global namespace then loses access to them, so veth pairs are the more common pattern.) Virtual Ethernet interfaces are an interesting construct; they always come in pairs, and they are connected like a tube—whatever comes in one veth interface will come out the other peer veth interface. As a result, you can use veth interfaces to connect a network namespace to the outside world via the “default” or “global” namespace where physical interfaces exist.

Let’s see how that’s done. First, you’d create the veth pair:

ip link add veth0 type veth peer name veth1

I found a few sites that repeated this command to create veth1 and link it to veth0, but my tests showed that both interfaces were created and linked automatically by the single command listed above. Naturally, you could substitute other names for veth0 and veth1, if you wanted.

You can verify that the veth pair was created using this command:

ip link list

You should see a pair of veth interfaces (using the names you assigned in the command above) listed there. Right now, they both belong to the “default” or “global” namespace, along with the physical interfaces.

Let’s say that you want to connect the global namespace to the blue namespace. To do that, you’ll need to move one of the veth interfaces to the blue namespace using this command:

ip link set veth1 netns blue

If you then run the ip link list command again, you’ll see that the veth1 interface has disappeared from the list. It’s now in the blue namespace, so to see it you’d need to run this command:

ip netns exec blue ip link list

Whoa! That’s a bit of a complicated command. Let’s break it down:

  • The first part, ip netns exec, is how you execute commands in a different network namespace.

  • Next is the specific namespace in which the command should be run (in this case, the blue namespace).

  • Finally, you have the actual command to be executed in the remote namespace. In this case, you want to see the interfaces in the blue namespace, so you run ip link list.

When you run that command, you should see a loopback interface and the veth1 interface you moved over earlier.

Configuring Interfaces in Network Namespaces

Now that veth1 has been moved to the blue namespace, we need to actually configure that interface. Once again, we’ll use the ip netns exec command, this time to configure the veth1 interface in the blue namespace:

ip netns exec blue ifconfig veth1 10.1.1.1/24 up

As before, the format this command follows is:

ip netns exec <network namespace> <command to run against that namespace>

In this case, you’re using ifconfig to assign an IP address to the veth1 interface and bring that interface up. (Note: you could use the ip addr, ip route, and ip link commands to accomplish the same thing.)

Once the veth1 interface is up, you can verify that the network configuration of the blue namespace is completely separate by just using a few different commands. For example, let’s assume that your “global” namespace has physical interfaces in the 172.16.1.0/24 range, and your veth1 interface is in a separate namespace and assigned something from the 10.1.1.0/24 range. You could verify how network namespaces keep the network configuration separate using these commands:

  • ip addr list in the global namespace will not show any 10.1.1.0/24-related interfaces or addresses.

  • ip netns exec blue ip addr list will show only the 10.1.1.0/24-related interfaces and addresses, and will not show any interfaces or addresses from the global namespace.

  • Similarly, ip route list in each namespace will show different routing table entries, including different default gateways.

Connecting Network Namespaces to the Physical Network

This part of it threw me for a while. I can’t really explain why, but it did. Once I’d figured it out, it was obvious. To connect a network namespace to the physical network, just use a bridge. In my case, I used an Open vSwitch (OVS) bridge, but a standard Linux bridge would work as well. Place one or more physical interfaces as well as one of the veth interfaces in the bridge, and—bam!—there you go. Naturally, if you had different namespaces, you’d probably want/need to connect them to different physical networks or different VLANs on the physical network.

So there you go—an introduction to Linux network namespaces. It’s quite likely I’ll build on this content later, so while it seems a bit obscure right now just hang on to this knowledge. In the meantime, if you have questions, clarifications, or other information worth sharing with other readers, please feel free to speak up in the comments.


Separation Anxiety: A Tutorial for Isolating Your System with Linux Namespaces [reposted]

Source: http://www.toptal.com/linux/separation-anxiety-isolating-your-system-with-linux-namespaces
                        
   

With the advent of tools like Docker, Linux Containers, and others, it has become super easy to isolate Linux processes into their own little system environments. This makes it possible to run a whole range of applications on a single real Linux machine and ensure no two of them can interfere with each other, without having to resort to using virtual machines. These tools have been a huge boon to PaaS providers. But what exactly happens under the hood?

These tools rely on a number of features and components of the Linux kernel. Some of these features were introduced fairly recently, while others still require you to patch the kernel itself. But one of the key components, using Linux namespaces, has been a feature of Linux since version 2.6.24 was released in 2008.

Anyone familiar with chroot already has a basic idea of what Linux namespaces can do and how to use namespaces generally. Just as chroot allows processes to see any arbitrary directory as the root of the system (independent of the rest of the processes), Linux namespaces allow other aspects of the operating system to be independently modified as well. This includes the process tree, networking interfaces, mount points, inter-process communication resources and more.

Why Use Namespaces for Process Isolation?

In a single-user computer, a single system environment may be fine. But on a server, where you want to run multiple services, it is essential to security and stability that the services are as isolated from each other as possible. Imagine a server running multiple services, one of which gets compromised by an intruder. In such a case, the intruder may be able to exploit that service and work his way to the other services, and may even be able to compromise the entire server. Namespace isolation can provide a secure environment to eliminate this risk.

For example, using namespacing, it is possible to safely execute arbitrary or unknown programs on your server. Recently, there has been a growing number of programming contest and “hackathon” platforms, such as HackerRank, TopCoder, Codeforces, and many more. A lot of them utilize automated pipelines to run and validate programs that are submitted by the contestants. It is often impossible to know in advance the true nature of contestants’ programs, and some may even contain malicious elements. By running these programs namespaced in complete isolation from the rest of the system, the software can be tested and validated without putting the rest of the machine at risk. Similarly, online continuous integration services, such as Drone.io, automatically fetch your code repository and execute the test scripts on their own servers. Again, namespace isolation is what makes it possible to provide these services safely.

Namespacing tools like Docker also allow better control over processes’ use of system resources, making such tools extremely popular for use by PaaS providers. Services like Heroku and Google App Engine use such tools to isolate and run multiple web server applications on the same real hardware. These tools allow them to run each application (which may have been deployed by any of a number of different users) without worrying about one of them using too many system resources, or interfering and/or conflicting with other deployed services on the same machine. With such process isolation, it is even possible to have entirely different stacks of dependency software (and versions) for each isolated environment!

If you’ve used tools like Docker, you already know that these tools are capable of isolating processes in small “containers”. Running processes in Docker containers is like running them in virtual machines, only these containers are significantly lighter than virtual machines. A virtual machine typically emulates a hardware layer on top of your operating system, and then runs another operating system on top of that. This allows you to run processes inside a virtual machine, in complete isolation from your real operating system. But virtual machines are heavy! Docker containers, on the other hand, use some key features of your real operating system, including namespaces, and ensure a similar level of isolation, but without emulating the hardware and running yet another operating system on the same machine. This makes them very lightweight.

Process Namespace

Historically, the Linux kernel has maintained a single process tree. The tree contains a reference to every process currently running in a parent-child hierarchy. A process, given it has sufficient privileges and satisfies certain conditions, can inspect another process by attaching a tracer to it or may even be able to kill it.

With the introduction of Linux namespaces, it became possible to have multiple “nested” process trees. Each process tree can have an entirely isolated set of processes. This can ensure that processes belonging to one process tree cannot inspect or kill - in fact cannot even know of the existence of - processes in other sibling or parent process trees.

Every time a computer with Linux boots up, it starts with just one process, with process identifier (PID) 1. This process is the root of the process tree, and it initiates the rest of the system by performing the appropriate maintenance work and starting the correct daemons/services. All the other processes start below this process in the tree. The PID namespace allows one to spin off a new tree, with its own PID 1 process. The process that does this remains in the parent namespace, in the original tree, but makes the child the root of its own process tree.

With PID namespace isolation, processes in the child namespace have no way of knowing of the parent process’s existence. However, processes in the parent namespace have a complete view of processes in the child namespace, as if they were any other process in the parent namespace.

(Diagram: separate process trees, each isolated in its own PID namespace.)

It is possible to create a nested set of child namespaces: one process starts a child process in a new PID namespace, and that child process spawns yet another process in a new PID namespace, and so on.

With the introduction of PID namespaces, a single process can now have multiple PIDs associated with it, one for each namespace it falls under. In the Linux source code, we can see that a struct named pid, which used to keep track of just a single PID, now tracks multiple PIDs through the use of a struct named upid:

struct upid {
  int nr;                     // the PID value
  struct pid_namespace *ns;   // namespace where this PID is relevant
  // ...
};

struct pid {
  // ...
  int level;                  // number of upids
  struct upid numbers[0];     // array of upids
};

To create a new PID namespace, one must call the clone() system call with a special flag CLONE_NEWPID. (C provides a wrapper to expose this system call, and so do many other popular languages.) Whereas the other namespaces discussed below can also be created using the unshare() system call, a PID namespace can only be created at the time a new process is spawned using clone(). Once clone() is called with this flag, the new process immediately starts in a new PID namespace, under a new process tree. This can be demonstrated with a simple C program:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1048576];

static int child_fn() {
  printf("PID: %ld\n", (long)getpid());
  return 0;
}

int main() {
  pid_t child_pid = clone(child_fn, child_stack+1048576, CLONE_NEWPID | SIGCHLD, NULL);
  printf("clone() = %ld\n", (long)child_pid);

  waitpid(child_pid, NULL, 0);
  return 0;
}

Compile and run this program with root privileges and you will notice an output that resembles this:

clone() = 5304
PID: 1

The PID, as printed from within the child_fn, will be 1.

Even though the code above is not much longer than “Hello, world” in some languages, a lot has happened behind the scenes. The clone() function, as you would expect, has created a new process by cloning the current one and started execution at the beginning of the child_fn() function. However, while doing so, it detached the new process from the original process tree and created a separate process tree for the new process.

Try replacing the static int child_fn() function with the following, to print the parent PID from the isolated process’s perspective:

static int child_fn() {
  printf("Parent PID: %ld\n", (long)getppid());
  return 0;
}

Running the program this time yields the following output:

clone() = 11449
Parent PID: 0

Notice how the parent PID from the isolated process’s perspective is 0, indicating no parent. Try running the same program again, but this time, remove the CLONE_NEWPID flag from within the clone() function call:

pid_t child_pid = clone(child_fn, child_stack+1048576, SIGCHLD, NULL);

This time, you will notice that the parent PID is no longer 0:

clone() = 11561
Parent PID: 11560

However, this is just the first step in our tutorial. These processes still have unrestricted access to other common or shared resources. For example, the networking interface: if the child process created above were to listen on port 80, it would prevent every other process on the system from being able to listen on it.

Linux Network Namespace

This is where a network namespace becomes useful. A network namespace allows each of these processes to see an entirely different set of networking interfaces. Even the loopback interface is different for each network namespace.

Isolating a process into its own network namespace involves introducing another flag to the clone() function call, CLONE_NEWNET:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>


static char child_stack[1048576];

static int child_fn() {
  printf("New `net` Namespace:\n");
  system("ip link");
  printf("\n\n");
  return 0;
}

int main() {
  printf("Original `net` Namespace:\n");
  system("ip link");
  printf("\n\n");

  pid_t child_pid = clone(child_fn, child_stack+1048576, CLONE_NEWPID | CLONE_NEWNET | SIGCHLD, NULL);

  waitpid(child_pid, NULL, 0);
  return 0;
}

Output:

Original `net` Namespace:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:24:8c:a1:ac:e7 brd ff:ff:ff:ff:ff:ff


New `net` Namespace:
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

What’s going on here? The physical ethernet device enp4s0 belongs to the global network namespace, as indicated by the “ip” tool run from this namespace. However, the physical interface is not available in the new network namespace. Moreover, the loopback device is active in the original network namespace, but is “down” in the child network namespace.

In order to provide a usable network interface in the child namespace, it is necessary to set up additional “virtual” network interfaces which span multiple namespaces. Once that is done, it is then possible to create Ethernet bridges, and even route packets between the namespaces. Finally, to make the whole thing work, a “routing process” must be running in the global network namespace to receive traffic from the physical interface, and route it through the appropriate virtual interfaces to the correct child network namespaces. Maybe you can see why tools like Docker, which do all this heavy lifting for you, are so popular!

(Diagram: a routing process in the global network namespace relaying traffic to multiple child network namespaces.)

To do this by hand, you can create a pair of virtual Ethernet connections between a parent and a child namespace by running a single command from the parent namespace:

ip link add name veth0 type veth peer name veth1 netns <pid>

Here, <pid> should be replaced by the process ID of the process in the child namespace as observed by the parent. Running this command establishes a pipe-like connection between these two namespaces. The parent namespace retains the veth0 device, and passes the veth1 device to the child namespace. Anything that enters one of the ends, comes out through the other end, just as you would expect from a real Ethernet connection between two real nodes. Accordingly, both sides of this virtual Ethernet connection must be assigned IP addresses.

Mount Namespace

Linux also maintains a data structure for all the mountpoints of the system. It includes information like what disk partitions are mounted, where they are mounted, whether they are readonly, et cetera. With Linux namespaces, one can have this data structure cloned, so that processes under different namespaces can change the mountpoints without affecting each other.

Creating a separate mount namespace has an effect similar to doing a chroot(). chroot() is good, but it does not provide complete isolation, and its effects are restricted to the root mountpoint only. Creating a separate mount namespace allows each of these isolated processes to have a completely different view of the entire system’s mountpoint structure from the original one. This allows you to have a different root for each isolated process, as well as other mountpoints that are specific to those processes. Used with care, you can avoid exposing any information about the underlying system.


The clone() flag required to achieve this is CLONE_NEWNS:

clone(child_fn, child_stack+1048576, CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | SIGCHLD, NULL)

Initially, the child process sees the exact same mountpoints as its parent process would. However, being under a new mount namespace, the child process can mount or unmount whatever endpoints it wants to, and the change will affect neither its parent’s namespace, nor any other mount namespace in the entire system. For example, if the parent process has a particular disk partition mounted at root, the isolated process will see the exact same disk partition mounted at the root in the beginning. But the benefit of isolating the mount namespace is apparent when the isolated process tries to change the root partition to something else, as the change will only affect the isolated mount namespace.
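
To illustrate this (my own addition, not from the original tutorial), the sketch below mounts a tmpfs on /mnt inside a new mount namespace and then shows that the parent still sees nothing mounted there. It assumes /mnt exists and that the program runs as root; the MS_REC | MS_PRIVATE remount of "/" is there because many distributions make mounts shared by default, in which case the child's mount could otherwise propagate back.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1048576];

static int child_fn(void *arg) {
  /* Make our copy of the mount tree private so changes do not propagate. */
  mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL);

  /* Mount a tmpfs on /mnt; only this mount namespace sees it. */
  mount("none", "/mnt", "tmpfs", 0, "");
  printf("Child view of /mnt:\n");
  system("grep ' /mnt ' /proc/mounts");
  return 0;
}

int main() {
  pid_t child_pid = clone(child_fn, child_stack + 1048576,
                          CLONE_NEWNS | SIGCHLD, NULL);
  waitpid(child_pid, NULL, 0);

  printf("Parent view of /mnt:\n");
  system("grep ' /mnt ' /proc/mounts || echo '(nothing mounted on /mnt)'");
  return 0;
}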

Interestingly, this actually makes it a bad idea to spawn the target child process directly with the CLONE_NEWNS flag. A better approach is to start a special “init” process with the CLONE_NEWNS flag, have that “init” process change the “/”, “/proc”, “/dev” or other mountpoints as desired, and then start the target process. This is discussed in a little more detail near the end of this namespace tutorial.

Other Namespaces

There are other namespaces that these processes can be isolated into, namely user, IPC, and UTS. The user namespace allows a process to have root privileges within the namespace, without giving it that access to processes outside of the namespace. Isolating a process with the IPC namespace gives it its own interprocess communication resources, for example System V IPC objects and POSIX message queues. The UTS namespace isolates two specific identifiers of the system: nodename and domainname.
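
Before the UTS example, here is a small, hedged sketch of the user namespace point above (my own addition, not from the original tutorial). On kernels that allow unprivileged user namespaces, an ordinary user can clone with CLONE_NEWUSER; until a uid_map is written for the child, its UID inside the new namespace maps to the overflow user, typically 65534:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1048576];

static int child_fn(void *arg) {
  /* No uid_map has been written for this namespace yet, so our UID
     maps to the "overflow" user (usually 65534, i.e. nobody). */
  printf("UID inside the new user namespace: %ld\n", (long)geteuid());
  return 0;
}

int main() {
  printf("UID in the original user namespace: %ld\n", (long)geteuid());
  pid_t child_pid = clone(child_fn, child_stack + 1048576,
                          CLONE_NEWUSER | SIGCHLD, NULL);
  waitpid(child_pid, NULL, 0);
  return 0;
}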

A quick example to show how the UTS namespace is isolated is shown below:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/utsname.h>
#include <sys/wait.h>
#include <unistd.h>


static char child_stack[1048576];

static void print_nodename() {
  struct utsname utsname;
  uname(&utsname);
  printf("%s\n", utsname.nodename);
}

static int child_fn() {
  printf("New UTS namespace nodename: ");
  print_nodename();

  printf("Changing nodename inside new UTS namespace\n");
  sethostname("GLaDOS", 6);

  printf("New UTS namespace nodename: ");
  print_nodename();
  return 0;
}

int main() {
  printf("Original UTS namespace nodename: ");
  print_nodename();

  pid_t child_pid = clone(child_fn, child_stack+1048576, CLONE_NEWUTS | SIGCHLD, NULL);

  sleep(1);

  printf("Original UTS namespace nodename: ");
  print_nodename();

  waitpid(child_pid, NULL, 0);

  return 0;
}

This program yields the following output:

Original UTS namespace nodename: XT
New UTS namespace nodename: XT
Changing nodename inside new UTS namespace
New UTS namespace nodename: GLaDOS
Original UTS namespace nodename: XT

Here, child_fn() prints the nodename, changes it to something else, and prints it again. Naturally, the change happens only inside the new UTS namespace.

More information on what each of the namespaces provides and isolates can be found in the Linux man pages (see namespaces(7)).

Cross-Namespace Communication

Often it is necessary to establish some sort of communication between the parent and the child namespace. This might be for doing configuration work within an isolated environment, or it can simply be to retain the ability to peek into the condition of that environment from outside. One way of doing that is to keep an SSH daemon running within that environment. You can have a separate SSH daemon inside each network namespace. However, having multiple SSH daemons running uses a lot of valuable resources like memory. This is where having a special “init” process proves to be a good idea again.

The “init” process can establish a communication channel between the parent namespace and the child namespace. This channel can be based on UNIX sockets or can even use TCP. To create a UNIX socket that spans two different mount namespaces, you need to first create the child process, then create the UNIX socket, and then isolate the child into a separate mount namespace. But how can we create the process first, and isolate it later? Linux provides unshare(). This special system call allows a process to isolate itself from the original namespace, instead of having the parent isolate the child in the first place. For example, the following code has the exact same effect as the code previously mentioned in the network namespace section:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>


static char child_stack[1048576];

static int child_fn() {
  // calling unshare() from inside the init process lets you create a new namespace after a new process has been spawned
  unshare(CLONE_NEWNET);

  printf("New `net` Namespace:\n");
  system("ip link");
  printf("\n\n");
  return 0;
}

int main() {
  printf("Original `net` Namespace:\n");
  system("ip link");
  printf("\n\n");

  pid_t child_pid = clone(child_fn, child_stack+1048576, CLONE_NEWPID | SIGCHLD, NULL);

  waitpid(child_pid, NULL, 0);
  return 0;
}

And since the “init” process is something you have devised, you can make it do all the necessary work first, and then isolate itself from the rest of the system before executing the target child.
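
To make the ordering concrete, here is a hedged sketch (my own addition, with illustrative names): the parent creates a UNIX socket pair, clones the child, and only then does the child isolate itself with unshare(CLONE_NEWNS). Because the socket pair existed before the isolation, the two ends can still talk across the namespace boundary. Run it as root, since unshare(CLONE_NEWNS) requires CAP_SYS_ADMIN.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1048576];
static int sock_pair[2];   /* [0] stays with the parent, [1] is used by the child */

static int child_fn(void *arg) {
  /* Isolate ourselves only after the socket pair was created; the
     inherited descriptor still reaches the parent namespace. */
  unshare(CLONE_NEWNS);

  const char msg[] = "hello from the isolated child";
  write(sock_pair[1], msg, sizeof(msg));
  return 0;
}

int main() {
  char buf[64];

  socketpair(AF_UNIX, SOCK_STREAM, 0, sock_pair);

  pid_t child_pid = clone(child_fn, child_stack + 1048576, SIGCHLD, NULL);

  if (read(sock_pair[0], buf, sizeof(buf)) > 0)
    printf("Parent received: %s\n", buf);

  waitpid(child_pid, NULL, 0);
  return 0;
}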

Conclusion

This tutorial is just an overview of how to use namespaces in Linux. It should give you a basic idea of how a Linux developer might start to implement system isolation, an integral part of the architecture of tools like Docker or Linux Containers. In most cases, it would be best to simply use one of these existing tools, which are already well-known and tested. But in some cases, it might make sense to have your very own, customized process isolation mechanism, and in that case, this namespace tutorial will help you out tremendously.

There is a lot more going on under the hood than I’ve covered in this article, and there are more ways you might want to limit your target processes for added safety and isolation. But, hopefully, this can serve as a useful starting point for someone who is interested in knowing more about how namespace isolation with Linux really works.



