Using Zones — An Example
The following example demonstrates the features provided by zones that facilitate consolidation. It shows how to run the two Oracle workloads from the Managing Workloads example on page 22 in a Solaris Container using zones. In that example, both workloads shared the same physical system as well as the file system namespace, name service, network port namespace, user and group namespaces, and more. The sharing of these namespaces can lead to undesirable and sometimes difficult to manage situations, such as when the databases are managed by two different DBA groups. The fact that there is only one oracle user requires close coordination between the DBA groups, since changes made to that user’s environment by one DBA group may impact the other database instance. The same holds true for the sharing of the file system namespace, where a single /var/opt/oratab file is used by multiple Oracle instances.
Sharing namespaces can also inhibit the consolidation from a large number of servers onto fewer systems. Existing procedures and scripts may, for example, assume the system is dedicated to the application. Making changes to these procedures and scripts may be difficult, costly or even impossible. Solaris Zones help resolve these issues because each zone is a virtualized environment with its own private namespaces that can be managed independently of other zones on the system.
For instance, the oracle user in one zone is a completely different user from the oracle user in another zone — they can have different uids, passwords, login shells, home directories, etc. By running each Oracle instance in its own zone, the instances can be completely isolated from each other, simplifying their management. As far as the Oracle instance is concerned, it still runs on a dedicated system.
Requirements
Two zones each running their own Oracle instance are created. The zones require approximately 100 MB of disk space, and the Oracle software and a database each require about 4 GB of disk space.
Note – In this chapter, the prompt is set to the zone name to distinguish between the different zones.
Preparation
The Oracle instances for the sales and marketing databases are recreated in zones in this example. Consequently, the existing instances created in Chapter 4 should be stopped, and the associated user, projects, and file systems should be deleted. The pool configuration built in Chapter 6 should be disabled.
global # svcadm disable salesdb
global # svcadm disable mktdb
global # svccfg delete salesdb
global # svccfg delete mktdb
global # userdel -r oracle
global # projdel ora_sales
global # projdel ora_mkt
global # projdel group.dba
global # pooladm -x
global # pooladm -d
Creating the First Zone
The zone used for the marketing database is named mkt. To show how a file system is added to a zone, a separate file system is created on a SVM soft partition (d200). The file system may, of course, also be created on a standard disk slice. The virtual network interface for the zone with IP address 192.168.1.14 is configured on the physical interface hme0 of the system. The directory for the zone is created in the global zone by the global zone administrator. The directory used for the zone must be owned by root and have mode 700 to prevent normal users in the global zone from accessing the zone’s file system.
global # mkdir -p /export/zones/mkt
global # chmod 700 /export/zones/mkt
global # newfs /dev/md/rdsk/d200
Configuring the Zone
The zone is created based on the default template that defines resources used in a typical zone.
global # zonecfg -z mkt
mkt: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:mkt> create
zonecfg:mkt> set zonepath=/export/zones/mkt
zonecfg:mkt> set autoboot=true
The virtual network interface with IP address 192.168.1.14 is configured on the hme0 interface of the global zone.
zonecfg:mkt> add net
zonecfg:mkt:net> set address=192.168.1.14/24
zonecfg:mkt:net> set physical=hme0
zonecfg:mkt:net> end
The file system for the Oracle binaries and datafiles in the mkt zone is created on a soft partition named d200 in the global zone. Add the following statements to the zone configuration to have the file system mounted in the zone automatically when the zone boots:
zonecfg:mkt> add fs
zonecfg:mkt:fs> set type=ufs
zonecfg:mkt:fs> set special=/dev/md/dsk/d200
zonecfg:mkt:fs> set raw=/dev/md/rdsk/d200
zonecfg:mkt:fs> set dir=/u01
zonecfg:mkt:fs> end
zonecfg:mkt> verify
zonecfg:mkt> commit
zonecfg:mkt> exit
The zone configuration is now complete. The verify command verifies that the current configuration is syntactically correct. The commit command writes the in-memory configuration to stable storage.
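The same configuration can also be captured in a zonecfg command file and replayed non-interactively with zonecfg -z mkt -f, which is convenient when creating several similar zones. A sketch is shown below; the file name mkt.cfg is hypothetical:

```
# mkt.cfg -- hypothetical zonecfg command file reproducing the
# interactive session above. Replay with: zonecfg -z mkt -f mkt.cfg
create
set zonepath=/export/zones/mkt
set autoboot=true
add net
set address=192.168.1.14/24
set physical=hme0
end
add fs
set type=ufs
set special=/dev/md/dsk/d200
set raw=/dev/md/rdsk/d200
set dir=/u01
end
verify
commit
```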
Installing the Zone
The zone is now ready to be installed on the system.
global # zoneadm -z mkt install
Preparing to install zone <mkt>.
Checking <ufs> file system on device </dev/md/rdsk/d200>
        to be mounted at </export/zones/mkt/root>
Creating list of files to copy from the global zone.
Copying <2584> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <916> packages on the zone.
Initialized <916> packages on zone.
Zone <mkt> is initialized.
The file </export/zones/mkt/root/var/sadm/system/logs/install_log>
contains a log of the zone installation.
Booting the Zone
The zone can be booted with the zoneadm boot command. Since this is the first time the zone is booted after installation, the standard system identification questions must be answered, and are displayed on the zone’s console. The console can be accessed from the global zone using the zlogin(1M) command.
global # zoneadm -z mkt boot
global # zlogin -C mkt
[Connected to zone 'mkt' console]
SunOS Release 5.10 Version Generic 64-bit
Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: mkt
Loading smf(5) service descriptions: 100/100
At this point, the normal system identification process for a freshly installed Solaris OS instance is started. The output of this process is omitted here for brevity, and the configuration questions concerning the name service, time zone, etc., should be answered as appropriate for the site. After system identification is complete and the root password is set, the zone is ready for use.
SunOS Release 5.10 Version Generic 64-bit
Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: mkt

mkt console login:
To disconnect from the console, use ~. (tilde dot), just as with tip(1). The zone can now be accessed over the network using the telnet(1), rlogin(1) or ssh(1) commands, just like a standard Solaris OS system. (Note that root can only log in at the console unless the /etc/default/login file is updated.)
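As an illustration of that last point, one way to permit network logins by root is to comment out the CONSOLE entry in the zone's /etc/default/login. The sketch below parameterizes the file path so the edit can be previewed on a copy first; review the result before putting it in place:

```shell
# Sketch (assumption): commenting out CONSOLE=/dev/console in
# /etc/default/login lifts the console-only restriction on root logins.
# LOGIN_FILE is parameterized so the edit can be tested on a copy.
LOGIN_FILE=${LOGIN_FILE:-/etc/default/login}
sed 's|^CONSOLE=/dev/console|#CONSOLE=/dev/console|' "$LOGIN_FILE" > /tmp/login.new
# After reviewing /tmp/login.new, copy it back over $LOGIN_FILE.
```

Allowing remote root logins is generally discouraged; using ssh(1) with a non-root account is the safer choice.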
mkt console login: root
Password:
Last login: Tue Mar 22 21:55:00 on console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# df -h
Filesystem             size   used  avail capacity  Mounted on
/                      7.9G   4.6G   3.2G    60%    /
/dev                   7.9G   4.6G   3.2G    60%    /dev
/lib                   7.9G   4.6G   3.2G    60%    /lib
/platform              7.9G   4.6G   3.2G    60%    /platform
/sbin                  7.9G   4.6G   3.2G    60%    /sbin
/u01                   7.9G   8.0M   7.8G     1%    /u01
/usr                   7.9G   4.6G   3.2G    60%    /usr
proc                     0K     0K     0K     0%    /proc
ctfs                     0K     0K     0K     0%    /system/contract
swap                    15G   272K    15G     1%    /etc/svc/volatile
mnttab                   0K     0K     0K     0%    /etc/mnttab
fd                       0K     0K     0K     0%    /dev/fd
swap                    15G     0K    15G     0%    /tmp
swap                    15G    24K    15G     1%    /var/run
The /lib, /platform, /sbin, and /usr file systems are read-only loopback mounts from the global zone. This reduces the required disk space for the zone considerably, and allows the sharing of text pages, leading to more efficient use of memory. These file systems appear in the zone because they are defined in the default template used to create this zone. All other file systems are private to the zone. The /u01 file system is mounted in the zone during zone boot by zoneadmd. It is not mounted by the zone itself. Also note that the zone is unaware that the file system is in fact residing on /dev/md/dsk/d200.
Installing Oracle
The group dba and the user oracle are required to run the Oracle software. Since the Oracle software uses shared memory, and the maximum amount of shared memory is now a project resource control, a project is needed in which to run Oracle. The ora_mkt project is created in the zone and its project.max-shm-memory resource control is set to the required value (in this case 2 GB). Since the System V IPC parameters are resource controls in the Solaris 10 OS, there is no need to update the /etc/system file and reboot.
mkt # mkdir -p /export/home
mkt # groupadd dba
mkt # useradd -g dba -d /export/home/oracle -m -s /bin/bash oracle
mkt # passwd oracle
mkt # projadd -c "Oracle" user.oracle
mkt # projmod -sK "project.max-shm-memory=(privileged,2G,deny)" user.oracle
mkt # projadd -c "Oracle" -U oracle ora_mkt
mkt # projmod -sK "project.max-shm-memory=(privileged,2G,deny)" ora_mkt
mkt # cat /etc/project
system:0::::
user.root:1::::
noproject:2::::
default:3::::
group.staff:10::::
ora_mkt:101:Oracle:oracle::project.max-shm-memory=(privileged,2147483648,deny)
user.oracle:100:Oracle:::project.max-shm-memory=(privileged,2147483648,deny)
Note that the zone has its own namespace, so the user, group and project just created are only visible inside the mkt zone. The Oracle software and the database are installed in /u01. In this example, the Oracle software is installed in the zone itself to create an Oracle installation independent from any other Oracle installations. The software could also be installed in the global zone and then loopback mounted in the local zones. This would allow the binaries to be shared by multiple zones, but would also create a coupling between Oracle installations with regard to patch levels and more. This example shows how to use zones to consolidate Oracle instances with maximum isolation from each other, so in this case the software is not shared. The installation can now be performed as described on page 91. Since /usr is mounted read-only in the zone, the default location /usr/local/bin suggested by the Oracle Installer should be changed to a writable directory in the zone, such as /opt/local/bin. The marketing database can be created using the procedure on page 93.
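Had the shared-binaries approach been chosen instead, the Oracle software directory in the global zone could be loopback mounted read-only into the zone with an fs resource of type lofs. A hypothetical sketch (the /u01/app/oracle path is an assumption; the resource would be added before installing the zone, or followed by a zone reboot):

```
zonecfg:mkt> add fs
zonecfg:mkt:fs> set type=lofs
zonecfg:mkt:fs> set special=/u01/app/oracle
zonecfg:mkt:fs> set dir=/u01/app/oracle
zonecfg:mkt:fs> add options [ro,nodevices]
zonecfg:mkt:fs> end
```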
Using the smf service for the marketing database from Chapter 4 (the Managing Workloads example), the database instance can be started by importing the manifest and enabling the mktdb service in the zone.
Creating the Second Zone
The first zone used a directory in /export/zones in the global zone. Since this does not limit the size of the local zone's root file system, the local zone could fill up the file system in the global zone where /export/zones resides. To prevent a local zone from creating this problem, the zone root file system is created on a separate file system. The second zone is for the sales database and requires the following resources:
• A 100 MB file system for the zone root file system, mounted in the global zone on /export/zones/sales. This file system is created on a Solaris Volume Manager soft partition (/dev/md/dsk/d100). A normal slice could also be used, but would be quite wasteful given the limited number of slices available on a disk.
• To show how devices can be used in a zone, the disk slice c1t1d0s3 is exported to the zone by the global zone administrator. A UFS file system is created on this slice inside the zone. This requires that both the block and character devices for the slice be exported to the zone. Note that this is for demonstration purposes only and is not the recommended way to use UFS file systems in a zone.
• A virtual network interface with IP address 192.168.1.15 on the hme0 interface of the global zone is also needed.
global # newfs /dev/md/rdsk/d100
global # mkdir -p /export/zones/sales
global # mount /dev/md/dsk/d100 /export/zones/sales
global # chmod 700 /export/zones/sales
Configuring and Installing the Second Zone
The steps required to configure and install this zone are the same as for the first zone, with the exception that two devices are added to the zone configuration.
global # zonecfg -z sales
sales: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:sales> create
zonecfg:sales> set zonepath=/export/zones/sales
zonecfg:sales> set autoboot=true
zonecfg:sales> add net
zonecfg:sales:net> set physical=hme0
zonecfg:sales:net> set address=192.168.1.15/24
zonecfg:sales:net> end
zonecfg:sales> add device
zonecfg:sales:device> set match=/dev/rdsk/c1t1d0s3
zonecfg:sales:device> end
zonecfg:sales> add device
zonecfg:sales:device> set match=/dev/dsk/c1t1d0s3
zonecfg:sales:device> end
zonecfg:sales> verify
zonecfg:sales> commit
zonecfg:sales> exit
global # zoneadm -z sales install
Preparing to install zone <sales>.
Creating list of files to copy from the global zone.
Copying <2584> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <916> packages on the zone.
Initialized <916> packages on zone.
Zone <sales> is initialized.
The file </export/zones/sales/root/var/sadm/system/logs/install_log>
contains a log of the zone installation.
Booting the Zone
The first time a zone is booted after installation, the system identification process is performed. It is possible to skip the system identification questions by creating a sysidcfg file in the zone prior to the first boot. Seen from the global zone, the location of this file is /export/zones/sales/root/etc/sysidcfg. A sample sysidcfg file is shown below, and can be customized to fit the situation.
global # cat /export/zones/sales/root/etc/sysidcfg
system_locale=C
timezone=US/Pacific
network_interface=primary {
    hostname=hostname
}
terminal=xterm
security_policy=NONE
name_service=NIS {
    domain_name=yourdomain.com
}
root_password=sS3G0h84sqwJA
To suppress the question about the NFS version 4 domain, set the NFSMAPID_DOMAIN line in the /export/zones/sales/root/etc/default/nfs file to the appropriate value for your site and create the /export/zones/sales/root/etc/.NFS4inst_state.domain file. The /dev/dsk/c1t1d0s3 and /dev/rdsk/c1t1d0s3 devices are added to the zone configuration to show how devices can be imported into a zone. Note that the only devices present in the /dev/dsk and /dev/rdsk directories are the devices that were explicitly added to the zone configuration.
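Those two NFS-related steps can be scripted from the global zone. The sketch below is illustrative: ZROOT abbreviates the zone root path, and the domain value yourdomain.com is a placeholder to be replaced with your site's domain:

```shell
# Sketch: pre-answer the NFSv4 domain question for the sales zone,
# run from the global zone. The domain value is site-specific.
ZROOT=${ZROOT:-/export/zones/sales/root}
# Replace the (possibly commented-out) NFSMAPID_DOMAIN line.
sed 's|^#*NFSMAPID_DOMAIN=.*|NFSMAPID_DOMAIN=yourdomain.com|' \
    "$ZROOT/etc/default/nfs" > "$ZROOT/etc/default/nfs.new" &&
    mv "$ZROOT/etc/default/nfs.new" "$ZROOT/etc/default/nfs"
# Mark the NFSv4 domain as already configured.
touch "$ZROOT/etc/.NFS4inst_state.domain"
```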
global # zoneadm -z sales boot
global # zlogin sales
sales # ls -l /dev/dsk
total 0
brw-r-----   1 root     sys       32,  3 Mar 24 11:44 c1t1d0s3
sales # ls -l /dev/rdsk
total 0
crw-r-----   1 root     sys       32,  3 Mar 24 11:44 c1t1d0s3
A new file system is created and added to the zone’s /etc/vfstab file.
sales # newfs /dev/rdsk/c1t1d0s3
sales # mkdir /u01
sales # mount /dev/dsk/c1t1d0s3 /u01
sales # cat /etc/vfstab
#device             device              mount             FS     fsck  mount    mount
#to mount           to fsck             point             type   pass  at boot  options
#
/proc               -                   /proc             proc   -     no       -
ctfs                -                   /system/contract  ctfs   -     no       -
objfs               -                   /system/object    objfs  -     no       -
fd                  -                   /dev/fd           fd     -     no       -
swap                -                   /tmp              tmpfs  -     yes      -
/dev/dsk/c1t1d0s3   /dev/rdsk/c1t1d0s3  /u01              ufs    2     yes      nologging
sales # df -h
Filesystem             size   used  avail capacity  Mounted on
/                       94M    70M    14M    83%    /
/dev                    94M    70M    14M    83%    /dev
/lib                   7.9G   4.6G   3.2G    60%    /lib
/platform              7.9G   4.6G   3.2G    60%    /platform
/sbin                  7.9G   4.6G   3.2G    60%    /sbin
/usr                   7.9G   4.6G   3.2G    60%    /usr
proc                     0K     0K     0K     0%    /proc
ctfs                     0K     0K     0K     0%    /system/contract
swap                    15G   272K    15G     1%    /etc/svc/volatile
mnttab                   0K     0K     0K     0%    /etc/mnttab
fd                       0K     0K     0K     0%    /dev/fd
swap                    15G     0K    15G     0%    /tmp
swap                    15G    24K    15G     1%    /var/run
/dev/dsk/c1t1d0s3      4.9G   5.0M   4.9G     1%    /u01
Notice the difference between the /u01 file system in this zone and the /u01 file system in the mkt zone. In this zone the physical device is visible, while in the mkt zone it is not.
Installing Oracle
The installation of the Oracle software is the same as that for the mkt zone. Since the zones have completely separate namespaces, the user, group and project for Oracle must be created in this zone as well. The ora_sales project should have the project.max-shm-memory resource control set to give Oracle access to the required amount of shared memory.
sales # mkdir -p /export/home/oracle
sales # groupadd dba
sales # useradd -g dba -m -d /export/home/oracle -s /bin/bash oracle
sales # passwd oracle
sales # projadd -c "Oracle" -U oracle ora_sales
sales # projmod -sK "project.max-shm-memory=(privileged,2G,deny)" ora_sales
sales # cat /etc/project
system:0::::
user.root:1::::
noproject:2::::
default:3::::
group.staff:10::::
ora_sales:100:Oracle:oracle::project.max-shm-memory=(privileged,2147483648,deny)
The Oracle installation can now be performed as described on page 91. Since /usr is mounted read-only from the global zone, the default location /usr/local/bin suggested by the Oracle Installer should be changed to a writable directory such as /opt/local/bin. The sales database can be created using the procedure on page 93. Using the smf service for the sales database from Chapter 4 (the Managing Workloads example), the database instance can be started by importing the manifest and enabling the salesdb service in the zone.
Controlling CPU Consumption of Zones
The zone.cpu-shares resource control can be used to limit the CPU usage of zones with respect to other zones. This resource control is set through the zonecfg(1M) command. To give the sales zone twice the amount of CPU resources as the mkt zone, the number of zone.cpu-shares of the sales zone is set to twice the number of zone.cpu-shares of the mkt zone:
global # zonecfg -z sales
zonecfg:sales> add rctl
zonecfg:sales:rctl> set name=zone.cpu-shares
zonecfg:sales:rctl> add value (priv=privileged,limit=20,action=none)
zonecfg:sales:rctl> end
zonecfg:sales> exit
global # zonecfg -z mkt
zonecfg:mkt> add rctl
zonecfg:mkt:rctl> set name=zone.cpu-shares
zonecfg:mkt:rctl> add value (priv=privileged,limit=10,action=none)
zonecfg:mkt:rctl> end
zonecfg:mkt> exit
The resource control is made active at the next zone boot. To set the zone.cpu-shares resource control on a running zone the prctl(1) command can be used.
global # prctl -n zone.cpu-shares -r -v 20 -i zone sales
global # prctl -n zone.cpu-shares -r -v 10 -i zone mkt
To observe processes, the prstat(1M) command has been enhanced for zones with the -Z and -z options. The following prstat -Z output from the global zone shows processes running in the global and local zones. The bottom of the output shows a summary line for every running zone. Both zones are running eight instances of the nspin utility to show how CPU usage is controlled by the zone.cpu-shares resource control when contention arises for CPU resources. As can be seen from the output, the sales zone is given twice the amount of CPU resources, even though both zones are requesting the same amount of CPU resources from the system.
global # prstat -Z
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 28848 root     1144K  680K cpu10   12    0   0:00:34 8.2% nspin/1
 28844 root     1144K  680K cpu2    13    0   0:00:33 8.0% nspin/1
 28845 root     1144K  680K run      9    0   0:00:33 8.0% nspin/1
 28846 root     1144K  680K cpu3     8    0   0:00:33 8.0% nspin/1
 28843 root     1144K  816K run     11    0   0:00:33 7.8% nspin/1
 28849 root     1144K  680K cpu0    13    0   0:00:32 7.7% nspin/1
 28847 root     1144K  680K run     12    0   0:00:32 7.6% nspin/1
 28850 root     1136K  672K cpu1    14    0   0:00:32 7.5% nspin/1
 28772 root     1144K  680K run      8    0   0:00:18 4.1% nspin/1
 28771 root     1144K  680K run      3    0   0:00:19 4.1% nspin/1
 28775 root     1136K  672K run     10    0   0:00:19 4.1% nspin/1
 28774 root     1144K  680K run      9    0   0:00:19 4.1% nspin/1
 28769 root     1144K  680K run      1    0   0:00:19 4.0% nspin/1
 28768 root     1144K  816K run     12    0   0:00:17 4.0% nspin/1
 28770 root     1144K  680K run     13    0   0:00:17 3.9% nspin/1
ZONEID    NPROC  SIZE   RSS MEMORY      TIME  CPU ZONE
     9       17   43M   30M   0.4%   0:04:30  63% sales
    10       35  105M   69M   0.8%   0:02:37  32% mkt
     0       50  219M  127M   1.5%   0:01:24 0.1% global
Total: 102 processes, 331 lwps, load averages: 10.89, 5.64, 3.09
To observe processes in one or more specific zones, the prstat command can be given a list of zones to observe with the -z option. The following output was taken while both zones were executing eight instances of the nspin command. Only eight of the sixteen nspin processes are shown here (those in the sales zone).
global # prstat -z sales -a
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 28845 root     1144K  680K run      7    0   0:01:39 8.5% nspin/1
 28850 root     1136K  672K run     12    0   0:01:38 8.3% nspin/1
 28846 root     1144K  680K run      7    0   0:01:38 8.3% nspin/1
 28849 root     1144K  680K run     14    0   0:01:38 8.2% nspin/1
 28844 root     1144K  680K cpu0    18    0   0:01:39 8.2% nspin/1
 28843 root     1144K  816K run     11    0   0:01:38 8.1% nspin/1
 28847 root     1144K  680K cpu3    18    0   0:01:37 8.0% nspin/1
 28848 root     1144K  680K cpu10   23    0   0:01:39 7.8% nspin/1
 28401 root       11M 8584K sleep   59    0   0:00:02 0.0% svc.startd/11
 28399 root     2200K 1456K sleep   59    0   0:00:00 0.0% init/1
 28496 root     1280K 1032K sleep   59    0   0:00:00 0.0% sh/1
 28507 root     3544K 2608K sleep   59    0   0:00:00 0.0% nscd/23
 28516 root     1248K  920K sleep   59    0   0:00:00 0.0% utmpd/1
 28388 root        0K    0K sleep   60    -   0:00:00 0.0% zsched/1
 28517 root     2072K 1344K sleep   59    0   0:00:00 0.0% ttymon/1
 NPROC USERNAME  SIZE   RSS MEMORY      TIME  CPU
    16 root       39M   29M   0.3%   0:13:14  65%
     1 daemon   3528K 1312K   0.0%   0:00:00 0.0%
Total: 17 processes, 60 lwps, load averages: 15.47, 9.33, 4.85
Controlling CPU Consumption Inside Zones
The zone.cpu-shares resource control determines the CPU consumption of the zone as a whole in relation to other active zones. CPU consumption inside a zone is controlled by the project.cpu-shares resource control. Since zones have their own project database, the CPU consumption inside the zone can be controlled by the local zone administrator. To demonstrate this capability, two projects are added to the project database in the sales zone.
The CPU shares of the projects are set to 40 and 10, giving the first project four times more CPU resources than the second project. Each project runs four instances of the nspin utility.
sales # projadd -K "project.cpu-shares=(privileged,40,none)" -U root abc
sales # projadd -K "project.cpu-shares=(privileged,10,none)" -U root xyz
sales # cat /etc/project
system:0::::
user.root:1::::
noproject:2::::
default:3::::
group.staff:10::::
ora_sales:100:Oracle:oracle::project.max-shm-memory=(privileged,2147483648,deny)
abc:101::root::project.cpu-shares=(privileged,40,none)
xyz:102::root::project.cpu-shares=(privileged,10,none)
sales # newtask -p abc
sales # id -p
uid=0(root) gid=1(other) projid=101(abc)
sales # nspin -n 4 &
29004
sales # newtask -p xyz
sales # id -p
uid=0(root) gid=1(other) projid=102(xyz)
sales # nspin -n 4 &
29008
sales # prstat -J
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 29009 root     1144K  680K cpu11   17    0   0:02:19  13% nspin/1
 29008 root     1144K  680K run     22    0   0:02:16  13% nspin/1
 ...
 28507 root     3680K 2888K sleep   59    0   0:00:00 0.0% nscd/24
 28997 root     1280K 1032K sleep   59    0   0:00:00 0.0% sh/1
PROJID    NPROC  SIZE   RSS MEMORY      TIME  CPU PROJECT
   101        5 5808K 3832K   0.0%   0:09:09  52% abc
   102        5 5808K 3832K   0.0%   0:02:40  14% xyz
     1        5   13M   10M   0.1%   0:00:00 0.0% user.root
     0        8   33M   24M   0.3%   0:00:08 0.0% system
Total: 23 processes, 67 lwps, load averages: 15.89, 13.20, 11.70
global # prstat -Z
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 29009 root     1144K  680K cpu11   28    0   0:03:35  13% nspin/1
 ...
 29004 root     1144K  680K run     24    0   0:01:01 3.5% nspin/1
 29006 root     1136K  672K run     27    0   0:01:01 3.4% nspin/1
ZONEID    NPROC  SIZE   RSS MEMORY      TIME  CPU ZONE
     9       21   49M   36M   0.4%   0:18:17  65% sales
    10       35  105M   70M   0.8%   1:35:49  34% mkt
     0       54  244M  138M   1.7%   0:01:25 0.0% global
Total: 110 processes, 340 lwps, load averages: 15.98, 13.96, 12.13
In this case, with only the sales and mkt zones active, the sales zone is entitled to the following percentage of the available CPU resources:

    zone.cpu-shares(sales) / (zone.cpu-shares(sales) + zone.cpu-shares(mkt)) * 100
        = 20 / (20 + 10) * 100 = 66%

This 66% is then distributed among the projects in the zone. The abc project is entitled to the following percentage of the available CPU resources:

    project.cpu-shares(abc) / (project.cpu-shares(abc) + project.cpu-shares(xyz)) * 66%
        = 40 / (40 + 10) * 66% = 53%

Similarly, the xyz project is entitled to 10 / (40 + 10) * 66% = 13% of the total CPU resources. The output from the prstat -J command in the sales zone confirms that this is the case. Note that the global zone has been omitted from the calculations for simplicity. It does, however, use some CPU resources, so the calculated numbers may differ slightly from observed behavior.
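The entitlement arithmetic above can be replayed mechanically. The sketch below simply plugs the share values from the example into the fair share scheduler formula (truncating to whole percentages, as in the text):

```shell
# Replay the FSS entitlement arithmetic from the example.
# Zone level: sales has 20 shares, mkt has 10 shares.
# Project level inside the sales zone: abc has 40 shares, xyz has 10 shares.
awk 'BEGIN {
    zone = 20 / (20 + 10)                # sales zone fraction of the machine
    printf "sales zone:  %d%%\n", int(zone * 100)
    printf "abc project: %d%%\n", int(40 / (40 + 10) * zone * 100)
    printf "xyz project: %d%%\n", int(10 / (40 + 10) * zone * 100)
}'
# Prints 66%, 53% and 13%, matching the derivation above.
```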