Solaris Xen Drop 66 – Xen System Administration

Overview

Introduction to the Hypervisor

The virtual machine monitor included in the Solaris Operating System can securely execute multiple virtual machines simultaneously, each running its own operating system, on a single physical system. Each virtual machine instance is called a domain. There are two kinds of domains. The control domain is called domain0, or dom0. A guest OS, or unprivileged domain, is called domainU, or domU. Unlike virtualization using zones, each domain runs a full instance of an operating system.

A hypervisor is also known as a Virtual Machine Monitor (VMM).

How Hypervisors Work

A hypervisor is a software system that partitions a single physical machine into multiple virtual machines, to provide server consolidation and utility computing. Existing applications and binaries run unmodified.

The hypervisor controls the MMU, CPU scheduling, and interrupt controller, presenting a virtual machine to guests.

The hypervisor separates the software from the hardware by forming a layer between the software running in the virtual machine and the hardware. This separation enables the hypervisor to control how guest operating systems running inside a virtual machine use hardware resources. A hypervisor provides a uniform view of underlying hardware. Machines from different vendors with different I/O subsystems appear the same, which means that virtual machines can run on any available computer. Thus, administrators can view hardware as a pool of resources that can run arbitrary services on demand.

Because the hypervisor also encapsulates a virtual machine's software state, the hypervisor layer can map and remap virtual machines to available hardware resources at any time and also live migrate virtual machines across computers. These capabilities can also be used for load balancing among a collection of machines, dealing with hardware failures, and scaling systems. When a computer fails and must go offline or when a new machine comes online, the hypervisor layer can simply remap virtual machines accordingly. Virtual machines are also easy to replicate, which lets administrators bring new services online as needed.

Containment means that administrators can suspend virtual machines and resume them at any time, or checkpoint them and roll them back to a previous execution state. With this general-purpose undo capability, systems can more easily recover from crashes or configuration errors. Containment also supports a very general mobility model. Users can copy a suspended virtual machine over a network or store and transport it on removable media.

The hypervisor can also provide total mediation of all interactions between the virtual machine and underlying hardware, thus allowing strong isolation between virtual machines and supporting the multiplexing of many virtual machines on a single hardware platform. The hypervisor can then consolidate a collection of virtual machines with low resource usage onto a single computer, thereby lowering hardware costs and space requirements.

Strong isolation is also valuable for reliability and security. Applications that previously ran together on one machine can now be separated on different virtual machines. If one application experiences a fault, the other applications are isolated from this occurrence and will not be affected. Further, if a virtual machine is compromised, the incident is contained to that virtual machine alone.

Resource Virtualization

As a key component of virtual machines, the hypervisor provides a layer between software environments and physical hardware that is programmable and transparent to the software above it, while making efficient use of the hardware below it.

Virtualization provides a way to bypass interoperability constraints. Virtualizing a system or component such as a processor, memory, or an I/O device at a given abstraction level maps its interface and visible resources onto the interface and resources of an underlying, possibly different, real system. Consequently, the real system appears as a different virtual system or even as multiple virtual systems.

Virtualization Types

There are two basic types of virtualization, full virtualization and paravirtualization. The hypervisor supports both models.

In full virtualization, the operating system is completely unaware that it is running in a virtualized environment. In the more lightweight paravirtualization model, the operating system is both aware of the virtualization layer and modified to support it, which results in higher performance.

The paravirtualized domU operating system is ported to run on top of the hypervisor, and uses virtual network, disk, and console devices.

Since dom0 must work closely with the hypervisor layer, dom0 is always paravirtualized. DomUs can be either paravirtualized or fully virtualized, and a system can have both varieties running simultaneously.

A hardware virtual machine (HVM) domU runs an unmodified operating system. These hardware-assisted virtual machines take advantage of Intel VT or AMD Secure Virtual Machine (SVM) processors.
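For contrast with the paravirtualized guest.py configuration shown later in this document, a minimal sketch of an HVM guest configuration follows. The hvmguest name and the hvmloader and qemu-dm paths are illustrative assumptions and vary by distribution:

name = "hvmguest"
builder = "hvm"
kernel = "/usr/lib/xen/boot/hvmloader"
device_model = "/usr/lib/xen/bin/qemu-dm"
memory = "1024"
disk = ['file:/tank/guests/hvmguest/disk.img,hda,w']
vif = ['']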

About Domains

Dom0 and domU are separate entities. Other than by login, you cannot access a domU from dom0. A dom0 should be reserved for system management work associated with running a hypervisor. This means, for example, that users should not have logins on dom0. Dom0 provides the guest domains, which have no direct access to physical devices, with shared access to a physical network interface.

A Solaris domU works like a normal Solaris Operating System. All of the usual tools are available.
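For example, from dom0 you can attach to a guest domain's console with xm console and log in as usual; Ctrl-] detaches. The solaris domain name is taken from the examples later in this document:

# xm console solaris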

Domain States

A domain can be in one of six states. States are shown in virt-manager screens and in xm list displays:

Name            ID   Mem VCPUs      State   Time(s)
Domain-0         0  2049     2     r-----   4138.5
sxc18            3   511     1     -b----    765.5

The states are:

r, running

The domain is currently running on a CPU.

b, blocked

The domain is blocked, and not running or runnable. This can happen because the domain is waiting on I/O (a traditional wait state) or because it has gone to sleep when there was nothing for it to run.

p, paused

The domain has been paused, usually because the administrator ran xm pause. While in the paused state, the domain still consumes allocated resources such as memory, but is not eligible for scheduling by the hypervisor. Run xm unpause to place the domain back in the running state (see the example after this list).

c, crashed

The domain has crashed. Usually this state occurs only if the domain has been configured not to restart on a crash. See xmdomain.cfg(5) for more information.

s, shutdown

The domain is shut down.

d, dying

The domain is in the process of shutting down or crashing.
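For example, pausing the sxc18 domain from the listing above moves it to the p state (output sketched; details will differ):

# xm pause sxc18
# xm list sxc18
Name            ID   Mem VCPUs      State   Time(s)
sxc18            3   511     1     --p---    765.5
# xm unpause sxc18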

SMF Hypervisor Services

  • In Solaris, all of the properties from xend-config.sxp have been moved into the SMF service xctl/xend (as config/* properties).

  • To modify an existing property:

# svccfg -s xctl/xend listprop
# svccfg -s xctl/xend setprop config/dom0-cpus = 1
# svcadm refresh xctl/xend

  • To create a new property:

# svccfg -s xctl/xend setprop config/vncpasswd = astring: \"password\"
# svcadm refresh xctl/xend
# svcadm restart xend
# svcprop xctl/xend
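  • To read back a single property rather than the full list, svcprop also accepts -p. A sketch, assuming dom0-cpus was set to 1 as above:

# svcprop -p config/dom0-cpus xctl/xend
1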

Verify That the xctl Hypervisor Services Are Started

  1. Become superuser, or assume the Primary Administrator role.
  2. Verify that the xctl services are running.
    # svcs -a | grep xctl

    If the system displays the following, the services are not running:

    disabled       12:29:34 svc:/system/xctl/console:default
    disabled       12:29:34 svc:/system/xctl/xend:default
    disabled       12:29:34 svc:/system/xctl/store:default
  3. If the services are not running, verify that you booted an i86xpv kernel.
    # uname -i
    i86xpv

    Reboot if necessary.

  4. If the correct kernel is running, enable the services.
    # svcadm enable xctl/store
    # svcadm enable xctl/xend
    # svcadm enable xctl/console

    You are now ready to create guest domains (domUs).
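    Once the services are enabled, the same check should show them online, along these lines (timestamps will differ):

    # svcs -a | grep xctl
    online         12:31:02 svc:/system/xctl/store:default
    online         12:31:05 svc:/system/xctl/xend:default
    online         12:31:07 svc:/system/xctl/console:default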

How To Manage Guest (DomU) Domains

Example

  • Create a domU that uses the following .py file:

: p5b-vm[1]#; cat guest.py
name = "solaris"
vcpus = 2
memory = "512"

extra = "-k"

root = "/dev/dsk/c0d0s0"
disk = ['file:/tank/guests/solaris/disk.img,0,w']

vif = ['']

on_xend_start = "start"
on_xend_stop = "shutdown"

on_shutdown = "destroy"
on_reboot = "restart"
on_crash = "destroy"

  • Notice the on_xend_start and on_xend_stop entries. Either or both of the two entries can be defined. Each defaults to "invalid" if not defined.
    • on_xend_start = "start"
    • on_xend_stop = "shutdown"

  • Create the domain, but don't start it.
: p5b-vm[1]#; xm new -f <path to the py file>
: p5b-vm[1]#; xm list
Name            ID   Mem VCPUs      State   Time(s)
Domain-0         0  2254     2     r-----    113.3
solaris              512     1                  0.0
: p5b-vm[1]#;

Now you can start, suspend, and resume the domain. If it is shut down, it will still be in the list. It is also set to boot automatically when dom0 is powered on.

: p5b-vm[1]#; xm start solaris
: p5b-vm[1]#; xm list
Name            ID   Mem VCPUs      State   Time(s)
Domain-0         0  2254     2     r-----    116.4
solaris          5   512     1     r-----      4.2
: p5b-vm[1]#; xm suspend solaris
: p5b-vm[1]#; xm list
Name            ID   Mem VCPUs      State   Time(s)
Domain-0         0  2254     2     r-----    129.4
solaris                1     1                 31.2
: p5b-vm[1]#; xm resume solaris
: p5b-vm[1]#; xm list
Name            ID   Mem VCPUs      State   Time(s)
Domain-0         0  2254     2     r-----    132.6
solaris          6   511     2     -b----      0.1
: p5b-vm[1]#;
: p5b-vm[1]#; xm shutdown solaris
: p5b-vm[1]#; xm list
Name            ID   Mem VCPUs      State   Time(s)
Domain-0         0  2254     2     r-----    134.2
solaris              511     2                  0.5
: p5b-vm[1]#;

  • When you suspend a domain, the state is saved in /var/lib/xend/domains, which can fill up / quickly. An SMF property will be added to xend to change the base directory where domains live; for now, you might want to create a symbolic link. You can also still use save and restore to specify where to save the guest image to and load it from, as sketched below.
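A sketch of both workarounds; /export/xen is an arbitrary example location on a larger file system:

# svcadm disable xend
# mv /var/lib/xend/domains /export/xen/domains
# ln -s /export/xen/domains /var/lib/xend/domains
# svcadm enable xend
# xm save solaris /export/xen/solaris.save
# xm restore /export/xen/solaris.save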

  • If you modify CPUs or memory from xm or virsh, these changes are saved in the configuration file and persist across reboots.
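For example, xm can change both on a running domain (a sketch):

# xm mem-set solaris 256
# xm vcpu-set solaris 1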

  • If you want to modify other parameters on a domain that is shut down, you can add the domain's UUID to the original .py file and re-run the xm new command.

: p5b-vm[1]#; echo 'uuid = "6dd59cf5-a17c-f7dc-255e-4efddfffb008"' >> <path to py file>
: p5b-vm[1]#; xm new -f <path to the py file>

Enable Live Migration

By default, xend listens only on the loopback address for requests from the localhost. If you want to allow other machines to live migrate to the machine, you must do the following:

  • Listen on all addresses (or you can specify a particular interface IP)
    # svccfg -s xend setprop config/xend-relocation-address = \"\"
  • Create list of hosts from which to accept migrations:
    # svccfg -s xend setprop config/xend-relocation-hosts-allow = \"^flax$ ^localhost$\"
  • Update the config:
    # svcadm refresh xend && svcadm restart xend
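  • With the destination host configured as above, start the live migration from the source host (a sketch, using the solaris domain and the flax host from the examples above):
    # xm migrate --live solaris flax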

How to Debug on Xen

Debugging a Hung domU

  • First, connect to the domU console and verify that the domain is not sitting in the kernel debugger (kmdb) or a similar state.

  • If a domU appears hung, always use xm dump-core to take a dump file (see the sketch after this list). Place this in /net/mdb.eng/cores/ and report it when filing a bug. You can examine this file with mdb.

  • If you can reproduce the hang, make the following changes in /etc/system of the domU and then reproduce the problem:
    set cpr_debug=0x3
    set xen_suspend_debug=1
    set xdf:xdfdebug=0x40
    Some debugging output should go to the dom0 console. This is useful for hangs involving save, restore, migrate, shutdown, and reboot operations. It's a good idea to do all testing with these set.

  • Try sending the domU an interrupt to get it to drop into kmdb. A gentle method is xm sysrq mydomu b, sketched below. Or, you can use 'q' on the Xen console as described below.
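  • A sketch of the dump-core and sysrq steps above; the mydomu name and the dump path are illustrative:
    # xm dump-core mydomu /var/tmp/mydomu.core
    # mdb /var/tmp/mydomu.core
    # xm sysrq mydomu b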

Xen Console

  • Currently, the Xen console must be directed to a serial port for this to work. Type three consecutive Ctrl-A characters on the Xen console. You should see the following output on the console.

(XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to dom0).

  • To exit the Xen console (and get back to the Solaris console), type three more Ctrl-A characters.

  • The following menu.lst example sets both Xen and dom0's console to serial port ttya.

title Solaris dom0
kernel /boot/$ISADIR/xen.gz com1=9600,8n1 console=com1
module /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -k -B console=ttya
module /platform/i86pc/$ISADIR/boot_archive

  • The following commands are supported in the Xen console. Commonly used keys are:
    • C - force a Solaris dom0 crash dump (/var/crash/...)
    • q - put Solaris dom0 and all Solaris domUs at the kmdb prompt (assuming you booted with -k)
    • R - force a reboot of dom0 (for example, when the machine is hung)

(XEN) 'h' pressed -> showing installed handlers
(XEN) key '%' (ascii '25') => Trap to xendbg
(XEN) key 'C' (ascii '43') => trigger a crashdump
(XEN) key 'H' (ascii '48') => dump heap info
(XEN) key 'N' (ascii '4e') => NMI statistics
(XEN) key 'R' (ascii '52') => reboot machine
(XEN) key 'a' (ascii '61') => dump timer queues
(XEN) key 'd' (ascii '64') => dump registers
(XEN) key 'h' (ascii '68') => show this message
(XEN) key 'i' (ascii '69') => dump interrupt bindings
(XEN) key 'm' (ascii '6d') => memory info
(XEN) key 'n' (ascii '6e') => trigger an NMI
(XEN) key 'q' (ascii '71') => dump domain (and guest debug) info
(XEN) key 'r' (ascii '72') => dump run queues
(XEN) key 't' (ascii '74') => display multi-cpu clock info
(XEN) key 'u' (ascii '75') => dump numa info
(XEN) key 'v' (ascii '76') => dump Intel's VMCS
(XEN) key 'z' (ascii '7a') => print ioapic info

Event Channels

  • To dump out info on the event channels:
> ::evtchns
Type         Evtchn  IRQ  IPL  CPU  Masked  Pending  ISR(s)
ipi               1  256   15    0       0        0  xc_serv
ipi               2  257   13    0       0        0  xc_serv
ipi               3  258   11    0       0        0  poke_cpu
virq:debug        4  259   15    0       0        0  xen_debug_handler
pirq              5    9    9    0       0        0  acpi_wrapper_isr
virq:timer        6  260   14    0       0        0  cbe_fire
ipi               7  261   14    0       0        0  cbe_fire
pirq              8   19    5    0       0        0  ata_intr
pirq              9   16    9    0       0        0  pepb_intx_intr
virq:console     10  262    9    0       0        0  xenconsintr_priv
pirq             11   18    1    0       0        0  uhci_intr
pirq             12   23    1    0       0        0  uhci_intr
pirq             13   17    6    0       0        0  rge_intr
ipi              14  258   11    1       0        0  poke_cpu
ipi              15  257   13    1       0        0  xc_serv
ipi              16  261   14    1       0        0  cbe_fire
ipi              17  256   15    1       0        0  xc_serv
virq:timer       18  260   14    1       0        0  cbe_fire
device           19  263    1    0       0        0  evtchn_device_upcall
evtchn           20  264    1    0       0        0  xenbus_intr
device           21  263    1    0       0        0  evtchn_device_upcall
device           22  263    1    0       0        0  evtchn_device_upcall
pirq             23   22    9    1       0        0  audiohd_intr
device           24  263    1    0       0        0  evtchn_device_upcall
evtchn           25  265    6    0       0        0  intr
evtchn           26  266    5    1       0        0  xdb_intr
evtchn           27  267    5    0       0        0  xdb_intr
>
  • To get more information for Type=device, pass in the event channel number as the array index. For this example, I'm looking at the following:
Type         Evtchn  IRQ  IPL  CPU  Masked  Pending  ISR(s)
device           19  263    1    0       0        0  evtchn_device_upcall

  • Using event channel 19 (0t19), dump the evtsoftdata structure:
> *(port_user+(0x8*(0t19)))::print struct evtsoftdata
{
    dip = 0xfffffffec08afd68
    ring = 0xfffffffec5a10000
    ring_cons = 0x185
    ring_prod = 0x185
    ring_overflow = 0
    evtchn_wait = {
        _opaque = 0
    }
    evtchn_lock = {
        _opaque = [ 0 ]
    }
    evtchn_pollhead = {
        bsys_version = 0xc757f840
        boot_mem = 0
        bsys_alloc = 0
        bsys_free = 0x1ec
        bsys_getproplen = 0xfffffffec757f608
        bsys_getprop = 0
        bsys_nextprop = 0xfffffffec08afd68
        bsys_printf = 0
        bsys_doint = 0xfffffffec73b4dc8
        bsys_ealloc = 0xde00000000
    }
    pid = 0x1ec
}
>

  • Also determine which user process is using this event channel:
> *(port_user+(0x8*(0t19)))::print struct evtsoftdata pid | ::pid2proc | ::print proc_t p_user.u_psargs
p_user.u_psargs = [ "/usr/lib/xenstored --pid-file=/var/run/xenstore.pid" ]
>

Current Issues and Potential Solutions

  • xend fails to start:

[2007-05-04 14:46:08 100668] ERROR (SrvDaemon:353) Exception starting xend (not well-formed (invalid token): line 19, column 0)
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvDaemon.py", line 345, in run
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvServer.py", line 254, in create
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvRoot.py", line 40, in __init__
  File "/usr/lib/python2.4/site-packages/xen/web/SrvDir.py", line 82, in get
  File "/usr/lib/python2.4/site-packages/xen/web/SrvDir.py", line 52, in getobj
  File "/usr/lib/python2.4/site-packages/xen/xend/server/SrvNode.py", line 30, in __init__
  File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 658, in instance
  File "/usr/lib/python2.4/site-packages/xen/xend/XendNode.py", line 87, in __init__
  File "/usr/lib/python2.4/site-packages/xen/xend/XendStateStore.py", line 104, in load_state
  File "/var/tmp/pkgbuild-gbuild/SUNWPython-extra-2.4.2-build/usr/lib/python2.4/site-packages/_xmlplus/dom/minidom.py", line 1915, in parse
  File "/var/tmp/pkgbuild-gbuild/SUNWPython-extra-2.4.2-build/usr/lib/python2.4/site-packages/_xmlplus/dom/expatbuilder.py", line 926, in parse
  File "/var/tmp/pkgbuild-gbuild/SUNWPython-extra-2.4.2-build/usr/lib/python2.4/site-packages/_xmlplus/dom/expatbuilder.py", line 207, in parseFile
ExpatError: not well-formed (invalid token): line 19, column 0
[2007-05-04 14:46:09 100676] INFO (SrvDaemon:331) Xend Daemon started
[2007-05-04 14:46:09 100676] INFO (SrvDaemon:335) Xend changeset: Tue May 01 17:12:19 2007 -0700 15014:66538ef9ecc5.
[2007-05-04 14:46:09 100676] INFO (SrvDaemon:342) Xend version: Unknown.

The failure to start is due to xend's state becoming corrupted. The solution is to do the following:

# rm -rf /var/lib/xend/state
# svcadm clear xend

Debugging a Lost Disk Interrupt

  • In this case, a Linux guest is running. The guest hangs trying to read or write the disk. Nothing looks wrong in ::evtchns, so look at the disk backend driver. You can see below that the frontend's (xdf) producer index is req_prod = 0xb083, and that the backend's (xdb) consumer index is xr_sring.br.req_cons = 0xb063. So, there is work to do, but the backend driver doesn't know about it. Dropping down to kmdb and forcing the backend's interrupt routine to run gets the domU going again:
        [0]> xdb_intr::call 0xfffffffed1e52000

# mdb -k
Loading modules: [ unix genunix specfs dtrace xpv_psm scsi_vhci ufs ip hook neti sctp arp usba fctl nca lofs zfs random emlxs md crypto fcp ptm sppp ipc ]
> ::evtchns
Type         Evtchn  IRQ  IPL  CPU  Masked  Pending  ISR(s)
ipi               1  256   15    0       0        0  xc_serv
ipi               2  257   13    0       0        0  xc_serv
ipi               3  258   11    0       0        0  poke_cpu
virq:debug        4  259   15    0       0        0  xen_debug_handler
pirq              5    9    9    0       0        0  acpi_wrapper_isr
virq:timer        6  260   14    0       0        0  cbe_fire
ipi               7  261   14    0       0        0  cbe_fire
pirq              8   16    5    0       0        0  mpt_intr
virq:console      9  262    9    0       0        0  xenconsintr_priv
pirq             10   20    1    0       0        0  ehci_intr
pirq             11   21    1    0       0        0  ohci_intr
ipi              12  258   11    1       0        0  poke_cpu
ipi              13  257   13    1       0        0  xc_serv
ipi              14  261   14    1       0        0  cbe_fire
ipi              15  256   15    1       0        0  xc_serv
virq:timer       16  260   14    1       0        0  cbe_fire
ipi              17  258   11    2       0        0  poke_cpu
ipi              18  257   13    2       0        0  xc_serv
ipi              19  261   14    2       0        0  cbe_fire
ipi              20  256   15    2       0        0  xc_serv
virq:timer       21  260   14    2       0        0  cbe_fire
ipi              22  258   11    3       0        0  poke_cpu
ipi              23  257   13    3       0        0  xc_serv
ipi              24  261   14    3       0        0  cbe_fire
ipi              25  256   15    3       0        0  xc_serv
virq:timer       26  260   14    3       0        0  cbe_fire
ipi              27  258   11    4       0        0  poke_cpu
ipi              28  257   13    4       0        0  xc_serv
ipi              29  261   14    4       0        0  cbe_fire
ipi              30  256   15    4       0        0  xc_serv
virq:timer       31  260   14    4       0        0  cbe_fire
ipi              32  258   11    5       0        0  poke_cpu
ipi              33  257   13    5       0        0  xc_serv
ipi              34  261   14    5       0        0  cbe_fire
ipi              35  256   15    5       0        0  xc_serv
virq:timer       36  260   14    5       0        0  cbe_fire
ipi              37  258   11    6       0        0  poke_cpu
ipi              38  257   13    6       0        0  xc_serv
ipi              39  261   14    6       0        0  cbe_fire
ipi              40  256   15    6       0        0  xc_serv
virq:timer       41  260   14    6       0        0  cbe_fire
ipi              42  258   11    7       0        0  poke_cpu
ipi              43  257   13    7       0        0  xc_serv
ipi              44  261   14    7       0        0  cbe_fire
ipi              45  256   15    7       0        0  xc_serv
virq:timer       46  260   14    7       0        0  cbe_fire
pirq             47   17    6    0       0        0  e1000g_intr_pciexpress
pirq             48   18    6    1       0        0  e1000g_intr_pciexpress
evtchn           49  264    1    3       0        0  xenbus_intr
device           50  263    1    0       0        0  evtchn_device_upcall
device           51  263    1    0       0        0  evtchn_device_upcall
pirq             52   40    1    4       0        0  emlxs_msi_intr
pirq             53   41    1    5       0        0  emlxs_msi_intr
device           54  263    1    0       0        0  evtchn_device_upcall
device           55  263    1    0       0        0  evtchn_device_upcall
evtchn           56  265    5    7       0        0  xdb_intr
evtchn           57  266    6    0       0        0  xnb_intr
> ::prtconf ! grep xdb
fffffffec3955008 xdb, instance #0 (driver name: xdb)
> fffffffec3955008::print struct dev_info devi_driver_data
devi_driver_data = 0xfffffffed1e52000
> 0xfffffffed1e52000::print xdb_t xs_ring | ::print xendev_ring_t xr_sring.br
{
    xr_sring.br.rsp_prod_pvt = 0xb063
    xr_sring.br.req_cons = 0xb063
    xr_sring.br.nr_ents = 0x20
    xr_sring.br.sring = 0xfffffffed0802000
}
> 0xfffffffed1e52000::print xdb_t xs_ring | ::print xendev_ring_t xr_sring.br.sring | ::print comif_sring_t
{
    req_prod = 0xb083
    req_event = 0xb064
    rsp_prod = 0xb063
    rsp_event = 0xb064
    pad = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... ]
    ring = [ '\001' ]
}
>

 