Gracefully turn off an ESXi host and VMs

It’s been a long time… well, here we go.

We have a virtualized server in the office, running the free VMware ESXi 4.1. The machine is also connected to a UPS.

Unfortunately, the power lines in the office have already failed us, so we wanted to configure the server to gracefully shut down all the VMs and then shut down the host itself.

These are the steps we took:

Install VMware Tools on all the VMs. Using the vSphere Client (I think that’s what it’s called…), this is easy enough:

  • right-click the VM name > Guest > Install VMware Tools;
  • on Windows guests, just follow the wizard;
  • on Linux guests, mount /dev/cdrom, copy the tar.gz to some temporary location, extract it, and run the vmware-install.pl script to install (accept all the defaults). Make sure you have previously installed gcc, the kernel headers, and such (on Ubuntu, build-essential and linux-headers should be enough).

Create a new VM (once again, using the vSphere Client) with minimal disk space and RAM, just to install the UPS drivers. This VM’s only job is to listen to the UPS connected to the host. In the VM’s properties, make sure you add a serial port mapped directly to the physical serial port of the host.

  • In my case, I installed Ubuntu Server. The UPS we have in the office is a Riello NETDialog, so we downloaded the drivers from the manufacturer (NUT did not have drivers for it) and installed them. We just had to configure the UPS connection to use the serial port on /dev/ttyS0.

Now comes the nice part: the idea is to have this VM connect to the host OS and run the appropriate commands to begin the shutdown process. SSH to the rescue… on that VM, just create a pair of SSH keys (ssh-keygen -t rsa, for example) WITHOUT A PASSPHRASE, and push the public one to the host.

  • Of course, you first have to enable SSH on the ESXi host. For that, we temporarily connected a monitor and keyboard to the host, logged in as root, and enabled the “something troubleshooting” option, which is a fancy name for turning on the SSH server.
  • with the SSH public key copied to the ESXi host, you just have to put it in the well-known authorized_keys file:
  • mkdir .ssh
  • cat id_rsa.pub > .ssh/authorized_keys
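
Note that the cat … > form above overwrites any keys already present in authorized_keys. A slightly more careful variant, sketched below, appends the key only when it is missing and tightens the permissions. The helper name install_pubkey and its arguments are made up for this illustration, and it has not been tried on a real ESXi 4.1 host.

```shell
#!/bin/sh
# Sketch only: install_pubkey is a hypothetical helper, not part of ESXi or OpenSSH.
# Usage: install_pubkey PUBKEY_FILE SSH_DIR
install_pubkey() {
    pubkey_file=$1
    ssh_dir=$2
    auth_file="$ssh_dir/authorized_keys"

    # create the directory if needed; SSH servers usually expect tight permissions
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # append the key only when that exact text is not already in the file
    if ! grep -qF -- "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

On the ESXi host this could be called as install_pubkey id_rsa.pub /.ssh (the packing step further down uses /.ssh as the directory). On the UPS VM, the key pair itself can be created without prompts using ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa, where -N "" sets an empty passphrase and -f names the output file.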

Done, you should now have access to the ESXi host without having to enter a password. This means you can automate shutting it down when the UPS starts running on battery, by running an arbitrary command through SSH. On that note, here is the script we used on the ESXi host to shut everything down (this came from a blog whose link I can’t remember. I’m truly sorry. The credit for this should not go to me!):

 

## get all the VM identifiers
VMID=$(/usr/bin/vim-cmd vmsvc/getallvms | grep -v Vmid | awk '{print $1}')

## loop through all the VMs
for i in $VMID
do
    ## get their state (turned on, off, whatever)
    STATE=$(/usr/bin/vim-cmd vmsvc/power.getstate "$i" | tail -1 | awk '{print $2}')

    ## if they are running, turn them off (only works correctly if
    ## VMware Tools is installed on the VMs)
    if [ "$STATE" = "on" ]
    then
        /usr/bin/vim-cmd vmsvc/power.shutdown "$i"
    fi
done

## give the guests some time to shut down, then shut down the host itself
sleep 30
/sbin/shutdown.sh
/sbin/poweroff

  • we saved the script above as /home/shutdownVMs.sh on the ESXi host (we had to create the /home directory first)
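
One weakness of the script above is the fixed sleep 30: a guest that needs longer may not be finished when the host goes down, and if everything stops within a few seconds the host still waits. A possible variation, sketched here, polls the power state until every VM is off or a timeout expires. It reuses the same vim-cmd calls as the script; the function names and the timeout and interval values are assumptions for illustration, and it has not been tested on a real ESXi 4.1 host.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical.

# Print the IDs of all registered VMs (first column, header line skipped).
list_vm_ids() {
    /usr/bin/vim-cmd vmsvc/getallvms | grep -v Vmid | awk '{print $1}'
}

# Print the second word of the last output line for VM id $1 ("on" or "off").
vm_state() {
    /usr/bin/vim-cmd vmsvc/power.getstate "$1" | tail -1 | awk '{print $2}'
}

# Print how many VMs are still powered on.
count_running() {
    n=0
    for id in $(list_vm_ids); do
        if [ "$(vm_state "$id")" = "on" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Poll until no VM is running or TIMEOUT seconds have been waited.
# Usage: wait_until_all_off TIMEOUT INTERVAL
# Returns 0 when everything is off, 1 on timeout.
wait_until_all_off() {
    timeout=$1
    interval=$2
    waited=0
    while [ "$(count_running)" -gt 0 ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

In the script, the sleep 30 line could then become something like wait_until_all_off 120 5, followed by the same /sbin/shutdown.sh and /sbin/poweroff calls. The timeout should stay well below the runtime your UPS battery can provide.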

There’s a small problem here: I’m quite new to VMware, but from my understanding, the host wipes out all the files you’ve just created when it reboots: everything you see under / on the filesystem is lost. So we have to take a few steps to make the changes to authorized_keys and the shutdown script inside /home permanent. These steps consist of packing up the directories and files you’ve created and editing a file (I saw this on a blog I can’t remember… sorry):

## pack your files
tar -C / -czf "/bootbank/home.tgz" /.ssh /home

## edit the file /bootbank/boot.cfg and add the new compressed file to the
## modules line. In our case, the file reads as follows:
kernel=b.z
kernelopt=
modules=k.z --- s.z --- c.z --- oem.tgz --- license.tgz --- m.z --- state.tgz --- home.tgz
build=4.1.0-348481
updated=1
bootstate=0
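
Editing boot.cfg by hand works, but a typo in that file is risky, so it may be worth scripting the change. The following is a minimal sketch that appends an archive to the modules line only if it is not already listed. The helper name add_boot_module is an assumption for illustration, and I would back up boot.cfg before trying anything like this on a real host.

```shell
#!/bin/sh
# Sketch only: add_boot_module is a hypothetical helper name.
# Usage: add_boot_module CFG_FILE MODULE
# MODULE must not contain "/" or "&", since it is pasted into a sed expression.
add_boot_module() {
    cfg=$1
    module=$2

    # nothing to do if the modules line already lists the archive
    if grep '^modules=' "$cfg" | grep -qF -- "$module"; then
        return 0
    fi

    # append " --- MODULE" to the modules line, leaving other lines untouched;
    # write to a temporary file first so a failed sed does not clobber the config
    sed "s/^modules=.*/& --- $module/" "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}
```

For the setup described here, the call would be add_boot_module /bootbank/boot.cfg home.tgz. Running it a second time changes nothing.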

  • We’re done on ESXi; now we just have to configure the UPS monitoring VM to run a specific command upon AC failure. We used the following (my_private_key and esxi_host_ip are placeholders for your own key file and host address):

ssh -i my_private_key root@esxi_host_ip 'sh /home/shutdownVMs.sh'
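
Since this command runs unattended, it can help to make ssh fail quickly instead of hanging on a prompt or an unreachable host. Below is a small wrapper sketch for the UPS VM. The function name trigger_esxi_shutdown and the SSH_BIN override are assumptions for illustration; BatchMode and ConnectTimeout are standard OpenSSH client options. How your UPS software invokes a command on AC failure depends on the manufacturer and is not shown.

```shell
#!/bin/sh
# Sketch only: trigger_esxi_shutdown and SSH_BIN are hypothetical names.
# Usage: trigger_esxi_shutdown PRIVATE_KEY_FILE ESXI_HOST
trigger_esxi_shutdown() {
    key=$1
    host=$2
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout=10 gives up after 10 seconds if the host is unreachable.
    # SSH_BIN lets you substitute another command for ssh (handy for a dry run).
    ${SSH_BIN:-ssh} -i "$key" -o BatchMode=yes -o ConnectTimeout=10 \
        "root@$host" 'sh /home/shutdownVMs.sh'
}
```

A dry run is possible by pointing SSH_BIN at echo, which prints the arguments instead of connecting.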

Conclusion

So, with this, you get a VM listening to your UPS state and connecting to the virtualization host when the AC fails. The graceful shutdown of all the VMs is only possible thanks to VMware Tools being installed on each of them.