Source:
https://access.redhat.com/solutions/3717341
Unable to boot RHEL 7.6 on baremetal nodes
SOLUTION Verified - Updated December 6, 2018 at 09:21
Environment
Red Hat OpenStack Platform 13
Red Hat Enterprise Linux 7.6
Issue
When trying to deploy an image built from the RHEL7.6 QCOW2, the image fails to boot.
The issue is that the closing double quote is in the wrong place in the GRUB_CMDLINE_LINUX line, so everything after crashkernel=auto" is taken as a command instead of a kernel command-line argument.
Resolution
The image can be edited using guestfish to modify /etc/default/grub like so:
sudo guestfish -a rhel-7.6.qcow2
> run
> mount /dev/sda /
> edit /etc/default/grub
Move the closing double quote from after crashkernel=auto to the end of the whole line, so that the change is:
Change From:
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto" console=ttyS0,115200n8 no_timer_check net.ifnames=0
Change To:
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto console=ttyS0,115200n8 no_timer_check net.ifnames=0"
> exit
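For scripted image builds, the same quote move can be expressed as a single substitution. The following is a minimal sketch and not part of the original solution: the helper name fix_grub_cmdline is hypothetical, it assumes the line has exactly the misquoted shape shown above, and it only transforms text piped to it, so it can be tried safely outside the image.

```shell
#!/bin/sh
# Hypothetical helper: move the stray closing quote in GRUB_CMDLINE_LINUX
# to the end of the line. Lines that are already correctly quoted do not
# match the pattern and pass through unchanged.
fix_grub_cmdline() {
    sed -E 's/^(GRUB_CMDLINE_LINUX="[^"]*)" (.*[^"])$/\1 \2"/'
}

bad='GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto" console=ttyS0,115200n8 no_timer_check net.ifnames=0'

fixed=$(printf '%s\n' "$bad" | fix_grub_cmdline)
printf '%s\n' "$fixed"

# Running the helper on its own output changes nothing.
again=$(printf '%s\n' "$fixed" | fix_grub_cmdline)
```

The substitution would still need to be applied to /etc/default/grub inside the image, for example while the file is open in the guestfish edit session.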
Root Cause
This issue is caused by the double-quotes (") being in the wrong place in the GRUB_CMDLINE_LINUX line of /etc/default/grub.
By default, the GRUB_CMDLINE_LINUX line looks like this:
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto" console=ttyS0,115200n8 no_timer_check net.ifnames=0
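As can be seen, the closing double quote comes right after crashkernel=auto. Because grub2-mkconfig sources /etc/default/grub as a shell script, the shell treats console=ttyS0,115200n8 as a second variable assignment and then tries to run no_timer_check as a command, with net.ifnames=0 as its argument, rather than treating them as kernel command-line options. It should be: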
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto console=ttyS0,115200n8 no_timer_check net.ifnames=0"
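The parsing behavior described above can be reproduced safely outside the image. This sketch writes the misquoted line to a temporary file and sources it in a subshell, much as grub2-mkconfig sources /etc/default/grub. The temporary file and the variable names are illustrative only.

```shell
#!/bin/sh
# Write the misquoted line to a throwaway file (not the real /etc/default/grub).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto" console=ttyS0,115200n8 no_timer_check net.ifnames=0
EOF

# Source it in a subshell. The shell sees two variable assignments followed
# by the command "no_timer_check" with the argument "net.ifnames=0".
( . "$tmp" ) 2>/dev/null
status=$?

rm -f "$tmp"
echo "exit status: $status"   # prints "exit status: 127"
```

Status 127 is the shell's "command not found" status, which matches the Exit code: 127 in the Ironic error quoted later in this article.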
Diagnostic Steps
Working backwards from the Nova logs, the failure first shows up as a retry error in nova-conductor.log:
req-6f866160-cdf6-4a2a-a6ad-60cf1e541b21 6cc770ec85812266b3f 063b25ed7c094053be7a64c4f3caace0 - default default]
Failed to compute_task_build_instances: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 74627e47-4c0f-442e-a7e0-76595ef1eb7ee.:
MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 74628f44-4c0f-442e-a7e0-76595ef1fd3e.
Searching nova-scheduler.log for the same request ID shows that nova-scheduler successfully identifies a node:
2018-11-26 02:06:58.244 1 DEBUG nova.scheduler.utils [req-6f866160-cdf6-4a2a-a6ad-60cf1e541b21 6cc770e121914a658ab3c85812266b3f 063b25ed7c094053be7a64c4f3caace0 - default default]
Attempting to claim resources in the placement API for instance 74628f44-4c0f-442e-a7e0-76595ef1fd3e claim_resources /usr/lib/python2.7/site-packages/nova/scheduler/utils.py:786
2018-11-26 02:06:58.809 1 DEBUG nova.scheduler.filter_scheduler [req-6f866160-cdf6-4a2a-a6ad-60cf1e541b21 6cc770e121914a658ab3c85812266b3f 063b25ed7c094053be7a64c4f3caace0 - default
default] Selected host: (overcloud-controller-0.example.com, b94daa75-82f9-4381-9aa4-52ed77de3431) ram: 768000MB disk: 184320MB io_ops: 0 instances: 0 _consume_selected_host
/usr/lib/python2.7/site-packages/nova/scheduler/filter_scheduler.py:325
The selected Node ID is: b94daa75-82f9-4381-9aa4-52ed77de3431
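Pulling the node ID out of the scheduler log can also be scripted. The sketch below runs against a shortened copy of the Selected host entry above, held in a variable. The sed pattern is an assumption based on that one entry's format and may need adjusting for other Nova releases.

```shell
#!/bin/sh
# Shortened copy of the "Selected host" entry from nova-scheduler.log.
line='DEBUG nova.scheduler.filter_scheduler Selected host: (overcloud-controller-0.example.com, b94daa75-82f9-4381-9aa4-52ed77de3431) ram: 768000MB disk: 184320MB io_ops: 0 instances: 0'

# Capture the UUID that follows the hostname inside the parentheses.
node_id=$(printf '%s\n' "$line" |
    sed -n -E 's/.*Selected host: \([^,]+, ([0-9a-f-]+)\).*/\1/p')

echo "$node_id"
```

In practice the input would come from a search for the request ID in nova-scheduler.log rather than from a variable.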
Checking Ironic for the node ID:
2018-11-26 02:25:26.828 1 ERROR ironic.drivers.modules.agent_base_vendor [req-98b72aaf-5185-4ab0-8df2-5c1618629210 - - - - -] Asynchronous exception: Node failed to deploy. Exception: Failed to install a bootloader when deploying node b94daa75-82f9-4381-9aa4-52ed77de3431. Error: {u'message': u'Command execution failed: Installing GRUB2 boot loader to device /dev/sda failed with Unexpected error while running command.\nCommand: chroot /tmp/tmpoRdOMm /bin/sh -c "grub2-mkconfig -o /boot/grub2/grub.cfg"\nExit code: 127\nStdout: u\'\'\nStderr: u\'/etc/default/grub: line 6: no_timer_check: command not found\\n\'.', u'code': 500, u'type': u'CommandExecutionError', u'details': u'Installing GRUB2 boot loader to device /dev/sda failed with Unexpected error while running command.\nCommand: chroot /tmp/tmpoRdOMm /bin/sh -c "grub2-mkconfig -o /boot/grub2/grub.cfg"\nExit code: 127\nStdout: u\'\'\nStderr: u\'/etc/default/grub: line 6: no_timer_check: command not found\\n\'.'} for node b94daa75-82f9-4381-9aa4-52ed77de3431: InstanceDeployFailure: Failed to install a bootloader when deploying node b94daa75-82f9-4381-9aa4-52ed77de3431. Error: {u'message': u'Command execution failed: Installing GRUB2 boot loader to device /dev/sda failed with Unexpected error while running command.\nCommand: chroot /tmp/tmpoRdOMm /bin/sh -c "grub2-mkconfig -o /boot/grub2/grub.cfg"\nExit code: 127\nStdout: u\'\'\nStderr: u\'/etc/default/grub: line 6: no_timer_check: command not found\\n\'.', u'code': 500, u'type': u'CommandExecutionError', u'details': u'Installing GRUB2 boot loader to device /dev/sda failed with Unexpected error while running command.\nCommand: chroot /tmp/tmpoRdOMm /bin/sh -c "grub2-mkconfig -o /boot/grub2/grub.cfg"\nExit code: 127\nStdout: u\'\'\nStderr: u\'/etc/default/grub: line 6: no_timer_check: command not found\\n\'.'}
The exact problem is that it is unable to find the command no_timer_check:
Installing GRUB2 boot loader to device /dev/sda failed with Unexpected error while running command.\nCommand: chroot /tmp/tmpoRdOMm /bin/sh -c "grub2-mkconfig -o /boot/grub2/grub.cfg"\nExit code: 127\nStdout: u\'\'\nStderr: u\'/etc/default/grub: line 6: no_timer_check: command not found\\n\'.'}
Looking at the /etc/default/grub file within the QCOW2 image, we can see where this is coming from:
GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto" console=ttyS0,115200n8 no_timer_check net.ifnames=0
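A quick check for this specific defect can be run against any copy of the file before an image is uploaded. This is a sketch under the assumption that the defect always appears as extra words after the closing quote (a trailing comment on the same line would also trigger it). The function name has_trailing_args is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed (exit 0) if a GRUB_CMDLINE_LINUX line in the
# given file has text after its closing double quote.
has_trailing_args() {
    grep -Eq '^GRUB_CMDLINE_LINUX="[^"]*"[[:space:]]+[^[:space:]]' "$1"
}

good=$(mktemp)
bad=$(mktemp)
echo 'GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto console=ttyS0,115200n8 no_timer_check net.ifnames=0"' > "$good"
echo 'GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto" console=ttyS0,115200n8 no_timer_check net.ifnames=0' > "$bad"

if has_trailing_args "$bad";  then bad_result=misquoted;  else bad_result=ok; fi
if has_trailing_args "$good"; then good_result=misquoted; else good_result=ok; fi
echo "bad file: $bad_result, good file: $good_result"

rm -f "$good" "$bad"
```

To check a real image, the file would first have to be copied out of the QCOW2, which this sketch does not attempt.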