Play with MAAS (by quqi99)


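Problem
Four years ago I wrote a blog post about MAAS [1], but I have not used it for several years. Today I found that the GUI workflow has changed somewhat, so I am recording the current steps here.
Host machine
The host is configured with the following network: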

auto eth0
iface eth0 inet manual
auto br-eth0
iface br-eth0 inet static
    address 192.168.99.124/24
    gateway 192.168.99.1
    bridge_ports eth0
    dns-nameservers 192.168.99.1

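Then, in virt-manager, I created a virtual network named cloud (192.168.100.0/24) with DHCP disabled.
Finally, I used virt-manager to create a VM named maas with two NICs: ens3=192.168.100.3 serves as the MAAS IP, and ens8=192.168.99.3 is there purely for convenient management access.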
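
The cloud network can also be defined from the command line instead of virt-manager. The sketch below is a minimal example under stated assumptions: the NAT forward mode and the /tmp/cloud.xml path are illustrative guesses, and write_cloud_net_xml is a hypothetical helper. Only the network name, the 192.168.100.1 host address, and the absence of DHCP come from the setup described above.

```shell
#!/bin/sh
# Hypothetical helper: write a libvirt network definition to the file given as $1.
# There is deliberately no <dhcp> element, because MAAS will serve DHCP on this network.
write_cloud_net_xml() {
    cat > "$1" <<'EOF'
<network>
  <name>cloud</name>
  <forward mode='nat'/>  <!-- assumption: NAT; the original setup does not state the mode -->
  <ip address='192.168.100.1' netmask='255.255.255.0'/>
</network>
EOF
}

# Usage on the host (requires libvirt):
#   write_cloud_net_xml /tmp/cloud.xml
#   virsh net-define /tmp/cloud.xml
#   virsh net-start cloud
#   virsh net-autostart cloud
```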
maas VM
Its network configuration is as follows:

$ cat /etc/resolvconf/resolv.conf.d/base 
search maas
nameserver 192.168.100.3

or modify /etc/systemd/resolved.conf to set DNS=192.168.100.3, then run:
sudo systemctl restart systemd-resolved.service
# ifupdown has been replaced by netplan(5) on this system.  See
# /etc/netplan for current configuration.
# To re-enable ifupdown on this system, you can run:
sudo apt install ifupdown
cat /etc/network/interfaces
auto ens3
iface ens3 inet static
address 192.168.100.3
netmask 255.255.255.0
gateway 192.168.100.1

auto ens8
iface ens8 inet static
address 192.168.99.3
netmask 255.255.255.0
gateway 192.168.99.1

Install MAAS:
NOTE: It is best to install from the Debian packages. With the snap, the REQUESTS_CA_BUNDLE environment variable is not set inside the snap container, and because of a python-requests bug MAAS then cannot use HTTPS services. For example, running "maas configauth --rbac-url https://node1.lan:5000/ --rbac-service-name maastest" fails with CERTIFICATE_VERIFY_FAILED. For details see: https://zhhuabj.blog.csdn.net/article/details/107182847

#sudo snap install maas --channel=2.8
#sudo snap install maas-test-db                              #enter shell - maas-test-db.psql
sudo add-apt-repository ppa:maas/2.8   #https://launchpad.net/~maas
sudo apt install maas

Configure the maas VM for passwordless access to libvirt on the physical host:

sudo chsh maas -s /bin/bash
sudo su - maas
ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa hua@192.168.100.1
sudo -u maas virsh -c qemu+ssh://hua@192.168.100.1/system list --all

After creating the MAAS admin user, you can log in to the GUI at http://192.168.99.3:5240/MAAS.

sudo maas createadmin
apikey=$(sudo maas apikey --username admin)
maas login admin http://192.168.100.3:5240/MAAS $apikey
maas login admin http://127.0.0.1:5240/MAAS $apikey
maas admin boot-resources import

enable PXE

Under 'Subnets', click the VLAN (untagged) that belongs to subnet 192.168.100.0/24 and choose 'enable dhcp'. PXE only works once DHCP is enabled.
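
DHCP can also be turned on from the CLI rather than the GUI. The following is a hedged sketch, not the procedure used in this post: print_enable_dhcp_cmds is a hypothetical helper that only prints the two MAAS CLI calls (reserve a dynamic range, then enable DHCP on the VLAN) so they can be reviewed before running. The IP range, the fabric ID 0, and the use of VID 0 for the untagged VLAN are assumptions to adapt to your environment.

```shell
#!/bin/sh
# Print (do not run) the MAAS CLI calls that enable DHCP on a VLAN.
# $1 profile, $2 fabric id, $3 VLAN id, $4 rack controller system_id,
# $5/$6 first and last IP of the dynamic range (placeholders).
print_enable_dhcp_cmds() (
    profile="$1"; fabric_id="$2"; vid="$3"; rack_id="$4"; start_ip="$5"; end_ip="$6"
    echo "maas $profile ipranges create type=dynamic start_ip=$start_ip end_ip=$end_ip"
    echo "maas $profile vlan update $fabric_id $vid dhcp_on=True primary_rack=$rack_id"
)

# Usage (the rack controller ID comes from 'maas admin rack-controllers read'):
#   print_enable_dhcp_cmds admin 0 0 "$rack_system_id" 192.168.100.100 192.168.100.200
```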

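Register MAAS nodes
Create an empty disk (truncate --size 10G /images/kvm/controller.raw), then create a new VM named controller in virt-manager (set it to boot from PXE, use the empty disk created with truncate, and define four NICs on the cloud network). The PXE boot process is visible in virt-manager, but it flashes by and exits at the end because the machine has to be started from MAAS. Shut the VM down from virt-manager, and remember to switch its boot mode back to PXE.
NOTE: If it still does not work at this point, most likely PXE was not enabled in the previous step. Also, no image can be selected during commissioning, which always uses the default Ubuntu image; another image such as CentOS can be chosen at deploy time.
Alternatively, if you PXE boot the machine before registering its virsh/IPMI information, MAAS creates a machine in the New state from the MAC address seen during PXE (with a randomly generated name). You can then edit that machine to add the virsh/IPMI information.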
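
For repeatable setups, the controller VM could be created with virt-install instead of clicking through virt-manager. This is only a sketch: build_virt_install_cmd is a hypothetical helper that prints the command rather than running it, and the memory size, vCPU count, and generic OS variant are assumptions that do not come from this post.

```shell
#!/bin/sh
# Print a virt-install command for a PXE-booting VM with an empty raw disk
# and N NICs on the "cloud" libvirt network.
# $1 VM name, $2 disk path, $3 number of NICs
build_virt_install_cmd() (
    name="$1"; disk="$2"; nics="$3"
    # Assumed sizing: 4096 MiB RAM and 2 vCPUs
    cmd="virt-install --name $name --memory 4096 --vcpus 2 --os-variant generic"
    cmd="$cmd --disk path=$disk,format=raw --pxe --boot network,hd --noautoconsole"
    i=0
    while [ "$i" -lt "$nics" ]; do
        cmd="$cmd --network network=cloud"
        i=$((i + 1))
    done
    printf '%s\n' "$cmd"
)

# Usage:
#   truncate --size 10G /images/kvm/controller.raw
#   build_virt_install_cmd controller /images/kvm/controller.raw 4
```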

Then go to "Nodes -> Add hardware -> Machine" in the GUI to define the node (note: define the MAC addresses of all four NICs at once; if NICs are added here later, commissioning has to be run again):

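The commissioning process can be debugged in the following ways:

  • the VNC console in virt-manager
  • /var/log/maas/maas.log on the MAAS node
  • the cloud-init logs inside the controller VM (its IP can be found in the GUI)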

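Then define the four NICs in MAAS as well, using the MAC addresses defined in virt-manager.

Once the four NICs are defined, click "Take action -> Commission" again.

The "Add Interface", "Create bond", and "Create Bridge" buttons then become available. First bond ens8 and ens9 into bond0, then create some VLANs on bond0 (VLANs must be created before the bridge; the order cannot be changed), and finally turn bond0 into br-bond0.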

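Next, click "Take action -> Deploy" (note: not Commission here) to redeploy. After that, find the IP in the GUI and access the machine with ssh ubuntu@IP.
Notes:

  • VLAN IDs are configured in the "Subnet -> fabric0" tab, not on the "Machine" page. On the "Machine" page you associate bond0 with a VLAN, for example bond0.59.
  • It is best to assign a subnet to bond0 above.
  • When juju uses LXD, it creates br-bond0 automatically and moves the IP from bond0 to br-bond0.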

Appendix - CLI

https://docs.maas.io/2.5/en/manage-cli-advanced
https://maas.io/docs/concepts-and-terms
A region corresponds to a data center (a single region). Fabrics subdivide the region, and each rack controller is attached to a fabric.
A rack has nodes, interfaces, and fabrics associated with it; an interface is associated with subnets, and a subnet has a VLAN.
A fabric has VLANs; a VLAN is associated with a space.
An interface links to a subnet, and the subnet has a VLAN.

maas login admin http://localhost/MAAS/api/2.0 $(sudo maas-region apikey --username admin)

#machine_id=$(maas admin machines read | jq -r '.[] | select(.hostname=="controller").system_id')
#maas admin interfaces read $machine_id > interface.txt

maas admin rack-controllers read
system_id=$(maas admin rack-controllers read | jq -r .[].system_id)

maas admin machines read
machine_id=$(maas admin machines read | jq -r '.[] | select(.hostname=="controller").system_id')
maas admin machines read |jq ".[] | {hostname:.hostname, system_id: .system_id, status:.status}" --compact-output

maas admin interfaces read "$system_id"
maas admin interfaces read $system_id | jq ".[] |{id:.id, name:.name, mac:.mac_address, vid:.vlan.vid, fabric:.vlan.fabric}" --compact-output
maas admin interface update "$system_id" $interface vlan=5001
#maas admin interface unlink-subnet "$system_id" "$iface_id" id="$link_id"
#maas admin interface delete "$system_id" "$vlan_iface_id"
#maas admin interface link-subnet "$system_id" "$iface_id" mode=AUTO subnet="$iface_subnet_cidr"
interfaces=$(maas admin interfaces read "$system_id")
iface_name="ens3"
iface_id=$(echo "$interfaces" | jq -r ".[] | select(.name==\"$iface_name\") | .id")
link_ids=$(maas admin interface read "$system_id" "$iface_id" | jq -r '.links | .[].id')
for link_id in $link_ids; do
   maas admin interface unlink-subnet "$system_id" "$iface_id" id="$link_id"
done
vlan_iface_ids=$(maas admin interfaces read "$system_id" | jq -r '.[] | select(.type=="vlan").id')
for vlan_iface_id in $vlan_iface_ids; do
   maas admin interface delete "$system_id" "$vlan_iface_id"
done
iface_subnet_cidr=$(maas admin subnets read | jq -r ".[] | select(.name==\"$iface_subnet_name\").cidr")
maas admin interface link-subnet "$system_id" "$iface_id" mode=AUTO subnet="$iface_subnet_cidr"

maas admin fabrics read
maas admin fabrics read | jq ".[] |{name:.name, vlans:.vlans[] | {id:.id, vid:.vid}}" --compact-output
maas admin fabric update 0 name=maas-management
fabric_name="maas-management"
mag_fabric_id=$(maas admin fabrics read| jq -r ".[] | select(.name==\"$fabric_name\").id")

vid=$(maas admin subnets read | jq -M '.[] | select(.cidr=="10.12.1.0/24").vlan.vid')
fabric=$(maas admin subnets read| jq -r '.[] | select(.cidr=="10.12.1.0/24").vlan.fabric')
fabric_id=$(maas admin fabrics read | jq -M --arg name "$fabric" '.[] | select(.name==$name).id')  #$fabric=fabric-1
maas admin vlan read $fabric_id $vid

maas admin spaces create name=os-floating
os_floating_spaceid=$(maas admin spaces read | jq '.[] | select(.name=="os-floating").id')
maas admin vlan update $mag_fabric_id 5 space=$os_floating_spaceid mtu=1500

maas admin subnets read
maas admin subnet update $(maas admin subnets read| jq -r ".[] | select(.cidr==\"10.231.16.0/24\").id") cidr=10.231.16.0/21
pxe_subnet_id=$(maas admin subnets read| jq -r ".[] | select(.cidr==\"10.231.16.0/21\").id")
maas admin subnet update $pxe_subnet_id name=maas-management

Appendix - MAAS Region HA

# https://sites.google.com/site/openstackinthebasement3/maasha
# https://cloud.google.com/community/tutorials/setting-up-postgres-hot-standby
sudo apt remove --purge postgresql maas*
sudo apt autoremove
sudo apt install maas-region-controller
#sudo dpkg-reconfigure maas-region-controller
#sudo dpkg-reconfigure maas-rack-controller

# on maas to create user
sudo maas createadmin --username admin --password ubuntu --email root@example.com

# on maas and maas2
sudo bash -c 'cat >> /etc/postgresql/9.5/main/pg_hba.conf' << EOF
host     replication     repuser         192.168.100.3/32        md5
host     replication     repuser         192.168.100.4/32        md5
host     all             all             192.168.100.3/32        md5
host     all             all             192.168.100.4/32        md5
host     replication     repuser         192.168.99.3/32        md5
host     replication     repuser         192.168.99.4/32        md5
host     all             all             192.168.99.3/32        md5
host     all             all             192.168.99.4/32        md5
host     all             all             192.168.0.0/16        md5
EOF


# on both maas and maas2
sudo -u postgres createuser -U postgres repuser -P -c 5 --replication
#sudo mkdir -p /var/lib/postgresql/9.5/main/mnt/server/archivedir
#sudo chown postgres:postgres /var/lib/postgresql/9.5/main/mnt/server/archivedir

sudo bash -c 'cat >> /etc/postgresql/9.5/main/postgresql.conf' << EOF
listen_addresses = '*'
max_connections = 300
wal_level = hot_standby
synchronous_commit = on
archive_mode = off
#archive_mode = on
#archive_command = 'test ! -f /var/lib/postgresql/9.5/main/mnt/server/archivedir/%f && cp %p /var/lib/postgresql/9.5/main/mnt/server/archivedir/%f'
max_wal_senders = 10
wal_keep_segments = 256
hot_standby = on
restart_after_crash = off
hot_standby_feedback = on
EOF

# on maas, restart postgresql
sudo systemctl restart postgresql.service

# on maas2 run the db backup
sudo systemctl stop postgresql
sudo mv /var/lib/postgresql/9.5/main /var/lib/postgresql/9.5/main.old
sudo -u postgres pg_basebackup -h 192.168.100.3 -D /var/lib/postgresql/9.5/main -U repuser -v -P --xlog-method=stream

sudo cp /usr/share/postgresql/9.5/recovery.conf.sample /var/lib/postgresql/9.5/main/recovery.conf
sudo bash -c 'cat >> /var/lib/postgresql/9.5/main/recovery.conf' << EOF
standby_mode = on
primary_conninfo = 'host=192.168.100.3 port=5432 user=repuser password=password'
EOF

# on maas2 configure the region to point at the main database
sudo systemctl stop maas-regiond
sudo rm /var/lib/maas/{maas_id,secret}
sudo bash -c 'cat > /etc/maas/regiond.conf' << EOF
database_host: 192.168.100.3
database_name: maasdb
database_pass: FiPQSKlBpAvs
database_port: 5432
database_user: maas
maas_url: http://192.168.99.4:5240/MAAS
EOF
sudo chown root:maas /etc/maas/regiond.conf
sudo chmod 640 /etc/maas/regiond.conf
sudo systemctl restart maas-regiond

# on maas2 start postgres, now you should see the second region controller in the maas gui
sudo systemctl restart postgresql
sudo tail -f /var/log/postgresql/postgresql-9.5-main.log
#2019-06-11 12:38:26 JST [29904-4] LOG:  consistent recovery state reached at 0/230000F8
#2019-06-11 12:38:26 JST [29903-1] LOG:  database system is ready to accept read only connections
#2019-06-11 12:38:26 JST [29908-1] LOG:  started streaming WAL from primary at 0/24000000 on timeline 1
sudo -u postgres psql maasdb -c 'SELECT hostname,status,power_state FROM maasserver_node'

# we don't enable maas-dns in this test, on both maas and maas2 servers, clean up the bind9 conflicts:
sudo maas-region edit_named_options --migrate-conflicting-options
sudo systemctl restart bind9
# and also modify /etc/resolv.conf to use maas dns

# setup HAproxy for load balancing on both maas and maas2 servers
# then refresh the gui you will see the region jump from one server to the other back and forth.
sudo systemctl stop apache2
sudo systemctl disable apache2
sudo apt install haproxy -y
sudo bash -c 'cat >> /etc/haproxy/haproxy.cfg' << EOF
frontend maas
    bind    *:80
    retries 3
    option  redispatch
    option  http-server-close
    default_backend maas
backend maas
    timeout server 30s
    balance roundrobin
    server localhost localhost:5240 check
    server maas 192.168.99.3:5240 check
    server maas2 192.168.99.4:5240 check
EOF
sudo systemctl restart haproxy

# setup keepalived (vip) on both maas and maas2, NOTE: fce uses pacemaker+corosync instead
sudo apt install keepalived -y
sudo modprobe ip_vs
sudo sh -c 'echo ip_vs >> /etc/modules'
sudo bash -c 'cat > /etc/sysctl.d/60-keepalived-nonlocal.conf' << EOF
net.ipv4.ip_nonlocal_bind=1
EOF
sudo systemctl restart procps
sudo bash -c 'cat > /etc/keepalived/keepalived.conf' << EOF
# https://docs.maas.io/2.3/en/manage-ha
# Un-comment next 4 lines if using haproxy
vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
}
# Un-comment next 4 lines if using apache2
vrrp_script chk_apache2 {
    script "killall -0 apache2"
    interval 2
}
vrrp_script chk_named {
    script "killall -0 named"
    interval 2
}
vrrp_instance maas_region {
    state MASTER
    interface ens8
    priority 150
    virtual_router_id 51
    authentication {
        auth_type PASS
        auth_pass password
    }
    track_script {
        ### Un-comment next line if using haproxy
        chk_haproxy
        ### Un-comment next line if using apache2
        #chk_apache2
        chk_named
    }
    virtual_ipaddress {
        192.168.99.5
    }
}
EOF
sudo systemctl restart keepalived

# adjust an API server to use VIP=192.168.99.5  (http://192.168.99.5/MAAS)
sudo maas-region local_config_set --maas-url http://192.168.99.5/MAAS
sudo systemctl restart maas-regiond

# install rack on maas and maas2, and adjust api server to use VIP=192.168.99.5
sudo apt install -y maas-rack-controller
sudo maas-rack register --url http://192.168.99.5/MAAS --secret $(sudo cat /var/lib/maas/secret)
#sudo maas-rack config --region-url http://192.168.99.5/MAAS
#sudo systemctl restart maas-rackd.service
maas login admin http://192.168.99.5/MAAS/api/2.0 $(sudo maas-region apikey --username admin)
maas admin rack-controllers read | grep hostname | cut -d '"' -f 4

# actually we didn't enable maas-dns service in this test
sudo systemctl list-unit-files --type=service | grep maas-dhcp
ubuntu@maas:~$ cat /etc/resolv.conf 
nameserver 192.168.99.1
#nameserver 192.168.100.3
#search maas
ubuntu@maas2:~$ cat /etc/resolv.conf 
nameserver 192.168.99.1
#nameserver 192.168.100.4
#search maas

# the following error is because I accidentally created recovery.conf on the master server instead of the standby.
2019-06-11 15:25:05 JST [15474-1] maas@maasdb ERROR:  cannot execute INSERT in a read-only transaction
2019-06-11 15:31:44 JST [18618-1] maas@maasdb ERROR:  cannot execute LISTEN during recovery

# present problem
1, can't add the second rack-controller
2, there is maas-dns and maas-dns ha
3, lots of errors - django.db.utils.InternalError: cannot execute UPDATE in a read-only transaction

sudo maas-region dbshell --installed
maasdb=# \x
maasdb=# select system_id, hostname from maasserver_node;
maasdb=# \?
maasdb=# \l
maasdb=# \c maasdb
maasdb=# \dtq

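Appendix - Postgres HA replication mode

Postgres HA supports three replication methods:

  1. Log-shipping replication: the master sends a WAL file to the slave only after it has finished writing it. If the master crashes before a WAL file is complete, all transactions in the unsent log are lost.
  2. Asynchronous streaming replication: the master streams WAL records to the slave without waiting for a whole WAL file to fill, which greatly reduces the risk of data loss. However, if the master crashes after committing a transaction but before the standby has received the stream data, that last transaction is still lost. The standby can also be configured as a hot standby that serves read queries to share the load.
  3. Synchronous streaming replication (synchronous replication): the synchronous version of streaming replication. After a commit is issued to the master, the command blocks until the WAL stream has been committed on every database configured as a synchronous node (synchronous_standby_names); only then does the commit complete. Data is therefore lost only if the master and the standby crash at the same time. With nested transactions, subtransactions are not covered by this guarantee; only the top-level transaction is. Read-only operations and rollbacks are not affected. The standby can again be configured as a hot standby that serves read queries. The performance cost of this mode depends on network conditions and system load: the worse the network and the busier the system, the greater the cost.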
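
To make the difference between modes 2 and 3 concrete: with synchronous_commit = on but no synchronous_standby_names, as in the postgresql.conf shown earlier, streaming replication stays asynchronous. The sketch below prints the extra settings that synchronous mode would need. sync_replication_settings is a hypothetical helper, the standby name is an arbitrary label (an assumption; it only has to match on both sides), and the password is intentionally left out of the connection string.

```shell
#!/bin/sh
# Print the settings that switch streaming replication to synchronous mode.
# $1 standby name (arbitrary label), $2 primary host
sync_replication_settings() (
    standby_name="$1"; primary_host="$2"
    echo "# primary: postgresql.conf"
    echo "synchronous_commit = on"
    echo "synchronous_standby_names = '$standby_name'"
    echo "# standby: primary_conninfo must carry the same application_name"
    echo "primary_conninfo = 'host=$primary_host port=5432 user=repuser application_name=$standby_name'"
)

# Usage:
#   sync_replication_settings maas2 192.168.100.3
# Then verify on the primary (sync_state should read "sync" rather than "async"):
#   sudo -u postgres psql -c "SELECT application_name, state, sync_state FROM pg_stat_replication;"
```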

Debug MaaS

#read maas db
grep -r 'database' /etc/maas/regiond.conf
#sudo maas-region dbshell --installed
sudo -iu postgres psql -d template1 -U postgres

# debug maas - https://github.com/maas/maas/blob/master/HACKING.rst
systemctl stop maas-regiond
# disable any log as much as possible
/usr/bin/python3 /usr/sbin/regiond --debug --workers 1

systemctl stop maas-rackd
setcap cap_net_bind_service=+eip /usr/sbin/rackd
/bin/rm -f /var/lib/maas/dhcpd.sock && /bin/rm -f /var/lib/maas/dhcpd.conf && /bin/rm -f /var/lib/maas/dhcpd6.conf && sleep 1 && LOGFILE=/var/log/maas/rackd.log prometheus_multiproc_dir=/var/lib/maas/prometheus /usr/bin/python3 /usr/sbin/rackd --nodaemon

enable debug log - https://discourse.maas.io/t/running-installed-maas-in-debug-logging-mode/168

20200907 update - maas network_discovery

sudo add-apt-repository ppa:maas/2.7  #2.7 only worked on bionic
sudo apt install -y maas
sudo maas init --admin-username admin --admin-password password --admin-email admin@example.com --admin-ssh-import zhhuabj
sudo maas-region apikey --username=admin > ~/admin-api-key
curl http://192.168.2.111:5240/MAAS
apikey=$(cat ~/admin-api-key)
maas login admin http://192.168.2.111:5240/MAAS $apikey
maas login admin http://127.0.0.1:5240/MAAS $apikey
maas admin boot-resources import

$ maas admin discoveries read |jq ".[] | {ip:.ip, mac_address:.mac_address}" --compact-output
{"ip":"192.168.100.1","mac_address":"52:54:00:14:3a:d4"}
{"ip":"192.168.100.77","mac_address":"52:54:00:f8:9b:2b"}
maas admin discoveries clear-by-mac-and-ip ip=<IP> mac=<MAC>

#delete from maasserver_interface_ip_addresses where staticipaddress_id=64548 and id=64188;
#delete from maasserver_staticipaddress where id=64548 and ip='192.168.128.15';

#http://127.0.0.1:5240/MAAS/r/settings/network/network-discovery (Network discovery -> Active subnet mapping interval)
maas admin  maas get-config name=network_discovery
maas admin discoveries clear all=True
maas admin maas set-config name=network_discovery value=disabled
maas admin maas get-config name=network_discovery
maas admin discoveries read

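20200915 update - using CentOS 8 with MAAS 2.8

The main obstacle to using CentOS with MAAS was https://code.launchpad.net/~ltrager/curtin/+git/curtin/+merge/374335 , but that code is now available in UA. The MAAS 2.8 UI simply does not yet let you select CentOS 8, so CentOS 8 can be used as follows.
The limitations are:

  • The MAAS UI does not support uploading a CentOS 8 image (because http://images.maas.io/ephemeral-v3/daily/streams/v1/com.ubuntu.maas:daily:centos-bases-download.json does not have it yet); only the API can (via curtin).
  • maas-image-builder cannot build CentOS 8 images. packer-maas apparently can, or you can build a custom one (https://medium.com/@kemnitz.stefan/centos-8-via-maas-9ffcc6c7a22d).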
sudo add-apt-repository ppa:maas/2.8
sudo apt update
sudo apt install maas -y
sudo maas init --admin-username admin --admin-password password --admin-email admin@example.com --admin-ssh-import zhhuabj
sudo maas-region apikey --username=admin > ~/admin-api-key
apikey=$(cat ~/admin-api-key)
maas login admin http://127.0.0.1:5240/MAAS $apikey

# Once Curtin has been upgraded the image can be uploaded to MAAS via the API
#https://code.launchpad.net/~ltrager/curtin/+git/curtin/+merge/374335
sudo apt install curtin=20.1-2-g42a9667f-0ubuntu1~18.04.1 -y
sudo systemctl stop maas-rackd && sudo systemctl restart maas-regiond && sudo systemctl start maas-rackd
axel https://people.canonical.com/~zhhuabj/centos8.tar.gz
maas admin boot-resources create name='centos 8' title='centos 8' architecture='amd64/generic' filetype='tgz' content@=centos8-1.0.2-7-gac521bb.tar.gz
maas-image-builder does not seem to support building CentOS 8 images yet, but this article describes a way to do it: https://medium.com/@kemnitz.stefan/centos-8-via-maas-9ffcc6c7a22d
Another tool, packer-maas, also seems able to do it; see https://manintheit.org/bash/creating-a-image-for-maas-with-packer/
$ sudo maas-image-builder -o centos8-amd64-root-tgz --arch amd64 centos --edition 8
...
mib.builders.BuildError: Unknown CentOS edition: 8.

$ grep -r 'grub2' /usr/lib/curtin/helpers/common |grep 8
               7|8) grub_name="grub2-pc";;
                        7|8) grubcmd="grub2-install"
Create a CentOS 8 image with packer-maas as follows:
# https://github.com/canonical/packer-maas/tree/master/centos8
sudo apt install packer
# fix bug - https://github.com/canonical/packer-maas/issues/2
wget https://releases.hashicorp.com/packer/1.6.2/packer_1.6.2_linux_amd64.zip
unzip packer_1.6.2_linux_amd64.zip && sudo cp ./packer /usr/bin/ && packer --version
git clone https://github.com/canonical/packer-maas.git
cd packer-maas/centos8  #must be in centos8 subdir
#change url in centos8.json to http://mirrors.aliyun.com/centos/8.2.2004/isos/x86_64/CentOS-8.2.2004-x86_64-boot.iso
sudo PACKER_LOG=1 packer build centos8.json  #but will hit this bug - https://github.com/canonical/packer-maas/issues/2
maas admin boot-resources create name='centos/8-custom' title='CentOS 8 Custom' architecture='amd64/generic' filetype='tgz' content@=centos8.tar.gz  #default username is cloud-user

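You can then set the default image for Deploy, or select the CentOS image at deploy time; commissioning, however, seems to work only with the Ubuntu image. The deploy log is at /var/log/maas/rsyslog/test/2020-09-25/messages
Finally, the deployed CentOS 8 machine can be reached with 'ssh cloud-user@192.168.100.2'. If the machine was entered through rescue mode rather than deployed, use "ssh ubuntu@192.168.100.2 -v". Note: rescue mode is a commissioning-style boot, not a deploy, so no image can be selected there either.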

[cloud-user@test ~]$ uname -a
Linux test.maas 4.18.0-193.14.2.el8_2.x86_64 #1 SMP Sun Jul 26 03:54:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

However, this image does not seem to work on UEFI VMs. The steps to create a UEFI test VM are:

sudo apt install ovmf -y  #should install ovmf in node1 rather than t440p
sudo systemctl restart libvirtd
virt-manager --debug      #run it in t440p, then connect to node1
Start creating a new VM in virt-manager, but before finishing, click "Customize configuration before install"
Change the Firmware option from BIOS to UEFI in the 'Overview' tab. (If it is not available, run systemctl restart libvirtd.)

debug curtin

https://gist.github.com/smoser/2610e9b78b8d7b54319675d9e3986a1b
Check the log /var/log/maas/rsyslog/test2/2020-09-28/messages on the MAAS server; it shows the following error:

curthooks -> builtin_curthooks -> setup_grub -> install_grub(instdevs, target, uefi=uefi_bootable, grubcfg=grubcfg) -> 

unshare --fork --pid -- chroot /tmp/tmp6yj0ryyc/target grub2-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=centos --recheck --no-nvram

Stderr: grub2-install: error: /usr/lib/grub/x86_64-efi/modinfo.sh doesn't exist.

/usr/lib/grub/x86_64-efi/modinfo.sh does not exist. According to https://bugzilla.redhat.com/show_bug.cgi?id=1101352 :
Looks like this is intentional to avoid the fallout of unsuspecting users running grub2-install. To regain ability to grub2-install on EFI, install package grub2-efi-modules.
So the grub2-efi-modules package needs to be installed. The following steps confirm that the image really lacks this package:

#sudo guestmount -a /var/lib/libvirt/images/test2.qcow2  -i --rw ./root2
sudo modprobe nbd
sudo qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/test2.qcow2 -f qcow2
sudo fdisk /dev/nbd0 -l
#sudo qemu-nbd --disconnect /dev/nbd0  #run this only at the end, after unmounting the partitions below
$ sudo fdisk /dev/nbd0 -l |grep dev
Disk /dev/nbd0: 20 GiB, 21474836480 bytes, 41943040 sectors
/dev/nbd0p1    2048  1050623  1048576  512M EFI System
/dev/nbd0p2 1050624 41928703 40878080 19.5G Linux filesystem

cd /tmp && mkdir {boot,root}
sudo mount /dev/nbd0p1 ./boot/
sudo mount /dev/nbd0p2 ./root/

$ ls ./root/usr/lib/grub/
i386-pc

$ sudo dpkg -L grub-efi-amd64-bin |grep modinfo.sh
/usr/lib/grub/x86_64-efi/modinfo.sh
$ sudo apt-file search modinfo.sh |grep amd64
grub-efi-amd64-bin: /usr/lib/grub/x86_64-efi/modinfo.sh

I first tried customizing the packages with kickstart, but it did not work: https://manintheit.org/bash/creating-a-image-for-maas-with-packer/

sudo PACKER_LOG=1 HTTPIP=127.0.0.1 HTTPPort=8000 packer build centos8.json

Then I repacked the image as follows, adding this package to it:

mkdir /tmp/centos8 && sudo tar -xf centos8.tar.gz -C /tmp/centos8/
MOUNTDIR=/tmp/centos8
for d in dev sys proc; do sudo mount --bind /$d ${MOUNTDIR}/$d; done
sudo mv ${MOUNTDIR}/etc/resolv.conf ${MOUNTDIR}/etc/resolv.conf.bak && sudo cp /etc/resolv.conf ${MOUNTDIR}/etc/


#sudo chroot $MOUNTDIR bash
sudo chroot $MOUNTDIR yum update
#install grub2-efi-x64-modules instead of grub2-efi-modules to avoid grub2-efi-aa64-modules
sudo chroot $MOUNTDIR yum install grub2-efi-x64-modules -y
sudo chroot $MOUNTDIR ls /usr/lib/grub/x86_64-efi/modinfo.sh
sudo chroot $MOUNTDIR yum list --installed |grep -E 'shim|efi'

sudo umount $MOUNTDIR/{proc,dev,sys}
sudo mv $MOUNTDIR/etc/resolv.conf.bak $MOUNTDIR/etc/resolv.conf
sudo tar -czf centos8.tar.gz -C $MOUNTDIR .

The previous error is gone, but the following new error appears:

Command: ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmphc1fr7hk/target', 'dracut', '-f', '/boot/initramfs-4.18.0-193.14.2.el8_2.x86_644.18.0-193.19.1.el8_2.x86_64.img', '4.18.0-193.14.2.el8_2.x86_644.18.0-193.19.1.el8_2.x86_64']        Exit code: 1        Reason: -        Stdout: ''        Stderr: dracut: Cannot find module directory /lib/modules/4.18.0-193.14.2.el8_2.x86_644.18.0-193.19.1.el8_2.x86_64/                dracut: and --no-kernel was not specified

That is because CentOS had just been upgraded (it should not have been), which left two kernels installed:

bash-4.4# rpm -q kernel
kernel-4.18.0-193.14.2.el8_2.x86_64
kernel-4.18.0-193.19.1.el8_2.x86_64

bash-4.4# rpm -q --queryformat %{VERSION}-%{RELEASE}.%{ARCH} kernel
4.18.0-193.14.2.el8_2.x86_644.18.0-193.19.1.el8_2.x86_64

dracut -f /boot/initramfs-4.18.0-193.14.2.el8_2.x86_644.18.0-193.19.1.el8_2.x86_64.img 4.18.0-193.14.2.el8_2.x86_644.18.0-193.19.1.el8_2.x86_64
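
The doubled version string above comes from the rpm query format having no trailing newline, so the versions of both installed kernels are glued together. A possible workaround, shown here only as a sketch and not as the fix that curtin applies, is to print one version per line and pick a single one. latest_kernel_version is a hypothetical helper and relies on GNU sort -V.

```shell
#!/bin/sh
# Read "version-release.arch" strings, one per line, on stdin
# and print the highest one according to version ordering.
latest_kernel_version() {
    sort -V | tail -n 1
}

# Usage inside the image chroot (note the \n in the query format):
#   kver=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}.%{ARCH}\n' kernel | latest_kernel_version)
#   dracut -f "/boot/initramfs-$kver.img" "$kver"
```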

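After fixing this I also ran into image caching. Once that was sorted out, these errors were gone, but the system always booted into emergency mode: initrd-switch-root.service failed to start because the disk could not be found.
It was finally solved by modifying packer-maas/centos8/http/centos8.ks: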

#grub2-efi-x64
efibootmgr
#shim-x64
grub2-efi-x64-modules

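In the earlier packer-maas test, installing the unsigned grub2-efi-x64-modules meant the signed grub2-efi-x64 and shim-x64 could not be installed, yet in the following experiment all of them could be installed together.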

#https://www.cnblogs.com/ricksteves/p/11623681.html
yum install grub2-efi-x64 shim-x64 grub2-efi-x64-modules -y  #bug - https://bugzilla.redhat.com/show_bug.cgi?id=1201220
grub2-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=centos --recheck --no-nvram
grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg  #soft link to /etc/grub2-efi.cfg

Recovering a MAAS cluster

Assume my test env has 3 nodes (31, 32, 33); here are my steps:

1, stop all db and maas service in 31 and 32 and 33
systemctl stop corosync && systemctl stop pacemaker
systemctl stop maas-*

2, start db in 31
rm -rf /var/lib/pgsql/tmp/PGSQL.lock > /dev/null 2>&1 && systemctl restart corosync && systemctl restart pacemaker

3, start db in 32
rm -rf /var/lib/pgsql/tmp/PGSQL.lock > /dev/null 2>&1 && systemctl restart corosync && systemctl restart pacemaker

At this time, 31 will have db_vip and maas_vip by checking 'crm status' and 'ip addr show'.

4, start maas services in 31
systemctl restart maas-regiond && systemctl restart maas-rackd && systemctl restart maas-proxy && systemctl restart maas-dhcpd6 && systemctl restart maas-dhcpd

At this time, verified maas ui by 'lynx http://10.5.0.15:5240/MAAS', it works.

5, start maas services in 32
systemctl restart maas-regiond && systemctl restart maas-rackd && systemctl restart maas-proxy && systemctl restart maas-dhcpd6 && systemctl restart maas-dhcpd

At this time, verified maas ui, it works.

6, start maas services 33 as well
rm -rf /var/lib/pgsql/tmp/PGSQL.lock > /dev/null 2>&1 && systemctl restart corosync && systemctl restart pacemaker
systemctl restart maas-regiond && systemctl restart maas-rackd && systemctl restart maas-proxy && systemctl restart maas-dhcpd6 && systemctl restart maas-dhcpd

At this time, verified maas ui, it works.

20211206 - debug maas with ipmi

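Create three VMs on the bare-metal host node1. The VMs use a network named cloud (192.168.100.0/24, without dhcp):

  • maasdev=192.168.100.3, with the IP set to 192.168.100.3 through netplan
  • Create two PXE-boot VMs (maastestnode3, maastestnode4). The key points for PXE boot are an empty disk plus the PXE boot setting; boot each VM from PXE once so that MAAS creates two machine records in the New state from their MAC addresses. (One issue remains: they are not yet associated with IPMI.)
  • Install IPMI on node1
  • Register maastestnode3 with IPMI
/opt/vbmc/bin/vbmc add maastestnode3 --port 6003 --address 192.168.100.1 --username admin --password password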
/opt/vbmc/bin/vbmc list
ipmitool -I lanplus -H 192.168.100.1 -U admin -P password -p 6003 power status

Step 1: configure IP=192.168.100.3 on maasdev with netplan, and point nameservers at it as well:

cat <<EOF | sudo tee /etc/netplan/01-netcfg.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    enp1s0:
      dhcp4: no
      addresses:
      - 192.168.100.3/24
      gateway4: 192.168.100.1
      nameservers:
        addresses:
        - 192.168.100.3
EOF
sudo netplan apply

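Step 2: install MAAS on maasdev. I originally wanted to build a snap from source and install that, but hit a build error, so I installed from the deb packages instead.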

# on maasdev - https://github.com/maas/maas/blob/master/HACKING.rst
git clone https://git.launchpad.net/maas && cd maas
git checkout -b 3.0.0 3.0.0
#sudo apt install make -y
#make install-dependencies #Postgres, isc-dhcp, bind9 etc
#ls src/maasserver/djangosettings/development.py
#make && make syncdb && ls db/ && make sampledata

# using snap instead
# it's in a plain directory instead of in a squashfs image, so you can modify the source code
make clean
#make snap-prime
sudo snap try build/dev-snap/prime
utilities/connect-snap-interfaces
sudo maas init
make sync-dev-snap  #modify the source code
sudo service snap.maas.supervisor restart
# but it has the following error
cp: cannot stat 'src/production-html-snap/*': No such file or directory

# so we use debian instead - https://launchpad.net/~maas
sudo add-apt-repository ppa:maas/3.0
sudo apt update
sudo apt install maas -y

sudo maas createadmin
apikey=$(sudo maas apikey --username admin)
maas login admin http://192.168.100.3:5240/MAAS $apikey
maas admin boot-resources import

#maas-regiond, maas-rackd, maas-dhcpd, maas-proxy, maas-http, maas-syslog
sudo systemctl status maas-*
#need to configure dhcp for subnet for maas-dhcpd error - ConditionPathExists=/var/lib/maas/dhcpd-interfaces was not met
ssh node1 -X
#then access http://192.168.100.3:5240/MAAS to enable dhcp for 192.168.100.0/24 on 'subnet' TAB.
firefox &

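Step 3: create two PXE VMs (maastestnode3, maastestnode4). First, an empty disk is needed (truncate --size 10G /images/kvm/maastestnode3.raw). Second, the VM must be set to PXE boot: in virt-manager, create a new VM named maastestnode3 that boots from PXE, uses the empty disk created with truncate, and has one NIC on the cloud network. Finally, boot it from PXE once so that MAAS creates a machine in the New state from its MAC address. After shutting the VM down, remember to switch its boot mode back to PXE.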

ubuntu@maasdev:~$ maas admin machines read | jq '.[] | {hostname:.hostname,system_id: .system_id,status:.status_name,ip_addresses: .ip_addresses, node_type_name:.node_type_name, testing_status:.testing_status, commissioning_status:.commissioning_status}' --compact-output
{"hostname":"upward-tiger","system_id":"gyyapc","status":"New","ip_addresses":[],"node_type_name":"Machine","testing_status":-1,"commissioning_status":2}
{"hostname":"bold-parrot","system_id":"gr64df","status":"New","ip_addresses":[],"node_type_name":"Machine","testing_status":-1,"commissioning_status":2}

Step 4: set up IPMI on node1

sudo -i
apt install python3-pip python3-dev gcc libvirt-dev ipmitool python3-virtualenv -y
python3 -m virtualenv --system-site-packages --download /opt/vbmc
/opt/vbmc/bin/pip install virtualbmc
cat << 'EOF' | sudo tee -a /etc/systemd/system/vbmcd.service
[Install]
WantedBy = multi-user.target
[Service]
BlockIOAccounting = True
CPUAccounting = True
ExecReload = /bin/kill -HUP $MAINPID
ExecStart = /opt/vbmc/bin/vbmcd --foreground
Group = root
MemoryAccounting = True
PrivateDevices = False
PrivateNetwork = False
PrivateTmp = False
PrivateUsers = False
Restart = on-failure
RestartSec = 2
Slice = vbmc.slice
TasksAccounting = True
TimeoutSec = 120
Type = simple
User = root
[Unit]
After = libvirtd.service
After = syslog.target
After = network.target
Description = vbmc service
EOF
systemctl enable vbmcd
systemctl restart vbmcd
virsh list --all
/opt/vbmc/bin/vbmc add maastestnode3 --port 6003 --address 192.168.100.1 --username admin --password password
/opt/vbmc/bin/vbmc add maastestnode4 --port 6004 --address 192.168.100.1 --username admin --password password
/opt/vbmc/bin/vbmc list
/opt/vbmc/bin/vbmc start maastestnode3
/opt/vbmc/bin/vbmc start maastestnode4
/opt/vbmc/bin/vbmc show maastestnode3 |grep running
ipmitool -I lanplus -H 192.168.100.1 -U admin -P password -p 6003 power status
ipmitool -I lanplus -H 192.168.100.1 -U admin -P password -p 6004 power status
exit
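
When checking several virtual BMCs it can help to reduce the ipmitool output to a bare on/off value. The helper below is a small sketch: ipmi_power_state is a hypothetical name, and it assumes the usual "Chassis Power is on" / "Chassis Power is off" wording of ipmitool's power status output.

```shell
#!/bin/sh
# Read "ipmitool ... power status" output on stdin and print only the state word.
ipmi_power_state() {
    awk '/^Chassis Power is/ { print $4 }'
}

# Usage against the two vbmc ports defined above:
#   for port in 6003 6004; do
#       state=$(ipmitool -I lanplus -H 192.168.100.1 -U admin -P password -p "$port" power status | ipmi_power_state)
#       echo "port $port: $state"
#   done
```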

Step 5: the problem seems unrelated to the IPMI setup above. This step debugs it with rpdb and logging.

# debug API
sudo sed -i 's/DEBUG = False/DEBUG = True/g' /usr/lib/python3/dist-packages/maasserver/djangosettings/settings.py
vim /usr/lib/python3/dist-packages/maasserver/api/machines.py  #in AnonMachinesHandler.create, add the following line:
import rpdb;rpdb.set_trace()
sudo systemctl stop maas-regiond
#/usr/bin/python3 /usr/sbin/regiond --debug --workers 1
sudo -u maas -H DJANGO_SETTINGS_MODULE=maasserver.djangosettings.settings /usr/bin/python3 /usr/sbin/regiond --debug --workers 1
sudo pip3 install rpdb
nc 127.0.0.1 4444

maaslog.info("zhhuabj: power_type: %s  request.data: %s", power_type, str(request.data))

Reference
[1] https://blog.csdn.net/quqi99/article/details/37990507
[2] https://www.cnblogs.com/aegis1019/p/8870251.html
[3] https://discourse.maas.io/t/minimal-maas-setup/5543
[4] https://gist.github.com/brettmilford/0af6a75011adb2755ff003e5ea999992
