Play with MAAS (by quqi99)

Problem
Four years ago I wrote a blog post about MAAS [1], but I have not used it for several years. Today I noticed the GUI workflow has changed a bit, so I am writing it down.
The host machine
The following network is configured on it:

auto eth0
iface eth0 inet manual
auto br-eth0
iface br-eth0 inet static
    address 192.168.99.124/24
    gateway 192.168.99.1
    bridge_ports eth0
    dns-nameservers 192.168.99.1

Then create a virtual network named cloud (192.168.100.0/24) in virt-manager, without DHCP.
Finally, use virt-manager to create a VM named maas and give it two NICs (ens3=192.168.100.3 used as the MAAS IP, and ens8=192.168.99.3 purely for convenient management access).
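The cloud network can also be defined from the CLI instead of virt-manager. A minimal sketch, assuming NAT and the bridge name virbr-cloud; the <dhcp> section is deliberately omitted so MAAS can run its own DHCP server later:

cat > cloud.xml << EOF
<network>
  <name>cloud</name>
  <forward mode='nat'/>
  <bridge name='virbr-cloud' stp='on' delay='0'/>
  <ip address='192.168.100.1' netmask='255.255.255.0'/>
</network>
EOF
virsh net-define cloud.xml && virsh net-start cloud && virsh net-autostart cloud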
The maas VM
Its network configuration is as follows:

$ cat /etc/resolvconf/resolv.conf.d/base 
search maas
nameserver 192.168.100.3

or modify /etc/systemd/resolved.conf to set DNS=192.168.100.3, then run:
sudo systemctl restart systemd-resolved.service
# ifupdown has been replaced by netplan(5) on this system.  See
# /etc/netplan for current configuration.
# To re-enable ifupdown on this system, you can run:
sudo apt install ifupdown
cat /etc/network/interfaces
auto ens3
iface ens3 inet static
address 192.168.100.3
netmask 255.255.255.0
gateway 192.168.100.1

auto ens8
iface ens8 inet static
address 192.168.99.3
netmask 255.255.255.0
gateway 192.168.99.1

Install MAAS:
NOTE: be careful here — it is best to install from the debian package. With the snap package, the REQUESTS_CA_BUNDLE environment variable is not set inside the snap container, and because of a python-requests bug MAAS then cannot use HTTPS services; for example, running "maas configauth --rbac-url https://node1.lan:5000/ --rbac-service-name maastest" throws CERTIFICATE_VERIFY_FAILED. See: https://zhhuabj.blog.csdn.net/article/details/107182847

sudo snap install maas --channel=3.0/stable
#sudo snap install maas-test-db
#sudo maas init region+rack --maas-url http://192.168.99.186:5240/MAAS --database-uri maas-test-db:///
sudo apt install -y postgresql
sudo -iu postgres psql -d template1 -U postgres
CREATE USER maas WITH ENCRYPTED PASSWORD 'password';
CREATE DATABASE maasdb;
GRANT all privileges on database maasdb to maas;
\c maasdb
cat << EOF | sudo tee -a /etc/postgresql/12/main/pg_hba.conf
host    maasdb  maas    0/0     md5
EOF
sudo /snap/bin/maas init region+rack --maas-url http://192.168.99.186:5240/MAAS --database-uri "postgres://maas:password@localhost/maasdb"
sudo /snap/bin/maas createadmin --username admin --password password --email admin@example.com --ssh-import lp:<username>
sudo /snap/bin/maas apikey --username admin > ~ubuntu/admin-api-key
sudo /snap/bin/maas status
#access http://192.168.99.186:5240/MAAS
#dbshell is missing from snap - https://bugs.launchpad.net/maas/+bug/1877669
#sudo snap run --shell maas
#maas-region dbshell

cat /var/snap/maas/current/regiond.conf
sudo -u postgres psql -d maasdb

sudo add-apt-repository ppa:maas/3.0   #https://launchpad.net/~maas
sudo apt install maas

Configure the maas VM for passwordless access to libvirt on the physical host:

sudo chsh maas -s /bin/bash
sudo su - maas
ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa hua@192.168.100.1
sudo -u maas virsh -c qemu+ssh://hua@192.168.100.1/system list --all
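
Optionally, once virsh works without a password, the host can also be registered in MAAS as a KVM host (pod) so VMs can be composed and power-managed from MAAS. A sketch (adjust the user/IP to your host):

maas admin pods create type=virsh power_address=qemu+ssh://hua@192.168.100.1/system
maas admin pods read | jq '.[] | {id:.id, name:.name, type:.type}'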

Update 20230312 - if MAAS was installed from the snap, use the following commands instead to give the maas VM passwordless access to libvirt on the physical host:

mkdir -m 0700 -p /var/snap/maas/current/root/.ssh
cd /var/snap/maas/current/root/.ssh
ssh-keygen -f id_rsa -N ''
ssh-copy-id -i /var/snap/maas/current/root/.ssh/id_rsa hua@192.168.100.1

After creating the MAAS admin user, you can log into the GUI at http://192.168.99.3:5240/MAAS.

sudo maas createadmin
apikey=$(sudo maas apikey --username admin)
maas login admin http://192.168.100.3:5240/MAAS $apikey
maas login admin http://127.0.0.1:5240/MAAS $apikey
maas admin boot-resources import

enable PXE

In 'Subnets', click the vlan (untagged) that subnet 192.168.100.0/24 belongs to and 'enable dhcp'; only after that can PXE work (a CLI equivalent is sketched below).
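The same step from the CLI — a sketch following the jq patterns used in the CLI appendix below; the dynamic range is an assumption, pick one inside 192.168.100.0/24 that does not clash with static IPs:

rack_id=$(maas admin rack-controllers read | jq -r '.[0].system_id')
fabric=$(maas admin subnets read | jq -r '.[] | select(.cidr=="192.168.100.0/24").vlan.fabric')
fabric_id=$(maas admin fabrics read | jq -r ".[] | select(.name==\"$fabric\").id")
maas admin ipranges create type=dynamic start_ip=192.168.100.100 end_ip=192.168.100.200
maas admin vlan update $fabric_id untagged dhcp_on=True primary_rack=$rack_id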

Registering MAAS nodes
Create an empty disk (truncate --size 10G /images/kvm/controller.raw), then create a new VM named controller in virt-manager (set it to boot from PXE, use the empty disk created with truncate, and define four NICs on the cloud network). We can watch the PXE boot in virt-manager, but at the end it just flashes by — that is because the machine has to be started from MAAS. So shut the VM down in virt-manager, and remember to set the boot mode back to PXE (note: if --pxe is replaced by '--pxe --boot network,hd' in the command, this step is not needed).
Update 20230312 - it can also be created from the command line:

sudo virt-install --name=n3 --ram=8096 --vcpus=1 --virt-type=kvm --accelerate --pxe  --boot network,hd \
    --connect=qemu:///system --os-variant=ubuntu22.04 --arch=x86_64 \
    --disk=/images/testbed/n3.qcow2,bus=virtio,format=qcow2,cache=none,sparse=true,size=50 \
    --disk=/images/testbed/n3_sdb.qcow2,bus=virtio,format=qcow2,cache=none,sparse=true,size=10 \
    --network bridge:br-maas,model=virtio,mac=52:54:00:63:7e:7e --network bridge:br-eth0,model=virtio,mac=52:54:00:63:7e:7f

Note (update 20240823): the boot flashing by here is normal — no power type has been defined yet, and it is enough that the machine shows up as New in the MAAS UI. Also, juju 3.5 has a bug: with two NICs defined, commissioning always fails with 20-maas-02-dhcp-unconfigured-ifaces, because the netplan inside the VM only sets the first NIC to dhcp and ignores the second. This should be adjusted in the MAAS UI (3.5 seems to have a bug there too), so start with a single NIC; if a second NIC is needed, add it later and go straight to deploy (re-running commission at that point still hits the same error).
NOTE: if it still does not work at this point, most likely the PXE/DHCP enablement above was missed. Also, an image cannot be chosen at commission time — only the default Ubuntu image is used; at deploy time another image such as CentOS can be selected.
Also, if you PXE-boot the machine first without registering virsh/ipmi information, MAAS creates a machine in the New state from the MAC in the PXE request (with a randomly generated name); you can then edit that machine to add the virsh/ipmi information.

Then go to "Nodes -> Add hardware -> Machine" in the GUI to define the node (note: define the MAC addresses of all four NICs in one go; if you add NICs here later, commission has to be run again):
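The node can also be registered from the CLI instead of the GUI. A sketch — the MAC and the virsh power parameters are examples, and mac_addresses= can be repeated once per NIC:

maas admin machines create hostname=controller architecture=amd64 \
    mac_addresses=52:54:00:63:7e:7e power_type=virsh \
    power_parameters_power_address=qemu+ssh://hua@192.168.100.1/system \
    power_parameters_power_id=controller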

The commission process can be debugged in several ways:

  • the virt-manager VNC console
  • /var/log/maas/maas.log on the MAAS node
  • the cloud-init logs inside the controller VM (its IP can be found in the GUI)

Update 20230312 - if this step fails with an error saying the BMC cannot be reached, it is because MAAS was installed from the snap and the following key setup is needed:

mkdir -m 0700 -p /var/snap/maas/current/root/.ssh
cd /var/snap/maas/current/root/.ssh
ssh-keygen -f id_rsa -N ''
ssh-copy-id -i /var/snap/maas/current/root/.ssh/id_rsa hua@192.168.100.1

Then define the four NICs in MAAS according to the MAC addresses defined in virt-manager, as in the figure below:
Update 20230312 - if this step fails, the NIC used by MAAS must be placed first (the netplan inside the VM only sets the first NIC to dhcp by default, so only the first NIC gets an IP and can reach PXE on MAAS). Also, although the bridge used by the MAAS NIC (e.g. cloud) does not need DHCP enabled on the host (MAAS runs its own DHCP server), that cloud bridge must have outbound internet access, because commissioning downloads some package dependencies. The cloud bridge created with virsh has NAT by default, so there is no problem here; for bridges created with netplan that have no NAT, add the NAT rule manually (sudo iptables -t nat -A POSTROUTING -s 192.168.100.0/24 ! -d 192.168.100.0/24 -j MASQUERADE).

Once the four NICs are defined, click "Take action -> Commission" again, and you will see the following screen:

Then the "Add Interface", "Create bond" and "Create Bridge" buttons appear. First turn ens8 and ens9 into bond0, then create some VLANs on bond0 (the order matters: VLANs first, then the bridge), and finally turn bond0 into br-bond0, as below:

Next click "Take action -> Deploy" (note: not Commission this time) to deploy; afterwards you can find the IP in the GUI and log in with ssh ubuntu@IP.
Notes:

  • the VLAN id is configured in the "Subnet -> fabric0" tab, not on the "Machine" page; on the "Machine" page you associate bond0 with a VLAN, e.g. bond0.59
  • it is best to assign a subnet to bond0 above
  • when juju uses lxd it automatically creates br-bond0 and moves the IP from bond0 onto br-bond0
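
The bond/vlan/bridge layout above can also be built with the CLI. A sketch — the interface names and vid 59 are examples, and $machine_id/$fabric_id are obtained as in the CLI appendix below:

ens8_id=$(maas admin interfaces read $machine_id | jq -r '.[] | select(.name=="ens8").id')
ens9_id=$(maas admin interfaces read $machine_id | jq -r '.[] | select(.name=="ens9").id')
maas admin interfaces create-bond $machine_id name=bond0 parents=$ens8_id parents=$ens9_id bond_mode=802.3ad
bond0_id=$(maas admin interfaces read $machine_id | jq -r '.[] | select(.name=="bond0").id')
vlan59_id=$(maas admin vlans read $fabric_id | jq -r '.[] | select(.vid==59).id')
maas admin interfaces create-vlan $machine_id parent=$bond0_id vlan=$vlan59_id
maas admin interfaces create-bridge $machine_id name=br-bond0 parent=$bond0_id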

Appendix - CLI


https://docs.maas.io/2.5/en/manage-cli-advanced
https://maas.io/docs/concepts-and-terms
A region corresponds to a datacenter (a single region); fabrics further subdivide a region, and every rack controller is attached to every fabric.
A rack is associated with nodes, interfaces and a fabric; interfaces are associated with subnets, and subnets have a vlan.
A fabric is associated with vlans; a vlan is associated with a space.
Interfaces link to subnets, and a subnet has a vlan.

maas login admin http://localhost/MAAS/api/2.0 $(sudo maas-region apikey --username admin)

#machine_id=$(maas admin machines read | jq -r '.[] | select(.hostname=="controller").system_id')
#maas admin interfaces read $machine_id > interface.txt

maas admin rack-controllers read
system_id=$(maas admin rack-controllers read | jq -r .[].system_id)

maas admin machines read
machine_id=$(maas admin machines read | jq -r '.[] | select(.hostname=="controller").system_id')
maas admin machines read |jq ".[] | {hostname:.hostname, system_id: .system_id, status:.status}" --compact-output

maas admin interfaces read "$system_id"
maas admin interfaces read $system_id | jq ".[] |{id:.id, name:.name, mac:.mac_address, vid:.vlan.vid, fabric:.vlan.fabric}" --compact-output
maas admin interface update "$system_id" $interface vlan=5001
#maas admin interface unlink-subnet "$system_id" "$iface_id" id="$link_id"
#maas admin interface delete "$system_id" "$vlan_iface_id"
#maas admin interface link-subnet "$system_id" "$iface_id" mode=AUTO subnet="$iface_subnet_cidr"
interfaces=$(maas admin interfaces read "$system_id")
iface_name="ens3"
iface_id=$(echo "$interfaces" | jq -r ".[] | select(.name==\"$iface_name\") | .id")
link_ids=$(maas admin interface read "$system_id" "$iface_id" | jq -r '.links | .[].id')
for link_id in $link_ids; do
   maas admin interface unlink-subnet "$system_id" "$iface_id" id="$link_id"
done
vlan_iface_ids=$(maas admin interfaces read "$system_id" | jq -r '.[] | select(.type=="vlan").id')
for vlan_iface_id in $vlan_iface_ids; do
   maas admin interface delete "$system_id" "$vlan_iface_id"
done
iface_subnet_cidr=$(maas admin subnets read | jq -r ".[] | select(.name==\"$iface_subnet_name\").cidr")
maas admin interface link-subnet "$system_id" "$iface_id" mode=AUTO subnet="$iface_subnet_cidr"

maas admin fabrics read
maas admin fabrics read | jq ".[] |{name:.name, vlans:.vlans[] | {id:.id, vid:.vid}}" --compact-output
maas admin fabric update 0 name=maas-management
fabric_name="maas-management"
mag_fabric_id=$(maas admin fabrics read| jq -r ".[] | select(.name==\"$fabric_name\").id")

vid=$(maas admin subnets read | jq -M '.[] | select(.cidr=="10.12.1.0/24").vlan.vid')
fabric=$(maas admin subnets read| jq -r '.[] | select(.cidr=="10.12.1.0/24").vlan.fabric')
fabric_id=$(maas admin fabrics read | jq -M ".[] | select(.name==\"$fabric\").id")  #$fabric=fabric-1
maas admin vlan read $fabric_id $vid

maas admin spaces create name=os-floating
os_floating_spaceid=$(maas admin spaces read | jq '.[] | select(.name == "'os-floating'")'.id)
maas admin vlan update $mag_fabric_id 5 space=$os_floating_spaceid mtu=1500

maas admin subnets read
maas admin subnet update $(maas admin subnets read| jq -r ".[] | select(.cidr==\"10.231.16.0/24\").id") cidr=10.231.16.0/21
pxe_subnet_id=$(maas admin subnets read| jq -r ".[] | select(.cidr==\"10.231.16.0/21\").id")
maas admin subnet update $pxe_subnet_id name=maas-management

Appendix - MAAS Region HA

# https://sites.google.com/site/openstackinthebasement3/maasha
# https://cloud.google.com/community/tutorials/setting-up-postgres-hot-standby
sudo apt remove --purge postgresql maas*
sudo apt autoremove
sudo apt install maas-region-controller
#sudo dpkg-reconfigure maas-region-controller
#sudo dpkg-reconfigure maas-rack-controller

# on maas to create user
sudo maas createadmin --username admin --password ubuntu --email root@example.com

# on maas and maas2
sudo bash -c 'cat >> /etc/postgresql/9.5/main/pg_hba.conf' << EOF
host     replication     repuser         192.168.100.3/32        md5
host     replication     repuser         192.168.100.4/32        md5
host     all             all             192.168.100.3/32        md5
host     all             all             192.168.100.4/32        md5
host     replication     repuser         192.168.99.3/32        md5
host     replication     repuser         192.168.99.4/32        md5
host     all             all             192.168.99.3/32        md5
host     all             all             192.168.99.4/32        md5
host     all             all             192.168.0.0/16        md5
EOF


# on both maas and maas2
sudo -u postgres createuser -U postgres repuser -P -c 5 --replication
#sudo mkdir -p /var/lib/postgresql/9.5/main/mnt/server/archivedir
#sudo chown postgres:postgres /var/lib/postgresql/9.5/main/mnt/server/archivedir

sudo bash -c 'cat >> /etc/postgresql/9.5/main/postgresql.conf' << EOF
listen_addresses = '*'
max_connections = 300
wal_level = hot_standby
synchronous_commit = on
archive_mode = off
#archive_mode = on
#archive_command = 'test ! -f /var/lib/postgresql/9.5/main/mnt/server/archivedir/%f && cp %p /var/lib/postgresql/9.5/main/mnt/server/archivedir/%f'
max_wal_senders = 10
wal_keep_segments = 256
hot_standby = on
restart_after_crash = off
hot_standby_feedback = on
EOF

# on maas, restart postgresql
sudo systemctl restart postgresql.service

# on maas2 run the db backup
sudo systemctl stop postgresql
sudo mv /var/lib/postgresql/9.5/main /var/lib/postgresql/9.5/main.old
sudo -u postgres pg_basebackup -h 192.168.100.3 -D /var/lib/postgresql/9.5/main -U repuser -v -P --xlog-method=stream

sudo cp /usr/share/postgresql/9.5/recovery.conf.sample /var/lib/postgresql/9.5/main/recovery.conf
sudo bash -c 'cat >> /var/lib/postgresql/9.5/main/recovery.conf' << EOF
standby_mode = on
primary_conninfo = 'host=192.168.100.3 port=5432 user=repuser password=password'
EOF

# on maas2 configure the region to point at the main database
sudo systemctl stop maas-regiond
sudo rm /var/lib/maas/{maas_id,secret}
sudo bash -c 'cat > /etc/maas/regiond.conf' << EOF
database_host: 192.168.100.3
database_name: maasdb
database_pass: FiPQSKlBpAvs
database_port: 5432
database_user: maas
maas_url: http://192.168.99.4:5240/MAAS
EOF
sudo chown root:maas /etc/maas/regiond.conf
sudo chmod 640 /etc/maas/regiond.conf
sudo systemctl restart maas-regiond

# on maas2 start postgres, now you should see the second region controller in the maas gui
sudo systemctl restart postgresql
sudo tail -f /var/log/postgresql/postgresql-9.5-main.log
#2019-06-11 12:38:26 JST [29904-4] LOG:  consistent recovery state reached at 0/230000F8
#2019-06-11 12:38:26 JST [29903-1] LOG:  database system is ready to accept read only connections
#2019-06-11 12:38:26 JST [29908-1] LOG:  started streaming WAL from primary at 0/24000000 on timeline 1
sudo -u postgres psql maasdb -c 'SELECT hostname,status,power_state FROM maasserver_node'

# we don't enable maas-dns in this test, on both maas and maas2 servers, clean up the bind9 conflicts:
sudo maas-region edit_named_options --migrate-conflicting-options
sudo systemctl restart bind9
# and also modify /etc/resolv.conf to use maas dns

# setup HAproxy for load balancing on both maas and maas2 servers
# then refresh the gui you will see the region jump from one server to the other back and forth.
sudo systemctl stop apache2
sudo systemctl disable apache2
sudo apt install haproxy -y
sudo bash -c 'cat >> /etc/haproxy/haproxy.cfg' << EOF
frontend maas
    bind    *:80
    retries 3
    option  redispatch
    option  http-server-close
    default_backend maas
backend maas
    timeout server 30s
    balance roundrobin
    server localhost localhost:5240 check
    server maas 192.168.99.3:5240 check
    server maas2 192.168.99.4:5240 check
EOF
sudo systemctl restart haproxy

# setup keepalived (vip) on both maas and maas2, NOTE: fce uses pacemaker+corosync instead
sudo apt install keepalived -y
sudo modprobe ip_vs
sudo sh -c 'echo modprobe ip_vs >> /etc/modules'
sudo bash -c 'cat > /etc/sysctl.d/60-keepalived-nonlocal.conf' << EOF
net.ipv4.ip_nonlocal_bind=1
EOF
sudo systemctl restart procps
sudo bash -c 'cat > /etc/keepalived/keepalived.conf' << EOF
# https://docs.maas.io/2.3/en/manage-ha
# Un-comment next 4 lines if using haproxy
vrrp_script chk_haproxy {
    script "killall -0 haproxy"
    interval 2
}
# Un-comment next 4 lines if using apache2
vrrp_script chk_apache2 {
    script "killall -0 apache2"
    interval 2
}
vrrp_script chk_named {
    script "killall -0 named"
    interval 2
}
vrrp_instance maas_region {
    state MASTER
    interface ens8
    priority 150
    virtual_router_id 51
    authentication {
        auth_type PASS
        auth_pass password
    }
    track_script {
        ### Un-comment next line if using haproxy
        chk_haproxy
        ### Un-comment next line if using apache2
        #chk_apache2
        chk_named
    }
    virtual_ipaddress {
        192.168.99.5
    }
}
EOF
sudo systemctl restart keepalived

# adjust an API server to use VIP=192.168.99.5  (http://192.168.99.5/MAAS)
sudo maas-region local_config_set --maas-url http://192.168.99.5/MAAS
sudo systemctl restart maas-regiond

# install rack on maas and maas2, and adjust api server to use VIP=192.168.99.5
sudo apt install -y maas-rack-controller
sudo maas-rack register --url http://192.168.99.5/MAAS --secret $(sudo cat /var/lib/maas/secret)
#sudo maas-rack config --region-url http://192.168.99.5/MAAS
#sudo systemctl restart maas-rackd.service
maas login admin http://192.168.99.5/MAAS/api/2.0 $(sudo maas-region apikey --username admin)
maas admin rack-controllers read | grep hostname | cut -d '"' -f 4

# actually we didn't enable the maas-dns service in this test
sudo systemctl list-unit-files --type=service | grep maas-dhcp
ubuntu@maas:~$ cat /etc/resolv.conf 
nameserver 192.168.99.1
#nameserver 192.168.100.3
#search maas
ubuntu@maas2:~$ cat /etc/resolv.conf 
nameserver 192.168.99.1
#nameserver 192.168.100.4
#search maas

# the following error is because I accidentally created recovery.conf on the master server instead of the standby.
2019-06-11 15:25:05 JST [15474-1] maas@maasdb ERROR:  cannot execute INSERT in a read-only transaction
2019-06-11 15:31:44 JST [18618-1] maas@maasdb ERROR:  cannot execute LISTEN during recovery

# present problem
1, can't add the second rack-controller
2, there is maas-dns and maas-dns ha
3, lots of errors - django.db.utils.InternalError: cannot execute UPDATE in a read-only transaction

sudo maas-region dbshell --installed
maasdb=# \x
maasdb=# select system_id, hostname from maasserver_node;
maasdb=# \?
maasdb=# \l
maasdb=# \c maasdb
maasdb=# \dtq

Appendix - Postgres HA replication modes

Postgres HA supports three replication modes:

  1. Log-shipping replication: the master sends a WAL segment to the slave only after it has finished writing it, so if the master crashes before a segment is complete, every transaction in the unsent segment is lost.
  2. Asynchronous streaming replication: the master streams WAL to the slave without waiting for a whole segment to fill up, which greatly reduces the risk of losing data. However, if the master crashes right after committing a transaction while the standby is still waiting for the stream, the last transaction can still be lost. The standby can also be configured as a hot standby to serve read-only queries and share load.
  3. Synchronous streaming replication (synchronous replication): the synchronous version of streaming replication. After a commit is issued on the master, the command blocks until the WAL stream has been committed on every database listed as a synchronous node (synchronous_standby_names), so data is only lost if the master and the standby go down at the same time. In nested transactions only the top-level transaction gets this protection, not sub-transactions; pure reads and rollbacks are unaffected. The standby can also run as a hot standby to serve read-only queries and share load. The performance cost depends on the network and on how busy the system is: the worse the network and the busier the system, the higher the cost. A minimal config sketch follows.
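
A minimal synchronous-replication sketch (assuming PostgreSQL 9.5 as in the HA appendix above; the standby's application_name 'maas2' is an example):

# on the master, postgresql.conf
synchronous_commit = on
synchronous_standby_names = 'maas2'
# on the standby, recovery.conf (postgresql.conf plus standby.signal on PostgreSQL 12+)
standby_mode = on
primary_conninfo = 'host=192.168.100.3 port=5432 user=repuser password=password application_name=maas2'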

Debug MaaS

#read maas db
grep -r 'database' /etc/maas/regiond.conf
#sudo maas-region dbshell --installed
sudo -iu postgres psql -d template1 -U postgres

# debug maas - https://github.com/maas/maas/blob/master/HACKING.rst
systemctl stop maas-regiond
# disable any log as much as possible
/usr/bin/python3 /usr/sbin/regiond --debug --workers 1

systemctl stop maas-rackd
setcap cap_net_bind_service=+eip /usr/sbin/rackd
/bin/rm -f /var/lib/maas/dhcpd.sock && /bin/rm -f /var/lib/maas/dhcpd.conf && /bin/rm -f /var/lib/maas/dhcpd6.conf && sleep 1 && LOGFILE=/var/log/maas/rackd.log prometheus_multiproc_dir=/var/lib/maas/prometheus /usr/bin/python3 /usr/sbin/rackd --nodaemon

enable debug log - https://discourse.maas.io/t/running-installed-maas-in-debug-logging-mode/168

Update 20200907 - maas network_discovery

sudo add-apt-repository ppa:maas/2.7  #2.7 only worked on bionic
sudo apt install -y maas
sudo maas init --admin-username admin --admin-password password --admin-email admin@example.com --admin-ssh-import zhhuabj
sudo maas-region apikey --username=admin > ~/admin-api-key
curl http://192.168.2.111:5240/MAAS
apikey=$(cat ~/admin-api-key)
maas login admin http://192.168.2.111:5240/MAAS $apikey
maas login admin http://127.0.0.1:5240/MAAS $apikey
maas admin boot-resources import

$ maas admin discoveries read |jq ".[] | {ip:.ip, mac_address:.mac_address}" --compact-output
{"ip":"192.168.100.1","mac_address":"52:54:00:14:3a:d4"}
{"ip":"192.168.100.77","mac_address":"52:54:00:f8:9b:2b"}
maas admin discoveries clear-by-mac-and-ip ip=<IP> mac=<MAC>

#delete from maasserver_interface_ip_addresses where staticipaddress_id=64548 and id=64188;
#delete from maasserver_staticipaddress where id=64548 and ip='192.168.128.15';

#http://127.0.0.1:5240/MAAS/r/settings/network/network-discovery (Network discovery -> Active subnet mapping interval)
maas admin  maas get-config name=network_discovery
maas admin discoveries clear all=True
maas admin maas set-config name=network_discovery value=disabled
maas admin maas get-config name=network_discovery
maas admin discoveries read

Update 20200915 - using CentOS 8 with MAAS 2.8

The main obstacle to using CentOS in MAAS is https://code.launchpad.net/~ltrager/curtin/+git/curtin/+merge/374335, but that code has already landed in UA; it is just that the MAAS 2.8 UI does not let you pick CentOS 8 yet. CentOS 8 can still be used as follows.
The limitations are:

  • the MAAS UI does not support uploading a CentOS 8 image (because http://images.maas.io/ephemeral-v3/daily/streams/v1/com.ubuntu.maas:daily:centos-bases-download.json does not exist yet); only the API can (via curtin)
  • maas-image-builder cannot build a CentOS 8 image; packer-maas apparently can, or you can build one yourself (https://medium.com/@kemnitz.stefan/centos-8-via-maas-9ffcc6c7a22d)
sudo add-apt-repository ppa:maas/2.8
sudo apt update
sudo apt install maas -y
sudo maas init --admin-username admin --admin-password password --admin-email admin@example.com --admin-ssh-import zhhuabj
sudo maas-region apikey --username=admin > ~/admin-api-key
apikey=$(cat ~/admin-api-key)
maas login admin http://127.0.0.1:5240/MAAS $apikey

# Once Curtin has been upgraded the image can be uploaded to MAAS via the API
#https://code.launchpad.net/~ltrager/curtin/+git/curtin/+merge/374335
sudo apt install curtin=20.1-2-g42a9667f-0ubuntu1~18.04.1 -y
sudo systemctl stop maas-rackd && sudo systemctl restart maas-regiond && sudo systemctl start maas-rackd
axel https://people.canonical.com/~zhhuabj/centos8.tar.gz
maas admin boot-resources create name='centos 8' title='centos 8' architecture='amd64/generic' filetype='tgz' content@=centos8-1.0.2-7-gac521bb.tar.gz
But maas-image-builder does not seem to support building CentOS 8 images yet; it can be done by hand as described here - https://medium.com/@kemnitz.stefan/centos-8-via-maas-9ffcc6c7a22d
There is also a tool called packer-maas that can apparently do it, see - https://manintheit.org/bash/creating-a-image-for-maas-with-packer/
$ sudo maas-image-builder -o centos8-amd64-root-tgz --arch amd64 centos --edition 8
...
mib.builders.BuildError: Unknown CentOS edition: 8.

$ grep -r 'grub2' /usr/lib/curtin/helpers/common |grep 8
               7|8) grub_name="grub2-pc";;
                        7|8) grubcmd="grub2-install"
Build a CentOS 8 image with packer-maas as follows:
# https://github.com/canonical/packer-maas/tree/master/centos8
sudo apt install packer
# fix bug - https://github.com/canonical/packer-maas/issues/2
wget https://releases.hashicorp.com/packer/1.6.2/packer_1.6.2_linux_amd64.zip
unzip packer_1.6.2_linux_amd64.zip && sudo cp ./packer /usr/bin/ && packer --version
git clone https://github.com/canonical/packer-maas.git
cd packer-maas/centos8  #must be in the centos8 subdir
#change url in centos8.json to http://mirrors.aliyun.com/centos/8.2.2004/isos/x86_64/CentOS-8.2.2004-x86_64-boot.iso
sudo PACKER_LOG=1 packer build centos8.json  #but will hit this bug - https://github.com/canonical/packer-maas/issues/2
maas admin boot-resources create name='centos/8-custom' title='CentOS 8 Custom' architecture='amd64/generic' filetype='tgz' content@=centos8.tar.gz  #default username is cloud-user

You can then set the default image used for Deploy (commissioning can only use the Ubuntu image), or pick the CentOS image at deploy time — commissioning itself always seems to use Ubuntu. The deploy log is at: /var/log/maas/rsyslog/test/2020-09-25/messages
Finally the deployed CentOS 8 machine can be reached with 'ssh cloud-user@192.168.100.2'. If the machine was entered via rescue mode rather than deployed, use "ssh ubuntu@192.168.100.2 -v"; note that rescue mode is only commissioning, not deploy, so an image cannot be chosen there either.
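Deploying with the custom image from the CLI would look roughly like this — a sketch only, following the deploy pattern used later in this post; the osystem/distro_series values must match how the boot resource was named (verify with 'maas admin boot-resources read'), and the hostname 'test' is an example:

machine_id=$(maas admin machines read | jq -r '.[] | select(.hostname=="test").system_id')
maas admin machine deploy $machine_id osystem=centos distro_series=8-custom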

[cloud-user@test ~]$ uname -a
Linux test.maas 4.18.0-193.14.2.el8_2.x86_64 #1 SMP Sun Jul 26 03:54:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

However this image does not seem to work in a UEFI VM; the steps to create a test UEFI VM are:

sudo apt install ovmf -y  #should install ovmf in node1 rather than t440p
sudo systemctl restart libvirtd
virt-manager --debug      #run it in t440p, then connect to node1
Start creating a new VM in virt-manager, but before finishing, click "Customize configuration before install"
Change the Firmware option from BIOS to UEFI in the 'Overview' tab. (If it's not available, do a systemctl restart libvirtd)

debug curtin

https://gist.github.com/smoser/2610e9b78b8d7b54319675d9e3986a1b
Check the log /var/log/maas/rsyslog/test2/2020-09-28/messages on MAAS; the error seen is:

curthooks -> builtin_curthooks -> setup_grub -> install_grub(instdevs, target, uefi=uefi_bootable, grubcfg=grubcfg) -> 

unshare --fork --pid -- chroot /tmp/tmp6yj0ryyc/target grub2-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=centos --recheck --no-nvram

Stderr: grub2-install: error: /usr/lib/grub/x86_64-efi/modinfo.sh doesn't exist.

/usr/lib/grub/x86_64-efi/modinfo.sh does not exist. This page https://bugzilla.redhat.com/show_bug.cgi?id=1101352 says:
Looks like this is intentional to avoid the fallout of unsuspecting users running grub2-install. To regain ability to grub2-install on EFI, install package grub2-efi-modules.
So the grub2-efi-modules package needs to be installed. The following checks the image and confirms that the package is indeed missing:

#sudo guestmount -a /var/lib/libvirt/images/test2.qcow2  -i --rw ./root2
sudo modprobe nbd
sudo qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/test2.qcow2 -f qcow2
sudo fdisk /dev/nbd0 -l
sudo qemu-nbd --disconnect /dev/nbd0
$ sudo fdisk /dev/nbd0 -l |grep dev
Disk /dev/nbd0: 20 GiB, 21474836480 bytes, 41943040 sectors
/dev/nbd0p1    2048  1050623  1048576  512M EFI System
/dev/nbd0p2 1050624 41928703 40878080 19.5G Linux filesystem

cd /tmp && mkdir {boot,root}
sudo mount /dev/nbd0p1 ./boot/
sudo mount /dev/nbd0p2 ./root/

$ ls ./root/usr/lib/grub/
i386-pc

$ sudo dpkg -L grub-efi-amd64-bin |grep modinfo.sh
/usr/lib/grub/x86_64-efi/modinfo.sh
$ sudo apt-file search modinfo.sh |grep amd64
grub-efi-amd64-bin: /usr/lib/grub/x86_64-efi/modinfo.sh

I first tried customizing the package list with kickstart, but it did not work: https://manintheit.org/bash/creating-a-image-for-maas-with-packer/

sudo PACKER_LOG=1 HTTPIP=127.0.0.1 HTTPPort=8000 packer build centos8.json

Then repack the image as follows, adding that package into it:

mkdir /tmp/centos8 && sudo tar -xf centos8.tar.gz -C /tmp/centos8/
MOUNTDIR=/tmp/centos8
for d in dev sys proc; do sudo mount --bind /$d ${MOUNTDIR}/$d; done
sudo mv ${MOUNTDIR}/etc/resolv.conf ${MOUNTDIR}/etc/resolv.conf.bak && sudo cp /etc/resolv.conf ${MOUNTDIR}/etc/


#sudo chroot $MOUNTDIR bash
sudo chroot $MOUNTDIR yum update
#install grub2-efi-x64-modules instead of grub2-efi-modules to avoid grub2-efi-aa64-modules
sudo chroot $MOUNTDIR yum install grub2-efi-x64-modules -y
sudo chroot $MOUNTDIR ls /usr/lib/grub/x86_64-efi/modinfo.sh
sudo chroot $MOUNTDIR yum list --installed |grep -E 'shim|efi'

sudo umount $MOUNTDIR/{proc,dev,sys,}
sudo mv $MOUNTDIR/etc/resolv.conf.bak $MOUNTDIR/etc/resolv.conf
sudo tar -czf centos8.tar.gz -C $MOUNTDIR .

Now the previous error is gone, but a new one appears:

Command: ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmphc1fr7hk/target', 'dracut', '-f', '/boot/initramfs-4.18.0-193.14.2.el8_2.x86_644.18.0-193.19.1.el8_2.x86_64.img', '4.18.0-193.14.2.el8_2.x86_644.18.0-193.19.1.el8_2.x86_64']        Exit code: 1        Reason: -        Stdout: ''        Stderr: dracut: Cannot find module directory /lib/modules/4.18.0-193.14.2.el8_2.x86_644.18.0-193.19.1.el8_2.x86_64/                dracut: and --no-kernel was not specified

That is because CentOS was just upgraded inside the chroot (it should not have been), leaving two kernels:

bash-4.4# rpm -q kernel
kernel-4.18.0-193.14.2.el8_2.x86_64
kernel-4.18.0-193.19.1.el8_2.x86_64

bash-4.4# rpm -q --queryformat %{VERSION}-%{RELEASE}.%{ARCH} kernel
4.18.0-193.14.2.el8_2.x86_644.18.0-193.19.1.el8_2.x86_64

dracut -f /boot/initramfs-4.18.0-193.14.2.el8_2.x86_644.18.0-193.19.1.el8_2.x86_64.img 4.18.0-193.14.2.el8_2.x86_644.18.0-193.19.1.el8_2.x86_64

After fixing this I ran into image caching; once past that, these problems were gone, but the system always booted into emergency mode because initrd-switch-root.service failed to start — it could not find the disk. It was finally solved by modifying packer-maas/centos8/http/centos8.ks:

#grub2-efi-x64
efibootmgr
#shim-x64
grub2-efi-x64-modules

The earlier packer-maas test suggested that installing the unsigned grub2-efi-x64-modules made it impossible to also install the signed grub2-efi-x64 and shim-x64, yet in the following experiment they can be installed together:

#https://www.cnblogs.com/ricksteves/p/11623681.html
yum install grub2-efi-x64 shim-x64 grub2-efi-x64-modules -y  #bug - https://bugzilla.redhat.com/show_bug.cgi?id=1201220
grub2-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=centos --recheck --no-nvram
grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg  #soft link to /etc/grub2-efi.cfg

Recovering a MAAS HA cluster

Assume my test env has 3 nodes (31, 32, 33); here are my steps:

1, stop all db and maas service in 31 and 32 and 33
systemctl stop corosync && systemctl stop pacemaker
systemctl stop maas-*

2, start db in 31
rm -rf /var/lib/pgsql/tmp/PGSQL.lock > /dev/null 2>&1 && systemctl restart corosync && systemctl restart pacemaker

3, start db in 32
rm -rf /var/lib/pgsql/tmp/PGSQL.lock > /dev/null 2>&1 && systemctl restart corosync && systemctl restart pacemaker

At this time, 31 will hold db_vip and maas_vip (check with 'crm status' and 'ip addr show').

4, start maas services in 31
systemctl restart maas-regiond && systemctl restart maas-rackd && systemctl restart maas-proxy && systemctl restart maas-dhcpd6 && systemctl restart maas-dhcpd

At this time, verified the maas ui with 'lynx http://10.5.0.15:5240/MAAS'; it works.

5, start maas services in 32
systemctl restart maas-regiond && systemctl restart maas-rackd && systemctl restart maas-proxy && systemctl restart maas-dhcpd6 && systemctl restart maas-dhcpd

At this time, verified the maas ui; it works.

6, start maas services 33 as well
rm -rf /var/lib/pgsql/tmp/PGSQL.lock > /dev/null 2>&1 && systemctl restart corosync && systemctl restart pacemaker
systemctl restart maas-regiond && systemctl restart maas-rackd && systemctl restart maas-proxy && systemctl restart maas-dhcpd6 && systemctl restart maas-dhcpd

At this time, verified the maas ui; it works.

Update 20220307
The method above works when the DBs on all 3 nodes are healthy — at least restarting corosync brings the db vip back. But upgrading in a different order, as below, can leave the db vip unable to return just by restarting corosync (the key to recovering a maas cluster is to recover the db cluster first).
For example, with 3 maas nodes (maas111, maas222, maas333) and maas-region-controller on maas111, the upgrade order that feels correct is:

  • first upgrade the node with maas-region-controller, here maas111; suppose the db vip then moves to maas222, and the db migration is confirmed successful
  • then upgrade the node currently holding the db vip (maas222 in the assumption above), so the db vip can presumably move back to maas111
  • finally upgrade maas333; it can at least reach the db vip, and as long as the db vip is reachable the maas cluster is easy to recover

The actual situation was that maas222 and maas333 were upgraded to bionic first while maas111 stayed on xenial. Because maas-region-controller is on maas111, maas222 and maas333 hit db migration errors during the upgrade; now both the db vip and the maas vip are on maas111. Upgrading maas111 directly will also fail (there is no maas vip left anywhere, and maas-region-controller on maas111 will also fail during its db migration).

#stop the maas and postgresql services on maas222 and maas333
systemctl stop maas-* postgresql
#on any node (maas222 or maas333), stop the pgsql RA
sudo crm resource stop ms_pgsql

#highest number and most recent timestamp wins (master)
sudo ls -la  /var/lib/postgresql/10/main/pg_wal
#maas111 is really the master, but maas222/maas333 have already split from maas111; the customer does not want maas111 to go down, so maas222 is chosen as master (both end at 03C — will there be data loss?)
#in fact 'sudo crm resource stop ms_pgsql' also stops postgres on maas111, so
#on maas111
-rw-------  1 postgres postgres 16777216 Mar  3 09:57 00000001000000000000003B
-rw-------  1 postgres postgres 16777216 Mar  3 11:08 00000001000000000000003C
#on maas222
-rw-------  1 postgres postgres 16777216 Mar  3 09:47 00000001000000000000003B
-rw-------  1 postgres postgres 16777216 Mar  3 09:07 00000001000000000000003C
#on maas333
-rw-------  1 postgres postgres 16777216 Mar  3 08:23 00000001000000000000003B
#on the slaves (maas333)
cd /var/lib/postgresql/10/
mv main main.bak.`date +'%Y-%m-%d'`
#on the master (maas222)
cd /var/lib/postgresql/10/
cp -pr main main.bak.`date +'%Y-%m-%d'`

#on the master
1, restart postgresql
   sudo systemctl start postgresql@10-main.service
   sudo systemctl status postgresql@10-main.service
2, Ensure the VIP is present
   sudo crm config show res_pgsql_vip  #db vip is 10.5.150.115 here
   sudo ip a add 10.5.150.115 dev ens3
3, Promote the node to master:
   sudo -iu postgres
   export PGDATA=/var/lib/postgresql/10/main
   /usr/lib/postgresql/10/bin/pg_ctl promote

#on the slave
sudo -u postgres pg_basebackup -X stream -h 10.5.150.115 -U postgres -D /var/lib/postgresql/10/main/ --progress --verbose -c fast

#on any maas node, run:
sudo crm resource start ms_pgsql
sudo crm resource cleanup pgsql

Node List:
  * Online: [ maas222 maas333 ]
  * OFFLINE: [ maas111 ]
Full List of Resources:
  * Clone Set: ms_pgsql [pgsql] (promotable):
    * pgsql     (ocf::heartbeat:pgsql):  Master maas222 (Monitoring)
    * Slaves: [ maas333 ]
    * Stopped: [ maas111 ]
  * res_pgsql_vip       (ocf::heartbeat:IPaddr2):        Started maas222
  * res_maas_vip        (ocf::heartbeat:IPaddr2):        Started maas222

#restart maas on maas222 maas333
sudo systemctl start maas-regiond.service && sudo systemctl start maas-rackd.service
sudo systemctl status maas-regiond.service && sudo systemctl status maas-rackd.service

#Earlier, maas111 failed while upgrading from xenial to bionic with the commands below, because maas-region-controller needs to reach the db vip for the db migration.
juju upgrade-series 0 prepare bionic
juju ssh 0 -- sudo do-release-upgrade -f DistUpgradeViewNonInteractive
juju upgrade-series 0 complete
#Now that the db vip has been restored on maas222, can 'apt --fix-broken install' simply be run? The answer is No, because maas111 used to hold the db vip and
maas connects to localhost:5432 (https://github.com/maas/maas/blob/2.4.2/debian/maas-region-controller.postinst#L113).

On a debian-packaged MAAS, database-host is changed like this:
maas-region local_config_set --database-host=10.5.150.115
maas-region local_config_get --database-host
On a snap MAAS it can be changed as follows, because /snap/maas/8724/bin/maas-region calls /snap/maas/8724/lib/python3.6/site-packages/maasserver/region_script.py to set up the snap environment (DJANGO_SETTINGS_MODULE) for django:
snap run --shell maas
/snap/maas/8724/bin/maas-region local_config_set --database-host=10.5.150.115
/snap/maas/8724/bin/maas-region local_config_get --database-host
But on bionic even the settings above are not enough, because bionic uses neither the deb nor the snap directly; by default the snap is wrapped by the snap-transition deb package. snap-transition/debian/maas.preinst first removes the maas snap and then reinstalls it, so the database-host set above disappears again in the freshly installed snap MAAS. snap-transition would probably have to be changed to run the command above right after reinstalling the maas snap, but that is not easy either, because the command does not seem to work on a single line:
# snap run --shell maas./snap/maas/8724/bin/maas-region local_config_get --database-host
error: cannot find app "/snap/maas/8724/bin/maas-region" in "maas"

sudo rm -rf /var/lib/dpkg/info/maas-rack-controller.postrm
sudo apt remove maas-rack-controller --purge

20211206 - debug maas with ipmi

The machine life cycle in MAAS - https://maas.io/docs/snap/3.1/ui/about-machines#heading–about-the-machine-life-cycle
1, Enlist phase - after the machine PXE-boots and passes the commissioning scripts (eg: 20-maas-01-install-lldpd), it becomes New. At this point a machine defined by its MAC appears in the Machines tab. During this phase PXE does DHCP and downloads pxelinux.0, pxelinux.cfg, vmlinuz, initrd.img and so on over TFTP; cloud-init then runs the commissioning scripts (also called enlistment scripts), which collect key node information (architecture, MAC, ...) and send it to the region server to be stored in the DB — collecting this information is MAAS auto-discovery.
2, Commission phase - once a New machine (known only by its MAC) has its power settings (eg: IPMI) and network settings (eg: attaching a subnet to a NIC) configured, it can be commissioned, moving from Commissioning to Commissioned; machines defined manually up front (e.g. with IPMI specified) are commissioned automatically. Commissioning discovers the machine's network topology, connects the machine's network interfaces to the fabric, vlan and subnet configuration, and assigns a static IP. During this phase initrd loads the squashfs rootfs (over HTTP), and cloud-init runs the commission scripts (which talk to the region server to confirm the node follows instructions, making sure the later deploy will succeed).
3, Deploy phase - a successfully commissioned machine can be acquired or deployed; a machine that fails commissioning is marked broken. Before deploying, confirm the ubuntu kernel / kernel boot options / ssh keys etc. During deploy, initrd loads the squashfs and cloud-init runs the curtin installation script.
4, A deployed machine can be released. The corresponding CLI actions are sketched below.
Create three VMs on the bare-metal host node1, all on a network named cloud (192.168.100.0/24, without dhcp):

  • maasdev=192.168.100.3, with the IP 192.168.100.3 configured via netplan
  • two PXE-boot VMs (maastestnode3, maastestnode4); the key points for a PXE VM are an empty disk plus PXE boot settings, then letting each PXE-boot once so that MAAS creates two machines in the New state from their MACs (one problem remains at this point: they are not yet associated with IPMI)
  • install IPMI on node1
  • register maastestnode3 with IPMI
 /opt/vbmc/bin/vbmc add maastestnode --port 6003 --address 192.168.100.1 --username admin --password password
/opt/vbmc/bin/vbmc list
ipmitool -I lanplus -H 192.168.100.1 -U admin -P password -p 6003 power status

Step 1: configure maasdev with IP=192.168.100.3 via netplan, and point its nameservers at it as well:

cat <<EOF | sudo tee /etc/netplan/01-netcfg.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    enp1s0:
      dhcp4: no
      addresses:
      - 192.168.100.3/24
      gateway4: 192.168.100.1
      nameservers:
        addresses:
        - 192.168.100.3
EOF
sudo netplan apply

Step 2: install MAAS on maasdev. Originally I wanted to build a snap from source and install that, but a build error got in the way, so in the end I installed from the deb package.

# on maasdev - https://github.com/maas/maas/blob/master/HACKING.rst
git clone https://git.launchpad.net/maas && cd maas
git checkout -b 3.0.0 3.0.0
#sudo apt install make -y
#make install-dependencies #Postgres, isc-dhcp, bind9 etc
#ls src/maasserver/djangosettings/development.py
#make && make syncdb && ls db/ && make sampledata

# using snap instead
# it's in a plain directory instead of in a squashfs image, so you can modify the source code
make clean
#make snap-prime
sudo snap try build/dev-snap/prime
utilities/connect-snap-interfaces
sudo maas init
make sync-dev-snap  #modify the source code
sudo service snap.maas.supervisor restart
# but it has the following error
cp: cannot stat 'src/production-html-snap/*': No such file or directory

# so we use debian instead - https://launchpad.net/~maas
sudo add-apt-repository ppa:maas/3.0
sudo apt update
sudo apt install maas -y

sudo maas createadmin
apikey=$(sudo maas apikey --username admin)
maas login admin http://192.168.100.3:5240/MAAS $apikey
maas admin boot-resources import

#maas-regiond, maas-rackd, maas-dhcpd, maas-proxy, maas-http, maas-syslog
sudo systemctl status maas-*
#need to configure dhcp for subnet for maas-dhcpd error - ConditionPathExists=/var/lib/maas/dhcpd-interfaces was not met
ssh node1 -X
#then access http://192.168.100.3:5240/MAAS to enable dhcp for 192.168.100.0/24 on 'subnet' TAB.
firefox &

Step 3: create the two PXE VMs (maastestnode3, maastestnode4). Each needs an empty disk (truncate --size 10G /images/kvm/maastestnode3.raw) and must be set to boot from PXE (in virt-manager create a new VM named maastestnode3, set PXE boot, use the empty disk created with truncate, and define one NIC on cloud). Finally let it PXE-boot once so that MAAS generates a machine in the New state from its MAC. After shutting the VM down, remember to set the boot mode back to PXE.

ubuntu@maasdev:~$ maas admin machines read | jq '.[] | {hostname:.hostname,system_id: .system_id,status:.status_name,ip_addresses: .ip_addresses, node_type_name:.node_type_name, testing_status:.testing_status, commissioning_status:.commissioning_status}' --compact-output
{"hostname":"upward-tiger","system_id":"gyyapc","status":"New","ip_addresses":[],"node_type_name":"Machine","testing_status":-1,"commissioning_status":2}
{"hostname":"bold-parrot","system_id":"gr64df","status":"New","ip_addresses":[],"node_type_name":"Machine","testing_status":-1,"commissioning_status":2}

Step 4: set up IPMI on node1

sudo -i
apt install python3-pip python3-dev gcc libvirt-dev ipmitool python3-virtualenv -y
python3 -m virtualenv --system-site-packages --download /opt/vbmc
/opt/vbmc/bin/pip install virtualbmc
cat << EOF | sudo tee -a /etc/systemd/system/vbmcd.service
[Install]
WantedBy = multi-user.target
[Service]
BlockIOAccounting = True
CPUAccounting = True
ExecReload = /bin/kill -HUP \$MAINPID
ExecStart = /opt/vbmc/bin/vbmcd --foreground
Group = root
MemoryAccounting = True
PrivateDevices = False
PrivateNetwork = False
PrivateTmp = False
PrivateUsers = False
Restart = on-failure
RestartSec = 2
Slice = vbmc.slice
TasksAccounting = True
TimeoutSec = 120
Type = simple
User = root
[Unit]
After = libvirtd.service
After = syslog.target
After = network.target
Description = vbmc service
EOF
systemctl enable vbmcd
systemctl restart vbmcd
virsh list --all
/opt/vbmc/bin/vbmc add maastestnode3 --port 6003 --address 192.168.100.1 --username admin --password password
/opt/vbmc/bin/vbmc add maastestnode4 --port 6004 --address 192.168.100.1 --username admin --password password
/opt/vbmc/bin/vbmc list
/opt/vbmc/bin/vbmc start maastestnode3
/opt/vbmc/bin/vbmc start maastestnode4
/opt/vbmc/bin/vbmc show maastestnode |grep running
ipmitool -I lanplus -H 192.168.100.1 -U admin -P password -p 6003 power status
ipmitool -I lanplus -H 192.168.100.1 -U admin -P password -p 6004 power status
exit

Step 5: this problem seems unrelated to the IPMI step above. Here I debug with rpdb and logging.

# debug API
sudo sed -i 's/DEBUG = False/DEBUG = True/g' /usr/lib/python3/dist-packages/maasserver/djangosettings/settings.py
vim /usr/lib/python3/dist-packages/maasserver/api/machines.py#AnonMachinesHandler#create
import rpdb;rpdb.set_trace()
sudo systemctl stop maas-regiond
#/usr/bin/python3 /usr/sbin/regiond --debug --workers 1
sudo -u maas -H DJANGO_SETTINGS_MODULE=maasserver.djangosettings.settings /usr/bin/python3 /usr/sbin/regiond --debug --workers 1
sudo pip3 install rpdb
nc 127.0.0.1 4444

maaslog.info("zhhuabj: power_type: %s  request.data: %s", power_type, str(request.data))

debug snap maas

snap download maas --channel=3.0/stable
unsquashfs ./maas_*.snap
sudo snap try ./squashfs-root/ --devmode
vim squashfs-root/lib/python3.8/site-packages/maasserver/api/machines.py
ls /snap/maas/x1/lib/python3.8/site-packages/maasserver/api/machines.py

debug maas db code

This program runs fine in a non-snap environment.

#!/usr/bin/env python                                                           
# coding=utf-8                                                                  

import os, sys, django
#sudo -u maas -H DJANGO_SETTINGS_MODULE=maasserver.djangosettings.settings /usr/bin/python3 ./maasdb.py
#os.environ.setdefault("DJANGO_SETTINGS_MODULE", "maasserver.djangosettings.settings")
django.setup()

from maasserver.models import Interface                                         
from maasserver.enum import NODE_STATUS                                         
from maasserver.enum import NODE_TYPE                                           

def main(mac):                                                                     
    interfaces = Interface.objects.filter(                                       
        mac_address__in=[mac],                                          
        node__node_type=NODE_TYPE.MACHINE,                                      
        node__status__in=[                                                      
            NODE_STATUS.NEW,                                                    
            NODE_STATUS.COMMISSIONING,                                          
        ],                                                                      
    )
    interface = interfaces.first()
    print("interface: %s" % interface)                                                                                                                                                                  
    if interface is not None:                                                   
        node = interface.node.as_self()                                         
        print("node.hostname: %s" % node.hostname)                                                             
        print("node: %s" % node)                                                             
                                                                                
if __name__ == '__main__':                                                      
    if len(sys.argv) < 2:
        print("Usage for deb: sudo -u maas -H DJANGO_SETTINGS_MODULE=maasserver.djangosettings.settings /usr/bin/python3 ./maasdb.py <MAC>")
        print("Usage for snap: sudo -u maas -H DJANGO_SETTINGS_MODULE=maasserver.djangosettings.snap /usr/bin/python3 ./maasdb.py <MAC>")
    else:
        print("args: %s" % sys.argv[1])
        main(sys.argv[1])

But it does not run in the snap environment:
/snap/maas/x1/bin/python3 /tmp/test.py aa

#!/usr/bin/env python
# coding=utf-8

import sys
sys.path.append('/snap/maas/current/usr/lib/python3/dist-packages')
sys.path.append('/snap/maas/current/usr/lib/python3.8/dist-packages')
import os
os.environ.setdefault("SNAP", "/snap/maas/current")
os.environ.setdefault("SNAP_DATA", "/var/snap/maas/current")
os.environ.setdefault("SNAP_COMMON", "/var/snap/maas/common")
#os.environ.setdefault("DJANGO_SETTINGS_MODULE", "maasserver.djangosettings.settings")
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "maasserver.djangosettings.snap")
import django
django.setup()

from maasserver.models import Interface
from maasserver.enum import NODE_STATUS
from maasserver.enum import NODE_TYPE

def main(mac):
    interfaces = Interface.objects.filter(
        mac_address__in=[mac],
        node__node_type=NODE_TYPE.MACHINE,
        node__status__in=[
            NODE_STATUS.NEW,
            NODE_STATUS.COMMISSIONING,
        ],
    )
    interface = interfaces.first()
    print("interface: %s" % interface)                                                                                                                                                 
    if interface is not None:
        node = interface.node.as_self()
        print("node.hostname: %s" % node.hostname)
        print("node: %s" % node)

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("Usage for deb: sudo -u maas -H DJANGO_SETTINGS_MODULE=maasserver.djangosettings.settings /usr/bin/python3 ./maasdb.py <MAC>")
        print("Usage for snap: /snap/maas/current/bin/python3 ./maasdb.py <MAC>")
    else:
        print("args: %s" % sys.argv[1])
        main(sys.argv[1])

The error reported is:

  File "/snap/maas/x1/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)

Attempts to debug it went nowhere (the breakpoint is only reached after running sudo service snap.maas.supervisor restart; running "./squashfs-root/bin/python3 /tmp/test.py aa" never reaches the breakpoint).

sudo unsquashfs -d squashfs-root /var/lib/snapd/snaps/maas-*.snap
sudo unsquashfs -d maas-cli /var/lib/snapd/snaps/maas-cli_13.snap
sudo snap remove maas
sudo snap remove maas-cli
sudo snap try ./squashfs-root/ --devmode
sudo snap try ./maas-cli/ --devmode

# using existing maasdb
sudo /snap/bin/maas init region+rack --database-uri "postgres://maas:password@localhost/maasdb"

# it will change /snap/maas/current/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py
vim ./squashfs-root/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py#get_new_connection
import rpdb;rpdb.set_trace()
# install rpdb to squashfs-root/lib/python3.8/site-packages/rpdb/
sudo ./squashfs-root/bin/python3 -m pip install rpdb

sudo service snap.maas.supervisor restart
./squashfs-root/bin/python3 /tmp/test.py aa
nc 127.0.0.1 4444

It seems the program can only run inside the snap, yet entering the snap first (sudo snap run --shell maas) and then running it gives the same error — so a test program for the snap just cannot be written that way.
Changing it as below works; the key is to call snap_setup, which sets the DJANGO_SETTINGS_MODULE environment variable (and the MAAS_* paths), before calling django.setup():

#!/usr/bin/env python                                                           
# coding=utf-8                                                                  

import sys
sys.path.append('/snap/maas/current/usr/lib/python3/dist-packages')
sys.path.append('/snap/maas/current/usr/lib/python3.8/dist-packages')
import os
os.environ.setdefault("SNAP", "/snap/maas/current")
os.environ.setdefault("SNAP_DATA", "/var/snap/maas/current")
os.environ.setdefault("SNAP_COMMON", "/var/snap/maas/common")
#os.environ.setdefault("DJANGO_SETTINGS_MODULE", "maasserver.djangosettings.settings")
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "maasserver.djangosettings.snap")
def snap_setup():
    # just like ./src/maascli/__init__.py
    if "SNAP" in os.environ:
        os.environ.update(
            {
                "DJANGO_SETTINGS_MODULE": "maasserver.djangosettings.snap",
                "MAAS_PATH": os.environ["SNAP"],
                "MAAS_ROOT": os.environ["SNAP_DATA"],
                "MAAS_DATA": os.path.join(os.environ["SNAP_COMMON"], "maas"),
                "MAAS_REGION_CONFIG": os.path.join(
                    os.environ["SNAP_DATA"], "regiond.conf"
                ),
            }
        )

snap_setup()
import django
django.setup()

from maasserver.models import Interface                                         
from maasserver.enum import NODE_STATUS                                         
from maasserver.enum import NODE_TYPE                                           


def main(mac):
    #import rpdb;rpdb.set_trace()
    interfaces = Interface.objects.filter(                                       
        mac_address__in=[mac],                                          
        node__node_type=NODE_TYPE.MACHINE,                                      
        node__status__in=[                                                      
            NODE_STATUS.NEW,                                                    
            NODE_STATUS.COMMISSIONING,                                          
        ],                                                                      
    )
    interface = interfaces.first()
    print("interface: %s" % interface)                                                                                                                                                                  
    if interface is not None:                                                   
        node = interface.node.as_self()                                         
        print("node.hostname: %s" % node.hostname)                                                             
        print("node: %s" % node)                                                             
                                                                                
if __name__ == '__main__':                                                      
    if len(sys.argv) < 2:
        print("Usage for deb: sudo -u maas -H DJANGO_SETTINGS_MODULE=maasserver.djangosettings.settings /usr/bin/python3 ./maasdb.py <MAC>")
        print("Usage for snap: /snap/maas/current/bin/python3 ./maasdb.py <MAC>")
    else:
        print("args: %s" % sys.argv[1])
        main(sys.argv[1])

Finally, by adding logging to the snap, it turned out that the IP assigned to the IPMI was duplicated:

1, backup the file /var/snap/maas/current/regiond.conf

sudo cp /var/snap/maas/current/regiond.conf .

2, Use 'snap try' to install maas on unsquashfs filesystem

sudo unsquashfs -d squashfs-root /var/lib/snapd/snaps/maas_<SNAP_VERSION>.snap
sudo snap remove maas
sudo snap try ./squashfs-root/ --devmode

3, Init maas with the old existing DB

NOTE: pls use your own DB_USER, DB_PASSWD, and DB_NAME, they should be in /var/snap/maas/current/regiond.conf
sudo /snap/bin/maas init region+rack --database-uri "postgres://maas:password@localhost/maasdb"

4, create the file ./squashfs-root/diff - https://pastebin.ubuntu.com/p/5d5DZfDrmZ/plain/

and patch the diff by:

cd ./squashfs-root
patch -p1 < diff

5, restart the service: sudo systemctl restart snap.maas.supervisor

6, monitor the log which has the prefix 'logme'

diff --git a/lib/python3.8/site-packages/maasserver/api/machines.py b/lib/python3.8/site-packages/maasserver/api/machines.py
index ce44474a4..a6f9395e3 100644
--- a/lib/python3.8/site-packages/maasserver/api/machines.py
+++ b/lib/python3.8/site-packages/maasserver/api/machines.py
@@ -1804,6 +1804,8 @@ class AnonMachinesHandler(AnonNodesHandler):
             request.data, "commission", default=False, validator=StringBool
         )
         machine = None
+        maaslog.info("logme: power_type: %s  request.data: %s", power_type, str(request.data))
+        maaslog.info("logme: power_parameters: %s ", power_parameters)
 
         # BMC enlistment - Check if there is a pre-existing machine within MAAS
         # that has the same BMC as a known node. Currently only IPMI is
@@ -1845,6 +1847,9 @@ class AnonMachinesHandler(AnonNodesHandler):
                     ],
                 ).first()
             if interface is not None:
+                maaslog.info("logme: interface is not None")
+                maaslog.info("logme: node: %s", str(interface.node.as_self()))
+                maaslog.info("logme: node.hostname: %s", interface.node.as_self().hostname)
                 machine = self._update_new_node(
                     interface.node.as_self(),
                     architecture,
@@ -1855,7 +1860,10 @@ class AnonMachinesHandler(AnonNodesHandler):
         # If the machine isn't being enlisted by BMC or MAC create a new
         # machine object.
         if machine is None:
+            maaslog.info("logme: machine is None")
+            maaslog.info("logme: request: %s ", request)
             machine = create_machine(request, requires_arch=True)
+            maaslog.info("logme: machine after create_machine: %s ", machine)
 
             if commission:
                 # Make sure an enlisting NodeMetadata object exists if the
                
$ grep logme var/snap/maas/common/log/maas.log
2022-02-10T06:21:35.041129+00:00 phys-maas30-2 maas.api: [info] logme: power_type: ipmi request.data: <QueryDict: {'architecture': ['amd64'], 'mac_addresses': ['e4:43:4b:21:7d:04'], 'commission': ['True'], 'power_type': ['ipmi'], 'power_parameters': ['{"cipher_suite_id": "3", "k_g": "", "mac_address": "54:48:10:FC:99:1E", "power_address": "192.168.0.120", "power_boot_type": "efi", "power_driver": "LAN_2_0", "power_pass": "GcmPGcUKSnL", "power_user": "maas", "privilege_level": "ADMIN"}']}>
2022-02-10T06:21:35.041360+00:00 phys-maas30-2 maas.api: [info] logme: power_parameters: {"cipher_suite_id": "3", "k_g": "", "mac_address": "54:48:10:FC:99:1E", "power_address": "192.168.0.120", "power_boot_type": "efi", "power_driver": "LAN_2_0", "power_pass": "GcmPGcUKSnL", "power_user": "maas", "privilege_level": "ADMIN"}
2022-02-10T06:21:35.050327+00:00 phys-maas30-2 maas.api: [info] logme: machine is None
2022-02-10T06:21:35.050489+00:00 phys-maas30-2 maas.api: [info] logme: request: <WSGIRequest: POST '/MAAS/api/2.0/machines/'>
2022-02-10T06:21:35.724864+00:00 phys-maas30-2 maas.api: [info] logme: machine after create_machine: hc8r4n (quick-wasp)
2022-02-10T06:28:28.853996+00:00 phys-maas30-2 maas.api: [info] logme: power_type: ipmi request.data: <QueryDict: {'architecture': ['amd64'], 'mac_addresses': ['e4:43:4b:21:39:84'], 'commission': ['True'], 'power_type': ['ipmi'], 'power_parameters': ['{"cipher_suite_id": "3", "k_g": "", "mac_address": "4C:D9:8F:0E:FB:68", "power_address": "192.168.0.120", "power_boot_type": "efi", "power_driver": "LAN_2_0", "power_pass": "TgvaKspovMAfJ", "power_user": "maas", "privilege_level": "ADMIN"}']}>
2022-02-10T06:28:28.854088+00:00 phys-maas30-2 maas.api: [info] logme: power_parameters: {"cipher_suite_id": "3", "k_g": "", "mac_address": "4C:D9:8F:0E:FB:68", "power_address": "192.168.0.120", "power_boot_type": "efi", "power_driver": "LAN_2_0", "power_pass": "TgvaKspovMAfJ", "power_user": "maas", "privilege_level": "ADMIN"}

20220321 - Set up debian based maas ha env on xenial by hand

See: https://zhhuabj.blog.csdn.net/article/details/123642038

20220401 - lp bug 1821770

MAAS normally probes the NICs under a bridge (those are for VMs), but it sometimes mistakes something like bond0.59 for a bridge and then errors out when it tries to inspect the NICs under bond0.59:
https://bugs.launchpad.net/maas/+bug/1821770
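
A quick way to check what the kernel thinks the interface really is (a sketch; bond0.59 is the vlan interface from the bonding example above):

[ -d /sys/class/net/bond0.59/bridge ] && echo "bridge" || echo "not a bridge"
ip -d link show bond0.59 | grep -E 'vlan|bridge'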

20220829 - custom commissioning scripts

The built-in commissioning scripts currently are 00-maas-01-lshw, 00-maas-02-virtuality, 00-maas-03-install-lldpd, 00-maas-04-list-modaliases, 00-maas-05-dhcp-unconfigured-ifaces, 99-maas-01-wait-for-lldpd and 99-maas-02-capture-lldp. As an example, create a custom script named in the same style, e.g. 00-maas-01-lldp-i40e-stop (the naming format matters because it determines execution order):

#!/bin/sh
echo "Disabling LLDP agent on all ports of NICs using i40e driver..."
for cmd in /sys/kernel/debug/i40e/*/command; do echo "lldp stop" > ${cmd}; done

Register it:

maas admin commissioning-scripts create name=00-maas-01-lldp-i40e-stop content@=./00-maas-01-lldp-i40e-stop
maas admin commissioning-scripts read

Read the script's output (include_output=1; the output is base64-encoded and needs decoding — see the example after these commands):

maas admin node-script-result read $system_id current-commissioning include_output=1 |jq '.results|.[]|select(.name=="01-99-maas-02-capture-lldp")'
maas admin node-script-result read $system_id current-commissioning --help
maas admin node-script-result read $system_id current-commissioning include_output=1
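
Decoding the base64 output of one script result (a sketch; the script name is an example):

maas admin node-script-result read $system_id current-commissioning include_output=1 \
  | jq -r '.results[] | select(.name=="00-maas-01-lldp-i40e-stop").output' | base64 -d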

Another example: the serial number is currently provided by 00-maas-01-lshw, parsed from lshw output; it matches 'dmidecode -s system-serial-number' and also FRU Device ID-0 in 'ipmitool fru print'. What if you want to use FRU Device ID-1 from 'ipmitool fru print' instead?

Deploying MAAS in a production environment

This MAAS environment has 12 bare-metal nodes in total (3 of them for MAAS HA, the remaining ones as compute nodes).
The networks are laid out as follows:

vlan-mgmt       vlan=10  172.21.0.0/24  gw=172.21.0.14 
vlan-mgmt-svr1  vlan=501 172.21.0.16/28 gw=172.21.0.30    (pxe, svr1)
...
vlan-mgmt-svr7  vlan=507 172.21.0.112/28 gw=172.21.0.126   (pxe, svr2)
vlan-ipmi 20             172.16.2.0/23 
vlan-data-svr1      1001 172.21.51.0/24
...
vlan-data-svr7      1001 172.21.57.0/24

The 3 MAAS nodes are configured as follows; PXE does not need to be enabled in their BIOS:

ZXCLOUD R5300 G4X
RAM 256G
SSD 960G SSD SAS Read Intensive *2
HDD None  (the two drives are set up as RAID 1)
NVME None
Network 2x 1GbE on-board Intel I210  (one for vlan-mgmt, one for vlan-ipmi via VLAN tagging)
        PCIe NIC-ZTE NS212-Mellanox CX4-2x10G SFP+
        PCIe Intel I350T2 2x1G  (only the 3rd MAAS node has this)

# BIOS settings
BMC (IPMI) is configured with a static IP address,
“IPMI over LAN” is enabled,
“boot mode” is set as UEFI instead of BIOS,
PXE boot is disabled

The compute nodes are configured as follows; PXE must be enabled in their BIOS:

ZXCLOUD R5300 G4X
CPU Varies by machine
RAM Varies by machine
SSD 960G SSD SAS Read Intensive *2
HDD None  (the two drives are set up as RAID 1)
NVME None
Network 2x 1GbE on-board Intel I210  (one for vlan-mgmt[1-7], one for vlan-ipmi via VLAN tagging)
        PCIe NIC-ZTE NS212-Mellanox CX4-2x10G SFP+  (bond1, used for vlan-data-svr[1-7])
        PCIe Intel I350T2 2x1G
# BIOS settings
“IPMI over LAN” is enabled,
BMC (IPMI) is configured with a static IP address,
“boot mode” is set as UEFI instead of BIOS,
“Network boot (PXE boot)” is enabled,
A NIC for PXE booting is set on the top of the boot order.

Install the operating system on the 3 MAAS nodes by hand:

Language - English
Keyboard: English (US)
Network configuration
bonding - bondm with eno1 and eno2, LACP mode (maybe called 802.3ad in installer)
IP address (respectively)
Infra1: 172.21.0.1
Infra2: 172.21.0.2
Infra3: 172.21.0.3
subnet: 172.21.0.0/29
Default gateway - 172.21.0.6
DNS
10.2.222.160
10.2.222.161
Search domain - empty
Hostname  (respectively)
Infra1: bmaas-1
Infra2: bmaas-2
Infra3: bmaas-3
Username - ubuntu
Partitioning - "Use an entire disk", then edit partitions to "/" and "/boot/efi"
Proxy: leave empty
Package selection : ssh server

Continue with some basic manual configuration on the 3 MAAS nodes (NTP, console, networking, etc.):

#Configure NTP
# Run on all infra nodes
sudo sed -i 's/#NTP=/NTP=10.2.222.214/g' /etc/systemd/timesyncd.conf
sudo sed -i 's/#RootDistanceMaxSec=/RootDistanceMaxSec=15/g' /etc/systemd/timesyncd.conf 
sudo systemctl restart systemd-timesyncd
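# optionally confirm that time sync against 10.2.222.214 is working:
timedatectl timesync-status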

# Set up console access so the IPMI serial console can be used
$ sudo sed -i \
    -e "s/^\(GRUB_CMDLINE_LINUX=\).*/\
\1\"console=tty0 console=ttyS0,115200n8\"/" \
    /etc/default/grub
$ sudo update-grub
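# optionally verify the serial console afterwards over IPMI SOL (BMC address and credentials are placeholders):
ipmitool -I lanplus -H <BMC_IP> -U <IPMI_USER> -P <IPMI_PASSWORD> sol activate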

# Network configuration (note: this is only the config of the 1st MAAS node)
cat /etc/netplan/01-netcfg.yaml
network:
  bonds:
    bondm:
      addresses: [172.21.0.1/28]
      gateway4: 172.21.0.14
      nameservers:
        addresses:
        - 10.2.222.160
        - 10.2.222.161
      interfaces:
        - eno1
        - eno2
      parameters:
        lacp-rate: fast
        mode: 802.3ad
        transmit-hash-policy: layer3+4
        mii-monitor-interval: 100
  ethernets:
    eno1:
      addresses: []
      dhcp4: false
      dhcp6: false
    eno2:
      addresses: []
      dhcp4: false
      dhcp6: false
    ens4f0:
      addresses: []
      dhcp4: false
      dhcp6: false
    ens4f1:
      addresses: []
      dhcp4: false
      dhcp6: false
    ens5f0:
      addresses: []
      dhcp4: false
      dhcp6: false
    ens5f1:
      addresses: []
      dhcp4: false
      dhcp6: false
  version: 2
  vlans:
    bondm.501:
      dhcp4: false
      addresses: [172.21.0.17/28]
      id: 501
      link: bondm
    bondm.502:
      dhcp4: false
      addresses: [172.21.0.33/28]
      id: 502
      link: bondm
    bondm.503:
      dhcp4: false
      addresses: [172.21.0.49/28]
      id: 503
      link: bondm
    bondm.504:
      dhcp4: false
      addresses: [172.21.0.65/28]
      id: 504
      link: bondm
    bondm.505:
      dhcp4: false
      addresses: [172.21.0.81/28]
      id: 505
      link: bondm
    bondm.507:
      dhcp4: false
      addresses: [172.21.0.113/28]
      id: 507
      link: bondm
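
# optionally apply the config now; 'netplan try' rolls back automatically if connectivity is lost
sudo netplan try
sudo netplan apply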

# Environment variables
cat /etc/environment
PATH="/snap/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"
LC_ALL="en_US.UTF-8"
LANG="en_US.UTF-8"
LANGUAGE="en_US:en"
EDITOR="vim"

# SSH key, generated as the ubuntu user
ssh-keygen -q -N "" -f ~/.ssh/id_rsa
for i in <infra-1-IP> <infra-2-IP> <infra-3-IP>
do 
  ssh-copy-id -o StrictHostKeyChecking=no ubuntu@${i}
done
for i in 1 2 3; do ssh root@INFRA_HOSTNAME$i 'echo "ubuntu ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/99-ubuntu-deploy';done

# IPMI check against the compute nodes
sudo apt install freeipmi-tools
ipmipower -D LAN_2_0 -h IPMI_ADDRESS.[1stIP-lastIP] -u IPMI_USER -p IPMI_PASSWORD --stat
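# roughly the same check with ipmitool, one BMC at a time (addresses are placeholders):
for ip in <IPMI_IP_1> <IPMI_IP_2> <IPMI_IP_n>; do ipmitool -I lanplus -H $ip -U IPMI_USER -P IPMI_PASSWORD chassis power status; done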

# Finally, reboot all machines
sudo reboot

The remaining steps - deploying the MAAS cluster manually or with tooling - are omitted here.
Run some tests and collect some information:

sudo apt install jq iperf
for sys_id in $(maas root machines read tags=<TAG> | jq -r .[].system_id)
do
  maas root machine deploy $sys_id osystem=ubuntu distro_series=focal
done
# Install iperf and lldpd on all servers and run iperf
for ip in $(maas root machines read tags=<TAG> | jq -r '.[].interface_set[].links[] | select(.subnet.name | contains("vlan-mgmt-svr")).ip_address')
do
  ssh $ip "sudo apt-get install -y iperf lldpd; iperf -sD"
done
cd validation
for ip in $(maas root machines read tags=<TAG> | jq -r '.[].interface_set[].links[] | select(.subnet.name | contains("vlan-mgmt-svr")).ip_address')
do
  ssh $ip "echo \$HOSTNAME;lldpctl;echo " > lldp_${ip}.output
done
maas login root http://172.21.0.4/MAAS "$(sudo maas apikey --username root)" 
for ip in $(maas root machines read tags=<TAG> | jq -r '.[].interface_set[].links[] | select(.subnet.name | contains("vlan-mgmt-svr")).ip_address')
do
  iperf -c ${ip} -t30 -i5 > ${ip}-iperf.output
done

20221013 update - Testing maas-image-builder for windows on focal

#No focal channel on the ppa, so create a bionic vm first
uvt-simplestreams-libvirt sync release=bionic arch=amd64
uvt-kvm create \
    --cpu 6 \
    --memory 16384 \
    --disk 64 \
    --unsafe-caching \
    --host-passthrough \
    image-builder \
    release=bionic
uvt-kvm ssh image-builder
cat <<EOF | sudo tee /etc/apt/sources.list.d/maas-image-builder-partners_stable.list
deb https://USERNAME:PASSWORD@private-ppa.launchpadcontent.net/maas-image-builder-partners/stable/ubuntu bionic main
EOF
cat <<EOF | sudo tee /etc/apt/trusted.gpg.d/maas-image-builder-partners_stable.asc
<paste the signing public key of the private PPA here>
EOF
#It requires python 3.6, so also add the deadsnakes ppa if you are not using a bionic VM:
#apt-add-repository ppa:deadsnakes/ppa -y && sudo apt update
sudo apt update
sudo apt install maas-image-builder
#https://maas.io/docs/image-builder#bwi
# windows server 2022 - https://www.microsoft.com/en-us/evalcenter/download-windows-server-2022
wget -O SERVER_EVAL_x64FRE_en-us.iso 'https://go.microsoft.com/fwlink/p/?LinkID=2195280&clcid=0x409&culture=en-us&country=US'
sudo maas-image-builder \
    --vcpus 4 \
    --ram 8192 \
    -o windows-server-2022-amd64-root-dd \
    windows \
    --windows-iso SERVER_EVAL_x64FRE_en-us.iso \
    --windows-edition win2022 \
    --uefi
# https://bugs.launchpad.net/maas-image-builder/+bug/1992651
#Then open a VNC viewer, and type “reset” to reboot a VM in the UEFI shell. 
#After that, wait for the msg "Press any key to boot from CD-ROM..." and press any key(!). 
#Otherwise, it will skip the CD-ROM attached, and try other methods like PXE booting and eventually end up with UEFI shell.
maas login admin http://localhost:5240/MAAS "$(sudo maas apikey --username ubuntu)"  # ubuntu is the maas user
maas admin boot-resources create name=windows/win2022 architecture=amd64/generic filetype=ddtgz content@=./windows-server-2022-amd64-root-dd
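# optionally confirm the image is registered (assumes the admin profile and jq are available):
maas admin boot-resources read | jq '.[] | select(.name=="windows/win2022")'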

# Two bugs seen with the KVM images
#Apply workaround to https://bugs.launchpad.net/maas/+bug/1993836 in all maas nodes
sudo wget -O \
/var/snap/maas/common/maas/boot-resources/current/ubuntu/amd64/generic/focal/stable/squashfs https://images.maas.io/ephemeral-v3/stable/focal/amd64/20221010/squashfs
maas root maas set-config name=boot_images_auto_import value=false
#disable bond on boot interface (bondm) then configure the network on eno1
# to apply workaround to https://bugs.launchpad.net/maas/+bug/1992185

20240827 - Why MAAS was not working

MAAS can only work with a single DHCP server (maas-dhcp, 10.0.0.0/24), but the home environment has three DHCP servers (the IPv4 gateway DHCP on 192.168.99.0/24 plus IPv6 DHCP), so during commissioning a MAAS node randomly gets an IP from one of the three DHCP servers and sometimes cannot PXE boot.
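
To see which DHCP server actually answers a given request, a quick capture on the bridges the node is attached to can help (virbr1 is assumed to be the bridge of the 10.0.0.0/24 libvirt network; adjust as needed):

sudo tcpdump -eni virbr1 'udp port 67 or udp port 68'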

  • Sometimes PXE fails even when the IP is in the 10.0.0.0/24 range (also having an IPv6 address does not matter); restarting with 'sudo systemctl restart snap.maas.pebble.service' works as a workaround (since MAAS 3.5 all services are managed by pebble, so their logs can no longer be viewed individually; use: sudo journalctl -xe --no-pager -f).
  • 192.168.99.0/24 and 10.0.0.0/24 must be put in different fabrics; in MAAS, enable DHCP for 10.0.0.0/24 and disable DHCP for 192.168.99.0/24 (DHCP stays enabled on the physical router, but must be disabled in MAAS). Even so, after commissioning succeeds, deployment sometimes still hands out an IP from 192.168.99.0/24 and PXE fails, which at first looked like a MAAS 3.5 bug. The cause seems to be that the libvirt network (DHCP disabled, but NAT enabled) still keeps its DHCP firewall rules open. So the MAAS machine should have two NICs: one on 192.168.99.0/24 for external access, and a second one on a pure host-only network without NAT and without DHCP, letting maas-dhcp provide DHCP there, with the two subnets placed in different fabrics (see the libvirt network sketch after this list).
$ sudo iptables-save |grep virbr1
-A LIBVIRT_FWI -d 10.0.0.0/24 -o virbr1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A LIBVIRT_FWI -o virbr1 -j REJECT --reject-with icmp-port-unreachable
-A LIBVIRT_FWO -s 10.0.0.0/24 -i virbr1 -j ACCEPT
-A LIBVIRT_FWO -i virbr1 -j REJECT --reject-with icmp-port-unreachable
-A LIBVIRT_FWX -i virbr1 -o virbr1 -j ACCEPT
-A LIBVIRT_INP -i virbr1 -p udp -m udp --dport 53 -j ACCEPT
-A LIBVIRT_INP -i virbr1 -p tcp -m tcp --dport 53 -j ACCEPT
-A LIBVIRT_INP -i virbr1 -p udp -m udp --dport 67 -j ACCEPT
-A LIBVIRT_INP -i virbr1 -p tcp -m tcp --dport 67 -j ACCEPT
-A LIBVIRT_OUT -o virbr1 -p udp -m udp --dport 53 -j ACCEPT
-A LIBVIRT_OUT -o virbr1 -p tcp -m tcp --dport 53 -j ACCEPT
-A LIBVIRT_OUT -o virbr1 -p udp -m udp --dport 68 -j ACCEPT
-A LIBVIRT_OUT -o virbr1 -p tcp -m tcp --dport 68 -j ACCEPT
But iptables cannot block the DHCP traffic above, because:
you can't firewall DHCP!
dhcpd uses AF_PACKET - essentially a raw socket that hands it the frames before iptables is applied.
https://news.ycombinator.com/item?id=36894198
  • Sometimes PXE still fails, as if PXE in MAAS 3.5 were simply unstable; wget regularly times out while fetching the image from 10.0.0.2.
  • After getting past PXE, commissioning failed, probably because the built-in snap maas-proxy broke; switch to your own squid on the proxy page of the MAAS GUI. An apt proxy can also be configured inside the machines to speed things up.
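
A minimal sketch of such a host-only libvirt network, with neither NAT nor libvirt DHCP (the network name maas-isolated, the bridge name virbr2 and the host address 10.0.0.1 are assumptions):

cat > maas-isolated.xml <<'EOF'
<network>
  <name>maas-isolated</name>
  <!-- no <forward> element: isolated, no NAT; no <dhcp> element: libvirt serves no DHCP -->
  <bridge name='virbr2' stp='on' delay='0'/>
  <ip address='10.0.0.1' netmask='255.255.255.0'/>
</network>
EOF
virsh net-define maas-isolated.xml
virsh net-start maas-isolated
virsh net-autostart maas-isolated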

Next step: consider putting several VLANs (e.g. vlan11, vlan12) plus a bridge br-eth0 (the normal network) on a single NIC eth0 (a rough netplan sketch follows after this list):

  • vlan11 has NAT disabled and no DHCP (when used for MAAS, maas-dhcp provides DHCP); being fully isolated without NAT, it should shield the nodes from the external router's DHCP server
  • vlan12 can be an 'internal' external network with outbound access (via NAT or policy routing)
  • MAAS itself and the MAAS machines all get both networks, vlan11 (eth0) and vlan12 (eth1)
  • On MAAS itself, put eth0 and eth1 on different VLANs so they end up in different spaces, which should work instead of fabrics to isolate the external router's DHCP server
  • Consider using LXD VM pods instead of KVM for the home lab, and whether even more VLANs are needed
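
A rough netplan sketch of that host layout (interface names, addresses and the file name are assumptions, not a tested configuration):

cat <<'EOF' | sudo tee /etc/netplan/60-maas-lab.yaml
network:
  version: 2
  ethernets:
    eth0: {dhcp4: false}
  bridges:
    br-eth0:            # the "normal" network
      interfaces: [eth0]
      dhcp4: true       # or a static address, as before
  vlans:
    vlan11:             # isolated: no NAT, no DHCP (maas-dhcp will serve this VLAN)
      id: 11
      link: eth0
      dhcp4: false
    vlan12:             # 'internal' external network; NAT or policy routing is configured separately
      id: 12
      link: eth0
      addresses: [192.168.12.1/24]
EOF
sudo netplan apply

For VMs to attach to vlan11/vlan12, each VLAN would still need its own bridge (e.g. br-vlan11), which is omitted here.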

Reference
[1] https://blog.csdn.net/quqi99/article/details/37990507
[2] https://www.cnblogs.com/aegis1019/p/8870251.html
[3] https://discourse.maas.io/t/minimal-maas-setup/5543
[4] https://gist.github.com/brettmilford/0af6a75011adb2755ff003e5ea999992
