Using sunbeam to deploy openstack (by quqi99)

Author: Zhang Hua, published on 2023-10-15
Copyright: this article may be reposted freely, but please keep a hyperlink to the original source together with the author info and this copyright notice (http://blog.csdn.net/quqi99)

what is sunbeam

sunbeam is a tool for deploying openstack. It uses juju to define two clouds (microk8s and sunbeam): microk8s hosts the openstack control services (in the openstack model), while sunbeam hosts the sunbeam-controller (in the admin/controller model):

  • the openstack control plane is deployed inside microk8s
  • the openstack data plane (ovn and nova-compute) is deployed via snap (openstack-hypervisor)
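
After bootstrap, the two clouds and their models can be verified like this (a quick sanity check, assuming a default single-node install):

juju clouds --all
juju models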

Deploy sunbeam

Approach:

  • when resetting the environment, also delete microk8s and let sunbeam install microk8s itself
  • once the microk8s snap is seen to be installed, run the workaround for restricted ('featured') networks below to download the images
# Backup images, remember to use 'microk8s.ctr' instead of 'ctr', and remove 'ctr' by 'sudo apt purge containerd docker.io -y'
microk8s.ctr --namespace k8s.io images ls -q |grep -v sha256 |xargs -I {} sh -c 'fname=$(echo "{}" | tr -s "/" "_"); microk8s.ctr --namespace k8s.io image export ${fname}.tar "{}"'
ls *.tar | xargs -I {} sh -c 'fname=$(echo "{}" | sed "s/_/\//g" | sed "s/.tar$//g");echo $fname; microk8s.ctr --namespace k8s.io image import {} ${fname}'

# or use micro.registry
#sudo microk8s.enable registry
#ls *.tar | xargs -I {} sh -c 'fname=$(echo "{}" | sed "s/_/\//g" | sed "s/.tar$//g"); sudo docker load -i {}; sudo docker tag ${fname} localhost:32000/${fname}; sudo docker push localhost:32000/${fname}'

microk8s.ctr --namespace k8s.io image ls
sudo sed -i "s#registry.k8s.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g" /var/snap/microk8s/current/args/containerd-template.toml
sudo systemctl restart snap.microk8s.daemon-containerd.service
# sudo microk8s.stop && sudo microk8s.start   # no need to restart the whole microk8s; restarting snap.microk8s.daemon-containerd above is enough

cat << EOF |sudo tee -a /var/snap/microk8s/current/args/containerd-env
# should use http type
HTTP_PROXY=http://192.168.99.186:3129
HTTPS_PROXY=http://192.168.99.186:3129
NO_PROXY=10.0.0.0/8,127.0.0.0/16,192.168.0.0/16
EOF
  • then run 'sudo microk8s.stop && sudo microk8s.start'
  • if this method doesn't work, install sunbeam on a machine outside the restricted network first, then back up the images with the commands above and import them here
juju add-model sunbeam && juju add-machine --series jammy --constraints "root-disk=100G mem=32G cores=8"
juju ssh 0
# when hitting all kinds of weird problems, reset the environment first
sudo snap remove --purge openstack
sudo snap remove --purge juju
sudo snap remove --purge juju-db
sudo snap remove --purge kubectl
sudo /usr/sbin/remove-juju-services
sudo rm -rf /var/lib/juju
rm -rf ~/.local/share/juju
rm -rf ~/snap/juju/
rm -rf ~/snap/openstack
rm -rf ~/snap/openstack-hypervisor
rm -rf ~/snap/microstack/
rm -rf ~/snap/microk8s/
sudo snap remove --purge vault
sudo snap remove --purge microk8s
sudo snap remove --purge openstack-hypervisor
rm -rf /home/hua/.local/share/openstack/deployments.yaml
sudo init 6    # better to reboot, otherwise some calico NICs and network namespaces cannot be removed

# 2023.2/candidate hits: Please run `sunbeam configure` first, so use 2023.1/stable
sudo snap install openstack --channel 2023.1/stable
python3 -c "import socket; print(socket.getfqdn())"
sunbeam prepare-node-script | bash -x
sudo usermod -a -G snap_daemon $USER && newgrp snap_daemon
#ERROR failed to bootstrap model: machine is already provisioned
sudo remove-juju-services
# if it hangs at 'Bootstrapping juju into machine' with no response, 'journalctl -f' shows it is an ssh problem
# the ssh key has a passphrase; configure passwordless login, plus 'NOPASSWD:ALL'
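# a minimal sketch of that fix (assumption: user 'hua'; only overwrite the key if the old one is disposable):
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
echo 'hua ALL=(ALL) NOPASSWD:ALL' |sudo tee -a /etc/sudoers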
lxc network set lxdbr0 ipv4.address=192.168.9.1/24
lxc network set lxdbr0 ipv6.address none
lxc network set lxdbr0 ipv4.dhcp=true
sunbeam cluster bootstrap --accept-defaults
mkdir -p ~/.kube && sudo chown -R $USER ~/.kube
sudo usermod -a -G snap_microk8s $USER && newgrp snap_microk8s
microk8s.kubectl get pods --all-namespaces
microk8s.ctr --namespace k8s.io image ls
#registry.k8s.io, docker.io, registry.jujucharms.com, quay.io
##echo 'HTTPS_PROXY=http://192.168.99.186:9311' |sudo tee -a /var/snap/microk8s/current/args/containerd-env
microk8s.ctr --namespace k8s.io containers ls
alias kubectl='sudo /snap/bin/microk8s.kubectl'
source <(kubectl completion bash) && kubectl completion bash |sudo tee /etc/bash_completion.d/kubectl
sunbeam cluster list
#Unable to complete operation for new subnet. The number of DNS nameservers exceeds the limit 5.
#The workaround is to modify /run/systemd/resolve/resolv.conf, but don't restart systemd-resolv according to 
#https://github.com/openstack-snaps/snap-openstack/commit/7b7ca702efb490f13624002093e1b0b4cefe3aab
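#a sketch of that workaround: count the nameserver lines and trim by hand to at most 5,
#without restarting systemd-resolved (this generated file is rewritten by systemd, so the edit is temporary)
grep -c '^nameserver' /run/systemd/resolve/resolv.conf
sudo vim /run/systemd/resolve/resolv.conf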
sunbeam configure --accept-defaults --openrc demo-openrc
sunbeam openrc > admin-openrc
source admin-openrc
sunbeam launch ubuntu --name test
sudo journalctl -u snap.openstack.clusterd.service -f
source <(openstack complete)                                                    
openstack complete |sudo tee /etc/bash_completion.d/openstack
openstack hypervisor list
sudo snap get openstack-hypervisor node
sudo snap logs openstack-hypervisor.hypervisor-config-service
sudo snap logs openstack-hypervisor.ovn-controller
#juju switch openstack && juju ssh ovn-central/0
sudo microk8s.kubectl -n openstack exec -it ovn-central-0 bash
sudo microk8s.kubectl -n openstack exec -it ovn-central-0 -c ovn-northd -- ovn-sbctl --db=ssl:ovn-central-0.ovn-central-endpoints.openstack.svc.cluster.local:16642 -c /etc/ovn/cert_host -C /etc/ovn/ovn-central.crt -p /etc/ovn/key_host list
cat /var/snap/openstack-hypervisor/common/etc/nova/nova.conf

Featured deployment

The deployment above assumes normal network conditions, but on a featured (restricted) network more hacks may be needed.
1, First, if the OS is not fresh, reset the env

sudo snap remove --purge microk8s 
sudo snap remove --purge juju 
sudo snap remove --purge openstack
sudo snap remove --purge openstack-hypervisor
sudo /usr/sbin/remove-juju-services
sudo rm -rf /var/lib/juju
rm -rf ~/.local/share/juju
rm -rf ~/snap/openstack
rm -rf ~/snap/openstack-hypervisor
rm -rf ~/snap/microstack/
rm -rf ~/snap/juju/
rm -rf ~/snap/microk8s/
sudo init 6    # better to reboot, otherwise some calico NICs and network namespaces cannot be removed

2, Since my OS has an ssh key protected by a passphrase, an extra step is needed to use the password-less .local/share/juju/ssh/juju_id_rsa

#because my default ssh key has a password, one extra step is needed to avoid: Timeout before authentication for 192.168.99.179 port 56142
cat .local/share/juju/ssh/juju_id_rsa.pub |sudo tee -a ~/.ssh/authorized_keys
ssh hua@minipc.lan -i .local/share/juju/ssh/juju_id_rsa

3, DNS must use the long format (FQDN):

echo '192.168.99.179  minipc.lan minipc' |sudo tee -a /etc/hosts
python3 -c "import socket; print(socket.getfqdn())"

4, Logs can be watched with: sudo journalctl -f
5, On a featured network, images must still be downloadable

echo 'HTTP_PROXY=http://192.168.99.186:9311' |sudo tee -a /var/snap/microk8s/current/args/containerd-env
echo 'HTTPS_PROXY=http://192.168.99.186:9311' |sudo tee -a /var/snap/microk8s/current/args/containerd-env
echo 'NO_PROXY=10.0.0.0/8,192.168.0.0/16,127.0.0.1,172.16.0.0/12' |sudo tee -a /var/snap/microk8s/current/args/containerd-env
sudo snap restart microk8s
microk8s.kubectl get pods --all-namespaces
microk8s.ctr --namespace k8s.io image ls

How to debug snap

A small patch was produced for the lp bug (https://bugs.launchpad.net/snap-openstack/+bug/2039403)

diff --git a/sunbeam-python/sunbeam/utils.py b/sunbeam-python/sunbeam/utils.py
index 542c1c1..cd02ee2 100644
--- a/sunbeam-python/sunbeam/utils.py
+++ b/sunbeam-python/sunbeam/utils.py
@@ -242,7 +242,7 @@ def get_free_nic() -> str:
     return nic
 
 
-def get_nameservers(ipv4_only=True) -> List[str]:
+def get_nameservers(ipv4_only=True, max_count=5) -> List[str]:
     """Return a list of nameservers used by the host."""
     resolve_config = Path("/run/systemd/resolve/resolv.conf")
     nameservers = []
@@ -258,7 +258,7 @@ def get_nameservers(ipv4_only=True) -> List[str]:
         nameservers = list(set(nameservers))
     except FileNotFoundError:
         nameservers = []
-    return nameservers
+    return nameservers[:max_count]

It lives at /snap/openstack/274/lib/python3.10/site-packages/sunbeam/utils.py, but how can it be patched quickly for debugging in production? That path is read-only, which makes things painful. The steps below don't work either, because after 'sudo snap remove openstack' the sunbeam command itself disappears, so 'sunbeam configure --accept-defaults --openrc demo-openrc' can no longer be run to trigger the code

cd snap-openstack/sunbeam-python
tox -epy3

#In order to debug read-only file /snap/openstack/274/lib/python3.10/site-packages/sunbeam/utils.py
sudo unsquashfs -d squashfs-root /var/lib/snapd/snaps/openstack_*.snap
sudo snap try ./squashfs-root/ --devmode
cd ./squashfs-root/lib/python3.10/site-packages
sudo patch -p1 < ./diff
cd ~ && sudo vim ./squashfs-root/lib/python3.10/site-packages/sunbeam/utils.py
import rpdb;rpdb.set_trace()
sudo ./squashfs-root/bin/python3 -m pip install rpdb
sudo systemctl restart snap.openstack.clusterd.service
nc 127.0.0.1 4444
sunbeam configure --accept-defaults --openrc demo-openrc #trigger it, but there is no sunbeam now

Rebuilding the snap also requires removing the old snap first, so it cannot be used for debugging either

git clone https://github.com/openstack-snaps/snap-openstack.git
cd snap-openstack/
patch -p1 < diff
sudo apt install build-essential -y
sudo snap install --classic snapcraft
#snapcraft clean
sudo snapcraft

The 'mount --bind' method below does work, but since only one file is mounted, only logging can be used:

cp /snap/openstack/274/lib/python3.10/site-packages/sunbeam/utils.py .
sudo mount --bind utils.py /snap/openstack/274/lib/python3.10/site-packages/sunbeam/utils.py 
mount | grep "utils.py"
#now we can modify /snap/openstack/274/lib/python3.10/site-packages/sunbeam/utils.py  (NOTE: it's not ./utils.py)
LOG.warn("quqi {}".format(nameservers[:max_count]))
#python needs to be restarted if they are running in the daemon
sudo systemctl restart snap.openstack.clusterd.service
sunbeam configure --accept-defaults --openrc demo-openrc #trigger it

The idea was to 'mount --bind' a whole directory and then install the rpdb module, but it did not succeed (to be confirmed when there is another chance):

mkdir -p ~/snap_write/ && cp -r /snap/openstack/274 ~/snap_write/
sudo mount --bind ~/snap_write/274 /snap/openstack/274
mount |grep /snap/openstack/274
vim /snap/openstack/274/lib/python3.10/site-packages/sunbeam/utils.py
LOG.warn("quqi {}".format(nameservers[:max_count]))
import rpdb;rpdb.set_trace()
/snap/openstack/274/bin/pip install rpdb
#python needs to be restarted if they are running in the daemon
sudo systemctl restart snap.openstack.clusterd.service
sunbeam configure --accept-defaults --openrc demo-openrc #trigger it
nc 127.0.0.1 4444

Some Info

juju ssh -m admin/controller 0
ubuntu@juju-5d90c3-sunbeam-0:~$ juju clouds |tail -n2
Only clouds with registered credentials are shown.
There are more clouds, use --all to see them.
microk8s   1        localhost  k8s     0            built-in  A Kubernetes Cluster
sunbeam    1        default    manual  0            local     
ubuntu@juju-5d90c3-sunbeam-0:~$ juju controllers |tail -n1
sunbeam-controller*  admin/controller  juju-5d90c3-sunbeam-0.cloud.sts  superuser                     2      -   -  3.2.0
ubuntu@juju-5d90c3-sunbeam-0:~$ juju models |tail -n3
Model              Cloud/Region                Type        Status     Machines  Cores  Units  Access  Last connection
admin/controller*  sunbeam/default             manual      available         1      8  4      admin   just now
openstack          sunbeam-microk8s/localhost  kubernetes  available         0      -  24     admin   1 minute ago

ubuntu@juju-5d90c3-sunbeam-0:~$ kubectl get pods --all-namespaces
NAMESPACE        NAME                                       READY   STATUS    RESTARTS        AGE
metallb-system   speaker-2rspk                              1/1     Running   0               108m
kube-system      coredns-6f5f9b5d74-ctc9d                   1/1     Running   0               109m
kube-system      calico-node-m74rh                          1/1     Running   0               107m
metallb-system   controller-9556c586f-kqslx                 1/1     Running   0               108m
kube-system      calico-kube-controllers-7457875fc6-xdst9   1/1     Running   0               106m
openstack        modeloperator-7f5fcd7474-w2f5p             1/1     Running   0               105m
openstack        cinder-ceph-mysql-router-0                 2/2     Running   0               105m
openstack        ovn-relay-0                                2/2     Running   0               105m
openstack        certificate-authority-0                    1/1     Running   0               104m
openstack        horizon-mysql-router-0                     2/2     Running   1 (101m ago)    105m
openstack        horizon-0                                  2/2     Running   0               105m
openstack        keystone-mysql-router-0                    2/2     Running   0               104m
openstack        cinder-ceph-0                              2/2     Running   0               105m
openstack        rabbitmq-0                                 2/2     Running   0               105m
openstack        placement-0                                2/2     Running   0               104m
openstack        neutron-0                                  2/2     Running   0               104m
openstack        keystone-0                                 2/2     Running   0               105m
openstack        glance-0                                   2/2     Running   1 (91m ago)     104m
openstack        traefik-0                                  2/2     Running   0               105m
openstack        cinder-mysql-router-0                      2/2     Running   2 (41m ago)     105m
openstack        neutron-mysql-router-0                     2/2     Running   2 (35m ago)     104m
openstack        nova-api-mysql-router-0                    2/2     Running   2 (10m ago)     104m
openstack        cinder-0                                   3/3     Running   1 (8m43s ago)   104m
kube-system      hostpath-provisioner-69cd9ff5b8-kdjpp      1/1     Running   5 (7m22s ago)   108m
openstack        nova-mysql-router-0                        2/2     Running   3 (7m19s ago)   105m
openstack        nova-0                                     4/4     Running   2 (7m19s ago)   103m
openstack        glance-mysql-router-0                      2/2     Running   1 (7m19s ago)   104m
openstack        ovn-central-0                              4/4     Running   2 (5m51s ago)   103m
openstack        nova-cell-mysql-router-0                   2/2     Running   1 (4m38s ago)   105m
openstack        mysql-0                                    2/2     Running   1 (3m21s ago)   104m
openstack        placement-mysql-router-0                   2/2     Running   3 (7m19s ago)   104m


ubuntu@juju-5d90c3-sunbeam-0:~$ juju switch admin/controller
sunbeam-controller:juju-5d90c3-sunbeam-0.cloud.sts/openstack -> sunbeam-controller:admin/controller
ubuntu@juju-5d90c3-sunbeam-0:~$ juju status
Model       Controller          Cloud/Region     Version  SLA          Timestamp  Notes
controller  sunbeam-controller  sunbeam/default  3.2.0    unsupported  03:50:49Z  upgrade available: 3.2.3
SAAS                   Status   Store  URL
certificate-authority  active   local  juju-5d90c3-sunbeam-0.cloud.sts/openstack.certificate-authority
keystone               waiting  local  juju-5d90c3-sunbeam-0.cloud.sts/openstack.keystone
ovn-relay              active   local  juju-5d90c3-sunbeam-0.cloud.sts/openstack.ovn-relay
rabbitmq               active   local  juju-5d90c3-sunbeam-0.cloud.sts/openstack.rabbitmq
App                   Version  Status   Scale  Charm                 Channel        Rev  Exposed  Message
controller                     active       1  juju-controller       3.2/stable      14  no       
microceph                      unknown      0  microceph             edge             9  no       
microk8s                       active       1  microk8s              legacy/stable  121  no       
openstack-hypervisor           active       1  openstack-hypervisor  2023.1/stable  105  no       
sunbeam-machine                active       1  sunbeam-machine       latest/edge      1  no       
Unit                     Workload  Agent  Machine  Public address  Ports      Message
controller/0*            active    idle   0        10.5.1.11                  
microk8s/0*              active    idle   0        10.5.1.11       16443/tcp  
openstack-hypervisor/0*  active    idle   0        10.5.1.11                  
sunbeam-machine/0*       active    idle   0        10.5.1.11                  
Machine  State    Address    Inst id  Base          AZ  Message
0        started  10.5.1.11  manual:  ubuntu@22.04      Manually provisioned machine
Offer      Application  Charm      Rev  Connected  Endpoint  Interface    Role
microceph  microceph    microceph  9    0/0        ceph      ceph-client  provider

ubuntu@juju-5d90c3-sunbeam-0:~$ juju switch openstack
sunbeam-controller:admin/controller -> sunbeam-controller:juju-5d90c3-sunbeam-0.cloud.sts/openstack
ubuntu@juju-5d90c3-sunbeam-0:~$ juju status                                                                                                                                            
Model      Controller          Cloud/Region                Version  SLA          Timestamp                                                                                             
openstack  sunbeam-controller  sunbeam-microk8s/localhost  3.2.0    unsupported  03:56:29Z
App                       Version                  Status       Scale  Charm                      Channel        Rev  Address         Exposed  Message                                 
certificate-authority                              active           1  tls-certificates-operator  latest/stable   22  10.152.183.253  no                                               
cinder                                             waiting          1  cinder-k8s                 2023.1/stable   47  10.152.183.47   no       installing agent                        
cinder-ceph                                        waiting          1  cinder-ceph-k8s            2023.1/stable   38  10.152.183.65   no       installing agent                        
cinder-ceph-mysql-router  8.0.34-0ubuntu0.22.04.1  active           1  mysql-router-k8s           8.0/candidate   64  10.152.183.165  no                                               
cinder-mysql-router       8.0.34-0ubuntu0.22.04.1  active           1  mysql-router-k8s           8.0/candidate   64  10.152.183.124  no                                               
glance                                             active           1  glance-k8s                 2023.1/stable   59  10.152.183.202  no                                               
glance-mysql-router       8.0.34-0ubuntu0.22.04.1  active           1  mysql-router-k8s           8.0/candidate   64  10.152.183.77   no                                               
horizon                                            active           1  horizon-k8s                2023.1/stable   56  10.152.183.234  no       http://10.20.21.10/openstack-horizon    
horizon-mysql-router      8.0.34-0ubuntu0.22.04.1  active           1  mysql-router-k8s           8.0/candidate   64  10.152.183.218  no                                               
keystone                                           waiting          1  keystone-k8s               2023.1/stable  125  10.152.183.123  no       installing agent                        
keystone-mysql-router     8.0.34-0ubuntu0.22.04.1  active           1  mysql-router-k8s           8.0/candidate   64  10.152.183.78   no                                               
mysql                     8.0.34-0ubuntu0.22.04.1  active           1  mysql-k8s                  8.0/candidate   99  10.152.183.183  no                                               
neutron                                            waiting          1  neutron-k8s                2023.1/stable   53  10.152.183.187  no       installing agent                        
neutron-mysql-router      8.0.34-0ubuntu0.22.04.1  active           1  mysql-router-k8s           8.0/candidate   64  10.152.183.45   no                                               
nova                                               waiting          1  nova-k8s                   2023.1/stable   48  10.152.183.59   no       installing agent                        
nova-api-mysql-router     8.0.34-0ubuntu0.22.04.1  active           1  mysql-router-k8s           8.0/candidate   64  10.152.183.46   no                                               
nova-cell-mysql-router    8.0.34-0ubuntu0.22.04.1  active           1  mysql-router-k8s           8.0/candidate   64  10.152.183.194  no                                               
nova-mysql-router         8.0.34-0ubuntu0.22.04.1  active           1  mysql-router-k8s           8.0/candidate   64  10.152.183.110  no                                               
ovn-central                                        active           1  ovn-central-k8s            23.03/stable    61  10.152.183.195  no                                               
ovn-relay                                          active           1  ovn-relay-k8s              23.03/stable    49  10.20.21.11     no                                               
placement                                          active           1  placement-k8s              2023.1/stable   43  10.152.183.90   no       
placement-mysql-router    8.0.34-0ubuntu0.22.04.1  active           1  mysql-router-k8s           8.0/candidate   64  10.152.183.210  no       
rabbitmq                  3.9.13                   active           1  rabbitmq-k8s               3.9/stable      30  10.20.21.12     no       
traefik                   2.10.4                   maintenance      1  traefik-k8s                1.0/candidate  148  10.20.21.10     no       updating ingress configuration for 'ingress:48'
Unit                         Workload     Agent  Address      Ports  Message
certificate-authority/0*     active       idle   10.1.105.20         
cinder-ceph-mysql-router/0*  active       idle   10.1.105.9          
cinder-ceph/0*               blocked      idle   10.1.105.12         (ceph) integration missing
cinder-mysql-router/0*       active       idle   10.1.105.7          
cinder/0*                    waiting      idle   10.1.105.30         (workload) Not all relations are ready
glance-mysql-router/0*       active       idle   10.1.105.19         
glance/0*                    active       idle   10.1.105.35         
horizon-mysql-router/0*      active       idle   10.1.105.11         
horizon/0*                   active       idle   10.1.105.13         
keystone-mysql-router/0*     active       idle   10.1.105.25         
keystone/0*                  waiting      idle   10.1.105.22         (workload) Not all relations are ready
mysql/0*                     active       idle   10.1.105.36         Primary
neutron-mysql-router/0*      active       idle   10.1.105.26         
neutron/0*                   waiting      idle   10.1.105.29         (workload) Not all relations are ready
nova-api-mysql-router/0*     active       idle   10.1.105.21         
nova-cell-mysql-router/0*    active       idle   10.1.105.18         
nova-mysql-router/0*         active       idle   10.1.105.8          
nova/0*                      waiting      idle   10.1.105.31         (workload) Not all relations are ready
ovn-central/0*               active       idle   10.1.105.37         
ovn-relay/0*                 active       idle   10.1.105.10         
placement-mysql-router/0*    active       idle   10.1.105.28         
placement/0*                 active       idle   10.1.105.27         
rabbitmq/0*                  active       idle   10.1.105.23         
traefik/0*                   maintenance  idle   10.1.105.24         updating ingress configuration for 'ingress:48'
Offer                  Application            Charm                      Rev  Connected  Endpoint              Interface             Role
certificate-authority  certificate-authority  tls-certificates-operator  22   1/1        certificates          tls-certificates      provider
keystone               keystone               keystone-k8s               125  1/1        identity-credentials  keystone-credentials  provider
ovn-relay              ovn-relay              ovn-relay-k8s              49   1/1        ovsdb-cms-relay       ovsdb-cms             provider
rabbitmq               rabbitmq               rabbitmq-k8s               30   1/1        amqp                  rabbitmq              provider

20231128 update - try microk8s

Normally, install it like this:

sudo update-alternatives --install "$(which editor)" editor "$(which vim)" 15
sudo update-alternatives --config editor
# this was only for snap downloads; it turned out useless
#sudo systemctl edit --full snapd.service
#[Service]
#Environment="HTTP_PROXY=http://192.168.99.186:9311"
#Environment="HTTPS_PROXY=http://192.168.99.186:9311"
#Environment="NO_PROXY=localhost,127.0.0.1,192.168.0.0/24,10.0.0.0/8,172.16.0.0/16,*.lan"
#sudo systemctl restart snapd.service
#sudo snap set system proxy.http="http://localhost:8081"
#sudo snap set system proxy.https="http://localhost:8081"

snap info microk8s
sudo snap install microk8s --classic
mkdir -p ~/.kube && sudo chown -R $USER ~/.kube
sudo usermod -a -G microk8s $USER && newgrp microk8s  #switch to the group microk8s
sudo journalctl -f -u snap.microk8s.daemon-kubelite
microk8s.kubectl get pods --all-namespaces
microk8s.ctr --namespace k8s.io image ls
microk8s.ctr --namespace k8s.io containers ls
alias kubectl='sudo /snap/bin/microk8s.kubectl'
source <(kubectl completion bash) && kubectl completion bash |sudo tee /etc/bash_completion.d/kubectl
kubectl describe pod -n kube-system coredns-864597b5fd-pj27h

But on a featured network this will surely fail; both modifying containerd-env and containerd-template.toml below failed

curl -x http://192.168.99.186:3129 https://registry.k8s.io
openssl s_client -connect registry.k8s.io:443
sudo HTTP_PROXY=http://192.168.99.186:3129 HTTPS_PROXY=http://192.168.99.186:3129 ctr --namespace=k8s.io images pull registry.k8s.io/pause:3.7
sudo ctr --namespace=k8s.io images pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.7
#echo -e 'HTTP_PROXY=http://192.168.99.186:3129\nHTTPS_PROXY=http://192.168.99.186:3129' |tee -a /var/snap/microk8s/current/args/containerd-env
sudo sed -i "s#registry.k8s.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g" /var/snap/microk8s/current/args/containerd-template.toml
sudo systemctl restart snap.microk8s.daemon-containerd.service
# sudo microk8s.stop && sudo microk8s.start   #并不需要重启microk8s, 重启上面的snap.microk8s.daemon-containerd即可

Trying to download all images in advance also failed:

kubectl get pods --all-namespaces -o=jsonpath='{range .items[*]}{.metadata.namespace}:{.metadata.name}{"\t"}{range .spec.containers[*]}{.image}{"\n"}{end}{end}'
sudo HTTP_PROXY=http://192.168.99.186:3129 HTTPS_PROXY=http://192.168.99.186:3129 ctr --namespace=k8s.io images pull registry.k8s.io/pause:3.7
sudo HTTP_PROXY=http://192.168.99.186:3129 HTTPS_PROXY=http://192.168.99.186:3129 ctr --namespace=k8s.io images pull docker.io/calico/kube-controllers:v3.25.1
sudo HTTP_PROXY=http://192.168.99.186:3129 HTTPS_PROXY=http://192.168.99.186:3129 ctr --namespace=k8s.io images pull docker.io/calico/node:v3.25.1
sudo microk8s.ctr --namespace=k8s.io images pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:1.10.1

Even an L3-level tool still failed.
So switch to the method below to download all the images:

git clone https://github.com/ubuntu/microk8s.git
cd microk8s
# we know it's 1.28 according to 'snap info microk8s'
git checkout -b 1.28 v1.28

# grep -ir 'image:' * | awk '{print $3 $4}'
# reference - https://soulteary.com/2019/09/08/build-your-k8s-environment-with-microk8s.html
images=(
nginx:latest
rocks.canonical.com/cdk/diverdane/nginxdualstack:1.0.0
nginx:1.14.2
cdkbot/microbot-amd64
docker.io/calico/cni:v3.23.4
docker.io/calico/cni:v3.23.4
docker.io/calico/pod2daemon-flexvol:v3.23.4
docker.io/calico/node:v3.23.4
docker.io/calico/kube-controllers:v3.23.4
docker.io/calico/cni:v3.21.1
docker.io/calico/cni:v3.21.1
docker.io/calico/pod2daemon-flexvol:v3.21.1
docker.io/calico/node:v3.21.1
docker.io/calico/kube-controllers:v3.17.3
docker.io/calico/cni:v3.25.1
docker.io/calico/cni:v3.25.1
docker.io/calico/node:v3.25.1
docker.io/calico/kube-controllers:v3.25.1
)
for image in ${images[@]};do
  sudo HTTP_PROXY=http://192.168.99.186:3129 HTTPS_PROXY=http://192.168.99.186:3129 ctr --namespace k8s.io images pull $image
done
# remember to handle pause as well
#sudo HTTP_PROXY=http://192.168.99.186:3129 HTTPS_PROXY=http://192.168.99.186:3129 ctr --namespace k8s.io images pull registry.k8s.io/pause:3.7
sudo sed -i "s#registry.k8s.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g" /var/snap/microk8s/current/args/containerd-template.toml
sudo systemctl restart snap.microk8s.daemon-containerd.service

microk8s.kubectl get pods --all-namespaces -A
sudo usermod -a -G microk8s hua && sudo chown -R hua ~/.kube
newgrp microk8s
microk8s.inspect && microk8s status
microk8s.kubectl get pods --all-namespaces -A

# a handy way to see the real error
microk8s.kubectl describe pod --all-namespaces > tmp/tmp #error reason
grep -r failed tmp/tmp  |tail -n1
# then restart microk8s (microk8s stop && microk8s start)

# next, 'kubectl get pods --all-namespaces -A' showed one calico-node failing to start because a landscape service occupied port 9099
kubectl get pods --all-namespaces -A
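# a quick check to confirm which process holds the port (9099 is calico-node's port in this environment):
sudo ss -lntp | grep 9099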

These images can be backed up and re-imported with the commands below:

# NOTE: when importing images remember to use microk8s.ctr instead of ctr, otherwise they will not show up in "microk8s.ctr --namespace k8s.io image ls"
sudo ctr --namespace k8s.io images ls -q |grep -v sha256 |xargs -I {} sh -c 'fname=$(echo "{}" | tr -s "/" "_"); sudo ctr --namespace k8s.io image export ${fname}.tar "{}"'
ls *.tar | xargs -I {} sh -c 'fname=$(echo "{}" | sed "s/_/\//g" | sed "s/.tar$//g");echo $fname; microk8s.ctr --namespace k8s.io image import {} ${fname}'
microk8s.ctr --namespace k8s.io image ls
sudo microk8s.stop && sudo microk8s.start
sudo sed -i "s#registry.k8s.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g" /var/snap/microk8s/current/args/containerd-template.toml
sudo systemctl restart snap.microk8s.daemon-containerd.service

# sunbeam-specific images
sudo HTTP_PROXY=http://192.168.99.186:3129 HTTPS_PROXY=http://192.168.99.186:3129 ctr --namespace k8s.io images pull quay.io/metallb/speaker:v0.13.3
kubectl delete -n metallb-system pod speaker-52bqz --grace-period=0 --force

Continue with deploying cos:

# https://charmhub.io/topics/canonical-observability-stack/tutorials/install-microk8s
juju add-model cos sunbeam
juju switch cos
juju deploy cos-lite --trust
watch --color juju status --color --relations

Complete steps

# Backup images, remember to use 'microk8s.ctr' instead of 'ctr', and remove 'ctr' by 'sudo apt purge containerd docker.io -y'
microk8s.ctr --namespace k8s.io images ls -q |grep -v sha256 |xargs -I {} sh -c 'fname=$(echo "{}" | tr -s "/" "_"); microk8s.ctr --namespace k8s.io image export ${fname}.tar "{}"'
# ls *.tar | xargs -I {} sh -c 'fname=$(echo "{}" | sed "s/_/\//g" | sed "s/.tar$//g");echo $fname; microk8s.ctr --namespace k8s.io image import {} ${fname}'
# microk8s.ctr --namespace k8s.io image ls

# Reset the env
sudo snap remove --purge openstack
sudo snap remove --purge juju
sudo snap remove --purge juju-db
sudo snap remove --purge kubectl
sudo /usr/sbin/remove-juju-services
sudo rm -rf /var/lib/juju
rm -rf ~/.local/share/juju
rm -rf ~/snap/juju/
rm -rf ~/snap/openstack
rm -rf ~/snap/openstack-hypervisor
rm -rf ~/snap/microstack/
rm -rf ~/snap/microk8s/
sudo snap remove --purge vault
sudo snap remove --purge microk8s
sudo snap remove --purge openstack-hypervisor
sudo init 6    # better to reboot, otherwise some calico NICs and network namespaces cannot be removed

# Create an ssh key without a password, and use 'NOPASSWD:ALL'
ssh-keygen
echo 'hua ALL=(ALL) NOPASSWD:ALL' |sudo tee -a /etc/sudoers

# Install sunbeam
sudo snap install openstack --channel 2023.1/stable
sunbeam prepare-node-script | bash -x
sudo usermod -a -G snap_daemon $USER && newgrp snap_daemon
sunbeam cluster bootstrap --accept-defaults
journalctl -f

# None of the following is needed any more; switching to the mirror approach is enough
# Monitor the status: once the microk8s snap appears in 'snap list |grep k8s', or the log shows 'Adding MicroK8S unit to machine ...', load the images
ls *.tar | xargs -I {} sh -c 'fname=$(echo "{}" | sed "s/_/\//g" | sed "s/.tar$//g");echo $fname; microk8s.ctr --namespace k8s.io image import {} ${fname}'
# or use micro.registry
#sudo microk8s.enable registry
#ls *.tar | xargs -I {} sh -c 'fname=$(echo "{}" | sed "s/_/\//g" | sed "s/.tar$//g"); sudo docker load -i {}; sudo docker tag ${fname} localhost:32000/${fname}; sudo docker push localhost:32000/${fname}'
microk8s.ctr --namespace k8s.io image ls
sudo sed -i "s#registry.k8s.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g" /var/snap/microk8s/current/args/containerd-template.toml
sudo systemctl restart snap.microk8s.daemon-containerd.service
# sudo microk8s.stop && sudo microk8s.start   # no need to restart the whole microk8s; restarting snap.microk8s.daemon-containerd above is enough
microk8s.kubectl get pods --all-namespaces
cat << EOF |sudo tee -a /var/snap/microk8s/current/args/containerd-env
# should use http type
HTTP_PROXY=http://192.168.99.186:3129
HTTPS_PROXY=http://192.168.99.186:3129
NO_PROXY=10.0.0.0/8,127.0.0.0/16,192.168.0.0/16
EOF

# If there is a pause during the installation of pod in microk8s, we can restart microk8s by: sudo microk8s.stop && sudo microk8s.start

# Use sunbeam
alias kubectl='sudo /snap/bin/microk8s.kubectl'
source <(kubectl completion bash) && kubectl completion bash |sudo tee /etc/bash_completion.d/kubectl
source <(openstack complete) && openstack complete |sudo tee /etc/bash_completion.d/openstack
sunbeam configure --accept-defaults --openrc demo-openrc
sunbeam openrc > admin-openrc
source admin-openrc
sunbeam launch ubuntu --name test
sudo journalctl -u snap.openstack.clusterd.service -f
sudo snap logs openstack-hypervisor.ovn-controller
openstack hypervisor list
sunbeam cluster list

# Debug hacks
microk8s.kubectl describe pod --all-namespaces > tmp && grep -r failed tmp  |tail -n1
microk8s.kubectl logs -n kube-system pod xxx
$ microk8s.ctr --namespace k8s.io image ls -q |grep -v sha256
docker.io/calico/cni:v3.21.1
docker.io/calico/cni:v3.23.4
docker.io/calico/cni:v3.23.5
docker.io/calico/cni:v3.25.1
docker.io/calico/kube-controllers:v3.17.3
docker.io/calico/kube-controllers:v3.23.4
docker.io/calico/kube-controllers:v3.23.5
docker.io/calico/kube-controllers:v3.25.1
docker.io/calico/node:v3.21.1
docker.io/calico/node:v3.23.4
docker.io/calico/node:v3.23.5
docker.io/calico/node:v3.25.1
docker.io/calico/pod2daemon-flexvol:v3.21.1
docker.io/calico/pod2daemon-flexvol:v3.23.4
docker.io/cdkbot/hostpath-provisioner:1.4.2
docker.io/coredns/coredns:1.9.3
docker.io/jujusolutions/charm-base:ubuntu-20.04
docker.io/jujusolutions/charm-base:ubuntu-22.04
docker.io/jujusolutions/jujud-operator:3.2.0
quay.io/metallb/controller:v0.13.3
quay.io/metallb/speaker:v0.13.3
registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.7
registry.k8s.io/pause:3.7
rocks.canonical.com/cdk/diverdane/nginxdualstack:1.0.0

During this process, if 'juju clouds' shows the problem below, it is because it was run inside 'newgrp snap_daemon'. Starting with juju 3.x the snap uses strict confinement (see 'snap debug sandbox-features'), so running under the snap_daemon group is bound to lack permissions.

update.go:85: cannot change mount namespace according to change mount (/run/user/1000/doc/by-app/snap.juju /run/user/1000/doc none bind,rw,x-snapd.ignore-missing 0 0): cannot inspect "/run/user/1000/doc": lstat /run/user/1000/doc: permission denied
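
The fix is simply to run juju from a normal login shell instead of the newgrp subshell (a minimal check, nothing more):

exit   # leave the 'newgrp snap_daemon' subshell
snap debug sandbox-features
juju clouds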

For the problem below, search for 'restrict' in /etc/ntp.conf and disable ipv6:

Nov 29 13:43:28 minipc ntpd[1910]: bind(30) AF_INET6 fe80::ecee:eeff:feee:eeee%14#123 flags 0x11 failed: Cannot assign requested address
Nov 29 13:43:28 minipc ntpd[1910]: unable to create socket on cali93e42ce2874 (14) for fe80::ecee:eeff:feee:eeee%14#123

Also, if snaps cannot be downloaded, the network on the gw is broken. Pods in microk8s never finishing deployment may be related to this as well, because they pull the openstack charm images from registry.jujucharms.com (that registry is currently reachable)

grep -r 'PullImage from image service failed' /var/log/syslog | awk -F'image="' '{split($2, a, "@sha256:"); print a[1]}'

Nov 29 13:50:15 minipc microk8s.daemon-kubelite[38665]: E1129 13:50:15.323748   38665 remote_image.go:171] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"registry.jujucharms.com/charm/6a0rnzywlucfo4rvn7y2aylcc19uaarnwsrge/ovn-sb-db-server-image@sha256:d132bf917fde0e48743ace9f0bceb0ae3ba17a7cc41c0a76c4160a1fb606940a\": failed to resolve reference \"registry.jujucharms.com/charm/6a0rnzywlucfo4rvn7y2aylcc19uaarnwsrge/ovn-sb-db-server-image@sha256:d132bf917fde0e48743ace9f0bceb0ae3ba17a7cc41c0a76c4160a1fb606940a\": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed" image="registry.jujucharms.com/charm/6a0rnzywlucfo4rvn7y2aylcc19uaarnwsrge/ovn-sb-db-server-image@sha256:d132bf917fde0e48743ace9f0bceb0ae3ba17a7cc41c0a76c4160a1fb606940a"

sudo HTTP_PROXY=http://192.168.99.186:3129 HTTPS_PROXY=http://192.168.99.186:3129 microk8s.ctr --namespace=k8s.io images pull registry.jujucharms.com/charm/6a0rnzywlucfo4rvn7y2aylcc19uaarnwsrge/ovn-sb-db-server-image

the way - microk8s.registry

Installation:

sudo microk8s.enable registry

The port was unknown at first; a few methods showed that the port is 5000, and the externally exposed NodePort is 32000

journalctl -u snap.microk8s.daemon-containerd -u snap.microk8s.daemon-registry
cat /var/snap/microk8s/current/args/containerd.toml |grep '\.registry' -A1
cat /var/snap/microk8s/6100/args/certs.d/localhost\:32000/hosts.toml
hua@minipc:~$ sudo microk8s.kubectl logs -n container-registry registry-77c7575667-q9qr2 |tail -n1
time="2023-11-29T06:08:29.221059247Z" level=info msg="listening on [::]:5000" go.version=go1.16.15 instance.id=5e67e401-e456-49bc-acaf-d6406f012e7f service=registry version="v2.8.1+unknown"
$ microk8s.kubectl get svc -A |grep reg
container-registry   registry                             NodePort       10.152.183.20    <none>        5000:32000/TCP                   33m

curl http://192.168.99.179:32000/v2/_catalog

sudo microk8s.kubectl proxy
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/

kubectl run -it --rm --image=alpine:latest test-container-registry -- sh
# apk add curl

Import the images:

ls *.tar | xargs -I {} sh -c 'fname=$(echo "{}" | sed "s/_/\//g" | sed "s/.tar$//g"); microk8s.ctr image import {} localhost:32000/${fname}'

But it is unclear how to list the images inside localhost:32000. "microk8s.ctr images ls localhost:32000" certainly won't work, because ctr talks to containerd rather than to a registry. docker might be needed here (sudo docker image ls localhost:32000/k8s.io), but that was empty too; do the images have to be exported from ctr into docker, or re-tagged with docker? (Note: perhaps plain 'microk8s.ctr images ls' is the way to check; try it next time)

ls *.tar | xargs -I {} sh -c 'fname=$(echo "{}" | sed "s/_/\//g" | sed "s/.tar$//g"); sudo docker load -i {}; sudo docker tag ${fname} localhost:32000/${fname}; sudo docker push localhost:32000/${fname}'
#sudo docker image ls localhost:32000
# 'docker image ls localhost:32000' above still shows nothing; use the commands below instead. docker does not care where an image is stored, it only looks at the tag (and the tag contains localhost)
sudo docker image ls
sudo docker image ls localhost:32000/docker.io/calico/node
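
The registry's own HTTP API is another way to see what has been pushed (a sketch, assuming the NodePort-32000 registry above):

curl -s http://localhost:32000/v2/_catalog
curl -s http://localhost:32000/v2/docker.io/calico/node/tags/list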

the way - use mirror

sudo docker pull registry.k8s.io/pause:3.7
sudo docker pull k8s.m.daocloud.io/pause:3.7

# https://github.com/DaoCloud/public-image-mirror
# https://microk8s.io/docs/registry-private

sudo mkdir -p /var/snap/microk8s/current/args/certs.d/registry.k8s.io
echo '
server = "registry.k8s.io"

[host."https://registry.aliyuncs.com/v2/google_containers"]
  capabilities = ["pull", "resolve"]
  override_path = true
' | sudo tee  /var/snap/microk8s/current/args/certs.d/registry.k8s.io/hosts.toml


sudo mkdir -p /var/snap/microk8s/current/args/certs.d/rocks.canonical.com
echo '
server = "rocks.canonical.com"

[host."https://rocks-canonical.m.daocloud.io"]
  capabilities = ["pull", "resolve"]
  override_path = true
' | sudo tee  /var/snap/microk8s/current/args/certs.d/rocks.canonical.com/hosts.toml


sudo mkdir -p /var/snap/microk8s/current/args/certs.d/registry.jujucharms.com
echo '
server = "registry.jujucharms.com"

[host."https://jujucharms.m.daocloud.io"]
  capabilities = ["pull", "resolve"]
  override_path = true
' | sudo tee  /var/snap/microk8s/current/args/certs.d/registry.jujucharms.com/hosts.toml


sudo mkdir -p /var/snap/microk8s/current/args/certs.d/quay.io
echo '
server = "quay.io"

[host."https://quay.m.daocloud.io"]
  capabilities = ["pull", "resolve"]
  override_path = true
' | sudo tee /var/snap/microk8s/current/args/certs.d/quay.io/hosts.toml


sudo mkdir -p /var/snap/microk8s/current/args/certs.d/gcr.io
echo '
server = "gcr.io"

[host."https://gcr.m.daocloud.io"]
  capabilities = ["pull", "resolve"]
  override_path = true
' | sudo tee /var/snap/microk8s/current/args/certs.d/gcr.io/hosts.toml


sudo mkdir -p /var/snap/microk8s/current/args/certs.d/docker.io
echo '
server = "docker.io"

[host."https://m.daocloud.io/docker.io"]
  capabilities = ["pull", "resolve"]
  override_path = true
' | sudo tee /var/snap/microk8s/current/args/certs.d/docker.io/hosts.toml

# if it does not take effect, it is probably the permission problem below
sudo chown -R hua:hua /var/snap/microk8s/current/args/certs.d
sudo snap restart microk8s
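
To verify the mirror config took effect, pull a test image through it (a quick check; pause:3.7 is just an example tag):

sudo microk8s.ctr --namespace k8s.io images pull registry.k8s.io/pause:3.7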

Other issues

With the mirror approach above, sunbeam installed smoothly, but launching a test VM reported the error below,

$ sudo dmesg | grep 'apparmor="DENIED"' |tail -n1
[ 8189.359936] audit: type=1400 audit(1701263375.834:2157): apparmor="DENIED" operation="open" profile="snap.openstack-hypervisor.nova-api-metadata" name="/etc/nova/api-paste.ini" pid=1440903 comm="python3" requested_mask="r" denied_mask="r" fsuid=0 ouid=1000

It can be fixed like this:

sudo vim /var/lib/snapd/apparmor/profiles/snap.openstack-hypervisor.nova-api-metadata
/etc/nova/api-paste.ini r,
/etc/nova/** r,

sudo apparmor_parser -r /var/lib/snapd/apparmor/profiles/snap.openstack-hypervisor.nova-api-metadata

But after that, 'openstack hypervisor list' was still empty and the error below appeared. There are really too many bugs, so stopping here for now:

hua@minipc:~$ journalctl -f -u snap.openstack-hypervisor.nova-compute.service
Nov 29 21:31:06 minipc nova-compute[1655283]: 2023-11-29 21:31:06.227 1655283 INFO nova.virt.libvirt.driver [None req-646001d6-e3ff-414d-8be4-6bf0c1882b2b - - - - - -] Connection event '0' reason 'Failed to connect to libvirt: Failed to connect socket to '/var/snap/openstack-hypervisor/common/run/libvirt/virtqemud-sock': No such file or directory'

20240222 - docker mirror acceleration

sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-EOF
{
    "registry-mirrors": [
        "https://dockerproxy.com",
        "https://mirror.baidubce.com",
        "https://docker.m.daocloud.io",
        "https://docker.nju.edu.cn",
        "https://docker.mirrors.sjtug.sjtu.edu.cn"
    ]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
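
The configured mirrors can then be confirmed (docker info prints a 'Registry Mirrors' section):

sudo docker info | grep -A5 'Registry Mirrors'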

20240618 - Use sunbeam to verify SRU (>jammy)

1, Setting horizon's service to NodePort ourselves does not seem to work
kubectl get service -n openstack
kubectl patch service horizon --namespace openstack -p '{"spec": {"type": "NodePort"}}'
kubectl describe service horizon --namespace openstack |grep NodePort
kubectl get svc horizon --namespace openstack -o yaml
kubectl get pods --namespace openstack -l app.kubernetes.io/name=horizon
kubectl get nodes -o wide
kubectl logs horizon-0 --namespace openstack
sudo ufw allow 32241

2, Switching to 'sunbeam dashboard-url' worked this time; horizon can be reached both via the FIP (10.20.21.11) and via NodePort=32241
#above NodePort didn't work so let's change to use dashboard-url instead
#juju run horizon/0 get-dashboard-url
sunbeam dashboard-url
#http://10.20.21.11:80/openstack-horizon
sshuttle -v -r ubuntu@seg 10.20.0.0/16
$ juju status horizon |grep active
horizon           active      1  horizon-k8s  2023.2/stable   62  10.152.183.130  no       http://10.20.21.11/openstack-horizon
horizon/0*  active    idle   10.1.141.236 
$ kubectl get services -o wide  -A|grep '10.152.183.130'
openstack        horizon                              NodePort       10.152.183.130   <none>        65535:32241/TCP                  6h14m   app.kubernetes.io/name=horizon

3, But where does horizon's apache2 actually run? Following the trail only shows traefik, which plays a haproxy-like role for HA (the charm used to get HA from haproxy; now that openstack runs on microk8s, k8s's traefik provides HA for openstack instead)
$ kubectl get nodes -o wide |grep Ready
harhall   Ready    <none>   4h43m   v1.28.10   10.230.62.239   <none>        Ubuntu Core 20   5.15.0-101-generic   containerd://1.6.28
#10.152.183.131 is ClusterIP for LoadBalancer, pod's cidr is 10.1.141.0/24(cni), the IP of br-ex is 10.20.21.1
$ kubectl get services --namespace openstack -o wide |grep 10.20.21.11
traefik-public                       LoadBalancer   10.152.183.131   10.20.21.11   80:30109/TCP,443:31719/TCP       3h44m   app.kubernetes.io/name=traefik-public
ubuntu@harhall:~$ kubectl describe service traefik-public -n openstack
...
IP:                       10.152.183.131
LoadBalancer Ingress:     10.20.21.11
Port:                     traefik-public  80/TCP
TargetPort:               80/TCP
NodePort:                 traefik-public  30109/TCP
Endpoints:                10.1.141.211:80
Port:                     traefik-public-tls  443/TCP
TargetPort:               443/TCP
NodePort:                 traefik-public-tls  31719/TCP
Endpoints:                10.1.141.211:443
$ kubectl describe service horizon -n openstack
...
Type:                     NodePort
IP:                       10.152.183.130
IPs:                      10.152.183.130
Port:                     placeholder  65535/TCP
TargetPort:               65535/TCP
NodePort:                 placeholder  32241/TCP
Endpoints:                10.1.141.236:65535
$ kubectl get pods --namespace openstack -o wide |grep 10.1.141.211
traefik-public-0                 2/2     Running   0          4h32m   10.1.141.211   harhall   <none>           <none>

4, The horizon unit can also be reached quickly in the usual juju way. It turns out horizon is not run through apache2 but through wsgi.py; wsgi runs on the host, and inside the horizon/0 unit only the horizon charm and the pebble process run (pebble is a systemd-like thing - https://github.com/canonical/pebble ; list services with '/charm/bin/pebble services', stop one with '/charm/bin/pebble stop <xxx>', and view the juju log with '/charm/bin/pebble logs')
#kubectl exec -it horizon-0 --namespace openstack -- /bin/bash
cat $HOME/snap/openstack/current/account.yaml
juju switch openstack && juju ssh horizon/0
#https://opendev.org/openstack/sunbeam-charms.git
root@horizon-0:/var/lib/juju# grep -r -i 'script' /var/lib/juju/agents/unit-horizon-0/charm/src/templates/wsgi-horizon.conf 
WSGIScriptAlias {{ ingress_public.ingress_path }} /usr/share/openstack-dashboard/openstack_dashboard/wsgi.py process-group=horizon
root@horizon-0:/var/lib/juju# find / -name 'openstack-dashboard'
<empty>
ubuntu@harhall:~$ sudo find / -name 'openstack_dashboard' |grep dist-packages
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/274/fs/usr/lib/python3/dist-packages/openstack_dashboard
/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/236/fs/usr/lib/python3/dist-packages/openstack_dashboard
#https://review.opendev.org/c/openstack/horizon/+/910321/4/openstack_dashboard/dashboards/identity/projects/tabs.py
sudo vim /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/236/fs/usr/lib/python3/dist-packages/openstack_dashboard/dashboards/identity/projects/tabs.py
ubuntu@harhall:~$ ps -ef |grep wsgi |grep horizon |head -n1
42420    1366796 1366774  4 04:22 ?        00:19:22 (wsgi:horizon)    -DFOREGROUND

5, Now we know the code lives at /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/236/fs/usr/lib/python3/dist-packages/openstack_dashboard/dashboards/identity/projects/tabs.py. I assumed it came from the microk8s snap and would therefore be read-only, but after unsquashing (note: 'mount --bind' can be used to debug a read-only snap) the file is nowhere to be found - ./squashfs-root/usr/lib/python3/dist-packages/openstack_dashboard
sudo unsquashfs -d squashfs-root /var/lib/snapd/snaps/microk8s*.snap
#need to first exit bash from 'kubectl -n openstack exec -it horizon-0 -- bash', then run
ubuntu@harhall:~$ sudo snap try ./squashfs-root/ --devmode
2024-06-18T03:06:50Z INFO Waiting for "snap.microk8s.daemon-kubelite.service" to stop.
microk8s v1.28.10 mounted from /home/ubuntu/squashfs-root
ubuntu@harhall:~$ ls ./squashfs-root/usr/lib/python3/dist-packages/openstack_dashboard
ls: cannot access './squashfs-root/usr/lib/python3/dist-packages/openstack_dashboard': No such file or directory

6, Oh, it actually comes from a Rocks image, and the file is not read-only at all; it can be modified directly, which makes things easy.
ubuntu@harhall:~$ microk8s.kubectl describe -n openstack pod/horizon-0 |grep 'Image:' |grep horizon
    Image:         registry.jujucharms.com/charm/ctws3sfxf7i2j12tvw9obbkcq9q0576tozepc/horizon-image@sha256:a1d591317e14c9d4a4bae7971cd8074ca9d4e5f0399779bd8da313b130f482f8
The 'mount |grep upperdir' command shows that the directory pointed to by upperdir is writable (upperdir=/var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/548/fs)

7, But after restarting with 'sudo microk8s.stop dashboard && sudo microk8s.start dashboard' (this command presumably restarts the k8s dashboard) the whole o7k stopped working, and 'sudo microk8s.stop && sudo microk8s.start' could not recover it either.
cd /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/236/fs/usr/lib/python3/dist-packages
patch -p1 < diff
sudo microk8s.stop dashboard && sudo microk8s.start dashboard

8, Quickly recovering the openstack environment: running charmed o7k on microk8s has the benefit below of recovering o7k by deleting pods; besides, k8s's LB can replace the charmed haproxy
ubuntu@harhall:~$ juju status |grep mysql |grep installing |grep -v router
glance-mysql              8.0.35-0ubuntu0.22.04.1  waiting      1  mysql-k8s                 8.0/stable     127  10.152.183.184  no       installing agent
keystone-mysql            8.0.35-0ubuntu0.22.04.1  waiting      1  mysql-k8s                 8.0/stable     127  10.152.183.218  no       installing agent
nova-mysql                8.0.35-0ubuntu0.22.04.1  waiting      1  mysql-k8s                 8.0/stable     127  10.152.183.105  no       installing agent

kubectl delete pod -n openstack glance-mysql-0
kubectl delete pod -n openstack keystone-mysql-0
kubectl delete pod -n openstack nova-mysql-0
#also delete router pods
kubectl delete pod -n openstack keystone-mysql-router-0
kubectl delete pod -n openstack nova-api-mysql-router-0
kubectl delete pod -n openstack nova-cell-mysql-router-0
kubectl delete pod -n openstack nova-mysql-router-0

9, Then delete the horizon-0 pod to restart horizon, after which the patch can be verified successfully
kubectl delete pod -n openstack horizon-0
ubuntu@harhall:~$ sudo grep -r 'identity.get_domain_id_for_operation' /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/236/fs/usr/lib/python3/dist-packages/openstack_dashboard/dashboards/identity/projects/tabs.py
        domain_id = identity.get_domain_id_for_operation(self.request)
        domain_id = identity.get_domain_id_for_operation(self.request)
            domain_id = identity.get_domain_id_for_operation(self.request)

10, I also wanted to try rpdb, but it does not work
sudo vim /var/snap/microk8s/common/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/236/fs/usr/lib/python3/dist-packages/openstack_dashboard/dashboards/identity/projects/tabs.py
#import rpdb;rpdb.set_trace()
import rpdb;rpdb.Rpdb(addr='0.0.0.0', port=4444).set_trace()
sudo pip3 uninstall rpdb && /snap/openstack/335/bin/pip3 install rpdb
kubectl delete pod -n openstack horizon-0

Having used sunbeam to verify 2023.2 on mantic (4:23.3.0-0ubuntu1.2; strictly speaking this does not verify mantic itself, since mantic just uses 23.3.0 as its base, it only verifies 23.3.0, and sunbeam actually runs jammy + 23.3.0), I went on to use sunbeam to verify 2024.1 (noble uses 4:24.0.0-0ubuntu1.1 as its base) and it failed, for an unrelated reason: the domain page could now list only one domain (it used to list several). After switching to devstack to debug, the cause turned out to be this (https://bugs.launchpad.net/keystone/+bug/2041611 , https://review.opendev.org/c/openstack/keystone/+/900028). As shown below, a project-scoped token may return multiple domains, while a domain-scoped token should return only one; an earlier bug made domain-scoped tokens return multiple domains too (and horizon strictly uses domain-scoped tokens when calling the keystone API), which indirectly masked my earlier bug (https://bugs.launchpad.net/horizon/+bug/2054799). Since multiple domains can no longer be listed, the precondition for triggering that bug (logging in via the admin domain to set a domain context for another domain such as the k8s domain) no longer exists.

#project-scoped token can return multiple domains
$ env |grep OS_
OS_PROJECT_DOMAIN_ID=default
OS_CACERT=
OS_AUTH_URL=http://localhost:5000/v3/
OS_USER_DOMAIN_ID=default
OS_USERNAME=admin
OS_PROJECT_NAME=admin
OS_PASSWORD=password
#export OS_PROJECT_NAME=admin
#export OS_PROJECT_DOMAIN_ID=default
$ openstack domain list
+----------------------------------+---------+---------+--------------------+
| ID                               | Name    | Enabled | Description        |
+----------------------------------+---------+---------+--------------------+
| default                          | Default | True    | The default domain |
| e3c1e806d5204b638906c6e64d72ec9a | k8s     | True    |                    |
+----------------------------------+---------+---------+--------------------+

#domain-scoped token should return only one domain
#but although the token of the domain member user making the API request is strictly domain-scoped, 
#all domains are returned without this patch https://bugs.launchpad.net/keystone/+bug/2041611
unset OS_PROJECT_NAME
unset OS_PROJECT_DOMAIN_ID
export OS_DOMAIN_NAME=default
$ openstack domain list
+---------+---------+---------+--------------------+
| ID      | Name    | Enabled | Description        |
+---------+---------+---------+--------------------+
| default | Default | True    | The default domain |
+---------+---------+---------+--------------------+

Besides, charms can for now create a 2024.1 environment on jammy with the method below (there is no suitable ovn version for 2024.1 yet, so ml2-ovs is used). On noble, juju 3.x uses a different charmcraft.yaml format, and charms cannot manage both formats at the same time, so charms currently cannot run on noble at all, which also means the sru debian package cannot be verified there (sunbeam only verifies the patch, not the deb).
I tried installing jammy-caracal first and then series-upgrading only the dashboard unit to mantic, but after the upgrade the VM's console-log showed eth0 had no IP, so it could not continue (and therefore it was also impossible to verify whether the mysql-router charm has the problem mentioned earlier)

./generate-bundle.sh -s jammy -r caracal -n osttest --ml2-ovs --run --openstack-dashboard
juju ssh openstack-dashboard/0 -- sudo -s
systemctl stop jujud-machine-12.service && systemctl disable jujud-machine-12.service
#we should NOT disable /etc/apt/sources.list.d/cloud-archive.list etc, otherwise, we will hit package dependencies issue during do-release-upgrade
sed -i 's/Prompt=lts/Prompt=normal/g' /etc/update-manager/release-upgrades
apt upgrade -y && apt dist-upgrade -y
reboot
juju ssh openstack-dashboard/0 -- sudo -s
#series upgrade from jammy to mantic
do-release-upgrade

Since jammy-caracal was installed, upgrading straight from jammy to noble (skipping mantic) avoids the IP problem above.
Mantic == Bobcat which is < Caracal. If you deploy Jammy Caracal using these instructions you should not upgrade to Mantic because you will then be running an unsupported combination of packages. Forget Mantic entirely and go to straight Noble.

sed -i 's/Prompt=normal/Prompt=lts/g' /etc/update-manager/release-upgrades
do-release-upgrade -d -f DistUpgradeViewNonInteractive
systemctl start jujud-machine-12.service && systemctl enable jujud-machine-12.service

20240923 - another look at quickly installing sunbeam/microk8s inside China

A docker registry can be used to create a docker image cache (see: https://docs.docker.com/docker-hub/mirror/), and also see (How to use private registry in docker and containerd and k8s (by quqi99) - https://blog.csdn.net/quqi99/article/details/104842160 ).
Assume a cache server (http://10.245.167.25:8082) has been prepared for https://k8s.gcr.io and another cache server (http://10.245.166.36:8082) for https://docker.io; add the configuration below to microk8s and restart (sudo systemctl restart snap.microk8s.daemon-containerd.service):

sudo mkdir -p /var/snap/microk8s/current/args/certs.d/k8s.gcr.io
cat << EOF | sudo tee /var/snap/microk8s/current/args/certs.d/k8s.gcr.io/hosts.toml
server = "http://10.245.167.25:8082"

[host."http://10.245.167.25:8082"]
  capabilities = ["pull", "resolve"]
EOF

sudo mkdir -p /var/snap/microk8s/current/args/certs.d/docker.io
cat << EOF | sudo tee /var/snap/microk8s/current/args/certs.d/docker.io/hosts.toml
server = "http://10.245.166.36:8082"

[host."http://10.245.166.36:8082"]
  capabilities = ["pull", "resolve"]
EOF

To install sunbeam, first install microk8s as above, then install sunbeam on top.

If the docker-registry charm is used to deploy the docker registry, it can be deployed like this:

series: focal
applications:
  docker-io-caching-registry:
    charm: ./docker-registry
    num_units: 1
    options:
      cache-password: CHANGEME
      cache-remoteurl: https://registry-1.docker.io
      cache-username: CHANGEME
      http_proxy: http://xx:3128
      https_proxy: http://xx:3128
      registry-http-proxy: http://xx:3128
      registry-https-proxy: http://xx:3128
      registry-port: 8082
    constraints: arch=amd64 cpu-cores=4 allocate-public-ip=true
  k8s-gcr-io-caching-registry:
    charm: ./docker-registry
    num_units: 1
    options:
      cache-remoteurl: https://k8s.gcr.io
      http_proxy: http://xx:3128
      https_proxy: http://xx:3128
      registry-http-proxy: http://xx:3128
      registry-https-proxy: http://xx:3128
      registry-port: 8082
    constraints: arch=amd64 cpu-cores=4 allocate-public-ip=true
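
The bundle can then be deployed in the usual way (assuming it is saved as registry-bundle.yaml, a hypothetical file name):

juju deploy ./registry-bundle.yaml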

There is no need to run your own docker registry either; just use a Chinese public docker registry as in 'the way - use mirror' above (or simply add the m.daocloud.io prefix, e.g.: k8s.gcr.io/coredns/coredns => m.daocloud.io/k8s.gcr.io/coredns/coredns).

#https://github.com/DaoCloud/public-image-mirror
#sudo docker pull registry.k8s.io/pause:3.7
#sudo docker pull k8s.m.daocloud.io/pause:3.7
registry.k8s.io -     https://registry.aliyuncs.com/v2/google_containers
rocks.canonical.com - https://rocks-canonical.m.daocloud.io
registry.jujucharms.com - https://jujucharms.m.daocloud.io
quay.io - https://quay.m.daocloud.io
gcr.io - https://gcr.m.daocloud.io
docker.io - https://m.daocloud.io/docker.io
sudo chown -R hua:hua /var/snap/microk8s/current/args/certs.d
sudo snap restart microk8s

The microk8s charm also has a custom_registries config option - https://github.com/canonical/charm-microk8s/blob/legacy/config.yaml#L27 , e.g.:

juju config containerd custom_registries='[{"url": "https://zhhuabj-bastion.cloud.sts:5000", "username": "test", "password": "password"}]'

docker also supports registry mirrors, configured in /etc/docker/daemon.json:

"registry-mirrors": [
    "https://docker.m.daocloud.io"
  ]

But why does the method below not work? To be investigated:

#https://microk8s.io/
sudo snap install microk8s --classic

sudo mkdir -p /var/snap/microk8s/current/args/certs.d/registry.k8s.io
echo '
server = "registry.k8s.io"

[host."https://m.daocloud.io/registry.k8s.io"]
  capabilities = ["pull", "resolve"]
  override_path = true
' | sudo tee  /var/snap/microk8s/current/args/certs.d/registry.k8s.io/hosts.toml


sudo mkdir -p /var/snap/microk8s/current/args/certs.d/rocks.canonical.com
echo '
server = "rocks.canonical.com"

[host."https://m.daocloud.io/rocks.canonical.com"]
  capabilities = ["pull", "resolve"]
  override_path = true
' | sudo tee  /var/snap/microk8s/current/args/certs.d/rocks.canonical.com/hosts.toml


sudo mkdir -p /var/snap/microk8s/current/args/certs.d/registry.jujucharms.com
echo '
server = "registry.jujucharms.com"

[host."https://m.daocloud.io/registry.jujucharms.com"]
  capabilities = ["pull", "resolve"]
  override_path = true
' | sudo tee  /var/snap/microk8s/current/args/certs.d/registry.jujucharms.com/hosts.toml


sudo mkdir -p /var/snap/microk8s/current/args/certs.d/quay.io
echo '
server = "quay.io"

[host."https://m.daocloud.io/quay.io"]
  capabilities = ["pull", "resolve"]
  override_path = true
' | sudo tee /var/snap/microk8s/current/args/certs.d/quay.io/hosts.toml


sudo mkdir -p /var/snap/microk8s/current/args/certs.d/gcr.io
echo '
server = "gcr.io"

[host."https://m.daocloud.io/gcr.io"]
  capabilities = ["pull", "resolve"]
  override_path = true
' | sudo tee /var/snap/microk8s/current/args/certs.d/gcr.io/hosts.toml


sudo mkdir -p /var/snap/microk8s/current/args/certs.d/docker.io
echo '
server = "docker.io"

[host."https://m.daocloud.io/docker.io"]
  capabilities = ["pull", "resolve"]
  override_path = true
' | sudo tee /var/snap/microk8s/current/args/certs.d/docker.io/hosts.toml

sudo chown -R $USER /var/snap/microk8s/current/args/certs.d
journalctl -u snap.microk8s.daemon-containerd.service -f
sudo systemctl restart snap.microk8s.daemon-containerd.service
sudo tail -f /var/log/syslog |grep 'image'
#sudo docker pull docker.io/calico/cni:v3.25.1
#sudo docker pull m.daocloud.io/docker.io/calico/cni:v3.25.1

#sed -i "s#k8s.gcr.io#registry.cn-hangzhou.aliyuncs.com/google_containers#g" /var/snap/microk8s/current/args/containerd-template.toml
#sudo systemctl restart snap.microk8s.daemon-containerd.service
#cat /var/snap/microk8s/3629/args/containerd.toml

microk8s.ctr --namespace k8s.io image ls
microk8s kubectl get all --all-namespaces
microk8s status --wait-ready
microk8s enable registry

It reports this kind of error: docker.m.daocloud.io clearly returned StatusNotFound, and the fallback to docker.io then hit Forbidden:

Sep 24 14:28:07 minipc microk8s.daemon-containerd[3155217]: time="2024-09-24T14:28:07.871135376+08:00" level=info msg="PullImage \"docker.io/calico/cni:v3.25.1\""
Sep 24 14:28:08 minipc microk8s.daemon-containerd[3155217]: time="2024-09-24T14:28:08.010462692+08:00" level=info msg="trying next host - response was http.StatusNotFound" host=docker.m.daocloud.io
Sep 24 14:28:21 minipc microk8s.daemon-containerd[3155217]: time="2024-09-24T14:28:21.587410000+08:00" level=error msg="PullImage \"docker.io/calico/cni:v3.25.1\" failed" error="failed to pull and unpack image \"docker.io/calico/cni:v3.25.1\": failed to resolve reference \"docker.io/calico/cni:v3.25.1\": unexpected status from HEAD request to https://www.docker.com/v2/calico/cni/manifests/v3.25.1: 403 Forbidden"

So I directly edited /var/snap/microk8s/current/args/containerd-template.toml to add the configuration below (check the result with 'sudo cat /var/snap/microk8s/current/args/containerd.toml'), then ran 'sudo systemctl restart snap.microk8s.daemon-containerd.service', and everything worked.

  [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
        endpoint = ["https://docker.m.daocloud.io","http://hub-mirror.c.163.com"]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."gcr.io"]
        endpoint = ["gcr.m.daocloud.io"]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
        endpoint = ["k8s-gcr.m.daocloud.io"]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."quay.io"]
        endpoint = ["quay.m.daocloud.io"]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.k8s.io"]
        endpoint = ["k8s.m.daocloud.io"]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.elastic.co"]
        endpoint = ["elastic.m.daocloud.io"]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.jujucharms.com"]
        endpoint = ["jujucharms.m.daocloud.io"]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."rocks.canonical.com"]
        endpoint = ["rocks-canonical.m.daocloud.io"]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."harbor.sundayhk.com"]
        endpoint = ["https://harbor.sundayhk.com"]
  #  'plugins."io.containerd.grpc.v1.cri".registry' contains config related to the registry
  #[plugins."io.containerd.grpc.v1.cri".registry]
  #  config_path = "${SNAP_DATA}/args/certs.d"
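
Restarting containerd and pulling one of the previously failing images is a quick way to confirm the mirrors work (just a sanity check):

sudo systemctl restart snap.microk8s.daemon-containerd.service
sudo microk8s.ctr --namespace k8s.io images pull docker.io/calico/cni:v3.25.1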

