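These are internal notes on scaling out our company's cluster, for reference only.
Software and hardware environment
- Alibaba Cloud ECS: 1 deploy node, 3 masters, 3 infra nodes; the rest are compute nodes
- CentOS 7.6
- docker-1.13.1
- OpenShift 3.11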
1. Configure the new node
1.1 SSH to the new node and set its hostname
hostnamectl set-hostname computeXX.yourdomain.local
1.2 Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
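To confirm that the sed edit above really commented out every swap entry, a small check like the following can help. This is only a sketch: active_swap_entries is a hypothetical helper name, not part of any tool, and it assumes the usual fstab layout where the third field is the filesystem type.

```shell
# Hypothetical helper: print any active (uncommented) swap entries
# found in an fstab-style file. Prints nothing when swap is fully disabled.
active_swap_entries() {
    # $1: path to the fstab file (defaults to /etc/fstab)
    awk '!/^[[:space:]]*#/ && $3 == "swap"' "${1:-/etc/fstab}"
}

# Example (run on the new node after step 1.2):
#   active_swap_entries            # expect no output
#   swapon --show                  # expect no output once swapoff -a has run
```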
1.3 Enable SELinux (switch from disabled to permissive)
setenforce 0
sed -i 's/^SELINUX=disabled$/SELINUX=permissive/' /etc/selinux/config
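The sed command above only changes the file when the current value is exactly disabled, so it is worth reading the setting back afterwards. A minimal sketch, assuming the standard SELINUX=<mode> line in the config file; selinux_config_mode is a hypothetical helper name.

```shell
# Hypothetical helper: print the mode configured on the SELINUX= line
# (enforcing, permissive, or disabled). The SELINUXTYPE= line is ignored.
selinux_config_mode() {
    # $1: path to the SELinux config file (defaults to /etc/selinux/config)
    sed -n 's/^SELINUX=\([a-z]*\)$/\1/p' "${1:-/etc/selinux/config}"
}

# Example (run on the new node after step 1.3):
#   selinux_config_mode            # configured mode, applied at next boot
#   getenforce                     # mode the running kernel is using now
```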
1.4 Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
1.5 Install and update packages
yum install -y docker-1.13.1 wget git net-tools bind-utils yum-utils iptables-services bridge-utils bash-completion kexec-tools sos psacct PyYAML python-ipaddress telnet curl lrzsz jq perf strace vim iotop httpd-tools python-passlib patch java-1.8.0-openjdk-headless
yum update -y
systemctl enable NetworkManager
systemctl start NetworkManager
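1.6 Reboot
Note: on Alibaba Cloud ECS, rebooting directly after changing the SELinux setting left the system unbootable (a bug). Running fixfiles onboot first schedules a filesystem relabel for the next boot and avoids the problem.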
fixfiles onboot && reboot
2. Operations on the deploy node
XX is the sequence number of the new node
2.1. Add the new node to /etc/hosts
xxxx.xxxx.xxxx.xxxx computeXX.yourdomain.local computeXX
2.2. Edit /etc/ansible/hosts: make sure new_nodes is listed under [OSEv3:children], then add
[new_nodes]
computeXX.yourdomain.local openshift_node_group_name='node-config-compute'
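If nodes are added often, the inventory edit can be scripted so that repeated runs do not create duplicate entries. The sketch below is one possible approach, not the method used in these notes: add_new_node is a hypothetical function, and it assumes the node group name node-config-compute shown above and an INI-style inventory.

```shell
# Hypothetical helper: add a compute node under [new_nodes] in an
# INI-style Ansible inventory, skipping hosts that already have an entry.
add_new_node() {
    inventory=$1   # path to the inventory, e.g. /etc/ansible/hosts
    host=$2        # FQDN of the new node
    entry="$host openshift_node_group_name='node-config-compute'"

    # Nothing to do if the host already has an active (uncommented) entry.
    if grep -q "^${host}[[:space:]]" "$inventory"; then
        return 0
    fi

    if grep -q '^\[new_nodes\]$' "$inventory"; then
        # Insert the entry directly after the existing [new_nodes] header.
        awk -v entry="$entry" '{ print } $0 == "[new_nodes]" { print entry }' \
            "$inventory" > "$inventory.tmp" && mv "$inventory.tmp" "$inventory"
    else
        # No [new_nodes] section yet: append one at the end of the file.
        printf '\n[new_nodes]\n%s\n' "$entry" >> "$inventory"
    fi
}

# Example:
#   add_new_node /etc/ansible/hosts computeXX.yourdomain.local
```

The helper does not touch [OSEv3:children]; that group membership still has to be checked by hand.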
2.3. Set up passwordless SSH to the new node
ssh-copy-id -i ~/.ssh/id_rsa.pub computeXX.yourdomain.local
2.4. Copy the hosts file to all nodes, including the new one (external readers can skip this step)
ansible all -m copy -a "src=/etc/hosts dest=/etc/hosts owner=root group=root mode=0644"
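This cluster has three masters and three infra nodes, exposed externally through an Alibaba Cloud load balancer. The load balancer has a bug: when a request is balanced back to the node that sent it, the traffic does not get through. Because the copy above overwrites /etc/hosts everywhere, the master and infra nodes must be edited again afterwards so that internal communication bypasses the load balancer and uses each node's own apiserver. On each master, point openshift-internal at that master's own IP in /etc/hosts: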
ssh master1
192.168.1.77 openshift-internal.yourdomain.local
ssh master2
192.168.1.81 openshift-internal.yourdomain.local
ssh master3
192.168.1.83 openshift-internal.yourdomain.local
SSH to infra1, infra2, and infra3 in turn and comment out the registry's internal IP in /etc/hosts so that registry traffic goes over the public network
#192.168.1.97 registry.yourdomain.com
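After the per-node edits it is easy to lose track of what each hosts file currently says. The following sketch reads an entry back from a hosts file while ignoring commented lines; hosts_lookup is a hypothetical helper name, and the example addresses are the ones listed above.

```shell
# Hypothetical helper: print the IP that a hosts file maps a name to.
# Commented-out entries are ignored; prints nothing if there is no match.
hosts_lookup() {
    # $1: hostname to look up, $2: hosts file (defaults to /etc/hosts)
    awk -v name="$1" '
        { sub(/#.*/, "") }   # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Example (on master1, expect 192.168.1.77; on an infra node, expect
# no output for the registry name once its line is commented out):
#   hosts_lookup openshift-internal.yourdomain.local
#   hosts_lookup registry.yourdomain.com
```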
2.5. Restart the DNS service (dnsmasq)
ansible all -m shell -a 'systemctl daemon-reload && systemctl restart dnsmasq'
2.6. Install lxcfs (optional)
ansible new_nodes -m shell -a 'mkdir -p /root/lxcfs-rpm'
ansible new_nodes -m copy -a "src=/root/lxcfs-rpm/lxcfs-2.0.5-3.el7.centos.x86_64.rpm dest=/root/lxcfs-rpm/lxcfs-2.0.5-3.el7.centos.x86_64.rpm owner=root group=root mode=0755"
ansible new_nodes -m shell -a 'yum install -y /root/lxcfs-rpm/lxcfs-2.0.5-3.el7.centos.x86_64.rpm && systemctl enable lxcfs && systemctl start lxcfs'
2.7. Run the scale-up playbook
## Scale up compute nodes
ansible-playbook -vv -i /etc/ansible/hosts /root/openshift-ansible/playbooks/openshift-node/scaleup.yml
## To scale up master nodes instead (the new hosts must also be listed under [new_masters])
ansible-playbook -vv -i /etc/ansible/hosts /root/openshift-ansible/playbooks/openshift-master/scaleup.yml
2.8. After the playbook succeeds, comment out the entry in /etc/ansible/hosts
[new_nodes]
#computeXX.yourdomain.local openshift_node_group_name='node-config-compute'
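A stale, uncommented host left under [new_nodes] would be picked up again by the next scale-up run. As a rough check, the sketch below lists the hosts that are still active in that group; new_nodes_hosts is a hypothetical helper name and assumes an INI-style inventory.

```shell
# Hypothetical helper: list the active (uncommented) hosts in the
# [new_nodes] section of an INI-style Ansible inventory.
new_nodes_hosts() {
    # $1: path to the inventory (defaults to /etc/ansible/hosts)
    awk '
        /^\[/ { in_section = ($0 == "[new_nodes]"); next }   # track the section
        in_section && NF && $1 !~ /^#/ { print $1 }          # skip blanks and comments
    ' "${1:-/etc/ansible/hosts}"
}

# Example: before step 2.7 this should print the new node,
# and after step 2.8 it should print nothing.
#   new_nodes_hosts
```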
2.9. Label and taint the new node with key=value
oc label node computeXX.yourdomain.local key=value
kubectl taint nodes computeXX.yourdomain.local key=value:NoExecute
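A taint has the form key=value:effect, and the effect must be one of NoSchedule, PreferNoSchedule, or NoExecute. NoExecute also evicts pods already running on the node that do not tolerate the taint, so a typo in the effect is worth catching before the command runs. A minimal sketch, where taint_spec is a hypothetical helper and key and value stand in for the real label:

```shell
# Hypothetical helper: build a key=value:effect taint string,
# rejecting anything that is not a valid Kubernetes taint effect.
taint_spec() {
    key=$1
    value=$2
    effect=$3
    case "$effect" in
        NoSchedule|PreferNoSchedule|NoExecute)
            printf '%s=%s:%s\n' "$key" "$value" "$effect"
            ;;
        *)
            echo "unknown taint effect: $effect" >&2
            return 1
            ;;
    esac
}

# Example:
#   kubectl taint nodes computeXX.yourdomain.local "$(taint_spec key value NoExecute)"
#   oc describe node computeXX.yourdomain.local | grep -i taints
```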