Control Node Replacement Procedure
Background
Even a highly available 3-node cloudpods over k8s deployment can lose any one of its nodes in production. Ordinary faults, such as replacing memory or a CPU, can be handled by shutting the node down temporarily and rebooting it once the repair is done.
If a disk fails and its data cannot be recovered, however, the node has to be removed from the cluster and a new node joined in its place. The steps and caveats are described below.
Test Environment
- k8s_vip: 10.127.100.102
  - served by keepalived static pods running on the 3 master nodes
  - started directly by kubelet, manifest path: /etc/kubernetes/manifests/keepalived.yaml (a quick check is sketched after the node listing below)
- primary_master_node, the first control node initialized: ip: 10.127.100.234
- master_node_1, the second control node to join: ip: 10.127.100.229
- master_node_2, the third control node to join: ip: 10.127.100.226
- database: deployed outside the cluster, not on any of the 3 nodes
- CSI: local-path
  - the local-path CSI pins a pod to a specific node, which needs special attention here
  - if local-path PVCs are bound to the failed node, the pods using those PVCs cannot move to another Ready node; such pods are stateful pods (see the sketch after the minio listing below)
  - all local-path PVCs can be listed with: kubectl get pvc -A | grep local-path
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
lzx-ocboot-ha-test Ready master 100m v1.15.12 10.127.100.234 <none> CentOS Linux 7 (Core) 3.10.0-1160.6.1.el7.yn20201125.x86_64 docker://20.10.5
lzx-ocboot-ha-test-2 Ready master 61m v1.15.12 10.127.100.229 <none> CentOS Linux 7 (Core) 3.10.0-1160.6.1.el7.yn20201125.x86_64 docker://20.10.5
lzx-ocboot-ha-test-3 Ready master 60m v1.15.12 10.127.100.226 <none> CentOS Linux 7 (Core) 3.10.0-1160.6.1.el7.yn20201125.x86_64 docker://20.10.5
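To check where the VIP currently sits, here is a minimal sketch; it assumes the keepalived static pods are created in the kube-system namespace (the actual namespace comes from the manifest above):
# The VIP should be held by exactly one of the 3 masters
$ ip addr show | grep 10.127.100.102
# The static pods show up as kubelet mirror pods
$ kubectl get pods -n kube-system -o wide | grep keepalived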
- minio:
  - in the HA deployment, minio serves as glance's backend storage
  - deployed as a statefulset
  - uses the local-path CSI for its backing volumes
$ kubectl get pods -n onecloud-minio -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
minio-0 1/1 Running 0 46m 10.40.99.205 lzx-ocboot-ha-test <none> <none>
minio-1 1/1 Running 0 46m 10.40.158.215 lzx-ocboot-ha-test-3 <none> <none>
minio-2 1/1 Running 0 46m 10.40.159.22 lzx-ocboot-ha-test-2 <none> <none>
minio-3 1/1 Running 0 46m 10.40.99.206 lzx-ocboot-ha-test <none> <none>
$ kubectl get pvc -n onecloud-minio
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
export-minio-0 Bound pvc-297ed5e5-66c8-4855-8031-c65a0ccfa4d0 1Ti RWO local-path 46m
export-minio-1 Bound pvc-4e8fe486-5b23-44a0-876c-df36d134957f 1Ti RWO local-path 46m
export-minio-2 Bound pvc-389b3c61-6000-4757-9949-db53e4e53776 1Ti RWO local-path 46m
export-minio-3 Bound pvc-3dd54509-7745-47dd-84ea-fbacfe1e2f5b 1Ti RWO local-path 46m
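Since the local-path provisioner pins each PV to a single node via spec.nodeAffinity, the node owning any of the volumes above can be read back from the PV object. A minimal sketch, with the PV name taken from the VOLUME column of the PVC listing:
# Prints the node the PV is pinned to (for export-minio-0 this should be the node running minio-0)
$ kubectl get pv pvc-297ed5e5-66c8-4855-8031-c65a0ccfa4d0 -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]}'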
Test
Goal
Take primary_master_node (10.127.100.234) offline and join a new node to replace it.
Steps
1. Determine which stateful pods and PVCs are running on the node.
# Find the node's name in the k8s cluster from its IP
$ kubectl get nodes -o wide | grep 10.127.100.234
lzx-ocboot-ha-test Ready master 4h15m v1.15.12 10.127.100.234 <none> CentOS Linux 7 (Core) 3.10.0-1160.6.1.el7.yn20201125.x86_64 docker://20.10.5
# List all PVCs backed by local-path
$ kubectl get pvc -A | grep local-path
# export-minio-x in the onecloud-minio namespace stores glance's images and is a critical component
onecloud-minio export-minio-0 Bound pvc-297ed5e5-66c8-4855-8031-c65a0ccfa4d0 1Ti RWO local-path 3h11m
onecloud-minio export-minio-1 Bound pvc-4e8fe486-5b23-44a0-876c-df36d134957f 1Ti RWO local-path 3h11m
onecloud-minio export-minio-2 Bound pvc-389b3c61-6000-4757-9949-db53e4e53776 1Ti RWO local-path 3h11m
onecloud-minio export-minio-3 Bound pvc-3dd54509-7745-47dd-84ea-fbacfe1e2f5b 1Ti RWO local-path 3h11m
# export-monitor-minio-x in the onecloud-monitoring namespace stores service logs and is not a critical component
onecloud-monitoring export-monitor-minio-0 Bound pvc-b885605f-b5ca-40ff-b968-4d95b03e8bb8 1Ti RWO local-path 3h8m
onecloud-monitoring export-monitor-minio-1 Bound pvc-520a8262-5dad-48aa-9a0e-0e25f850faad 1Ti RWO local-path 3h8m
onecloud-monitoring export-monitor-minio-2 Bound pvc-6de1ff0f-3465-4a51-8124-880f1b3c6d7a 1Ti RWO local-path 3h8m
onecloud-monitoring export-monitor-minio-3 Bound pvc-364652ca-496e-4b29-82ea-ec7e768aa8f5 1Ti RWO local-path 3h8m
# The PVCs below in the onecloud namespace are all used by system services
# default-baremetal-agent stores data for baremetal (physical machine) management; if baremetal-agent is not enabled it can be ignored, and by default it stays Pending, waiting to be bound
onecloud default-baremetal-agent Pending local-path 3h35m
# default-esxi-agent stores the esxi-agent service's local data
onecloud default-esxi-agent Bound pvc-b32adfcc-96e7-45e4-b8bd-b5318c954dca 30G RWO local-path 3h35m
# default-glance stores glance's images; in an HA deployment the default-glance deployment does not mount this PVC and instead stores images in the minio s3 under onecloud-minio, so it can be ignored
onecloud default-glance Bound pvc-e6ee398e-2d84-46cf-9401-e94f438d87cd 100G RWO local-path 3h36m
# default-influxdb stores the platform's monitoring data; this data can tolerate loss, so if its node dies the PVC can be deleted and recreated
onecloud default-influxdb Bound pvc-871b9441-c56f-4bb4-8b56-868e1df1a438 20G RWO local-path 3h35m
# List the pods running on this node
$ kubectl get pods -A -o wide | grep onecloud | grep 'lzx-ocboot-ha-test '
onecloud-minio minio-0 1/1 Running 0 3h10m 10.40.99.205 lzx-ocboot-ha-test <none> <none>
onecloud-minio minio-3 1/1 Running 0 3h10m 10.40.99.206 lzx-ocboot-ha-test <none> <none>
onecloud-monitoring monitor-kube-state-metrics-6c97499758-w69tz 1/1 Running 0 3h6m 10.40.99.214 lzx-ocboot-ha-test <none> <none>
onecloud-monitoring monitor-loki-0 1/1 Running 0 3h6m 10.40.99.213 lzx-ocboot-ha-test <none> <none>
onecloud-monitoring monitor-minio-0 1/1 Running 0 3h7m 10.40.99.211 lzx-ocboot-ha-test <none> <none>
onecloud-monitoring monitor-minio-3 1/1 Running 0 3h7m 10.40.99.212 lzx-ocboot-ha-test <none> <none>
onecloud-monitoring monitor-monitor-stack-operator-54d8c46577-qknws 1/1 Running 0 3h6m 10.40.99.216 lzx-ocboot-ha-test <none> <none>
onecloud-monitoring monitor-promtail-4mx2s 1/1 Running 0 3h6m 10.40.99.215 lzx-ocboot-ha-test <none> <none>
onecloud default-etcd-7brtldv78z 1/1 Running 0 3h10m 10.40.99.207 lzx-ocboot-ha-test <none> <none>
onecloud default-glance-6fd697b7b9-nbk9t 1/1 Running 0 3h7m 10.40.99.208 lzx-ocboot-ha-test <none> <none>
onecloud default-host-5rmg8 3/3 Running 7 3h34m 10.127.100.234 lzx-ocboot-ha-test <none> <none>
onecloud default-host-deployer-sf494 1/1 Running 7 3h34m 10.40.99.202 lzx-ocboot-ha-test <none> <none>
onecloud default-host-image-s6pwq 1/1 Running 2 3h34m 10.127.100.234 lzx-ocboot-ha-test <none> <none>
onecloud default-region-dns-2hcpv 1/1 Running 1 3h34m 10.127.100.234 lzx-ocboot-ha-test <none> <none>
onecloud default-telegraf-5jn4x 2/2 Running 0 3h34m 10.127.100.234 lzx-ocboot-ha-test <none> <none>
onecloud default-influxdb-6bqgq 1/1 Running 0 3h34m 10.127.99.218 lzx-ocboot-ha-test <none> <none>
From the output above, the stateful pods on primary_master_node are onecloud-minio/minio-0, onecloud-minio/minio-3, onecloud/default-influxdb, onecloud-monitoring/monitor-minio-0, and onecloud-monitoring/monitor-minio-3.
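Instead of grepping the wide listing, the same per-node inventory can be pulled with a field selector; a sketch:
# List every pod scheduled on the failed node in one command
$ kubectl get pods -A -o wide --field-selector spec.nodeName=lzx-ocboot-ha-test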
2. Next, power off primary_master_node and kick it out of the cluster.
# Log in to one of the other two master nodes, e.g. 10.127.100.229
$ ssh root@10.127.100.229
# Point KUBECONFIG at the admin kubeconfig
[root@lzx-ocboot-ha-test-2 ~]$ export KUBECONFIG=/etc/kubernetes/admin.conf
# Check node status: primary_master_node has become NotReady
[root@lzx-ocboot-ha-test-2 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
lzx-ocboot-ha-test NotReady master 4h37m v1.15.12
lzx-ocboot-ha-test-2 Ready master 3h58m v1.15.12
lzx-ocboot-ha-test-3 Ready master 3h57m v1.15.12
# Drain the primary_master_node node lzx-ocboot-ha-test, evicting its pods
[root@lzx-ocboot-ha-test-2 ~]$ kubectl drain --delete-local-data --ignore-daemonsets lzx-ocboot-ha-test
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-fdzql, kube-system/kube-proxy-nfxvd, kube-system/traefik-ingress-controller-jms9v, onecloud-monitoring/monitor-promtail-4mx2s, onecloud/default-host-5rmg8, onecloud/default-host-deployer-sf494, onecloud/default-host-image-s6pwq, onecloud/default-region-dns-2hcpv, onecloud/default-telegraf-5jn4x
evicting pod "minio-0"
evicting pod "monitor-minio-3"
evicting pod "default-etcd-7brtldv78z"
evicting pod "monitor-kube-state-metrics-6c97499758-w69tz"
evicting pod "default-influxdb-85945647d5-6bqgq"
evicting pod "default-glance-6fd697b7b9-nbk9t"
evicting pod "minio-3"
evicting pod "monitor-monitor-stack-operator-54d8c46577-qknws"
evicting pod "monitor-loki-0"
evicting pod "monitor-minio-0"
# The command hangs because primary_master_node is already powered off and its pods cannot be evicted; press Ctrl-C to cancel it
^C
# Delete the primary_master_node node object directly with kubectl delete node
$ kubectl delete node lzx-ocboot-ha-test
# All pods now stuck in Pending are exactly the stateful pods that used to run on primary_master_node:
# they use local-path PVCs, which are pinned to that node and still exist in the cluster
[root@lzx-ocboot-ha-test-2 ~]$ kubectl get pods -A | grep Pending
onecloud-minio minio-0 0/1 Pending 0 61s
onecloud-minio minio-3 0/1 Pending 0 61s
onecloud-monitoring monitor-minio-0 0/1 Pending 0 61s
onecloud-monitoring monitor-minio-3 0/1 Pending 0 61s
onecloud default-influxdb-85945647d5-x5sv5 0/1 Pending 0 10m
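Why a given pod is stuck can be read from its scheduling events; a sketch (the exact event text varies by version, but it should report a volume node affinity conflict for the pinned local-path PVC):
# The Events section at the bottom explains the scheduling failure
$ kubectl describe pod -n onecloud-minio minio-0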
3. Remove the old primary_master_node's etcd member.
# List the etcd pods under the kube-system namespace
[root@lzx-ocboot-ha-test-2 ~]$ kubectl get pods -n kube-system | grep etcd
etcd-lzx-ocboot-ha-test-2 1/1 Running 1 4h52m
etcd-lzx-ocboot-ha-test-3 1/1 Running 1 4h51m
# Exec into the etcd-lzx-ocboot-ha-test-2 etcd pod
[root@lzx-ocboot-ha-test-2 ~]# kubectl exec -ti -n kube-system etcd-lzx-ocboot-ha-test-2 sh
# List the members with etcdctl
$ etcdctl --endpoints https://127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
# The old primary_master_node member, lzx-ocboot-ha-test, is still in the etcd cluster
14da7b338b44eee0, started, lzx-ocboot-ha-test, https://10.127.100.234:2380, https://10.127.100.234:2379, false
454ae6f931376261, started, lzx-ocboot-ha-test-2, https://10.127.100.229:2380, https://10.127.100.229:2379, false
5afd19948b9009f6, started, lzx-ocboot-ha-test-3, https://10.127.100.226:2380, https://10.127.100.226:2379, false
# Remove the lzx-ocboot-ha-test member
$ etcdctl --endpoints https://127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 14da7b338b44eee0
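# Rerun member list to verify the removal; only the two surviving members should remain
$ etcdctl --endpoints https://127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
454ae6f931376261, started, lzx-ocboot-ha-test-2, https://10.127.100.229:2380, https://10.127.100.229:2379, false
5afd19948b9009f6, started, lzx-ocboot-ha-test-3, https://10.127.100.226:2380, https://10.127.100.226:2379, false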