1. Remove the master1 node
1) Delete the master1 node
Taking one of the three masters offline leaves two, which still works: a 3-member etcd cluster needs 2 members for quorum, so the remaining pair can keep serving, but it cannot tolerate another failure. Running this way for a day or two is fine.
kubectl drain paas-m-k8s-master-1 --delete-local-data --force --ignore-daemonsets
kubectl delete node paas-m-k8s-master-1
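To confirm the node is really gone before touching etcd, list the nodes; paas-m-k8s-master-1 should no longer appear. (On newer kubectl releases, --delete-local-data has been renamed --delete-emptydir-data.)
kubectl get nodes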
2) Clean up the etcd data
a. Exec into the etcd container
kubectl -n kube-system exec -it etcd-paas-m-k8s-master-2 -- /bin/sh
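Depending on the etcd version in the image, you may need to force the v3 API before running etcdctl (etcd 3.4+ already defaults to v3):
export ETCDCTL_API=3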
b. View the member list
etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list
c. Remove the already-deleted master1 (the member ID comes from the member list output above)
etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member remove 7eab7c23b19f6778
2. Rejoin master1 to the cluster
1) Reset master1
kubeadm reset
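Note that kubeadm reset does not clean up iptables/IPVS rules, the CNI configuration, or old kubeconfig files (its own output warns about this); if master1 misbehaves after rejoining, clear them manually too:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear   # only if kube-proxy runs in IPVS mode
rm -rf /etc/cni/net.d
rm -f $HOME/.kube/config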
2) Configure resolution for apiserver.cluster.local
Edit /etc/hosts and add:
<IP of a healthy master> apiserver.cluster.local
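For example, assuming 192.168.0.12 is master2's address (a placeholder; substitute your own):
192.168.0.12 apiserver.cluster.local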
3) Generate the join command on master2. upload-certs prints the certificate key, and token create prints the base join command; together they make up the control-plane join command used in step 4.
kubeadm init phase upload-certs --upload-certs
kubeadm token create --print-join-command
4) Join master1 to the cluster
kubeadm join apiserver.cluster.local:6443 \
--token yubedv.0rg185no5jgqwn07 \
--discovery-token-ca-cert-hash sha256:be87c7200420224f1f8d439a5f058de7be88282eec1fc833b346b38c62ddf482 \
--control-plane --certificate-key 23d8e27402b4f982d9ec894c37b1a3271c9f27bef2e653ca471426cc57025324
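After the join completes, verify from a healthy master that master1 is back as a control-plane node and that all three etcd pods are running (the member list command from section 1 should show three started members again):
kubectl get nodes
kubectl -n kube-system get pods -l component=etcd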
3. Troubleshooting
1) apiserver.cluster.local cannot be resolved
Fix:
Add the entry directly to /etc/hosts:
<IP of a healthy master> apiserver.cluster.local
2) kubelet port already in use
Fix:
kubeadm join starts its own kubelet, so a kubelet left over from the previous installation will already hold the port.
Run kubeadm reset to stop it and reset the configuration, then join again.
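To see which process is actually holding the kubelet port (10250 by default) before resetting:
ss -lntp | grep 10250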
3) etcd data directory is not empty
Fix:
Delete the leftover data directory:
rm -rf /var/lib/etcd
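If you would rather keep a safety copy than delete outright (an optional precaution, not part of the original procedure):
mv /var/lib/etcd /var/lib/etcd.bak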
4) etcd health check fails
Check the member list for a stale entry left over from the old master1:
etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list
Fix:
Remove the stale member (same command as in section 1):
etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member remove 7eab7c23b19f6778
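With the stale member removed and master1 rejoined, a cluster-wide health check (supported on etcd 3.4+, which queries every endpoint in the member list) should report all three members healthy:
etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key endpoint health --cluster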