Extending an existing k8s cluster into a highly available one

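Background

With so few nodes, I couldn't even drain a single one, so I decided to scale the cluster out (and try a highly available deployment while at it).

The cluster currently has one control-plane node and two worker nodes. Today's goal is to turn the single-node control plane into a highly available one.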

Architecture

Reference documentation

First, we need to understand what the architecture of an HA k8s cluster looks like.

ETCD

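etcd is a distributed database to begin with, built on the Raft protocol with a leader/follower architecture. A newly joined member automatically syncs data from the others. If a follower dies, the cluster simply carries on without it; if the leader dies, Raft elects a new leader.

So, unlike with traditional databases (MySQL, PostgreSQL), there is basically no need to handle data replication yourself. As long as more than half of the cluster members are alive, the cluster keeps serving requests.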
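
To make the "more than half" rule concrete, here is a minimal shell sketch of the arithmetic. The function names `quorum` and `fault_tolerance` are my own and not part of etcd.

```shell
#!/usr/bin/env bash
# Majority needed for a cluster of N members to keep serving requests.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many members may fail while a majority is still alive.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

An even member count tolerates no more failures than the odd count just below it, which is why three control-plane nodes is the usual starting point for a stacked etcd setup.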

apiserver

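The apiserver was designed with high availability in mind from the start, so it is entirely stateless. In other words, multiple apiservers can share the same etcd database and serve requests at the same time.

That makes an HA deployment very convenient: just run several of them. But which one should clients talk to? We need a load balancer (LB) in front of the apiservers. Of course, this LB cannot be provided by the cluster itself; we have to run it separately.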

scheduler & controller-manager

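These two are deployed the same way as the apiserver: just run one on every control-plane node. Unlike the apiserver, they use leader election, so only one instance of each is active at any time while the others stand by.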

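Load balancing

So we need to bring in a load balancer. I'm using haproxy here.

Why: it's battle-tested, so there should be fewer pitfalls, and AI probably knows it well (lol).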

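Steps

Preparing HAProxy

To avoid the awkward "I depend on myself" situation, I spun up a separate machine dedicated to running haproxy.

For a change of flavor, I went straight for Rocky Linux 9.5. (So trendy it hurts.)

Install haproxy: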

dnf install haproxy

Edit the config file:

vi /etc/haproxy/haproxy.cfg

I just used the config the AI gave me:

global
    log stdout format raw local0
    maxconn 4096
    daemon

defaults
    log global
    mode tcp
    option tcplog
    timeout connect 5s
    timeout client  50s
    timeout server  50s
    retries 3

frontend kubernetes-api
    bind 10.21.22.101:6443
    mode tcp
    option tcplog
    default_backend k8s-control-plane

backend k8s-control-plane
    mode tcp
    balance roundrobin
    option tcp-check
    default-server inter 10s downinter 5s rise 3 fall 2 maxconn 256
    server cp1 10.21.22.70:6443 check
    server cp2 10.21.22.72:6443 check
    server cp3 10.21.22.73:6443 check
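
As far as I understand, with `option tcp-check` and no further `tcp-check` rules, the `check` keyword on each server amounts to a plain TCP connect. The sketch below does the same thing by hand from any machine, which is handy for telling "haproxy is broken" apart from "the backend is down". The helper names `probe` and `check_backends` are my own.

```shell
#!/usr/bin/env bash
# Succeeds if a TCP connection to host:port can be opened within 2 seconds.
# Uses bash's built-in /dev/tcp pseudo-device plus coreutils' timeout.
probe() {
  local host=$1 port=$2
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Prints "host:port up" or "host:port down" for every argument.
check_backends() {
  local backend
  for backend in "$@"; do
    if probe "${backend%:*}" "${backend##*:}"; then
      echo "$backend up"
    else
      echo "$backend down"
    fi
  done
}
```

For the backends in the config above, that would be `check_backends 10.21.22.70:6443 10.21.22.72:6443 10.21.22.73:6443`. At this point in the walkthrough only 10.21.22.73 exists, so the other two are expected to show as down.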

Start it:

systemctl enable --now haproxy

Update the server address in the kubeconfig:

- cluster:
    certificate-authority-data: ***
#    server: https://10.21.22.73:6443  old
    server: https://10.21.22.101:6443
  name: k8s-73

Try to access the cluster (k is an alias for kubectl):

k get po

And it fails right away:

E0402 21:09:48.171050    5525 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.21.22.101:6443/api?timeout=32s\": dial tcp 10.21.22.101:6443: connect: connection refused"
E0402 21:09:48.173499    5525 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.21.22.101:6443/api?timeout=32s\": dial tcp 10.21.22.101:6443: connect: connection refused"
E0402 21:09:48.175366    5525 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.21.22.101:6443/api?timeout=32s\": dial tcp 10.21.22.101:6443: connect: connection refused"
E0402 21:09:48.177254    5525 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.21.22.101:6443/api?timeout=32s\": dial tcp 10.21.22.101:6443: connect: connection refused"
E0402 21:09:48.178989    5525 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.21.22.101:6443/api?timeout=32s\": dial tcp 10.21.22.101:6443: connect: connection refused"
The connection to the server 10.21.22.101:6443 was refused - did you specify the right host or port?

There are a couple of pitfalls here.

SELinux blocks haproxy from listening on the port, so I just switched it to permissive mode:

sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config

The firewall blocks inbound requests, so I just turned it off:

sudo systemctl disable --now firewalld
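
Turning both off is fine for a lab. A narrower alternative is sketched below; I have not run it on this machine, so treat it as an assumption. It relies on the `haproxy_connect_any` SELinux boolean, which Red Hat's documentation recommends for letting haproxy bind to non-standard ports, and it opens only port 6443 in firewalld. The `run` and `open_lb_port` helpers are my own, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Print the commands by default; set DRY_RUN=0 to execute them (as root).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Narrower alternative to disabling SELinux and firewalld on the haproxy host.
open_lb_port() {
  local port=${1:-6443}
  # Let haproxy bind and connect to arbitrary ports under enforcing mode.
  run setsebool -P haproxy_connect_any 1
  # Open only the API server port instead of stopping the firewall.
  run firewall-cmd --permanent --add-port="${port}/tcp"
  run firewall-cmd --reload
}

open_lb_port 6443
```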

Then connect again:

k get po
E0402 21:11:23.752300    5677 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.21.22.101:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate is valid for 10.96.0.1, 10.21.22.73, not 10.21.22.101"
E0402 21:11:23.758801    5677 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.21.22.101:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate is valid for 10.96.0.1, 10.21.22.73, not 10.21.22.101"
E0402 21:11:23.764242    5677 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.21.22.101:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate is valid for 10.96.0.1, 10.21.22.73, not 10.21.22.101"
E0402 21:11:23.772137    5677 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.21.22.101:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate is valid for 10.96.0.1, 10.21.22.73, not 10.21.22.101"
E0402 21:11:23.777593    5677 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://10.21.22.101:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate is valid for 10.96.0.1, 10.21.22.73, not 10.21.22.101"
Unable to connect to the server: tls: failed to verify certificate: x509: certificate is valid for 10.96.0.1, 10.21.22.73, not 10.21.22.101

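As you can see, we now get a certificate error, but that also means the request went through haproxy and reached the apiserver.

Ramblings

I actually picked a RHEL-family distro because, while researching, I came across Red Hat's official material on load balancing, which uses haproxy. I figured a feature backed by Red Hat itself shouldn't have many pitfalls, so I wanted to see what using a commercial product feels like.

The Red Hat family has a really handy admin panel: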

dnf install cockpit

(When will the Debian/Ubuntu family get a similar management tool?)

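Re-issuing the certificate

Since the client rejects the apiserver certificate when connecting through the haproxy address, the fix is to re-issue the certificate with the haproxy IP added.

According to the official documentation, the certSANs option lets us add extra addresses as Subject Alternative Names (SANs) of the certificate. Once the haproxy address is listed there, clients will accept connections made through haproxy.

As we know, the kubeadm configuration is stored in the cluster, in the kubeadm-config ConfigMap in the kube-system namespace, so we can add the setting right there: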

# apiServer: {}   <- original value, replaced by the apiServer block below
apiVersion: kubeadm.k8s.io/v1beta4
 ......
apiServer:
  certSANs:
    - "10.21.22.101"

After the change, use kubeadm on the control-plane node to regenerate the certificate, then restart the apiserver:

kubeadm certs renew apiserver
crictl ps | grep apiserver # find the container ID
crictl stop ***
crictl start ***

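Unfortunately, this approach doesn't work: the regenerated certificate still carries only the previous SANs. I suspected that kubeadm reuses the SANs of the existing certificate rather than the ones in the configuration.

The help text of the command confirms it: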

kubeadm certs renew apiserver -h

Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, there is no need to resupply them.

Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as alternative it is possible to use K8s certificate API for certificate renewal, or as a last option, to generate a CSR request.

After renewal, in order to make changes effective, is required to restart control-plane components and eventually re-distribute the renewed certificate in case the file is used elsewhere.

It states explicitly that the existing SANs are reused.

So instead, we use the kubeadm init command (the existing certificate has to be deleted first):

rm /etc/kubernetes/pki/apiserver.{crt,key}  # kubeadm's default PKI directory
kubeadm init phase certs apiserver \
    --apiserver-cert-extra-sans "10.21.22.101" -v8

This way the new SAN gets added. The apiserver should pick up the change on its own, and after a few seconds requests succeed.
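
To confirm the new SAN really made it into the certificate, the `openssl x509 -noout -text` output can be checked for the address. A minimal sketch, with `san_has_ip` being a helper name of my own:

```shell
#!/usr/bin/env bash
# Reads `openssl x509 -noout -text` output on stdin and succeeds if the
# given IP appears as an "IP Address:" entry (whole-word match, so
# 10.21.22.10 does not match 10.21.22.101).
san_has_ip() {
  local ip=$1
  grep -Fqw -- "IP Address:${ip}"
}

# Typical use on the control-plane node (path is kubeadm's default):
#   openssl x509 -noout -text -in /etc/kubernetes/pki/apiserver.crt \
#     | san_has_ip 10.21.22.101 && echo "SAN present"
```

Running the same check against the certificate actually served through haproxy (for example via `openssl s_client -connect 10.21.22.101:6443`) also tells you whether the apiserver has reloaded the new file yet.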

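Uploading the certificates

The cluster has a few core certificates, mainly the cluster CA, the etcd CA and so on, that every control-plane node needs. So we upload them to the cluster in advance, making it easy for new control-plane nodes to download them later.

Run the following command on the original control-plane node: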

root@ubuntu-73:~# kubeadm init phase upload-certs --upload-certs
W0402 22:54:48.494770 1198890 version.go:109] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get "https://cdn.dl.k8s.io/release/stable-1.txt": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
W0402 22:54:48.494866 1198890 version.go:110] falling back to the local client version: v1.32.3
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
e6add5c89b36d0bc9a149b0b0d6f146ddc88898c909975027ac05ddaff32cecd

This gives us a key, which we'll need later.
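
The certificate key is a hex-encoded 32-byte AES key, so it is always 64 hexadecimal characters. Since it is easy to truncate when copying from a terminal, a quick format check can save a failed join. The helper name `is_certificate_key` is my own.

```shell
#!/usr/bin/env bash
# Succeeds if the argument looks like a kubeadm certificate key:
# exactly 64 hexadecimal characters (a hex-encoded 32-byte key).
is_certificate_key() {
  [[ $1 =~ ^[0-9a-fA-F]{64}$ ]]
}

if is_certificate_key "e6add5c89b36d0bc9a149b0b0d6f146ddc88898c909975027ac05ddaff32cecd"; then
  echo "key format ok"
fi
```

Per the kubeadm documentation, the uploaded kubeadm-certs Secret and its key expire after two hours, so if the join happens later than that, the upload-certs command above has to be run again to get a fresh key.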

Generating the join command

Running this command generates the join command that new nodes use to join the cluster:

root@ubuntu-73:~# kubeadm token create --print-join-command
kubeadm join 10.21.22.73:6443 --token xlz6i0.cgi2ksvojfr223ta --discovery-token-ca-cert-hash sha256:e216a29ab03ff9852c374ebd1181f0968a50a976d5317b8dcbe4b6bec377a26c

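Changing the control-plane address

We also need to tell the other worker nodes: "there are multiple control planes now, please connect through the load balancer". That takes one more change to the kubeadm configuration.

In the same kubeadm configuration as before: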

apiVersion: kubeadm.k8s.io/v1beta4

controlPlaneEndpoint: 10.21.22.101:6443 # add this line

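Add controlPlaneEndpoint and set it to the load balancer's address.

Adding new control-plane nodes

On top of the join command above, point the endpoint at the load balancer and add --control-plane and --certificate-key, so that the new node joins as a control plane and downloads the certificates: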

kubeadm join 10.21.22.101:6443 \
  --token xlz6i0.cgi2ksvojfr223ta \
  --discovery-token-ca-cert-hash sha256:e216a29ab03ff9852c374ebd1181f0968a50a976d5317b8dcbe4b6bec377a26c \
  --control-plane \
  -v 8 \
  --certificate-key e6add5c89b36d0bc9a149b0b0d6f146ddc88898c909975027ac05ddaff32cecd
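
Assembling this command by hand is error-prone, so here is a small sketch that derives it from the worker join command. `control_plane_join` is a helper name of my own, and it assumes the input has exactly the shape printed by `kubeadm token create --print-join-command`.

```shell
#!/usr/bin/env bash
# Turn a worker join command into a control-plane join command that
# goes through the load balancer.
control_plane_join() {
  local worker_join=$1 endpoint=$2 cert_key=$3
  # Drop the leading "kubeadm join " and the old endpoint that follows it.
  local rest=${worker_join#kubeadm join }
  rest=${rest#* }
  # Re-assemble with the LB endpoint and the control-plane specific flags.
  echo "kubeadm join ${endpoint} ${rest} --control-plane --certificate-key ${cert_key}"
}
```

Usage: `control_plane_join "$(kubeadm token create --print-join-command)" 10.21.22.101:6443 <certificate-key>`. If I read the kubeadm reference correctly, `kubeadm token create --print-join-command --certificate-key <key>` can also print the control-plane flags directly, though the endpoint still has to be checked.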

Just like when the first control plane was created, it prints a summary:

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

	mkdir -p $HOME/.kube
	sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
	sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.

Looking at the node list again, the new control plane has been added:

k get no
NAME        STATUS   ROLES           AGE     VERSION
ubuntu-70   Ready    control-plane   14m     v1.31.6
ubuntu-73   Ready    control-plane   27d     v1.32.3
ubuntu-74   Ready    <none>          27d     v1.31.7
ubuntu-75   Ready    <none>          27d     v1.32.3
ubuntu-81   Ready    <none>          5h58m   v1.31.6

Adding worker nodes

To add worker nodes, just run the output of the kubeadm token create command above on the other machines.

Acceptance

Finally, we have a multi-node, highly available cluster:

➜  ~ k get no
NAME        STATUS     ROLES           AGE     VERSION
ubuntu-70   Ready      control-plane   46m     v1.31.6
ubuntu-72   Ready      control-plane   9m43s   v1.31.6
ubuntu-73   Ready      control-plane   27d     v1.32.3
ubuntu-74   Ready      <none>          27d     v1.31.7
ubuntu-75   Ready      <none>          27d     v1.32.3
ubuntu-81   Ready      <none>          6h30m   v1.31.6
ubuntu-82   Ready      <none>          6m30s   v1.31.6
ubuntu-83   Ready      <none>          6m40s   v1.31.6
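
As a quick acceptance check, the node list can be reduced to the number of control-plane nodes. With stacked etcd each of them also runs an etcd member, so that count decides how many control-plane failures the cluster survives. A minimal sketch, with `count_control_planes` being a helper name of my own:

```shell
#!/usr/bin/env bash
# Reads `kubectl get nodes` output on stdin and prints how many nodes
# carry the control-plane role (third column, first line skipped).
count_control_planes() {
  awk 'NR > 1 && $3 ~ /control-plane/ { n++ } END { print n + 0 }'
}

# Typical use:
#   n=$(kubectl get nodes | count_control_planes)
#   echo "control planes: $n, tolerated failures: $(( (n - 1) / 2 ))"
```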

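Summary

A very pleasant bit of tinkering. Changing the certificate SANs took a small detour, but everything else went smoothly. I have to say, the terraform setup I built earlier for automatically spinning up VMs is a real joy to use 🥰. This post covered extending an existing cluster into a highly available deployment; next time I'll write about deploying HA from the very beginning, which will probably be even simpler.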