
High Availability Installation

Use the ocboot deployment tool to install the Cloudpods services in high availability mode, which better meets the needs of production environments.

Environment Preparation

  • Operating System: Supported distributions vary depending on CPU architecture.
  • The operating system must be a clean installation, as the deployment tool builds the specified version of the Kubernetes cluster from scratch. Ensure that the system does not already have container management tools such as Kubernetes or Docker installed; otherwise, conflicts may cause the installation to fail.
  • Minimum system requirements: CPU 4 cores, 8GiB memory, 100GiB storage.
  • Virtual machines and services both store their data under the /opt directory, so it is recommended to mount a dedicated partition at /opt.
    • For example, create a separate partition /dev/sdb1, format it as ext4, and mount it at /opt via /etc/fstab.
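The example above can be sketched as follows. This is a sketch only: /dev/sdb1 is an assumed device name, and mkfs.ext4 destroys any data on the partition, so adjust it to your environment before running.

```shell
# Dedicate an assumed spare partition /dev/sdb1 to /opt.
# WARNING: mkfs.ext4 erases the partition; double-check the device name.
mkfs.ext4 /dev/sdb1                                    # format as ext4
mkdir -p /opt                                          # ensure the mount point exists
echo '/dev/sdb1 /opt ext4 defaults 0 2' >> /etc/fstab  # persist across reboots
mount -a                                               # mount everything in fstab now
df -h /opt                                             # verify /opt is on the new partition
```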

Assuming 3 CentOS 7 machines and 1 MariaDB/MySQL machine are ready, the deployment is planned as follows:

| role | ip | interface | note |
| ---- | -- | --------- | ---- |
| k8s primary | 10.127.90.101 | eth0 | the first control node |
| k8s master 1 | 10.127.90.102 | eth0 | the second control node |
| k8s master 2 | 10.127.90.103 | eth0 | the third control node |
| k8s VIP | 10.127.190.10 | - | the VIP used by keepalived, which will be bound to the first of the three control nodes |
| DB | 10.127.190.11 | - | independently deployed database node, pswd="0neC1oudDB#", port=3306 |

The deployment of the DB is currently not managed by the ocboot deployment tool and must be done manually in advance. It is recommended to use the MariaDB database and to avoid MySQL 5.6 and earlier versions, which are affected by the "Index column size too large. The maximum column size is 767 bytes." bug. The default MariaDB version for each distribution is as follows:

  • CentOS 7.6-7.9 Minimal(X86_64 and ARM64) installs MariaDB 5.5.68 by default
  • Debian 10-11 (X86_64 and ARM64) installs MariaDB 10.3.1 by default
  • Kylin V10 sp2 (X86_64 and ARM64) installs MariaDB 10.3.4 by default

In addition, for deploying a highly available database you can refer to the document: Deploy Mariadb HA environment.
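Whichever route you take, make sure the database accepts remote connections from the cluster nodes with the planned credentials before starting the deployment. A minimal sketch on the DB node (the '%' host pattern grants access from anywhere; narrow it for production):

```shell
# Run on the DB node (10.127.190.11): allow root to log in remotely with the
# planned password, then reload the privilege tables.
mysql -u root -p <<'SQL'
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '0neC1oudDB#' WITH GRANT OPTION;
FLUSH PRIVILEGES;
SQL

# From any k8s node, confirm the database is reachable:
mysql -h 10.127.190.11 -P 3306 -u root -p'0neC1oudDB#' -e 'SELECT VERSION();'
```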

NTP consistency of high availability cluster

Before the installation, please ensure that the clocks of all nodes to be deployed are consistent; otherwise, the certificate issuance step will fail.

For an online installation, you can use the following commands to ensure that every server in the cluster is synchronized with an Internet time source:

# You can choose a more convenient and accessible time server.
# If the ntpdate command is not available, install it with the OS package manager.
# For example, on CentOS: yum install -y ntp && systemctl enable ntpd --now
$ ntpdate -u edu.ntp.org.cn && hwclock -w && ntpdate -u -q edu.ntp.org.cn
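On distributions that no longer package ntpdate, chrony can serve the same purpose. A sketch, assuming a yum-based system:

```shell
# Install and enable chrony, then force an immediate synchronization.
yum install -y chrony              # on Debian-based systems: apt-get install -y chrony
systemctl enable chronyd --now     # start the daemon and enable it at boot
chronyc makestep                   # step the clock immediately if it is far off
chronyc tracking                   # verify the node is synchronized
```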

Getting Started with Installation

Download ocboot

# Use git to clone the ocboot deployment tool locally
$ git clone -b release/3.10 https://github.com/yunionio/ocboot && cd ./ocboot

Write Deployment Configuration

# Set shell environment variables
DB_IP="10.127.190.11"
DB_PORT=3306
DB_PSWD="0neC1oudDB#"
DB_USER=root

K8S_VIP=10.127.190.10
PRIMARY_INTERFACE="eth0"
PRIMARY_IP=10.127.90.101

MASTER_1_INTERFACE="eth0"
MASTER_1_IP=10.127.90.102
MASTER_2_INTERFACE="eth0"
MASTER_2_IP=10.127.90.103

# Generate the yaml deployment configuration file
cat > config-k8s-ha.yml <<EOF
# primary_master_node is the node that runs k8s and Cloudpods services
primary_master_node:
  # ssh login IP
  hostname: $PRIMARY_IP
  # Don't use local login
  use_local: false
  # ssh login user
  user: root
  # cloudpods version
  onecloud_version: "v3.10.12"
  # mariadb connection address
  db_host: "$DB_IP"
  # mariadb user
  db_user: "$DB_USER"
  # mariadb password
  db_password: "$DB_PSWD"
  # mariadb port
  db_port: "$DB_PORT"
  # The address the node services listen on; with multiple NICs you can specify the address of the desired NIC
  node_ip: "$PRIMARY_IP"
  # Default NIC selection rule for the Kubernetes calico plugin
  ip_autodetection_method: "can-reach=$PRIMARY_IP"
  # IP of the K8s control plane, corresponding to the VIP that keepalived listens on
  controlplane_host: $K8S_VIP
  # K8s control plane apiserver listening port
  controlplane_port: "6443"
  # This node acts as a Cloudpods private cloud compute node; set to false if you don't want the control node to double as a compute node
  as_host: true
  # Allow VMs to act as Cloudpods built-in private cloud compute nodes (default is false). When turning this on, make sure as_host is true
  as_host_on_vm: true
  # Product version, one of [Edge, CMP, FullStack]. FullStack installs Converged Cloud, CMP installs the Multi-Cloud Management Edition, Edge installs Private Cloud
  product_version: 'Edge'
  # If the machine to be deployed is not in mainland China, you can use dockerhub's mirror repository: docker.io/yunion
  image_repository: registry.cn-beijing.aliyuncs.com/yunion
  # Enable high availability mode
  high_availability: true
  # Use minio as the VM image backend store
  enable_minio: true
  ha_using_local_registry: false
  # NIC corresponding to the default bridge br0 on the compute node
  host_networks: "$PRIMARY_INTERFACE/br0/$PRIMARY_IP"

master_nodes:
  # The K8s VIP of the control plane to join
  controlplane_host: $K8S_VIP
  # The K8s apiserver port of the control plane to join
  controlplane_port: "6443"
  # Act as a K8s and Cloudpods control node
  as_controller: true
  # This node acts as a Cloudpods private cloud compute node; set to false if you don't want the control node to double as a compute node
  as_host: true
  # Allow VMs to act as Cloudpods built-in private cloud compute nodes (default is false). When turning this on, make sure as_host is true
  as_host_on_vm: true
  # Synchronize ntp time from the primary node
  ntpd_server: "$PRIMARY_IP"
  # Enable high availability mode
  high_availability: true
  hosts:
  - user: root
    hostname: "$MASTER_1_IP"
    # NIC corresponding to the default bridge br0 on the compute node
    host_networks: "$MASTER_1_INTERFACE/br0/$MASTER_1_IP"
  - user: root
    hostname: "$MASTER_2_IP"
    # NIC corresponding to the default bridge br0 on the compute node
    host_networks: "$MASTER_2_INTERFACE/br0/$MASTER_2_IP"
EOF
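Before handing the file to ocboot, you can optionally sanity-check that it is valid YAML and contains the expected top-level sections. This check assumes python3 with PyYAML is available on the machine:

```shell
# Validate the generated deployment configuration.
python3 - <<'PYEOF'
import yaml

with open("config-k8s-ha.yml") as f:
    cfg = yaml.safe_load(f)

# The installer expects these two sections at the top level.
for section in ("primary_master_node", "master_nodes"):
    assert section in cfg, "missing section: %s" % section

print("config-k8s-ha.yml looks OK:", sorted(cfg))
PYEOF
```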

Start Deployment

$ ./ocboot.py install ./config-k8s-ha.yml

After the deployment completes, open https://10.127.190.10 (the VIP) in a web browser and log in to the frontend with username admin and password admin@123.

Additionally, after deployment is complete, you can add nodes to the existing cluster; refer to the document: Add Compute Nodes. Note that when adding nodes, the control node IP must not be the VIP: use the actual IP of the first control node, because the VIP may have drifted to another node. Typically only the first node is configured with passwordless SSH access to the other nodes, so using another control node will cause the SSH login to fail.

Frequently Asked Questions

1. How to manually re-add a control node?

All 3 control nodes run critical services such as kube-apiserver and etcd. If one node's etcd data becomes inconsistent, the node can be reset and re-added to the cluster with the following steps:

# create join token on another normal control node
$ export KUBECONFIG=/etc/kubernetes/admin.conf
$ ocadm token create --description "ocadm-playbook-node-joining-token" --ttl 90m
2fmpbx.7zikd8sp5uhaxrjr

# get control node authentication
$ /opt/yunion/bin/ocadm init phase upload-certs | grep -v upload-certs
6150f8da2dcdf3a8a730f407ddce9f1cb9f24b15ffa4e4b3680e16ed40201cf0

########## Note that the following commands need to be executed on the node that needs to be added/reset ###########
# if the node was previously added to the cloud platform as a compute node,
# back up the current /etc/yunion/host.conf file first
[your-reset-node] $ cp /etc/yunion/host.conf /etc/yunion/host.conf.manual.bk

# log in to the node that needs to be reset and reset the current kubernetes environment
[your-reset-node] $ kubeadm reset -f

# Assuming the current NIC is bond0 (without bonding, the physical NIC is typically
# named eth0 or similar), the node IP is 172.16.84.40, and the cluster to join is
# 172.16.84.101:6443. Flag meanings:
#   --control-plane:           target cluster to be joined
#   --token / --certificate-key / --discovery-token-unsafe-skip-ca-verification:
#                              join authentication information
#   --apiserver-advertise-address / --node-ip: IP address of the node
#   --as-onecloud-controller:  act as a Cloudpods control node
#   --enable-host-agent:       act as a Cloudpods compute node
#   --host-networks:           bridge network of the compute node: create the br0
#                              bridge, add bond0 to it, and configure the IP
#                              172.16.84.40 on br0
#   --high-availability-vip:   keepalived's VIP, ensuring the high availability
#                              of kube-apiserver
[your-reset-node] $ ocadm join \
    --control-plane 172.16.84.101:6443 \
    --token 2fmpbx.7zikd8sp5uhaxrjr \
    --certificate-key 6150f8da2dcdf3a8a730f407ddce9f1cb9f24b15ffa4e4b3680e16ed40201cf0 \
    --discovery-token-unsafe-skip-ca-verification \
    --apiserver-advertise-address 172.16.84.40 --node-ip 172.16.84.40 \
    --as-onecloud-controller \
    --enable-host-agent \
    --host-networks 'bond0/br0/172.16.84.40' \
    --high-availability-vip 172.16.84.101 --keepalived-version-tag v2.0.25

# After joining is complete, restore the /etc/yunion/host.conf.manual.bk configuration
[your-reset-node] $ cp /etc/yunion/host.conf.manual.bk /etc/yunion/host.conf

# restart the host service on the reset node by deleting its pod (it will be recreated automatically)
$ kubectl get pods -n onecloud -o wide | grep host | grep $your-reset-node
$ kubectl delete pods -n onecloud default-host-xxxx

The above manual steps reference the logic of ocboot join master-node and can be found at https://github.com/yunionio/ocboot/blob/master/onecloud/roles/master-node/tasks/main.yml.