Adding Compute Nodes
To run AI applications in AI Cloud, you need to first add the corresponding compute nodes (hosts) and ensure they have the necessary container and (if applicable) GPU environment. This section describes how to deploy the base components on compute nodes and add hosts to the cluster.
Compute nodes are mainly responsible for container, network, storage, and GPU management.
Environment
- The operating system must be a clean installation. The deployment tool builds a k3s cluster of the specified version from scratch, so make sure that Kubernetes, containerd, and other container management tools are not already installed; otherwise, conflicts will cause the installation to fail.
- Minimum configuration requirements: CPU 8 cores, memory 8GiB, storage 200GiB.
- Virtual machines and services all store their data under the /opt directory, so a separate mount point for /opt is recommended.
- For example, format the partition /dev/sdb1 as ext4 and mount it at /opt via /etc/fstab.
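The requirements above can be checked on a candidate machine before deployment. The following is a minimal sketch, not part of ocboot: the function names are hypothetical, the list of conflicting commands is an assumption about what counts as "other container management tools", and the memory floor is set slightly below 8 GiB because MemTotal excludes memory reserved by the kernel. Storage capacity is not checked here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a prospective compute node.

MIN_CORES=8
# MemTotal on an 8 GiB machine reads slightly under 8 GiB, so use a
# tolerant floor of 7 GiB expressed in KiB (assumption).
MIN_MEM_KIB=$((7 * 1024 * 1024))

# meets_minimum CORES MEM_KIB -> exit 0 if both values reach the minimum
meets_minimum() {
    [ "$1" -ge "$MIN_CORES" ] && [ "$2" -ge "$MIN_MEM_KIB" ]
}

# conflicting_tools -> prints every conflicting command found in PATH
# (the command list is an assumption; extend it as needed)
conflicting_tools() {
    local tool
    for tool in kubectl kubelet k3s containerd docker; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# preflight -> runs all checks on the local machine and reports the result
preflight() {
    local cores mem_kib found status=0
    cores=$(nproc)
    mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

    if ! meets_minimum "$cores" "$mem_kib"; then
        echo "FAIL: found ${cores} cores and ${mem_kib} KiB memory"
        status=1
    fi

    found=$(conflicting_tools)
    if [ -n "$found" ]; then
        echo "FAIL: conflicting tools installed: $(echo "$found" | tr '\n' ' ')"
        status=1
    fi

    # A separate /opt mount is recommended, not required, so only warn.
    if ! mountpoint -q /opt; then
        echo "WARN: /opt is not a separate mount point"
    fi

    return "$status"
}
```

Calling `preflight` on the machine to be deployed prints one line per failed check and returns a non-zero status if any hard requirement is not met.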
Supported distributions vary by CPU architecture. The currently supported distributions are as follows:
Note: 3.11 and 3.10 stand for Release/3.11 and Release/3.10; the same applies to other versions.
| Operating system and architecture | 3.11 | 3.10 |
|---|---|---|
| OpenEuler 22.03 LTS SP3 x86_64+aarch64 | ✅ | ✅ |
| OpenEuler 22.03 LTS SP4 x86_64+aarch64 | ✅ | |
| OpenEuler 24.03 LTS SP2 x86_64+aarch64 | ✅ | |
| CentOS 7 2009 x86_64+aarch64 | ✅ | ✅ |
| CentOS 8 Stream x86_64+aarch64 | ✅ | |
| CentOS 9 Stream x86_64+aarch64 | ✅ | |
| CentOS 10 Stream x86_64+aarch64 | ✅ | |
| Debian 11 x86_64+aarch64 | ✅ | ✅ |
| Debian 12 x86_64+aarch64 | ✅ | |
| Debian 13 x86_64+aarch64 | ✅ | |
| Ubuntu 20.04 LTS x86_64+aarch64 | ✅ | |
| Ubuntu 22.04 LTS x86_64+aarch64 | ✅ | |
| Ubuntu 24.04 LTS x86_64+aarch64 | ✅ | |
| Ubuntu 25.04 x86_64+aarch64 | ✅ | |
| Rocky Linux 8.x x86_64+aarch64 | ✅ | |
| Rocky Linux 9.x x86_64+aarch64 | ✅ | |
| Rocky Linux 10.x x86_64+aarch64 | ✅ | |
| AlmaLinux 8.x x86_64+aarch64 | ✅ | |
| AlmaLinux 9.x x86_64+aarch64 | ✅ | |
| AlmaLinux 10.x x86_64+aarch64 | ✅ | |
| OpencloudOS 8.x x86_64+aarch64 | ✅ | |
| OpencloudOS 9.x x86_64+aarch64 | ✅ | |
| AnolisOS 8.x x86_64+aarch64 | ✅ | |
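
To compare a machine against the table above, its distribution, version, and architecture can be read from /etc/os-release and uname. This is a minimal sketch; the function name `describe_os` is hypothetical and not part of ocboot.

```shell
#!/usr/bin/env bash
# describe_os [OS_RELEASE_FILE] -> prints "<ID> <VERSION_ID> <arch>"
# The file is sourced in a subshell so its variables do not leak
# into the calling shell.
describe_os() {
    local file="${1:-/etc/os-release}"
    (
        . "$file"
        echo "${ID:-unknown} ${VERSION_ID:-unknown} $(uname -m)"
    )
}
```

Running `describe_os` with no argument on a Debian 12 x86_64 host prints `debian 12 x86_64`, which can then be matched against the corresponding table row.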
Use ocboot to Add Corresponding Nodes
Perform the following operations on the control node: use the ocboot.sh add-node command to add the compute node to the cluster.
Assuming you want to add compute node 10.168.222.140 to control node 10.168.26.216, first set up passwordless SSH login as root from the control node to both the compute node and the control node itself.
In a high-availability deployment, do not use the VIP as the control node IP when adding nodes. Use only the actual IP of the first control node: the VIP may drift to other nodes, and usually only the first node has passwordless SSH login configured for the other nodes, so using another control node may cause SSH login to fail.
# Set the control node itself to passwordless login
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@10.168.26.216
# Try passwordless login to the control node to verify
$ ssh root@10.168.26.216 "hostname"
# Copy the generated ~/.ssh/id_rsa.pub public key to the machine to be deployed
$ ssh-copy-id -i ~/.ssh/id_rsa.pub root@10.168.222.140
# Try passwordless login to the machine to be deployed. You should be able to get the hostname without entering a password
$ ssh root@10.168.222.140 "hostname"
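The verification steps above can also be run non-interactively, so that a missing key causes a failure instead of a password prompt. This is a minimal sketch: the function names are hypothetical, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
#!/usr/bin/env bash
# check_passwordless HOST -> exit 0 if root can log in to HOST without a password.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_passwordless() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" hostname >/dev/null 2>&1
}

# check_all HOST... -> reports each host; returns non-zero if any check fails
check_all() {
    local host status=0
    for host in "$@"; do
        if check_passwordless "$host"; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            status=1
        fi
    done
    return "$status"
}
```

For the example addresses used here, the call would be `check_all 10.168.26.216 10.168.222.140`, run on the control node.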
Add Nodes
The following commands are all run on the previously deployed control node. The control node should have the ocboot deployment tool installed in advance.
If you plan to run GPU-dependent AI applications (such as Ollama) on the new compute node, please complete Setting up NVIDIA and CUDA Environment on the target compute node before running ocboot.sh add-node to add the node.
# Use ocboot to add nodes
$ ./ocboot.sh add-node --enable-ai-env 10.168.26.216 10.168.222.140
# Other options, use '--help' for help
$ ./ocboot.sh add-node --help
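Because the GPU environment has to be in place before the node is added, the two steps can be chained. The sketch below assumes that a successful `nvidia-smi` run on the target node is a reasonable proxy for a working NVIDIA driver; it does not verify the CUDA installation. The function names are hypothetical, and the ocboot.sh invocation is the one shown above.

```shell
#!/usr/bin/env bash
# gpu_ready NODE_IP -> exit 0 if nvidia-smi runs successfully on the node
gpu_ready() {
    ssh -o BatchMode=yes "root@$1" nvidia-smi >/dev/null 2>&1
}

# add_gpu_node CONTROL_IP NODE_IP -> adds the node only after the GPU check passes.
# Run from the directory that contains ocboot.sh on the control node.
add_gpu_node() {
    local control_ip="$1" node_ip="$2"
    if ! gpu_ready "$node_ip"; then
        echo "nvidia-smi failed on $node_ip; set up the NVIDIA and CUDA environment first" >&2
        return 1
    fi
    ./ocboot.sh add-node --enable-ai-env "$control_ip" "$node_ip"
}
```

With the example addresses, the call would be `add_gpu_node 10.168.26.216 10.168.222.140`.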
Enable Compute Nodes (Hosts)
After compute nodes are added, you must enable the newly registered compute nodes (hosts). Only enabled hosts can run virtual machines and related workloads.
# Use climc to view the registered host list
$ climc host-list
# Enable the specified host
$ climc host-enable <host_name>
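When several nodes were added at once, the enable step can be repeated in a loop. This is a minimal sketch using only the `climc host-enable` command shown above; the function name `enable_hosts` is hypothetical, and the host names are expected to be taken from the `climc host-list` output.

```shell
#!/usr/bin/env bash
# enable_hosts HOST_NAME... -> enables each named host, continuing past failures
enable_hosts() {
    local name status=0
    for name in "$@"; do
        if climc host-enable "$name" >/dev/null; then
            echo "enabled: $name"
        else
            echo "failed:  $name" >&2
            status=1
        fi
    done
    return "$status"
}
```

The function returns a non-zero status if any host could not be enabled, so it can be used in scripts that should stop on failure.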
Common Troubleshooting
For common troubleshooting on compute nodes, please refer to: Host Service Troubleshooting.