Kubernetes Operations
Operations management for Kubernetes, the foundation of the platform
Cloudpods services deployed with ocboot run on Kubernetes. This chapter covers knowledge and techniques for operating the Kubernetes cluster.
Each service binary runs in a Kubernetes Pod. To diagnose issues with service components inside Pods, please refer to Troubleshooting Pod Abnormalities.
Container networking between Pods uses the Calico network plugin. For how to troubleshoot Pod network issues, please refer to Troubleshooting Pod Network Issues.
Kubernetes has a container eviction mechanism: when memory or storage usage on a host reaches certain thresholds, Kubernetes proactively stops (evicts) containers. This helps stateless services, because it avoids service degradation caused by insufficient host resources. For stateful services, however, such as the host service running on each host, eviction makes a resource shortage worse: what would otherwise be reduced service capacity becomes complete unavailability once the service is evicted. We should therefore try to keep Kubernetes from triggering eviction. Adjusting Kubernetes Node Eviction Threshold describes how to tune the eviction mechanism of the Kubernetes cluster.
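To make the threshold idea concrete, the following is a simplified Python model of how hard-eviction thresholds are evaluated. It is a sketch under stated assumptions, not the kubelet's implementation: it only compares each signal's available amount against a limit written in the kubelet's `evictionHard` syntax (for example `nodefs.available<10%`) and ignores soft thresholds, grace periods, and Pod ranking. The threshold strings in the example are the kubelet's documented Linux defaults; the node capacity and availability figures are hypothetical.

```python
from dataclasses import dataclass

# Simplified model of kubelet hard-eviction thresholds (illustration only,
# not the kubelet's actual code). Supports only the Ki/Mi/Gi suffixes and %.
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


@dataclass
class Threshold:
    signal: str        # e.g. "nodefs.available"
    value: float       # fraction (0.1 for 10%) or absolute bytes
    is_percent: bool   # True if value is a fraction of capacity


def parse_threshold(expr: str) -> Threshold:
    """Parse an expression such as 'nodefs.available<10%' or 'memory.available<100Mi'."""
    signal, raw = expr.split("<", 1)
    signal, raw = signal.strip(), raw.strip()
    if raw.endswith("%"):
        return Threshold(signal, float(raw[:-1]) / 100, True)
    for suffix, factor in _UNITS.items():
        if raw.endswith(suffix):
            return Threshold(signal, float(raw[: -len(suffix)]) * factor, False)
    return Threshold(signal, float(raw), False)


def breached(thresholds, available, capacity):
    """Return the signals whose available amount is below their threshold.

    available/capacity map a signal name to bytes; capacity is required for
    percentage thresholds. Signals with no observed value are skipped.
    """
    hits = []
    for t in thresholds:
        if t.signal not in available:
            continue
        limit = t.value * capacity[t.signal] if t.is_percent else t.value
        if available[t.signal] < limit:
            hits.append(t.signal)
    return hits


if __name__ == "__main__":
    GI = 1024**3
    defaults = [parse_threshold(e) for e in (
        "memory.available<100Mi", "nodefs.available<10%", "imagefs.available<15%")]
    # Hypothetical node: 8 GiB free on a 100 GiB root filesystem (92% used).
    available = {"memory.available": 2 * GI, "nodefs.available": 8 * GI}
    capacity = {"memory.available": 16 * GI, "nodefs.available": 100 * GI}
    print(breached(defaults, available, capacity))  # ['nodefs.available']
```

Lowering the disk threshold to `nodefs.available<5%` makes the same node fall below no threshold, which is the kind of change threshold tuning is meant to achieve. On a real node the value is set through the kubelet's `--eviction-hard` flag or the `evictionHard` field of its configuration file, not through a script like this.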
Kubernetes has no equivalent of systemd's ability to stop and start services. Pause Cluster Services describes how to pause and resume services through the operator.
Common Pod Operations Commands
How to restart component services, view component logs, and perform other common tasks.
Troubleshoot Pod Abnormalities
Troubleshoot errors based on Pod status.
Troubleshoot Pod Network Issues
Diagnose the causes of DNS resolution failures inside Pods.
Migrate from k8s to k3s
Migrating from k8s to k3s is suitable for users who:
Modify OC to Update Cluster Status
How to update the cluster status by modifying the OC (the CRD resource of cloudpods-operator).
Adjust Kubernetes Node Eviction Threshold
Kubernetes has a node eviction mechanism. For example, when usage of a node's root partition exceeds 85%, the node is marked with a pressure condition (such as DiskPressure) and the Pods on it are evicted. The following describes how to adjust the relevant node thresholds, which you can tune to suit your environment.
Pause Cluster Services
During cluster maintenance, such as maintaining the cluster database, platform services need to be paused. This article describes how to pause platform services without affecting running virtual machines.
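Deploy Rook Ceph on a k3s Cluster
A previous blog post described how to deploy a Rook Ceph cluster on a Kubernetes cluster, but it applies to Kubernetes v1.15.9. Newer versions of Cloudpods deploy the cluster with k3s v1.28.5, so that post no longer applies. This article describes how to deploy Rook Ceph in a Cloudpods k3s cluster.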