Kubernetes Operations
An introduction to operating Kubernetes, the foundation of the platform
Cloudpods services deployed with ocboot run on Kubernetes. This chapter covers the knowledge and techniques needed to operate the Kubernetes cluster.
Each service binary runs in a Kubernetes Pod. To diagnose problems with service components inside Pods, see Troubleshoot Pod Abnormalities.
Pods communicate over a container network provided by the Calico network plugin. To troubleshoot Pod network issues, see Troubleshoot Pod Network Issues.
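When a Pod cannot reach other services, a useful first check is whether names resolve at all from inside the Pod. The following is a minimal sketch, assuming Python 3 is available in the container (many service images may not ship it); `resolve` is a hypothetical helper, not part of Cloudpods or Kubernetes.

```python
import socket
from typing import List


def resolve(name: str) -> List[str]:
    """Return the sorted, de-duplicated IP addresses for `name`, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except OSError:  # socket.gaierror (a subclass of OSError) means resolution failed
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

For example, `resolve("kubernetes.default.svc.cluster.local")` run inside a Pod should return the cluster IP of the `kubernetes` Service; an empty list points to a DNS problem. That name assumes the default `cluster.local` cluster domain, which your cluster may override.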
Kubernetes has an eviction mechanism: when a node's available memory or disk space (or other resources such as inodes and process IDs) falls below configured thresholds, the kubelet proactively stops Pods on that node. This helps stateless services, because it heads off the degradation that insufficient host resources would otherwise cause. For stateful services, such as the host service running on each host, it amplifies the damage of a resource shortage: what would at worst have been degraded service becomes a complete outage once the service is evicted. Eviction should therefore be avoided wherever possible. Adjust Kubernetes Node Eviction Threshold describes how to tune the eviction mechanism of the Kubernetes cluster.
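Kubelet eviction thresholds are written as "signal below threshold", for example `nodefs.available<15%`, which roughly corresponds to the "root partition more than 85% used" case. The sketch below only illustrates that comparison so you can see how close a filesystem is to such a threshold; it is not Kubernetes code, and the 15% default and the helper names `below_threshold` and `check_filesystem` are hypothetical.

```python
import shutil


def below_threshold(available: int, capacity: int, min_available_percent: float) -> bool:
    """Return True when available/capacity has dropped below min_available_percent.

    Mirrors the "signal < threshold" form of kubelet eviction thresholds,
    e.g. nodefs.available<15%. Multiplying instead of dividing keeps the
    boundary comparison exact for integer byte counts.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return available * 100 < min_available_percent * capacity


def check_filesystem(path: str = "/", min_available_percent: float = 15.0) -> bool:
    """Check the filesystem holding `path` against a hypothetical 15% threshold."""
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    return below_threshold(usage.free, usage.total, min_available_percent)
```

If `check_filesystem("/")` returns True on a node, the root partition is already in the range where a threshold of that form would trigger eviction, so either free up space or adjust the threshold for your environment.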
Kubernetes does not offer systemd-style commands for stopping and starting services. Pause Cluster Services describes how to pause and resume services through cloudpods-operator.
📄️ Common Pod Operations Commands
How to restart component services, view component logs, and perform other routine tasks.
📄️ Troubleshoot Pod Abnormalities
Troubleshoot errors based on Pod status.
📄️ Troubleshoot Pod Network Issues
Find the causes of DNS resolution failures inside Pods.
📄️ Modify OC to Update Cluster Status
How to update cluster status by modifying OC, the CRD resource used by cloudpods-operator.
📄️ Adjust Kubernetes Node Eviction Threshold
Kubernetes has a node eviction mechanism. For example, when a node's root partition usage exceeds 85%, the node is marked NotReady and the Pods on it are evicted. This article describes how to adjust the related thresholds on a node; tune them to suit your environment.
📄️ Pause Cluster Services
Some maintenance tasks, such as maintaining the cluster database, require pausing cluster services. This article describes how to pause platform services without affecting running virtual machines.