Kubernetes Operations

An introduction to operating Kubernetes, the foundation on which the platform runs

Cloudpods services deployed with ocboot run on Kubernetes. This chapter covers the knowledge and techniques needed to operate the Kubernetes cluster.

Each service binary runs in a Kubernetes Pod. To diagnose problems with a service component inside a Pod, see Troubleshoot Pod Abnormalities.

Pods communicate over a container network provided by the Calico network plugin. To troubleshoot Pod network problems, see Troubleshoot Pod Network Issues.

Kubernetes has a container eviction mechanism. When resource usage on a host, such as memory or storage, reaches certain thresholds, Kubernetes actively stops containers on that host. This mechanism benefits stateless services, because it proactively avoids service degradation caused by insufficient host resources. For stateful services, however, such as the host service that runs on each host, it amplifies the harm of a resource shortage: what would otherwise be a possible degradation of service capability becomes complete service unavailability once eviction is triggered. We should therefore try to keep Kubernetes from triggering eviction. Adjust Kubernetes Node Eviction Threshold describes how to tune the eviction mechanism of the Kubernetes cluster.
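To make the threshold idea concrete, the following is a minimal sketch, not part of Cloudpods or Kubernetes, that checks whether a filesystem's usage has crossed a given percentage. The function names and the 85% default are illustrative assumptions; the thresholds that actually apply are whatever is configured on your nodes.

```python
import shutil


def usage_percent(used_bytes: int, total_bytes: int) -> float:
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def exceeds_threshold(used_bytes: int, total_bytes: int,
                      threshold_percent: float = 85.0) -> bool:
    """True when usage is strictly above the threshold.

    The 85.0 default is an illustrative assumption, not a value
    read from any Kubernetes configuration.
    """
    return usage_percent(used_bytes, total_bytes) > threshold_percent


def check_path(path: str = "/", threshold_percent: float = 85.0) -> bool:
    """Check the filesystem that contains `path` against the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return exceeds_threshold(usage.used, usage.total, threshold_percent)
```

Running `check_path("/")` on a host gives a rough early warning that its root partition is approaching a level at which eviction might start, so that space can be freed before services are stopped.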

Unlike systemd, Kubernetes has no built-in way to stop and start services. Pause Cluster Services describes how to pause and resume services through the operator.

📄️ Common Pod Operations Commands

How to restart component services, view component logs, and perform other common operations.

📄️ Troubleshoot Pod Abnormalities

Troubleshoot errors based on Pod status.

📄️ Troubleshoot Pod Network Issues

Troubleshoot the causes of DNS resolution failures within Pods.

📄️ Modify OC to Update Cluster Status

How to update cluster status by modifying OC (the cloudpods-operator CRD resource).

📄️ Adjust Kubernetes Node Eviction Threshold

Kubernetes has a node eviction mechanism. For example, when a node's root partition usage exceeds 85%, Kubernetes changes the node to NotReady status and then evicts the Pods on it. This article describes how to adjust the node's eviction-related thresholds, which you can tune to suit your environment.

📄️ Pause Cluster Services

Cluster maintenance operations, such as maintaining the cluster database, require pausing cluster services. This article describes how to pause platform services without affecting the normal operation of virtual machines.