Operations Guide
Covers AI Cloud platform operations and management, including upgrading, GPU operations, AI instance operations, service component operations, and more.
Upgrading
1 item
Kubernetes Operations
7 items
Database Operations
3 items
High Availability Environment
3 items
Frontend Component Operations
4 items
Log Operations
2 items
Platform Common Issues
1 item
Fault Recovery
This document describes how to recover from common platform faults.
Uninstallation
The uninstallation procedure depends on how the platform was installed.
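Configuring the NVIDIA MPS Environment
MPS (Multi-Process Service) is NVIDIA's mechanism for concurrent multi-process execution with CUDA. It lets multiple CPU processes share a single CUDA context on the same GPU, removing the default restriction that only one process can occupy the GPU at a time, so that CUDA kernels from different processes can run truly in parallel.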