AI Inference
Documentation for LLM inference services, including deployment and usage instructions for Ollama.
📄️ Ollama
Ollama is a lightweight local LLM inference service that requires a GPU and is suited to quick validation and small-scale inference scenarios.
📄️ Inference Templates
Inference templates preset a reusable set of runtime configurations for AI inference instances. When creating an instance, selecting a template applies the instance type, image, CPU, memory, GPU, data disk, mounted models, and port-mapping settings in one step.
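As a rough sketch, a template bundles those settings into a single selectable unit. All field names below are illustrative, not the platform's actual schema:

```yaml
# Hypothetical inference-template definition (field names are illustrative)
name: ollama-small
type: ollama              # instance type
image: ollama/ollama:latest
resources:
  cpu: "4"
  memory: 16Gi
  gpu: 1
dataDisk: 50Gi
models:                   # model entries mounted from the inference model library
  - llama3:8b
ports:
  - containerPort: 11434  # Ollama's default API port
    servicePort: 80
```

Picking this template at instance creation would apply every field at once instead of configuring each one manually.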
📄️ Inference Model Library
The inference model library consolidates the model data needed by inference instances into reusable, mountable, and cacheable model entries on the platform. It centrally manages model sources, versions, model data images, and mount relationships, but does not itself expose inference interfaces. This document currently focuses on the Ollama scenario.
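A model entry of this kind might record the source, version, data image, and mount relationship roughly as follows. This is a hedged sketch; every field name is an assumption, not the platform's real schema:

```yaml
# Hypothetical model-library entry (field names are illustrative)
name: llama3-8b
source: ollama-registry        # where the model weights originate
version: 8b-q4_0
dataImage: models/llama3:8b    # model data packaged as a mountable image
cache: true                    # keep a local copy for faster instance startup
mountPath: /root/.ollama/models  # illustrative target path inside the instance
```

An inference instance would then reference this entry by name and have the model data mounted at startup, rather than pulling weights itself.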