Inference (AI Model Routing)
Inference is the AI model routing service for the Shift Platform. It provides a unified API for chat completions, embeddings, image generation, and audio transcription across multiple AI providers (Anthropic, OpenAI, Google, and Opencode-compatible endpoints).

What It Does
- Provider Management — Register and manage AI provider configurations with API key references.
- Model Syncing — Sync available models from each provider with capability metadata.
- Chat Completions — Send synchronous or streaming chat requests to any configured provider.
- Embeddings — Generate text embeddings through provider-specific models.
- Image Generation — Generate images via models like DALL-E 3 and GPT Image 1.
- Audio Transcription — Transcribe audio files via models like Whisper.
- Usage Tracking — Track all inference requests with token counts, latency, and error rates.
Key Concepts
| Concept | Description |
|---|---|
| Provider | A configured AI service (Anthropic, OpenAI, Google, or Opencode) with connection details. |
| Model | A specific AI model with capability flags (streaming, embeddings, images, transcription). |
| Request | A logged inference operation with token usage, duration, and status. |
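The three concepts above can be pictured as plain records. The following is a minimal sketch only: every class and field name here (`api_key_ref`, `provider_id`, `duration_ms`, and so on) is an assumption made for illustration and is not taken from the Inference service's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    # Hypothetical fields; the real provider record may differ.
    id: str
    name: str
    type: str          # "anthropic", "openai", "google", or "opencode"
    api_key_ref: str   # name of the env variable holding the key, not the key itself


@dataclass(frozen=True)
class Model:
    # Capability flags mirror the four listed in the table above.
    id: str
    provider_id: str
    streaming: bool = False
    embeddings: bool = False
    images: bool = False
    transcription: bool = False


@dataclass(frozen=True)
class Request:
    # One logged inference operation (hypothetical shape).
    model_id: str
    input_tokens: int
    output_tokens: int
    duration_ms: int
    status: str        # assumed values: "ok" or "error"


def models_supporting(models, capability):
    """Return the models whose given capability flag is set."""
    return [m for m in models if getattr(m, capability)]
```

With records shaped like this, picking a model for a task reduces to filtering on a capability flag, for example `models_supporting(all_models, "embeddings")`.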
Configuration
| Setting | Value |
|---|---|
| Storage Directory | .inference/ |
| API Dev Port | 4008 |
| Gateway Prefix | /api/v1/inference/* |
Supported Providers
| Provider | Type | Models | Env Variable |
|---|---|---|---|
| Anthropic | anthropic | Claude family | ANTHROPIC_API_KEY |
| OpenAI | openai | GPT, DALL-E, Whisper, Embeddings | OPENAI_API_KEY |
| Google | google | Gemini family | GEMINI_API_KEY |
| Opencode | opencode | OpenCode / Zen-compatible catalogs | OPENCODE_API_KEY |
Getting Started
Register a provider:
shift-cli inference providers add --name openai --type openai
Sync available models:
shift-cli inference models sync --provider <provider-id>
Send a chat completion:
curl -X POST http://localhost:3000/api/v1/inference/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
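The same request can be sent from Python using only the standard library. This is a sketch that mirrors the curl call above (same URL, header, and body). It assumes the endpoint returns JSON; the response shape is not documented here, so the parsed body is returned unchanged. The function names are illustrative.

```python
import json
import urllib.request

# Same URL as the curl example above.
CHAT_URL = "http://localhost:3000/api/v1/inference/chat"


def build_chat_request(model, messages, url=CHAT_URL):
    """Build (but do not send) a POST request equivalent to the curl call."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(model, messages, timeout=30):
    """Send the request and return the parsed response (assumed to be JSON)."""
    req = build_chat_request(model, messages)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    reply = send_chat(
        "claude-sonnet-4-20250514",
        [{"role": "user", "content": "Hello"}],
    )
    print(reply)
```

Separating request construction from sending keeps the payload easy to inspect or test without a running service.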
Check usage:
shift-cli inference usage --since 2026-03-01
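The output format of the usage command is not shown here. Conceptually, usage tracking amounts to aggregating logged requests into token counts, latency, and an error rate. The sketch below illustrates that aggregation over a list of dicts; the record keys (`input_tokens`, `output_tokens`, `duration_ms`, `status`) and the `"error"` status value are assumptions, not the service's actual log schema.

```python
from statistics import mean


def summarize(requests):
    """Aggregate hypothetical request records into usage totals."""
    total = len(requests)
    if total == 0:
        # Avoid dividing by zero when nothing was logged.
        return {
            "requests": 0,
            "total_tokens": 0,
            "mean_latency_ms": 0.0,
            "error_rate": 0.0,
        }
    errors = sum(1 for r in requests if r["status"] == "error")
    return {
        "requests": total,
        "total_tokens": sum(r["input_tokens"] + r["output_tokens"] for r in requests),
        "mean_latency_ms": mean(r["duration_ms"] for r in requests),
        "error_rate": errors / total,
    }
```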