Inference (AI Model Routing)

Inference is the AI model routing service for the Shift Platform. It provides a unified API for chat completions, embeddings, image generation, and audio transcription across multiple AI providers (Anthropic, OpenAI, Google, and Opencode-compatible endpoints).

[Figure: Inference admin dashboard showing providers, models, requests, and token usage]

What It Does

  • Provider Management — Register and manage AI provider configurations with API key references.
  • Model Syncing — Sync available models from each provider with capability metadata.
  • Chat Completions — Send synchronous or streaming chat requests to any configured provider.
  • Embeddings — Generate text embeddings through provider-specific models.
  • Image Generation — Generate images via models like DALL-E 3 and GPT Image 1.
  • Audio Transcription — Transcribe audio files via models like Whisper.
  • Usage Tracking — Track all inference requests with token counts, latency, and error rates.

Key Concepts

Concept   Description
Provider  A configured AI service (Anthropic, OpenAI, Google, or Opencode) with connection details.
Model     A specific AI model with capability flags (streaming, embeddings, images, transcription).
Request   A logged inference operation with token usage, duration, and status.
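
As a rough illustration of how these three concepts relate, the sketch below models them as Python dataclasses. Every field name here (`api_key_ref`, `capabilities`, `duration_ms`, and so on) is hypothetical, chosen to mirror the descriptions above; none of it is taken from the actual Inference schema.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical field names throughout: this mirrors the concept
# descriptions, not the real Inference data model.


@dataclass
class Provider:
    id: str
    name: str
    type: str         # "anthropic", "openai", "google", or "opencode"
    api_key_ref: str  # a reference to the key (e.g. an env variable name), not the key itself


@dataclass
class Model:
    id: str
    provider_id: str
    # Capability flags, e.g. {"streaming", "embeddings", "images", "transcription"}
    capabilities: Set[str] = field(default_factory=set)

    def supports(self, capability: str) -> bool:
        """Return True if this model advertises the given capability flag."""
        return capability in self.capabilities


@dataclass
class Request:
    model_id: str
    input_tokens: int
    output_tokens: int
    duration_ms: float
    status: str  # e.g. "ok" or "error" (assumed values)

    @property
    def total_tokens(self) -> int:
        """Combined token usage for this logged operation."""
        return self.input_tokens + self.output_tokens
```

Keeping capabilities as a set of flags on the model, rather than on the provider, matches the description of a Model as the unit that carries capability metadata.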

Configuration

Setting            Value
Storage Directory  .inference/
API Dev Port       4008
Gateway Prefix     /api/v1/inference/*

Supported Providers

Provider   Type       Models                              Env Variable
Anthropic  anthropic  Claude family                       ANTHROPIC_API_KEY
OpenAI     openai     GPT, DALL-E, Whisper, Embeddings    OPENAI_API_KEY
Google     google     Gemini family                       GEMINI_API_KEY
Opencode   opencode   OpenCode / Zen-compatible catalogs  OPENCODE_API_KEY
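
Before registering providers, it can help to check which of these API keys are present in the environment. The helper below is a minimal sketch of such a pre-flight check: the type-to-variable mapping is copied from the table above, while the function name `missing_api_keys` is hypothetical and not part of the Shift Platform. How Inference itself resolves key references is not described here, so treat this as a local convenience only.

```python
import os
from typing import Dict, List, Mapping, Optional

# Provider type -> environment variable, copied from the provider table.
PROVIDER_ENV_VARS: Dict[str, str] = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GEMINI_API_KEY",
    "opencode": "OPENCODE_API_KEY",
}


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the provider types whose API key variable is unset or empty.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching real variables.
    """
    env = os.environ if env is None else env
    return [ptype for ptype, var in PROVIDER_ENV_VARS.items() if not env.get(var)]
```

For example, `missing_api_keys({"OPENAI_API_KEY": "sk-test"})` reports every provider type except `openai`.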

Getting Started

Register a provider:

shift-cli inference providers add --name openai --type openai

Sync available models:

shift-cli inference models sync --provider <provider-id>

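Send a chat completion through the gateway. The model must be available from a registered, synced provider; this example uses a Claude model, so a provider that serves it (for example, Anthropic) needs to be registered first: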

curl -X POST http://localhost:3000/api/v1/inference/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
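
The same request can be issued from code. The sketch below rebuilds the curl call with Python's standard library, using the URL, header, and JSON body shown above. The function names `build_chat_request` and `send_chat` are hypothetical, and the response is assumed to be a JSON body; its exact shape is not documented here.

```python
import json
import urllib.request

# Gateway base URL taken from the curl example; adjust for your environment.
GATEWAY_URL = "http://localhost:3000/api/v1/inference"


def build_chat_request(
    model: str, user_message: str, base_url: str = GATEWAY_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request matching the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and parse the response body as JSON.

    Assumption: the endpoint replies with a JSON object.
    Raises urllib.error.URLError if the gateway is unreachable.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the gateway running locally, `send_chat(build_chat_request("claude-sonnet-4-20250514", "Hello"))` sends the same request as the curl command. Splitting construction from sending keeps the payload easy to inspect without a live service.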

Check usage:

shift-cli inference usage --since 2026-03-01