Inference (AI Model Routing)

Inference is the AI model routing service for the Shift Platform. It provides a unified API for chat completions, embeddings, image generation, and audio transcription across multiple AI providers (Anthropic, OpenAI, Google, and Opencode-compatible endpoints).

Figure: Inference admin dashboard showing providers, models, requests, and token usage.

What It Does

Provider Management — Register and manage AI provider configurations with API key references.
Model Syncing — Sync available models from each provider with capability metadata.
Chat Completions — Send synchronous or streaming chat requests to any configured provider.
Embeddings — Generate text embeddings through provider-specific models.
Image Generation — Generate images via models like DALL-E 3 and GPT Image 1.
Audio Transcription — Transcribe audio files via models like Whisper.
Usage Tracking — Track all inference requests with token counts, latency, and error rates.

Key Concepts

Concept	Description
Provider	A configured AI service (Anthropic, OpenAI, Google, or Opencode) with connection details.
Model	A specific AI model with capability flags (streaming, embeddings, images, transcription).
Request	A logged inference operation with token usage, duration, and status.
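The three concepts can be pictured as plain records. Below is a minimal Python sketch. Every field name in it (`api_key_env`, `provider_id`, `input_tokens`, `duration_ms`, and so on) is a hypothetical choice, not the service's actual schema. It also shows one way a router could check a model's capability flags before accepting a request.

```python
from dataclasses import dataclass
from typing import Literal

# All field names below are hypothetical; the service's real schema may differ.
ProviderType = Literal["anthropic", "openai", "google", "opencode"]
Capability = Literal["streaming", "embeddings", "images", "transcription"]


@dataclass(frozen=True)
class Provider:
    id: str
    name: str
    type: ProviderType
    api_key_env: str  # a reference to the key (an env var name), not the key itself


@dataclass(frozen=True)
class Model:
    id: str
    provider_id: str
    streaming: bool = False
    embeddings: bool = False
    images: bool = False
    transcription: bool = False


@dataclass(frozen=True)
class Request:
    model_id: str
    input_tokens: int
    output_tokens: int
    duration_ms: float
    status: Literal["success", "error"]


def supports(model: Model, capability: Capability) -> bool:
    """Return True if the model's capability flag for this operation is set."""
    flags = {
        "streaming": model.streaming,
        "embeddings": model.embeddings,
        "images": model.images,
        "transcription": model.transcription,
    }
    return flags[capability]
```

A router built on this shape would look up the model, call `supports(model, "images")` before forwarding an image-generation request, and reject the request early if the flag is unset.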

Configuration

Setting	Value
Storage Directory	`.inference/`
API Dev Port	4008
Gateway Prefix	`/api/v1/inference/*`
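The gateway prefix determines the URLs that clients call. The sketch below builds gateway URLs and tests whether a path falls under the prefix. It assumes `http://localhost:3000` is the gateway base, as in the curl example under Getting Started. The helper names are hypothetical, and `chat` is the only endpoint name this page shows.

```python
# Prefix taken from the Configuration table; helper names are hypothetical.
GATEWAY_PREFIX = "/api/v1/inference/"


def is_inference_route(path: str) -> bool:
    """True if a gateway path falls under the /api/v1/inference/* prefix."""
    return path.startswith(GATEWAY_PREFIX)


def inference_url(gateway_base: str, endpoint: str) -> str:
    """Join a gateway base URL, the inference prefix, and an endpoint name."""
    return gateway_base.rstrip("/") + GATEWAY_PREFIX + endpoint.lstrip("/")
```

For example, `inference_url("http://localhost:3000", "chat")` yields the same URL the curl example posts to.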

Supported Providers

Provider	Type	Models	Env Variable
Anthropic	`anthropic`	Claude family	`ANTHROPIC_API_KEY`
OpenAI	`openai`	GPT, DALL-E, Whisper, Embeddings	`OPENAI_API_KEY`
Google	`google`	Gemini family	`GEMINI_API_KEY`
Opencode	`opencode`	OpenCode / Zen-compatible catalogs	`OPENCODE_API_KEY`
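Each provider type reads its key from the environment variable listed in the table. The sketch below resolves a key from a provider type using that mapping. The function `resolve_api_key` is hypothetical, and the sketch does not claim to describe how Inference stores or dereferences its API key references internally.

```python
import os
from typing import Mapping, Optional

# Provider type -> environment variable, copied from the Supported Providers table.
API_KEY_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GEMINI_API_KEY",
    "opencode": "OPENCODE_API_KEY",
}


def resolve_api_key(provider_type: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: read a provider's API key from its environment variable."""
    if provider_type not in API_KEY_ENV:
        raise ValueError(f"unsupported provider type: {provider_type!r}")
    source = os.environ if env is None else env  # injectable for testing
    var = API_KEY_ENV[provider_type]
    key = source.get(var)
    if not key:
        raise KeyError(f"{var} is not set for provider type {provider_type!r}")
    return key
```

Keeping only the variable name in the provider record, rather than the key itself, matches the page's description of provider configurations holding "API key references".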

Getting Started

Register a provider:

```bash
shift-cli inference providers add --name openai --type openai
```

Sync available models:

```bash
shift-cli inference models sync --provider <provider-id>
```
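Conceptually, a sync reconciles the stored model list with what the provider currently reports. The sketch below expresses that reconciliation as set differences over a hypothetical catalog shape (model ID mapped to a set of capability flags). It illustrates the idea only and is not a description of what `models sync` does internally.

```python
from typing import Dict, FrozenSet, List, NamedTuple

# Hypothetical catalog shape: model id -> capability flags.
Catalog = Dict[str, FrozenSet[str]]


class SyncDiff(NamedTuple):
    added: List[str]    # reported by the provider, not yet stored
    removed: List[str]  # stored, no longer reported by the provider
    changed: List[str]  # present in both, but with different capabilities


def diff_catalog(local: Catalog, remote: Catalog) -> SyncDiff:
    """Compare the stored catalog with the provider's current listing."""
    added = sorted(remote.keys() - local.keys())
    removed = sorted(local.keys() - remote.keys())
    changed = sorted(m for m in local.keys() & remote.keys() if local[m] != remote[m])
    return SyncDiff(added, removed, changed)
```

The sketch leaves open whether a removed model is deleted or only marked unavailable.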

Send a chat completion:

```bash
curl -X POST http://localhost:3000/api/v1/inference/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
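The same request can be sent from Python using only the standard library. The sketch below mirrors the curl call's URL, header, and JSON body. The helper names are hypothetical, and because this page does not document the response schema, `send` simply returns the decoded JSON.

```python
import json
import urllib.request

GATEWAY = "http://localhost:3000"  # same gateway address as the curl example


def build_chat_request(model: str, content: str, base: str = GATEWAY) -> urllib.request.Request:
    """Build the same POST that the curl example sends."""
    body = {"model": model, "messages": [{"role": "user", "content": content}]}
    return urllib.request.Request(
        f"{base}/api/v1/inference/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (schema not documented here)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the gateway running locally, `send(build_chat_request("claude-sonnet-4-20250514", "Hello"))` performs the same call as the curl command.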

Check usage:

```bash
shift-cli inference usage --since 2026-03-01
```
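A usage report boils down to filtering logged requests by date and aggregating their token counts, latency, and error rate. The sketch below does this over a hypothetical `RequestLog` record. It assumes that `--since` is inclusive of the given day, which this page does not state.

```python
from dataclasses import dataclass
from datetime import date, datetime
from statistics import mean
from typing import Iterable


@dataclass(frozen=True)
class RequestLog:
    # Hypothetical log record; field names are assumptions.
    at: datetime
    input_tokens: int
    output_tokens: int
    duration_ms: float
    status: str  # "success" or "error"


def summarize(logs: Iterable[RequestLog], since: date) -> dict:
    """Aggregate requests made on or after `since` (inclusive, by assumption)."""
    window = [r for r in logs if r.at.date() >= since]
    if not window:
        return {"requests": 0, "tokens": 0, "avg_latency_ms": 0.0, "error_rate": 0.0}
    return {
        "requests": len(window),
        "tokens": sum(r.input_tokens + r.output_tokens for r in window),
        "avg_latency_ms": mean(r.duration_ms for r in window),
        "error_rate": sum(r.status == "error" for r in window) / len(window),
    }
```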
