Inference API Reference

All Inference endpoints are served under the /api/v1/inference prefix via the gateway.

Endpoints

Providers

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/inference/providers | List registered providers |
| POST | /api/v1/inference/providers | Register a provider |
| GET | /api/v1/inference/providers/:idOrName | Get a provider by ID or name |
| PUT | /api/v1/inference/providers/:idOrName | Update a provider |
| DELETE | /api/v1/inference/providers/:idOrName | Delete a provider |

Models

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/inference/models | List synced models (filter: ?provider=) |
| POST | /api/v1/inference/models/sync | Sync models from a provider |

Chat

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/inference/chat | Synchronous chat completion |
| POST | /api/v1/inference/chat/stream | Streaming chat completion (SSE) |

Embeddings

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/inference/embed | Generate text embeddings |

Images

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/inference/images | Generate images |

Transcription

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/inference/transcribe | Transcribe audio (JSON or multipart) |

Usage

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/inference/usage | Query usage stats (filters: ?provider=, ?since=, ?until=, ?limit=) |
| GET | /api/v1/inference/requests/:id | Get a specific request log |

Examples

Register a Provider

curl -X POST http://localhost:3000/api/v1/inference/providers \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai",
    "type": "openai",
    "defaultModel": "gpt-4o",
    "envVar": "OPENAI_API_KEY"
  }'

Response:

{
  "success": true,
  "data": {
    "id": "a1b2c3d4",
    "name": "openai",
    "type": "openai",
    "defaultModel": "gpt-4o",
    "envVar": "OPENAI_API_KEY",
    "created": "2026-03-05T12:00:00.000Z",
    "updated": "2026-03-05T12:00:00.000Z"
  }
}

Chat Completion

curl -X POST http://localhost:3000/api/v1/inference/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "maxTokens": 100
  }'

Response:

{
  "success": true,
  "data": {
    "content": "The capital of France is Paris.",
    "model": "claude-sonnet-4-20250514",
    "usage": {
      "inputTokens": 15,
      "outputTokens": 12
    }
  }
}

Streaming Chat (SSE)

curl -X POST http://localhost:3000/api/v1/inference/chat/stream \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Returns Server-Sent Events:

data: {"type":"content_block_delta","delta":{"text":"Hello"}}

data: {"type":"message_stop","usage":{"inputTokens":10,"outputTokens":5}}
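A client consumes the stream by reading `data:` lines, decoding each as JSON, and concatenating the `content_block_delta` text until a `message_stop` event arrives. A minimal parsing sketch in Python, using only the two event shapes shown above (other event types, if the stream emits any, are simply ignored here):

```python
import json

def accumulate_sse(lines):
    """Accumulate streamed text from /chat/stream SSE lines.

    Returns (full_text, usage), where usage comes from the
    message_stop event (or None if the stream ended early).
    """
    parts, usage = [], None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "content_block_delta":
            parts.append(event["delta"]["text"])
        elif event.get("type") == "message_stop":
            usage = event.get("usage")
    return "".join(parts), usage

# The two events from the example stream above:
stream = [
    'data: {"type":"content_block_delta","delta":{"text":"Hello"}}',
    "",
    'data: {"type":"message_stop","usage":{"inputTokens":10,"outputTokens":5}}',
]
full, usage = accumulate_sse(stream)
print(full)                   # Hello
print(usage["outputTokens"])  # 5
```

In a real client the `lines` iterable would come from the HTTP response body; the parsing logic is the same.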

Generate an Image

curl -X POST http://localhost:3000/api/v1/inference/images \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-1",
    "prompt": "A sunset over mountains",
    "size": "1024x1024"
  }'

Generate Embeddings

curl -X POST http://localhost:3000/api/v1/inference/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox"
  }'

Transcribe Audio

# JSON with base64 audio
curl -X POST http://localhost:3000/api/v1/inference/transcribe \
  -H "Content-Type: application/json" \
  -d '{
    "model": "whisper-1",
    "audio": "<base64-encoded-audio>",
    "language": "en"
  }'

# Multipart upload
curl -X POST http://localhost:3000/api/v1/inference/transcribe \
  -F "model=whisper-1" \
  -F "file=@recording.mp3"
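For the JSON variant, the raw audio bytes must be base64-encoded before being placed in the `audio` field. A small sketch of building that request body in Python, using the field names from the example above:

```python
import base64
import json

def transcribe_payload(audio_bytes, model="whisper-1", language="en"):
    """Build the JSON body for POST /api/v1/inference/transcribe.

    The audio bytes are base64-encoded into the "audio" field, as the
    JSON example above expects.
    """
    return json.dumps({
        "model": model,
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
    })

body = transcribe_payload(b"\x00\x01fake-audio-bytes")
print(json.loads(body)["model"])  # whisper-1
```

The multipart variant needs no encoding step, which makes it the simpler choice for file uploads; the JSON variant is convenient when the audio already lives in memory.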

Query Usage

curl "http://localhost:3000/api/v1/inference/usage?since=2026-03-01&provider=openai"

Response:

{
  "success": true,
  "data": {
    "summary": {
      "totalRequests": 42,
      "totalErrors": 1,
      "totalInputTokens": 15000,
      "totalOutputTokens": 8500,
      "totalTokens": 23500,
      "avgDurationMs": 450
    },
    "requests": [ ... ]
  }
}
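The summary is a straightforward aggregation over the individual request logs. A sketch of the arithmetic in Python; note that the per-request field names used here (inputTokens, outputTokens, durationMs, error) are assumptions for illustration, since the response above elides the shape of the requests array:

```python
def summarize(requests):
    """Aggregate per-request logs into the summary shape shown above.

    Per-request keys (inputTokens, outputTokens, durationMs, error)
    are hypothetical; only the output keys match the documented summary.
    """
    total_in = sum(r.get("inputTokens", 0) for r in requests)
    total_out = sum(r.get("outputTokens", 0) for r in requests)
    durations = [r["durationMs"] for r in requests if "durationMs" in r]
    return {
        "totalRequests": len(requests),
        "totalErrors": sum(1 for r in requests if r.get("error")),
        "totalInputTokens": total_in,
        "totalOutputTokens": total_out,
        "totalTokens": total_in + total_out,
        "avgDurationMs": round(sum(durations) / len(durations)) if durations else 0,
    }

print(summarize([])["totalRequests"])  # 0
```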

Auto-Resolution

The Inference API automatically resolves the provider from the model name when providerId is not specified:

| Model Prefix | Provider |
| --- | --- |
| claude* | Anthropic |
| gpt-*, text-embedding-3* | OpenAI |
| gemini*, text-embedding-004 | Google |
| opencode/* | Opencode |
| gpt-image-1, dall-e-3 | OpenAI |
| whisper-1 | OpenAI |
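The resolution rules above amount to a lookup of exact model names followed by a prefix scan. A minimal sketch in Python of how such a resolver might behave (the lowercase provider identifiers returned here are illustrative; the actual internal names are not specified by this document):

```python
def resolve_provider(model):
    """Map a model name to a provider per the table above.

    Exact model names (e.g. whisper-1) are checked before
    wildcard prefixes, so whisper-1 never falls through.
    """
    exact = {
        "text-embedding-004": "google",
        "gpt-image-1": "openai",
        "dall-e-3": "openai",
        "whisper-1": "openai",
    }
    if model in exact:
        return exact[model]
    prefixes = [
        ("claude", "anthropic"),
        ("gpt-", "openai"),
        ("text-embedding-3", "openai"),
        ("gemini", "google"),
        ("opencode/", "opencode"),
    ]
    for prefix, provider in prefixes:
        if model.startswith(prefix):
            return provider
    return None  # no match: the caller must supply providerId

print(resolve_provider("claude-sonnet-4-20250514"))  # anthropic
print(resolve_provider("text-embedding-3-small"))    # openai
```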

Response Envelope

All responses follow the standard envelope format:

Success:

{
  "success": true,
  "data": { ... }
}

Error:

{
  "success": false,
  "error": {
    "code": "BAD_REQUEST",
    "message": "model is required"
  }
}
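Because every endpoint shares this envelope, client code can unwrap responses in one place: return the data payload on success, raise on error. A small sketch in Python (the InferenceError class is a hypothetical helper, not part of the API):

```python
class InferenceError(Exception):
    """Raised when the envelope carries success: false."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message

def unwrap(envelope):
    """Return envelope["data"] on success; raise InferenceError otherwise."""
    if envelope.get("success"):
        return envelope["data"]
    err = envelope.get("error", {})
    raise InferenceError(err.get("code", "UNKNOWN"), err.get("message", ""))

print(unwrap({"success": True, "data": {"content": "ok"}}))  # {'content': 'ok'}
```

Centralizing the check this way keeps per-endpoint client code free of envelope boilerplate.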