Inference API Reference

All Inference endpoints are served under the /api/v1/inference prefix via the gateway.

Endpoints

Providers

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/inference/providers | List registered providers |
| POST | /api/v1/inference/providers | Register a provider |
| GET | /api/v1/inference/providers/:idOrName | Get a provider by ID or name |
| PUT | /api/v1/inference/providers/:idOrName | Update a provider |
| DELETE | /api/v1/inference/providers/:idOrName | Delete a provider |

Models

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/inference/models | List synced models (filter: ?provider=) |
| POST | /api/v1/inference/models/sync | Sync models from a provider |

Chat

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/inference/chat | Synchronous chat completion |
| POST | /api/v1/inference/chat/stream | Streaming chat completion (SSE) |

Embeddings

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/inference/embed | Generate text embeddings |

Images

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/inference/images | Generate images |

Transcription

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/inference/transcribe | Transcribe audio (JSON or multipart) |

Usage

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/inference/usage | Query usage stats (filters: ?provider=, ?since=, ?until=, ?limit=) |
| GET | /api/v1/inference/requests/:id | Get a specific request log |

Examples

Register a Provider

curl -X POST http://localhost:3000/api/v1/inference/providers \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai",
    "type": "openai",
    "defaultModel": "gpt-4o",
    "envVar": "OPENAI_API_KEY"
  }'

Response:

{
  "success": true,
  "data": {
    "id": "a1b2c3d4",
    "name": "openai",
    "type": "openai",
    "defaultModel": "gpt-4o",
    "envVar": "OPENAI_API_KEY",
    "created": "2026-03-05T12:00:00.000Z",
    "updated": "2026-03-05T12:00:00.000Z"
  }
}

Chat Completion

curl -X POST http://localhost:3000/api/v1/inference/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "maxTokens": 100
  }'

Response:

{
  "success": true,
  "data": {
    "content": "The capital of France is Paris.",
    "model": "claude-sonnet-4-20250514",
    "usage": {
      "inputTokens": 15,
      "outputTokens": 12
    }
  }
}

Streaming Chat (SSE)

curl -X POST http://localhost:3000/api/v1/inference/chat/stream \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Returns Server-Sent Events:

data: {"type":"content_block_delta","delta":{"text":"Hello"}}

data: {"type":"message_stop","usage":{"inputTokens":10,"outputTokens":5}}
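A client consumes the stream by reading `data:` lines, decoding each as JSON, and concatenating the `content_block_delta` text until a `message_stop` event arrives. A minimal parsing sketch in Python, using only the two event shapes shown above (other event types, if the stream emits any, are simply ignored here):

```python
import json

def accumulate_sse(lines):
    """Accumulate streamed text from /chat/stream SSE lines.

    Returns (full_text, usage), where usage comes from the
    message_stop event (or None if the stream ended early).
    """
    parts, usage = [], None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        event = json.loads(line[len("data:"):].strip())
        if event.get("type") == "content_block_delta":
            parts.append(event["delta"]["text"])
        elif event.get("type") == "message_stop":
            usage = event.get("usage")
    return "".join(parts), usage

# The two events from the example stream above:
stream = [
    'data: {"type":"content_block_delta","delta":{"text":"Hello"}}',
    "",
    'data: {"type":"message_stop","usage":{"inputTokens":10,"outputTokens":5}}',
]
full, usage = accumulate_sse(stream)
print(full)                   # Hello
print(usage["outputTokens"])  # 5
```

In a real client the `lines` iterable would come from the HTTP response body; the parsing logic is the same.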

Generate an Image

curl -X POST http://localhost:3000/api/v1/inference/images \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-1",
    "prompt": "A sunset over mountains",
    "size": "1024x1024"
  }'

Generate Embeddings

curl -X POST http://localhost:3000/api/v1/inference/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox"
  }'

Transcribe Audio

# JSON with base64 audio
curl -X POST http://localhost:3000/api/v1/inference/transcribe \
  -H "Content-Type: application/json" \
  -d '{
    "model": "whisper-1",
    "audio": "<base64-encoded-audio>",
    "language": "en"
  }'

# Multipart upload
curl -X POST http://localhost:3000/api/v1/inference/transcribe \
  -F "model=whisper-1" \
  -F "file=@recording.mp3"
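For the JSON variant, the raw audio bytes must be base64-encoded before being placed in the `audio` field. A small sketch of building that request body in Python, using the field names from the example above:

```python
import base64
import json

def transcribe_payload(audio_bytes, model="whisper-1", language="en"):
    """Build the JSON body for POST /api/v1/inference/transcribe.

    The audio bytes are base64-encoded into the "audio" field, as the
    JSON example above expects.
    """
    return json.dumps({
        "model": model,
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
    })

body = transcribe_payload(b"\x00\x01fake-audio-bytes")
print(json.loads(body)["model"])  # whisper-1
```

The multipart variant needs no encoding step, which makes it the simpler choice for file uploads; the JSON variant is convenient when the audio already lives in memory.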

Query Usage

curl "http://localhost:3000/api/v1/inference/usage?since=2026-03-01&provider=openai"

Response:

{
  "success": true,
  "data": {
    "summary": {
      "totalRequests": 42,
      "totalErrors": 1,
      "totalInputTokens": 15000,
      "totalOutputTokens": 8500,
      "totalTokens": 23500,
      "avgDurationMs": 450
    },
    "requests": [ ... ]
  }
}
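The summary is a straightforward aggregation over the individual request logs. A sketch of the arithmetic in Python; note that the per-request field names used here (inputTokens, outputTokens, durationMs, error) are assumptions for illustration, since the response above elides the shape of the requests array:

```python
def summarize(requests):
    """Aggregate per-request logs into the summary shape shown above.

    Per-request keys (inputTokens, outputTokens, durationMs, error)
    are hypothetical; only the output keys match the documented summary.
    """
    total_in = sum(r.get("inputTokens", 0) for r in requests)
    total_out = sum(r.get("outputTokens", 0) for r in requests)
    durations = [r["durationMs"] for r in requests if "durationMs" in r]
    return {
        "totalRequests": len(requests),
        "totalErrors": sum(1 for r in requests if r.get("error")),
        "totalInputTokens": total_in,
        "totalOutputTokens": total_out,
        "totalTokens": total_in + total_out,
        "avgDurationMs": round(sum(durations) / len(durations)) if durations else 0,
    }

print(summarize([])["totalRequests"])  # 0
```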

Auto-Resolution

The Inference API automatically resolves the provider from the model name when providerId is not specified:

| Model Prefix | Provider |
| --- | --- |
| claude* | Anthropic |
| gpt-*, text-embedding-3* | OpenAI |
| gemini*, text-embedding-004 | Google |
| opencode/* | Opencode |
| gpt-image-1, dall-e-3 | OpenAI |
| whisper-1 | OpenAI |
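The resolution rules above amount to a lookup of exact model names followed by a prefix scan. A minimal sketch in Python of how such a resolver might behave (the lowercase provider identifiers returned here are illustrative; the actual internal names are not specified by this document):

```python
def resolve_provider(model):
    """Map a model name to a provider per the table above.

    Exact model names (e.g. whisper-1) are checked before
    wildcard prefixes, so whisper-1 never falls through.
    """
    exact = {
        "text-embedding-004": "google",
        "gpt-image-1": "openai",
        "dall-e-3": "openai",
        "whisper-1": "openai",
    }
    if model in exact:
        return exact[model]
    prefixes = [
        ("claude", "anthropic"),
        ("gpt-", "openai"),
        ("text-embedding-3", "openai"),
        ("gemini", "google"),
        ("opencode/", "opencode"),
    ]
    for prefix, provider in prefixes:
        if model.startswith(prefix):
            return provider
    return None  # no match: the caller must supply providerId

print(resolve_provider("claude-sonnet-4-20250514"))  # anthropic
print(resolve_provider("text-embedding-3-small"))    # openai
```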

Response Envelope

All responses follow the standard envelope format:

Success:

{
  "success": true,
  "data": { ... }
}

Error:

{
  "success": false,
  "error": {
    "code": "BAD_REQUEST",
    "message": "model is required"
  }
}
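Because every endpoint shares this envelope, client code can unwrap responses in one place: return the data payload on success, raise on error. A small sketch in Python (the InferenceError class is a hypothetical helper, not part of the API):

```python
class InferenceError(Exception):
    """Raised when the envelope carries success: false."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message

def unwrap(envelope):
    """Return envelope["data"] on success; raise InferenceError otherwise."""
    if envelope.get("success"):
        return envelope["data"]
    err = envelope.get("error", {})
    raise InferenceError(err.get("code", "UNKNOWN"), err.get("message", ""))

print(unwrap({"success": True, "data": {"content": "ok"}}))  # {'content': 'ok'}
```

Centralizing the check this way keeps per-endpoint client code free of envelope boilerplate.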