# Inference API Reference

All Inference endpoints are served under the `/api/v1/inference` prefix via the gateway.

## Endpoints

### Providers
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/inference/providers | List registered providers |
| POST | /api/v1/inference/providers | Register a provider |
| GET | /api/v1/inference/providers/:idOrName | Get provider by ID or name |
| PUT | /api/v1/inference/providers/:idOrName | Update provider |
| DELETE | /api/v1/inference/providers/:idOrName | Delete provider |
### Models
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/inference/models | List synced models (filter: ?provider=) |
| POST | /api/v1/inference/models/sync | Sync models from a provider |
### Chat
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/inference/chat | Synchronous chat completion |
| POST | /api/v1/inference/chat/stream | Streaming chat completion (SSE) |
### Embeddings
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/inference/embed | Generate text embeddings |
### Images
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/inference/images | Generate images |
### Transcription
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/inference/transcribe | Transcribe audio (JSON or multipart) |
### Usage
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/inference/usage | Query usage stats (filter: ?provider=, ?since=, ?until=, ?limit=) |
| GET | /api/v1/inference/requests/:id | Get a specific request log |
## Examples

### Register a Provider
```bash
curl -X POST http://localhost:3000/api/v1/inference/providers \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai",
    "type": "openai",
    "defaultModel": "gpt-4o",
    "envVar": "OPENAI_API_KEY"
  }'
```
Response:

```json
{
  "success": true,
  "data": {
    "id": "a1b2c3d4",
    "name": "openai",
    "type": "openai",
    "defaultModel": "gpt-4o",
    "envVar": "OPENAI_API_KEY",
    "created": "2026-03-05T12:00:00.000Z",
    "updated": "2026-03-05T12:00:00.000Z"
  }
}
```
### Chat Completion
```bash
curl -X POST http://localhost:3000/api/v1/inference/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "maxTokens": 100
  }'
```
Response:

```json
{
  "success": true,
  "data": {
    "content": "The capital of France is Paris.",
    "model": "claude-sonnet-4-20250514",
    "usage": {
      "inputTokens": 15,
      "outputTokens": 12
    }
  }
}
```
### Streaming Chat (SSE)
```bash
curl -X POST http://localhost:3000/api/v1/inference/chat/stream \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
Returns Server-Sent Events:

```
data: {"type":"content_block_delta","delta":{"text":"Hello"}}

data: {"type":"message_stop","usage":{"inputTokens":10,"outputTokens":5}}
```
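As a client-side sketch (not part of the API itself), the stream can be consumed by collecting `data:` lines and decoding each payload as JSON. The event shapes below are assumed from the example output above:

```python
import json

def parse_sse_events(raw: str) -> list[dict]:
    """Parse Server-Sent Events text into a list of JSON payloads.

    Only `data:` lines are handled; other SSE fields (event, id, retry)
    and blank separator lines are ignored.
    """
    events = []
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            events.append(json.loads(payload))
    return events

stream = (
    'data: {"type":"content_block_delta","delta":{"text":"Hello"}}\n'
    '\n'
    'data: {"type":"message_stop","usage":{"inputTokens":10,"outputTokens":5}}\n'
)
events = parse_sse_events(stream)

# Concatenate the text deltas into the full assistant reply.
text = "".join(
    e["delta"]["text"] for e in events if e["type"] == "content_block_delta"
)
```

In a real client the same parser would be fed chunks from the HTTP response body rather than a fixed string.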
### Generate an Image
```bash
curl -X POST http://localhost:3000/api/v1/inference/images \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-1",
    "prompt": "A sunset over mountains",
    "size": "1024x1024"
  }'
```
### Generate Embeddings
```bash
curl -X POST http://localhost:3000/api/v1/inference/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox"
  }'
```
### Transcribe Audio
```bash
# JSON with base64 audio
curl -X POST http://localhost:3000/api/v1/inference/transcribe \
  -H "Content-Type: application/json" \
  -d '{
    "model": "whisper-1",
    "audio": "<base64-encoded-audio>",
    "language": "en"
  }'

# Multipart upload
curl -X POST http://localhost:3000/api/v1/inference/transcribe \
  -F "model=whisper-1" \
  -F "file=@recording.mp3"
```
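For the JSON variant, the audio bytes must be base64-encoded before being placed in the request body. A minimal sketch of building that body in Python, using only the field names shown in the example above:

```python
import base64
import json

def build_transcribe_body(audio_bytes: bytes,
                          model: str = "whisper-1",
                          language: str = "en") -> str:
    """Build the JSON body for POST /api/v1/inference/transcribe."""
    return json.dumps({
        "model": model,
        # The API expects the raw audio bytes base64-encoded as a string.
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "language": language,
    })

body = build_transcribe_body(b"\x00\x01fake-audio")
parsed = json.loads(body)
```

The multipart variant needs no encoding step, since curl streams the file bytes directly.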
### Query Usage
```bash
curl "http://localhost:3000/api/v1/inference/usage?since=2026-03-01&provider=openai"
```
Response:

```json
{
  "success": true,
  "data": {
    "summary": {
      "totalRequests": 42,
      "totalErrors": 1,
      "totalInputTokens": 15000,
      "totalOutputTokens": 8500,
      "totalTokens": 23500,
      "avgDurationMs": 450
    },
    "requests": [ ... ]
  }
}
```
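The usage filters can be assembled with standard URL encoding; a small helper sketch (the parameter names come from the Usage endpoints table, the base URL is illustrative):

```python
from urllib.parse import urlencode

def usage_url(base: str = "http://localhost:3000", **filters) -> str:
    """Build the usage query URL.

    Supported filters (per the Usage endpoints table):
    provider, since, until, limit. Unknown or None filters are dropped.
    """
    allowed = {
        k: v for k, v in filters.items()
        if k in {"provider", "since", "until", "limit"} and v is not None
    }
    query = urlencode(allowed)
    url = f"{base}/api/v1/inference/usage"
    return f"{url}?{query}" if query else url

url = usage_url(since="2026-03-01", provider="openai")
```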
## Auto-Resolution

The Inference API automatically resolves the provider from the model name when `providerId` is not specified:

| Model Prefix | Provider |
|---|---|
| claude* | Anthropic |
| gpt-*, text-embedding-3* | OpenAI |
| gemini*, text-embedding-004 | Google |
| opencode/* | Opencode |
| gpt-image-1, dall-e-3 | OpenAI |
| whisper-1 | OpenAI |
## Response Envelope

All responses follow the standard envelope format:

Success:

```json
{
  "success": true,
  "data": { ... }
}
```

Error:

```json
{
  "success": false,
  "error": {
    "code": "BAD_REQUEST",
    "message": "model is required"
  }
}
```
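A client can branch on the `success` flag to either unwrap `data` or raise. A minimal sketch assuming only the envelope shape shown above (the exception class is hypothetical):

```python
import json

class InferenceApiError(Exception):
    """Raised when the envelope carries success=false."""
    def __init__(self, code: str, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code

def unwrap(envelope_json: str):
    """Return `data` on success; raise InferenceApiError on failure."""
    envelope = json.loads(envelope_json)
    if envelope.get("success"):
        return envelope["data"]
    err = envelope.get("error", {})
    raise InferenceApiError(err.get("code", "UNKNOWN"), err.get("message", ""))

ok = unwrap('{"success": true, "data": {"content": "hi"}}')

try:
    unwrap('{"success": false, "error": '
           '{"code": "BAD_REQUEST", "message": "model is required"}}')
    failed = False
except InferenceApiError as exc:
    failed = True
    code = exc.code
```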