# Transcription Jobs
The Inference service supports asynchronous transcription via Kubernetes jobs. Instead of blocking on a synchronous request, you submit a job that progresses through a defined lifecycle and produces downloadable artifacts.
## Job Lifecycle

```
queued → downloading → transcoding → transcribing → completed
```

A job in any active phase can transition to `failed`.
| Phase | Description |
|---|---|
| `queued` | Job created, waiting for a worker |
| `downloading` | Worker is fetching the source audio |
| `transcoding` | Normalizing audio format for the transcription model |
| `transcribing` | Running the transcription model |
| `completed` | Transcript ready with artifacts |
| `failed` | Job failed (retryable up to `maxAttempts`) |
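The lifecycle can be expressed as a transition map. This is a minimal sketch: the phase names and linear ordering come from the table above, while the rule that every active phase may move to `failed` is an assumption drawn from the diagram's failure branch.

```typescript
type Phase =
  | "queued" | "downloading" | "transcoding"
  | "transcribing" | "completed" | "failed";

// Assumed linear pipeline; every non-terminal phase may also fail.
const NEXT: Record<Phase, Phase[]> = {
  queued: ["downloading", "failed"],
  downloading: ["transcoding", "failed"],
  transcoding: ["transcribing", "failed"],
  transcribing: ["completed", "failed"],
  completed: [],   // terminal
  failed: [],      // terminal (retry creates a new attempt, not a transition)
};

function canTransition(from: Phase, to: Phase): boolean {
  return NEXT[from].includes(to);
}
```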
## API Endpoints
### Create a Transcription Job

```
POST /api/v1/inference/transcriptions/jobs
```
Request:

```json
{
  "model": "whisper-1",
  "providerId": "openai-prod",
  "source": {
    "kind": "url",
    "url": "https://example.com/recording.mp3"
  },
  "language": "en",
  "responseFormat": "vtt",
  "maxAttempts": 3
}
```
Source types:

| Kind | Fields | Description |
|---|---|---|
| `url` | `url` | Direct URL to an audio file |
| `stageDrive` | `sessionId`, `fileId` | Audio file stored in a Stage session's drive |
Optional parameters:

| Field | Type | Default | Description |
|---|---|---|---|
| `language` | string | auto-detect | Source language hint |
| `prompt` | string | — | Context prompt for the model |
| `responseFormat` | string | `"vtt"` | Output format |
| `temperature` | number | — | Sampling temperature |
| `maxAttempts` | number | 1 | Maximum retry attempts on failure |
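Putting the required fields and the documented defaults together, a client-side request builder might look like the following sketch. `withDefaults` is an illustrative helper, not part of the API; the server applies the same defaults for omitted fields.

```typescript
interface TranscriptionJobRequest {
  model: string;
  providerId: string;
  source:
    | { kind: "url"; url: string }
    | { kind: "stageDrive"; sessionId: string; fileId: string };
  language?: string;
  prompt?: string;
  responseFormat?: string;
  temperature?: number;
  maxAttempts?: number;
}

// Applies the documented defaults (responseFormat "vtt", maxAttempts 1)
// without overwriting caller-supplied values.
function withDefaults(req: TranscriptionJobRequest): TranscriptionJobRequest {
  return { responseFormat: "vtt", maxAttempts: 1, ...req };
}
```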
Response:

```json
{
  "success": true,
  "data": {
    "id": "tj_abc123",
    "status": "queued",
    "phase": "queued",
    "model": "whisper-1",
    "requestedAt": "2026-03-08T10:00:00Z",
    "artifacts": [],
    "result": null
  }
}
```
### List Transcription Jobs

```
GET /api/v1/inference/transcriptions/jobs
```
Returns all jobs for the authenticated user, ordered by creation time.
### Get Job Status

```
GET /api/v1/inference/transcriptions/jobs/:id
```
Returns the full job record including artifacts and transcript result when completed.
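Clients typically poll this endpoint until the job reaches a terminal status. A minimal polling sketch follows; the `getJob` fetcher is injected rather than hard-coded, so no particular HTTP client or base URL is assumed.

```typescript
interface JobRecord {
  id: string;
  status: "queued" | "processing" | "completed" | "failed";
  phase: string;
}

// Polls until the job reaches a terminal status ("completed" or "failed"),
// sleeping a fixed interval between requests.
async function waitForJob(
  getJob: (id: string) => Promise<JobRecord>,
  id: string,
  intervalMs = 2000,
): Promise<JobRecord> {
  for (;;) {
    const job = await getJob(id);
    if (job.status === "completed" || job.status === "failed") return job;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A production version would add a timeout or maximum poll count so a stuck job cannot block the caller forever.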
Completed response:

```json
{
  "success": true,
  "data": {
    "id": "tj_abc123",
    "status": "completed",
    "phase": "completed",
    "model": "whisper-1",
    "requestedAt": "2026-03-08T10:00:00Z",
    "completedAt": "2026-03-08T10:02:30Z",
    "artifacts": [
      {
        "kind": "normalizedAudio",
        "filename": "recording-normalized.wav",
        "contentType": "audio/wav",
        "sizeBytes": 2048000,
        "url": "https://..."
      },
      {
        "kind": "transcriptText",
        "filename": "recording.vtt",
        "contentType": "text/vtt",
        "sizeBytes": 4200,
        "url": "https://..."
      },
      {
        "kind": "transcriptJson",
        "filename": "recording.json",
        "contentType": "application/json",
        "sizeBytes": 12400,
        "url": "https://..."
      }
    ],
    "result": {
      "text": "Welcome to the quarterly review...",
      "language": "en",
      "duration": 245.6,
      "segments": [
        {
          "start": 0.0,
          "end": 3.2,
          "text": "Welcome to the quarterly review"
        }
      ]
    }
  }
}
```
### Retry a Failed Job

```
POST /api/v1/inference/transcriptions/jobs/:id/retry
```

Re-queues a failed job if it hasn't exceeded `maxAttempts`.
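The retry guard follows directly from the fields in the job record. This is a sketch of the client-visible rule; the server-side check may include additional conditions.

```typescript
// A job is eligible for re-queueing only when it is failed
// and has attempts remaining.
function canRetry(job: {
  status: string;
  attempts: number;
  maxAttempts: number;
}): boolean {
  return job.status === "failed" && job.attempts < job.maxAttempts;
}
```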
## Artifacts
Each completed job produces up to three artifact types:
| Kind | Description | Content Type |
|---|---|---|
| `normalizedAudio` | Audio converted to a standard format for the model | `audio/wav` |
| `transcriptText` | Human-readable transcript (VTT format) | `text/vtt` |
| `transcriptJson` | Detailed transcript with timestamps and segments | `application/json` |
Artifacts are stored in the platform's file storage and referenced by URL in the job record. Artifact URLs are stable for the lifetime of the job record.
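Because the set of artifact kinds is fixed, selecting one from a completed job is a simple lookup. A sketch using the shapes shown in the response examples above:

```typescript
type ArtifactKind = "normalizedAudio" | "transcriptText" | "transcriptJson";

interface Artifact {
  kind: ArtifactKind;
  filename: string;
  contentType: string;
  sizeBytes: number;
  url: string;
}

// Returns the artifact of the requested kind, or undefined
// if the job did not produce one.
function findArtifact(
  artifacts: Artifact[],
  kind: ArtifactKind,
): Artifact | undefined {
  return artifacts.find((a) => a.kind === kind);
}
```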
## Kubernetes Execution
Transcription jobs run as Kubernetes Job resources. The platform dispatches each job to a dedicated pod with configurable resource limits.
Environment variables:

| Variable | Description |
|---|---|
| `SHIFT_TRANSCRIPTION_RUNNER_MODE` | `k8s-job` or `deployment` |
| `SHIFT_TRANSCRIPTION_JOB_IMAGE` | Docker image for worker pods |
| `SHIFT_TRANSCRIPTION_JOB_REQUEST_CPU` | CPU request |
| `SHIFT_TRANSCRIPTION_JOB_REQUEST_MEMORY` | Memory request |
| `SHIFT_TRANSCRIPTION_JOB_LIMIT_CPU` | CPU limit |
| `SHIFT_TRANSCRIPTION_JOB_LIMIT_MEMORY` | Memory limit |
| `SHIFT_TRANSCRIPTION_JOB_TTL_SECONDS` | Job TTL after completion (default: 900) |
| `SHIFT_TRANSCRIPTION_JOB_BACKOFF_LIMIT` | K8s Job backoff limit (default: 0) |
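The two numeric settings with documented defaults (TTL 900 seconds, backoff limit 0) can be resolved from the environment as sketched below; `jobSettings` is an illustrative helper, not part of the platform's actual configuration loader.

```typescript
// Resolves the numeric job settings, falling back to the
// documented defaults when a variable is unset.
function jobSettings(env: Record<string, string | undefined>) {
  return {
    ttlSeconds: Number(env.SHIFT_TRANSCRIPTION_JOB_TTL_SECONDS ?? "900"),
    backoffLimit: Number(env.SHIFT_TRANSCRIPTION_JOB_BACKOFF_LIMIT ?? "0"),
  };
}
```

Keeping the Kubernetes `backoffLimit` at 0 means the platform's own `maxAttempts` logic, rather than Kubernetes, governs retries.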
## Data Model

```typescript
interface TranscriptionJob {
  id: string;
  model: string;
  providerId: string;
  source: UrlSource | StageDriveSource;
  status: "queued" | "processing" | "completed" | "failed";
  phase: "queued" | "downloading" | "transcoding" | "transcribing" | "completed" | "failed";
  requestedBy: string;
  requestedAt: string;
  startedAt?: string;
  completedAt?: string;
  attempts: number;
  maxAttempts: number;
  error?: string;
  artifacts: Artifact[];
  result?: {
    text: string;
    language: string;
    duration: number;
    segments: Array<{ start: number; end: number; text: string }>;
  };
}
```
Jobs are stored in the Convex inference table namespace and indexed for efficient status-based queries.
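The `result.segments` array carries enough timing information to regenerate VTT cues client-side, mirroring what the `transcriptText` artifact contains. A sketch, assuming standard WebVTT timestamp formatting:

```typescript
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Formats a duration in seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function vttTime(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const pad = (n: number, width: number) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor((ms % 3_600_000) / 60_000);
  const s = Math.floor((ms % 60_000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms % 1000, 3)}`;
}

// Renders segments as a minimal WebVTT document.
function segmentsToVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) => `${vttTime(seg.start)} --> ${vttTime(seg.end)}\n${seg.text}`,
  );
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```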