
Transcription Jobs

The Inference service supports asynchronous transcription via Kubernetes jobs. Instead of blocking on a synchronous request, you submit a job that progresses through a defined lifecycle and produces downloadable artifacts.

Job Lifecycle

queued → downloading → transcoding → transcribing → completed
→ failed
| Phase | Description |
| --- | --- |
| queued | Job created, waiting for a worker |
| downloading | Worker is fetching the source audio |
| transcoding | Normalizing audio format for the transcription model |
| transcribing | Running the transcription model |
| completed | Transcript ready with artifacts |
| failed | Job failed (retryable based on maxAttempts) |
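
The lifecycle can be modeled on the client as a small transition table. The sketch below takes its phase names from the table above; the transition map itself is an assumption inferred from the diagram (any non-terminal phase may fail, and a retry moves a failed job back to queued), not a documented contract.

```typescript
// Phase names come from the lifecycle table. The transition map is an
// assumption inferred from the diagram, not a documented contract.
type Phase =
  | "queued"
  | "downloading"
  | "transcoding"
  | "transcribing"
  | "completed"
  | "failed";

const NEXT_PHASES: Record<Phase, readonly Phase[]> = {
  queued: ["downloading", "failed"],
  downloading: ["transcoding", "failed"],
  transcoding: ["transcribing", "failed"],
  transcribing: ["completed", "failed"],
  completed: [],
  // Assumption: retrying a failed job puts it back in "queued".
  failed: ["queued"],
};

// A job in a terminal phase will not advance on its own.
function isTerminal(phase: Phase): boolean {
  return phase === "completed" || phase === "failed";
}

// True when `to` is a plausible direct successor of `from`.
function canTransition(from: Phase, to: Phase): boolean {
  return NEXT_PHASES[from].includes(to);
}
```

A client might use `isTerminal` to decide when to stop polling and `canTransition` to flag unexpected phase jumps in logs.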

API Endpoints

Create a Transcription Job

POST /api/v1/inference/transcriptions/jobs

Request:

{
  "model": "whisper-1",
  "providerId": "openai-prod",
  "source": {
    "kind": "url",
    "url": "https://example.com/recording.mp3"
  },
  "language": "en",
  "responseFormat": "vtt",
  "maxAttempts": 3
}

Source types:

| Kind | Fields | Description |
| --- | --- | --- |
| url | url | Direct URL to an audio file |
| stageDrive | sessionId, fileId | Audio file stored in a Stage session's drive |
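
The two source kinds map naturally onto a discriminated union. This is a minimal sketch built only from the kinds and fields in the table above; the type names mirror those referenced in the data model, while `describeSource` is a hypothetical helper added for illustration.

```typescript
// Field names follow the source-types table; string types are assumed.
interface UrlSource {
  kind: "url";
  url: string;
}

interface StageDriveSource {
  kind: "stageDrive";
  sessionId: string;
  fileId: string;
}

type Source = UrlSource | StageDriveSource;

// Hypothetical helper: produce a short label for logging.
// Switching on `kind` lets the compiler narrow each branch.
function describeSource(source: Source): string {
  switch (source.kind) {
    case "url":
      return `url:${source.url}`;
    case "stageDrive":
      return `stageDrive:${source.sessionId}/${source.fileId}`;
  }
}
```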

Optional parameters:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| language | string | auto-detect | Source language hint |
| prompt | string | | Context prompt for the model |
| responseFormat | string | "vtt" | Output format |
| temperature | number | | Sampling temperature |
| maxAttempts | number | 1 | Maximum retry attempts on failure |

Response:

{
  "success": true,
  "data": {
    "id": "tj_abc123",
    "status": "queued",
    "phase": "queued",
    "model": "whisper-1",
    "requestedAt": "2026-03-08T10:00:00Z",
    "artifacts": [],
    "result": null
  }
}
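
As a rough illustration of submitting a job from a client, the sketch below assembles the request for the create endpoint. The base URL, the bearer-token header, and the `buildCreateJobRequest` helper are assumptions made for this example; the documentation above does not specify how requests are authenticated. Treating `model` and `providerId` as required is also an inference from their absence in the optional-parameters table.

```typescript
// Body shape follows the request example and the optional-parameters table.
interface CreateJobBody {
  model: string;
  providerId: string;
  source:
    | { kind: "url"; url: string }
    | { kind: "stageDrive"; sessionId: string; fileId: string };
  language?: string;
  prompt?: string;
  responseFormat?: string;
  temperature?: number;
  maxAttempts?: number;
}

// Hypothetical helper: returns the URL and fetch options for creating a job.
// Assumption: the API accepts a bearer token in the Authorization header.
function buildCreateJobRequest(baseUrl: string, token: string, body: CreateJobBody) {
  return {
    url: `${baseUrl}/api/v1/inference/transcriptions/jobs`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed straight to `fetch(url, init)`; the job identifier to poll would then be read from `data.id` in the response.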

List Transcription Jobs

GET /api/v1/inference/transcriptions/jobs

Returns all jobs for the authenticated user, ordered by creation time.

Get Job Status

GET /api/v1/inference/transcriptions/jobs/:id

Returns the full job record including artifacts and transcript result when completed.

Completed response:

{
  "success": true,
  "data": {
    "id": "tj_abc123",
    "status": "completed",
    "phase": "completed",
    "model": "whisper-1",
    "requestedAt": "2026-03-08T10:00:00Z",
    "completedAt": "2026-03-08T10:02:30Z",
    "artifacts": [
      {
        "kind": "normalizedAudio",
        "filename": "recording-normalized.wav",
        "contentType": "audio/wav",
        "sizeBytes": 2048000,
        "url": "https://..."
      },
      {
        "kind": "transcriptText",
        "filename": "recording.vtt",
        "contentType": "text/vtt",
        "sizeBytes": 4200,
        "url": "https://..."
      },
      {
        "kind": "transcriptJson",
        "filename": "recording.json",
        "contentType": "application/json",
        "sizeBytes": 12400,
        "url": "https://..."
      }
    ],
    "result": {
      "text": "Welcome to the quarterly review...",
      "language": "en",
      "duration": 245.6,
      "segments": [
        {
          "start": 0.0,
          "end": 3.2,
          "text": "Welcome to the quarterly review"
        }
      ]
    }
  }
}
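
Because jobs are asynchronous, a client typically polls the status endpoint until the job reaches a terminal phase. The following is one possible polling loop, not a prescribed client: `waitForJob`, its default interval and timeout, and the injected `fetchJob` callback are all assumptions for illustration. It treats `failed` as terminal and leaves retry decisions to the caller.

```typescript
// Minimal view of the job record; field names follow the responses above.
interface JobSnapshot {
  id: string;
  status: string;
  phase: string;
}

// A job is finished once it reaches either terminal phase.
function isFinished(job: JobSnapshot): boolean {
  return job.phase === "completed" || job.phase === "failed";
}

// Polls `fetchJob` until the job finishes or the timeout would be exceeded.
// `fetchJob` is supplied by the caller, e.g. a wrapper around
// GET /api/v1/inference/transcriptions/jobs/:id that returns `data`.
async function waitForJob(
  fetchJob: () => Promise<JobSnapshot>,
  intervalMs = 5000, // assumed default, tune to taste
  timeoutMs = 600000, // assumed default (10 minutes)
): Promise<JobSnapshot> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const job = await fetchJob();
    if (isFinished(job)) return job;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Job ${job.id} still in phase "${job.phase}" after ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Injecting `fetchJob` keeps the loop independent of any particular HTTP client and makes it easy to exercise with a stub.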

Retry a Failed Job

POST /api/v1/inference/transcriptions/jobs/:id/retry

Re-queues a failed job if it hasn't exceeded maxAttempts.
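
A client can check eligibility before calling the retry endpoint. This sketch assumes the rule is `attempts < maxAttempts` on a job whose status is `failed`; the exact comparison the server applies is not stated above, so treat the check as advisory. `canRetry` and `retryUrl` are hypothetical helpers.

```typescript
// Fields taken from the TranscriptionJob data model.
interface RetryableJob {
  status: "queued" | "processing" | "completed" | "failed";
  attempts: number;
  maxAttempts: number;
}

// Assumption: a failed job may be retried while attempts < maxAttempts.
function canRetry(job: RetryableJob): boolean {
  return job.status === "failed" && job.attempts < job.maxAttempts;
}

// Builds the retry endpoint URL for a job id (baseUrl is hypothetical).
function retryUrl(baseUrl: string, jobId: string): string {
  return `${baseUrl}/api/v1/inference/transcriptions/jobs/${encodeURIComponent(jobId)}/retry`;
}
```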

Artifacts

Each completed job produces up to three artifact types:

| Kind | Description | Content Type |
| --- | --- | --- |
| normalizedAudio | Audio converted to a standard format for the model | audio/wav |
| transcriptText | Human-readable transcript (VTT format) | text/vtt |
| transcriptJson | Detailed transcript with timestamps and segments | application/json |

Artifacts are stored in the platform's file storage and referenced by URL in the job record. Artifact URLs are stable for the lifetime of the job record.
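
Since a job produces up to three artifacts and not necessarily all of them, client code should look artifacts up by kind rather than by array position. A minimal sketch, with the `Artifact` shape inferred from the completed-response example and `findArtifact` as a hypothetical helper:

```typescript
type ArtifactKind = "normalizedAudio" | "transcriptText" | "transcriptJson";

// Shape inferred from the artifacts array in the completed response.
interface Artifact {
  kind: ArtifactKind;
  filename: string;
  contentType: string;
  sizeBytes: number;
  url: string;
}

// Returns the first artifact of the requested kind, or undefined if the
// job did not produce one.
function findArtifact(artifacts: Artifact[], kind: ArtifactKind): Artifact | undefined {
  return artifacts.find((artifact) => artifact.kind === kind);
}
```

The returned `url` can then be fetched to download the file, for example the VTT transcript from the `transcriptText` artifact.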

Kubernetes Execution

Transcription jobs run as Kubernetes Job resources. The platform dispatches each job to a dedicated pod with configurable resource limits.

Environment variables:

| Variable | Description |
| --- | --- |
| SHIFT_TRANSCRIPTION_RUNNER_MODE | k8s-job or deployment |
| SHIFT_TRANSCRIPTION_JOB_IMAGE | Docker image for worker pods |
| SHIFT_TRANSCRIPTION_JOB_REQUEST_CPU | CPU request |
| SHIFT_TRANSCRIPTION_JOB_REQUEST_MEMORY | Memory request |
| SHIFT_TRANSCRIPTION_JOB_LIMIT_CPU | CPU limit |
| SHIFT_TRANSCRIPTION_JOB_LIMIT_MEMORY | Memory limit |
| SHIFT_TRANSCRIPTION_JOB_TTL_SECONDS | Job TTL after completion (default: 900) |
| SHIFT_TRANSCRIPTION_JOB_BACKOFF_LIMIT | K8s Job backoff limit (default: 0) |
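
To show how these variables might be consumed, here is a sketch of a config reader. The variable names and the two documented defaults (900 and 0) come from the table above; `readRunnerConfig`, the `RunnerConfig` shape, and the validation rules are assumptions and do not describe the platform's actual loader.

```typescript
type RunnerMode = "k8s-job" | "deployment";
type Env = Record<string, string | undefined>;

// Hypothetical config shape; resource values are kept as raw strings
// (e.g. "500m", "1Gi") and passed through unvalidated.
interface RunnerConfig {
  mode: RunnerMode;
  image: string | undefined;
  requests: { cpu: string | undefined; memory: string | undefined };
  limits: { cpu: string | undefined; memory: string | undefined };
  ttlSeconds: number;
  backoffLimit: number;
}

// Parses a non-negative integer, falling back when the variable is unset or blank.
function parseNonNegativeInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

function readRunnerConfig(env: Env): RunnerConfig {
  const mode = env["SHIFT_TRANSCRIPTION_RUNNER_MODE"];
  // Assumption: the mode is required and must be one of the two documented values.
  if (mode !== "k8s-job" && mode !== "deployment") {
    throw new Error(`SHIFT_TRANSCRIPTION_RUNNER_MODE must be "k8s-job" or "deployment", got "${mode}"`);
  }
  return {
    mode,
    image: env["SHIFT_TRANSCRIPTION_JOB_IMAGE"],
    requests: {
      cpu: env["SHIFT_TRANSCRIPTION_JOB_REQUEST_CPU"],
      memory: env["SHIFT_TRANSCRIPTION_JOB_REQUEST_MEMORY"],
    },
    limits: {
      cpu: env["SHIFT_TRANSCRIPTION_JOB_LIMIT_CPU"],
      memory: env["SHIFT_TRANSCRIPTION_JOB_LIMIT_MEMORY"],
    },
    // Defaults documented in the table: TTL 900 seconds, backoff limit 0.
    ttlSeconds: parseNonNegativeInt(env, "SHIFT_TRANSCRIPTION_JOB_TTL_SECONDS", 900),
    backoffLimit: parseNonNegativeInt(env, "SHIFT_TRANSCRIPTION_JOB_BACKOFF_LIMIT", 0),
  };
}
```

In a Node.js process this would be called as `readRunnerConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.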

Data Model

interface TranscriptionJob {
  sid: string;
  model: string;
  providerId: string;
  source: UrlSource | StageDriveSource;
  status: "queued" | "processing" | "completed" | "failed";
  phase: "queued" | "downloading" | "transcoding" | "transcribing" | "completed" | "failed";
  requestedBy: string;
  requestedAt: string;
  startedAt?: string;
  completedAt?: string;
  attempts: number;
  maxAttempts: number;
  error?: string;
  artifacts: Artifact[];
  result?: {
    text: string;
    language: string;
    duration: number;
    segments: Array<{ start: number; end: number; text: string }>;
  };
}

Jobs are stored in the Convex inference table namespace and indexed for efficient status-based queries.
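
The data model carries both a coarse `status` and a fine-grained `phase`. One plausible relationship between them, shown here purely as an assumption since the mapping is not documented, is that the three working phases collapse into `processing`:

```typescript
type Phase =
  | "queued"
  | "downloading"
  | "transcoding"
  | "transcribing"
  | "completed"
  | "failed";

type Status = "queued" | "processing" | "completed" | "failed";

// Assumed mapping: downloading, transcoding, and transcribing all count
// as "processing"; the remaining phases map to the status of the same name.
function statusForPhase(phase: Phase): Status {
  switch (phase) {
    case "queued":
      return "queued";
    case "downloading":
    case "transcoding":
    case "transcribing":
      return "processing";
    case "completed":
      return "completed";
    case "failed":
      return "failed";
  }
}
```

Under that assumption, a status-based query for `processing` would return jobs in any of the three working phases.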