Transcription Jobs

The Inference service supports asynchronous transcription via Kubernetes jobs. Instead of blocking on a synchronous request, you submit a job that progresses through a defined lifecycle and produces downloadable artifacts.

Job Lifecycle

queued → downloading → transcoding → transcribing → completed
                                                   → failed

Phase	Description
`queued`	Job created, waiting for a worker
`downloading`	Worker is fetching the source audio
`transcoding`	Normalizing audio format for the transcription model
`transcribing`	Running the transcription model
`completed`	Transcript ready with artifacts
`failed`	Job failed; it may be retried, subject to `maxAttempts`
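
The phases above can be modeled on the client as a small union type. The following is a minimal sketch, not part of the service's SDK: the phase names come from the table, while `PHASE_ORDER`, `isTerminalPhase`, and `phaseProgress` are hypothetical helper names introduced here for illustration.

```typescript
// Phase names are taken from the lifecycle table; all helper names are hypothetical.
type Phase =
  | "queued"
  | "downloading"
  | "transcoding"
  | "transcribing"
  | "completed"
  | "failed";

// Happy-path order of phases. "failed" is a branch rather than a step, so it is excluded.
const PHASE_ORDER: readonly Phase[] = [
  "queued",
  "downloading",
  "transcoding",
  "transcribing",
  "completed",
];

// A job in a terminal phase does not advance further on its own.
function isTerminalPhase(phase: Phase): boolean {
  return phase === "completed" || phase === "failed";
}

// Rough progress indicator in [0, 1]. Returns null for "failed", because the
// phase alone does not say where in the pipeline the failure happened.
function phaseProgress(phase: Phase): number | null {
  const index = PHASE_ORDER.indexOf(phase);
  return index === -1 ? null : index / (PHASE_ORDER.length - 1);
}
```

A client might use `isTerminalPhase` to decide when to stop polling and `phaseProgress` to drive a progress bar.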

API Endpoints

Create a Transcription Job

POST /api/v1/inference/transcriptions/jobs

Request:

{
  "model": "whisper-1",
  "providerId": "openai-prod",
  "source": {
    "kind": "url",
    "url": "https://example.com/recording.mp3"
  },
  "language": "en",
  "responseFormat": "vtt",
  "maxAttempts": 3
}

Source types:

Kind	Fields	Description
`url`	`url`	Direct URL to an audio file
`stageDrive`	`sessionId`, `fileId`	Audio file stored in a Stage session's drive

Optional parameters:

Field	Type	Default	Description
`language`	string	auto-detect	Source language hint
`prompt`	string	—	Context prompt for the model
`responseFormat`	string	`"vtt"`	Output format
`temperature`	number	—	Sampling temperature
`maxAttempts`	number	1	Maximum retry attempts on failure
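
As an illustration of how the request body and the documented defaults fit together, here is a hedged sketch of a request builder. The function name `buildCreateJobRequest` and the type names are hypothetical. Treating `model`, `providerId`, and `source` as required is inferred from their absence in the optional-parameters table, and the check that `maxAttempts` is a positive integer is an assumption rather than documented server behavior.

```typescript
// Source shapes follow the "Source types" table.
type JobSource =
  | { kind: "url"; url: string }
  | { kind: "stageDrive"; sessionId: string; fileId: string };

interface CreateJobRequest {
  model: string;
  providerId: string;
  source: JobSource;
  language?: string;
  prompt?: string;
  responseFormat: string;
  temperature?: number;
  maxAttempts: number;
}

interface CreateJobOptions {
  language?: string;
  prompt?: string;
  responseFormat?: string;
  temperature?: number;
  maxAttempts?: number;
}

// Builds a request body, applying the documented defaults
// (responseFormat "vtt", maxAttempts 1) when a field is omitted.
function buildCreateJobRequest(
  model: string,
  providerId: string,
  source: JobSource,
  options: CreateJobOptions = {},
): CreateJobRequest {
  const maxAttempts = options.maxAttempts ?? 1;
  // Assumption: the service expects a positive integer here.
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
    throw new RangeError("maxAttempts must be a positive integer");
  }
  const request: CreateJobRequest = {
    model,
    providerId,
    source,
    responseFormat: options.responseFormat ?? "vtt",
    maxAttempts,
  };
  // Optional fields without defaults are left out entirely when not provided.
  if (options.language !== undefined) request.language = options.language;
  if (options.prompt !== undefined) request.prompt = options.prompt;
  if (options.temperature !== undefined) request.temperature = options.temperature;
  return request;
}
```

The resulting object can be serialized with `JSON.stringify` and sent as the body of the `POST` request.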

Response:

{
  "success": true,
  "data": {
    "id": "tj_abc123",
    "status": "queued",
    "phase": "queued",
    "model": "whisper-1",
    "requestedAt": "2026-03-08T10:00:00Z",
    "artifacts": [],
    "result": null
  }
}

List Transcription Jobs

GET /api/v1/inference/transcriptions/jobs

Returns all jobs for the authenticated user, ordered by creation time.

Get Job Status

GET /api/v1/inference/transcriptions/jobs/:id

Returns the full job record including artifacts and transcript result when completed.

Completed response:

{
  "success": true,
  "data": {
    "id": "tj_abc123",
    "status": "completed",
    "phase": "completed",
    "model": "whisper-1",
    "requestedAt": "2026-03-08T10:00:00Z",
    "completedAt": "2026-03-08T10:02:30Z",
    "artifacts": [
      {
        "kind": "normalizedAudio",
        "filename": "recording-normalized.wav",
        "contentType": "audio/wav",
        "sizeBytes": 2048000,
        "url": "https://..."
      },
      {
        "kind": "transcriptText",
        "filename": "recording.vtt",
        "contentType": "text/vtt",
        "sizeBytes": 4200,
        "url": "https://..."
      },
      {
        "kind": "transcriptJson",
        "filename": "recording.json",
        "contentType": "application/json",
        "sizeBytes": 12400,
        "url": "https://..."
      }
    ],
    "result": {
      "text": "Welcome to the quarterly review...",
      "language": "en",
      "duration": 245.6,
      "segments": [
        {
          "start": 0.0,
          "end": 3.2,
          "text": "Welcome to the quarterly review"
        }
      ]
    }
  }
}
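
Because jobs are asynchronous, a client typically polls this endpoint until the job reaches `completed` or `failed`. The sketch below shows one way to do that with capped exponential backoff. It is an assumption-laden illustration: `waitForJob`, `pollDelayMs`, and the injected `getJob` and `sleep` callbacks are hypothetical, and the delay values are arbitrary rather than recommended by the service.

```typescript
// Minimal view of the job record needed for polling.
interface JobSnapshot {
  id: string;
  status: string;
  phase: string;
}

// Capped exponential backoff: 1s, 2s, 4s, ... up to capMs (values are illustrative).
function pollDelayMs(attempt: number, baseMs = 1000, capMs = 15000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Polls until the job is completed or failed. `getJob` and `sleep` are injected
// so the loop stays independent of any particular HTTP client or timer.
async function waitForJob<T extends JobSnapshot>(
  getJob: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxPolls = 60,
): Promise<T> {
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const job = await getJob();
    if (job.status === "completed" || job.status === "failed") {
      return job;
    }
    await sleep(pollDelayMs(attempt));
  }
  throw new Error(`Job did not finish within ${maxPolls} polls`);
}
```

In practice, `getJob` would issue the `GET` request for a specific job ID and return the `data` field of the response, and `sleep` would wrap a timer.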

Retry a Failed Job

POST /api/v1/inference/transcriptions/jobs/:id/retry

Re-queues a failed job, provided it has not exceeded `maxAttempts`.
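
A client can check eligibility before calling the retry endpoint. This is a sketch under stated assumptions: `canRetry` and `retryPath` are hypothetical helpers, and the comparison assumes that `attempts` counts attempts already made and that a job is retryable while `attempts` is strictly less than `maxAttempts`. The server remains the authority on whether a retry is accepted.

```typescript
// Subset of the job record relevant to retry decisions.
interface RetryableJob {
  status: string;
  attempts: number;
  maxAttempts: number;
}

// Assumption: a job is retryable only when it has failed and still has attempts left.
function canRetry(job: RetryableJob): boolean {
  return job.status === "failed" && job.attempts < job.maxAttempts;
}

// Builds the path of the retry endpoint for a given job ID.
function retryPath(jobId: string): string {
  return `/api/v1/inference/transcriptions/jobs/${encodeURIComponent(jobId)}/retry`;
}
```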

Artifacts

Each completed job produces up to three artifact types:

Kind	Description	Content Type
`normalizedAudio`	Audio converted to a standard format for the model	`audio/wav`
`transcriptText`	Human-readable transcript (VTT format)	`text/vtt`
`transcriptJson`	Detailed transcript with timestamps and segments	`application/json`

Artifacts are stored in the platform's file storage and referenced by URL in the job record. Artifact URLs are stable for the lifetime of the job record.
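
Since a job may produce fewer than three artifacts, client code should look artifacts up by `kind` rather than by position. The following sketch mirrors the artifact fields shown in the completed response; the helper names `findArtifact` and `totalArtifactBytes` are hypothetical.

```typescript
type ArtifactKind = "normalizedAudio" | "transcriptText" | "transcriptJson";

// Fields mirror the artifact objects in the completed-job response.
interface Artifact {
  kind: ArtifactKind;
  filename: string;
  contentType: string;
  sizeBytes: number;
  url: string;
}

// Returns the first artifact of the requested kind, or undefined if the job did not produce one.
function findArtifact(
  artifacts: readonly Artifact[],
  kind: ArtifactKind,
): Artifact | undefined {
  return artifacts.find((artifact) => artifact.kind === kind);
}

// Sums the reported sizes, for example to show a download estimate.
function totalArtifactBytes(artifacts: readonly Artifact[]): number {
  return artifacts.reduce((sum, artifact) => sum + artifact.sizeBytes, 0);
}
```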

Kubernetes Execution

Transcription jobs run as Kubernetes Job resources. The platform dispatches each job to a dedicated pod with configurable resource limits.

Environment variables:

Variable	Description
`SHIFT_TRANSCRIPTION_RUNNER_MODE`	Runner mode: `k8s-job` or `deployment`
`SHIFT_TRANSCRIPTION_JOB_IMAGE`	Docker image for worker pods
`SHIFT_TRANSCRIPTION_JOB_REQUEST_CPU`	CPU request
`SHIFT_TRANSCRIPTION_JOB_REQUEST_MEMORY`	Memory request
`SHIFT_TRANSCRIPTION_JOB_LIMIT_CPU`	CPU limit
`SHIFT_TRANSCRIPTION_JOB_LIMIT_MEMORY`	Memory limit
`SHIFT_TRANSCRIPTION_JOB_TTL_SECONDS`	Job TTL after completion (default: 900)
`SHIFT_TRANSCRIPTION_JOB_BACKOFF_LIMIT`	K8s Job backoff limit (default: 0)
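
To make the variables above concrete, here is a hedged sketch of how they could be parsed and mapped onto a Kubernetes Job spec. It does not describe the platform's actual dispatcher. `readRunnerConfig`, `toJobSpec`, the container name `transcription-worker`, and `restartPolicy: "Never"` are all assumptions made for illustration. `ttlSecondsAfterFinished`, `backoffLimit`, and `resources.requests`/`resources.limits` are standard fields of the Kubernetes `batch/v1` Job API, and the defaults of 900 and 0 come from the table.

```typescript
type Env = Record<string, string | undefined>;

interface ResourceSpec {
  cpu: string | undefined;
  memory: string | undefined;
}

interface RunnerConfig {
  mode: "k8s-job" | "deployment";
  image: string | undefined;
  requests: ResourceSpec;
  limits: ResourceSpec;
  ttlSeconds: number;
  backoffLimit: number;
}

// Parses a non-negative integer, falling back to the documented default when unset.
function intFromEnv(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`Expected a non-negative integer, got "${value}"`);
  }
  return parsed;
}

function readRunnerConfig(env: Env): RunnerConfig {
  const mode = env["SHIFT_TRANSCRIPTION_RUNNER_MODE"];
  // Assumption: no default mode is documented, so an unset or unknown value is rejected.
  if (mode !== "k8s-job" && mode !== "deployment") {
    throw new Error(`Unsupported runner mode: ${String(mode)}`);
  }
  return {
    mode,
    image: env["SHIFT_TRANSCRIPTION_JOB_IMAGE"],
    requests: {
      cpu: env["SHIFT_TRANSCRIPTION_JOB_REQUEST_CPU"],
      memory: env["SHIFT_TRANSCRIPTION_JOB_REQUEST_MEMORY"],
    },
    limits: {
      cpu: env["SHIFT_TRANSCRIPTION_JOB_LIMIT_CPU"],
      memory: env["SHIFT_TRANSCRIPTION_JOB_LIMIT_MEMORY"],
    },
    ttlSeconds: intFromEnv(env["SHIFT_TRANSCRIPTION_JOB_TTL_SECONDS"], 900),
    backoffLimit: intFromEnv(env["SHIFT_TRANSCRIPTION_JOB_BACKOFF_LIMIT"], 0),
  };
}

// Hypothetical mapping of the config onto the `spec` of a batch/v1 Job.
function toJobSpec(config: RunnerConfig) {
  return {
    ttlSecondsAfterFinished: config.ttlSeconds,
    backoffLimit: config.backoffLimit,
    template: {
      spec: {
        restartPolicy: "Never", // assumption, not stated in the source
        containers: [
          {
            name: "transcription-worker", // hypothetical container name
            image: config.image,
            resources: { requests: config.requests, limits: config.limits },
          },
        ],
      },
    },
  };
}
```

With a backoff limit of 0, Kubernetes does not restart a failed pod, which would leave retries to the application-level `maxAttempts` mechanism.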

Data Model

interface TranscriptionJob {
  sid: string;
  model: string;
  providerId: string;
  source: UrlSource | StageDriveSource;
  status: "queued" | "processing" | "completed" | "failed";
  phase: "queued" | "downloading" | "transcoding" | "transcribing" | "completed" | "failed";
  requestedBy: string;
  requestedAt: string;
  startedAt?: string;
  completedAt?: string;
  attempts: number;
  maxAttempts: number;
  error?: string;
  artifacts: Artifact[];
  result?: {
    text: string;
    language: string;
    duration: number;
    segments: Array<{ start: number; end: number; text: string }>;
  };
}

Jobs are stored in the Convex inference table namespace and indexed for efficient status-based queries.
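
To illustrate how `result.segments` relates to the VTT output, the sketch below renders segments as WebVTT cues. The `transcriptText` artifact produced by the service remains the authoritative output; this only shows how the data maps onto cue timings. It assumes `start` and `end` are expressed in seconds, which the sample response suggests but the source does not state. The function names are hypothetical.

```typescript
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Left-pads a non-negative integer with zeros to the given width.
function pad(value: number, width: number): string {
  let text = String(value);
  while (text.length < width) text = "0" + text;
  return text;
}

// Formats seconds as a WebVTT timestamp: hh:mm:ss.ttt
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Renders one cue per segment, separated by blank lines, after the WEBVTT header.
function segmentsToVtt(segments: readonly Segment[]): string {
  const cues = segments.map(
    (segment) =>
      `${formatTimestamp(segment.start)} --> ${formatTimestamp(segment.end)}\n${segment.text}`,
  );
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```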
