Model Evaluation QA Lead

Posted Feb 9, 2026

Deepgram · USA | Remote · Remote · Full-time · Lead

Tech Stack

AWS, Express, Git, Go, Python, Rust, TypeScript

Responsibilities

Design, build, and maintain automated model evaluation pipelines for candidate models, implementing objective and subjective quality metrics across STT, TTS, and STS products.
Embed model quality checkpoints into CI/CD and release pipelines, defining pass/fail criteria and owning the go/no-go signal for production promotions.
Stand up and operate evaluation tooling for end-to-end voice agent testing, covering accuracy, latency, turn-taking, conversational quality, and custom metrics.
Partner with the Active Learning team to validate data ingestion infrastructure, annotation pipelines, and retraining automation.
Automate execution and reporting of industry-standard benchmarks and maintain reproducible benchmark environments across multiple model versions.

Benefits

401k
Flexible Hours
Gym Membership
Health Insurance
Learning Budget
Parental Leave
Remote Stipend
Unlimited PTO

Culture

AI-First, Fast-Paced, Experimentation, Adaptability, Continuous Learning, Customer-Obsessed, Cross-Functional Teams, Collaboration, Inclusive Hiring

Requirements

Regions: US

About Deepgram

Industry: AI

Size: small

Deepgram is the leading Voice AI platform providing real-time APIs for speech-to-text, text-to-speech, and building production-grade voice agents, trusted by over 200,000 developers and 1,300+ organizations. The company's voice-native foundation models offer unmatched accuracy, low latency, and cost efficiency, having processed over 50,000 years of audio.
