Member of Technical Staff — Model Optimization and Inference (Experienced)

Posted Jun 11, 2026

Nuance Communications · Seattle, Washington · Full-time · Staff

$250,000 USD

Tech Stack

Next.js, Python, React, TypeScript

Responsibilities

Own end-to-end inference optimization across LLMs, audio models, and diffusion-based components.
Implement and tune KV cache strategies for long-context conversations, including eviction, compression, and memory-efficient attention.
Evaluate, deploy, and extend inference serving frameworks such as vLLM, SGLang, and TensorRT-LLM for specific workloads.
Profile and benchmark end-to-end latency and throughput; identify and systematically eliminate bottlenecks.
Build internal tooling for faster and more rigorous optimization, including profiling viewers and end-to-end inference test harnesses.

Benefits

401k
Equity

Culture

Deep Work Focus, Customer-Obsessed, Startup Energy, Collaborative Space, Inclusive Hiring

Requirements

Regions: US

About Nuance Communications

Industry: AI

Size: startup

Nuance Labs is a Series A company building photorealistic, real-time AI avatars with emotional intelligence that can listen, speak, react, interrupt, and respond like a real person. The company develops foundation models for full-duplex audiovisual AI, targeting natural conversation with response times under 500 ms.

Compensation

Base salary: $250,000 USD

Equity: Meaningful equity structured for long-term ownership
