Member of Technical Staff — Model Optimization and Inference (Experienced)
$250,000 USD
Responsibilities
- Own end-to-end inference optimization across LLMs, audio models, and diffusion-based components.
- Implement and tune KV cache strategies for long-context conversations, including eviction, compression, and memory-efficient attention.
- Evaluate, deploy, and extend inference serving frameworks such as vLLM, SGLang, and TensorRT-LLM for specific workloads.
- Profile and benchmark end-to-end latency and throughput; identify and systematically eliminate bottlenecks.
- Build internal tooling for faster and more rigorous optimization, including profiling viewers and end-to-end inference test harnesses.
Benefits
- 401k
- Equity
Culture
- Deep Work Focus
- Customer-Obsessed
- Startup Energy
- Collaborative Space
- Inclusive Hiring
Requirements
Regions: US
About Nuance Labs
Industry: AI
Size: startup
Nuance Labs is a Series A company building photorealistic, real-time AI avatars with emotional intelligence that can listen, speak, react, interrupt, and respond like a real person. The company develops foundation models for full-duplex audiovisual AI, targeting natural conversation with sub-500 ms response times.
Compensation
Base salary: $250,000 USD
Equity: Meaningful equity structured for long-term ownership