TL, Research Inference

Posted Mar 19, 2026

OpenAI, San Francisco, full-time

Tech Stack

Design and build high-performance inference runtimes for large-scale AI models, focusing on efficiency, reliability, and scalability.
Own and optimize core execution paths, including model execution, memory management, batching, and scheduling.
Develop and improve distributed inference across multiple GPUs, including parallelism strategies, communication patterns, and runtime coordination.
Implement and optimize inference-critical operators and kernels informed by real-world workloads.
Partner closely with research teams to ensure new model architectures are supported accurately and efficiently in inference systems.