Responsibilities
- Own the reliability of Baseten's multi-cloud Kubernetes infrastructure, including incident response, post-mortems, and remediation tracking.
- Build and maintain observability infrastructure — metrics, logging, dashboards, and alerting — as code.
- Author, validate, and improve runbooks for recurring failure patterns, ensuring they're structured for low-context, safe execution.
- Identify high-frequency failure patterns and convert them into automated mitigations or self-healing automations.
- Diagnose and resolve runtime issues related to latency, memory behavior, GPU utilization, concurrency, and model lifecycle management.
Benefits
- 401k
- Equity
- Health Insurance
- Parental Leave
Culture
- Mission-Driven
- Fast-Paced
- Collaborative Space
- Inclusive Hiring
Requirements
Regions: US
About Baseten
Industry: AI
Size: startup
Baseten powers mission-critical inference for dynamic AI companies by uniting applied AI research, flexible infrastructure, and seamless developer tooling to bring cutting-edge models into production.
Compensation
Equity: meaningful equity