Reliability Lead, Common Services
$206,000 USD
Tech Stack
Responsibilities
- Establish and lead the SRE / production engineering practice for the Common Services organization, defining standards for reliability, incident management, and on-call.
- Develop an Operational Excellence strategy focused on improving system performance and reducing operational toil.
- Partner with engineering and product teams to define SLOs, SLIs, and error budgets for critical Common Services.
- Own and improve the incident management lifecycle for Common Services, including on-call rotations, escalation paths, and post-incident reviews.
- Drive the observability strategy for Common Services, ensuring actionable visibility into system health, performance, and capacity.
Benefits
- 401k
- Equity
- Gym Membership
- Health Insurance
- Parental Leave
Culture
- Hybrid Work
- Continuous Improvement
- Blameless Postmortems
- Humane On-Call
- Data-Driven Decision Making
Requirements
Regions: US
About CoreWeave
Industry: cloud
Size: large
CoreWeave is The Essential Cloud for AI™, providing a platform of technology, tools, and teams to enable innovators to build and scale AI with confidence.
Compensation
Base salary: $206,000 USD
Equity: equity awards, Employee Stock Purchase Program (ESPP)
Bonus: discretionary bonus