Responsibilities
- Design and build a best-in-class AI evaluation system: curated datasets, offline replay, scorers and judges, regression alerts, and dashboards.
- Design feedback loops from real usage: collecting, cleaning, and interpreting user signals to inform model and harness changes.
- Develop analysis tooling and workflows for debugging agent behavior: deep dives on failure modes, clustering themes, and surfacing actionable insights.
- Improve reliability and guardrails by making quality measurable and operational: defining “good/bad/degraded” sessions, alerting, and triage primitives.
Culture
- Flat Hierarchy
- Cross-Functional Teams
- Truth-Seeking
- Passionate
- Creative
About Cursor
Industry: SaaS
Size: startup
Cursor is building an AI-powered tool for professional programmers with the mission to automate coding through inventive research, design, and engineering. The company is small and talent dense, focusing on building and shipping code quickly.