Responsibilities
- Design and conduct experiments to evaluate AI model behavior across reasoning, style, robustness, and user preference dimensions.
- Develop new metrics, methodologies, and evaluation protocols that go beyond traditional benchmarks.
- Analyze large-scale human voting and interaction data to uncover insights into model performance and user preferences.
- Collaborate with engineers to implement and scale research findings into production systems.
- Author internal reports and external publications that contribute to the broader ML research community.
Benefits
- Equity
- Gym Membership
- Health Insurance
Culture
- Cross-Functional Teams
- Mission-Driven
- Transparent Leadership
Requirements
Required: PhD or equivalent research experience in Machine Learning, Natural Language Processing, Statistics, or a related field
Regions: US
About Arena
Industry: AI
Size: Small
Arena Intelligence is an open platform created by researchers at UC Berkeley's SkyLab. It focuses on evaluating AI model performance in the real world to advance transparent, rigorous, and human-centered evaluation. Its leaderboards and evaluation tools are trusted by enterprises and AI labs to understand real-world reliability, alignment, and impact.
Compensation
Equity: Equity aligned with market rates