Responsibilities
- Design and conduct experiments to evaluate AI model behavior across reasoning, style, robustness, and user preference dimensions.
- Develop new metrics, methodologies, and evaluation protocols that go beyond traditional benchmarks.
- Analyze large-scale human voting and interaction data to uncover insights into model performance and user preferences.
- Communicate results to the broader research community through academic papers, educational content, and conference talks.
- Collaborate with engineers to implement and scale research findings into production systems.
Benefits
- Equity
- Gym Membership
- Health Insurance
- Learning Budget
Culture
- Mission-Driven
- Transparency
- Work-Life Balance
- Cross-Functional Teams
- Mentorship Program
Requirements
Required: PhD or equivalent research experience in Machine Learning, Natural Language Processing, Statistics, or a related field
Regions: US
About Arena
Industry: AI
Size: small
Arena Intelligence is an open platform created by UC Berkeley's SkyLab researchers, focused on evaluating AI model performance in the real world to advance transparent, rigorous, and human-centered evaluations. It provides leaderboards and evaluation tools trusted by enterprises and AI labs to understand real-world reliability, alignment, and impact.
Compensation
Equity: competitive equity