Senior Research Scientist, Reward Models
$350,000 USD
Responsibilities
- Lead research on novel reward model architectures and training approaches for RLHF.
- Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches.
- Research techniques to detect, characterize, and mitigate reward hacking and specification gaming.
- Design experiments to understand reward model generalization, robustness, and failure modes.
- Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines.
Benefits
- Equity
- Learning Budget
- Parental Leave
Culture
- Mission-Driven
- Impact-Oriented
- Cross-Functional Teams
- Collaborative Space
- Flexible Hours
Requirements
Required: Bachelor’s degree or an equivalent combination of education, training, and/or experience
Regions: US
About Anthropic
Industry: AI
Size: small
Anthropic's mission is to create reliable, interpretable, and steerable AI systems to ensure AI is safe and beneficial for users and society. The team is a quickly growing group of researchers, engineers, policy experts, and business leaders committed to building beneficial AI systems.
Compensation
Base salary: $350,000 USD
Equity: optional equity donation matching