Senior Research Scientist, Reward Models

Posted Apr 3, 2026

Anthropic · Remote-Friendly (Travel Required) | San Francisco, CA · Remote · Full-time · Staff

$350,000 USD


Tech Stack

AWS
Git
Python
Rust

Responsibilities

Lead research on novel reward model architectures and training approaches for RLHF.
Develop and evaluate LLM-based grading and evaluation methods, including rubric-driven approaches.
Research techniques to detect, characterize, and mitigate reward hacking and specification gaming.
Design experiments to understand reward model generalization, robustness, and failure modes.
Collaborate with the Finetuning team to translate research insights into improvements for production training pipelines.

Benefits

Equity
Learning Budget
Parental Leave

Culture

Mission-Driven
Impact-Oriented
Cross-Functional Teams
Collaborative Space
Flexible Hours

Requirements

Required: Bachelor’s degree or an equivalent combination of education, training, and/or experience

Regions: US

About Anthropic

Industry: AI

Size: small

Anthropic's mission is to create reliable, interpretable, and steerable AI systems to ensure AI is safe and beneficial for users and society. The team is a quickly growing group of researchers, engineers, policy experts, and business leaders committed to building beneficial AI systems.


Compensation

Base salary: $350,000 USD

Equity: optional equity donation matching
