Engineering Lead, Benchmarking

AI Safety

May 12

Epoch AI

📍 Remote (Global) 🔄 Rolling Applications
💰 $200-300K USD 🕔 Full Time

Epoch AI is looking for a senior Research Engineer to lead the engineering efforts of our Benchmarking team. In this role, you’ll help us provide independent evaluations of leading AI models so that researchers, developers, and policymakers can better understand AI development.

About the Role

Epoch AI is a leading research institute investigating trends in artificial intelligence, aiming to provide rigorous, accessible insights into AI development. A core part of our work is understanding the capabilities of state-of-the-art AI models through benchmarking.

We are seeking a talented Senior Engineer to lead the engineering efforts of our Benchmarking team and play a crucial role in expanding and operating our AI Benchmarking Hub. This platform provides independent evaluations of leading AI models on challenging benchmarks, helping researchers, developers, and policymakers understand what AI systems can do and where they are headed.

As the Engineering Lead for Benchmarking, you will drive the technical execution and strategy for our AI Benchmarking Hub. This is a hands-on leadership role where you will own the engineering roadmap and actively contribute code to our evaluation infrastructure (using frameworks like Inspect). Your deep engineering expertise and operational focus will help make our benchmarking timely, rigorous, and transparent.

This role is fully remote and we expect to be legally able to hire in many countries. If you are unsure whether we can hire in the country you are based in, please email careers@epoch.ai. This role is open to full-time candidates.

Key Responsibilities

Own and execute the engineering roadmap: Define, manage, and actively contribute to implementing the long-term engineering roadmap for our Benchmarking infrastructure, ensuring it supports research priorities and enables rapid evaluation cycles.
Team leadership & mentorship: Provide technical guidance and day-to-day management for our current benchmarking research engineer and potential future hires.
Lead and execute timely evaluations: Personally oversee and contribute to the execution of evaluations across our benchmark suite. Ensure a rapid response pipeline is in place and operational to benchmark major new models within tight timeframes (e.g., targeting initial results within days of release).
Implement benchmarks: Implement new and existing AI benchmarks within our evaluation framework (primarily using the Inspect library) to expand the suite of capabilities we track.
Collaborate: Work closely with Epoch AI researchers and analysts to ensure evaluation data and outputs are accurate, insightful, and effectively integrated into our research products and publications.

What We’re Looking For

Outstanding engineering skills: You have a strong software engineering background with several years of professional experience building and maintaining complex systems. You regularly contribute high-quality, robust, and maintainable code and are comfortable diving deep into existing codebases and infrastructure. We expect most (but not necessarily all) strong candidates to have 10 years or more of engineering experience.
Leadership & people-management experience: You have successfully led small engineering teams, set priorities, and managed reports.
Mission-driven: You’re motivated by Epoch AI’s mission to provide rigorous, independent insight into key trends in AI. You want to deliver public, trustworthy evaluations of AI capabilities on challenging benchmarks, empowering researchers, policymakers, and the wider public to make well-informed decisions about AI.
AI domain expertise is a strong plus but not required: Hands-on experience running LLM evaluations, familiarity with evaluation frameworks like Inspect, as well as a solid grasp of current AI trends are a strong plus. However, outstanding engineering skills and an ability to learn quickly matter more than direct background in these areas.

If you don’t tick all these boxes but think you would be a great fit, please consider applying anyway!

What We Offer

Compensation

Annual salary between $200,000 and $300,000 USD.
Salaries are not restricted to USD, and contracts and payments are usually in local currencies. Conversions are based on one-year average exchange rates.

Other Benefits

Compensation: Annual salary between $200k and $300k depending on experience and location.
Impactful work: Directly contribute to a leading resource for understanding AI progress, informing critical global discussions.
Cutting-edge field: Work at the forefront of AI, evaluating the most advanced models as they emerge.
Collaborative environment: Join a mission-driven team of researchers and engineers passionate about understanding AI trends.
Other benefits:
- Fully remote environment, including flexible work hours and schedules for most roles.
- Competitive global benefits program, including:
  - Comprehensive health insurance program, including supplemental benefits specific to a local country, as available and mandated by local law.
  - Life insurance and pension plan, if applicable in your country.
- Generous paid time off (PTO), including:
  - We don’t set a specific limit on paid time off per year, and we protect a minimum of 30 days off per year (including local public holidays and vacation time), pro-rated by contract length.
  - Unlimited (within reason) personal and sick leave.
  - Parental leave: up to 6 months of combined paid and unpaid parental leave during the first 2 years after a child’s birth or adoption, for permanent staff.
- A flexible and generous expense policy covering equipment, a wide range of productivity tools, and learning and development opportunities you find valuable, subject to regulations and manager approval.
- Paid work trips, including 3 staff retreats per year and relevant conferences.
- Access to our very well-equipped offices in Berkeley, California, including paid meals, snacks, gym, and more. All staff, regardless of where they are based, have access to the office for at least 20 days each year.
- Other benefits as allowed at the discretion of Epoch AI’s leadership and subject to local availability.

Additional Information

Please do not include a cover letter, photograph, or headshot of yourself, or any personal information that is not relevant to the role. Applications are rolling.
Please email careers@epoch.ai if you have any questions about this role or any accessibility requests.
While we welcome applicants from all time zones, we prefer candidates who can maintain overlap with both UTC (GMT) and UTC–8 (Pacific Time) time zones.
Please submit all of your application materials in English and note that we require professional-level English proficiency.
For this position we prefer candidates who can travel: we hold three retreats per year, and attendance is strongly encouraged.
Epoch is committed to building an inclusive, equitable, and supportive community for you to thrive and do your best work. We’re committed to finding the best people for our team, so please don’t hesitate to apply for a role regardless of your age, gender identity/expression, political identity, personal preferences, physical abilities, veteran status, neurodiversity or any other background.

Epoch AI

Epoch AI is a research institute that investigates trends in machine learning and the economic consequences of AI. Our work informs policy-making at key government institutes and governance at leading industry AI labs.