Senior Mathematician, AI Benchmarking

AI Safety

Nov 16

Epoch AI

📍Remote (Global) 🕔 Full-time
💰$200,000 p.a. 🔄 Rolling Applications

We are seeking an experienced mathematician to lead the expansion of our state-of-the-art benchmarking of mathematical reasoning in AI. The Senior Mathematician will lead the creation of ~1,000 mathematical problems for evaluating the mathematical reasoning capabilities of AI systems, with difficulty ranging from high school competition problems to graduate-level mathematics. This person will play an important role in advancing our ability to assess the capabilities of AI systems.

This is a full-time, fixed-term role expected to last about 7 months, depending on when you join. Applications are rolling, and we will prioritise candidates who apply sooner. The role is remote by default (though we prefer candidates willing to work in our office in Berkeley, CA), and we can hire in most countries. The successful candidate will report to Tamay Besiroglu, associate director of Epoch AI, and will work closely with the operations team and with other researchers working on separate but related math benchmarking projects at Epoch AI, such as our recently announced FrontierMath benchmark.

As the project lead, you will be recruiting and managing a team of problem writers and reviewers. The ideal project lead would have a solid background in math and experience managing fast-paced projects with many moving parts and stakeholders. Ideally, they would also have some programming experience and connections in the world of academic math. For reference, mathematicians such as Terence Tao and Evan Chen have written questions for our benchmarks, and the ideal candidate for this role would be able to engage and manage talent of a similar level.

We expect the role’s main responsibilities to be split between mathematics and project management. Math-focused tasks will include tracking the distribution of questions across fields and difficulty levels, providing guidance to contributors on questions, occasionally writing or reviewing questions, and implementing quality control protocols for questions and answers. Project management tasks will include ensuring the project is on track for deadlines, making strategic decisions, coordinating questions and reviews across contributors, reaching out to and managing potential contributors, and tracking the budget.

The format of the questions in the benchmark will be similar to Project Euler: to solve the questions, it may be necessary to combine coding with math knowledge, and correctness will be judged only on the basis of a final answer that’s designed to be difficult to guess in advance. In addition, each question will come with verification scripts written in Python to ensure that benchmark evaluation on new models can be done in an entirely automated fashion.
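To illustrate, a verification script for a question of this kind might look like the minimal sketch below. The toy problem (in the style of an early Project Euler question) and the function names `reference_answer` and `verify` are assumptions made for illustration only, not the benchmark's actual format.

```python
def reference_answer() -> int:
    """Reference answer for a hypothetical toy question:
    the sum of all positive integers below 1000 divisible by 3 or 5."""
    return sum(n for n in range(1, 1000) if n % 3 == 0 or n % 5 == 0)


def verify(submitted: str) -> bool:
    """Return True only if the submitted final answer matches exactly.

    Only the final answer is checked; the reasoning or code that
    produced it is not inspected.
    """
    try:
        value = int(submitted.strip())
    except ValueError:
        # Anything that is not a plain integer counts as incorrect.
        return False
    return value == reference_answer()
```

Because the check depends only on a single final answer, a script like this can be run against the output of any new model without human review.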

Key Responsibilities

Recruiting and managing relationships with mathematicians who are potential contributors to the benchmark.
Maintaining good coordination across the entire team working on the project. An example of this would be keeping track of the overall distribution of question difficulty and directing question authors to write easier or harder questions if the output has deviated too far from the target.
Ensuring the project remains on budget and on schedule and meets all deadlines.
Providing guidance to question authors on what kinds of questions they are expected to write. An example of this would be suggesting specific improvements on how to make a question more difficult.
Reviewing a large sample of the questions submitted to the benchmark to ensure their correctness and their compliance with the question-writing guidelines.
Ensuring that adequate quality control protocols are implemented to reduce the occurrence rate of errors in the question statements or their answers.

What We Are Looking For

Requirements

Mathematics expertise at the level of a Ph.D. in mathematics or above.
Broad mathematical knowledge across multiple fields.
Experience and proficiency managing teams or groups of people working on projects of a similar scope to this one.
A basic level of proficiency with the Python programming language, as solutions for problems requiring programming will be written in Python.
Experience and proficiency in managing projects with many moving parts and short turnaround times. It’s a plus if this experience relates to mathematics or academia, or was gained in a fast-moving, startup-like organization.
Strong interest in ML benchmarking in general and this project in particular.
Availability to work with consistent overlap during Pacific Time (PT) working hours.

Nice to have

Experience solving difficult math problems which require the use of programming. For example, past experience with solving recent Project Euler problems.
An impressive track record in renowned mathematics competitions. Examples include being an IMO gold medalist or a Putnam Fellow.
Experience with writing questions for challenging math competitions such as the IMO, the Putnam, or the Miklós Schweitzer Competition.
Experience in developing textbooks, curriculum or assessment materials for mathematics courses (high school, undergraduate, or graduate level).
A current or former academic position at a university (e.g., professor, lecturer, researcher).
A network of math experts and academics to rely upon to find question authors.
Publication history in peer-reviewed mathematical journals.
Willingness to work from our offices in Berkeley, CA.

If you don't meet all these conditions but think you would be a good fit for the role, please still apply, especially if you can solve the math screening question: that would be a strong signal that you would be a good fit.

What We Offer

Compensation

The baseline compensation for this role is $200,000/year, prorated according to the time it takes to complete the benchmark (we estimate this at 7 months).
We might be able to offer higher salaries for candidates able and willing to work from our offices in Berkeley, CA.

Other Benefits

Competitive global benefits program, including:
- Comprehensive health insurance program, including supplemental benefits specific to a local country, as available and mandated by local law.
- Life insurance and pension plan (varies by country).
Generous paid time off (PTO), including:
- No set limit on paid time off per year, with a protected minimum of 30 days off per year (including local public holidays and vacation time), prorated by contract length.
- Unlimited (within reason) personal and sick leave.
Technology Budget of the equivalent of $2000 USD every 3 years, prorated by contract duration, that can be used to cover costs of purchasing work and office equipment.
A flexible Professional Development Budget equivalent to $4000 USD annually, prorated by contract duration, for you to spend on any productivity tools or learning/development opportunities you might find valuable, subject to regulations and manager approval.
Paid work trips, including 3 staff retreats per year and relevant conferences.
Access to our very well-equipped offices in Berkeley, California, including paid meals, snacks, gym, and more.
- All staff, independently of where they are based, have access to the office for at least 20 days each year.
Other benefits as allowed at the discretion of Epoch AI’s leadership and local availability.

About Epoch AI

Epoch AI is a research institute that investigates trends in machine learning and the economic consequences of AI. Our work informs policy-making at key government institutes and governance at leading industry AI labs.

You can learn more about our work in frontier mathematics AI benchmarking here, or about Epoch AI on our blog or in this profile by Time magazine.

Additional Information

Please email careers@epochai.org if you have any questions about this role, accessibility requests, or if you want to request an extension to the deadline.
We welcome applicants from most time zones; however, for this role, your schedule must overlap with the UTC-8 time zone (Pacific Time) by at least 4 hours each workday.
Please submit all of your application materials in English, and note that we require professional-level English proficiency.
Travel is not a requirement for this position.
Epoch AI is committed to building an inclusive, equitable, and supportive community for you to thrive and do your best work. We’re committed to finding the best people for our team, so please don’t hesitate to apply for a role regardless of your age, gender identity/expression, political identity, personal preferences, physical abilities, veteran status, neurodiversity or any other background.
Epoch AI is fiscally sponsored by Rethink Priorities.
