Writers
Heron Program for AI Security
📍 Remote (Global) 🕔 Part Time
⌛ 24/11/2024 💼 Freelance 💰$1000
The Heron Program for AI Security is seeking Writers to author feature pieces of the newsletter on a freelance basis.
The AI security newsletter will cover the growing intersection of information security and AI with a focus on the technical challenges that relate to societal and catastrophic risks.
This will interest you if:
You enjoy writing about new technological problems and their implications for society
You are interested in the intersection of frontier AI models and security
You want a chance to explore new topics in-depth and write for an audience
You have 10-30 hours free a month
Details & Compensation
Output - 3 pieces of 800-1200 words within an initial 2 month trial period
Compensation - $1,000
Employment - contractor (freelance)
Location - remote, global
Hours - choose your own, other than availability for occasional meetings over Zoom
What support will you get?
Access to a network of AI Security experts as potential sources of content or interviews
Help advertising and creating partnerships with organisations that could promote the newsletter
Brainstorming and advice on content
Editing and feedback on written pieces
A platform to write the newsletter on
Access to a preexisting email list of interested people
How to apply?
If the opportunity interests you, please submit our application form along with a sample of an article for the newsletter. It should be <500 words and cover a topic related to AI security. For questions, feel free to reach out to Ezra at ezra@heronsec.ai.
About the Newsletter
General Information
We aim to publish a newsletter monthly, including 1-3 written pieces in each edition.
Authors will receive 1 to 2 rounds of feedback on articles.
Consistent contributors may be considered for longterm roles in the future.
Authors will be credited by name in the newsletter.
Audience
The newsletter will aim to keep those interested up to date with the latest research, opportunities, and developments in the field. The primary audience will be:
Cyber or information security professionals interested in the field
The AI Safety community and those working at AI Safety organisations
Policy organisations working on AI standard setting
Content Style
We expect the most useful types of content will be short articles summarising or explaining relevant topics, interviews with experts, and discussions of recent news/research. Each newsletter will also include relevant job opportunities such as those on the 80,000 job board (filtered to by infosec skill).
The content should be interesting and accessible to a technical and professional audience. Each article should be around 600-1000 words.
Potential Topics
These are a few topics we would be interested in covering in the newsletter. It’s not an exhaustive list and we’d be excited about hearing more ideas. The descriptions and links are also just a place to get started and we encourage you to do your own research into the topics mentioned.
A few key resources to get started:
AI companies are not on track to secure model weights — Jeffrey Ladish
Information security considerations for AI and the long term future — EA Forum
Securing AI hardware
Hardware used to train AI systems could be exploited through physical attacks to steal model weights. Previous work on hardware security has focused on low-power devices like key fobs, and credit cards. Securing the most recent GPUs used in frontier AI model training is a much greater challenge.
Hardware mechanisms could be used to enable and enforce many AI governance approaches such as requiring a licence to run. Attacks on hardware could be used to circumvent these. For example, with access to an AI chip an attacker could reverse engineer it to understand how it functions and disable the security mechanisms, or design an alternative chip without security mechanisms.
Topics that could be covered include summarising the risks and challenges of hardware security and potential solutions such as tamper-proof enclosures.
Resources:
AI Verification | Center for Security and Emerging Technology
Secure, Governable Chips | Center for a New American Security (en-US) (cnas.org)
Further readings listed here: Transformative AI and Compute - Reading List [shared] - Google Docs
Defending AI models from nation-states
The RAND report defines 5 levels of operational capacity with the most capable including a small number of operations prioritised by the world’s most capable nation-states. Defending from these actors poses a significant challenge. The report concludes that achieving this level of defence isn’t currently possible and requires R&D and assistance from the national security community.
We are interested in exploring the history of nation-states performing cyber attacks or stealing technology, hacking capabilities of nation-states, the measures needed to defend from nation-states such as those highlighted in the RAND report, examples of nation-states stealing AI secrets (such as this).
Resources:
RAND report - Chapter 4 on Operational Capacity Categories, and Chapter 6 on Security Levels
Significant Cyber Incidents | Strategic Technologies Program | CSIS
Example high-stakes information security breaches [public] - Google Docs
Situational Awareness; Leopold Aschenbrenner - Lock Down the Labs: Security for AGI
Confidential computing
Confidential computing is a technique for protecting data while in use by ensuring a trusted execution environment (TEE) to secure model weights. It’s widely considered to be an important technique in security although there hasn’t been much work to apply it to GPUs. We’d be excited to see explorations of how confidential computing could be applied to GPUs to protect model weights and evaluate its importance.
Resources:
Malicious use of open-source models
In some cases, AI companies have released model weights publicly, most recently Llama 3.1 created by Meta. While there are many benefits to open-source software, publishing model weights could allow malicious actors to use models to cause harm. Today’s models aren’t capable enough to cause serious harms, as frontier AI models become increasingly powerful, the potential harm they could cause increases such as through creating biological weapons or enabling advanced cyberattacks. Palisade Research showed how open-source models could have their safety fine-tuning effectively undone for just $200.
Topics we would be excited about covering in the newsletter include highlighting research into model misuse, regulation to reduce risks from open-source models.
Resources:
[2311.00117] BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B (arxiv.org)
unRLHF - Efficiently undoing LLM safeguards — AI Alignment Forum
Data centre security
The RAND report recommends physical security measures for data centres including significant guarding, supervised access, rigorous device inspections, and disabling most communication at the hardware level.
Topics that could be valuable include the challenges and costs with securing data centres, current security practices in data centres, or examples of data centre security failures.
Other topics
Exploring the importance of protecting algorithmic secrets.
AI / Cyber capabilities
"Cyber evals," short for cyber evaluations, are assessments designed to ensure the security and reliability of frontier AI systems. As advanced AI technologies continue to evolve, cyber evals play a crucial role in identifying vulnerabilities, assessing potential risks, and implementing robust security measures. This process typically involves rigorous testing against various threat scenarios, including adversarial attacks and data integrity challenges. By conducting cyber evals, organizations and society can better understand the security landscape surrounding frontier AI, enabling them to deploy these systems confidently while mitigating risks related to privacy, misuse, and unintended consequences.
Topics that could be valuable include analysis of the current state of cyber evals or of the performance of a specific model, and analysis of other capabilities of frontier models that have relevance to security implications.
Resources: