Research
I work at the intersection of AI and security, safety, and sociopolitical concerns:
1. Understanding, probing, and evaluating the failure modes of AI models — their biases, emergent risks, and misuse scenarios.
2. Designing mitigations, system defenses, white-box control methods, and reasoning enhancements to counter such risks.
3. Leveraging AI agents for good: advancing scientific discovery and benefiting society.
AI Security
A(G)I Safety & Alignment
AI & Society
AI Ethics
Prompt Injection
Multi-agent Safety
Cooperative AI
Human-AI Interaction
Selected Publications
Full list on Google Scholar
ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations
Amr Gomaa, Ahmed Salem, Sahar Abdelnabi
EACL Findings 2026
LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge
Sahar Abdelnabi, Aideen Fay, Ahmed Salem, et al.
arXiv 2025
The Hawthorne Effect in Reasoning Models: Evaluating and Steering Test Awareness
Sahar Abdelnabi, Ahmed Salem
NeurIPS 2025 Spotlight 🏆
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
Guangchen Lan, Huseyin A Inan, Sahar Abdelnabi, et al.
NeurIPS 2025
Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
Jan Wehner, Sahar Abdelnabi, Daniel Tan, David Krueger, Mario Fritz
TMLR 2025 Survey Certification 🏆
Firewalls to Secure Dynamic LLM Agentic Networks
Sahar Abdelnabi*, Amr Gomaa*, Eugene Bagdasarian, Per Ola Kristensson, Reza Shokri
arXiv 2025
A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive
Sarath Sivaprasad*, Pramod Kaushik*, Sahar Abdelnabi, Mario Fritz
ACL 2025 Best Paper Award 🏆
Get My Drift? Catching LLM Task Drift with Activation Deltas
Sahar Abdelnabi*, Aideen Fay*, Giovanni Cherubin, Ahmed Salem, Mario Fritz, Andrew Paverd
SaTML 2025
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Egor Zverev, Sahar Abdelnabi, Soroush Tabesh, Mario Fritz, Christoph H Lampert
ICLR 2025
Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation
Sahar Abdelnabi, Amr Gomaa, Sarath Sivaprasad, Lea Schönherr, Mario Fritz
NeurIPS Datasets and Benchmarks 2024
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Edoardo Debenedetti*, Javier Rando*, Daniel Paleka*, et al.
NeurIPS Datasets and Benchmarks 2024 Spotlight 🏆
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake*, Sahar Abdelnabi*, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz
AISec @ CCS 2023 Best Paper Award 🏆
Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems
Sahar Abdelnabi, Mario Fritz
USENIX Security 2023
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
Sahar Abdelnabi, Rakibul Hasan, Mario Fritz
CVPR 2022
Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding
Sahar Abdelnabi, Mario Fritz
S&P 2021
Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data
Ning Yu*, Vladislav Skripniuk*, Sahar Abdelnabi, Mario Fritz
ICCV 2021 Oral 🏆
VisualPhishNet: Zero-day Phishing Website Detection by Visual Similarity
Sahar Abdelnabi, Katharina Krombholz, Mario Fritz
CCS 2020
Talks & Panels
Panel: Shaping Public AI for Science, Innovation & European Impact
2025
What does it mean for AI agents to preserve privacy?
2025
Firewalls to Secure Dynamic LLM Agentic Networks
Brave, Google DeepMind, Qualcomm
2025
Panel: Women in AI Security
2025
Towards Aligned, Interpretable, and Steerable Safe AI Agents
TU Graz, UMass Amherst, CISPA, ELLIS Institute TĂĽbingen
2025
On New Security and Safety Challenges Posed by LLMs
HIDA PhD Meet-up (Keynote), MLSec Seminars
2024
Compromising LLMs: The Advent of AI Malware
Black Hat USA 2023
2023
On Evaluating Language Models and Their Security Implications
Vector Institute, ETH ZĂĽrich
2023