Sahar Abdelnabi

Principal Investigator at ELLIS Institute TĂĽbingen & MPI-IS

COMPASS Research Group

COoperative Machine intelligence for People-Aligned Safe Systems
Developing safe, aligned, and steerable AI agents with emphasis on security, human aspects, and cooperative multi-agent systems. Research Statement · Job Talk · We're hiring!

About

I lead the COMPASS research group at the ELLIS Institute TĂĽbingen, Max Planck Institute for Intelligent Systems, and TĂĽbingen AI Center.

Previously, I was an AI security researcher at Microsoft. I completed my PhD at CISPA Helmholtz Center for Information Security (advised by Prof. Dr. Mario Fritz).

Our work in 2023 was the first to identify, name, and taxonomize indirect prompt injection in LLM-integrated applications. In 2020, we proposed watermarking for generative AI in both the language and vision domains.

🏆 Our work on LLM sampling heuristics received a Best Paper Award at ACL 2025!

Research

I work at the intersection of AI with security, safety, and sociopolitical aspects:

1. Understanding, probing, and evaluating the failure modes of AI models — their biases, emergent risks, and misuse scenarios.

2. Designing mitigations, system defenses, white-box control methods, and reasoning enhancements to counter such risks.

3. Leveraging AI agents for good: scientific discovery and advancing our society.
AI Security · A(G)I Safety & Alignment · AI & Society · AI Ethics · Prompt Injection · Multi-agent Safety · Cooperative AI · Human-AI Interaction

Group

The COMPASS group investigates how to build AI systems that are safe, aligned with human values, and robust against adversarial manipulation. We work on a broad range of topics: A(G)I safety and security, interpretability, reasoning, evals, contextual integrity, agentic risks and opportunities, multi-agent dynamics, agents with long-term memory, self-improving agents, (deceptive) alignment, situational awareness, and manipulation and deception.

Members

Open Positions

đź§­ Join COMPASS!

I'm looking for curious, driven researchers interested in AI safety and security.

News and Highlights

July 2025
I gave a talk at Graz Security Week on privacy and AI agents (Slides)
July 2025
Our paper "A Theory of Response Sampling in LLMs" received a Best Paper Award at ACL 2025! 🏆
December 2024
I defended my PhD summa cum laude! 🏆
February 2024
I joined Microsoft as an AI Security Researcher
December 2023
Our paper "Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" — received a Best Paper Award at AISec'23 🏆

Selected Publications

Full list on Google Scholar

ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations
Amr Gomaa, Ahmed Salem, Sahar Abdelnabi
EACL Findings 2026
LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge
Sahar Abdelnabi, Aideen Fay, Ahmed Salem, et al.
arXiv 2025
The Hawthorne Effect in Reasoning Models: Evaluating and Steering Test Awareness
Sahar Abdelnabi, Ahmed Salem
NeurIPS 2025 Spotlight 🏆
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
Guangchen Lan, Huseyin A Inan, Sahar Abdelnabi, et al.
NeurIPS 2025
Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
Jan Wehner, Sahar Abdelnabi, Daniel Tan, David Krueger, Mario Fritz
TMLR 2025 Survey Certification 🏆
Firewalls to Secure Dynamic LLM Agentic Networks
Sahar Abdelnabi*, Amr Gomaa*, Eugene Bagdasarian, Per Ola Kristensson, Reza Shokri
arXiv 2025
A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive
Sarath Sivaprasad*, Pramod Kaushik*, Sahar Abdelnabi, Mario Fritz
ACL 2025 Best Paper Award 🏆
Get My Drift? Catching LLM Task Drift with Activation Deltas
Sahar Abdelnabi*, Aideen Fay*, Giovanni Cherubin, Ahmed Salem, Mario Fritz, Andrew Paverd
SaTML 2025
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Egor Zverev, Sahar Abdelnabi, Soroush Tabesh, Mario Fritz, Christoph H Lampert
ICLR 2025
Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation
Sahar Abdelnabi, Amr Gomaa, Sarath Sivaprasad, Lea Schönherr, Mario Fritz
NeurIPS Datasets and Benchmarks 2024
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Edoardo Debenedetti*, Javier Rando*, Daniel Paleka*, et al.
NeurIPS Datasets and Benchmarks 2024 Spotlight 🏆
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake*, Sahar Abdelnabi*, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz
AISec @ CCS 2023 Best Paper Award 🏆
Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems
Sahar Abdelnabi, Mario Fritz
USENIX Security 2023
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
Sahar Abdelnabi, Rakibul Hasan, Mario Fritz
CVPR 2022
Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding
Sahar Abdelnabi, Mario Fritz
S&P 2021
Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data
Ning Yu*, Vladislav Skripniuk*, Sahar Abdelnabi, Mario Fritz
ICCV 2021 Oral 🏆
VisualPhishNet: Zero-day Phishing Website Detection by Visual Similarity
Sahar Abdelnabi, Katharina Krombholz, Mario Fritz
CCS 2020

Media Coverage & Outreach

Major Media Coverage
Vice, Wired, Zeit, MIT Technology Review
2023
Our work on "indirect prompt injection" has been featured as interviews with myself/authors in Vice, Wired, Zeit, MIT Technology Review, and CISPA communication channels.
Podcasts & Documentaries
Various Media Outlets
2022 - 2024

Y-Kollektiv Documentary (2023): "ChatGPT: What happens when the AI takes over?"
CyberWire Podcast (2023): "A dark side to LLMs"
CISPA tl;dr Podcast (2022): "Deepfakes and Fingerprinting"
Industry & Research Blogs
Microsoft Security Response Center, Montreal AI Ethics Institute
2023 - 2024
Microsoft Security Response Center (MSRC) blogs: "Announcing the Adaptive Prompt Injection Challenge (LLMail-Inject)" and "Announcing the winners of the Adaptive Prompt Injection Challenge (LLMail-Inject)"
Montreal AI Ethics Institute: Featured work on "LLM-deliberation"
Policy & Industry Impact
Government & Industry Organizations
2023 - Present
Our work on "indirect prompt injection" has been featured by policymakers and practitioners including the German Federal Office for Information Security, NIST, OWASP, MITRE, Microsoft's AI bug bar, and many others, introducing new terminologies for the entire research and tech fields.

Academic Service

Program Committee Member
S&P (2026), SaTML (2024, 2026), AISec Workshop (2023 - 2025), USENIX Security (2025), CCS (2025), AAAI (2025)
2023 - 2026
Reviewer
ICLR, ICML, NeurIPS, CVPR, ICCV, ECCV, TPAMI, TMLR
2021 - 2025
Competition Organization
IEEE SaTML Challenges
2024 - 2025
Workshop Organization
EurIPS, ELLIS UnConference
2025
Grant Reviewing & Consulting
Cooperative AI, UK AI Safety Institute
2024
Fellowship Mentoring
2026

Talks & Panels

Panel: Shaping Public AI for Science, Innovation & European Impact
2025
What does it mean for AI agents to preserve privacy?
2025
Firewalls to Secure Dynamic LLM Agentic Networks
Brave, Google DeepMind, Qualcomm
2025
Panel: Women in AI Security
2025
Towards Aligned, Interpretable, and Steerable Safe AI Agents
TU Graz, UMass Amherst, CISPA, ELLIS Institute TĂĽbingen
2025
On New Security and Safety Challenges Posed by LLMs
HIDA PhD Meet-up (Keynote), MLSec Seminars
2024
Compromising LLMs: The Advent of AI Malware
Black Hat USA 2023
2023
On Evaluating Language Models and Their Security Implications
Vector Institute, ETH ZĂĽrich
2023

Experience

AI Security Researcher
Microsoft Security Response Center (MSRC), Microsoft Research Cambridge
2024–Present
AI security and safety research. Assessing vulnerabilities through Microsoft's AI Bug Bounty program.
PhD in Computer Science
CISPA Helmholtz Center for Information Security
2019–2024
Advisor: Prof. Dr. Mario Fritz. External reviewers: Prof. Dr. Battista Biggio and Prof. Dr. Florian Tramèr. Summa cum laude 🏆
Research Assistant
Max Planck Institute for Informatics
2017–2019
Advised by Prof. Dr. Andreas Bulling. Research on brain-computer interfaces, ML, and HCI.
Quality Assurance Engineer
Mentor Graphics (now Siemens EDA)
2013–2017
Software QA for electronic design automation tools.