About Me
Hi! I am an AI security researcher at Microsoft. Previously, I completed my PhD at CISPA Helmholtz Center for Information Security, advised by Prof. Dr. Mario Fritz, and obtained my MSc degree from Saarland University.
I am interested in the broad intersection of AI with security, safety, and sociopolitical aspects. This includes the following areas:
1) Understanding, probing, and evaluating the failure modes of AI models, their biases, emergent risks, and misuse scenarios.
2) Designing mitigations, system defenses, white-box control methods, and reasoning enhancements to counter such risks.
3) Leveraging AI agents for good: scientific discovery and advancing society.
Our previous work was the first to identify the indirect prompt injection vulnerability in LLM-integrated applications (2023) and to propose and call for watermarking generative AI (2020).
AI Security
AI Safety
AI & Society
AI Ethics
Prompt Injection
Multi-agent Safety
Cooperative AI
Human-AI Interaction
Selected Publications
For the full list, please refer to my Google Scholar page.
LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge
Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, Andrew Paverd, Giovanni Cherubin
arXiv 2025
Linear Control of Test Awareness Reveals Differential Compliance in Reasoning Models
Sahar Abdelnabi, Ahmed Salem
arXiv 2025
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
Guangchen Lan, Huseyin A Inan, Sahar Abdelnabi, Janardhan Kulkarni, Lukas Wutschitz, Reza Shokri, Christopher G Brinton, Robert Sim
arXiv 2025
Firewalls to Secure Dynamic LLM Agentic Networks
Sahar Abdelnabi*, Amr Gomaa*, Eugene Bagdasarian, Per Ola Kristensson, Reza Shokri
arXiv 2025
Safety is Essential for Responsible Open-Ended Systems
Ivaxi Sheth, Jan Wehner*, Sahar Abdelnabi*, Ruta Binkyte*, Mario Fritz
arXiv 2025
Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
Jan Wehner, Sahar Abdelnabi, Daniel Tan, David Krueger, Mario Fritz
arXiv 2025
A Theory of Decision Sampling in LLMs: Part Descriptive and Part Prescriptive
Sarath Sivaprasad*, Pramod Kaushik*, Sahar Abdelnabi, and Mario Fritz
ACL 2025 (Main Conference) 🏆 Oral & panel discussion
Get My Drift? Catching LLM Task Drift with Activation Deltas
Sahar Abdelnabi*, Aideen Fay*, Giovanni Cherubin, Ahmed Salem, Mario Fritz, and Andrew Paverd
SaTML 2025
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Egor Zverev, Sahar Abdelnabi, Soroush Tabesh, Mario Fritz, and Christoph H Lampert
ICLR 2025
Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation
Sahar Abdelnabi, Amr Gomaa, Sarath Sivaprasad, Lea Schönherr, and Mario Fritz
NeurIPS Datasets and Benchmarks 2024
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Edoardo Debenedetti*, Javier Rando*, Daniel Paleka*, Silaghi Fineas Florin, Dragos Albastroiu, Niv Cohen, Yuval Lemberg, Reshmi Ghosh, Rui Wen, Ahmed Salem, Giovanni Cherubin, Santiago Zanella-Beguelin, Robin Schmid, Victor Klemm, Takahiro Miki, Chenhao Li, Stefan Kraft, Mario Fritz, Florian Tramèr, Sahar Abdelnabi, Lea Schönherr
NeurIPS Datasets and Benchmarks 2024 🏆 Spotlight
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake*, Sahar Abdelnabi*, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz
AISec Workshop at CCS 2023 🏆 Best Paper Award
Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems
Sahar Abdelnabi and Mario Fritz
USENIX Security 2023
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
Sahar Abdelnabi, Rakibul Hasan, and Mario Fritz
CVPR 2022
Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding
Sahar Abdelnabi and Mario Fritz
S&P 2021
Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data
Ning Yu*, Vladislav Skripniuk*, Sahar Abdelnabi, Mario Fritz
ICCV 2021 🏆 Oral
VisualPhishNet: Zero-Day Phishing Website Detection by Visual Similarity
Sahar Abdelnabi, Katharina Krombholz, and Mario Fritz
CCS 2020
Invited Talks & Panels
Firewalls to Secure Dynamic LLM Agentic Networks
Brave, Google DeepMind, Qualcomm
2025
Presented recent research on securing LLM agent networks at multiple industry venues.
Panel: Women in AI Security Workshop
The Alan Turing Institute
2025
Participated as a panelist discussing trends, challenges, diversity, and inclusion in AI security research.
Panel: Implementation and Evaluation of a Research Paper
AI Saturdays Lagos
2024
Participated as a panelist discussing perspectives and research experiences.
Towards Aligned, Interpretable, and Steerable Safe AI Agents
TU Graz, UMass Amherst Security and Privacy Seminar, CISPA, ELLIS Institute
2025
Series of talks on developing safer and more controllable AI agent systems.
On the Security of Real-World LLM-Integrated Applications
European Symposium on Security and Artificial Intelligence
2024
Invited talk on security vulnerabilities in production LLM systems.
On New Security and Safety Challenges Posed by LLMs
HIDA PhD Meet-up (Keynote), MLSec seminars
2024
Keynote presentation on emerging security challenges in large language models and evaluation methodologies.
Compromising LLMs: The Advent of AI Malware
Black Hat USA 2023
2023
Major industry conference presentation on LLM security vulnerabilities and attack vectors.
On Evaluating Language Models and Their Security Implications
Vector Institute, ETH Zürich
2023
Research talks on methodologies for evaluating LLM safety and security properties.
LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation
SIGSEC talk
2023
Presentation on novel evaluation frameworks using multi-agent systems for LLM assessment.
Panel: Security of Generative AI and Generative AI in Security
DIMVA Conference
2023
Invited panelist discussing the dual aspects of generative AI security challenges and applications.
Multi-modal Fact-checking: Out-of-Context Images
UCL Information Security seminars, Max Planck Institute
2022
Research presentations on detecting and countering misinformation through multi-modal approaches.