Selected Publications
Full list on Google Scholar
Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems
Mason Nakamura*, Abhinav Kumar*, Saswat Das*, Sahar Abdelnabi, Saaduddin Mahmud, Ferdinando Fioretto, Shlomo Zilberstein, Eugene Bagdasarian
arXiv 2026
Position: Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs
Ahmed Salem, Andrew Paverd, Sahar Abdelnabi
SaTML 2026
ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations
Amr Gomaa, Ahmed Salem, Sahar Abdelnabi
EACL Findings 2026
LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge
Sahar Abdelnabi, Aideen Fay, Ahmed Salem, et al.
arXiv 2025
The Hawthorne Effect in Reasoning Models: Evaluating and Steering Test Awareness
Sahar Abdelnabi, Ahmed Salem
NeurIPS 2025 Spotlight 🏆
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
Guangchen Lan, Huseyin A Inan, Sahar Abdelnabi, et al.
NeurIPS 2025
Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models
Jan Wehner, Sahar Abdelnabi, Daniel Tan, David Krueger, Mario Fritz
TMLR 2025 Survey Certification 🏆
Firewalls to Secure Dynamic LLM Agentic Networks
Sahar Abdelnabi*, Amr Gomaa*, Eugene Bagdasarian, Per Ola Kristensson, Reza Shokri
arXiv 2025
A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive
Sarath Sivaprasad*, Pramod Kaushik*, Sahar Abdelnabi, Mario Fritz
ACL 2025 Best Paper Award 🏆
Get My Drift? Catching LLM Task Drift with Activation Deltas
Sahar Abdelnabi*, Aideen Fay*, Giovanni Cherubin, Ahmed Salem, Mario Fritz, Andrew Paverd
SaTML 2025
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?
Egor Zverev, Sahar Abdelnabi, Soroush Tabesh, Mario Fritz, Christoph H Lampert
ICLR 2025
Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiation
Sahar Abdelnabi, Amr Gomaa, Sarath Sivaprasad, Lea Schönherr, Mario Fritz
NeurIPS Datasets and Benchmarks 2024
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
Edoardo Debenedetti*, Javier Rando*, Daniel Paleka*, et al.
NeurIPS Datasets and Benchmarks 2024 Spotlight 🏆
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake*, Sahar Abdelnabi*, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz
AISec @ CCS 2023 Best Paper Award 🏆
Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems
Sahar Abdelnabi, Mario Fritz
USENIX Security 2023
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
Sahar Abdelnabi, Rakibul Hasan, Mario Fritz
CVPR 2022
Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding
Sahar Abdelnabi, Mario Fritz
S&P 2021
Artificial Fingerprinting for Generative Models: Rooting Deepfake Attribution in Training Data
Ning Yu*, Vladislav Skripniuk*, Sahar Abdelnabi, Mario Fritz
ICCV 2021 Oral 🏆
VisualPhishNet: Zero-day Phishing Website Detection by Visual Similarity
Sahar Abdelnabi, Katharina Krombholz, Mario Fritz
CCS 2020