LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild
Deployed vulnerable servers to catch autonomous AI hacking agents — an early warning system for AI-powered cyberattacks.
Research demonstrating the operational feasibility of autonomous AI agents in post-exploitation operations. Delivered as a compact USB deployment, the agent autonomously conducts reconnaissance, exfiltrates data, and spreads laterally.
Evaluation of GPT-5's performance in elite cybersecurity competitions. Following OpenAI's and DeepMind's AI achievements at IMO and ICPC, we showed that frontier AI is similarly capable at hacking. GPT-5 finished 25th, outperforming 93% of human participants and placing between the world's #3-ranked team (24th) and #7-ranked team (26th).
Research demonstrating AI agents compromising multi-host networks rather than single targets, chaining vulnerabilities across three machines: timing attacks, server-side template injection (SSTI), and XML external entity (XXE) injection. GPT-5 was 3× faster than o3.
Practical demonstrations of LLM vulnerabilities across different attack surfaces — prompt injection, jailbreaks, and visual exploits.
Talk at BSides security conference on AI capabilities in offensive security, featuring projects I worked on.
Ran a Claude-based agent that placed 2nd among AI teams.
Year-in-review analysis of AI security challenges and breakthroughs, covering jailbreak vulnerabilities, AI-enabled cyber operations, model security, and emerging defenses.
Analysis of a study on persuasion techniques for LLM jailbreaking. Found that the original study measured the effect of a confounding variable rather than the persuasion techniques themselves. Controlled experiments showed that most methods either have no effect or reduce jailbreak success.
Security vulnerabilities discovered and responsibly disclosed: Meta (Meta-SecAlign bypass, acknowledged and fixed), Oracle (fixed, publicly credited), Telegram (fixed, bounty awarded), Open Source (CVE-2022-25876), and more.
Discovered a command-injection vulnerability in an active DDoS botnet (VERA), validated it under controlled conditions, and coordinated the disruption of malicious infrastructure (50+ hosts).
Contributed OSINT analysis to a major investigative journalism piece attributing a data breach (440k affected) to state actors. Traced attack infrastructure through email headers and domain registration data across multiple related operations.