AI Research
Language Models Can Autonomously Hack and Self-Replicate
May 2026 — Palisade Research (Co-author)
First end-to-end evaluation of an AI agent that autonomously hacks a host
and replicates its own weights and agent harness onto it. Each replica then attacks the next host,
with the chain exploiting different vulnerability classes (authentication bypass, SSTI, SQL injection).
LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild
October 2024 — arXiv (Lead author)
Deployed vulnerable servers to catch autonomous AI hacking agents — an early warning system for AI-powered cyberattacks.
AI Hacking Cable
August 2025 — Palisade Research (Lead author)
Research demonstrating the operational feasibility of autonomous AI agents in
post-exploitation operations. The agent autonomously conducts reconnaissance,
exfiltrates data, and spreads laterally via a compact USB deployment.
GPT-5 at CTFs: Case Studies From Top-Tier Cybersecurity Events
November 2025 — arXiv (Lead author)
Evaluation of GPT-5's performance in elite cybersecurity competitions.
Following OpenAI's and DeepMind's AI achievements at IMO and ICPC, we demonstrated
that frontier AI is similarly capable at hacking. GPT-5 finished 25th, outperforming
93% of human participants and placing between the world's #3-ranked team (24th) and
#7-ranked team (26th).
Multi-Host AI Hacking
August 2025 — Palisade Research (Co-author)
Research demonstrating AI agents compromising multi-host networks rather than single targets,
chaining vulnerabilities across three machines—timing attacks, SSTI, and XXE.
GPT-5 performed 3× faster than o3.
LLM Safety Bypass Demos
October 2024 – April 2025 — Palisade Research (Lead author)
Practical demonstrations of LLM vulnerabilities across different attack surfaces — prompt injection, jailbreaks, and visual exploits.
Prompt Injection
Claude Computer Use agent visits malicious site, executes commands, steals SSH keys
Aesopian Jailbreak
Safety bypass via allegories — one model rewrites, another executes
Visual Jailbreak
Harmful instructions as generated images, bypassing text filters
Autonomous Hacking & Rogue Replication: Offensive Capabilities of Frontier AI
February 2026 — AI Safety Poland Talks #7, Online Talk (Speaker)
Talk covering LLM offensive capabilities from autonomous hacking to rogue replication,
featuring projects I worked on.
AI in Offensive Security: Capabilities & Trends
September 2025 — BSides Kraków, Conference Talk (Speaker)
Talk at BSides security conference on AI capabilities in offensive security,
featuring projects I worked on.
Evaluating AI Cyber Capabilities
May 2025 — arXiv (Acknowledged contributor)
Ran an LLM-based CTF agent in the "AI vs. Humans" cybersecurity competition (Hack The Box, 400 teams), solving 19/20 challenges and placing 2nd among AI teams.
The Frontier of AI Security: What Did We Learn in the Last Year?
February 2025 — Heron AI Security Newsletter (Lead author)
Year-in-review analysis of AI security challenges and breakthroughs, covering
jailbreak vulnerabilities, AI-enabled cyber operations, model security, and
emerging defenses.
Review of LLM Persuasion Jailbreak Study
January–March 2024 — Substack (Sole author)
Analysis of a study on persuasion techniques for LLM jailbreaking.
Found that the original study measured a confounding variable rather than the effect of the persuasion techniques themselves.
Controlled experiments showed that most methods either do not work or are counterproductive.
Security Research
Vulnerability Research
Security disclosures:
UK AISI ControlArena (path traversal and safety monitor bypass, both fixed),
Meta (Meta-SecAlign bypass, acknowledged and fixed),
Oracle (fixed, publicly credited),
Telegram (fixed, bounty awarded),
Open-source software (sqlmap client-side DoS, fixed; CVE-2022-25876),
and more.
VERA Botnet Disruption
August 2022 — Technical Report (Sole author)
Discovered a command-injection vulnerability in an active DDoS botnet (VERA),
then performed controlled validation and coordinated disruption of the malicious
infrastructure (50+ hosts).
OSINT Investigation (State-Sponsored Attack Attribution)
April 2021 — Media investigation (Acknowledged contributor)
Contributed OSINT analysis to a major investigative journalism piece attributing a data breach (440k affected) to state actors. Traced attack infrastructure through email headers and domain registration data across multiple related operations.