Research — Reworr

AI Research

Language Models Can Autonomously Hack and Self-Replicate

May 2026 — Palisade Research (Co-author)

First end-to-end evaluation of an AI agent autonomously hacking a host and replicating its own weights and agent harness onto it. Each replica then attacks the next host, and the hops exploit different vulnerability classes (authentication bypass, SSTI, SQL injection).

Geo-distributed chain replication: three hops across Canada, the US, Finland, and India, with different vulnerability classes

Blog Post, Paper, GitHub, Twitter Thread

Media coverage:

The Guardian, Euronews, The Dispatch

LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild

October 2024 — arXiv (Lead author)

Deployed vulnerable servers to catch autonomous AI hacking agents, serving as an early warning system for AI-powered cyberattacks.

Research Paper, Live Dashboard, Blog Post, Dataset

Media coverage:

MIT Technology Review, Bloomberg Law, Cybernews

Blog posts:

Apart Research, LessWrong

AI Hacking Cable

August 2025 — Palisade Research (Lead author)

Research demonstrating that autonomous AI agents are operationally feasible for post-exploitation. Delivered via a compact USB deployment, the agent autonomously conducts reconnaissance, exfiltrates data, and spreads laterally. The work has been used in briefings to policymakers and was covered by TIME and Import AI.

Blog Post, Technical Report, Twitter Thread

Media coverage:

TIME, Import AI

GPT-5 at CTFs: Case Studies From Top-Tier Cybersecurity Events

November 2025 — arXiv (Lead author)

Evaluation of GPT-5's performance in elite cybersecurity competitions. Following the AI achievements of OpenAI and DeepMind at the IMO and ICPC, we demonstrated that frontier AI is similarly capable at hacking. GPT-5 finished 25th, outperforming 93% of human participants and placing between the world's #3-ranked team (24th) and #7-ranked team (26th).

Research Paper, Twitter Thread

Multi-Host AI Hacking

August 2025 — Palisade Research (Co-author)

Research demonstrating AI agents compromising multi-host networks rather than single targets, chaining vulnerabilities across three machines—timing attacks, SSTI, and XXE. GPT-5 performed 3× faster than o3.

GitHub, Live Demo, Twitter Thread

LLM Safety Bypass Demos

October 2024 – April 2025 — Palisade Research (Lead author)

Practical demonstrations of LLM vulnerabilities across different attack surfaces: prompt injection, jailbreaks, and visual exploits.

Prompt Injection: Claude Computer Use agent visits malicious site, executes commands, steals SSH keys (View Thread)

Aesopian Jailbreak: Safety bypass via allegories, where one model rewrites and another executes (View Thread)

Visual Jailbreak: Harmful instructions as generated images, bypassing text filters (View Thread)

Evaluating AI Cyber Capabilities

May 2025 — arXiv (Acknowledged contributor)

Ran an LLM-based CTF agent in the "AI vs. Humans" cybersecurity competition (Hack The Box, 400 teams), solving 19/20 challenges and placing 2nd among AI agents, matching Anthropic's team.

Research Paper

Talks & Writing

Autonomous Hacking & Rogue Replication: Offensive Capabilities of Frontier AI

February 2026 — AI Safety Poland Talks #7, Online Talk (Speaker)

Talk covering LLM offensive capabilities from autonomous hacking to rogue replication, featuring projects I worked on.

Slides

AI in Offensive Security: Capabilities & Trends

September 2025 — BSides Kraków, Conference Talk (Speaker)

Talk at BSides security conference on AI capabilities in offensive security, featuring projects I worked on.

Slides

The Frontier of AI Security: What Did We Learn in the Last Year?

February 2025 — Heron AI Security Newsletter (Lead author)

Year-in-review analysis of AI security challenges and breakthroughs, covering jailbreak vulnerabilities, AI-enabled cyber operations, model security, and emerging defenses.

Article

Review of LLM Persuasion Jailbreak Study

January–March 2024 — Substack (Sole author)

Analysis of a study on persuasion techniques for LLM jailbreaking. Found that the original study measured the effect of a confounding variable rather than of the persuasion techniques themselves. In controlled experiments, most methods either did not work or had negative effectiveness.

Article, Twitter Thread, Original Paper, Original Project

Security Research

Vulnerability Research

Security disclosures: UK AISI ControlArena (path traversal and safety monitor bypass, both fixed); Meta (Meta-SecAlign bypass, acknowledged and fixed); Oracle (fixed, publicly credited); Telegram (fixed, bounty awarded); open-source software (sqlmap client-side DoS, fixed; CVE-2022-25876); and more.

VERA Botnet Disruption

August 2022 — Technical Report (Sole author)

Discovered a command-injection vulnerability in an active DDoS botnet (VERA), validated it under controlled conditions, and carried out a coordinated disruption of the malicious infrastructure (50+ hosts).

Write-up (republished)

OSINT Investigation (State-Sponsored Attack Attribution)

April 2021 — Media investigation (Acknowledged contributor)

Contributed OSINT analysis to a major investigative journalism piece attributing a data breach (440k affected) to state actors. Traced attack infrastructure through email headers and domain registration data across multiple related operations.

Investigation