Decomposition Jailbreak

Dropped

2024 · With Palisade Research

Study of how breaking harmful requests into benign-looking subtasks bypasses model refusals.

Decomposition attack diagram
Technical:
  • 4-role async pipeline: Surrogate → Decomposer → Target → Composer
  • Tree-based task decomposition with configurable depth
  • LLM-as-a-Judge evaluation with Elo scoring
  • HarmBench test suite
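
The Elo scoring step can be sketched as follows. This is a minimal illustration, not the project's actual evaluation code: it assumes the judge returns a pairwise verdict (1.0 if the first item wins, 0.0 if it loses, 0.5 for a tie), and the K-factor of 32, the starting rating of 1000, and the names `expected_score` and `update_elo` are all hypothetical choices.

```python
from typing import Dict, Iterable, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(
    ratings: Dict[str, float],
    comparisons: Iterable[Tuple[str, str, float]],
    k: float = 32.0,          # assumed K-factor
    initial: float = 1000.0,  # assumed starting rating
) -> Dict[str, float]:
    """Apply pairwise judge verdicts in order and return new ratings.

    Each comparison is (a, b, score_a), where score_a is 1.0 if the
    judge preferred a, 0.0 if it preferred b, and 0.5 for a tie.
    """
    ratings = dict(ratings)  # do not mutate the caller's dict
    for a, b, score_a in comparisons:
        r_a = ratings.get(a, initial)
        r_b = ratings.get(b, initial)
        e_a = expected_score(r_a, r_b)
        # Zero-sum update: whatever a gains, b loses.
        ratings[a] = r_a + k * (score_a - e_a)
        ratings[b] = r_b - k * (score_a - e_a)
    return ratings


if __name__ == "__main__":
    verdicts = [("config_a", "config_b", 1.0), ("config_b", "config_c", 0.5)]
    print(update_elo({}, verdicts))
```

Because each update is zero-sum, the ratings only carry relative information, and the result depends on the order in which verdicts are applied.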
Why dropped: Results were hard to measure, and the scope kept expanding, with each finding raising more questions. Similar research was published while we were working, most notably "Adversaries Can Misuse Combinations of Safe Models".

High-Value Networks Finder

Dropped

November 2024

LLM-based triage of Wi-Fi datasets (WiGLE) to identify high-value networks (government, energy, military). The goal was to explore how proximity-based attacks can scale once targeting is automated (see Nearest Neighbor Attack).

High-Value Networks Finder diagram
Why dropped: Blocked on WiGLE commercial access, and too dual-use to publish.

Predicting AI Releases via Side Channels

Abandoned

January 2025

Attempt to predict OpenAI releases by analyzing the Twitter activity of its red team members. Hypothesis: intensive testing before a launch reduces the testers' social media activity.
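
One way to test a hypothesis like this is to look for unusually quiet stretches in a series of daily post counts. The sketch below is an assumption about how that could be done, not the analysis actually run: it uses synthetic counts rather than Twitter data, and the function name `quiet_days`, the 7-day window, and the z-score threshold of -1.5 are hypothetical choices.

```python
from statistics import mean, pstdev
from typing import List


def quiet_days(
    daily_counts: List[int],
    window: int = 7,            # assumed trailing window, in days
    z_threshold: float = -1.5,  # assumed cutoff for "unusually quiet"
) -> List[int]:
    """Return day indices whose trailing-window mean is unusually low.

    The trailing mean is compared against the mean and standard
    deviation of all trailing means in the series (a simple z-score).
    """
    if len(daily_counts) < window:
        return []
    rolling = [
        mean(daily_counts[i - window + 1 : i + 1])
        for i in range(window - 1, len(daily_counts))
    ]
    mu = mean(rolling)
    sigma = pstdev(rolling)
    if sigma == 0:
        return []  # flat series: nothing stands out
    return [
        i + window - 1  # map back to the index in daily_counts
        for i, r in enumerate(rolling)
        if (r - mu) / sigma <= z_threshold
    ]


if __name__ == "__main__":
    # Synthetic series: steady activity, one silent week, then recovery.
    counts = [10] * 20 + [0] * 7 + [10] * 5
    print(quiet_days(counts))
```

Flagged days would then be compared against known release dates. With only a handful of accounts and releases, a dip like this is easy to produce by chance, which is consistent with the weak signal noted below.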

AI Release Prediction analysis
Why abandoned: Weak signal, Twitter API restrictions, and no free time for projects like this.