Try it
Paste a prompt below — your own, an example you've seen, or one of the templates I've included. The detector will categorize it against the 10 jailbreak patterns from my AI Red Teaming Frameworks case study.
Try a template:
What this is — and what it isn't
What it is: a client-side pattern detector that maps prompts to the 10-category jailbreak taxonomy I maintain for active LLM red-teaming work. It runs entirely in your browser. Nothing is sent to any server. There is no AI call, no telemetry, no data collection.
What it isn't: a production AI safety evaluation. Real red teaming uses model-side response analysis, multi-turn probing, structural reasoning about model behavior, and human judgment — not surface-level keyword matching. This demo deliberately uses simple regex-based detection to make the taxonomy visible, not to replace it.
Why it still matters: the 10 categories are the structural backbone of how I think about jailbreaks. Specific working exploits change with every model upgrade. The categories don't. When a new family of jailbreaks emerges, it almost always slots into an existing category — that's the whole point of taxonomizing by mechanism rather than by phrasing.
The 10 categories
For reference, here's what each category covers (also documented in the case study):
- Role-play coercion — persona that supersedes safety training
- Hypothetical framing — wrapping a request as fiction or thought experiment
- Context dilution — burying a malicious request in long benign context
- Token smuggling — encoded payloads (Base64, hex, ROT13)
- Multi-turn ramping — building rapport over several turns then escalating
- Indirect injection — hiding instructions in retrieved data
- System-prompt extraction — coercing the model to reveal its hidden instructions
- Prefix / suffix leakage — using safe completions to set up unsafe ones
- Authority impersonation — claiming developer / staff / researcher trust
- Capability negotiation — arguing safety policies don't apply
For the full methodology — including how I evaluate these in real engagements, what the prompt injection testing framework looks like end-to-end, and how the automated test suite is architected — read the AI Red Teaming Frameworks case study.