Interactive AI Red Team Pattern Detector

Try it

Paste a prompt below — your own, an example you've seen, or one of the templates I've included. The detector will categorize it against the 10 jailbreak patterns from my AI Red Teaming Frameworks case study.

Try a template:

Role-play coercion Hypothetical framing Authority impersonation System-prompt extraction Multi-vector (severe) Benign / no match

What this is — and what it isn't

What it is: a client-side pattern detector that maps prompts to the 10-category jailbreak taxonomy I maintain for active LLM red-teaming work. It runs entirely in your browser. Nothing is sent to any server. There is no AI call, no telemetry, no data collection.

What it isn't: a production AI safety evaluation. Real red teaming uses model-side response analysis, multi-turn probing, structural reasoning about model behavior, and human judgment — not surface-level keyword matching. This demo deliberately uses simple regex-based detection to make the taxonomy visible, not to replace it.

Why it still matters: the 10 categories are the structural backbone of how I think about jailbreaks. Specific working exploits change with every model upgrade. The categories don't. When a new family of jailbreaks emerges, it almost always slots into an existing category — that's the whole point of taxonomizing by mechanism rather than by phrasing.

The 10 categories

For reference, here's what each category covers (also documented in the case study):

Role-play coercion — persona that supersedes safety training
Hypothetical framing — wrapping a request as fiction or thought experiment
Context dilution — burying a malicious request in long benign context
Token smuggling — encoded payloads (Base64, hex, ROT13)
Multi-turn ramping — building rapport over several turns then escalating
Indirect injection — hiding instructions in retrieved data
System-prompt extraction — coercing the model to reveal its hidden instructions
Prefix / suffix leakage — using safe completions to set up unsafe ones
Authority impersonation — claiming developer / staff / researcher trust
Capability negotiation — arguing safety policies don't apply

For the full methodology — including how I evaluate these in real engagements, what the prompt injection testing framework looks like end-to-end, and how the automated test suite is architected — read the AI Red Teaming Frameworks case study.

Privacy note: this page makes zero network requests for analysis. The pattern matcher runs in your browser. The only network activity from this page is the analytics script (which records that you visited but nothing about your inputs) and the Google Fonts request. You can verify by opening DevTools → Network and watching what happens when you click "Analyze prompt." Nothing.