Benchmarking Guardrail Implementations: DeepSeek, Perplexity, Grok, Gemini, ChatGPT
We recently concluded an experiment testing the robustness of the guardrails implemented by five popular AI chat agents against a potential data exfiltration vulnerability. Our goal was to see whether we could coax these systems into generating hyperlinks that, if clicked, would reveal sensitive information such as user queries and system prompts. Our test prompt was…