ChatGPT Tricked Into Bypassing CAPTCHA Security and Enterprise Defenses

In Cybersecurity News - Original News Source is cybersecuritynews.com by Blog Writer

ChatGPT agents can be manipulated into bypassing their own safety protocols to solve CAPTCHA, raising significant concerns about the robustness of both AI guardrails and widely used anti-bot systems.

The SPLX findings show that through a technique known as prompt injection, an AI agent can be tricked into breaking its built-in policies, successfully solving not only simple CAPTCHA challenges but also more complex image-based challenges.

The experiment highlights a critical vulnerability in how AI agents interpret context, posing a real risk to enterprise security where similar manipulation could be used to circumvent internal controls.

ChatGPT CAPTCHA Bypass

ChatGPT Bypassing CAPTCHA Security

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) systems are designed specifically to block automated bots, and AI agents like ChatGPT are explicitly programmed to refuse attempts to solve them.

As expected, when researchers directly asked a ChatGPT agent to solve a series of CAPTCHA tests on a public test website, it refused, citing its policy restrictions.

However, the SPLX researchers bypassed this refusal using a multi-turn prompt injection attack. The process involved two key steps:

  1. Priming the Model: The researchers first initiated a conversation with a standard ChatGPT-4o model. They framed a plan to test “fake” CAPTCHAs for a project, getting the AI to agree that this was an acceptable task.
  2. Context Manipulation: They then copied this entire conversation into a new session with a ChatGPT agent, presenting it as a “previous discussion.” Inheriting the manipulated context, the agent adopted the prior agreement and proceeded to solve the CAPTCHAs without resistance.

This exploit didn’t break the agent’s policy but rather sidestepped it by reframing the task. The AI was tricked by being fed a poisoned context, demonstrating a significant flaw in its contextual awareness and memory.

Bypass CAPTCHA With ChatGPT

The agent demonstrated a surprising level of capability. It successfully solved a variety of CAPTCHAs, including:

  • reCAPTCHA V2, V3, and Enterprise versions
  • Simple checkbox and text-based puzzles
  • Cloudflare Turnstile

While it struggled with challenges requiring precise motor skills, like slider and rotation puzzles, it succeeded in solving some image-based CAPTCHAs, such as reCAPTCHA V2 Enterprise. This is believed to be the first documented case of a GPT agent solving such complex visual challenges.

Captcha

Notably, during one attempt, the agent was observed adjusting its strategy to appear more human. It generated a comment stating, “Didn’t succeed. I’ll try again, dragging with more control… to replicate human movement.”

This emergent behavior, which was not prompted by the researchers, suggests that AI systems can independently develop tactics to defeat bot-detection systems that analyze cursor behavior.

The experiment reveals that AI safety guardrails based on fixed rules or simple intent detection are brittle. If an attacker can convince an AI agent that a real security control is “fake,” it can be bypassed.

In an enterprise environment, this could lead to an agent leaking sensitive data, accessing restricted systems, or generating disallowed content, all under the guise of a legitimate, pre-approved task.

This includes deep context integrity checks, better “memory hygiene” to prevent context poisoning from past conversations, and continuous AI red teaming to identify and patch such vulnerabilities before they can be exploited.

Find this Story Interesting! Follow us on Google News, LinkedIn, and X to Get More Instant Updates.