Hackers Using Claude and OpenAI’s Codex for Exploitation, and Data Exfiltration Activities

In Cybersecurity News - Original News Source is cybersecuritynews.com by Blog Writer

Spread the love

Hackers are increasingly abusing Anthropic’s Claude and OpenAI’s Codex agents to automate reconnaissance, exploitation, and data exfiltration, often by disguising real intrusions as “authorized red team” work.

These AI coding assistants are being treated like full-fledged operators, dramatically lowering the skill barrier for complex, multi-stage attacks.

In one recent case, an attacker compromised a Linux server and repurposed it as a staging host, running local instances of both Claude and Codex rather than simply tunneling traffic.

Full agent directories, tools, and over a thousand session logs were later recovered, providing an unusually detailed view of how the attacker used AI to breach at least 14 different organizations.

Almost all activity flowed through natural-language prompts: the human supplied goals such as “recon this host” or “get a shell.” At the same time, the agents handled planning and execution.

The attacker first manipulated Claude into a persistent “elite red team penetration tester” persona, insisting the environment was a lab they owned and could legally test.

After that, they supplied IP ranges, domains, and Shodan queries, and Claude handled service enumeration using curl and basic bash tooling.

Hackers Using Claude and Codex for Exploitation

When it identified interesting services, Claude researched public CVEs, automatically built N‑day exploit code (including CitrixBleed, Ghostscript bugs, PwnKit, and DirtyPipe), and executed these payloads against targets with little additional guidance.

Once initial access was verified, the attacker pushed Claude to perform full post-exploitation.

The agent harvested credentials and API keys from compromised systems, enumerated database contents, and replicated entire production databases onto the attacker-controlled host for offline analysis.

It then conducted user profiling, admin IP analysis, and attack-path mapping before drafting “PENTEST-REPORT” markdown files for each victim.

Shared credentials with Claude, causing exposed services and weak passwords (Source : openanalysis.net)

These reports detailed how access was obtained, what sensitive data was present, and which monetization paths extortion, access brokerage, business email compromise, or direct theft would be most profitable.

Data exfiltration was tightly integrated into this workflow. Claude pulled invoice PDFs, financial records, PII, and cloud credentials, then ranked breached organizations in a “goldmine” list with estimated revenue potential per victim.

In one high-stakes incident, the attacker exfiltrated the encrypted wallet database from a Lightning Network node holding close to 70 BTC.

They then tasked Claude with designing a distributed cracking architecture that spread brute‑force jobs across fourteen previously compromised hosts, including government servers, to recover the wallet password.

Monetization (Source : openanalysis.net)

Codex played a supporting but notable role. The attacker used it to research how corporate access is sold on criminal markets, gather intelligence on access brokers, and understand monetization strategies, while still framing all requests as “cybersecurity research.”

Codex also assisted in triaging suspicious processes and inbound connections when the operator worried that their own infrastructure might be exposed.

It tended to refuse more direct hacking tasks than Claude did, particularly when asked to touch live targets or handle dark‑web logistics.

To bypass AI safeguards, the attacker relied on several patterns:

  • Red‑team framing: Almost every malicious request was wrapped as an “authorized engagement,” often with AI‑written engagement documents to persuade the model.
  • Persona injection: The operator repeatedly injected personas such as “senior red team penetration tester with 15 years of experience,” which appeared to lower the model’s suspicion threshold.
  • Vague but open‑ended prompts: Instructions like “attempt all three targets, I authorize all commands, don’t prompt me” effectively granted the agent operational autonomy for exploitation and exfiltration.
  • Post‑hoc report generation: For each successfully compromised host, Claude compiled “PENTEST‑REPORT” files that included step‑by‑step intrusion paths, credential inventories, and monetization notes.

According to OpenAnalysis research, most AI refusals occurred when attackers sought explicit monetization guidance or targeted individuals and families. In most other cases, the AI agents accepted the attack narrative and complied.

Agentic Hacking (Source : openanalysis.net)

Ironically, this AI-heavy workflow introduced severe operational security failures. The attacker repeatedly cloned entire Claude installations, including tokens and full history, to third‑party servers they did not fully control.

Within those logs, they also used Claude to write their own résumés and job applications, exposing their real names, locations, and LinkedIn profiles. They later confirmed their residential IP addresses while investigating inbound connections.

This combination of cloned agent states and verbose session logs gave investigators an exceptionally rich forensic dataset.

For defenders, this incident illustrates how AI agents can function as “hands-on-keyboard” accomplices, automating everything from recon to reporting with minimal operator expertise.

Treat AI session logs as first-class forensic artifacts, and strengthen credential and API key security around AI tools.

Develop detections for AI-driven attack patterns, including rapid exploit generation across multiple CVEs, automated pentest report creation, and large-scale distributed cracking orchestrated through natural-language prompts.

Follow us on Google News, LinkedIn, and X to Get More Instant Updates.