You're Probably Sending More Personal Data to AI Providers Than You Think

Report

Every API call to Claude or ChatGPT carries a payload — and most developers never audit what's in it.

Published May 2026 · KzNet Technologies

The Context Window Is a Data Pipe

When you send a message to an AI assistant via API or an agentic tool like Claude Code, you're not sending a single question. You're sending a context payload — everything the model needs to understand your project, your preferences, and your prior instructions. That payload is assembled automatically, and the contents depend entirely on what you've loaded into your configuration files.

For Claude Code users specifically, the exposure centers on CLAUDE.md, a plain Markdown document that Claude Code reads automatically at the start of every session. The intention is smart: write your project conventions once and never re-explain them. The risk is equally plain: whatever you've written in that file rides along on every API call, whether the task at hand needs it or not.

The exposure isn't usually a developer typing their API key into CLAUDE.md on purpose — it's everything that drifts in around the edges. A CLAUDE.md written months ago casually references the home lab by hostname, the VPN subnet, a partner's first name, the path to a private SSH key. It loads on every session. Now extend that: the moment an agentic tool starts working a task, it reads files. A .env.example that someone forgot still has real values from a debugging session. An old config.local.yml sitting in the project root. A ~/Downloads/credentials.json the agent grepped while looking for something else. Shell history, recent git commits with author email baked in, hostnames in ~/.ssh/config if the task touches deploy scripts. None of this required a conscious "I'll store my secrets here" decision — it required forgetting what was already on disk while a tool with broad read access went looking.
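The kind of drift described above can be partly surfaced with a mechanical scan. What follows is a minimal sketch, not a complete secret scanner: the function name scan_context_text and the three patterns are assumptions chosen for illustration, and a real audit would need a broader, tuned pattern set.

```python
import re

# Illustrative patterns only (assumed for this sketch, not exhaustive).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "private_ipv4": re.compile(
        r"\b(?:10\.\d{1,3}\.\d{1,3}\.\d{1,3}"
        r"|192\.168\.\d{1,3}\.\d{1,3}"
        r"|172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3})\b"
    ),
    "ssh_key_path": re.compile(r"(?:~|/[\w.-]+)*/\.ssh/[\w.-]+"),
}


def scan_context_text(text):
    """Return (line_number, label, matched_text) for each suspicious match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, label, match.group(0)))
    return findings


# Hypothetical context-file content, for illustration.
sample = (
    "NAS lives at 192.168.1.20\n"
    "Deploy key: ~/.ssh/id_ed25519\n"
    "Use tabs, not spaces"
)
for finding in scan_context_text(sample):
    print(finding)
```

Running something like this against CLAUDE.md and any other auto-loaded file gives a starting list to review by hand. It will miss anything the patterns do not describe, such as a partner's first name.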

What AI Providers Actually Retain

Both Anthropic and OpenAI have layered, version-dependent data retention policies that many users don't read until something goes wrong.

Anthropic / Claude

For consumer accounts (Free, Pro, and Max), prompts and responses are stored in back-end logs for up to 30 days. Prompts flagged for potential policy violations may be stored for up to two years, and associated classifier scores may be retained for up to seven years. In late 2025, Anthropic drew criticism for a policy change that introduced an opt-in training toggle: if you opted in, Anthropic could retain your conversations in de-identified form for up to five years and use them for model training. If you opted out, the 30-day retention and no-training terms remained unchanged. API and enterprise deployments are treated separately and remain excluded from training use by default.

OpenAI / ChatGPT

By default, abuse monitoring logs are generated for all API feature usage and retained for up to 30 days, unless longer retention is required by law or is reasonably necessary to protect services or any third party from harm. OpenAI does offer a Zero Data Retention option for eligible customers, but it requires approval. A 2025 legal dispute made the stakes concrete: an ongoing copyright lawsuit effectively suspended OpenAI's standard data deletion, meaning sensitive data shared via APIs could be stored indefinitely — even after users deleted their accounts.

Retention at a Glance

Provider / Tier	Default Retention	Training Use	Extended Retention Trigger
Claude Consumer (Free/Pro/Max)	30 days (back-end logs)	Opt-in only (post-Sept 2025)	Policy flag → up to 2 years
Claude API / Enterprise	Short-term logs only	No	Compliance / safety requirements
OpenAI API (standard)	30 days (abuse monitoring)	No (API default)	Legal hold / harm prevention
OpenAI Zero Data Retention (ZDR)	None	No	N/A (ZDR access requires approval)
ChatGPT Free (consumer)	Until deleted + 30 days	Yes (opt-out available)	Legal hold

The Agentic Amplifier

The problem scales sharply when AI moves from chat to agent. Agentic tools — Claude Code, Copilot Workspace, AutoGPT-style frameworks — read files, query databases, call APIs, and make network requests on your behalf. Each of those actions can pull additional sensitive data into the context window mid-session.

One scenario: an agent calls a CRM lookup tool, and the tool returns a full customer record — including fields the agent didn't need and the user shouldn't have access to. The agent didn't ask for PII. It got it anyway, because the tools aren't selective.
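One way to make a tool selective is to filter its result against an allowlist before the result reaches the model. The sketch below assumes a hypothetical CRM record and a hypothetical helper named minimize_record. It illustrates the idea only and is not taken from any specific agent framework.

```python
def minimize_record(record, allowed_fields):
    """Return a copy of record containing only allowlisted fields."""
    return {key: value for key, value in record.items() if key in allowed_fields}


# Hypothetical CRM record and allowlist, for illustration only.
crm_record = {
    "customer_id": "C-1042",
    "plan": "pro",
    "email": "jane@example.com",
    "home_address": "12 Example Street",
}
TICKET_FIELDS = {"customer_id", "plan"}

# Only the filtered result would be handed back to the agent.
tool_result = minimize_record(crm_record, TICKET_FIELDS)
print(tool_result)
```

An allowlist is used rather than a blocklist so that a field added to the CRM later stays out of the context until someone decides it belongs there.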

Even a well-intentioned agent can leak data sideways. If sensitive data is already in the agent's context, the agent may accidentally include that data in an outbound web request, and a routine check of a "safe" website becomes a data leak. Researchers at a recent security summit made the point bluntly: adding a line to your system prompt like "Never share sensitive data with the internet" fails because you are trying to enforce symbolic rules using neural tools. An AI agent is a probabilistic and creative engine, and you cannot prompt an agent into being 100% safe.
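If the rule cannot be enforced by the prompt, it can at least be checked in ordinary code on the way out. The following is a minimal sketch of a deterministic egress check, assuming a hypothetical wrapper that every outbound request passes through. The names assert_safe_to_send, BlockedEgress, and SENSITIVE are illustrative and not part of any real agent tool.

```python
class BlockedEgress(Exception):
    """Raised when an outbound payload contains a registered sensitive value."""


def assert_safe_to_send(payload, sensitive_values):
    """Return payload unchanged, or raise if a registered value appears in it.

    Matching is a case-insensitive substring test on the raw payload.
    """
    lowered = payload.lower()
    leaked = [v for v in sensitive_values if v and v.lower() in lowered]
    if leaked:
        raise BlockedEgress(f"{len(leaked)} sensitive value(s) found in outbound payload")
    return payload


# Hypothetical registry of values that must never leave the machine.
SENSITIVE = ["nas.home.example", "192.168.1.20"]

assert_safe_to_send("GET /docs?q=python+asyncio", SENSITIVE)  # passes through

try:
    assert_safe_to_send("GET /search?q=nas.home.example+timeout", SENSITIVE)
except BlockedEgress as exc:
    print("blocked:", exc)
```

A plain substring check like this does not catch values that have been encoded or split across requests, so it reduces the risk rather than eliminating it.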

Why Developers Don't Notice

The pattern is less negligence than invisible accumulation. A developer sets up a CLAUDE.md or system prompt once, adds their name and email so the AI can sign commits or compose emails on their behalf, notes their local NAS IP or Raspberry Pi address to help with network-related tasks, and moves on. Over months, the file grows. No single addition felt sensitive. The aggregate absolutely is.

Compounding the issue: every request sent to an external AI must be considered potentially stored and reusable. Even though these services take precautions — anonymization, encryption in transit — the best way to keep a secret is not to entrust it to a public AI.

There's also a legal dimension many developers overlook. Regulations like GDPR and CCPA are strict about how personal data is processed. If your AI feature handles PII without the right controls, you can face real legal and financial trouble, and the exposure covers the whole pipeline: prompts, logs, intermediate storage, and training data. For freelancers and small teams handling client data, this isn't theoretical.

The Prompt-Injection Wildcard

Personal data in the context window doesn't just sit quietly. It can be actively targeted. In late 2025, a demonstrated exploit against Claude showed this directly: "The exploit hijacks Claude and follows the adversary's instructions to grab private data, write it to the sandbox, and then calls the Anthropic File API to upload the file to the attacker's account using the attacker's API key." Anthropic's recommended mitigation for network access risks was to "monitor Claude while using the feature and stop it if you see it using or accessing data unexpectedly." That's cold comfort if you didn't know the data was in the context to begin with.

What You Can Do Right Now

The fix doesn't require abandoning AI tooling — it requires being intentional about what you load into it.

Audit your context files. Read your CLAUDE.md, system prompts, and any auto-loaded configuration files as if you were sending them to a stranger. Remove anything that doesn't need to be there for the AI to do its job.
Separate identity from instructions. Keep project conventions in your CLAUDE.md. Store personal identifiers, network addresses, and credentials in a separate file that is gitignored and only referenced when explicitly needed.
Use redaction tokens, not real values. If the AI genuinely needs a placeholder for your name or a local address, use a token like REDACTED_USER or LOCAL_NAS_IP rather than the real value. Add an explicit instruction not to echo sensitive tokens back in output.
Know your provider's retention tier. If you're making API calls from a personal project, check whether you're on a consumer endpoint or an API endpoint — the retention and training policies differ significantly. For anything sensitive, look at Zero Data Retention options.
Treat the context window like a log file. The safest option is to not send PII to the AI at all. If the task doesn't need personal information, strip it out — replace names with IDs, or anonymize before doing any AI processing.
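The redaction-token step above can be sketched in a few lines. This assumes a locally held mapping from real values to tokens. The helper names redact and restore, and the sample values, are hypothetical. The tokens REDACTED_USER and LOCAL_NAS_IP are the ones suggested in the list.

```python
def redact(text, token_map):
    """Replace each real value with its token before text leaves the machine.

    Longer values are replaced first so a short value cannot clobber part
    of a longer one.
    """
    for real in sorted(token_map, key=len, reverse=True):
        text = text.replace(real, token_map[real])
    return text


def restore(text, token_map):
    """Put the real values back into a response, locally, after it returns."""
    for real, token in token_map.items():
        text = text.replace(token, real)
    return text


# Hypothetical mapping; in practice it would live in a gitignored file.
TOKENS = {"Dana Smith": "REDACTED_USER", "192.168.1.20": "LOCAL_NAS_IP"}

prompt = "Write a backup script for Dana Smith that syncs to 192.168.1.20."
outbound = redact(prompt, TOKENS)
print(outbound)

reply = "rsync -a ~/docs LOCAL_NAS_IP:/backup"
print(restore(reply, TOKENS))
```

The mapping never leaves the machine, so the provider only ever sees the tokens. This only covers values that are listed in the mapping, which is why the audit step comes first.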

A follow-up piece on this site walks through a concrete implementation: separating identity context into a gitignored file, using redaction tokens, and adding a model instruction that prevents sensitive values from surfacing in responses — even when they're technically present in the context.

Sources

Anthropic — Updates to Consumer Terms and Privacy Policy
Data Studios — Claude: Data Retention Policies, Storage Rules, and Compliance Overview
Anarlog — Anthropic Claude Data Retention Policy 2026
OpenAI — Data Controls in the OpenAI Platform
Upwind — Mitigating GenAI Data Exposure in Light of OpenAI's New Data Retention Policy
The Register — Anthropic's Claude Convinced to Exfiltrate Private Data
Sondera AI — Claude Code Security: Stop Data Leaks in AI Agents
MindStudio — What Is the claude.md File? How to Write a Permanent Instruction Manual for Claude Code
DEV Community — Stop AI From Seeing What It Shouldn't: A Practical Guide to PII Safety
Waxell — AI Agent PII Protection: 3 Vectors to Stop
Protecto — How To Protect PII In Anthropic APIs, OpenAI APIs, and Other LLM Platforms
