Security & Governance

AI Agent Security: The New Attack Surface CISOs Must Prepare For

Most enterprise security programmes were built for a world where humans execute instructions and systems respond to human-initiated requests. AI agents break both of those assumptions simultaneously. When Claude Cowork or a custom Claude agent is operating autonomously (reading emails, querying databases, drafting documents, executing code), the threat surface is fundamentally different from anything in a traditional CISO's playbook. The question isn't whether your organisation will face AI agent security incidents; it's whether you'll have the governance in place when you do.

This isn't hypothetical. Prompt injection attacks against public-facing AI applications have already been demonstrated in production. Multi-agent architectures where one agent instructs another create trust chain vulnerabilities that traditional authentication models don't cover. And the MCP server ecosystem, which connects Claude to your internal databases, APIs, and file systems, introduces supply chain risk that most security teams haven't started evaluating yet. If you're deploying Claude at enterprise scale, your Claude security and governance framework needs to account for all of this explicitly.

The Core Problem

Traditional perimeter security assumes humans are the actors. AI agents act autonomously, at machine speed, across the same data and systems your employees access. The attack surface isn't just the AI; it's every system the AI can reach, and every instruction it might be manipulated into following.

The Threat Categories That Don't Exist in Traditional Security Models

Prompt Injection

Prompt injection is the AI equivalent of SQL injection: an attacker embeds malicious instructions in content that an AI agent will read, causing the agent to execute unintended actions. For Claude agents that process documents, emails, or web content as part of their workflows, this is a real attack vector. An attacker who knows your company uses an AI agent to process incoming supplier emails could embed instructions in an invoice designed to make the agent take actions (forwarding sensitive data, modifying payment instructions, creating calendar events) that weren't authorised by any human.

The defence against prompt injection isn't simple, because it fundamentally requires the AI model to distinguish between instructions it should follow and content it should merely process. Claude's constitutional AI training makes it more resistant to prompt injection than many models, but "more resistant" is not the same as "immune." Defence-in-depth is required: limiting what actions agents can take autonomously, requiring human confirmation for sensitive operations, and monitoring agent behaviour for anomalies.
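To make that concrete, here is a minimal sketch of the "limit what agents can do autonomously" layer: a default-deny dispatcher that only executes explicitly classified tool calls. The tool names and risk tiers are hypothetical; the point is that the policy lives in code you control, outside the model.

```python
# Minimal sketch of a default-deny action policy for an agent. Tool names
# and risk tiers are illustrative, not a prescribed taxonomy.

AUTONOMOUS = {"search_documents", "summarise_text"}   # read-only, low risk
NEEDS_CONFIRMATION = {"send_email", "update_record"}  # external side effects

def run_tool(name: str, args: dict) -> dict:
    # Placeholder for the real tool implementations.
    return {"status": "ok", "tool": name, "args": args}

def log_refusal(name: str, args: dict) -> None:
    # In production this would go to your audit log / SIEM, not stdout.
    print(f"REFUSED: {name} {args}")

def dispatch(name: str, args: dict, confirm) -> dict:
    """Route a model-proposed tool call through the security policy."""
    if name in AUTONOMOUS:
        return run_tool(name, args)
    if name in NEEDS_CONFIRMATION and confirm(name, args):
        return run_tool(name, args)
    log_refusal(name, args)  # default-deny: unclassified or declined actions
    return {"error": f"tool {name!r} was not authorised"}

# Only the read-only call proceeds without a human decision.
print(dispatch("search_documents", {"query": "Q3 invoices"}, confirm=lambda n, a: False))
print(dispatch("send_email", {"to": "supplier@example.com"}, confirm=lambda n, a: False))
```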

Trust Chain Attacks in Multi-Agent Systems

As organisations move from single-agent to multi-agent architectures, where an orchestrator agent coordinates sub-agents that each have specialised capabilities, trust chain attacks become possible. If Agent A trusts instructions from Agent B without verifying that B hasn't been compromised or manipulated, a successful attack on B gives the attacker the combined capabilities of the whole system.

Claude's agent SDK includes mechanisms for establishing trust hierarchies between agents, but these need to be configured deliberately. The default behaviour of many multi-agent frameworks is to pass instructions between agents without strong authentication. Our multi-agent systems guide covers the architecture patterns that build secure trust chains, including why you should treat inter-agent messages with the same suspicion as external inputs.
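One way to make that suspicion concrete is to authenticate inter-agent messages rather than trusting the transport. Below is a minimal sketch using a shared-secret HMAC per agent link; the agent names and key handling are illustrative, and production keys would come from a secrets manager.

```python
import hashlib
import hmac
import json

# Per-link shared secrets: (sender, receiver) -> key. Illustrative only;
# real deployments would load these from a secrets manager, never source code.
LINK_KEYS = {("orchestrator", "finance_agent"): b"example-shared-secret"}

def sign(sender: str, receiver: str, payload: dict) -> dict:
    """Attach an HMAC so the receiver can verify who sent the instruction."""
    body = json.dumps(payload, sort_keys=True).encode()
    key = LINK_KEYS[(sender, receiver)]
    return {"sender": sender, "payload": payload,
            "mac": hmac.new(key, body, hashlib.sha256).hexdigest()}

def verify(receiver: str, message: dict) -> dict:
    """Reject any inter-agent message that fails authentication."""
    key = LINK_KEYS[(message["sender"], receiver)]
    body = json.dumps(message["payload"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["mac"]):
        raise PermissionError("inter-agent message failed authentication")
    # Authentication proves origin, not safety: still treat the payload
    # content as untrusted input, exactly like an external document.
    return message["payload"]

msg = sign("orchestrator", "finance_agent", {"task": "summarise invoices"})
print(verify("finance_agent", msg))
```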

MCP Server Supply Chain Risk

The Model Context Protocol ecosystem is growing rapidly. There are hundreds of community-built MCP servers that connect Claude to external services, databases, and APIs. Each one represents a potential supply chain risk: a malicious or compromised MCP server could misrepresent the data it returns to Claude, exfiltrate data that Claude queries, or execute actions beyond what the connecting organisation intended. MCP security considerations for enterprise deployments are not yet well understood in most organisations, and the community MCP registry has no security vetting process equivalent to what you'd expect for a production software dependency.

The governance rule for enterprise MCP deployments should be: only use internal MCP servers built and controlled by your organisation, or third-party MCP servers that have undergone the same security review you'd apply to any production software dependency. This is more conservative than the current norm in the ecosystem, but it's the appropriate posture for organisations with meaningful data risk.
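One way to operationalise that rule is an explicit allowlist of vetted servers, checked before any MCP server is registered with an agent. The sketch below assumes a hypothetical internal registry that pins each approved server to the hash of the reviewed artefact; nothing here is part of the MCP specification itself.

```python
import hashlib

# Hypothetical internal registry of vetted MCP servers: identifier mapped to
# the SHA-256 digest of the exact artefact your security review approved.
APPROVED_MCP_SERVERS = {
    "internal-hr-db": hashlib.sha256(b"vetted artefact bytes").hexdigest(),
}

def check_mcp_server(name: str, artefact: bytes) -> None:
    """Refuse to register any MCP server that isn't vetted and hash-pinned."""
    pinned = APPROVED_MCP_SERVERS.get(name)
    if pinned is None:
        raise PermissionError(f"MCP server {name!r} has not passed security review")
    if hashlib.sha256(artefact).hexdigest() != pinned:
        raise PermissionError(f"MCP server {name!r} does not match its pinned hash")

# The vetted artefact passes; a modified or unknown one is refused.
check_mcp_server("internal-hr-db", b"vetted artefact bytes")
```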

Data Exfiltration via Agentic Workflows

AI agents that have access to sensitive data and can take external-facing actions (sending emails, posting to web services, writing to shared storage) create data exfiltration pathways that don't exist in traditional software. A compromised agent instruction could cause Claude to include sensitive document contents in an email reply, write data to an external endpoint, or generate and transmit reports to unintended recipients. The risk here isn't the AI itself acting maliciously; it's that the AI is executing instructions without the judgment to recognise that an instruction is a manipulation.
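A compensating control is an egress filter that inspects outbound payloads before any external-facing tool is allowed to run. The sketch below uses crude illustrative patterns; a production deployment would plug a proper DLP engine into the same choke point.

```python
import re

# Minimal egress filter: scan outbound content before an external-facing
# tool runs. The patterns are deliberately crude illustrations.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),  # card-number shape
    re.compile(r"(?i)\bconfidential\b"),
]

def check_egress(payload: str) -> str:
    """Block suspicious outbound content and route it to human review."""
    for pattern in SENSITIVE_PATTERNS:
        if pattern.search(payload):
            # Blocking loudly beats sending silently: the reviewer decides.
            raise PermissionError(f"outbound payload matched {pattern.pattern!r}")
    return payload

check_egress("Meeting moved to 3pm")            # passes
# check_egress("CONFIDENTIAL: Q3 forecast")     # would raise PermissionError
```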

🎯 Prompt Injection

Malicious content embedded in documents or emails manipulates agent instructions without user awareness.

🔗 Trust Chain Attacks

In multi-agent systems, compromising one agent propagates attacker control through the whole network.

📦 MCP Supply Chain

Third-party MCP servers with insufficient vetting introduce data handling and execution risks.

📤 Agentic Exfiltration

Agents with external-facing actions can be manipulated into exfiltrating data through legitimate channels.

The Governance Model That Works

Securing AI agent deployments requires a governance model built on four principles: minimal capability grants, human confirmation gates, comprehensive audit logging, and anomaly detection. None of these are novel security concepts; what's new is applying them to AI agents specifically, where the "user" is an autonomous system rather than a human being.

Minimal Capability Grants

Every AI agent should have the minimum set of capabilities required to perform its function. An agent that summarises internal documents doesn't need access to your email system. An agent that handles customer enquiries doesn't need access to your financial database. This sounds obvious, but in practice, many organisations grant broad permissions because it's simpler to configure, and then wonder why their risk posture looks alarming when someone actually maps what the agent can do.

Claude's permission model in Cowork allows administrators to define connector access, file system scope, and external action permissions at a granular level. Using these controls properly requires thinking carefully about each agent's actual function and the minimum access it needs. Our Claude Cowork security guide covers the specific admin controls available and how to configure them for a least-privilege posture.
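In code, least privilege means each agent is constructed with only the tool definitions its role requires, so an out-of-scope action can't even be proposed. A minimal sketch, with illustrative roles and tools:

```python
# Per-role capability grants: each agent is built with only the tools its
# function requires. Roles and tool names are illustrative.
TOOL_CATALOGUE = {
    "search_documents": {"description": "Read-only search over internal docs"},
    "send_email":       {"description": "Send external email"},
    "query_finance_db": {"description": "Read the financial database"},
}

ROLE_GRANTS = {
    "doc_summariser":   {"search_documents"},              # no email, no finance
    "customer_support": {"search_documents", "send_email"},
}

def tools_for(role: str) -> list[dict]:
    """Build the tool list an agent sees; ungranted tools simply don't exist for it."""
    granted = ROLE_GRANTS[role]
    return [{"name": n, **spec} for n, spec in TOOL_CATALOGUE.items() if n in granted]

print([t["name"] for t in tools_for("doc_summariser")])  # ['search_documents']
```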

Human Confirmation Gates

For any action with significant consequence (sending external communications, modifying financial records, creating or deleting data, executing code in production), require explicit human confirmation before the agent proceeds. This adds friction, but it adds friction at exactly the right point: the boundary between AI reasoning and real-world action. The sophistication of the AI model is irrelevant at this boundary; what matters is whether a human has made a deliberate decision to authorise the action.

Claude's tool use architecture supports confirmation patterns natively. Building confirmation gates into your agent architecture is a design decision, not a technical limitation. The organisations that skip this step because it slows things down are the ones that will have the most alarming security incidents.
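As an illustration, here is a minimal confirmation gate built on the anthropic Python SDK's Messages API. The model name, tool schema, and prompt are placeholders to adapt to your own deployment.

```python
import anthropic

# Minimal sketch of a human confirmation gate in a Claude tool-use loop.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SENSITIVE_TOOLS = {"send_email"}  # actions that must never run unconfirmed

tools = [{
    "name": "send_email",
    "description": "Send an email to an external recipient.",
    "input_schema": {
        "type": "object",
        "properties": {"to": {"type": "string"}, "body": {"type": "string"}},
        "required": ["to", "body"],
    },
}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumption: substitute your deployed model
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Draft and send the supplier update."}],
)

for block in response.content:
    if block.type == "tool_use" and block.name in SENSITIVE_TOOLS:
        # The boundary between AI reasoning and real-world action: a human
        # makes a deliberate decision before anything leaves the building.
        print(f"Agent wants to call {block.name} with {block.input}")
        approved = input("Approve this action? [y/N] ").strip().lower() == "y"
        if not approved:
            print("Action declined; nothing was sent.")
        # If approved, execute the tool and return a tool_result block keyed
        # by block.id on the next messages.create call to continue the loop.
```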

Comprehensive Audit Logging

Every action taken by an AI agent (every tool call, every file access, every external communication) should be logged with sufficient detail to reconstruct what happened and why. This serves two purposes: security forensics when something goes wrong, and governance evidence that your AI systems are operating within approved parameters. Claude's enterprise deployment options include audit logging that captures this at the API level. Integrating that logging into your SIEM infrastructure should be a day-one requirement, not an afterthought.

Our Claude audit logging guide covers the specific log formats, retention requirements, and SIEM integration patterns you need for enterprise governance compliance.
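At the application layer, that means emitting one structured record per tool call in a format your SIEM can ingest directly. A minimal sketch follows; the field names are illustrative rather than a mandated schema.

```python
import json
import logging
from datetime import datetime, timezone

# Structured audit logger for agent tool calls: one JSON object per line
# (JSONL), ready for SIEM ingestion. Field names are illustrative.
logger = logging.getLogger("agent.audit")
logger.setLevel(logging.INFO)
handler = logging.FileHandler("agent_audit.jsonl")
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

def audit_tool_call(agent_id: str, tool: str, tool_input: dict,
                    outcome: str, approved_by: str | None = None) -> None:
    logger.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "tool": tool,
        "input": tool_input,          # redact sensitive fields before logging
        "outcome": outcome,           # e.g. "executed", "declined", "refused"
        "approved_by": approved_by,   # human identity for gated actions
    }))

audit_tool_call("doc_summariser-01", "search_documents",
                {"query": "Q3 invoices"}, outcome="executed")
```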

Anomaly Detection

Audit logs are only useful if someone is watching them. AI agent behaviour should be monitored for anomalies: unusual data access patterns, unexpected external communications, actions outside the agent's normal operating parameters. This is harder than traditional anomaly detection because "normal" for an AI agent is harder to define, but it's not impossible. Establish baseline behaviour profiles for each agent during a controlled testing period, then alert on deviations from that baseline in production.
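A simple starting point is a per-agent baseline of hourly tool-call rates captured during the testing period, with alerts on production deviations. The baseline figures and threshold below are illustrative.

```python
from collections import Counter

# Baseline-vs-production anomaly detection on tool-call counts. Figures and
# the threshold are illustrative; real systems would also profile data
# volumes, recipients, and time-of-day patterns.
BASELINE_CALLS_PER_HOUR = {"search_documents": 40, "summarise_text": 25}
DEVIATION_THRESHOLD = 3.0  # alert if observed rate exceeds 3x baseline

def detect_anomalies(observed: Counter) -> list[str]:
    alerts = []
    for tool, count in observed.items():
        baseline = BASELINE_CALLS_PER_HOUR.get(tool)
        if baseline is None:
            alerts.append(f"never-before-seen tool in production: {tool}")
        elif count > baseline * DEVIATION_THRESHOLD:
            alerts.append(f"{tool}: {count}/hr vs baseline {baseline}/hr")
    return alerts

observed = Counter({"search_documents": 180, "send_email": 2})
for alert in detect_anomalies(observed):
    print("ALERT:", alert)
```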

Is Your Claude Deployment Security Architecture Ready?

We build the governance frameworks, permission models, and audit infrastructure that make Claude deployments safe to operate at enterprise scale. If you're deploying without this, the risk is real.

Book a Security Architecture Review →

Claude-Specific Security Features Worth Understanding

Anthropic has made specific design decisions in Claude that reduce risk compared to other models, and CISOs should understand them: not because they remove the need for a governance layer, but because they change the shape of the risk. Claude's constitutional AI training explicitly optimises against deception, manipulation, and harmful actions. In practice, this means Claude is more likely to refuse or flag instructions that look like manipulation attempts, and more likely to add appropriate caveats when executing actions with potential for harm.

Claude's enterprise tier includes zero data retention: no conversation data is used for model training by default, and no data is retained after the session ends. For organisations in regulated industries, this is significant: it means your legal, financial, or health data processed by Claude doesn't persist in Anthropic's infrastructure beyond the API call. This is materially different from some competing products where data retention and training use policies require careful scrutiny. The GDPR and data privacy implications of Claude are well-documented and generally favourable for enterprise use.

Claude's prompt injection resistance is better than that of most models (Anthropic has specifically invested in this as a safety property), but as noted above, it's not a complete defence. The right framing is that Claude's constitutional training reduces the probability of a successful injection attack, while your governance architecture reduces the impact if one succeeds. Both layers are necessary.

The CISO's Pre-Deployment Checklist

Before any Claude agent deployment goes into production at enterprise scale, the following questions should have explicit answers in your security documentation.

Who authorised the deployment, and what governance body reviewed it?

What data does the agent have access to, and is that the minimum required for its function?

What external-facing actions can the agent take, and what confirmation is required before it takes them?

How is agent activity logged, and where are those logs retained?

Who is monitoring for anomalies, and what does the escalation path look like if something goes wrong?

If any of those questions lacks a written answer, the deployment isn't ready for production. This sounds prescriptive, but it reflects the reality that AI agent incidents, when they happen, are fast-moving, high-stakes, and difficult to contain if you don't have the forensic infrastructure in place before the fact. The organisations we work with that have strong AI agent security postures share one characteristic: they treated the security architecture with the same rigour they'd apply to any new production system, rather than treating it as a feature of the AI product that "comes with" the subscription.

If you're building or have already deployed Claude agents and want to assess your current security posture, our Claude security and governance service includes a structured risk assessment, permission model review, and governance framework implementation. You can also read our Claude enterprise security architecture guide for a detailed technical reference.


Claude Implementation Team

Claude Certified Architects specialising in enterprise security architecture and AI governance frameworks. Learn more about us →

Security Architecture

Don't Deploy AI Agents Without a Governance Framework

We build security architectures for Claude deployments across financial services, healthcare, and legal โ€” industries where getting this wrong has consequences. Let's assess your posture.

Further reading: Claude for cybersecurity teams →