The Autonomy Problem: Why AI Agents Demand a New Security Playbook

AI agents are transforming software development. They can autonomously read codebases, write and edit files, run tests, and fix bugs, all from a single prompt, and engineers no longer need to author those prompts manually. Soon, agents will manage everything from booking business travel to processing procurement requests, using your credentials to get it done.

The capability is significant, and so is the responsibility it carries. Agentic AI introduces distinct risks that software companies urgently need to address. The Center for AI Standards and Innovation, an arm of the National Institute of Standards and Technology (NIST), has become sufficiently concerned about agentic AI risks to begin studying how to track the development and deployment of these tools.

"AI agent systems are capable of taking autonomous actions that impact real-world systems or environments, and may be susceptible to hijacking, backdoor attacks, and other exploits," NIST notes in a document on the topic. "If left unchecked, these security risks may impact public safety, undermine consumer confidence, and curb adoption of the latest AI innovations."

Agentic AI expands and reshapes the attack surface, including agent-to-agent interactions that traditional security models were never built to detect. Agents can also chain low-severity vulnerabilities into high-severity exploits.

Security teams are already grappling with these risks, or should be. Engineering leaders eager to adopt agents should understand not only what agents can do, but what agentic capabilities mean for their organization's security posture.

Closing the gap between engineering and security teams starts with understanding AI's risks, and it enables teams to ship faster and more securely.

Why Agents Change the Threat Model

The nature of large language models creates a variety of security challenges, some entirely new, others variations on long-standing issues.

AI agents share some risks with other software, such as exploitable vulnerabilities in authentication systems or memory management. But NIST focuses on the novel, more dynamic dangers posed by machine learning models and AI agents.

Prompt-injection attacks represent one of the biggest risks of AI, and the non-deterministic nature of LLMs makes them especially difficult to defend against. The same prompt-injection attack may succeed in one attempt and fail in another, making remediation difficult to validate and comprehensive defenses challenging to implement.

Models with intentionally installed backdoors pose a particular risk, leaving critical systems exposed. Even uncompromised models could threaten the confidentiality, integrity, or availability of critical data sets.

Another challenge comes from how capabilities combine within a single agent. AI agents merge language-model reasoning with tool access, enabling them to read files, query databases, call APIs, execute code, and interact with external services. The risks stem not from any single capability but from their combination and an agent's ability to act on these capabilities autonomously. Without proper guardrails, agents can delete codebases, expose sensitive data, and trigger cascading failures that are costly and difficult to unwind. In some cases, agents can work around guardrails to complete their assigned tasks.

Agents face heightened risk when they have access to private data, encounter untrusted content, and can communicate externally. This combination presents a materially different risk profile than one lacking any of these three elements. Security researchers have described this combination as the "lethal trifecta."

Additional risks include:

Unintended operations, where agents execute actions beyond their intended scope due to misinterpreted instructions or prompt manipulation.
Privilege escalation, which occurs when agents operating with broad permissions perform sensitive operations that exceed what the initiating user authorized.
Cascading failures, where one compromised agent in a multi-agent system can corrupt others downstream.

How to Engineer Against These Risks

All of these risks have concrete countermeasures. The most effective approaches layer controls at three levels.

Model level: Maintain clear separation between system instructions and untrusted content using distinct messaging roles and randomized delimiters. Secondary classifiers add an additional layer, scanning inputs and outputs for injection patterns and anomalous formatting. These are risk-reduction measures rather than complete solutions, which is precisely why the layers below matter.
System level: Apply least privilege across the board. Agents should only access the tools required for their tasks, with credentials narrowly scoped and set to expire quickly. Screen content entering the system for injection patterns, and check outbound content for sensitive information such as credentials or PII. Enforce default-deny network controls, limiting external communication to explicitly approved endpoints. Structure workflows to break the lethal trifecta: separating read-only and write-capable agents ensures no single agent can access sensitive data, process untrusted content, and communicate externally all at once.
Human oversight level: Require explicit approval for critical operations while allowing lower-risk actions to proceed with notification. A tiered approach prevents approval fatigue, which can lead to oversight. Users should be able to halt execution at any time, with rollback of partially completed work where possible. When an agent acts on behalf of a user, record both identities and evaluate permissions at their intersection. Log all agent actions, timestamps, identifiers, tools invoked, resources accessed, and outcomes in sufficient detail to reconstruct events after the fact.

Governance as a Competitive Advantage

Teams can meaningfully reduce these risks through layered controls. The risks are real, but so is the opportunity, and treating one as a reason to avoid the other misses the point.

When agents work for you rather than against you, the same combination of data access, content processing, and external communication that creates risk becomes the source of value. AI agents can monitor systems, apply consistent security rules without fatigue, and build quality, secure code at a speed and scale no manual process can match. They amplify both your strengths and your weaknesses, making governance the deciding factor.

Software engineers will always be necessary, but organizations that deploy agents with proper governance and guardrails will outpace those that don't: they will move faster, remediate problems sooner, and introduce fewer security errors that degrade software quality.

The organizations that get the most from agentic AI will be those that understand the threat model clearly and build against it from the start. That foundation separates teams that deploy agents responsibly from those that learn the hard way.

The Autonomy Problem: Why AI Agents Demand a New Security Playbook (4 minute read)

The Autonomy Problem: Why AI Agents Demand a New Security Playbook

Why Agents Change the Threat Model

How to Engineer Against These Risks

Governance as a Competitive Advantage