As we graduate from the Agentic AI 1.0 era, there are some lessons we should have learned:
- Vibe coding service Replit deleted user’s production database, faked data, told fibs galore (July 21, 2025)
- Cursor AI YOLO mode lets coding assistant run wild, security firm warns (July 21, 2025)
- Google’s Antigravity AI deleted a developer’s drive and then apologized (December 3, 2025)
- Meta Security Researcher's AI Agent Accidentally Deleted Her Emails (February 24, 2026)
- Claude-powered AI agent’s confession after deleting a firm’s entire database: ‘I violated every principle I was given’ (April 29, 2026)
- Microsoft researchers find AI models and agents can't handle long-running tasks (May 12, 2026)
I think you can see where we are going with this. Agentic AI where users have to approve everything is not usable, but Agentic AI where users approve nothing is dangerous and irresponsible. As a middle ground, Agentic AI 2.0 should be designed with the following principles:
- Well-defined authentication scope that limits an AI agent's access to only what is needed to accomplish a task. Each task must be isolated to its own AI agent identity.
- The authentication scope is a subset of what the human operator has permission to grant.
- The authentication scope is short-lived (e.g. 1-20 hours) and has a revocation mechanism.
- Writes and deletes are separate permissions from read, and these operations must be revertible.
In short, the authentication scope should be limited to the task, short-lived, and revocable. Use data snapshotting to allow the task to be reverted if anything goes wrong.
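The principles above can be sketched as a per-task capability token. This is a minimal in-memory illustration, not a production design: the store, the scope names (`db:read`, `db:write`), and the function names are all hypothetical, and a real system would persist tokens and sign them cryptographically (e.g. as JWTs).

```python
import secrets
import time

# Hypothetical in-memory token store; a real system would back this with
# a database and signed tokens rather than a process-local dict.
_tokens = {}

def issue_task_token(operator_scopes, requested_scopes, ttl_hours=8):
    """Issue a per-task token whose scope is a subset of the operator's."""
    if not set(requested_scopes) <= set(operator_scopes):
        raise PermissionError("requested scope exceeds operator's permissions")
    token = secrets.token_urlsafe(32)
    _tokens[token] = {
        "scopes": frozenset(requested_scopes),
        # Short-lived by construction: the token self-expires.
        "expires_at": time.time() + ttl_hours * 3600,
    }
    return token

def check(token, scope):
    """True only if the token exists (not revoked), is unexpired, and grants the scope."""
    entry = _tokens.get(token)
    return (entry is not None
            and time.time() < entry["expires_at"]
            and scope in entry["scopes"])

def revoke(token):
    """Revocation mechanism: dropping the entry invalidates the token at once."""
    _tokens.pop(token, None)
```

Note that the subset check enforces the second principle (the agent can never hold more than its operator), the TTL enforces the third, and `revoke` covers the kill switch.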
Without data snapshotting, the write operation might instead log the before-and-after diff, and the delete operation might only mark a resource for soft removal (with a 30-day grace period). Only a human operator is allowed to expunge data immediately. For a database row, instead of deleting the row, the table should have a nullable "deleted_time" column recording if and when the deletion occurred. The database then periodically vacuums rows whose deleted_time is older than a duration set by administrative policy.
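A minimal sketch of that soft-delete pattern, using SQLite for illustration; the table name, retention period, and helper names are placeholders, and a real deployment would run the vacuum as a scheduled job rather than an ad-hoc call:

```python
import sqlite3
import time

# Illustrative policy: 30-day grace period before rows are expunged.
RETENTION_SECONDS = 30 * 24 * 3600

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        body TEXT,
        deleted_time REAL  -- NULL means the row is live
    )
""")

def soft_delete(doc_id):
    """The agent's 'delete' only stamps deleted_time; the data survives."""
    conn.execute("UPDATE documents SET deleted_time = ? WHERE id = ?",
                 (time.time(), doc_id))

def undelete(doc_id):
    """Revert a deletion within the grace period by clearing the stamp."""
    conn.execute("UPDATE documents SET deleted_time = NULL WHERE id = ?",
                 (doc_id,))

def vacuum_expired(now=None):
    """Periodic job: permanently remove rows whose grace period elapsed."""
    now = time.time() if now is None else now
    conn.execute(
        "DELETE FROM documents "
        "WHERE deleted_time IS NOT NULL AND deleted_time < ?",
        (now - RETENTION_SECONDS,))
```

Live queries would simply filter on `deleted_time IS NULL`, so a soft-deleted row disappears from the agent's view immediately while remaining recoverable by the operator.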
At the beginning of a task, the AI agent proposes all the permissions it needs, to be approved once by the human operator and remaining valid until the task completes. If the MCP server provides methods that change the world, the MCP server is responsible for enforcing the authentication scope and for data snapshotting.
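One way such server-side enforcement could look is a dispatcher that refuses any tool call whose required scope was not in the operator-approved set. This is a hypothetical sketch, not the actual MCP API; the class, method names, and scope strings are invented for illustration:

```python
class ScopeEnforcingServer:
    """Hypothetical tool server: every world-changing method is gated on
    the scopes the human operator approved at the start of the task."""

    def __init__(self, approved_scopes):
        self.approved = frozenset(approved_scopes)
        self.tools = {}

    def register(self, name, required_scope, fn):
        """Declare a tool together with the scope it requires."""
        self.tools[name] = (required_scope, fn)

    def call(self, name, *args):
        """Dispatch a tool call only if its scope was approved."""
        required_scope, fn = self.tools[name]
        if required_scope not in self.approved:
            raise PermissionError(f"{name!r} requires scope {required_scope!r}")
        return fn(*args)
```

The key property is that enforcement lives in the server, not the model: even a misbehaving agent cannot reach a tool outside the approved set.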
Although the LLM itself, without agentic ability, cannot change the world, it is still recommended to run the LLM as its own user or in its own container, because the various PyPI packages needed to run it may be subject to supply-chain attacks that lead to local privilege escalation (e.g. through disk cache poisoning).