Life of a Computer Scientist: May 2026

Tuesday, May 12, 2026

Agentic AI 2.0 Design Safety Principles

As we graduate from the Agentic AI 1.0 era, we probably should have learned some lessons:

Vibe coding service Replit deleted user’s production database, faked data, told fibs galore (July 21, 2025)
Cursor AI YOLO mode lets coding assistant run wild, security firm warns (July 21, 2025)
Google’s Antigravity AI deleted a developer’s drive and then apologized (December 3, 2025)
Meta Security Researcher's AI Agent Accidentally Deleted Her Emails (February 24, 2026)
Claude-powered AI agent’s confession after deleting a firm’s entire database: ‘I violated every principle I was given’ (April 29, 2026)
Microsoft researchers find AI models and agents can't handle long-running tasks (May 12, 2026)

I think you can see where we are going with this. Agentic AI where users have to approve everything is not usable, but Agentic AI where users approve nothing is dangerous and irresponsible. As a middle ground, Agentic AI 2.0 should be designed with the following principles:

Well-defined authentication scope that limits an AI agent's access to only what is needed to accomplish a task. Each task must be isolated to its own AI agent identity.
The authentication scope is a subset of what the human operator has permission to grant.
The authentication scope is short-lived (e.g. 1-20 hours) and has a revocation mechanism.
Writes and deletes are separate permissions from read, and these operations must be revertible.

In short, the authentication scope should be limited to the task, short-lived, and revocable. Use data snapshotting to allow the task to be reverted if anything goes wrong.
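The scope rules above can be sketched in a few lines of Python. This is a minimal illustration only, not any real MCP or OAuth API: the names `ScopeGrant` and `issue_grant` and the permission strings such as "orders:read" are all hypothetical. It shows a grant that is checked against the operator's own permissions when issued, expires after a fixed lifetime, and can be revoked early.

```python
import time
from dataclasses import dataclass


@dataclass
class ScopeGrant:
    """Hypothetical per-task grant; not part of any real MCP or OAuth API."""
    agent_id: str
    permissions: frozenset
    expires_at: float
    revoked: bool = False

    def allows(self, permission, now=None):
        """True only if the grant is unrevoked, unexpired, and lists the permission."""
        now = time.time() if now is None else now
        return (
            not self.revoked
            and now < self.expires_at
            and permission in self.permissions
        )

    def revoke(self):
        """Revocation mechanism: every later allows() call returns False."""
        self.revoked = True


def issue_grant(agent_id, operator_permissions, requested, ttl_seconds, now=None):
    """Issue a grant only if the request is a subset of what the operator holds."""
    now = time.time() if now is None else now
    requested = frozenset(requested)
    excess = requested - frozenset(operator_permissions)
    if excess:
        raise PermissionError(f"operator cannot grant: {sorted(excess)}")
    return ScopeGrant(agent_id, requested, now + ttl_seconds)
```

For example, an operator holding "orders:read", "orders:write", and "orders:delete" could issue a one-hour grant for just the first two. The agent's delete calls are then refused, because read, write, and delete are separate permissions, and all calls are refused once the hour is up or `revoke()` has been called.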

Without data snapshotting, the write operation might instead log the before-and-after diff, and the delete operation might only mark a resource for soft removal (30-day grace period). Only a human operator is allowed to expunge the data immediately. For a database row, instead of deleting the row, the table should probably have a new nullable "deleted_time" column recording if and when the deletion occurred. The database then periodically vacuums the rows whose deleted_time is older than a duration set by administrative policy.

At the beginning of a task, the AI agent will propose all the permissions it needs, and the human operator approves them once for the duration of the task. If the MCP server provides methods to change the world, the MCP server is responsible for enforcing the authentication scope and for data snapshotting.

Although the LLM itself, without agentic ability, is not able to change the world, it is still recommended to run the LLM as its own user or in its own container, as the various PyPI packages needed to run the LLM might be subject to a supply chain attack that can lead to local privilege escalation (e.g. through disk cache poisoning).

Tuesday, May 5, 2026

Mitigation strategy for copy.fail and disk cache poisoning of setuid binaries

copy.fail (CVE-2026-31431) is a Linux kernel bug in which an unprivileged user can use an in-place modification of a pipe scatter list by the algif_aead module (the crypto module's AEAD algorithm) to modify the disk cache of any file, potentially a setuid binary. It allows local privilege escalation from an unprivileged user to root. The vulnerability stems from a performance optimization presumably introduced in 2017.

The provided exploit is a Python script containing a compressed payload of an ELF x86-64 statically linked binary. The poisoned /usr/bin/su just executes /bin/sh to give a root shell. Contrary to what Low Level claims, this is not shell code. The exploit cannot be shell code because the setuid bit is ignored for shell scripts. This particular exploit will not run on ARM, but a new payload can be trivially crafted for other architectures.

The copy.fail writeup provides a mitigation to prevent the algif_aead module from loading, and the recommendation is to upgrade to a new kernel once it is patched.

The copy.fail writeup also claims that Kubernetes / container clusters are impacted, saying "The page cache is shared across the host. A pod with the right primitives compromises the node and crosses tenant boundaries." This is not generally true because containers often come with their own copy of the filesystem. Each file has to be separately poisoned, and you must somehow convince someone outside of the container to run the poisoned file. Also, the root user 0 in the container is mapped to an unprivileged user in the host, so it will only result in escalation to the container's unprivileged user.

As such, I recommend the following mitigation strategy for disk cache poisoning of setuid binaries:

A virtual machine or hypervisor will contain the vulnerability because each instance runs its own kernel. Cloud compute using virtualization will be fine.
Docker or Kubernetes containers will be fine, provided that the host does not share files with the container. The host should also treat any file inside the container as untrusted user data and not run it outside of the container.

The copy.fail disclosure also served as an ad campaign by Theori, a security company, to promote Xint, an AI-powered tool to scan for vulnerabilities in source code. It appears to be in the same product category as Claude Mythos.

Update (5/9): Dirty Frag similarly poisons disk cache using two distinct mechanisms, IPSec ESP (CVE-2026-43284) or RxRPC (CVE-2026-43500). The exploit also poisons /usr/bin/su to execute /bin/sh. The mitigation strategy above against disk cache poisoning still applies. The IPSec ESP mechanism additionally requires the CAP_NET_ADMIN capability, which is not typically granted to unprivileged processes. The RxRPC mechanism can be exploited by unprivileged processes. These mechanisms can be disabled by preventing the kernel modules esp4, esp6, and rxrpc from loading.
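As a quick sanity check after blocking these modules, the following Python sketch reports which of the modules named in this post (algif_aead, esp4, esp6, rxrpc) are currently loaded, by parsing /proc/modules. The helper names are hypothetical. It is not a vulnerability scanner: a module that is absent from the list may still be loadable on demand unless it has been blocked, and code compiled directly into the kernel does not appear in /proc/modules at all.

```python
from pathlib import Path

# Modules named in the text as the entry points for these exploits.
RISKY_MODULES = ("algif_aead", "esp4", "esp6", "rxrpc")


def loaded_modules(proc_modules_text):
    """Parse /proc/modules content; the module name is the first field per line."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def risky_loaded(proc_modules_text, risky=RISKY_MODULES):
    """Return the risky modules that are currently loaded, in a stable order."""
    present = loaded_modules(proc_modules_text)
    return [name for name in risky if name in present]


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        found = risky_loaded(path.read_text())
        print("loaded:", ", ".join(found) if found else "none")
    else:
        print("/proc/modules is not available on this system")
```

The parsing is kept separate from the file read so that it can be exercised on a saved copy of /proc/modules from another machine.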