Imagine handing your 14-year-old son the keys to your new Ferrari, your credit card, and instructions to pick up groceries from the store. He’s a good kid who has never driven before. But the route is straightforward, with no traffic lights or stop signs; only a few intersections, a school zone, and a busy highway. He’s a fast learner—he is top of his high school physics class—and has watched numerous YouTube videos of people driving. Unbeknownst to you, he has also had several sips of vodka from your liquor cabinet. Oh, and you’ve asked him to bring his 3-year-old sister along, strapped safely in her car seat. Because you care about safety.
Maybe he makes it back with nine of eleven items. More likely, he crashes the car, injures his sister, and ends up in prison while you face child endangerment charges and a divorce.
I realize this sounds extreme. But this is what comes to mind every time I read about another agentic AI deployment in legal practice. In theory, AI agents are powerful and can deliver real productivity gains. In practice, however, they are a recipe for disaster and can—or will—lead to catastrophic failures that surpass what was possible with traditional AI.
What Makes AI Agents Different
Think of traditional AI chatbots as a very knowledgeable paralegal who can only answer questions. They’ll tell you everything they know about contract law; they’ll even draft the contract when asked. But they can’t take the initiative to draft the contract, send it to opposing counsel, and calendar the signature deadlines. An AI agent, on the other hand, is the paralegal who not only knows the law but can take initiative: draft the document, pull relevant precedents, email the client for approval, and update the matter management system, all from a single instruction.
What makes agents so different from conventional software comes down to three capabilities. First, they use tools (e.g., search engines, document management systems, calendaring APIs, code interpreters) to get things done in the real world. Second, they plan autonomously to achieve a goal. You can give an agent a high-level goal like “prepare for the Smith deposition,” and it breaks that down into discrete steps—identify relevant documents, summarize prior testimony, draft examination outlines—without you specifying each action. Third, they can adapt to context, adjusting their approach based on what they find and what you tell them. This is what makes them different from simple automation. It’s also what makes them dangerous in a profession where getting it wrong can end careers and harm clients, and where precision, confidentiality, and accountability are fundamental ethical obligations.
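The three capabilities above can be sketched as a minimal plan-and-execute loop. Everything here is illustrative: the function names, the hard-coded plan, and the stubbed tools are assumptions for exposition, not any real agent framework’s API.

```python
# Minimal sketch of an agent loop: plan a goal into steps (planning),
# dispatch each step to a tool (tool use), and skip steps it cannot
# execute (a crude form of adapting to context). All names are hypothetical.

def plan(goal):
    """Hypothetical planner: break a high-level goal into discrete steps."""
    if goal == "prepare for the Smith deposition":
        return ["identify relevant documents",
                "summarize prior testimony",
                "draft examination outlines"]
    return [goal]  # unknown goals pass through as a single step

# Stubbed tools standing in for document search, summarization, drafting.
TOOLS = {
    "identify relevant documents": lambda: ["doc-001", "doc-002"],
    "summarize prior testimony":   lambda: "summary of prior testimony",
    "draft examination outlines":  lambda: "examination outline v1",
}

def run_agent(goal):
    results = {}
    for step in plan(goal):        # autonomous planning
        tool = TOOLS.get(step)     # tool selection
        if tool is None:
            continue               # adapt: no tool available, move on
        results[step] = tool()     # real-world side effects happen here
    return results
```

In a production agent, each `tool()` call would touch live systems (email, billing, filings), which is exactly where the risk discussed below concentrates.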
The Attack Surface Problem
Every AI risk I’ve previously discussed exists in AI agents, but with a larger attack surface and less visibility for mitigation. For instance, consider prompt injection attacks. An attacker can embed malicious instructions in a document the agent processes, causing it to exfiltrate client data or send emails it was never authorized to send. Jailbreaking, where an agent can be manipulated to bypass its safety guardrails, can turn the agent into a liability generator. And because agents need broad permissions to be useful (like access to document management systems, email, calendars, billing platforms), a compromised agent becomes an insider threat with the keys to your most sensitive systems.
But external attacks are not the only concern. Agents misbehave on their own. They invoke the wrong tools, lock users out of systems, send unauthorized communications, hallucinate, and consume their own hallucinations, creating a feedback loop of incorrect information. An agent drafts a memo with a fabricated case citation, references that memo in a subsequent analysis, and then cites both in a final report to the client, creating errors that compound through your workflow in ways you may not catch until it’s too late.
AI agents are also a compliance minefield. They can violate GDPR’s automated decision-making provisions, trigger EU AI Act high-risk classifications, or produce biased outputs in hiring and lending decisions. And when unmonitored, they’re expensive, burning through API credits and compute resources while generating useless or harmful outputs.
The Verification Problem
The Agent Paradox is that the more autonomous an AI becomes, the more human oversight it requires, and the less equipped humans are to provide it.
Agents work well when outputs are easy to verify. This is why coding has been upended by AI agents. You can run code and test it. You can usually identify problems quickly and fix them with a clear prompt.
But in legal work—and most work that matters—the cost of generating outputs isn’t the constraint; the cost of errors is. Verification becomes the primary bottleneck. Because verification requires human judgment, legal use cases are structurally impervious to the benefits that might accrue from deploying agentic AI.
You need to verify everything because there is no signal to help you determine which outputs to trust. Given the stakes of getting it wrong, this creates new work for you. Agents impose a verification tax that did not exist before, especially where the value lies in outcomes that are not easily quantifiable.
Agents are not great for human-centric work. They execute well when goals and paths are clear. They fail when judgment calls are required, when priorities shift based on new or weak signals, when plans become obsolete, when ambiguity is the norm. This describes most legal work.
The Agent Deployment Matrix
Before deploying an AI agent to any task (I don’t recommend it for most tasks), plot it on this grid:
| | Easy to Verify | Hard to Verify |
|---|---|---|
| **Low Stakes** | Good for agents (document formatting, calendar scheduling, time entry categorization, routine filings) | Proceed with caution (internal research summaries, first-draft memos, meeting note transcription) |
| **High Stakes** | Maybe agents (code generation, data extraction from structured forms, billing calculations) | Never agents (client advice, court filings, contract negotiation, privilege determinations, settlement strategy) |
Most legal work lives in the bottom-right quadrant. High stakes, hard to verify—the domain where agents are most dangerous and least appropriate.
How to Use This Matrix
- Before any agent deployment, identify where the task sits on both axes
- Be honest about verification difficulty—if catching an error requires subject-matter expertise and careful reading, it’s “hard to verify”
- Stakes aren’t just about money—professional reputation, client relationships, regulatory exposure all count
- When in doubt, move one quadrant to the right—we systematically underestimate verification difficulty
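The matrix and the “when in doubt” rule can be compressed into a single decision function. This is a sketch of the logic described above; the function name and boolean inputs are assumptions for illustration, not a prescribed implementation.

```python
# The deployment matrix as a decision rule. Two axes (stakes, verification
# difficulty) map to four recommendations; doubt shifts a task rightward
# into the hard-to-verify column, per the rule above.

def agent_recommendation(high_stakes: bool, hard_to_verify: bool,
                         in_doubt: bool = False) -> str:
    if in_doubt:
        # "When in doubt, move one quadrant to the right": assume
        # verification is harder than it looks.
        hard_to_verify = True
    if high_stakes:
        return "Never agents" if hard_to_verify else "Maybe agents"
    return "Proceed with caution" if hard_to_verify else "Good for agents"
```

Note that `in_doubt` only ever makes the recommendation more conservative, which matches the point that we systematically underestimate verification difficulty.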
What This Means for Your Firm
Most firms deploying agents are doing so to signal innovation, not to solve a problem. They’re avoiding the anxiety of being left behind.
You should not deploy agents until you have built verification infrastructure. Before you race to deploy, ask three questions: Can I verify every output before it reaches a client? Can I audit every action the agent took? Can I explain to a disciplinary board exactly what happened and why? If the answer to any of these questions is no, you’re not ready for agents.
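The three questions form a strict gate: a single “no” disqualifies the deployment. A sketch of that gate, with hypothetical parameter names standing in for the three questions:

```python
# Readiness gate for agent deployment: all three questions must be
# answerable "yes" before any agent goes live. Parameter names are
# illustrative stand-ins for the three questions above.

def ready_for_agents(can_verify_every_output: bool,
                     can_audit_every_action: bool,
                     can_explain_to_disciplinary_board: bool) -> bool:
    return all([can_verify_every_output,
                can_audit_every_action,
                can_explain_to_disciplinary_board])
```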