"AI agent" is the phrase of the year, and most of what's written about it is either breathless hype or doom. The reality on the ground is more useful and more boring: agents are starting to quietly remove the repetitive, low-judgement work that clogs up real teams. This post is about the parts that are actually working in production today — and the parts that aren't.
What actually counts as an "agent"?
A chatbot answers a question. An agent takes a goal, decides on a sequence of steps, uses tools to act in the real world, observes the result, and loops until the job is done. The difference is the loop and the tools.
In practice an agent is four things wired together:
- A model to reason and plan.
- Tools it can call — search, a database query, an email send, an API.
- Memory of what it has already done and learned.
- Guardrails that decide what it may do on its own and what needs a human.
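To make the four parts concrete, here is a minimal Python sketch of the loop. It is an illustration under assumptions, not a production design: `scripted_plan` is a hypothetical stand-in for a real model call, and the tool names `search` and `send_email` are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    tool: str
    arg: str
    result: str


@dataclass
class Agent:
    plan: Callable[[str, list], tuple]  # the "model": picks the next (tool, arg)
    tools: dict                         # tool name -> callable taking one string
    needs_approval: set                 # guardrail: tools that require a human
    memory: list = field(default_factory=list)  # steps taken so far

    def run(self, goal, approve, max_steps=5):
        for _ in range(max_steps):
            tool, arg = self.plan(goal, self.memory)
            if tool == "done":
                return arg
            if tool in self.needs_approval and not approve(tool, arg):
                result = "denied by human reviewer"
            else:
                result = self.tools[tool](arg)
            # Record the observation so the next planning call can see it.
            self.memory.append(Step(tool, arg, result))
        return "stopped: step limit reached"


def scripted_plan(goal, memory):
    """Hypothetical planner: a real system would call a model here."""
    if not memory:
        return ("search", goal)
    if len(memory) == 1:
        return ("send_email", f"Summary: {memory[0].result}")
    return ("done", f"finished after {len(memory)} steps")


TOOLS = {
    "search": lambda query: f"3 results for {query!r}",
    "send_email": lambda body: "sent",
}

agent = Agent(plan=scripted_plan, tools=TOOLS, needs_approval={"send_email"})
outcome = agent.run("refund policy", approve=lambda tool, arg: True)
print(outcome)
```

The `max_steps` cap is a deliberate choice in this sketch: a loop that can call tools should always have a hard stop, whatever the planner decides.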
Where agents are paying off right now
We deliberately avoid "AI for everything." The wins come from narrow, repetitive, high-volume workflows where mistakes are cheap and easy to catch:
- Support triage. An agent reads an incoming ticket, classifies it, pulls the customer's history, drafts a reply, and routes anything ambiguous to a human. Agents draft; people approve.
- Data entry & document processing. Extracting line items from invoices, receipts, and PDFs into structured records — the kind of work that burns hours and causes errors when done by hand.
- Sales research. Given a lead, an agent gathers public company info, summarises it, and pre-fills the CRM so reps start every call prepared.
- Internal "ask-your-data" assistants. Letting staff query policies, reports, or inventory in plain language instead of hunting through spreadsheets.
The pattern that works: pick one workflow a person does dozens of times a day, and let the agent do the first 80% while the person reviews and approves.
A simple architecture that works
You do not need a 12-agent "swarm." Most valuable systems we ship are a single agent with a tight tool set:
- One capable model behind an API, with a smaller/cheaper model for routing and classification.
- A handful of well-described tools — fewer, reliable tools beat many flaky ones.
- A retrieval layer so the agent answers from your data, not from its training data.
- An approval step for any action that writes data, sends a message, or spends money.
- Logging of every step so you can audit, debug, and improve.
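The approval step and the step log from the list above can share one choke point: every tool call goes through a single function. The sketch below assumes a hypothetical `Tool` wrapper and two invented tools (`lookup_order`, `issue_refund`); it is one possible shape, not a prescribed design.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str            # what the model reads when choosing a tool
    func: Callable[[str], str]
    side_effect: bool = False   # True if it writes data, sends a message, or spends money


audit_log = []  # one JSON line per tool call; a real system would persist this


def call_tool(tool, arg, approver):
    """Run a tool, asking the approver first if the tool has side effects."""
    approved = (not tool.side_effect) or approver(tool.name, arg)
    result = tool.func(arg) if approved else "blocked: approval denied"
    audit_log.append(json.dumps({
        "ts": time.time(),
        "tool": tool.name,
        "arg": arg,
        "approved": approved,
        "result": result,
    }))
    return result


# Hypothetical tools for illustration only.
lookup_order = Tool("lookup_order", "Read an order's status by ID.",
                    lambda order_id: f"order {order_id}: shipped")
issue_refund = Tool("issue_refund", "Refund an order by ID.",
                    lambda order_id: f"refunded {order_id}", side_effect=True)


def deny_all(tool_name, arg):
    """Stand-in approver that rejects everything; a real one would ask a person."""
    return False
```

With `deny_all` as the approver, `call_tool(lookup_order, "A-17", deny_all)` still runs because the tool is read-only, while `call_tool(issue_refund, "A-17", deny_all)` is blocked. Both calls end up in `audit_log`.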
Where humans must stay in the loop
The fastest way to lose trust in an agent is to let it act unsupervised on things that matter. Keep a person in the loop for:
- Anything customer-facing that goes out under your brand.
- Financial actions — payments, refunds, credit notes.
- Irreversible operations — deletions, contract changes, public posts.
- Low-confidence cases the agent flags as uncertain.
Good agents know what they don't know. Build them to escalate rather than guess.
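The four rules above translate almost directly into a policy check that runs before any action executes. In this sketch the action kinds, the `ProposedAction` shape, and the 0.8 confidence floor are all assumptions chosen for illustration; your own categories and threshold will differ.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    kind: str              # e.g. "reply", "refund", "delete_record", "tag_ticket"
    customer_facing: bool
    confidence: float      # the agent's self-reported confidence, 0.0 to 1.0


FINANCIAL = {"payment", "refund", "credit_note"}
IRREVERSIBLE = {"delete_record", "contract_change", "public_post"}
CONFIDENCE_FLOOR = 0.8     # assumed threshold; tune it against real review data


def review_reason(action):
    """Return why a human must review this action, or None if it may run unattended."""
    if action.customer_facing:
        return "customer-facing"
    if action.kind in FINANCIAL:
        return "financial"
    if action.kind in IRREVERSIBLE:
        return "irreversible"
    if action.confidence < CONFIDENCE_FLOOR:
        return "low confidence"
    return None
```

Returning the reason, rather than a bare yes or no, means the reviewer sees why the item landed in their queue.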
How to start small — without a big budget
You do not need a data-science team or a six-figure budget to begin. The cheapest path to value:
- Pick the one workflow your team complains about most.
- Measure the baseline — how long it takes and how often it goes wrong today.
- Ship a narrow agent that handles the common 80% and escalates the rest.
- Track the same metrics for a month. Expand only once the numbers prove out.
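"Prove out" is easier to argue about when it is written down before the pilot starts. A minimal sketch, assuming you record time taken and whether an error occurred for each task; the 30% time-saving bar is an arbitrary example, not a recommendation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    minutes: float    # time a person spent on the task
    had_error: bool   # whether the result needed correcting afterwards


def summarize(runs):
    return {
        "avg_minutes": mean(r.minutes for r in runs),
        "error_rate": sum(r.had_error for r in runs) / len(runs),
    }


def proves_out(baseline, pilot, min_time_saving=0.3):
    """True if the pilot is at least min_time_saving faster and makes no more errors."""
    b, p = summarize(baseline), summarize(pilot)
    faster = p["avg_minutes"] <= b["avg_minutes"] * (1 - min_time_saving)
    no_worse = p["error_rate"] <= b["error_rate"]
    return faster and no_worse


# Made-up numbers, purely to show the shape of the comparison.
baseline = [Run(10, False), Run(12, True), Run(8, False), Run(10, False)]
pilot = [Run(4, False), Run(6, False), Run(5, True), Run(5, False)]
print(summarize(baseline), summarize(pilot), proves_out(baseline, pilot))
```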
What we'd avoid
- Full autonomy on day one. Start with "draft and approve," earn autonomy later.
- No evaluation. If you cannot measure whether the agent is right, you cannot trust it.
- An agent for everything. A deterministic script or a simple automation is often cheaper and more reliable.
- Unchecked data sharing. Do not send sensitive data to a model before checking your provider's data and privacy terms.
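On the evaluation point: the simplest useful version is a small set of labelled examples that you rerun after every change. The sketch below uses a keyword function as a hypothetical stand-in for the agent, and the four cases are invented; in practice you would pull real tickets that a person has already labelled.

```python
def evaluate(classify, labeled_cases):
    """Run classify over (text, expected) pairs; return accuracy and the failures."""
    failures = []
    for text, expected in labeled_cases:
        got = classify(text)
        if got != expected:
            failures.append((text, expected, got))
    accuracy = 1 - len(failures) / len(labeled_cases)
    return accuracy, failures


def keyword_classifier(text):
    """Hypothetical stand-in for the agent's ticket classification step."""
    lowered = text.lower()
    if "refund" in lowered or "charge" in lowered:
        return "billing"
    if "password" in lowered or "login" in lowered:
        return "account"
    return "other"


CASES = [
    ("I was charged twice", "billing"),
    ("Cannot reset my password", "account"),
    ("Where is my invoice?", "billing"),   # the stand-in gets this one wrong
    ("Do you ship to Norway?", "other"),
]

accuracy, failures = evaluate(keyword_classifier, CASES)
print(f"accuracy={accuracy:.2f}")
for text, expected, got in failures:
    print(f"MISS: {text!r} expected {expected}, got {got}")
```

The list of misses matters more than the headline number: it tells you what to fix next and which cases to keep escalating.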
The bottom line
AI agents are not going to run your business in 2026 — but they will quietly take a meaningful slice of the repetitive work off your team's plate, if you scope them narrowly and keep humans on the important decisions. The companies winning with agents aren't the ones with the flashiest demos; they're the ones who automated one boring workflow well, then did it again.