
Why Agent Deployment Is Nothing Like App Deployment

Manny Maun · Mar 28, 2026 · 7 min read

Teams keep treating AI agents like microservices. It doesn't end well.


I had a conversation last month with a platform lead at a mid-size fintech. They'd shipped their first three AI agents to production using the exact same pipeline they use for their Go microservices. Dockerfile, Helm chart, Ingress, done.

Then agent number four — a research assistant with access to Gmail and Tavily — ran up $4,000 in API calls over a weekend. Nobody caught it because Datadog was showing green across the board. Healthy pods, low latency, zero errors. The agent was working perfectly. It was just doing way more than anyone expected.

That's the thing about agents. They look like containers. They smell like containers. But they behave like something else entirely.

Your Kubernetes playbook won't save you here

We all know the microservice deployment drill. Stateless service, config injected via environment variables, load balancer in front, HPA for scaling. Security means network policies and maybe mTLS. Observability means Prometheus, Grafana, and a Slack alert when p99 spikes.

Agents have a Dockerfile too. They expose HTTP endpoints. They run in pods. So teams naturally reach for the same tools.

But here's what's different: a microservice does what the code says. You can read the handler, trace the database calls, and predict exactly what happens on every request. An agent decides at runtime which tools to call, what data to pull, and how to chain it all together. It's not following a script — it's improvising.

That single difference breaks three assumptions baked into how we deploy software.

Credentials become a nightmare fast

Your average microservice needs a database URL and maybe a Stripe key. Simple. You stick them in a Kubernetes Secret, mount them as env vars, and move on.

Now think about what an agent needs. Gmail OAuth tokens. Slack bot credentials. GitHub PATs. A search API key. Vector database auth. And those are just the tools for one agent.

Here's where it gets messy. Each of those credentials carries a completely different risk profile. A leaked Prometheus read token is an annoyance. A leaked Gmail OAuth token with send permissions is a career-ending incident. But Kubernetes treats them identically — they're all just base64-encoded strings in the same Secret.

Worse, most teams dump all the credentials into one big Secret because it's easier. So your financial research agent can technically see the HR bot's Google Drive token. Nobody intends for that to happen, but the flat credential model makes it the default.

What you actually need is isolation. Each agent should only see the credentials for its own workspace. Decrypt them at deploy time, not before. Log every access. And for the love of all that is holy, don't put API keys in ConfigMaps where half the engineering team can kubectl get them.
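As a sketch, workspace isolation can be as simple as a credential store that refuses cross-workspace reads and audits every access, allowed or denied. Everything here is illustrative, not a real API: `WorkspaceVault` is a made-up name, and base64 stands in for real encryption at rest (a KMS or similar in production).

```python
import base64
import datetime

class WorkspaceVault:
    """Hypothetical workspace-scoped credential store with an audit trail."""

    def __init__(self, workspace: str):
        self.workspace = workspace
        self._secrets: dict[str, bytes] = {}
        self.audit_log: list[dict] = []

    def put(self, name: str, plaintext: str) -> None:
        # base64 is a placeholder: production would encrypt at rest with a KMS.
        self._secrets[name] = base64.b64encode(plaintext.encode())

    def get(self, name: str, agent: str, agent_workspace: str) -> str:
        # Isolation rule: an agent only sees credentials in its own workspace.
        if agent_workspace != self.workspace:
            self._log(agent, name, allowed=False)
            raise PermissionError(f"{agent} is outside workspace {self.workspace}")
        self._log(agent, name, allowed=True)  # decrypt only at access time
        return base64.b64decode(self._secrets[name]).decode()

    def _log(self, agent: str, name: str, allowed: bool) -> None:
        self.audit_log.append({
            "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "agent": agent, "secret": name, "allowed": allowed,
        })

finance = WorkspaceVault("finance")
finance.put("gmail_oauth", "ya29.example-token")

# The finance research agent can read its own credential...
token = finance.get("gmail_oauth", agent="research-agent", agent_workspace="finance")

# ...but the HR bot cannot, and the denial is still written to the audit log.
try:
    finance.get("gmail_oauth", agent="hr-bot", agent_workspace="hr")
except PermissionError:
    pass
```

The point isn't the crypto; it's that denial and access both leave a trail, and the flat "everything in one Secret" default becomes structurally impossible.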

Agents don't respect boundaries (unless you enforce them)

When a microservice gets a request, you can reason about the blast radius. It'll hit the database, maybe call one downstream service, and return. Predictable.

An agent with access to a Google Workspace MCP server? In a single execution, it might read your emails, draft a response, create a Google Doc summarising the thread, and share it with three people. All autonomously. All because the prompt said "help me follow up on this thread."

You can't enforce those boundaries in application code. The whole point of agents is that they figure out what to do at runtime. So the boundaries need to live at the infrastructure layer.

Which MCP servers can this agent talk to? Does the gateway check authentication before routing each tool call? If I need to cut off an agent's access to Slack right now, can I do that without redeploying anything?

Traditional network policies don't help here. You're not blocking traffic between IP ranges — you're making routing decisions based on which organisation the agent belongs to, what workspace it's running in, and whether the user's OAuth token is still valid. That requires something more like an API gateway that understands agent context, not just TCP ports.
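To make that concrete, here's a minimal, hypothetical check such a gateway might run before routing each tool call. The policy table, field names, and server names are all assumptions for illustration, not any real product's schema:

```python
import time

# Illustrative per-(org, agent) policy: which MCP servers may this agent reach?
POLICIES = {
    ("acme", "research-agent"): {"allowed_servers": {"tavily", "gmail", "slack"}},
}

def authorize(org: str, agent: str, server: str, token_expiry: float) -> bool:
    policy = POLICIES.get((org, agent))
    if policy is None or server not in policy["allowed_servers"]:
        return False  # routing is decided by org/agent context, not IP range
    if token_expiry <= time.time():
        return False  # stale OAuth token: deny before the call leaves the gateway
    return True

# Cutting off Slack access right now is a runtime policy change, not a redeploy:
POLICIES[("acme", "research-agent")]["allowed_servers"].discard("slack")
```

Because the allowlist lives in the gateway rather than the agent's code, revocation takes effect on the next tool call without touching the deployment.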

Your monitoring is blind to the things that matter

This is the one that bites hardest. Your existing observability stack is designed to answer: is the service up? Is it fast? Is it throwing errors?

For agents, the answer to all three can be "yes" while the agent is actively causing problems. It's up. It's fast. It's not throwing errors. It's also hallucinating tool parameters and sending garbled emails to your CEO's contacts.

What you need to track for agents is fundamentally different:

- How many tokens did this conversation burn, and what did it cost?
- Which tools did the agent call, in what order, and what did each one return?
- Did the agent try to access something outside its workspace?
- Did it hit a rate limit or get blocked by a policy?

None of that shows up in Prometheus. Your Grafana dashboard has no panel for "cost per agent execution" or "tool calls that returned unexpected results." You need tracing that's built around the concept of conversations, tool invocations, and LLM token economics — sitting alongside your existing infra monitoring, not replacing it.
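A minimal sketch of what that kind of trace records, per conversation rather than per pod. The per-token prices are placeholders, not real vendor rates, and the class names are made up:

```python
from dataclasses import dataclass, field

# Placeholder prices (USD per 1K tokens) -- swap in your provider's actual rates.
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015

@dataclass
class ToolCall:
    name: str
    ok: bool  # did the tool return what the agent expected?

@dataclass
class ConversationTrace:
    """One trace per conversation: token burn, cost, and the tool-call chain."""
    conversation_id: str
    input_tokens: int = 0
    output_tokens: int = 0
    tool_calls: list[ToolCall] = field(default_factory=list)

    def record_llm_step(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def record_tool(self, name: str, ok: bool = True) -> None:
        self.tool_calls.append(ToolCall(name, ok))

    def cost_usd(self) -> float:
        return (self.input_tokens / 1000 * PRICE_PER_1K_INPUT
                + self.output_tokens / 1000 * PRICE_PER_1K_OUTPUT)

trace = ConversationTrace("conv-42")
trace.record_llm_step(1200, 400)          # one LLM round trip
trace.record_tool("gmail.search")          # tool chain, in order
trace.record_tool("tavily.search", ok=False)
```

Aggregate those traces and "cost per agent execution" becomes a dashboard panel instead of a weekend surprise.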

Why it falls apart around agent #5

Here's the pattern I keep seeing. Agent one and two work fine with the cowboy approach. You hardcode the credentials, skip governance, eyeball the logs manually. It's fine because the blast radius is tiny.

By agent five, you've got credentials in six different Secrets with no inventory. You can't answer "which agents have access to customer email?" without grepping through YAML files. Your compliance team asks for an audit trail and you've got nothing. Someone asks to rotate the Slack token and you're not even sure which agents are using it.
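The fix for the inventory problem is boring: one queryable table mapping agents to credentials and scopes, so these questions become lookups instead of greps. A toy sketch with made-up agents and scopes:

```python
# Illustrative inventory: in practice this would live in a database, not a list.
INVENTORY = [
    {"agent": "research-agent", "credential": "gmail_oauth", "scope": "customer-email"},
    {"agent": "hr-bot",         "credential": "gdrive_token", "scope": "hr-docs"},
    {"agent": "support-agent",  "credential": "gmail_oauth", "scope": "customer-email"},
]

def agents_with_scope(scope: str) -> list[str]:
    """Answers the compliance question: which agents can touch this data?"""
    return sorted({row["agent"] for row in INVENTORY if row["scope"] == scope})

def agents_using(credential: str) -> list[str]:
    """Answers the rotation question: who breaks if we rotate this token?"""
    return sorted({row["agent"] for row in INVENTORY if row["credential"] == credential})
```

With this in place, "which agents have access to customer email?" and "who is using the Slack token?" are one query each, and the audit trail has somewhere to hang off.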

This is where teams either throw 2-3 engineers at building an internal platform — which invariably takes 6 months longer than anyone estimates — or they accept that agent deployment is a different problem that needs purpose-built tooling.

What actually works

I've built this infrastructure and watched teams struggle with it, and the pattern that works has three layers:

Sort out credentials properly. Encrypt at rest, decrypt only at deploy time, scope everything to the agent's workspace. Every credential access goes in an audit log. Automate rotation and expiry so you're not manually managing secrets across 15 agents.

Put a gateway in front of tool access. Every MCP tool call goes through an agent gateway that validates the JWT, checks RBAC policies, and handles OAuth token passthrough. You need to be able to revoke an agent's access to a specific tool server without touching the deployment. Route based on organisational context, not IP addresses.

Build observability around what agents actually do. Track token usage, cost per execution, tool invocation chains. Set up policy enforcement that can throttle or block in real time. Make sure your audit logs can answer compliance questions without a week of forensic investigation.
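For the real-time enforcement piece, even a crude per-agent spend cap would have caught the $4,000 weekend from the opening story. An illustrative sketch (the name and budget are made up):

```python
class SpendGuard:
    """Hypothetical circuit breaker: block an agent once its spend hits a cap."""

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent = 0.0

    def charge(self, cost_usd: float) -> bool:
        # Check the budget BEFORE the execution runs, not on Monday morning.
        if self.spent + cost_usd > self.budget_usd:
            return False  # block and alert instead of paying
        self.spent += cost_usd
        return True

guard = SpendGuard(budget_usd=50.0)
ok_first = guard.charge(49.0)   # within budget: execution proceeds
ok_second = guard.charge(2.0)   # would cross $50: blocked in real time
```

A real implementation would persist counters and reset per billing period, but the shape is the same: cost is a first-class policy input, not an after-the-fact report.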

None of this is optional if you're running agents in production. It just looks optional until something goes wrong.

Look, it's not that complicated

I don't want to make this sound harder than it is. Container orchestration took years to mature too — remember when we were all hand-writing systemd units and hoping for the best?

Agent deployment is at that same inflection point. The teams that recognise early that their Kubernetes playbook needs a new chapter will ship faster and sleep better. The ones that don't will figure it out eventually, usually after an incident makes the conversation unavoidable.

The good news is the problem is solvable. You just have to stop pretending agents are microservices.


Manny Maun is the Founder and CEO of BiznezStack — the agentic operations runtime for deploying, securing, and governing AI agents at scale.
