AI Agents · Frameworks · Production · Platform Engineering

Why an AI Agent Framework Is Not Enough for Production

Manny Maun · Mar 29, 2026 · 7 min read

Teams spend weeks choosing between LangChain and CrewAI. Then they spend months stuck trying to get their agent into production. The framework isn't the problem.


I see this pattern constantly. A team evaluates LangChain (131k GitHub stars), CrewAI (45.9k stars, fastest-growing), maybe AutoGen or Dify. They pick one. They build a working agent in a few days — sometimes a few hours. The demo goes well. Leadership is excited.

Then someone asks: "Great, how do we deploy this?"

And that's where it stalls. Because the framework gave them the agent logic, but none of the infrastructure they need to run it in production. No deployment pipeline. No credential management. No observability. No access control. No cost tracking.

Six months later, the agent is still running on someone's laptop.

What frameworks actually do

Let's be clear about what you get when you pick a framework:

LangChain/LangGraph gives you composable chains, stateful graph-based workflows with branching and checkpointing, and a massive ecosystem of integrations. LangSmith adds observability and evaluation.

CrewAI gives you role-based multi-agent coordination — think of it as modelling a team of people where each agent has a specific role and set of tools. You can get a working multi-agent system in about 20 lines of code.

AutoGen (now merging into Microsoft's Agent Framework) gives you conversation-based multi-agent patterns with GroupChat coordination.

These are all genuinely useful. The agent architecture, task orchestration, memory systems, tool integration, prompt management — frameworks handle this well.

But here's what none of them handle: what happens after you type git push.

The gap nobody writes comparison articles about

Go search "LangChain vs CrewAI" and you'll find dozens of articles. They compare programming models (graphs vs roles vs conversations), developer experience, GitHub stars, and benchmark performance. They're useful for choosing a framework.

They completely miss the production story.

A March 2026 survey of 650 enterprise technology leaders found that 78% have AI agent pilots running, but only 14% have reached production scale. Gartner predicts over 40% of agentic AI projects will be cancelled by end of 2027 — not because the models are bad, but because of "escalating costs, unclear business value, or inadequate risk controls."

The five gaps accounting for 89% of scaling failures? Integration complexity, inconsistent output quality at volume, absence of monitoring tooling, unclear organisational ownership, and insufficient domain training data. Notice that "chose the wrong framework" isn't on the list.

Here's what falls through the cracks between your framework and production:

Deployment infrastructure

Your CrewAI agent runs locally with python main.py. How do you containerise it? Where does it run — Kubernetes, Cloud Run, Lambda? How do you handle health checks for an agent that might be idle for hours between tasks? How do you scale to zero without losing state?

Frameworks are pip-installable packages. They're not deployment systems. You still need Dockerfiles, Kubernetes manifests, CI/CD pipelines, and runtime adapters for different cloud environments.
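To make the health-check question concrete, here's a minimal stdlib-only sketch of the distinction that matters for long-idle agents: "idle" is healthy as long as the worker loop keeps heartbeating, and only silence means dead. The heartbeat state, endpoint, and six-hour threshold are all invented for illustration:

```python
import json
import time
from http.server import BaseHTTPRequestHandler

# Hypothetical agent state: the last time the worker loop reported in.
STATE = {"last_heartbeat": time.time()}

def health(now=None, max_idle_s=6 * 3600):
    """Liveness for an agent that may sit idle for hours between tasks.

    An idle agent is still healthy while its worker loop heartbeats;
    we only report unhealthy once heartbeats stop arriving."""
    now = now if now is not None else time.time()
    idle = now - STATE["last_heartbeat"]
    ok = idle < max_idle_s
    return ok, {"status": "ok" if ok else "stale", "idle_seconds": round(idle)}

class HealthHandler(BaseHTTPRequestHandler):
    # A /healthz-style endpoint a Kubernetes probe could hit.
    def do_GET(self):
        ok, body = health()
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(body).encode())
```

Even this small piece — deciding what "healthy" means for a mostly-idle process — is something you write yourself; the framework ships neither the endpoint nor the Dockerfile and manifests around it.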

Credential management

Your agent needs API keys for OpenAI, Gmail OAuth tokens, Slack credentials, a search API key, and vector database auth. The framework gives you a place to pass these in — usually an environment variable or a config file.

But who manages those credentials in production? Who encrypts them at rest? Who scopes them so the finance agent can't see the HR bot's Google Drive token? Who rotates them? Who logs which agent accessed which credential and when?

74% of organisations say their agents receive more access than necessary. Only 17.8% use mTLS for agent-to-agent authentication. The framework doesn't solve this because it's not a framework problem — it's an infrastructure problem.
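None of this is exotic — a credential layer is mostly careful bookkeeping. Here's an illustrative stdlib-only sketch of workspace scoping plus an access audit trail (class and field names are invented; encryption at rest and rotation, which a real store would need, are out of scope):

```python
import time

class CredentialStore:
    """Sketch of a workspace-scoped secret store with an access audit trail.

    Illustrative only: a production store would also encrypt secrets at
    rest (e.g. with a KMS-managed envelope key) and rotate them."""

    def __init__(self):
        self._secrets = {}   # workspace -> {credential name -> secret}
        self._agent_ws = {}  # agent_id -> the one workspace it belongs to
        self.audit_log = []  # (timestamp, agent_id, workspace, name, outcome)

    def register_agent(self, agent_id, workspace):
        self._agent_ws[agent_id] = workspace

    def put(self, workspace, name, secret):
        self._secrets.setdefault(workspace, {})[name] = secret

    def get(self, agent_id, name):
        # Scoping: an agent can only read credentials in its own workspace,
        # so the finance agent never sees the HR bot's Drive token.
        workspace = self._agent_ws.get(agent_id)
        secret = self._secrets.get(workspace, {}).get(name)
        outcome = "OK" if secret is not None else "DENIED"
        self.audit_log.append((time.time(), agent_id, workspace, name, outcome))
        if secret is None:
            raise PermissionError(f"{agent_id} denied access to {name}")
        return secret
```

Note that denied reads are logged too — "who tried to access what, and when" is exactly the question an auditor will ask.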

MCP server lifecycle

MCP has become the standard integration layer — the TypeScript SDK has 34,700+ dependent projects, and every major cloud vendor now supports it. Your framework lets you connect to an MCP server with a few lines of config.

But who deploys the MCP servers? Who manages authentication between your agent and the gateway? Who enforces which agents can access which MCP tools? Who audits tool invocations?

When Knostic scanned 2,000 internet-exposed MCP servers, every single one lacked authentication. The framework doesn't care — it just connects. The security posture is entirely up to you.
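The enforcement half of a gateway is not much code in principle. This hypothetical sketch shows per-agent tool allow-lists plus an invocation audit trail (names invented; the authentication step a real gateway performs first — mTLS, OAuth passthrough — is omitted):

```python
class McpGateway:
    """Sketch of a gateway enforcing which agents may call which tools.

    Illustrative only: a real gateway authenticates the caller before
    the RBAC check and forwards to an actual MCP server."""

    def __init__(self, policy):
        self.policy = policy    # agent_id -> set of allowed tool names
        self.invocations = []   # audit trail of every attempted call

    def call_tool(self, agent_id, tool, args, handler):
        allowed = self.policy.get(agent_id, set())
        if tool not in allowed:
            self.invocations.append((agent_id, tool, "DENIED"))
            raise PermissionError(f"{agent_id} may not invoke {tool}")
        self.invocations.append((agent_id, tool, "OK"))
        # Stand-in for forwarding the call to the downstream MCP server.
        return handler(**args)
```

The framework's "few lines of config" connect straight to the server; the policy table and the invocation log above are the parts you have to put in between.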

Governance and compliance

The EU AI Act's August 2026 deadline requires documented risk management, automatic logging, human oversight, and incident reporting for high-risk AI systems. Fines go up to 35 million euros.

LangChain doesn't generate compliance documentation. CrewAI doesn't produce audit trails. No framework provides built-in support for HIPAA, SOC 2, or GDPR.

Only 24.4% of organisations have full visibility into which AI agents communicate with each other. When the auditor asks "can you prove this agent's access to customer data was revoked on March 15th?" the answer needs to come from your infrastructure, not your framework.
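What the infrastructure side can provide is an append-only, tamper-evident record. Here's a minimal hash-chained audit trail sketch, assuming nothing beyond the Python standard library (the event fields are invented examples):

```python
import hashlib
import json

class AuditTrail:
    """Append-only log where each entry's hash covers the previous one,
    so any after-the-fact edit breaks the chain and is detectable."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []
        self._prev = self.GENESIS

    def record(self, event):
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((self._prev + payload).encode()).hexdigest()
        self.entries.append({"event": event, "hash": digest})
        self._prev = digest

    def verify(self):
        # Recompute the chain from the start; any tampered entry fails.
        prev = self.GENESIS
        for entry in self.entries:
            payload = json.dumps(entry["event"], sort_keys=True)
            if hashlib.sha256((prev + payload).encode()).hexdigest() != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

With something like this in place, "prove this agent's access was revoked on March 15th" becomes a log query plus a chain verification rather than an archaeology project.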

Cost management at scale

Your agent works great in development where you're making a few API calls per test run. In production, it's handling hundreds of requests a day across multiple LLM providers and tool servers.

Without per-workspace token budgets, rate limiting, and cost tracking, you're one prompt loop away from a $4,000 surprise. The framework tracks tokens within a single execution — it doesn't give you aggregate cost visibility across all agents, all workspaces, over time.
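A per-workspace budget is the simplest guardrail against that surprise. An illustrative sketch — the workspace names, caps, and flat per-1k-token price are all made up, and a real system would reset usage daily and meter per provider:

```python
class TokenBudget:
    """Sketch of per-workspace token caps with aggregate cost reporting."""

    def __init__(self, limits):
        self.limits = limits  # workspace -> token cap for the period
        self.used = {}        # workspace -> tokens consumed so far

    def charge(self, workspace, tokens):
        # Reject the call before it runs, so a prompt loop hits a wall
        # instead of running up the bill.
        spent = self.used.get(workspace, 0)
        if spent + tokens > self.limits.get(workspace, 0):
            raise RuntimeError(f"{workspace} is over its token budget")
        self.used[workspace] = spent + tokens

    def spend_report(self, price_per_1k_tokens):
        # Aggregate cost visibility across workspaces over the period.
        return {ws: t / 1000 * price_per_1k_tokens for ws, t in self.used.items()}
```

The framework can tell you what one execution cost; the table above is the cross-agent, cross-workspace view it can't give you.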

What you actually need alongside your framework

The framework is the brain. You still need the body.

A deployment layer that can take your agent container and run it on Kubernetes, Cloud Run, or any container runtime — with proper health checks, auto-scaling, and namespace isolation.

A credential layer that encrypts secrets at rest, decrypts just-in-time at deploy time, scopes access per workspace, and logs every credential access to an audit trail.

A gateway layer that sits between your agents and MCP servers (or any external tools), validates authentication, enforces RBAC, and handles OAuth token passthrough.

An observability layer that tracks token usage, cost per execution, tool invocation chains, and policy violations — not just whether the pod is healthy.

A governance layer that produces the audit trails, access logs, and compliance documentation your security team and regulators need.

Some teams build this internally. The Digital Applied survey found it typically consumes 2-3 engineers for 6+ months — and organisations without a dedicated AI operations function were 6x more likely to experience incidents requiring rollbacks.

The analogy that clicks

Think about web frameworks circa 2015. Rails, Django, and Express gave you the application logic. But you still needed Heroku or AWS for deployment, Stripe for payments, Auth0 for authentication, Datadog for monitoring, and Terraform for infrastructure.

Nobody said "Rails is all you need for production." They said "Rails plus a platform."

Agent frameworks are in the same position. LangChain is your Rails. But you still need the platform layer to actually run it.

How to think about the choice

Don't agonise over LangChain vs CrewAI vs AutoGen. Pick the one that fits your team's programming model preference and move on. The framework choice matters less than you think — the successful 12% who scaled to production invested in "evaluation infrastructure, monitoring tooling, and operational staffing," not better prompt engineering.

Then invest the real effort in what sits around the framework: deployment, credentials, gateway, observability, governance. That's what separates the 14% who reach production from the 86% who don't.


Manny Maun is the Founder and CEO of BiznezStack — the agentic operations runtime for deploying, securing, and governing AI agents at scale.
