Ashpreet Bedi · @ashpreetbedi
The 7 Sins of Agentic Software
"Demos are easy. Production is hard" is the most recycled line in AI.
After three years building agent infrastructure, here's the truth:
Production is not the problem.
Distributed systems have always been hard.
The problem is pretending demos represent production.
Demos hide the infrastructure.
They hide state.
They hide cost.
They hide isolation.
They hide the failure modes.
Here are 7 failure modes that show up when demo meets production.
1. Treating a system like a script
The first sin is underestimating the scope of what you're building.
Most agents start as a Python script. You run it locally, it calls a model, runs tools, and returns a response. It works. You demo it. Everyone's impressed.
Then someone asks: "Can we ship this?"
So you wrap it in FastAPI and write endpoints for:
- Chat
- Session management
- Cancellation and resume
- File uploads
- Auth
One endpoint becomes five. Five become fifteen.
Then traffic spikes. Rate limits. 429s everywhere.
Now you are building queues. Backpressure. Retries. Caching. Cost controls.
Your 200-line demo just became 2,000 lines of infrastructure.
Most agent builders think they are writing isolated programs. In reality, they are building stateful distributed systems.
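The retry piece of that infrastructure can be sketched in a few lines. This is a minimal illustration, not a production client: RateLimitError is a hypothetical stand-in for whatever your model provider raises on a 429, and the delay values are assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what happens next
            # The delay ceiling doubles each attempt and is capped at max_delay.
            # A random delay below the ceiling keeps clients from retrying in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


# Demo: a fake model call that is rate limited twice, then succeeds.
attempts = {"n": 0}


def flaky_model_call() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


print(call_with_backoff(flaky_model_call, sleep=lambda _: None))
```

Retries alone do not give you backpressure. They only stop one caller from hammering the provider. Queues and cost controls are separate pieces.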
2. Forcing agents into traditional request-response
The second sin is assuming traditional web semantics apply to agents.
Traditional software gets a request, does some work, returns a response. Agents break that contract.
Agents think, stream tool calls, spawn sub-agents, retrieve memory and change direction mid-execution.
Streaming handles latency and is mostly solved. Start with SSE. Move to WebSockets when you need bidirectional control.
But some tasks require background execution and polling.
"Analyze this dataset and email me a report"
This is a long-running background task. Now you need:
- Background execution
- Job queues
- Progress tracking
- Completion guarantees
Add human approval and minutes become days.
Agents are not request handlers. They are long-running computations that may span sessions, humans, and systems.
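A rough sketch of the submit-then-poll pattern, using only the standard library. JobRunner and Job are hypothetical names, and the in-memory registry is an assumption made to keep the example short. A real deployment would back it with a durable queue so jobs survive a restart.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    id: str
    status: str = "queued"  # queued -> running -> done | failed
    progress: float = 0.0   # fraction complete, 0.0 to 1.0
    result: Optional[str] = None
    error: Optional[str] = None


class JobRunner:
    """In-memory job registry. Illustrative only: nothing here survives a crash."""

    def __init__(self, workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: dict[str, Job] = {}

    def submit(self, task: Callable[[Callable[[float], None]], str]) -> str:
        """Start task in the background and return a job id immediately."""
        job = Job(id=uuid.uuid4().hex)
        self._jobs[job.id] = job
        self._pool.submit(self._run, job, task)
        return job.id

    def _run(self, job: Job, task: Callable[[Callable[[float], None]], str]) -> None:
        job.status = "running"

        def report(fraction: float) -> None:
            job.progress = max(0.0, min(1.0, fraction))

        try:
            job.result = task(report)
            job.progress = 1.0
            job.status = "done"  # set last, so "done" implies result is present
        except Exception as exc:
            job.error = repr(exc)
            job.status = "failed"

    def poll(self, job_id: str) -> Job:
        """Return the current snapshot of a job for a polling client."""
        return self._jobs[job_id]

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


def analyze_dataset(report: Callable[[float], None]) -> str:
    for step in range(1, 4):
        report(step / 3)  # the task reports its own progress
    return "report emailed"


runner = JobRunner()
job_id = runner.submit(analyze_dataset)  # returns at once; the client polls later
runner.shutdown()                        # demo only: wait for the job to finish
print(runner.poll(job_id).status, runner.poll(job_id).result)
```

The request returns a job id, not an answer. That is the contract change.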
3. Ignoring persistence
The third sin is ignoring durability.
Demo agents run fresh every time. Production agents do not. They live across sessions. They accumulate context. They mutate state. They remember.
That means you must persist:
- Inputs and outputs
- Intermediate artifacts
- State transitions
- Execution traces
If your agent crashes on step 12 of 15, you must know exactly where it was.
You need checkpoints, replays, and resume semantics.
Restarting is not acceptable. Restarting might duplicate a side effect. Restarting might lose critical context.
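Checkpoint and resume can be sketched like this. It is a simplified illustration under stated assumptions: state is a JSON-serializable dict, steps are plain functions, and the checkpoint lives in a local file. run_with_checkpoints is a hypothetical helper, not an API from any framework.

```python
import json
import os
import tempfile
from typing import Callable


def run_with_checkpoints(steps: list[Callable[[dict], dict]], path: str) -> dict:
    """Run steps in order, saving state after each one so a later call can resume."""
    if os.path.exists(path):
        with open(path) as f:
            saved = json.load(f)
    else:
        saved = {"next_step": 0, "state": {}}

    state = saved["state"]
    # Skip every step that already completed in a previous run.
    for index in range(saved["next_step"], len(steps)):
        state = steps[index](state)
        # Write to a temp file, then rename, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_step": index + 1, "state": state}, f)
        os.replace(tmp_path, path)
    return state


with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint = os.path.join(tmp_dir, "run.json")
    steps = [
        lambda s: {**s, "fetched": True},
        lambda s: {**s, "summarized": True},
    ]
    print(run_with_checkpoints(steps, checkpoint))
```

One gap remains. A crash after a step's side effect but before its checkpoint write will repeat that step on resume. Steps that touch the outside world still need idempotency keys.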
But persistence is not only about recovery. Durable state lets you:
- Compress context instead of replaying the full history
- Debug failures
- Extract successful runs into reusable few-shot patterns
- Analyze latency, token usage, and tool behavior to optimize cost
Without persistence, every run starts from zero. With persistence, every run can become cheaper, safer, and smarter.
An agent without durability is a demo. An agent with durability is a system.
You are no longer serving responses. You are maintaining long-lived computation.
4. Ignoring multi-tenancy
The fourth sin is ignoring multi-tenancy.
Demo agents serve one user. Production agents serve thousands. User A's context cannot leak into User B's experience.
Passing a user_id is easy. Isolating every resource the agent touches is not:
- Sessions
- Memory
- Knowledge
- Vector search
- Tool outputs
- Cached artifacts
Your database was not designed for multi-tenant agent workloads. Your vector store was not either. Your model provider definitely was not.
So you build isolation yourself: namespaces, resource scoping, RBAC, policy enforcement.
One missing filter. One incorrect join. Now you are writing an incident report.
Isolation is optional in a demo. It is critical in production.
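One way to make the missing filter impossible is to never hand agent code an unscoped handle. The sketch below assumes a simple in-memory key-value store. TenantStore and ScopedStore are hypothetical names, and a real system would enforce the same scoping in the database and the vector store.

```python
from typing import Optional


class TenantStore:
    """Shared store. Every key is namespaced by tenant id."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str, str], object] = {}

    def scoped(self, tenant_id: str) -> "ScopedStore":
        if not tenant_id:
            raise ValueError("tenant_id is required")
        return ScopedStore(self, tenant_id)


class ScopedStore:
    """The only handle agent code receives. It has no way to name another tenant."""

    def __init__(self, store: TenantStore, tenant_id: str) -> None:
        self._store = store
        self._tenant_id = tenant_id

    def put(self, kind: str, key: str, value: object) -> None:
        self._store._data[(self._tenant_id, kind, key)] = value

    def get(self, kind: str, key: str, default: Optional[object] = None) -> object:
        return self._store._data.get((self._tenant_id, kind, key), default)

    def keys(self, kind: str) -> list[str]:
        # The tenant filter lives here once, not in every caller.
        return sorted(
            key
            for (tenant, item_kind, key) in self._store._data
            if tenant == self._tenant_id and item_kind == kind
        )


store = TenantStore()
user_a = store.scoped("user-a")
user_b = store.scoped("user-b")
user_a.put("memory", "prefs", "dark mode")
print(user_a.get("memory", "prefs"))  # user A sees their own memory
print(user_b.get("memory", "prefs"))  # user B gets the default, None
```

The filter is written once, inside the handle. Callers cannot forget it because they never see the unscoped store.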
5. Confusing reasoning with execution
The fifth sin is treating reasoning like execution.
Not every tool call should auto-execute.
"Look up a record" is fine.
"Delete a record" is not the same decision.
"Issue a $50 refund" might be acceptable.
"Issue a $5,000 refund" is not.
If your agent can act, it can cause damage.
Your runtime must express policy:
- Which actions are free?
- Which require user confirmation?
- Which require admin approval?
When an action is blocked, the agent cannot crash. It must pause, persist state, wait for approval, and resume exactly where it left off.
Not restart. Resume. Because restarting might issue the refund twice.
Governance is not a feature. It is part of the execution model.
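A minimal sketch of policy as part of the execution model. The tool names, the $50 threshold, and the Runtime class are all illustrative assumptions. Pending actions are held in memory here, where a real runtime would persist them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    USER_CONFIRM = "user_confirm"
    ADMIN_APPROVE = "admin_approve"


@dataclass
class ToolCall:
    name: str
    args: dict


def decide(call: ToolCall) -> Decision:
    """Hypothetical policy. Names and thresholds are examples, not recommendations."""
    if call.name == "lookup_record":
        return Decision.ALLOW
    if call.name == "delete_record":
        return Decision.USER_CONFIRM
    if call.name == "issue_refund":
        small = call.args["amount"] <= 50
        return Decision.ALLOW if small else Decision.ADMIN_APPROVE
    return Decision.ADMIN_APPROVE  # unknown actions escalate by default


@dataclass
class PendingAction:
    call: ToolCall
    needs: Decision
    executed: bool = False


class Runtime:
    def __init__(self, tools: dict[str, Callable]) -> None:
        self._tools = tools
        self.pending: dict[str, PendingAction] = {}

    def request(self, call_id: str, call: ToolCall):
        """Execute allowed calls. Park everything else instead of crashing."""
        needs = decide(call)
        if needs is Decision.ALLOW:
            return self._tools[call.name](**call.args)
        self.pending.setdefault(call_id, PendingAction(call, needs))
        return None

    def approve(self, call_id: str):
        """Resume a parked call. Approving twice never executes twice."""
        action = self.pending[call_id]
        if action.executed:
            return None
        action.executed = True  # mark first: at-most-once execution
        return self._tools[action.call.name](**action.call.args)


runtime = Runtime({"issue_refund": lambda amount: f"refunded ${amount}"})
print(runtime.request("call-1", ToolCall("issue_refund", {"amount": 5000})))  # parked
print(runtime.approve("call-1"))  # runs once
print(runtime.approve("call-1"))  # second approval is a no-op
```

The call id is what makes resume safe. It is the difference between continuing a refund and issuing a second one.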
6. Assuming horizontal scaling is trivial
The sixth sin is assuming horizontal scaling is trivial.
Agents are stateful. Cloud infrastructure assumes statelessness. Those assumptions conflict.
The obvious solution: externalize all state, make the app layer stateless, let any instance resume any session. In theory, scaling is solved.
In practice:
- One cached artifact in memory.
- One missing write.
- One assumption about local state.
It works perfectly on one instance. You deploy a second.
Now sessions drift. State disappears. Runs resume incorrectly. Behavior becomes unpredictable and only your users can tell.
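A sketch of what externalized state looks like when it is done strictly. SessionStore stands in for an external database, Instance for one app server, and the version check is one possible way (optimistic concurrency) to catch two instances writing the same session. All names are hypothetical and the store is not thread-safe.

```python
class VersionConflict(Exception):
    """Raised when another writer updated the session first."""


class SessionStore:
    """Stand-in for an external store shared by every app instance."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[int, dict]] = {}

    def load(self, session_id: str) -> tuple[int, dict]:
        version, state = self._rows.get(session_id, (0, {}))
        return version, dict(state)  # copy, so callers cannot mutate the stored row

    def save(self, session_id: str, expected_version: int, state: dict) -> int:
        current, _ = self._rows.get(session_id, (0, {}))
        if current != expected_version:
            raise VersionConflict(f"expected v{expected_version}, found v{current}")
        self._rows[session_id] = (current + 1, dict(state))
        return current + 1


class Instance:
    """A stateless app instance. It keeps no session data between calls."""

    def __init__(self, store: SessionStore) -> None:
        self._store = store

    def handle(self, session_id: str, message: str) -> int:
        version, state = self._store.load(session_id)     # read everything
        history = state.get("history", []) + [message]    # work on a local copy
        self._store.save(session_id, version, {"history": history})  # write everything
        return len(history)


store = SessionStore()
instance_a = Instance(store)
instance_b = Instance(store)
print(instance_a.handle("session-1", "hello"))        # handled by instance A
print(instance_b.handle("session-1", "still there?")) # instance B resumes the session
```

If any instance can pick up any session, nothing that matters lives on the instance. The version check turns silent drift into a loud error.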
7. Confusing observability with trust
The seventh sin is confusing visibility with safety.
Agents are non-deterministic. The same input can produce different outputs depending on context, memory, and model behavior.
The industry response has been observability: trace everything, log everything, run evaluations. Yes, you need all of that.
But observability is retrospective. It explains what happened. Trust constrains what is allowed to happen.
That means:
- Input validation before reasoning
- Guardrails on every step
- Output checks before responses
- Confidence thresholds that halt execution
- Post-response evaluations that catch drift early
The agent should stop itself on a bad call. Not log it for someone to discover later.
Observability tells you the past. Trust guarantees correct execution.
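A small sketch of checks that run inside the execution path instead of after it. Everything here is an assumption for illustration: the Draft type, the confidence score and where it comes from, the thresholds, and the crude secret check.

```python
from dataclasses import dataclass
from typing import Callable


class Halt(Exception):
    """Raised when a guardrail refuses to let the run continue."""


@dataclass
class Draft:
    text: str
    confidence: float  # hypothetical score in [0, 1]; producing it is out of scope


def guarded_run(
    user_input: str,
    agent: Callable[[str], Draft],
    min_confidence: float = 0.7,
    max_input_chars: int = 2000,
) -> str:
    # 1. Input validation before any reasoning happens.
    if not user_input.strip():
        raise Halt("empty input")
    if len(user_input) > max_input_chars:
        raise Halt("input too long")

    draft = agent(user_input)

    # 2. Confidence threshold that halts execution.
    if draft.confidence < min_confidence:
        raise Halt(f"confidence {draft.confidence:.2f} below {min_confidence:.2f}")

    # 3. Output check before the response leaves the system.
    if "sk-" in draft.text:  # crude, illustrative check for a leaked API key
        raise Halt("possible secret in output")

    return draft.text


def fake_agent(prompt: str) -> Draft:
    return Draft(text=f"Echo: {prompt}", confidence=0.9)


print(guarded_run("What is my balance?", fake_agent))
```

Each check raises before the response is returned. Logging the same conditions after the fact would tell you about the failure. This prevents it.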
These are runtime problems
None of these sins are solved by better prompts.
Calling a model is easy.
Executing a tool is easy.
Returning a response is easy.
Serving agents.
Managing sessions.
Scoping users.
Enforcing governance.
Externalizing state.
Embedding trust into execution.
Those are runtime problems. Infrastructure problems. Language problems.
All seven sins come from one mistake: treating agents like a feature instead of a new type of software.
The winners in agentic software will not be the best prompt engineers.
They will be the systems engineers.
Building Agents with Pydantic AI
Pydantic AI is a Python framework that brings a FastAPI-like developer experience to building production-grade AI agents. It uses Pydantic for type-safe, validated, structured outputs and is model-agnostic: it works with OpenAI, Google Gemini, Anthropic, Groq, and others.
Why Pydantic AI
- Type safety: agent inputs, outputs, and dependencies are all typed and validated with Pydantic models.
- Dependency injection: runtime context (database connections, API keys, user info) is passed cleanly via RunContext, with no globals and no hacks.
- Structured output: define a Pydantic BaseModel as the output type and the agent returns validated data, not raw text.
- Tool registration: register Python functions as tools the LLM can call. Pydantic validates tool arguments automatically and passes errors back to the LLM so it can retry.
- Model-agnostic: swap the model string and your agent works with a different provider — no code changes.
Core concepts
1. Agent — the central object
An Agent wraps a model, a system prompt, dependencies, tools, and an output type into one reusable unit.
from pydantic_ai import Agent

agent = Agent(
    'openai:gpt-4o',  # model string, swap freely
    system_prompt='You are a helpful assistant.',
)

result = agent.run_sync('What is 2 + 2?')
print(result.output)
#> 2 + 2 equals 4.
2. Structured output — validated responses
Define a Pydantic model and the agent will always return a validated instance, not free-form text.
from pydantic import BaseModel, Field
from pydantic_ai import Agent


class CityInfo(BaseModel):
    name: str
    country: str
    population: int = Field(description='Estimated population')


agent = Agent('openai:gpt-4o', output_type=CityInfo)

result = agent.run_sync('Tell me about Tokyo')
print(result.output)
#> name='Tokyo' country='Japan' population=13960000
3. Dependencies — runtime context via injection
Dependencies let you pass runtime objects (DB connections, API clients, user IDs) into system prompts
and tools without globals or closures. The RunContext carries them.
from dataclasses import dataclass

from pydantic_ai import Agent, RunContext


@dataclass
class MyDeps:
    user_name: str
    api_key: str


agent = Agent('openai:gpt-4o', deps_type=MyDeps)


@agent.system_prompt
def personalise(ctx: RunContext[MyDeps]) -> str:
    return f"The user's name is {ctx.deps.user_name}."


result = agent.run_sync('Say hello', deps=MyDeps('Emad', 'sk-...'))
print(result.output)
#> Hello Emad!
4. Tools — functions the LLM can call
Register Python functions as tools with @agent.tool (has access to RunContext)
or @agent.tool_plain (no context needed). Pydantic validates every argument
the LLM passes, and sends validation errors back so the LLM can self-correct.
import random

from pydantic_ai import Agent, RunContext

agent = Agent(
    'openai:gpt-4o',
    deps_type=str,
    system_prompt=(
        "You're a dice game. Roll the die and see if it matches "
        "the user's guess. Use the player's name in the response."
    ),
)


@agent.tool_plain  # no context needed
def roll_dice() -> str:
    """Roll a six-sided die and return the result."""
    return str(random.randint(1, 6))


@agent.tool  # has access to RunContext
def get_player_name(ctx: RunContext[str]) -> str:
    """Get the player's name."""
    return ctx.deps


result = agent.run_sync('My guess is 4', deps='Anne')
print(result.output)
#> Tough luck, Anne, you rolled a 2. Better luck next time.
5. Full example — bank support agent
This ties everything together: dependencies (customer ID + DB connection), structured output (advice + risk level + card block flag), dynamic system prompt, and a tool the LLM calls to look up the balance.
import asyncio
from dataclasses import dataclass

from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext

# Example module that supplies DatabaseConn; it is not part of pydantic_ai.
from bank_database import DatabaseConn


@dataclass
class SupportDependencies:
    customer_id: int
    db: DatabaseConn


class SupportOutput(BaseModel):
    support_advice: str = Field(description='Advice returned to the customer')
    block_card: bool = Field(description="Whether to block the customer's card")
    risk: int = Field(description='Risk level of query', ge=0, le=10)


support_agent = Agent(
    'openai:gpt-4o',
    deps_type=SupportDependencies,
    output_type=SupportOutput,
    system_prompt=(
        'You are a support agent in our bank, give the '
        'customer support and judge the risk level of their query.'
    ),
)


# Dynamic system prompt: runs per request and reads from the injected dependencies.
@support_agent.system_prompt
async def add_customer_name(ctx: RunContext[SupportDependencies]) -> str:
    name = await ctx.deps.db.customer_name(id=ctx.deps.customer_id)
    return f"The customer's name is {name!r}"


# Tool the LLM can call; the customer ID comes from deps, not from the model.
@support_agent.tool
async def customer_balance(
    ctx: RunContext[SupportDependencies], include_pending: bool
) -> float:
    """Returns the customer's current account balance."""
    return await ctx.deps.db.customer_balance(
        id=ctx.deps.customer_id,
        include_pending=include_pending,
    )


async def main():
    deps = SupportDependencies(customer_id=123, db=DatabaseConn())

    result = await support_agent.run('What is my balance?', deps=deps)
    print(result.output)
    #> support_advice='Hello John, your balance is $123.45.' block_card=False risk=1

    result = await support_agent.run('I just lost my card!', deps=deps)
    print(result.output)
    #> support_advice="I'm sorry John. Blocking your card now." block_card=True risk=8


if __name__ == '__main__':
    asyncio.run(main())
Key takeaway: Pydantic AI treats agents as typed, testable, dependency-injected units — not loose prompt strings. The framework handles validation, tool dispatch, retries on bad tool arguments, and structured output enforcement so you can focus on the business logic.