enterprise-agentic-knowledge-system@sparrow-intelligence:~/case-study
Enterprise AI · Ongoing · 2024

Enterprise Agentic Knowledge System

@ Sparrow Intelligence — Founder & Principal Engineer

AI agents that work reliably in production — not just in demos

99% Task Completion
0 Data Corruption
< 5min Knowledge Sync

$ cat PROBLEM.md

AI Agents That Break in Production

Clients came to us after failed attempts at agentic AI. Their agents hallucinated tool calls, entered infinite loops, and occasionally corrupted data. Demo-quality wasn't production-quality.

Key Challenges:

  • 🔴 Agents would hallucinate actions that didn't correspond to available tools
  • 🔴 Multi-step workflows would fail mid-execution with no recovery
  • 🔴 No visibility into why an agent made a particular decision
  • 🔴 Knowledge bases became stale within hours as source documents changed

$ cat SOLUTION.md

Structured, Observable, Recoverable Agents

We built an agent orchestration framework emphasizing reliability over capability. Every action is validated, every decision is logged, and humans stay in the loop for high-stakes operations.

Technical Approach:

1. Structured Output Validation

Every agent action is defined by a Pydantic schema. The LLM must produce valid JSON matching the schema, or the action is rejected and retried.

2. State Machine Architecture

LangGraph workflows define explicit states and transitions. No 'free-form' agent loops — every path is designed and testable.

3. Human-in-the-Loop Gates

High-stakes actions (data modification, external API calls) require human approval. Agents can request, but humans decide.

4. Real-Time Knowledge Sync

Event-driven ingestion from Confluence, Notion, GitHub. Changes propagate in under 5 minutes, not hours.

$ cat tech-stack.json

🚀 Core Technologies

LangGraph

Agent workflow orchestration

Why: Explicit state machines prevent runaway agent behavior

Pydantic

Output schema validation

Why: Type-safe action definitions that LLMs must conform to

LangSmith

Agent observability and debugging

Why: Full trace of agent decisions, tool calls, and reasoning

🔧 Supporting Technologies

Python / FastAPI · PostgreSQL / pgvector · Redis · RabbitMQ

☁️ Infrastructure

Model Context Protocol (MCP) · Docker / Kubernetes

$ cat ARCHITECTURE.md

Agents operate within a controlled state machine, not free-form loops:

User Request → Intent Classification → Plan Generation → Human Review (optional)
    → Execution Loop (LangGraph state machine with checkpoints)
          per step: Tool Call → Validation → Execute → Record
    → Response Generation → User

System Components:

Intent Classifier

Determines if request requires agent workflow or simple retrieval

Plan Generator

Creates step-by-step execution plan with explicit tool sequence

Execution Engine

LangGraph state machine that executes plan with validation at each step

Knowledge Ingestion

Event-driven pipeline keeping vector stores synchronized
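
The execution engine threads a single state object through these components. A minimal sketch of what that state could look like (field names here are illustrative, not the production schema):

from typing import List, Optional, TypedDict

class AgentState(TypedDict):
    request: str                    # original user request
    intent: str                     # output of the intent classifier
    plan: List[str]                 # ordered steps from the plan generator
    current_step: int               # index into the plan during execution
    results: List[dict]             # validated tool-call results, appended per step
    approval_status: Optional[str]  # set by the human-review gate when used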

$ man implementation-details

Structured Output Pattern

Every tool call is defined by a Pydantic model:

from typing import List
from pydantic import BaseModel, Field

class DocumentAnalysis(BaseModel):
    summary: str = Field(description="2-3 sentence summary")
    entities: List[Entity] = Field(description="Extracted entities")  # Entity is a separate BaseModel
    confidence: float = Field(ge=0, le=1, description="Confidence score")
    reasoning: str = Field(description="Why this analysis")

The LLM is prompted to produce JSON matching this schema. If validation fails:

  1. Parse error is fed back to LLM
  2. LLM retries with correction guidance
  3. After 3 failures, escalate to human

This eliminates most hallucinated actions.
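
A minimal sketch of that retry loop, assuming a call_llm helper that returns raw JSON text and an escalate_to_human hook (both names are hypothetical):

from pydantic import ValidationError

MAX_ATTEMPTS = 3

def validated_analysis(prompt: str) -> DocumentAnalysis:
    feedback = ""
    for _ in range(MAX_ATTEMPTS):
        raw = call_llm(prompt + feedback)  # hypothetical LLM call, returns JSON text
        try:
            return DocumentAnalysis.model_validate_json(raw)
        except ValidationError as err:
            # Feed the parse error back so the model can correct itself
            feedback = f"\n\nPrevious output failed validation:\n{err}\nReturn valid JSON only."
    # Three strikes: hand the task to a person instead of guessing
    return escalate_to_human(prompt)       # hypothetical hook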

LangGraph State Machine

Instead of while True: action = agent.decide(), we define explicit states:

from langgraph.graph import StateGraph, END

workflow = StateGraph(AgentState)

workflow.add_node("classify", classify_intent)
workflow.add_node("plan", generate_plan)
workflow.add_node("review", human_review)  # only reached for high-stakes plans
workflow.add_node("execute", execute_step)
workflow.add_node("respond", generate_response)

workflow.set_entry_point("classify")
workflow.add_edge("classify", "plan")
workflow.add_conditional_edges(
    "plan",
    needs_review,
    {True: "review", False: "execute"},
)
workflow.add_edge("review", "execute")
workflow.add_conditional_edges(
    "execute",
    check_completion,  # returns "continue" while plan steps remain, "done" when finished
    {"continue": "execute", "done": "respond"},
)
workflow.add_edge("respond", END)

Benefits:

  • Every possible path is visible and testable
  • No infinite loops possible
  • Checkpoints enable resume after failures
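
A sketch of how checkpoint-based resume might look with LangGraph's checkpointer API (MemorySaver stands in for a durable store; the thread id and request are illustrative):

from langgraph.checkpoint.memory import MemorySaver

# Persist state after every transition; a database-backed checkpointer
# replaces MemorySaver outside of tests.
graph = workflow.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "ticket-1234"}}
graph.invoke({"request": "Summarize last week's incident reports"}, config)

# Inspect where a thread currently is, e.g. after a crash or interrupt,
# and continue it from that checkpoint instead of starting over.
snapshot = graph.get_state(config)
print(snapshot.next)          # the node(s) that would run next
graph.invoke(None, config)    # resume from the saved checkpoint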

$ echo $RESULTS

Production Reliability Without Sacrificing Capability

  • 99% Task Completion (of attempted agent workflows)
  • 0 Data Corruption Incidents (with validation + HITL)
  • < 5min Knowledge Sync Time (source change to searchable)
  • 100% Decision Traceability (full audit trail via LangSmith)

Additional Outcomes:

  • Clients report confident deployment of AI agents in customer-facing systems
  • Legal and compliance teams approve due to audit trail completeness
  • Knowledge workers trust agent outputs because they can verify reasoning

$ cat LESSONS_LEARNED.md

Constrain First, Expand Later

Start with tight guardrails and explicit workflows. Loosen constraints only when you've proven reliability. It's easier to unlock capability than to add safety after the fact.

Observability Is Non-Negotiable

If you can't trace exactly why an agent took an action, you can't debug production issues. LangSmith paid for itself in the first week.

Humans Should Approve, Not Monitor

HITL isn't about humans watching everything. It's about strategic checkpoints for high-stakes decisions. The agent does the work; humans make the calls.

$ cat README.md

The Problem with “Demo AI”

Every week, a potential client shows me an AI agent demo. It’s impressive — the agent navigates complex workflows, calls tools, synthesizes information. Then I ask: “How does it behave when the user asks something unexpected?” Usually, the answer is a nervous laugh.

Demo AI and production AI are different species.

What Goes Wrong with Naive Agents

Hallucinated Actions

Free-form agent loops (while True: think → act → observe) give LLMs too much freedom. They invent tool calls that don’t exist, parameters that don’t make sense, and actions that corrupt data.

Infinite Loops

Without explicit termination conditions, agents get stuck. “I need more information” → search → “I need more information” → search → forever.

No Audit Trail

When an agent makes a mistake in production, you need to understand why. “The LLM decided” isn’t an acceptable answer for compliance teams.

Stale Knowledge

Agents are only as good as their knowledge base. If documents change but embeddings don’t update, agents confidently return outdated information.

Our Approach: Constrained by Design

We built agents with reliability as the primary design goal, not capability.

Structured Outputs, Not Free Text

Every agent action has a schema:

from typing import Any, Dict, Optional
from pydantic import BaseModel, Field

class SearchQuery(BaseModel):
    query: str
    filters: Optional[Dict[str, str]] = None   # optional metadata filters
    max_results: int = Field(default=10, le=50)

class DocumentUpdate(BaseModel):
    document_id: str
    changes: Dict[str, Any]
    reason: str  # Required justification

The LLM must produce valid JSON matching the schema. If it can’t, the action doesn’t happen. This eliminates:

  • Made-up tool names
  • Invalid parameters
  • Actions without explanations
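
One way to enforce this at the call site is to bind the schema to the model, for example with LangChain's structured-output helper. A sketch, not the exact production wiring:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)

# The model is constrained to return JSON that parses into SearchQuery;
# anything else surfaces as an error instead of a silent bad tool call.
search_llm = llm.with_structured_output(SearchQuery)

action = search_llm.invoke("Find the latest SOC 2 audit reports, max 5 results")
print(action.query, action.max_results)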

State Machines, Not Loops

Free-form loops are dangerous. Instead, we define explicit state machines:

States: classify → plan → (optional: review) → execute → respond
Transitions: explicit conditions for each edge
Checkpoints: state saved after each step for recovery

This means:

  • Every possible execution path is visible
  • No infinite loops (state transitions are bounded)
  • Failures can resume from last checkpoint
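
The bound is also enforced mechanically: LangGraph accepts a per-run recursion_limit, so even a mis-wired edge fails fast rather than spinning. A sketch, with the limit value and error handling illustrative:

from langgraph.errors import GraphRecursionError

graph = workflow.compile()

try:
    # recursion_limit is a hard ceiling on state transitions per run
    graph.invoke({"request": "Audit the onboarding runbook"},
                 config={"recursion_limit": 25})
except GraphRecursionError:
    # A runaway loop hits the ceiling and fails loudly instead of spinning forever
    escalate_to_human("agent exceeded its transition budget")  # hypothetical hook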

Strategic Human-in-the-Loop

Not every action needs human approval — that would defeat the purpose. But high-stakes operations do:

Human Required:

  • Modifying production data
  • Sending external communications
  • Actions with regulatory implications

Human Optional:

  • Analyzing documents
  • Summarizing information
  • Internal knowledge retrieval

The agent does the work; humans make the final call when it matters.
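
In LangGraph terms, the gate is an interrupt before the review node. A minimal sketch, with node names following the workflow above and the approval channel left open:

from langgraph.checkpoint.memory import MemorySaver

graph = workflow.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["review"],   # pause before the human-review node runs
)

config = {"configurable": {"thread_id": "change-request-7"}}
graph.invoke({"request": "Update the data-retention policy page"}, config)

# The run is now parked at "review". Once someone approves through whatever
# channel you use (Slack, a ticket, an internal UI), resume the same thread:
graph.invoke(None, config)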

Real-Time Knowledge Sync

Knowledge bases must stay current. Our ingestion pipeline:

  1. Webhooks from Confluence, Notion, GitHub
  2. Change detection — only process modified documents
  3. Incremental indexing — update embeddings for changed chunks
  4. Propagation time: < 5 minutes from source change to searchable

This means agents always have current information.
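
A simplified sketch of one webhook entry point (FastAPI); the payload shape and the reindex_document task are illustrative stand-ins for the real pipeline:

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhooks/confluence")
async def confluence_webhook(request: Request):
    event = await request.json()
    page_id = event["page"]["id"]   # payload shape is illustrative
    # In production this enqueues work (RabbitMQ) rather than indexing inline;
    # the worker re-embeds only the chunks that actually changed.
    await reindex_document(source="confluence", doc_id=page_id)  # hypothetical task
    return {"status": "queued"}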

Results in Production

Quantitative

  • 99% task completion rate — agents finish what they start
  • Zero data corruption incidents — validation catches errors
  • < 5 minute knowledge freshness — no stale information
  • 100% decision traceability — every action has an audit trail

Qualitative

Compliance Teams Love It: Full audit trails mean legal and compliance can review any agent decision. This was a blocker for many enterprise deployments.

Users Trust the Output: Because they can see the reasoning (via LangSmith traces), knowledge workers trust agent summaries and analyses.

Engineers Sleep Better: Structured outputs and state machines mean fewer 3 AM pages about agents gone rogue.

Key Architecture Principles

1. Define Actions, Not Capabilities

Don’t tell the agent “you can do anything.” Define specific actions with specific schemas. The agent’s capability is exactly the union of its defined tools.
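
Concretely, the agent's tool surface can be a literal list of schema-backed tools and nothing more. A sketch using LangChain's StructuredTool, with the implementation functions left hypothetical:

from langchain_core.tools import StructuredTool

search_tool = StructuredTool.from_function(
    func=run_search,                      # hypothetical implementation
    name="search_knowledge_base",
    description="Search the internal knowledge base",
    args_schema=SearchQuery,
)

update_tool = StructuredTool.from_function(
    func=apply_document_update,           # hypothetical implementation
    name="update_document",
    description="Propose a change to a tracked document (requires approval)",
    args_schema=DocumentUpdate,
)

# The agent can invoke exactly these tools and nothing else.
TOOLS = [search_tool, update_tool]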

2. Make Every Path Explicit

If you can’t draw the state diagram, you don’t understand the agent’s behavior. Every production workflow should be visualizable and testable.

3. Log Everything

LangSmith traces should capture:

  • Input context
  • LLM reasoning (if available)
  • Tool calls and responses
  • Final output

When something goes wrong, you need the full picture.
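
Getting there is mostly configuration, plus a decorator on anything that isn't already a LangChain component. A sketch using LangSmith's standard environment variables and traceable decorator:

import json
import os

from langsmith import traceable

# LangChain / LangGraph calls are traced automatically once these are set
# (plus LANGCHAIN_API_KEY); the project name here is illustrative.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "agent-prod"

@traceable(name="validate_tool_call")
def validate_tool_call(raw_output: str) -> dict:
    # Custom helpers decorated with @traceable appear as their own spans,
    # alongside the LLM calls and tool executions LangSmith records for you.
    return json.loads(raw_output)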

4. Keep Knowledge Fresh

Agents with stale knowledge confidently return wrong answers. Invest in real-time sync, not batch updates.

5. Humans at Checkpoints, Not Monitors

HITL doesn’t mean humans watch every action. It means humans approve strategic decisions. The agent handles volume; humans handle judgment.


Building reliable AI agents for enterprise? Let’s discuss your requirements.


Experience: Founder & AI Backend Engineer at Sparrow Intelligence

Technologies: LangChain, AI Agents, RAG Systems, FastAPI, MCP, OpenAI, Anthropic Claude

Related Case Studies: Enterprise RAG for Legal Documents | Multi-LLM Orchestration

Need Reliable AI Agents?

Let's discuss how I can help solve your engineering challenges.