Sridhar S

Posted on May 26 • Edited on May 27

My AI Agent Was Escalating Every Contract. One Decision Layer Fixed It 📑🤖📑🤖

#hermesagentchallenge #python #agents #ai

Hermes Agent Challenge Submission: Build With Hermes Agent

This is a submission for the Hermes Agent Challenge: Build With Hermes Agent

My Hermes Agent Couldn’t Decide Which Contracts Needed Legal Review. One Planning Layer Fixed It. 📑🤖

What I Built

While experimenting with enterprise AI agents, I noticed a common problem:

Contract reviews are painfully manual.

Vendor agreements, NDAs, MSAs, and SOWs often require legal teams to manually inspect:

missing clauses
unclear liabilities
compliance gaps
termination conditions
SLA definitions

I wanted to see:

Can an AI agent intelligently decide what to review and when to escalate?

So I built an Enterprise Contract Intelligence Agent powered by Hermes Agent.

Instead of simply extracting text from contracts, the agent plans tasks, invokes tools, reasons through risks, and decides whether a contract actually requires legal review.

The interesting part?

My first version failed badly.

Hermes Agent was escalating almost every contract.

NDAs.

Vendor agreements.

Even low-risk contracts.

Technically the system worked.

Practically?

Completely unusable.

The issue turned out to be simple:

The agent lacked a confidence-based decision layer.

If a single clause looked risky, Hermes escalated immediately.

That created too many false positives.

So I redesigned the workflow.

Now Hermes Agent:

Reads the uploaded contract
Detects the contract type
Extracts clauses
Identifies risk signals
Calculates a confidence score
Determines whether escalation is needed
Generates an executive summary
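To make the change concrete, here is a minimal sketch contrasting the first version's behaviour (any risky-looking clause triggers escalation) with a confidence-aware rule. The `RiskSignal` type, the confidence-weighted average, and the cutoff values are assumptions for illustration, not the exact logic in the repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RiskSignal:
    """One detected risk signal (hypothetical structure)."""
    clause: str
    severity: float    # 0-1, how risky the clause looks
    confidence: float  # 0-1, how sure the detector is


def naive_escalate(signals: List[RiskSignal], severity_cutoff: float = 0.7) -> bool:
    """First-version rule: a single risky-looking clause escalates."""
    return any(s.severity > severity_cutoff for s in signals)


def confidence_aware_escalate(
    signals: List[RiskSignal],
    risk_cutoff: float = 0.7,
    confidence_cutoff: float = 0.8,
) -> bool:
    """Escalate only when aggregate risk is high AND held with high confidence."""
    total_weight = sum(s.confidence for s in signals)
    if total_weight == 0:
        # No usable evidence; a real system might route this to a human instead.
        return False
    # Confidence-weighted mean severity: uncertain signals count for less.
    risk = sum(s.severity * s.confidence for s in signals) / total_weight
    confidence = total_weight / len(signals)
    return risk > risk_cutoff and confidence > confidence_cutoff


# An NDA with one ambiguous clause the detector is unsure about.
nda_signals = [
    RiskSignal("indemnification wording", severity=0.8, confidence=0.4),
    RiskSignal("confidentiality", severity=0.1, confidence=0.9),
    RiskSignal("termination", severity=0.1, confidence=0.95),
]

print(naive_escalate(nda_signals))             # True  (false positive)
print(confidence_aware_escalate(nda_signals))  # False
```

The weighting scheme is only one option; the point is that a single uncertain signal no longer decides the outcome on its own.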

The result:

Hermes now behaves much more like a real enterprise analyst than a rule-based script.

Example output:

Contract Type:
Vendor Agreement

Risk Score:
7.2/10

Issues Found:
❌ Missing termination clause
❌ SLA definition unclear
⚠ Liability section weak

Confidence:
89%

Recommendation:
Escalate to Legal Review

For low-risk contracts:

Contract Type:
NDA

Risk Score:
2.1/10

Issues Found:
✅ Confidentiality present
✅ Termination clause present

Confidence:
94%

Recommendation:
Approved

Demo

Workflow

Contract PDF
        ↓
Hermes Master Agent
        ↓
Task Planning
        ↓
Clause Extraction
        ↓
Risk Detection
        ↓
Confidence Scoring
        ↓
Compliance Check
        ↓
Final Recommendation

Example Agent Plan

1. Read uploaded contract
2. Identify contract type
3. Extract important clauses
4. Detect missing sections
5. Evaluate business risk
6. Calculate confidence
7. Decide escalation
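The plan above can be sketched as a plain Python function, one line per step. This is a toy illustration under stated assumptions: the `REQUIRED_CLAUSES` table, the keyword matching, and the "fraction of missing clauses" risk score are all hypothetical stand-ins for what the agent and its tools actually do, and confidence is passed in rather than computed.

```python
from typing import Dict, Set

# Hypothetical required clauses per contract type (illustrative only).
REQUIRED_CLAUSES: Dict[str, Set[str]] = {
    "NDA": {"confidentiality", "termination"},
    "Vendor Agreement": {"termination", "sla", "liability"},
}


def run_plan(text: str, confidence: float = 0.9) -> dict:
    lowered = text.lower()                                    # 1. Read uploaded contract
    contract_type = (                                         # 2. Identify contract type
        "NDA" if "non-disclosure" in lowered else "Vendor Agreement"
    )
    required = REQUIRED_CLAUSES[contract_type]
    # Crude substring matching; a real extractor would be far more careful.
    found = {c for c in required if c in lowered}             # 3. Extract important clauses
    missing = required - found                                # 4. Detect missing sections
    risk_score = len(missing) / len(required)                 # 5. Evaluate business risk (0-1)
    # 6. Calculate confidence: stubbed here, supplied by the caller.
    escalate = risk_score > 0.7 and confidence > 0.8          # 7. Decide escalation
    return {
        "contract_type": contract_type,
        "missing": sorted(missing),
        "risk_score": risk_score,
        "confidence": confidence,
        "recommendation": "legal_review" if escalate else "approved",
    }


result = run_plan("This vendor agreement covers delivery of goods.")
print(result["recommendation"])  # legal_review
```

In the real system each numbered step is a tool call chosen by the agent, not a hardcoded line, but the data flowing between steps looks roughly like this.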

(Adding screenshots/video walkthrough soon 🚀)

Code

Repository:

https://github.com/radhirsh/Hermes_Agent.git

Example decision logic:

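class ContractDecisionAgent:
    """Decides whether a contract needs legal review."""

    # Both inputs are assumed to be normalized to the 0-1 range,
    # e.g. a displayed risk score of 7.2/10 is passed as 0.72
    # and a confidence of 89% as 0.89.
    RISK_THRESHOLD = 0.7
    CONFIDENCE_THRESHOLD = 0.8

    def should_escalate(self, risk_score: float, confidence: float) -> str:
        # Escalate only when the risk is high AND the agent is
        # confident in that assessment; everything else is approved.
        if (
            risk_score > self.RISK_THRESHOLD
            and confidence > self.CONFIDENCE_THRESHOLD
        ):
            return "legal_review"
        return "approved"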
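The example outputs earlier show risk on a 0-10 scale and confidence as a percentage, while the decision logic compares against 0.7 and 0.8. One way to reconcile the two, assuming the thresholds are meant on a 0-1 scale, is to normalize before deciding. The helper names below are hypothetical and not taken from the repository.

```python
def normalize_risk(risk_out_of_ten: float) -> float:
    """Convert a displayed 0-10 risk score to the 0-1 range."""
    if not 0 <= risk_out_of_ten <= 10:
        raise ValueError("risk score must be between 0 and 10")
    return risk_out_of_ten / 10


def normalize_confidence(confidence_percent: float) -> float:
    """Convert a displayed percentage to the 0-1 range."""
    if not 0 <= confidence_percent <= 100:
        raise ValueError("confidence must be between 0 and 100")
    return confidence_percent / 100


def should_escalate(risk_score: float, confidence: float) -> str:
    """Same rule as the decision logic: high risk held with high confidence."""
    if risk_score > 0.7 and confidence > 0.8:
        return "legal_review"
    return "approved"


# Vendor agreement from the example output: 7.2/10 risk, 89% confidence.
print(should_escalate(normalize_risk(7.2), normalize_confidence(89)))  # legal_review

# NDA from the example output: 2.1/10 risk, 94% confidence.
print(should_escalate(normalize_risk(2.1), normalize_confidence(94)))  # approved
```

Note that under this rule a high-risk contract with low confidence is also approved; whether that case should instead go to a human is a design choice worth making explicit.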

My Tech Stack

Hermes Agent
Python
Azure Document Intelligence
PDFPlumber
PyPDF
FastAPI / Streamlit
LangChain
OpenAI / Azure OpenAI

How I Used Hermes Agent

Hermes Agent sits at the center of the system.

Instead of hardcoding a workflow, I used Hermes for:

1. Planning

Hermes breaks the task into smaller reasoning steps.

Example:

Read contract
↓
Determine type
↓
Extract clauses
↓
Evaluate risk
↓
Decide escalation

2. Tool Use

Hermes invokes multiple tools dynamically:

parse_pdf()

extract_clauses()

risk_detector()

compliance_checker()

summary_generator()

Different contract types require different reasoning paths, and Hermes dynamically chooses what to do next.
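As a rough mental model of "choosing what to do next", here is a minimal tool-registry sketch. It does not use or represent Hermes Agent's actual API: the `tool` decorator, the `plan_tools` selector, the stub tool bodies, and the rule that NDAs skip the compliance check are all assumptions made for illustration. Only the tool names come from the list above.

```python
from typing import Callable, Dict, List

State = dict
TOOLS: Dict[str, Callable[[State], State]] = {}


def tool(fn: Callable[[State], State]) -> Callable[[State], State]:
    """Register a function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


# Stub tools: each one only records a placeholder result in the shared state.
@tool
def parse_pdf(state: State) -> State:
    state["text"] = state.get("raw_text", "")
    return state


@tool
def extract_clauses(state: State) -> State:
    state["clauses"] = [s.strip() for s in state["text"].split(".") if s.strip()]
    return state


@tool
def risk_detector(state: State) -> State:
    state["risk_checked"] = True
    return state


@tool
def compliance_checker(state: State) -> State:
    state["compliance_checked"] = True
    return state


@tool
def summary_generator(state: State) -> State:
    state["summary"] = f"{state['contract_type']}: {len(state['clauses'])} clause(s) reviewed"
    return state


def plan_tools(contract_type: str) -> List[str]:
    """Pick a tool sequence per contract type (hypothetical rule)."""
    plan = ["parse_pdf", "extract_clauses", "risk_detector"]
    if contract_type != "NDA":
        plan.append("compliance_checker")
    plan.append("summary_generator")
    return plan


def run(state: State) -> State:
    state["trace"] = []
    for name in plan_tools(state["contract_type"]):
        state = TOOLS[name](state)
        state["trace"].append(name)  # keep an audit trail of tool calls
    return state


result = run({"contract_type": "NDA", "raw_text": "Confidentiality applies. Term is two years."})
print(result["trace"])
```

In the actual agent the selection is made by the model at run time rather than by a fixed `if`, but keeping a trace of which tools ran is useful either way when debugging escalation decisions.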

3. Multi-Step Reasoning

The agent doesn't just summarize documents.

It reasons through:

missing legal clauses
business risk
confidence levels
escalation decisions

This felt like a much more realistic enterprise use case for AI agents.

One big lesson from building this:

Agentic systems become useful only when they can decide what to do next, not just generate text.

That’s where Hermes Agent really stood out for me.

Thanks for reading 🚀

#hermesagentchallenge #devchallenge #agents #python

Top comments (13)

xulingfeng • May 26

Nice to see another Hermes user in the wild! We ran into a similar decision-fork problem with multi-agent memory writes. What worked for us was adding a lightweight planning step before the agent picks a tool — basically a 'stop and think' phase that doesn't burn a full turn. Your legal review use case is a great fit for that pattern.

Sridhar S • May 26

That’s a really interesting pattern — especially the lightweight “stop and think” phase before tool execution.

Right now my workflow is more sequential (classification → clause extraction → risk → escalation), but I can already see how adding a lightweight planning layer could reduce unnecessary escalations and improve decision confidence.

Something like:

Contract → Planning → Decide required tools → Execute → Final recommendation

Feels especially useful for ambiguous contracts where not every step may be needed.

Appreciate the insight — definitely going to experiment with this 🚀

xulingfeng • May 27

Awesome to hear someone else is running Hermes in production! The sqlitemem provider has been solid for us after dealing with some early membridge instability. What's your use case looking like — are you using it more for personal automation or team workflows?

Sridhar S • May 27 • Edited

Thanks! Great to hear Hermes has been stable for you in production too. We’re currently using it more for enterprise-style team workflows, specifically around a Contract Intelligence Agent for legal/compliance review. Hermes is acting as the orchestration layer for planning, tool-calling, risk evaluation, and escalation decisions rather than simple automation. One interesting challenge was reducing false positives — adding a confidence-based decision layer made the agent behave much closer to a real enterprise analyst 🚀

xulingfeng • May 27

Contract intelligence is a killer use case for Hermes — the structured decision-making is where it really shines. Our setup is lighter

xulingfeng • May 27

That's a super clean setup — Contract Intelligence with Hermes as the orchestration layer sounds like exactly the kind of workflow where it shines. The confidence-based decision layer for false positive reduction is a smart touch. Have you found any specific threshold tuning pattern that works best across different contract types, or does it vary a lot by domain?

Sridhar S • May 27

Appreciate that — and yes, the threshold tuning definitely varies by contract type and business risk tolerance. During experimentation, we found that using a single escalation threshold across all contracts created too many false positives, especially for low-risk documents like NDAs.

What worked better was a contract-type-aware confidence strategy. For example, NDAs can tolerate a slightly higher approval threshold if key clauses (confidentiality, termination, governing law) are present, whereas vendor agreements/MSAs require stricter risk sensitivity around SLAs, liability, indemnification, and compliance sections.

Still iterating, but the biggest learning so far has been: confidence scoring works best when combined with contract context, not as a universal number 🚀
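A minimal sketch of what a contract-type-aware strategy could look like. The cutoff values, the `Policy` structure, and the reading of "higher approval threshold" as "a higher risk cutoff before escalation, but only when the key clauses are present" are assumptions for illustration, not the tuned values from the project.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Policy:
    risk_cutoff: float            # escalate when risk exceeds this (0-1)
    confidence_cutoff: float      # ...and confidence exceeds this (0-1)
    key_clauses: FrozenSet[str]   # clauses that must be present for this policy


DEFAULT_POLICY = Policy(0.7, 0.8, frozenset())

# Hypothetical per-type policies: NDAs are more lenient, vendor agreements/MSAs stricter.
STRICT = Policy(0.6, 0.8, frozenset({"sla", "liability", "indemnification", "compliance"}))
POLICIES = {
    "NDA": Policy(0.8, 0.8, frozenset({"confidentiality", "termination", "governing law"})),
    "Vendor Agreement": STRICT,
    "MSA": STRICT,
}


def decide(
    contract_type: str,
    risk_score: float,
    confidence: float,
    clauses_present: Iterable[str],
) -> str:
    policy = POLICIES.get(contract_type, DEFAULT_POLICY)
    risk_cutoff = policy.risk_cutoff
    # A lenient cutoff only applies when every key clause is present;
    # otherwise cap it at the default so missing clauses never relax the rule.
    if not policy.key_clauses <= set(clauses_present):
        risk_cutoff = min(risk_cutoff, DEFAULT_POLICY.risk_cutoff)
    if risk_score > risk_cutoff and confidence > policy.confidence_cutoff:
        return "legal_review"
    return "approved"


nda_clauses = {"confidentiality", "termination", "governing law"}
print(decide("NDA", 0.75, 0.9, nda_clauses))            # approved
print(decide("NDA", 0.75, 0.9, {"confidentiality"}))    # legal_review
print(decide("Vendor Agreement", 0.65, 0.9, STRICT.key_clauses))  # legal_review
```

The same risk score of 0.75 leads to different outcomes depending on contract type and clause coverage, which is the "context, not a universal number" idea in practice.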

xulingfeng • May 27

The single threshold problem is exactly what we ran into too — different contract types need different thresholds because the cost of

xulingfeng • May 28

Great discussion! Your threshold-tuning approach is really practical — I might borrow that idea. Followed you 💪

xulingfeng • May 28

Really sorry to hear that — I am in a similar spot actually. Our company has no active projects right now so things feel pretty uncertain on my end too. It is a weird time in tech. But honestly, seeing how deep you are into Hermes and contract intelligence, I think you have the right skills to land on your feet. The fact that you are already building with AI instead of waiting to see what happens puts you ahead of most people. Hang in there man 💪

Sridhar S • May 29

Sorry to hear things feel uncertain on your side too — definitely a weird phase in tech right now. I’m trying to look at it as a push to learn faster and build more deeply in areas I genuinely enjoy, especially Agentic AI and enterprise workflows. Been spending time building things like contract intelligence agents and AP automation systems to stay hands-on.

Hoping things turn around for both of us soon. Wishing you solid projects and stability ahead too 🚀

Sridhar S • May 28

Appreciate it 💪 Ironically, while we’re building with AI, I actually lost my job yesterday as AI starts changing roles. Tough moment, but trying to learn, adapt, and keep moving forward

xulingfeng • May 29

Thanks for the kind words — and your AP automation work sounds right up the same alley. 3-way reconciliation (PO-GRN-Invoice) is exactly the kind of high-stakes workflow where "looks right but drifted" is the scariest failure mode.

We've been hitting the same problem in contract validation: the model outputs a summary that reads fine, but over time the confidence distribution shifts. The financial domain is unforgiving for that kind of silent regression.

Would love to hear how you're approaching the confidence threshold question on the reconcile side. Are you using a fixed cutoff or something adaptive?
