DEV Community

Sridhar S

My AI Agent Was Escalating Every Contract. One Decision Layer Fixed It 📑🤖

Hermes Agent Challenge Submission: Build With Hermes Agent

This is a submission for the Hermes Agent Challenge: Build With Hermes Agent

My Hermes Agent Couldn’t Decide Which Contracts Needed Legal Review. One Planning Layer Fixed It. 📑🤖

What I Built

While experimenting with enterprise AI agents, I noticed a common problem:

Contract reviews are painfully manual.

Vendor agreements, NDAs, MSAs, and SOWs often require legal teams to manually inspect:

  • missing clauses
  • unclear liabilities
  • compliance gaps
  • termination conditions
  • SLA definitions

I wanted to see:

Can an AI agent intelligently decide what to review and when to escalate?

So I built an Enterprise Contract Intelligence Agent powered by Hermes Agent.

Instead of simply extracting text from contracts, the agent plans tasks, invokes tools, reasons through risks, and decides whether a contract actually requires legal review.

The interesting part?

My first version failed badly.

Hermes Agent was escalating almost every contract.

NDAs.

Vendor agreements.

Even low-risk contracts.

Technically the system worked.

Practically?

Completely unusable.

The issue turned out to be simple:

The agent lacked a confidence-based decision layer.

If a single clause looked risky, Hermes escalated immediately.

That created too many false positives.

So I redesigned the workflow.

Now Hermes Agent:

  1. Reads the uploaded contract
  2. Detects the contract type
  3. Extracts clauses
  4. Identifies risk signals
  5. Calculates a confidence score
  6. Decides whether escalation is needed
  7. Generates an executive summary
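
Steps 4 to 6 above could be sketched roughly as follows. This is only an illustration of the idea, not the scoring used in the repository: `RiskSignal`, `aggregate`, and `needs_escalation` are hypothetical names, the confidence-weighted average is an assumed formula, and the 0.7 and 0.8 cutoffs are borrowed from the decision logic shown later in this post.

```python
from dataclasses import dataclass


@dataclass
class RiskSignal:
    name: str
    severity: float    # 0-1: how serious the issue would be
    confidence: float  # 0-1: how sure the detector is that the issue is real


def aggregate(signals: list[RiskSignal]) -> tuple[float, float]:
    """Combine signals into (risk_score, confidence), both on a 0-1 scale."""
    total_confidence = sum(s.confidence for s in signals)
    if total_confidence == 0:
        # No signals, or none the detector believes in: nothing to act on.
        return 0.0, 0.0
    # Confidence-weighted mean severity: uncertain signals count for less.
    risk = sum(s.severity * s.confidence for s in signals) / total_confidence
    confidence = total_confidence / len(signals)
    return risk, confidence


def needs_escalation(signals: list[RiskSignal]) -> bool:
    risk, confidence = aggregate(signals)
    return risk > 0.7 and confidence > 0.8


# One alarming but uncertain clause does not trigger escalation on its own.
print(needs_escalation([RiskSignal("vague liability wording", 0.9, 0.4)]))
# Several confident, severe findings still do.
print(needs_escalation([
    RiskSignal("missing termination clause", 0.9, 0.95),
    RiskSignal("SLA definition unclear", 0.8, 0.9),
]))
```

The first call prints False and the second prints True, which is the behavior change described above: a single risky-looking clause is no longer enough to escalate.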

The result:

Hermes now behaves much more like a real enterprise analyst than a rule-based script.

Example output:

Contract Type:
Vendor Agreement

Risk Score:
7.2/10

Issues Found:
❌ Missing termination clause
❌ SLA definition unclear
⚠ Liability section weak

Confidence:
89%

Recommendation:
Escalate to Legal Review

For low-risk contracts:

Contract Type:
NDA

Risk Score:
2.1/10

Issues Found:
✅ Confidentiality present
✅ Termination clause present

Confidence:
94%

Recommendation:
Approved
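
Both reports above share one shape, so they can be modeled as a small record. The sketch below is an assumption about how such a report might be represented and printed. `Finding`, `ContractReport`, `render_report`, and the status labels are hypothetical names, not part of Hermes Agent or the repository.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    status: str   # "ok", "warning", or "missing" (hypothetical labels)
    message: str


@dataclass
class ContractReport:
    contract_type: str
    risk_score: float   # 0-10, as displayed in the examples above
    confidence: float   # 0-1, rendered as a percentage
    recommendation: str
    findings: list[Finding] = field(default_factory=list)


ICONS = {"ok": "✅", "warning": "⚠", "missing": "❌"}


def render_report(report: ContractReport) -> str:
    """Format a report in the same layout as the example output."""
    lines = [
        "Contract Type:", report.contract_type, "",
        "Risk Score:", f"{report.risk_score:.1f}/10", "",
        "Issues Found:",
    ]
    lines += [f"{ICONS[f.status]} {f.message}" for f in report.findings]
    lines += [
        "", "Confidence:", f"{report.confidence:.0%}",
        "", "Recommendation:", report.recommendation,
    ]
    return "\n".join(lines)


nda = ContractReport(
    contract_type="NDA",
    risk_score=2.1,
    confidence=0.94,
    recommendation="Approved",
    findings=[
        Finding("ok", "Confidentiality present"),
        Finding("ok", "Termination clause present"),
    ],
)
print(render_report(nda))
```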

Demo

Workflow

Contract PDF
        ↓
Hermes Master Agent
        ↓
Task Planning
        ↓
Clause Extraction
        ↓
Risk Detection
        ↓
Confidence Scoring
        ↓
Compliance Check
        ↓
Final Recommendation
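
As a rough illustration, the workflow above can be read as a chain of stages that pass a shared state along. The sketch hardcodes the order purely to mirror the diagram, whereas in the actual project Hermes plans the steps. Every function body here is a placeholder assumption, including the keyword check for a termination clause and the fixed confidence value.

```python
from typing import Callable

Stage = Callable[[dict], dict]


def plan_tasks(state: dict) -> dict:
    state["plan"] = ["extract", "risk", "confidence", "compliance"]
    return state


def extract_clauses(state: dict) -> dict:
    # Placeholder: treat each non-empty line as one clause.
    state["clauses"] = [ln.strip() for ln in state["text"].splitlines() if ln.strip()]
    return state


def detect_risk(state: dict) -> dict:
    # Placeholder rule: flag the contract if no clause mentions termination.
    has_termination = any("terminat" in c.lower() for c in state["clauses"])
    state["issues"] = [] if has_termination else ["Missing termination clause"]
    return state


def score_confidence(state: dict) -> dict:
    # Placeholder: a fixed value standing in for a real confidence estimate.
    state["confidence"] = 0.9
    return state


def check_compliance(state: dict) -> dict:
    state["compliant"] = not state["issues"]
    return state


def recommend(state: dict) -> dict:
    escalate = not state["compliant"] and state["confidence"] > 0.8
    state["recommendation"] = "Escalate to Legal Review" if escalate else "Approved"
    return state


PIPELINE: list[Stage] = [
    plan_tasks, extract_clauses, detect_risk,
    score_confidence, check_compliance, recommend,
]


def run_pipeline(contract_text: str) -> dict:
    state = {"text": contract_text}
    for stage in PIPELINE:
        state = stage(state)
    return state


result = run_pipeline("Confidentiality: ...\nTermination: 30 days notice.")
print(result["recommendation"])
```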

Example Agent Plan

1. Read uploaded contract
2. Identify contract type
3. Extract important clauses
4. Detect missing sections
5. Evaluate business risk
6. Calculate confidence
7. Decide escalation

(Adding screenshots/video walkthrough soon 🚀)


Code

Repository:

https://github.com/radhirsh/Hermes_Agent.git

Example decision logic:

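class ContractDecisionAgent:
    """Decides whether a contract should go to legal review."""

    def should_escalate(self, risk_score, confidence):
        # The cutoffs imply both inputs are on a 0-1 scale, so a
        # displayed risk score of 7.2/10 is 0.72 and 89% is 0.89.
        # Escalate only when the risk is high and the agent is
        # confident in that assessment.
        if risk_score > 0.7 and confidence > 0.8:
            return "legal_review"

        return "approved"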
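
The example reports earlier show risk as a score out of 10 and confidence as a percentage, while the decision logic compares against 0.7 and 0.8. A small usage sketch, assuming the displayed values are simply rescaled to 0-1 before the check (`normalize` is a hypothetical helper, not something from the repository):

```python
class ContractDecisionAgent:
    """Same decision rule as above, repeated so this example runs on its own."""

    def should_escalate(self, risk_score, confidence):
        if risk_score > 0.7 and confidence > 0.8:
            return "legal_review"
        return "approved"


def normalize(risk_out_of_ten: float, confidence_percent: float) -> tuple[float, float]:
    """Rescale displayed values (e.g. 7.2/10 and 89%) to the 0-1 range."""
    return risk_out_of_ten / 10, confidence_percent / 100


agent = ContractDecisionAgent()
print(agent.should_escalate(*normalize(7.2, 89)))  # vendor agreement example
print(agent.should_escalate(*normalize(2.1, 94)))  # NDA example
```

This prints legal_review and then approved, matching the two recommendations in the example output. Under this rule a high risk score paired with low confidence also comes back as approved, so a third outcome for uncertain cases may be worth considering.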

My Tech Stack

  • Hermes Agent
  • Python
  • Azure Document Intelligence
  • PDFPlumber
  • PyPDF
  • FastAPI / Streamlit
  • LangChain
  • OpenAI / Azure OpenAI

How I Used Hermes Agent

Hermes Agent sits at the center of the system.

Instead of hardcoding a workflow, I used Hermes for:

1. Planning

Hermes breaks the task into smaller reasoning steps.

Example:

Read contract
↓
Determine type
↓
Extract clauses
↓
Evaluate risk
↓
Decide escalation

2. Tool Use

Hermes invokes multiple tools dynamically:

parse_pdf()

extract_clauses()

risk_detector()

compliance_checker()

summary_generator()

Different contract types require different reasoning paths, and Hermes dynamically chooses what to do next.
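
To make the idea of different reasoning paths concrete, here is a minimal sketch using a static lookup table. In the real agent the model chooses the next tool, so this table only stands in for that choice. The tool names come from the list above, but their bodies, the `PLANS` table, and the decision to skip `compliance_checker` for NDAs are all assumptions made for illustration.

```python
from typing import Callable


# Stub tools named after the ones listed above; the bodies are placeholders.
def parse_pdf(ctx: dict) -> None:
    ctx["text"] = ctx["raw"].strip()


def extract_clauses(ctx: dict) -> None:
    ctx["clauses"] = [ln for ln in ctx["text"].splitlines() if ln.strip()]


def risk_detector(ctx: dict) -> None:
    ctx["risk_score"] = 0.0  # placeholder value


def compliance_checker(ctx: dict) -> None:
    ctx["compliant"] = True  # placeholder value


def summary_generator(ctx: dict) -> None:
    ctx["summary"] = f"{len(ctx['clauses'])} clause(s) reviewed"


TOOLS: dict[str, Callable[[dict], None]] = {
    tool.__name__: tool
    for tool in (parse_pdf, extract_clauses, risk_detector,
                 compliance_checker, summary_generator)
}

# Hypothetical plans: which tools run for which contract type.
PLANS = {
    "NDA": ["parse_pdf", "extract_clauses", "risk_detector", "summary_generator"],
}
DEFAULT_PLAN = list(TOOLS)  # unknown types get every tool, in listed order


def run(contract_type: str, raw: str) -> dict:
    ctx = {"raw": raw, "trace": []}
    for name in PLANS.get(contract_type, DEFAULT_PLAN):
        TOOLS[name](ctx)
        ctx["trace"].append(name)  # record which tools actually ran
    return ctx


print(run("NDA", "Confidentiality\nTermination")["trace"])
print(run("MSA", "Liability\nSLA")["summary"])
```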

3. Multi-Step Reasoning

The agent doesn't just summarize documents.

It reasons through:

  • missing legal clauses
  • business risk
  • confidence levels
  • escalation decisions

This felt like a much more realistic enterprise use case for AI agents.

One big lesson from building this:

Agentic systems become useful only when they can decide what to do next, not just generate text.

That’s where Hermes Agent really stood out for me.

Thanks for reading 🚀

#hermesagentchallenge #devchallenge #agents #python

Top comments (13)

xulingfeng

Nice to see another Hermes user in the wild! We ran into a similar decision-fork problem with multi-agent memory writes. What worked for us was adding a lightweight planning step before the agent picks a tool, basically a 'stop and think' phase that doesn't burn a full turn. Your legal review use case is a great fit for that pattern.

Sridhar S

That’s a really interesting pattern, especially the lightweight “stop and think” phase before tool execution.

Right now my workflow is more sequential (classification → clause extraction → risk → escalation), but I can already see how adding a lightweight planning layer could reduce unnecessary escalations and improve decision confidence.

Something like:

Contract → Planning → Decide required tools → Execute → Final recommendation

Feels especially useful for ambiguous contracts where not every step may be needed.

Appreciate the insight, definitely going to experiment with this 🚀

xulingfeng

Awesome to hear someone else is running Hermes in production! The sqlitemem provider has been solid for us after dealing with some early membridge instability. What's your use case looking like? Are you using it more for personal automation or team workflows?

Sridhar S • Edited

Thanks! Great to hear Hermes has been stable for you in production too. We’re currently using it more for enterprise-style team workflows, specifically around a Contract Intelligence Agent for legal/compliance review. Hermes is acting as the orchestration layer for planning, tool-calling, risk evaluation, and escalation decisions rather than simple automation. One interesting challenge was reducing false positives: adding a confidence-based decision layer made the agent behave much closer to a real enterprise analyst 🚀

xulingfeng

Contract intelligence is a killer use case for Hermes; the structured decision-making is where it really shines. Our setup is lighter

xulingfeng

That's a super clean setup. Contract Intelligence with Hermes as the orchestration layer sounds like exactly the kind of workflow where it shines. The confidence-based decision layer for false positive reduction is a smart touch. Have you found any specific threshold tuning pattern that works best across different contract types, or does it vary a lot by domain?

Sridhar S

Appreciate that, and yes, the threshold tuning definitely varies by contract type and business risk tolerance. During experimentation, we found that using a single escalation threshold across all contracts created too many false positives, especially for low-risk documents like NDAs.

What worked better was a contract-type-aware confidence strategy. For example, NDAs can tolerate a slightly higher approval threshold if key clauses (confidentiality, termination, governing law) are present, whereas vendor agreements/MSAs require stricter risk sensitivity around SLAs, liability, indemnification, and compliance sections.

Still iterating, but the biggest learning so far has been: confidence scoring works best when combined with contract context, not as a universal number 🚀
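
A minimal sketch of what a contract-type-aware strategy along these lines could look like. The per-type numbers are illustrative guesses rather than tuned values, `Thresholds` and `THRESHOLDS` are hypothetical names, and the fallback of 0.7 and 0.8 is taken from the decision logic in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    risk: float        # escalate only above this risk score (0-1)
    confidence: float  # and only above this confidence (0-1)


DEFAULT = Thresholds(risk=0.7, confidence=0.8)  # cutoffs from the post's code

# Illustrative values only: NDAs tolerate more risk before escalating,
# vendor agreements and MSAs are treated more strictly.
THRESHOLDS = {
    "NDA": Thresholds(risk=0.8, confidence=0.8),
    "Vendor Agreement": Thresholds(risk=0.6, confidence=0.8),
    "MSA": Thresholds(risk=0.6, confidence=0.8),
}

NDA_KEY_CLAUSES = {"confidentiality", "termination", "governing law"}


def should_escalate(contract_type, risk_score, confidence, clauses_present=()):
    limits = THRESHOLDS.get(contract_type, DEFAULT)
    # The relaxed NDA cutoff applies only when all key clauses are present.
    if contract_type == "NDA" and not NDA_KEY_CLAUSES <= set(clauses_present):
        limits = DEFAULT
    if risk_score > limits.risk and confidence > limits.confidence:
        return "legal_review"
    return "approved"


print(should_escalate("NDA", 0.75, 0.9, NDA_KEY_CLAUSES))
print(should_escalate("NDA", 0.75, 0.9, {"confidentiality"}))
print(should_escalate("Vendor Agreement", 0.65, 0.9))
```

With these assumed values, the same 0.75 risk score is approved for a complete NDA, escalated for an NDA missing key clauses, and a 0.65 score is already escalated for a vendor agreement.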

xulingfeng

The single threshold problem is exactly what we ran into too: different contract types need different thresholds because the cost of

xulingfeng

Great discussion! Your threshold-tuning approach is really practical. I might borrow that idea. Followed you 💪

xulingfeng

Really sorry to hear that. I am in a similar spot actually. Our company has no active projects right now so things feel pretty uncertain on my end too. It is a weird time in tech. But honestly, seeing how deep you are into Hermes and contract intelligence, I think you have the right skills to land on your feet. The fact that you are already building with AI instead of waiting to see what happens puts you ahead of most people. Hang in there man 💪

Sridhar S

Sorry to hear things feel uncertain on your side too, definitely a weird phase in tech right now. I’m trying to look at it as a push to learn faster and build more deeply in areas I genuinely enjoy, especially Agentic AI and enterprise workflows. Been spending time building things like contract intelligence agents and AP automation systems to stay hands-on.

Hoping things turn around for both of us soon. Wishing you solid projects and stability ahead too 🚀

Sridhar S

Appreciate it 💪 Ironically, while we’re building with AI, I actually lost my job yesterday as AI starts changing roles. Tough moment, but trying to learn, adapt, and keep moving forward.

xulingfeng

Thanks for the kind words, and your AP automation work sounds right up the same alley. 3-way reconciliation (PO-GRN-Invoice) is exactly the kind of high-stakes workflow where "looks right but drifted" is the scariest failure mode.

We've been hitting the same problem in contract validation: the model outputs a summary that reads fine, but over time the confidence distribution shifts. The financial domain is unforgiving for that kind of silent regression.

Would love to hear how you're approaching the confidence threshold question on the reconcile side. Are you using a fixed cutoff or something adaptive?