DEV Community

My AI Agent Was Escalating Every Contract. One Decision Layer Fixed It 📑🤖

Sridhar S on May 26, 2026

This is a submission for the Hermes Agent Challenge: Build With Hermes Agent. My Hermes Agent Couldn’t Decide Which Contracts Needed Legal...
 
xulingfeng

Nice to see another Hermes user in the wild! We ran into a similar decision-fork problem with multi-agent memory writes. What worked for us was adding a lightweight planning step before the agent picks a tool: basically a 'stop and think' phase that doesn't burn a full turn. Your legal review use case is a great fit for that pattern.

 
Sridhar S

That’s a really interesting pattern, especially the lightweight “stop and think” phase before tool execution.

Right now my workflow is more sequential (classification → clause extraction → risk → escalation), but I can already see how adding a lightweight planning layer could reduce unnecessary escalations and improve decision confidence.

Something like:

Contract → Planning → Decide required tools → Execute → Final recommendation

Feels especially useful for ambiguous contracts where not every step may be needed.
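A minimal, framework-agnostic sketch of that Contract → Planning → Decide required tools → Execute → Final recommendation flow, in plain Python. It deliberately uses no Hermes Agent API: `plan_steps`, the keyword rules, the clause list, and the step functions are all hypothetical placeholders for what would normally be LLM or tool calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ContractState:
    """Results accumulated as the planned steps run (hypothetical schema)."""
    text: str
    contract_type: str = "unknown"
    clauses: dict[str, bool] = field(default_factory=dict)
    risk_score: float = 0.0
    steps_run: list[str] = field(default_factory=list)
    recommendation: str = ""


# Hypothetical step implementations; in a real agent these would be tool calls.
def classify(s: ContractState) -> None:
    s.contract_type = "nda" if "non-disclosure" in s.text.lower() else "msa"


def extract_clauses(s: ContractState) -> None:
    lowered = s.text.lower()
    for clause in ("confidentiality", "termination", "governing law", "indemnification"):
        s.clauses[clause] = clause in lowered


def score_risk(s: ContractState) -> None:
    # Share of expected clauses that were not found (0.0 means all present).
    missing = sum(not present for present in s.clauses.values())
    s.risk_score = missing / max(len(s.clauses), 1)


STEPS: dict[str, Callable[[ContractState], None]] = {
    "classify": classify,
    "extract_clauses": extract_clauses,
    "score_risk": score_risk,
}


def plan_steps(text: str) -> list[str]:
    """Cheap planning pass: decide which steps this contract needs
    before spending any tool calls (illustrative rules only)."""
    plan = ["classify", "extract_clauses"]
    if len(text) > 2000 or "indemnif" in text.lower():
        plan.append("score_risk")  # only pay for risk scoring when it may matter
    return plan


def run(text: str) -> ContractState:
    state = ContractState(text=text)
    for name in plan_steps(text):
        STEPS[name](state)
        state.steps_run.append(name)
    # Final recommendation; a skipped risk step leaves risk_score at 0.0.
    state.recommendation = "escalate" if state.risk_score > 0.5 else "approve"
    return state
```

With these placeholder rules, a short NDA skips risk scoring entirely, while a text mentioning indemnification gets the extra step. The planner only decides *which* steps run; the steps themselves stay unchanged, which keeps the planning layer cheap to add to an existing sequential pipeline.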

Appreciate the insight; definitely going to experiment with this 🚀

 
xulingfeng

Awesome to hear someone else is running Hermes in production! The sqlitemem provider has been solid for us after dealing with some early membridge instability. What's your use case looking like? Are you using it more for personal automation or team workflows?

 
Sridhar S

Thanks! Great to hear Hermes has been stable for you in production too. We’re currently using it more for enterprise-style team workflows, specifically around a Contract Intelligence Agent for legal/compliance review. Hermes is acting as the orchestration layer for planning, tool-calling, risk evaluation, and escalation decisions rather than simple automation. One interesting challenge was reducing false positives; adding a confidence-based decision layer made the agent behave much more like a real enterprise analyst 🚀
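The thread doesn't spell out how that decision layer works, so the following is only one plausible shape for it. It assumes the risk step emits both a `risk_score` and a separate `confidence` (both hypothetical fields in [0, 1]) and uses made-up default thresholds. The idea is to separate "risky" from "unsure", so low-confidence cases go to a quick review instead of being escalated to legal by default.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    ESCALATE = "escalate_to_legal"
    REVIEW = "human_review"


@dataclass
class Assessment:
    risk_score: float  # 0.0 (benign) to 1.0 (high risk), from the risk step
    confidence: float  # 0.0 to 1.0, how sure the agent is about risk_score


def decide(a: Assessment, risk_threshold: float = 0.6, min_confidence: float = 0.7) -> Decision:
    """Escalate only when the agent is confident AND sees real risk.
    Uncertain cases go to a quick human review instead of full legal escalation."""
    if a.confidence < min_confidence:
        return Decision.REVIEW
    if a.risk_score >= risk_threshold:
        return Decision.ESCALATE
    return Decision.APPROVE
```

For example, `decide(Assessment(risk_score=0.2, confidence=0.95))` returns `Decision.APPROVE`, while the same risk score at confidence 0.4 returns `Decision.REVIEW`.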

 
xulingfeng

Contract intelligence is a killer use case for Hermes: the structured decision-making is where it really shines. Our setup is lighter...

 
xulingfeng

That's a super clean setup. Contract Intelligence with Hermes as the orchestration layer sounds like exactly the kind of workflow where it shines. The confidence-based decision layer for false positive reduction is a smart touch. Have you found any specific threshold tuning pattern that works best across different contract types, or does it vary a lot by domain?

 
Sridhar S

Appreciate that, and yes, the threshold tuning definitely varies by contract type and business risk tolerance. During experimentation, we found that using a single escalation threshold across all contracts created too many false positives, especially for low-risk documents like NDAs.

What worked better was a contract-type-aware confidence strategy. For example, NDAs can tolerate a slightly higher approval threshold if key clauses (confidentiality, termination, governing law) are present, whereas vendor agreements/MSAs require stricter risk sensitivity around SLAs, liability, indemnification, and compliance sections.

Still iterating, but the biggest learning so far has been that confidence scoring works best when combined with contract context, not as a universal number 🚀
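A contract-type-aware version of that idea might look like the sketch below. The policy table, the clause names, and every threshold value are hypothetical; what it illustrates is the pattern described above: a missing key clause always escalates, and the same risk score can be acceptable for an NDA yet escalate for an MSA or vendor agreement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    escalate_at: float                # escalate when risk_score >= this value
    required_clauses: frozenset[str]  # clauses whose absence forces escalation


# Illustrative values only; real numbers depend on business risk tolerance.
MSA_POLICY = Policy(0.45, frozenset({"sla", "liability", "indemnification", "compliance"}))
POLICIES: dict[str, Policy] = {
    "nda": Policy(0.75, frozenset({"confidentiality", "termination", "governing_law"})),
    "msa": MSA_POLICY,
    "vendor_agreement": MSA_POLICY,
}
DEFAULT_POLICY = Policy(0.5, frozenset())


def should_escalate(contract_type: str, risk_score: float, clauses_found: set[str]) -> bool:
    policy = POLICIES.get(contract_type, DEFAULT_POLICY)
    if policy.required_clauses - clauses_found:
        return True  # a key clause is missing: send to legal regardless of score
    return risk_score >= policy.escalate_at
```

With these placeholder numbers, a risk score of 0.6 passes for a complete NDA but escalates for a complete MSA, which is exactly the behavior a single universal threshold cannot express.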

 
xulingfeng

The single threshold problem is exactly what we ran into too: different contract types need different thresholds because the cost of...

 
xulingfeng

Great discussion! Your threshold-tuning approach is really practical; I might borrow that idea. Followed you 💪

 
xulingfeng

Really sorry to hear that. I am in a similar spot, actually: our company has no active projects right now, so things feel pretty uncertain on my end too. It is a weird time in tech. But honestly, seeing how deep you are into Hermes and contract intelligence, I think you have the right skills to land on your feet. The fact that you are already building with AI instead of waiting to see what happens puts you ahead of most people. Hang in there, man 💪

 
Sridhar S

Sorry to hear things feel uncertain on your side too; it's definitely a weird phase in tech right now. I’m trying to look at it as a push to learn faster and build more deeply in areas I genuinely enjoy, especially Agentic AI and enterprise workflows. I’ve been spending time building things like contract intelligence agents and AP automation systems to stay hands-on.

Hoping things turn around for both of us soon. Wishing you solid projects and stability ahead too 🚀

 
Sridhar S

Appreciate it 💪 Ironically, while we’re building with AI, I actually lost my job yesterday as AI starts changing roles. It’s a tough moment, but I’m trying to learn, adapt, and keep moving forward.

 
xulingfeng

Thanks for the kind words, and your AP automation work sounds right up the same alley. 3-way reconciliation (PO-GRN-Invoice) is exactly the kind of high-stakes workflow where "looks right but drifted" is the scariest failure mode.
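For readers less familiar with AP workflows, a line-level three-way match can be sketched as below. It assumes one line per document and hypothetical tolerances (quantities must not exceed what was ordered or received; unit price within 2% of the PO), and uses `Decimal` to avoid float rounding on money. Deterministic checks like these are useful precisely because they don't depend on how plausible a generated summary reads.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class PricedLine:  # used for both the purchase order (PO) and the invoice
    sku: str
    qty: Decimal
    unit_price: Decimal


@dataclass
class ReceiptLine:  # goods received note (GRN): what actually arrived
    sku: str
    qty: Decimal


def three_way_match(po: PricedLine, grn: ReceiptLine, invoice: PricedLine,
                    price_tol: Decimal = Decimal("0.02")) -> list[str]:
    """Return a list of exceptions; an empty list means PO, GRN, and invoice agree."""
    issues: list[str] = []
    if not (po.sku == grn.sku == invoice.sku):
        issues.append("sku mismatch")
    if grn.qty > po.qty:
        issues.append("received more than ordered")
    if invoice.qty > grn.qty:
        issues.append("billed more than received")
    # Relative price tolerance measured against the PO unit price.
    if abs(invoice.unit_price - po.unit_price) > po.unit_price * price_tol:
        issues.append("price outside tolerance")
    return issues
```

An invoice for 10 units at 5.05 against a PO of 10 units at 5.00 (all received) passes; billing 12 units at 5.50 returns both a quantity and a price exception.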

We've been hitting the same problem in contract validation: the model outputs a summary that reads fine, but over time the confidence distribution shifts. The financial domain is unforgiving for that kind of silent regression.
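One common way to catch that kind of silent shift is to compare a recent window of confidence scores against a baseline window, for example with the population stability index (PSI). The sketch below assumes confidences lie in [0, 1], that both windows are non-empty, and uses ten equal-width bins; the 0.25 alert level is a widely used rule of thumb for PSI, not a value from this thread.

```python
import math


def binned_shares(scores: list[float], bins: int = 10) -> list[float]:
    """Fraction of scores in each equal-width bin over [0, 1]."""
    if not scores:
        raise ValueError("need at least one score")
    counts = [0] * bins
    for s in scores:
        idx = min(max(int(s * bins), 0), bins - 1)  # clamp 1.0 (and strays) into range
        counts[idx] += 1
    eps = 1e-6  # keeps empty bins from producing log(0)
    return [max(c / len(scores), eps) for c in counts]


def psi(baseline: list[float], recent: list[float], bins: int = 10) -> float:
    """Population stability index: sum of (actual - expected) * ln(actual / expected)."""
    expected = binned_shares(baseline, bins)
    actual = binned_shares(recent, bins)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def confidence_drifted(baseline: list[float], recent: list[float], alert_at: float = 0.25) -> bool:
    return psi(baseline, recent) > alert_at
```

Identical windows give a PSI of 0; a recent window that has collapsed into one narrow band of scores lands far above the alert level even if every individual output still "reads fine".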

Would love to hear how you're approaching the confidence threshold question on the reconcile side. Are you using a fixed cutoff or something adaptive?
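To make the fixed-versus-adaptive question concrete, here is one hypothetical adaptive scheme: set the review cutoff so that roughly a target share of recent items falls below it, but never let it drop under a fixed floor. The `review_rate` and `floor` values are illustrative assumptions. The floor matters because if confidences drift downward, a purely adaptive cutoff would quietly drift down with them.

```python
def adaptive_cutoff(recent: list[float], review_rate: float = 0.10, floor: float = 0.5) -> float:
    """Confidence below the returned value goes to human review.
    Roughly `review_rate` of recent items fall below the adaptive part;
    the fixed floor stops the cutoff from sliding down with a drifting model."""
    if not recent:
        return floor  # no history yet: fall back to the fixed cutoff
    ranked = sorted(recent)
    k = min(int(len(ranked) * review_rate), len(ranked) - 1)
    return max(ranked[k], floor)
```

When recent confidences are generally high, the cutoff rises with them; when they sag, the floor takes over and behaves like a fixed threshold.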