Cipher: The Jarvis with a Hermes Core

#hermesagentchallenge #devchallenge #agents

Hermes Agent Challenge Submission: Build With Hermes Agent

This is a submission for the Hermes Agent Challenge: Build With Hermes Agent

What I Built

I loved the Avengers and Iron Man movies growing up and always thought how fun it would be to have something like Jarvis in real life. Back then, that idea was still kind of a joke, even with predictable AI. But now, with Hermes acting as the brain and core of the operation, I've built a Jarvis-like agent called Cipher. Kokoro acts as Cipher's voice, though it takes a little while to respond, so have some patience with the baby model that's giving life to a voice. Cipher has access to my daily life, and I can reach him on my local desktop or online through a secure tunnel. The core of the setup is making sure Whisper (the speech-to-text voice skill) is in place and the Kokoro server is installed.

I had never heard my agent actually talk to me before, and the big lesson I took from this project was how to set up Whisper and Kokoro for Hermes.

Want to know the amazing thing? Hermes will build it out for you, given the right prompts and repository. I found this project and made it my own. You can find the Jarvis project's GitHub by clicking here. It has endpoints built in for voice servers, so you can attach Kokoro pretty easily. Let Hermes do the grunt work of setting up Kokoro; just make sure it doesn't spin up empty ghost processes, or you might end up in a graveyard 😆 (screenshot reference below).
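Hermes handled the Kokoro setup for me, so this snippet is not taken from the project. It's a minimal sketch of one way to avoid ghost processes: check whether the voice server's port is already accepting connections before launching another copy. It assumes Kokoro runs as a local HTTP server on a TCP port; the host, the port number 8880, and the helper names `port_in_use` and `start_voice_server_once` are hypothetical, so swap in whatever your server actually uses.

```python
import socket
import subprocess
from typing import Optional, Sequence

# Assumptions: the voice server listens locally; 8880 is a placeholder port.
KOKORO_HOST = "127.0.0.1"
KOKORO_PORT = 8880


def port_in_use(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising on failure.
        return sock.connect_ex((host, port)) == 0


def start_voice_server_once(
    cmd: Sequence[str],
    host: str = KOKORO_HOST,
    port: int = KOKORO_PORT,
) -> Optional[subprocess.Popen]:
    """Launch the voice server only if nothing is listening yet (hypothetical helper).

    Returns the new process, or None when an existing instance is reused,
    so repeated calls don't pile up duplicate "ghost" servers.
    """
    if port_in_use(host, port):
        return None
    return subprocess.Popen(list(cmd))
```

A port check only tells you that *something* is listening, not that it's Kokoro, so in practice you'd pair it with a health-check request to the server.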

Here's a live demo of Cipher saying hi to everyone here at DEV. Check out the video!

Demo

My Code and Tech Stack

This was the backbone of the code I used to create my Jarvis. Hermes helped set it up for itself.
Jarvis Project's GitHub

This was built on top of a Python project. I had Hermes integrate the repository and build a React + Vite front-end on top of it.

How I Used Hermes Agent

Hermes is the brain of my project. It remembers, and the project is only going to grow from here. It will have a heartbeat that confirms I'm there by checking in with me, giving me daily updates and the ways it has found to improve itself. Eventually it will live as an agent on my native desktop.
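The heartbeat isn't built yet, so this is only a minimal sketch of how a daily check-in could be scheduled, not how Hermes does it. It assumes a fixed check-in time (9:00 here) and a hypothetical `check_in` callback that would ask if I'm there and deliver the daily update; `next_heartbeat` and `run_heartbeat` are illustrative names. The clock and sleep functions are injectable so the loop can be exercised without actually waiting a day.

```python
import time as clock
from datetime import datetime, time, timedelta
from typing import Callable, Optional


def next_heartbeat(now: datetime, at: time = time(9, 0)) -> datetime:
    """Return the next datetime at the daily check-in time strictly after `now`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


def run_heartbeat(
    check_in: Callable[[], None],       # hypothetical: ping me + send daily update
    at: time = time(9, 0),              # assumption: one check-in per day at 9:00
    beats: Optional[int] = None,        # None means run forever
    now_fn: Callable[[], datetime] = datetime.now,
    sleep_fn: Callable[[float], None] = clock.sleep,
) -> None:
    """Sleep until each scheduled check-in time, then call `check_in`."""
    count = 0
    while beats is None or count < beats:
        now = now_fn()
        sleep_fn((next_heartbeat(now, at) - now).total_seconds())
        check_in()
        count += 1
```

In a real agent you'd more likely hand this to a cron job or the agent's own scheduler; the point is only that a heartbeat is a timed call into the agent, not something the model does on its own.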

Top comments (2)

Harjot Singh • May 31

The "Jarvis" framing is the dream everyone reaches for, and the interesting engineering reality is that the magic isn't the personality - it's the orchestration core underneath ("Hermes core" is the right instinct). A Jarvis that feels seamless is really a router + memory + tool-dispatch system with a conversational skin; the assistant vibe is the last 5%, the harness is the 95%.

The trap I'd watch for: a personal-assistant agent accretes capabilities until context bloat and tool-selection ambiguity make it slower and dumber than a focused one. The fix is the same as any multi-agent system - scope context per task, route to the right model/tool, don't make one mega-agent hold everything. That's the architecture under Moonshift (a multi-agent pipeline: prompt to a shipped SaaS on your own GitHub + Vercel) and it's why routing keeps even a complex build ~$3 flat. First run's free, no card. Cool project - is Cipher one big agent with many tools, or a crew of specialized ones behind the Jarvis interface? That choice tends to decide how it scales.

John A Madrigal • May 31

I agree, and I already plan on a multi-agent setup. Hermes is new to me, but I've been running a 6-agent setup with OpenClaw for the past 3 months. That's the route I'm going; I only gave myself a single weekend to build this with Hermes for the challenge. I'm also trying to see how differently it works with the self-improvement profile.

Honestly, the whole idea is to turn this into a native desktop app using Python and a native phone app in Kotlin. And it doesn't stop at the number of agents: you need to use multiple LLMs, picking different models based on the task being asked. So Cipher has multiple tools called "skills", defined in skill.md files, and he will also have a knowledge base where you can feed documents, images, and websites directly to him for certain skill calls.
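To make the skills-plus-routing idea concrete, here's a minimal sketch, not Cipher's actual code. It assumes each skill.md starts with a `---`-delimited frontmatter block of simple `key: value` lines with `name` and `description`; the `model` field, the model names, and the helpers `parse_skill_md` and `route` are hypothetical. The naive keyword match stands in for what a real agent would do, which is let the LLM pick a skill from the descriptions.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

DEFAULT_MODEL = "general-model"  # hypothetical placeholder model name


@dataclass
class Skill:
    name: str
    description: str
    model: str  # hypothetical per-skill model choice


def parse_skill_md(text: str) -> Skill:
    """Parse a minimal frontmatter block of `key: value` lines (no YAML library)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill.md must start with a '---' frontmatter block")
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    else:
        # Loop finished without hitting the closing '---'.
        raise ValueError("unterminated frontmatter block")
    return Skill(meta["name"], meta.get("description", ""),
                 meta.get("model", DEFAULT_MODEL))


def route(request: str, skills: List[Skill]) -> Tuple[Optional[Skill], str]:
    """Pick a skill whose name appears in the request and return its model.

    Falls back to no skill and the default model when nothing matches.
    """
    words = set(re.findall(r"[a-z0-9_-]+", request.lower()))
    for skill in skills:
        if skill.name.lower() in words:
            return skill, skill.model
    return None, DEFAULT_MODEL
```

Keeping the model choice next to each skill means a cheap, fast model can handle routine calls while a heavier one is reserved for the skills that need it.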

There is still a lot of work to be done on the agent. Ideally, I'd find a way to turn it into its own OS, with Hermes or OpenClaw as the central operator. That's how I picture it when it's complete, because an agent that is itself a small OS would be the most secure setup and would give the agent the most access.