It used to take a lot of effort to get your first PR merged in open source. Now you can ship something real in a weekend thanks to coding agents li...
Finding high-quality, maintained agent tools right now is a minefield of hype, so this list is a breath of fresh air. Love the structure and how you broke down the alternatives too. Phenomenal write-up, thanks for sharing your daily exploration with us!
great to hear that! tbh most of us won't even need half of these but knowing what exists saves a ton of time. spent months exploring these so hopefully this helps someone out there :)
you should definitely try agent-skills & taste-skill. I use them almost every time I'm building side projects.
I use agent skills.
will use taste-skill, it seems interesting.
yeah. using those skills will help you avoid ai slop websites, like gpt-taste (for gsap animations), high-end-visual design.. I have tried all of them.
let me also share something I have been using. using this prompt in chatgpt will give you a lot of cool assets. then you can ask it to export those assets without bg and voila.. your website will look far better :)
here are some samples.
websites like open-design.ai have been built using the same method.
whoa!!! this is crazy.
I will be sure to check these out. thank you for sharing this knowledge.
This is a useful ecosystem overview because the tooling landscape is getting fragmented very quickly. I especially liked the inclusion of MCP-aware tooling and generative UI runtimes because many discussions still focus only on orchestration frameworks themselves. One thing I keep noticing is that observability and debugging tooling still feels underrepresented compared to orchestration, evals, and memory layers. I've been exploring that local inspection gap in TypeScript with agent-inspect, particularly around execution trees and tool-call traces. Curious which tooling category you think becomes most important over the next year.
thanks!! most blogs about this only cover the backend which is weird to me, there is a lot more that makes the overall system better.
I think observability is mostly a maturity thing. most devs aren't at production scale yet so they skip it. last I used langsmith was for basic smoke testing and validating agent responses on one of my projects.
I personally believe harness + skills will be very useful by the end of this year. and the models are getting good, really good at using those skills (I have only seen them miss those strict rules a couple of times)
Nice article Anmol
yay! means a lot, took forever to put together :)
Not a big fan of listicles, but some really solid projects in here. Wish there were MCP servers like Context7 in the list, I use it all the time. Just curious, have you personally used LiveKit Agents / Pipecat?
thanks. I used LiveKit agents a few months ago and Pipecat very recently.. still need to learn more since I wasn't able to try subagents last time
Solid list. One category that's going to matter more in the next six months: tools for inspecting and replaying agent runs locally. Once you have 3+ agents chained, the bottleneck moves from "which framework" to "what actually happened in run #47 last Tuesday." Anything you've found useful for that?
Great piece!
Exactly what I've been looking for. Thanks Anmol!!
tried my best to include all the awesome repos I found in the past few months. my personal favorite among the list is agent-skills by Addy Osmani & sutando. thanks for reading!!
Anmol, this has been sitting on my reading list for a while and I finally got around to reading it.
Really enjoyed it. You can tell a lot of time went into putting this together. I ended up opening quite a few tabs while reading. Thanks for keeping everything in one place and sharing it with the community.
the harness point for Deep Agents is the one teams learn the hard way: we spent months swapping models on a document QA system before realizing chunking, reranking, and prompt structure were doing most of the work. swapping the model at the end shifted accuracy maybe 8%. redesigning the retrieval harness shifted it 30%. exact same pattern as the Terminal Bench 52% to 66% jump you cited.
the DeepEval section deserves a callout: task completion and argument correctness metrics catch failures that hallucination metrics completely miss in agentic workflows.
curious which memory store you'd pick for temporal reasoning agents. graphiti vs mem0 is the one i see teams get wrong most often. which did you end up recommending?
sir stop using ai to write lol (really don't mean to be rude)
by the way, I definitely recommend reading about agent harness on Addy's blog + langchain. they have covered it very well.
Good roundup. One thing I'd add from running these in anger: the framework choice matters way less than your tracing. I shipped nearly the same agent flow on two stacks, and what actually decided maintainability was whether I could see every tool call and token spend per step. Without that you debug blind. Would love a section on observability in a future version, it's the part people regret skipping.
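For anyone wondering what "see every tool call and token spend per step" could look like in practice, here is a minimal sketch using only the Python standard library. It is not taken from any framework mentioned in this thread: `Trace`, `Step`, and the `search` tool are hypothetical names, and the token counts are passed in by hand because there is no real model call here. In a real agent you would presumably read them from your provider's usage data.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    """One recorded tool call (hypothetical structure, for illustration)."""
    tool: str
    args: dict
    result: Any
    tokens: int
    seconds: float


@dataclass
class Trace:
    """Collects every tool call of a single agent run, in order."""
    steps: list = field(default_factory=list)

    def call(self, tool: Callable[..., Any], tokens: int = 0, **args: Any) -> Any:
        # Run the tool, time it, and record name, arguments, result, and token spend.
        start = time.perf_counter()
        result = tool(**args)
        elapsed = time.perf_counter() - start
        self.steps.append(Step(tool.__name__, args, result, tokens, elapsed))
        return result

    def total_tokens(self) -> int:
        # Sum of the token counts the caller reported for each step.
        return sum(step.tokens for step in self.steps)

    def dump(self) -> str:
        # Serialize the run as JSON so it can be saved and inspected later.
        return json.dumps([asdict(step) for step in self.steps], indent=2, default=str)


def search(query: str) -> str:
    # Hypothetical tool standing in for a real one.
    return f"results for {query}"


trace = Trace()
trace.call(search, tokens=120, query="mcp servers")
trace.call(search, tokens=80, query="pipecat")
print(trace.total_tokens())
print(trace.dump())
```

Writing the output of `dump()` to a file per run is one simple way to answer "what actually happened" after the fact, though a dedicated tracing tool will give you far more than this.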