This is a submission for the Google I/O Writing Challenge
Everyone walked away from Google I/O 2026 talking about Gemini 3.5 Flash benchmarks. ...
The part where the agent read your git status and project structure before proposing the hybrid approach (LLM skill + static Python checker + git hook) is exactly how I'd expect a competent engineer to scope the work: not just generate code, but understand the constraint surface first. I've been running static analysis at scale for years, and the hardest part is always the same: getting developers to actually run the checks before pushing. A pre-commit hook that an AI wired up end-to-end, including the enforcement layer, is legitimately useful if it doesn't produce a flood of false positives. The real test is whether that check-a11y.py script is maintainable six months from now when WCAG 2.2 rules change or your component patterns evolve.
Exactly. The impressive part wasn’t the codegen, it was the agent understanding the repo and constraint surface before deciding on architecture. Hybrid enforcement (LLM skill + deterministic checker + git hook) is the only approach that scales realistically. The real benchmark isn’t “does it work today”, it’s whether that checker survives evolving WCAG rules and component drift 6 months later without becoming noise developers bypass.
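For readers wondering what the deterministic half of that hybrid might look like, here is a minimal sketch. It is not the article's check-a11y.py: the names ImgAltChecker, check_markup, and main are hypothetical, and it enforces only a single rule (an img tag must carry an alt attribute) using Python's standard html.parser module.

```python
from html.parser import HTMLParser


class ImgAltChecker(HTMLParser):
    """Collects the positions of <img> tags that have no alt attribute."""

    def __init__(self):
        super().__init__()
        self.violations = []  # (line, column) of each offending tag

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs. alt="" is valid for
        # decorative images, so only a missing attribute is flagged.
        if tag == "img" and "alt" not in dict(attrs):
            self.violations.append(self.getpos())


def check_markup(markup):
    """Return (line, column) positions of <img> tags without alt."""
    checker = ImgAltChecker()
    checker.feed(markup)
    checker.close()
    return checker.violations


def main(paths):
    """Print one line per violation; return 1 if any were found, else 0."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line, col in check_markup(handle.read()):
                print(f"{path}:{line}:{col}: <img> is missing an alt attribute")
                failed = True
    return 1 if failed else 0
```

A pre-commit hook could pass the staged file paths to main and exit with its return value, since a non-zero exit status from the hook blocks the commit. Keeping each rule this small is one way to limit false positives and keep the checker readable when the rules change.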
I've been loving it. I'm currently porting the SDK to Windows (a rather annoying task, thanks Google for the support) to create a swarmer and mobile interface, so it's easier to interface with and manage multiple projects at once without sitting at my desk. The custom skills system fits perfectly with my framework and what I built over the past year. I had built the system from scratch to use Vertex AI, but with the new Gemini Builds I'm out of the cloud market, so I'm pivoting to Sovereign systems, where those skills make a huge difference for multi-agent management. The orchestrator no longer needs to share the scope of a worker agent, and worker agents don't all need to be confined to the same skillset. It's awesome!
Exactly. The biggest win is capability isolation. Orchestrator no longer needs full worker scope, and workers no longer need bloated shared cognition. Skills turn agents into modular execution contexts instead of monolithic assistants. That changes multi-agent orchestration completely.
Not to mention scalability. My previous swarm system had to manually assign skills and the customized MCP protocol at runtime, and once they were set, they stayed set. Now it's modular and dynamic: I can have the orchestrator create a custom skill file, which can cut your baseline context window considerably while improving code quality. (I highly recommend this if you use Gemini 3.5 Flash. I suspect it's a MoE, which would explain the speed and Pro-level knowledge.) E.g. my pipeline uses an orchestrator, Tier N managers, then per-file workers. I can now have managers carry skill files that hold the context of their module within the scope of the refactor, so worker skills get written with the end goal in mind.
The other awesome features they added are /schedule and /goal. With /schedule you can have regular interval actions, e.g. checking a discourse db for proposed changes to shared files, so there's no more toe-stepping and structural re-alignment at set intervals. /goal means you can set and forget: it'll continue until it's done. If you want to optimize a system, you set the goal post and you leave it.
Stupid example, but theoretically possible: you can tell it 'here's a compression system, improve it until we've reached 90% reduction, while remaining lossless'. Yes, you'll need to put in anti-loop guardrails, but theoretically you could leave that overnight and wake up to a successful algo, whereas previously it would hit a wall. Combined with scheduled continues, even if it hits a 5h window quota, you've scheduled a restart for a minute after the quota hits, so it can run continuously, indefinitely.
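To make the "anti-loop guardrails" point concrete, here is a minimal sketch of a goal loop in plain Python. It does not use or model the actual /goal command; run_until_goal, step, score, max_iters, and patience are hypothetical names. The assumption is that the loop stops on one of three conditions: the target is reached, a hard iteration budget runs out, or the score fails to improve for several consecutive attempts.

```python
def run_until_goal(step, score, target, max_iters=50, patience=5):
    """Call step() until score() reaches target or a guardrail trips.

    Returns (status, best_score, iterations_used), where status is
    "reached", "stalled", or "budget_exhausted".
    """
    best = score()
    stalled = 0  # consecutive iterations without improvement
    for i in range(1, max_iters + 1):
        if best >= target:
            return "reached", best, i - 1
        step()  # one improvement attempt, e.g. one agent run
        current = score()
        if current > best:
            best, stalled = current, 0
        else:
            stalled += 1
            if stalled >= patience:
                # Guardrail: no progress for `patience` tries in a row.
                return "stalled", best, i
    # Guardrail: hard cap on total attempts.
    status = "reached" if best >= target else "budget_exhausted"
    return status, best, max_iters
```

In the compression example, score would be the measured size reduction on a fixed test corpus, with a lossless round-trip check treated as a precondition for counting any improvement at all.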
Exactly. This is the first time these systems actually feel architecturally scalable instead of just “bigger prompts + more agents.” Dynamic skill generation completely changes orchestration because cognition becomes modular and runtime-scoped instead of globally shared. Having managers carry module-level intent and constraints down to per-file workers is far cleaner than stuffing everything into a giant shared context window.
The /goal and /schedule additions are honestly the bigger breakthrough though. That introduces persistence, temporal continuity, and autonomous iteration into agent systems. At that point they stop behaving like session-bound chat assistants and start looking more like distributed execution systems. Continuous scheduled recovery, long-horizon optimization loops, discourse/state synchronization between workers — that’s a very different category of infrastructure than most people realize.
Exactly. Add to that JIT MCP configuration to enable/disable tools for each worker, and you have a lean, mean development machine.
Exactly. JIT MCP configuration is a massive part of making this actually scalable in practice. Workers no longer need permanent access to every tool or protocol upfront — capabilities become ephemeral and task-scoped. That keeps agents leaner, reduces unnecessary context/tool exposure, and makes orchestration far more deterministic. Combined with dynamic skills, it starts looking less like “AI agents” and more like distributed cognitive infrastructure.
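As a rough illustration of task-scoped tool access, here is a minimal sketch in plain Python. It is not the MCP protocol or any Gemini API: ScopedTools, the registry dictionary, and the tool names in the usage note are all hypothetical. The idea shown is only that each worker receives an allowlisted view of a shared tool registry, and anything outside that view is refused.

```python
class ScopedTools:
    """A per-worker view of a tool registry, limited to an allowlist."""

    def __init__(self, registry, allowed):
        # Fail fast if the allowlist names a tool that does not exist.
        unknown = set(allowed) - set(registry)
        if unknown:
            raise KeyError(f"unknown tools: {sorted(unknown)}")
        self._tools = {name: registry[name] for name in allowed}

    def names(self):
        """Return the tool names this worker is allowed to see."""
        return sorted(self._tools)

    def call(self, name, *args, **kwargs):
        """Invoke a tool, refusing anything outside the allowlist."""
        if name not in self._tools:
            raise PermissionError(f"tool {name!r} is not enabled for this worker")
        return self._tools[name](*args, **kwargs)
```

An orchestrator could build one ScopedTools per task, for example granting a per-file worker only "read_file" and "run_tests" while keeping "deploy" out of its reach, and discard the object when the task ends.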
Shifting the focus from raw model benchmarks to the 'skill file' standard highlights what actually matters for production: execution boundaries and context management.
Exactly. Models are becoming commodities.
The real differentiation is shifting toward orchestration, memory, execution boundaries, and how intelligence is packaged into reusable skills.
This was a fascinating read. The most interesting part wasn’t Gemini 3.5 Flash. It was the shift from “AI assistant” to composable agent through SKILL.md. The accessibility-reviewer example made the whole thing feel very real very quickly.
Exactly. That’s the moment it stopped feeling like a demo and started feeling like infrastructure.
SKILL.md turns AI from a chatbot into a composable execution layer and that changes the entire trajectory of agent design.
Really enjoyed the distinction here between agent config and reusable skills. As someone building AI products, I think that “JSON for behavior, markdown for capability” mental model is much closer to how production systems actually evolve than the keynote version. The hybrid point also landed for me — pairing LLM judgment with deterministic checks and pre-commit enforcement is where these workflows start feeling durable instead of demo-friendly.
Thanks man!
Your take on this system is absolutely correct. The new agentic coding systems, with their enhanced agentic CoT reasoning now backed by proper deterministic checks, are extremely powerful. Using AI to build production-grade systems is now on a much higher level. We are very close to this "Prompt, Build, Review and Ship" workflow being perfect. Or who knows, some company pulls out a "Superhuman Coder" and we just hit Enter and do nothing 😂
This is a good point. The model announcement gets the attention, but the skill file idea may matter more for actual builders.
A stronger model helps, but repeatable behavior comes from giving the AI clearer operating context: project rules, preferences, workflows, constraints, examples, and decision patterns.
That is what most teams are missing. They keep asking for smarter models when the real problem is that every session starts with too much missing context.
Skill files feel like a step toward making AI assistants more consistent inside real work. Not just “answer this prompt,” but “understand how this team or project wants work done.”
The risk is that people treat skill files like another prompt hack. The useful version needs versioning, review, and cleanup, otherwise it becomes stale context that quietly shapes bad outputs.
Exactly. That’s the shift I was trying to point at in the article — intelligence alone doesn’t create consistency. Operational context does. Most failures in real workflows come from missing constraints, patterns, and team-specific expectations, not lack of raw model capability.
And I completely agree on the risk side too. If skill files just become giant unmaintained prompt dumps, they’ll decay fast and start introducing invisible behavioral drift. The useful long-term version probably looks much closer to software infrastructure: versioned, reviewed, modular, testable, and continuously refined alongside the codebase itself.
agent frameworks.
Yess🔥
Totally — the framework landscape moves fast, and picking the wrong one early can be costly. I've been gravitating toward composable, minimal abstractions rather than all-in-one platforms. What's your current go-to when you do reach for a framework?
this landing for teams that aren't already using agent frameworks.
This especially lands for teams that aren’t already deep into agent frameworks.
SKILL.md makes the shift feel practical instead of experimental.
That's a fair point — agent frameworks can be overkill for simple automation tasks. For my setup, I started with raw function-calling and only introduced a lightweight decision layer when the branching logic got unwieldy. Would be curious what your threshold is for reaching for a framework vs keeping it simple.
Fantastic description really cool🔥🔥
Thanks bhaii🥰
Hmm, when I do /skills I only see the 2 I recently created. I don't see any of the preinstalled skills that you mentioned.
Yeah, that’s because there currently aren’t a bunch of preinstalled/public skills exposed by default. The article was more about the underlying architecture and what it enables rather than a marketplace of bundled skills. The real power comes from creating scoped custom skills dynamically for specific workflows, modules, or agents. That’s where the orchestration and context-isolation benefits really start showing up.