
Sreejit Pradhan

Everyone's Talking About Gemini 3.5 Flash. The Real Story at Google I/O 2026 Was a Skill File.

This is a submission for the Google I/O Writing Challenge


Everyone walked away from Google I/O 2026 talking about Gemini 3.5 Flash benchmarks. Veo 3. Gemini Omni doing multimodal physics. The usual keynote sugar rush. Good stuff. Expected.

But if you want to understand why this I/O actually changes how developers build — not in theory, in production, this week — you need to look at something that got maybe four sentences in the developer keynote.

A markdown file called SKILL.md.

I didn't read about this. I ran it. Here's what actually happened.


What Antigravity CLI Actually Creates (Not What the Slides Said)

Every I/O recap I've read describes AGENTS.md as the agent configuration primitive. Clean. Simple. One file.

That's not quite right. Here's what /agents shows in a fresh Antigravity CLI 1.0.2 session on a real project:

Create New Agents
  Workspace: C:/Users/sreej/Downloads/Projects/SoilSense AI/.agents/agents/{agent_name}/agent.json
  Global:    C:\Users\sreej\.gemini\antigravity-cli\agents\{agent_name}\agent.json

▼ Available Agents
  • /default   Default agent

Agent definitions are JSON, not markdown. The markdown lives one level down — in skills:

Skills  
129 skills

Create new skills
  Workspace: ~/Downloads/Projects/SoilSense AI/.agents/skills/{skill_name}/SKILL.md
  Global:    ~/.gemini/antigravity-cli/skills/{skill_name}/SKILL.md
  Shared:    ~/.gemini/skills/{skill_name}/SKILL.md

So the actual structure is:

your-project/
└── .agents/
    ├── agents/
    │   └── {agent_name}/
    │       └── agent.json      ← agent behavior (JSON)
    └── skills/
        └── {skill_name}/
            └── SKILL.md        ← reusable capabilities (markdown)

And Antigravity ships with 129 built-in skills already — everything from agency-agentic-search-optimizer to agency-code-reviewer. You're not starting from zero. You're extending a library.

That's not a minor correction. That's a different mental model from what the keynote implied.
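To make the layout concrete, here is a minimal Python sketch that walks the three skill locations reported by /skills and lists every {skill_name}/SKILL.md it finds. The function names `skill_roots` and `find_skills` are hypothetical, and the sketch only inspects the filesystem; it assumes nothing about how Antigravity itself loads skills.

```python
from pathlib import Path


def skill_roots(workspace: Path, home: Path) -> dict[str, Path]:
    """The three skill locations shown by /skills, keyed by scope."""
    return {
        "workspace": workspace / ".agents" / "skills",
        "global": home / ".gemini" / "antigravity-cli" / "skills",
        "shared": home / ".gemini" / "skills",
    }


def find_skills(root: Path) -> dict[str, Path]:
    """Map skill name -> SKILL.md path for each {skill_name}/SKILL.md under root."""
    if not root.is_dir():
        return {}
    return {
        skill_file.parent.name: skill_file
        for skill_file in sorted(root.glob("*/SKILL.md"))
    }
```

Calling `find_skills(root)` for each entry of `skill_roots(Path.cwd(), Path.home())` gives a per-scope inventory you can compare against what /skills prints.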


I Tested It on a Real Project

I ran this on SoilSense AI — a Capacitor/Android app with an existing codebase, git history, and a src/ directory full of React components. Not a demo project. A real one.

One prompt:

create a skill for SoilSense AI that reviews any new component 
for accessibility issues before committing

What followed was not autocomplete. The agent:

  1. Read package.json to understand the stack
  2. Scanned src/, src/lib/, docs/PROJECT_STRUCTURE.md
  3. Checked ListPermissions — confirmed read/write access
  4. Ran git status to understand current state
  5. Proposed a hybrid approach and asked for approval before proceeding

The plan it proposed:

  • A global AI agent skill (soilsense-accessibility-reviewer) — a SKILL.md that instructs the agent to audit git-staged components using LLM-level reasoning
  • A standalone Python checker (check-a11y.py) for static WCAG rule enforcement
  • A pre-commit git hook that blocks commits containing critical violations

I typed proceed. Here's what it built:

Create(~/.gemini/config/skills/soilsense-accessibility-reviewer/SKILL.md)
Create(~/.gemini/config/skills/soilsense-accessibility-reviewer/scripts/check_a11y.py)
Create(SoilSense AI/scripts/check-a11y.py)
Create(SoilSense AI/.git/hooks/pre-commit)

Then — without me asking — it created a mock broken component with intentional violations, staged it, and ran the hook against itself to verify.

Result:

5 Critical issues detected:
  - Missing alt tags
  - Custom clickable divs lacking tabIndex/onKeyDown handlers  
  - Empty button
  - Unlabeled form inputs

3 Warnings:
  - Redundant alt terms
  - Positive tabIndex anti-patterns
  - Unlabelled decorative SVG/Lucide icons

→ Commit blocked. Fix critical issues or use --no-verify to bypass.

It caught real violations, blocked the commit, displayed results in a console table, then cleaned up the mock component and reset git state. The pre-commit hook is now active in the SoilSense AI repo.
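For a sense of what static rule enforcement of this kind can look like, here is a minimal sketch covering four of the reported rule types. It is not the check-a11y.py the agent generated (its contents are not shown here); the `Finding` type, the `check_source` function, and the regex-based approach are all assumptions. A production checker would more likely use a real JSX parser, since these regexes will misread attributes that contain a bare `>` comparison.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    severity: str  # "critical" or "warning"
    line: int
    message: str


# Opening tags only; "</div>" does not match because "/" is not a letter.
TAG = re.compile(r"<(?P<name>[A-Za-z][\w.]*)(?P<attrs>[^<>]*)>")
EMPTY_BUTTON = re.compile(r"<button(?P<attrs>[^<>]*)>\s*</button>")
POSITIVE_TABINDEX = re.compile(r"tabIndex=\{?[\"']?[1-9]")


def check_source(source: str) -> list[Finding]:
    """Scan JSX/TSX text for a handful of accessibility problems."""
    # Hide arrow functions so the ">" in "=>" is not read as the end of a tag.
    # The replacement has the same length, so positions and line numbers hold.
    text = source.replace("=>", "=~")
    findings: list[Finding] = []

    def line_of(pos: int) -> int:
        return text.count("\n", 0, pos) + 1

    for tag in TAG.finditer(text):
        name, attrs, line = tag["name"], tag["attrs"], line_of(tag.start())
        if name == "img" and "alt=" not in attrs:
            findings.append(Finding("critical", line, "<img> is missing alt text"))
        if name == "div" and "onClick=" in attrs and not (
            "tabIndex=" in attrs and "onKeyDown=" in attrs
        ):
            findings.append(
                Finding("critical", line, "clickable <div> lacks tabIndex/onKeyDown")
            )
        if POSITIVE_TABINDEX.search(attrs):
            findings.append(
                Finding("warning", line, "positive tabIndex overrides tab order")
            )

    for button in EMPTY_BUTTON.finditer(text):
        if "aria-label=" not in button["attrs"]:
            findings.append(
                Finding("critical", line_of(button.start()), "empty <button> has no accessible name")
            )
    return findings
```

Splitting findings into "critical" and "warning" mirrors the two tiers in the report above, which is what lets a hook block on one tier and merely print the other.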

One prompt. No orchestration code. No config files written by hand.

That's the thing nobody is explaining in I/O coverage: the skill file didn't just change what the agent knows — it changed what the agent does to your repository.
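The enforcement layer is the part that touches the repository, so it is worth seeing how little a pre-commit hook needs. Below is a hedged Python sketch, not the hook Antigravity wrote: `check` is a hypothetical callable that returns (severity, message) pairs for a file's text, and the `.jsx`/`.tsx` filter is an assumption about what counts as a component. The two git commands are standard: `git diff --cached --name-only --diff-filter=ACM` lists staged files, and `git show :<path>` prints the staged version of one file.

```python
import subprocess

COMPONENT_SUFFIXES = (".jsx", ".tsx")  # assumed definition of "component"


def staged_paths() -> list[str]:
    """Paths added, copied, or modified in the index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def components_only(paths: list[str]) -> list[str]:
    return [p for p in paths if p.endswith(COMPONENT_SUFFIXES)]


def exit_code(severities: list[str]) -> int:
    """Non-zero blocks the commit; warnings alone let it through."""
    return 1 if "critical" in severities else 0


def main(check) -> int:
    severities: list[str] = []
    for path in components_only(staged_paths()):
        # Read the staged blob, not the working-tree file, so unstaged edits are ignored.
        blob = subprocess.run(
            ["git", "show", f":{path}"],
            capture_output=True, text=True, check=True,
        ).stdout
        for severity, message in check(blob):
            print(f"{severity.upper():8} {path}: {message}")
            severities.append(severity)
    return exit_code(severities)

# As a hook, the file would end with: raise SystemExit(main(some_checker))
```

Saved as .git/hooks/pre-commit and made executable, a script like this blocks the commit whenever `main` returns 1. As the output above notes, `git commit --no-verify` skips it entirely, so a hook is a guardrail rather than a guarantee.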


The Gemini CLI Retirement Nobody Is Explaining Clearly

Here's the detail buried in the Antigravity 2.0 announcement: Gemini CLI shuts down for consumer tiers on June 18, 2026. That's not optional. Free tier, AI Pro, AI Ultra — same message for all.

What you're migrating to:

| Gemini CLI | Antigravity CLI |
| --- | --- |
| Node.js runtime | Go binary, zero runtime dependencies |
| GEMINI.md | AGENTS.md / agent.json |
| .gemini/skills/ | .agents/skills/{name}/SKILL.md |
| Gemini models only | Gemini 3.5 Flash + Claude + GPT-OSS |
| Chat-first | Agent orchestration-first |
| Open source | Closed software |

The multi-model routing is worth pausing on. Antigravity CLI supports Claude and GPT-OSS models through the same interface — you're not locked to Gemini at the CLI layer. The Managed Agents API is Gemini 3.5 Flash specifically, but locally you have model choice.

The last row is the one I keep thinking about. Gemini CLI was open source. Tens of thousands of contributors, forks, extensions built on it. Antigravity is closed. Google is moving developer tooling into its monetization stack and calling it an upgrade. That's accurate. It's also incomplete.


What the 129 Built-In Skills Actually Signal

When /skills showed 129 built-in skills, I scrolled through them. A few that caught my eye:

  • agency-agentic-search-optimizer — audits whether AI agents can actually accomplish tasks on your site (WebMCP readiness)
  • agency-ai-data-remediation-engineer — self-healing data pipelines using air-gapped local SLMs
  • agency-autonomous-optimization-architect — shadow-tests APIs for performance while enforcing financial constraints
  • agency-codebase-onboarding-engineer — helps new engineers understand unfamiliar codebases

These aren't autocomplete improvements. They're behaviors — things the agent will do autonomously when invoked. The skill file is the instruction set. The agent is the executor.

The accessibility reviewer I built for SoilSense AI is now skill number 130. It lives at ~/.gemini/config/skills/soilsense-accessibility-reviewer/SKILL.md. Because that is a global path rather than a project one, every future Antigravity session on this machine can invoke it, not only sessions inside SoilSense AI.

That's the primitive. Not a feature. A composable unit of agent behavior that lives in version control.


Where I'd Push Back

A few things I'm not ready to be hyped about.

The closed-source problem is real. Gemini CLI being open source meant the community could audit the tool that had file system access to their codebases. Antigravity is closed. The pre-commit hook it created runs code from ~/.gemini/config/skills/ — a path Google controls the contents of at install time. For personal projects, fine. For anything enterprise, you need answers about what the agent runtime can and can't do with your code before you're committed.

proceed is doing a lot of work. The agent asked for approval before executing. I typed proceed without reading the full implementation plan. It created files in four locations, modified git hooks, and ran git commit against a real repository. The workflow assumes you'll review the plan carefully. In practice, under deadline pressure, most developers won't. That's a governance problem, not a technical one — but it's the kind of thing that causes incidents.
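One lightweight guard is to scan the plan for writes that deserve a second look before typing proceed. The sketch below assumes the plan is available as text in the same `Create(path)` form as the log earlier in this post; whether Antigravity exposes its plan that way before executing is not something I have confirmed. `risky_writes` and its two rules are hypothetical.

```python
import re
from pathlib import PurePosixPath

CREATE = re.compile(r"^Create\((?P<path>.+)\)$")


def risky_writes(plan: str) -> list[tuple[str, str]]:
    """Return (path, reason) pairs for planned writes worth reviewing by hand."""
    flagged: list[tuple[str, str]] = []
    for line in plan.splitlines():
        match = CREATE.match(line.strip())
        if not match:
            continue
        path = match["path"]
        parts = PurePosixPath(path).parts
        # Home-relative or absolute paths land outside the project directory.
        if path.startswith(("~", "/")):
            flagged.append((path, "writes outside the workspace"))
        if ".git" in parts and "hooks" in parts:
            flagged.append((path, "installs a git hook"))
    return flagged
```

Run against the four `Create(...)` lines from my session, this flags the two files under ~/.gemini/config/skills/ and the pre-commit hook, which are exactly the three writes I approved without reading.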

Skill scope creep is easy. The accessibility reviewer skill is global — it lives in ~/.gemini/config/skills/, not in the SoilSense AI project directory. That means it's available in every Antigravity session across every project on this machine. That's convenient. It's also how you end up with 60 global skills that conflict with each other in ways that are hard to debug. Antigravity's skill priority system (Workspace → Global → Shared) handles this, but you have to know it exists.
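To see why the priority order matters, here is a minimal sketch of name resolution across scopes. It assumes the precedence is exactly Workspace, then Global, then Shared, and that a skill's identity is its directory name; both are my reading, not documented behavior. `resolve_skills` is a hypothetical helper.

```python
from pathlib import Path

# Assumed precedence, highest first: Workspace -> Global -> Shared.
PRIORITY = ("workspace", "global", "shared")


def resolve_skills(
    found: dict[str, dict[str, Path]],
) -> tuple[dict[str, tuple[str, Path]], list[tuple[str, str, str]]]:
    """Pick one SKILL.md per skill name and report the copies it shadows.

    `found` maps scope -> {skill_name: path to SKILL.md}.
    Returns (active, shadowed), where shadowed holds
    (skill name, losing scope, winning scope) tuples.
    """
    active: dict[str, tuple[str, Path]] = {}
    shadowed: list[tuple[str, str, str]] = []
    for scope in PRIORITY:
        for name, path in sorted(found.get(scope, {}).items()):
            if name in active:
                shadowed.append((name, scope, active[name][0]))
            else:
                active[name] = (scope, path)
    return active, shadowed
```

The `shadowed` list is the useful output: any entry in it is a skill that exists on disk but never runs, which is the hard-to-debug case described above.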


Getting Started (Windows, since that's what I actually used)

# Download from https://antigravity.google.com/download
# Or via winget (if available in your region)
winget install Google.AntigravityCLI

# Navigate to your project
cd "C:\Users\you\Projects\your-project"

# Launch
agy

# Inside the shell — explore what's available
/skills    # See 129 built-in skills + any you've created
/agents    # See available agents (just /default to start)

# Create your first skill with a plain English prompt
# Example: "create a skill that enforces our API response schema before any PR"

Start with /skills before writing anything. There's a good chance what you want already exists in the 129 built-ins. The skill creator workflow (plain English → agent builds SKILL.md + supporting scripts + tests) is the fastest path to something that actually runs.


The Real Take

Google didn't ship a better autocomplete at I/O 2026. They shipped a runtime for agent behavior — and gave you a text file as the configuration interface.

One prompt to Antigravity CLI created a WCAG accessibility reviewer, a Python static analysis engine, a git pre-commit hook, and a self-verification test — for a real Android/Capacitor project I'm actually building. The commit hook is active right now. It will block the next accessibility violation before it hits the repo.

The Gemini 3.5 Flash benchmarks will be obsolete in six months. A skill file that enforces your team's standards on every commit — that compounds.

The platform is impressive. The 130th skill is what makes it real.

What would you build as your first custom skill — a linter rule, a PR description generator, or something specific to your stack? Especially curious if anyone has gotten workspace-scoped skills working alongside the global ones without conflicts.

Top comments (27)

Ofri Peretz

The part where the agent read your git status and project structure before proposing the hybrid approach (LLM skill + static Python checker + git hook) is exactly how I'd expect a competent engineer to scope the work — not just generate code, but understand the constraint surface first. I've been running static analysis at scale for years, and the hardest part is always the same: getting developers to actually run the checks before pushing. A pre-commit hook that an AI wired up end-to-end, including the enforcement layer, is legitimately useful if it doesn't produce a flood of false positives. The real test is whether that check-a11y.py script is maintainable six months from now when WCAG 2.2 rules change or your component patterns evolve.

Sreejit Pradhan

Exactly. The impressive part wasn’t the codegen, it was the agent understanding the repo and constraint surface before deciding on architecture. Hybrid enforcement (LLM skill + deterministic checker + git hook) is the only approach that scales realistically. The real benchmark isn’t “does it work today”, it’s whether that checker survives evolving WCAG rules and component drift 6 months later without becoming noise developers bypass.

UnitBuilds

I've been loving it. I'm currently porting the SDK to Windows (a rather annoying task, thanks Google for the support) to create a swarmer and mobile interface, so it's easier to interface with and manage multiple projects at once without sitting at my desk. The custom skills system fits perfectly with my framework and what I built over the past year. I had built the system from scratch to utilize Vertex AI, but with the new Gemini Builds I'm out of the cloud market, so I'm pivoting to Sovereign systems, where those skills make a huge difference for multi-agent management. The orchestrator no longer needs to share the scope of a worker agent, and worker agents don't all need to be confined to the same skillset. It's awesome!

Sreejit Pradhan

Exactly. The biggest win is capability isolation. Orchestrator no longer needs full worker scope, and workers no longer need bloated shared cognition. Skills turn agents into modular execution contexts instead of monolithic assistants. That changes multi-agent orchestration completely.

UnitBuilds

Not to mention scalability. My previous swarm system had to manually assign skills and the customized MCP protocol at runtime, but it was fixed skills, fixed MCP. Now it's modular and dynamic: I can have the orchestrator create a custom skill file (which I highly recommend if you use Gemini 3.5 Flash; I suspect it's a MoE, which would explain the speed and Pro-level knowledge), which can cut your baseline context window considerably while improving code quality. E.g. my pipeline uses an orchestrator, Tier N managers, then per-file workers. I can now have managers carry skill files that hold the context of their module in the scope of the refactor, resulting in worker skills being written with the end goal in mind.

The other awesome features they added were /schedule and /goal. Those two mean you can have regular-interval actions, e.g. checking a discourse db for proposed changes to shared files, so there's no more toe stepping and structural re-alignment at set intervals. Goal means you can set and forget: it'll continue till it's done. If you want to optimize a system, you set the goal post and you leave it. Stupid example, but theoretically possible: you can tell it 'here's a compression system, improve it until we've reached 90% reduction, while remaining lossless'. Yes, you'll need to put in anti-loop guardrails, but theoretically you could leave that overnight and wake up to a successful algo, whereas previously it would hit a wall. Combined with scheduled continues, even if it hits a 5h window quota, you've scheduled a restart for a minute after it hits, so it can run continuously, indefinitely.

Sreejit Pradhan

Exactly. This is the first time these systems actually feel architecturally scalable instead of just “bigger prompts + more agents.” Dynamic skill generation completely changes orchestration because cognition becomes modular and runtime-scoped instead of globally shared. Having managers carry module-level intent and constraints down to per-file workers is far cleaner than stuffing everything into a giant shared context window.

The /goal and /schedule additions are honestly the bigger breakthrough though. That introduces persistence, temporal continuity, and autonomous iteration into agent systems. At that point they stop behaving like session-bound chat assistants and start looking more like distributed execution systems. Continuous scheduled recovery, long-horizon optimization loops, discourse/state synchronization between workers — that’s a very different category of infrastructure than most people realize.

UnitBuilds

Exactly. Add to that JIT MCP configuration to enable/disable tools for each worker, you have a lean, mean, development machine

Sreejit Pradhan

Exactly. JIT MCP configuration is a massive part of making this actually scalable in practice. Workers no longer need permanent access to every tool or protocol upfront — capabilities become ephemeral and task-scoped. That keeps agents leaner, reduces unnecessary context/tool exposure, and makes orchestration far more deterministic. Combined with dynamic skills, it starts looking less like “AI agents” and more like distributed cognitive infrastructure.

Syed Ahmer Shah

Shifting the focus from raw model benchmarks to the 'skill file' standard highlights what actually matters for production: execution boundaries and context management.

Sreejit Pradhan

Exactly. Models are becoming commodities.
The real differentiation is shifting toward orchestration, memory, execution boundaries, and how intelligence is packaged into reusable skills.

shogun 444

This was a fascinating read. The most interesting part wasn’t Gemini 3.5 Flash. It was the shift from “AI assistant” to composable agent through SKILL.md. The accessibility-reviewer example made the whole thing feel very real very quickly.

Sreejit Pradhan

Exactly. That’s the moment it stopped feeling like a demo and started feeling like infrastructure.
SKILL.md turns AI from a chatbot into a composable execution layer and that changes the entire trajectory of agent design.

Vic Chen

Really enjoyed the distinction here between agent config and reusable skills. As someone building AI products, I think that “JSON for behavior, markdown for capability” mental model is much closer to how production systems actually evolve than the keynote version. The hybrid point also landed for me — pairing LLM judgment with deterministic checks and pre-commit enforcement is where these workflows start feeling durable instead of demo-friendly.

Sreejit Pradhan

Thanks man!
Your take on this system is absolutely correct. Seeing the new agentic coding systems and their enhanced Agentic COT Reasoning now implemented with proper deterministic checks is now extremely powerful. Using AI to build production grade systems is now on a much higher level. We are now very close to this system of "Prompt, Build, Review and Ship", being perfect. Or who knows some company pulls out "Superhuman Coder" and we just hit an enter button and do nothing 😂

Suny Choudhary

This is a good point. The model announcement gets the attention, but the skill file idea may matter more for actual builders.

A stronger model helps, but repeatable behavior comes from giving the AI clearer operating context: project rules, preferences, workflows, constraints, examples, and decision patterns.

That is what most teams are missing. They keep asking for smarter models when the real problem is that every session starts with too much missing context.

Skill files feel like a step toward making AI assistants more consistent inside real work. Not just “answer this prompt,” but “understand how this team or project wants work done.”

The risk is that people treat skill files like another prompt hack. The useful version needs versioning, review, and cleanup, otherwise it becomes stale context that quietly shapes bad outputs.

Sreejit Pradhan

Exactly. That’s the shift I was trying to point at in the article — intelligence alone doesn’t create consistency. Operational context does. Most failures in real workflows come from missing constraints, patterns, and team-specific expectations, not lack of raw model capability.

And I completely agree on the risk side too. If skill files just become giant unmaintained prompt dumps, they’ll decay fast and start introducing invisible behavioral drift. The useful long-term version probably looks much closer to software infrastructure: versioned, reviewed, modular, testable, and continuously refined alongside the codebase itself.

xulingfeng

agent frameworks.

Sreejit Pradhan

Yess🔥

xulingfeng

Totally — the framework landscape moves fast, and picking the wrong one early can be costly. I've been gravitating toward composable, minimal abstractions rather than all-in-one platforms. What's your current go-to when you do reach for a framework?

xulingfeng

this landing for teams that aren't already using agent frameworks.

Sreejit Pradhan

This especially lands for teams that aren’t already deep into agent frameworks.
SKILL.md makes the shift feel practical instead of experimental.

xulingfeng

That's a fair point — agent frameworks can be overkill for simple automation tasks. For my setup, I started with raw function-calling and only introduced a lightweight decision layer when the branching logic got unwieldy. Would be curious what your threshold is for reaching for a framework vs keeping it simple.

Pranay Patikar

Fantastic description really cool🔥🔥

Sreejit Pradhan

Thanks bhaii🥰

digiSal

Hmm, when I do /skills I only see the 2 I recently created. I don't see any of the preinstalled skills that you mentioned.

Sreejit Pradhan

Yeah, that’s because there currently aren’t a bunch of preinstalled/public skills exposed by default. The article was more about the underlying architecture and what it enables rather than a marketplace of bundled skills. The real power comes from creating scoped custom skills dynamically for specific workflows, modules, or agents. That’s where the orchestration and context-isolation benefits really start showing up.
