This article was originally published on Saru Blog.
## What You Will Learn
- What GitHub Agentic Workflows are
- How they differ from traditional GitHub Actions
- Benefits for Claude Code users
- What can be automated in a 200K-line SaaS project
- Key factors in the adoption decision
## What Are GitHub Agentic Workflows?

GitHub released Agentic Workflows as a technical preview on February 13, 2026. Co-developed by GitHub Next, Microsoft Research, and Azure Core Upstream, it is open source under the MIT license.

In short: a mechanism for automatically running AI coding agents on GitHub Actions.
Traditional GitHub Actions strictly define "when X happens, do Y" in YAML. With Agentic Workflows, you write "when X happens, make this kind of judgment" in Markdown, and the AI makes the judgment.
```
Traditional GitHub Actions:
  Event → YAML-defined steps → Deterministic execution

Agentic Workflows:
  Event → Markdown-described objectives → AI judges and executes
```
## How It Works

### Workflow Definition

Place Markdown files in `.github/workflows/`. Markdown, not YAML.
```markdown
---
on:
  issues:
    types: [opened]
permissions:
  contents: read
  issues: write
safe-outputs:
  add-comment: true
  add-labels: true
engine: claude
---

## Issue Triage

When a new issue is created, analyze its content and apply appropriate labels.

## Criteria

- Bug report → `bug` label
- Feature request → `enhancement` label
- Question → `question` label
- Security-related → `security` label + raise priority

## Comments

Leave triage results as a comment.
```
The frontmatter specifies the trigger, permissions, and AI engine to use. The body describes "what you want done" in natural language.
### Compilation and Execution

```bash
# Compile with the CLI (generates lock.yml from the Markdown)
gh aw compile

# Manual trigger
gh aw run
```
`gh aw compile` parses the Markdown and generates a `.lock.yml` file for GitHub Actions. This lock file is the actual workflow that runs. The Markdown is the human-readable specification; the `.lock.yml` is the machine-executable procedure.
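For reference, the `gh aw` CLI is distributed as a GitHub CLI extension. Assuming the open-source repository is `githubnext/gh-aw` (the project is co-developed by GitHub Next, but I have not re-verified the exact extension name), installation is the standard one-liner:

```bash
# One-time setup: install the gh-aw extension (repository name assumed)
gh extension install githubnext/gh-aw
```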
### Available AI Engines

| Engine | Authentication | Notes |
|---|---|---|
| Copilot CLI | Account auth tied to Copilot license | Default |
| Claude Code | `ANTHROPIC_API_KEY` | Requires an Anthropic API key |
| OpenAI Codex | `OPENAI_API_KEY` | Requires an OpenAI API key |
Being able to choose Claude Code as the engine makes Agentic Workflows a natural fit for developers who already use Claude Code.
## Differences from Traditional GitHub Actions
| Aspect | GitHub Actions (YAML) | Agentic Workflows (Markdown) |
|---|---|---|
| Definition | YAML (strict syntax) | Markdown (natural language) |
| Execution nature | Deterministic (same input → same output) | Non-deterministic (AI judges) |
| Best suited for | Builds, tests, deploys | Triage, reviews, reports |
| Permissions | Specified in workflow definition | Read-only by default + safe outputs |
| Error handling | Explicitly defined | AI judges |
The important point is that Agentic Workflows are not a replacement for CI/CD. GitHub's official blog states this explicitly:
> Don't use agentic workflows as a replacement for GitHub Actions YAML workflows for CI/CD.
Builds, tests, and deploys remain with traditional YAML workflows. Agentic Workflows handle "ambiguous tasks" requiring AI judgment. Under the concept of "Continuous AI," they complement existing CI/CD.
## Considering Application to the Saru Project

### Current Automation
Saru already has the following automated via GitHub Actions:
| Workflow | Purpose | Type |
|---|---|---|
| build-apis.yml | Go lint, unit tests, integration tests | CI (YAML) |
| build-portals.yml | Frontend type-check, lint, build | CI (YAML) |
| e2e-tests.yml | E2E tests for all portals | CI (YAML) |
| security-scan.yml | gosec, npm audit | CI (YAML) |
| cross-post.yml | Blog cross-posting to platforms | CD (YAML) |
These are all deterministic processes — no reason to replace them with Agentic Workflows.
### What Could Be Automated with Agentic Workflows

So what can be automated? Let me identify "manual tasks that require judgment."

#### 1. Automatic Issue Triage
Current state: After creating an issue, I manually add labels and set priority. As a solo developer, I'm the only one doing this.
With Agentic Workflows: Trigger on issue creation to automatically analyze content, apply labels, set priority, and identify related files.
Assessment: Low impact for solo development. Little need to triage issues I wrote myself. Would be effective once the project goes OSS and external issues increase.
#### 2. Automatic CI Failure Investigation
Current state: When CI fails, I read logs, investigate the cause, and fix it. As covered in Part 7, CI stabilization required enormous effort.
With Agentic Workflows: Trigger on CI failure to analyze logs, identify root causes, and automatically create fix PRs.
Assessment: The most compelling use case. Especially for E2E test flaky failures where root cause identification takes time. Even just having AI do the initial investigation would save significant time.
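To make this concrete, here is a minimal sketch of what such a workflow might look like, reusing the frontmatter style from the triage example above. The `workflow_run` trigger follows standard GitHub Actions syntax, but the watched workflow name and the `create-issue` safe output are my assumptions, not fields I have verified against the preview docs:

```markdown
---
on:
  workflow_run:
    workflows: ["e2e-tests"]   # hypothetical: watch the E2E workflow by name
    types: [completed]
permissions:
  contents: read
  actions: read                # needed to fetch run logs
safe-outputs:
  create-issue: true           # assumed field name, unverified
engine: claude
---

## CI Failure Investigation

When the watched workflow run has failed, fetch its logs, identify the most
likely root cause (flaky test, infrastructure, or code regression), and open
an issue summarizing the findings with links to the failing jobs.
```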
#### 3. Automatic Dependabot PR Triage
Current state: When Dependabot PRs pile up, I review each one individually before merging.
With Agentic Workflows: Trigger on Dependabot PRs to review changes and make judgments: "patch version + tests pass → auto-merge," "major version → add needs-manual-review label."
Assessment: Effective. Dependabot PR handling is monotonous yet requires judgment — exactly what Agentic Workflows excel at.
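A minimal sketch of the shape this could take, again borrowing the frontmatter fields from the triage example. Since I have not confirmed whether safe outputs can merge PRs, this version only comments and labels:

```markdown
---
on:
  pull_request:
    types: [opened]
permissions:
  contents: read
  pull-requests: write
safe-outputs:
  add-comment: true
  add-labels: true
engine: claude
---

## Dependabot PR Triage

Only act on pull requests opened by `dependabot[bot]`; otherwise do nothing.

- Patch or minor bump with passing tests → comment that it looks safe to merge
- Major version bump → apply the `needs-manual-review` label and summarize
  the breaking changes from the dependency's changelog
```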
#### 4. Daily Status Report
Current state: None. Development status exists only in my head.
With Agentic Workflows: Auto-generate reports on daily issue/PR status, CI health, and outstanding items.
Assessment: Overkill for solo development. Would be effective for team development or when the project has OSS contributors.
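For completeness, a scheduled version would presumably use standard Actions cron syntax. A sketch, with the `create-issue` safe output again an unverified assumption:

```markdown
---
on:
  schedule:
    - cron: "0 9 * * 1-5"   # weekdays at 09:00 UTC
permissions:
  contents: read
  issues: read
safe-outputs:
  create-issue: true        # assumed field name, unverified
engine: claude
---

## Daily Status Report

Summarize open issues and PRs, CI health over the last 24 hours, and any
stale items, then publish the result as a new issue.
```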
### Application Summary
| Use Case | Impact | Priority |
|---|---|---|
| CI failure investigation | High | ◎ |
| Dependabot PR triage | Medium | ○ |
| Issue triage | Low (solo phase) | △ |
| Daily status report | Low (solo phase) | △ |
## Concerns About Adoption

### 1. Cost

Every Agentic Workflow run incurs AI engine costs:

- Copilot: ~2 premium requests per execution (agent execution + safe outputs)
- Claude Code: API billing via `ANTHROPIC_API_KEY`
- Codex: API billing via `OPENAI_API_KEY`
If AI runs on every CI failure, monthly costs become unpredictable. E2E tests especially have many jobs, so failure frequency × API cost must be estimated.
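A back-of-the-envelope model makes that estimate concrete. The numbers below are invented for illustration; the real per-run cost depends on log size and engine pricing:

```
monthly cost ≈ CI failures/month × agent runs per failure × cost per run
e.g.         ≈ 30 × 1 × $0.30 ≈ $9/month (illustrative numbers only)
```

A single long E2E log can inflate the token count of a run, so measuring the per-run cost first is the safer order.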
### 2. Technical Preview Instability
As of February 2026, it's still a technical preview. GitHub's official documentation explicitly states "at your own risk." Too early to integrate into production CI/CD pipelines.
Documentation is still developing — details around Markdown frontmatter specifications and engine configuration require some trial-and-error exploration.
### 3. Trust in Non-Deterministic Execution
In the CI/CD world, "same input → same output" is a fundamental principle. Agentic Workflows are inherently non-deterministic — AI judgment may differ each time.
Safe outputs and read-only defaults provide safety margins, but you still need a plan for cases like "the AI applied the wrong label" or "the AI created an irrelevant fix PR."
### 4. Compatibility with Self-Hosted Runners
Saru runs parallel E2E tests on 15 self-hosted runners. Whether Agentic Workflows function correctly on self-hosted runners is unverified. Official documentation mostly assumes GitHub-hosted runners.
### 5. Coexistence with Claude Code CLI
This is the most important consideration. Saru already uses Claude Code CLI locally for development. If Claude Code also runs automatically on GitHub, clear role separation becomes essential:
```
Local development:
  Human + Claude Code CLI → Code implementation, test creation

On GitHub:
  Copilot → PR review (already in use)
  Agentic Workflows → CI failure investigation, triage (under consideration)
```
Multiple AIs operating on the same repository with different contexts requires clearly defined roles to avoid confusion.
## Next Steps

This article stops at investigation and evaluation. In the next article, I plan to actually implement Agentic Workflows in the Saru repository and verify:
- Building a CI failure auto-investigation workflow
- Execution with the Claude Code engine
- Operation on self-hosted runners
- Actual cost measurement
## Summary
| Item | Detail |
|---|---|
| What are GitHub Agentic Workflows | A mechanism for auto-running AI agents on GitHub Actions |
| Definition method | Natural language in Markdown, not YAML |
| AI engines | Copilot CLI / Claude Code / OpenAI Codex |
| Relationship with CI/CD | Complement, not replacement (Continuous AI) |
| Effective use cases for solo dev | CI failure investigation, Dependabot PR triage |
| Current judgment | Worth evaluating, but too early for production given technical preview status |
## Series Articles
- Part 1: Fighting Unmaintainable Complexity with Automation
- Part 2: Automating WebAuthn Tests in CI
- Part 3: Next.js x Go Monorepo Architecture
- Part 4: Multi-Tenant Isolation with PostgreSQL RLS
- Part 5: Multi-Portal Authentication Pitfalls
- Part 6: Developing a 200K-Line SaaS Alone with Claude Code
- Part 7: Landmines and Solutions in Self-Hosted CI/CD
- Part 8: Turning Solo Development into Team Development with Claude Code Agent Teams
- Part 9: pnpm + Next.js Standalone + Docker: 5 Failures Before Success
- Part 10: Evaluating GitHub Agentic Workflows (this article)