
Evaluating GitHub Agentic Workflows — From a Claude Code User's Perspective [Part 10]

This article was originally published on Saru Blog (ko-chan.github.io).


What You Will Learn

- What GitHub Agentic Workflows are
- How they differ from traditional GitHub Actions
- Benefits for Claude Code users
- What can be automated in a 200K-line SaaS project
- Key factors in the adoption decision

What Are GitHub Agentic Workflows?

On February 13, 2026, GitHub released Agentic Workflows as a technical preview. Co-developed by GitHub Next, Microsoft Research, and Azure Core Upstream, it's open source under the MIT license.

In short, it's a mechanism for automatically running AI coding agents on GitHub Actions.

Traditional GitHub Actions strictly define "when X happens, do Y" in YAML. Agentic Workflows write "when X happens, make this kind of judgment" in Markdown. The AI makes the judgment.

```
Traditional GitHub Actions:
  Event → YAML-defined steps → Deterministic execution

Agentic Workflows:
  Event → Markdown-described objectives → AI judges and executes
```

How It Works

Workflow Definition

Place Markdown files in `.github/workflows/`. Markdown, not YAML.

```markdown
---
on:
  issues: opened
permissions:
  contents: read
  issues: write
safe-outputs:
  add-comment: true
  add-labels: true
engine: claude
---

## Issue Triage

When a new issue is created, analyze its content and apply appropriate labels.

## Criteria

- Bug report → `bug` label
- Feature request → `enhancement` label
- Question → `question` label
- Security-related → `security` label + raise priority

## Comments

Leave triage results as a comment.
```

The frontmatter specifies the trigger, permissions, and AI engine to use. The body describes "what you want done" in natural language.

Compilation and Execution

```bash
# Compile with CLI (generates lock.yml from Markdown)
gh aw compile

# Manual trigger
gh aw run
```

`gh aw compile` parses the Markdown and generates a `.lock.yml` for GitHub Actions. This lock file is the actual workflow that runs. The Markdown is the human-readable specification; the `.lock.yml` is the machine-executable procedure.
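For example, after compiling the triage workflow above, the repository would contain a pair like this (the filename is illustrative):

```
.github/workflows/
├── issue-triage.md        # human-edited Markdown spec
└── issue-triage.lock.yml  # generated Actions workflow, committed alongside it
```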

Available AI Engines

| Engine | Authentication | Notes |
|---|---|---|
| Copilot CLI | Account auth tied to Copilot license | Default |
| Claude Code | `ANTHROPIC_API_KEY` | Requires Anthropic API key |
| OpenAI Codex | `OPENAI_API_KEY` | Requires OpenAI API key |

The ability to choose Claude Code as the engine makes it a natural choice for developers already using Claude Code.
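For the Claude Code engine, the API key has to be available to the compiled workflow as a repository secret. A minimal setup with the gh CLI (the secret name matches the table above; the key value is a placeholder):

```bash
# Store the Anthropic API key as a repository secret so the
# compiled workflow can authenticate the Claude Code engine.
gh secret set ANTHROPIC_API_KEY --body "sk-ant-..."
```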

Differences from Traditional GitHub Actions

| Aspect | GitHub Actions (YAML) | Agentic Workflows (Markdown) |
|---|---|---|
| Definition | YAML (strict syntax) | Markdown (natural language) |
| Execution nature | Deterministic (same input → same output) | Non-deterministic (AI judges) |
| Best suited for | Builds, tests, deploys | Triage, reviews, reports |
| Permissions | Specified in workflow definition | Read-only by default + safe outputs |
| Error handling | Explicitly defined | AI judges |

The important point is that Agentic Workflows are not a replacement for CI/CD. GitHub's official blog states this explicitly:

> Don't use agentic workflows as a replacement for GitHub Actions YAML workflows for CI/CD.

Builds, tests, and deploys remain with traditional YAML workflows. Agentic Workflows handle "ambiguous tasks" requiring AI judgment. Under the concept of "Continuous AI," they complement existing CI/CD.

Considering Application to the Saru Project

Current Automation

Saru already has the following automated via GitHub Actions:

| Workflow | Purpose | Type |
|---|---|---|
| `build-apis.yml` | Go lint, unit tests, integration tests | CI (YAML) |
| `build-portals.yml` | Frontend type-check, lint, build | CI (YAML) |
| `e2e-tests.yml` | E2E tests for all portals | CI (YAML) |
| `security-scan.yml` | gosec, npm audit | CI (YAML) |
| `cross-post.yml` | Blog cross-posting to platforms | CD (YAML) |

These are all deterministic processes — no reason to replace them with Agentic Workflows.

What Could Be Automated with Agentic Workflows

So what can be automated? Let me identify "manual tasks that require judgment."

1. Automatic Issue Triage

Current state: After creating an issue, I manually add labels and set priority. As a solo developer, I'm the only one doing this.

With Agentic Workflows: Trigger on issue creation to automatically analyze content, apply labels, set priority, and identify related files.

Assessment: Low impact for solo development. Little need to triage issues I wrote myself. Would be effective once the project goes OSS and external issues increase.

2. Automatic CI Failure Investigation

Current state: When CI fails, I read logs, investigate the cause, and fix it. As covered in Part 7, CI stabilization required enormous effort.

With Agentic Workflows: Trigger on CI failure to analyze logs, identify root causes, and automatically create fix PRs.

Assessment: The most compelling use case. Especially for E2E test flaky failures where root cause identification takes time. Even just having AI do the initial investigation would save significant time.
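As a rough sketch of what this could look like, reusing the frontmatter style from the triage example. The `workflow_run` trigger and the `create-issue` safe output are assumptions about the preview's schema that I haven't verified; opening an issue with findings is also a deliberately safer first step than having the agent push fix PRs directly:

```markdown
---
on:
  workflow_run:
    workflows: ["e2e-tests"]
    types: [completed]
permissions:
  contents: read
  actions: read
safe-outputs:
  create-issue: true
engine: claude
---

## CI Failure Investigation

If the triggering run failed, read its job logs, identify the failing
test and the most likely root cause, and open an issue summarizing the
failure, the suspected cause, and a proposed fix. If the run succeeded,
do nothing.
```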

3. Automatic Dependabot PR Triage

Current state: When Dependabot PRs pile up, I review each one individually before merging.

With Agentic Workflows: Trigger on Dependabot PRs to review changes and make judgments: "patch version + tests pass → auto-merge," "major version → add needs-manual-review label."

Assessment: Effective. Dependabot PR handling is monotonous yet requires judgment — exactly what Agentic Workflows excel at.
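A hedged sketch, again reusing only frontmatter fields from the first example (whether `pull_request: opened` is accepted in this shorthand form is an assumption). Note that it stays within safe outputs: rather than merging anything itself, it leaves a comment or label and keeps the merge a human action:

```markdown
---
on:
  pull_request: opened
permissions:
  contents: read
  pull-requests: write
safe-outputs:
  add-comment: true
  add-labels: true
engine: claude
---

## Dependabot PR Triage

Only act on pull requests authored by dependabot[bot]; otherwise do nothing.

## Criteria

- Patch or minor bump with all checks green → comment that it looks safe to merge
- Major version bump → apply the `needs-manual-review` label and summarize breaking changes
```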

4. Daily Status Report

Current state: None. Development status exists only in my head.

With Agentic Workflows: Auto-generate reports on daily issue/PR status, CI health, and outstanding items.

Assessment: Overkill for solo development. Would be effective for team development or when the project has OSS contributors.
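Even so, for completeness, here is what a minimal version could look like. The cron syntax is standard GitHub Actions, but whether the preview accepts `schedule` in frontmatter, and the `create-issue` safe output, are assumptions:

```markdown
---
on:
  schedule:
    - cron: "0 21 * * *"  # once a day
permissions:
  contents: read
  issues: read
safe-outputs:
  create-issue: true
engine: claude
---

## Daily Status Report

Summarize the last 24 hours: issues and PRs opened or closed, CI pass/fail
rates, and anything that looks stalled. Open the summary as a new issue.
```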

Application Summary

| Use Case | Impact |
|---|---|
| CI failure investigation | High |
| Dependabot PR triage | Medium |
| Issue triage | Low (solo phase) |
| Daily status report | Low (solo phase) |

Concerns About Adoption

1. Cost

Running Agentic Workflows incurs AI engine API calls.

- Copilot: ~2 premium requests per execution (agent execution + safe outputs)
- Claude Code: API billing via `ANTHROPIC_API_KEY`
- Codex: API billing via `OPENAI_API_KEY`

If AI runs on every CI failure, monthly costs become unpredictable. E2E tests especially have many jobs, so failure frequency × API cost must be estimated.
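As a purely illustrative calculation (all numbers are made up): if the E2E suite fails 30 times a month and each agent investigation costs on the order of $0.50 in API usage, that is about $15/month for one workflow; manageable, but a flakier month or a pricier model could easily multiply it.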

2. Technical Preview Instability

As of February 2026, it's still a technical preview. GitHub's official documentation explicitly states "at your own risk." Too early to integrate into production CI/CD pipelines.

Documentation is still developing — details around Markdown frontmatter specifications and engine configuration require some trial-and-error exploration.

3. Trust in Non-Deterministic Execution

In the CI/CD world, "same input → same output" is a fundamental principle. Agentic Workflows are inherently non-deterministic — AI judgment may differ each time.

Safe outputs and read-only defaults provide safety margins, but handling cases like "AI applied the wrong label" or "created an irrelevant fix PR" becomes necessary.

4. Compatibility with Self-Hosted Runners

Saru runs parallel E2E tests on 15 self-hosted runners. Whether Agentic Workflows function correctly on self-hosted runners is unverified. Official documentation mostly assumes GitHub-hosted runners.

5. Coexistence with Claude Code CLI

This is the most important consideration. Saru already uses Claude Code CLI locally for development. If Claude Code also runs automatically on GitHub, clear role separation becomes essential:

```
Local development:
  Human + Claude Code CLI → Code implementation, test creation

On GitHub:
  Copilot → PR review (already in use)
  Agentic Workflows → CI failure investigation, triage (under consideration)
```

Multiple AIs operating on the same repository with different contexts requires clearly defined roles to avoid confusion.

Next Steps

This article stays at the investigation and evaluation level. In the next article, I plan to actually implement Agentic Workflows in the Saru repository and verify:

- Building a CI failure auto-investigation workflow
- Execution with the Claude Code engine
- Operation on self-hosted runners
- Actual cost measurement

Summary

| Item | Detail |
|---|---|
| What are GitHub Agentic Workflows | A mechanism for auto-running AI agents on GitHub Actions |
| Definition method | Natural language in Markdown, not YAML |
| AI engines | Copilot CLI / Claude Code / OpenAI Codex |
| Relationship with CI/CD | Complement, not replacement ("Continuous AI") |
| Effective use cases for solo dev | CI failure investigation, Dependabot PR triage |
| Current judgment | Worth evaluating, but too early for production given technical preview status |
