
Evaluating GitHub Agentic Workflows — From a Claude Code User's Perspective [Part 10]

This article was originally published on Saru Blog (ko-chan.github.io).


What You Will Learn

- What GitHub Agentic Workflows are
- How they differ from traditional GitHub Actions
- Benefits for Claude Code users
- What can be automated in a 200K-line SaaS project
- Key factors in the adoption decision

What Are GitHub Agentic Workflows?

On February 13, 2026, GitHub released Agentic Workflows as a technical preview. Co-developed by GitHub Next, Microsoft Research, and Azure Core Upstream, it's open source under the MIT license.

In short, it's a mechanism for automatically running AI coding agents on GitHub Actions.

Traditional GitHub Actions strictly define "when X happens, do Y" in YAML. Agentic Workflows write "when X happens, make this kind of judgment" in Markdown. The AI makes the judgment.

```
Traditional GitHub Actions:
  Event → YAML-defined steps → Deterministic execution

Agentic Workflows:
  Event → Markdown-described objectives → AI judges and executes
```

How It Works

Workflow Definition

Place Markdown files in `.github/workflows/`. Markdown, not YAML.

```markdown
---
on:
  issues: opened
permissions:
  contents: read
  issues: write
safe-outputs:
  add-comment: true
  add-labels: true
engine: claude
---

## Issue Triage

When a new issue is created, analyze its content and apply appropriate labels.

## Criteria

- Bug report → `bug` label
- Feature request → `enhancement` label
- Question → `question` label
- Security-related → `security` label + raise priority

## Comments

Leave triage results as a comment.
```

The frontmatter specifies the trigger, permissions, and AI engine to use. The body describes "what you want done" in natural language.

Compilation and Execution

```bash
# Compile with CLI (generates lock.yml from Markdown)
gh aw compile

# Manual trigger
gh aw run
```

`gh aw compile` parses the Markdown and generates a `.lock.yml` for GitHub Actions. This lock file is the actual workflow that runs. The Markdown is the human-readable specification; the `.lock.yml` is the machine-executable procedure.
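For example, after compiling the triage workflow above, the repository would contain a pair like this (the filename is illustrative):

```
.github/workflows/
├── issue-triage.md        # human-edited Markdown spec
└── issue-triage.lock.yml  # generated Actions workflow, committed alongside it
```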

Available AI Engines

| Engine | Authentication | Notes |
|---|---|---|
| Copilot CLI | Account auth tied to Copilot license | Default |
| Claude Code | `ANTHROPIC_API_KEY` | Requires Anthropic API key |
| OpenAI Codex | `OPENAI_API_KEY` | Requires OpenAI API key |

The ability to choose Claude Code as the engine makes it a natural choice for developers already using Claude Code.
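For the Claude Code engine, the API key has to be available to the compiled workflow as a repository secret. A minimal setup with the gh CLI (the secret name matches the table above; the key value is a placeholder):

```bash
# Store the Anthropic API key as a repository secret so the
# compiled workflow can authenticate the Claude Code engine.
gh secret set ANTHROPIC_API_KEY --body "sk-ant-..."
```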

Differences from Traditional GitHub Actions

| Aspect | GitHub Actions (YAML) | Agentic Workflows (Markdown) |
|---|---|---|
| Definition | YAML (strict syntax) | Markdown (natural language) |
| Execution nature | Deterministic (same input → same output) | Non-deterministic (AI judges) |
| Best suited for | Builds, tests, deploys | Triage, reviews, reports |
| Permissions | Specified in workflow definition | Read-only by default + safe outputs |
| Error handling | Explicitly defined | AI judges |

The important point is that Agentic Workflows are not a replacement for CI/CD. GitHub's official blog states this explicitly:

> Don't use agentic workflows as a replacement for GitHub Actions YAML workflows for CI/CD.

Builds, tests, and deploys remain with traditional YAML workflows. Agentic Workflows handle "ambiguous tasks" requiring AI judgment. Under the concept of "Continuous AI," they complement existing CI/CD.

Considering Application to the Saru Project

Current Automation

Saru already has the following automated via GitHub Actions:

| Workflow | Purpose | Type |
|---|---|---|
| `build-apis.yml` | Go lint, unit tests, integration tests | CI (YAML) |
| `build-portals.yml` | Frontend type-check, lint, build | CI (YAML) |
| `e2e-tests.yml` | E2E tests for all portals | CI (YAML) |
| `security-scan.yml` | gosec, npm audit | CI (YAML) |
| `cross-post.yml` | Blog cross-posting to platforms | CD (YAML) |

These are all deterministic processes — no reason to replace them with Agentic Workflows.

What Could Be Automated with Agentic Workflows

So what can be automated? Let me identify "manual tasks that require judgment."

1. Automatic Issue Triage

Current state: After creating an issue, I manually add labels and set priority. As a solo developer, I'm the only one doing this.

With Agentic Workflows: Trigger on issue creation to automatically analyze content, apply labels, set priority, and identify related files.

Assessment: Low impact for solo development. Little need to triage issues I wrote myself. Would be effective once the project goes OSS and external issues increase.

2. Automatic CI Failure Investigation

Current state: When CI fails, I read logs, investigate the cause, and fix it. As covered in Part 7, CI stabilization required enormous effort.

With Agentic Workflows: Trigger on CI failure to analyze logs, identify root causes, and automatically create fix PRs.

Assessment: The most compelling use case. Especially for E2E test flaky failures where root cause identification takes time. Even just having AI do the initial investigation would save significant time.
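As a rough sketch of what this could look like, reusing the frontmatter style from the triage example. The `workflow_run` trigger and the `create-issue` safe output are assumptions about the preview's schema that I haven't verified; opening an issue with findings is also a deliberately safer first step than having the agent push fix PRs directly:

```markdown
---
on:
  workflow_run:
    workflows: ["e2e-tests"]
    types: [completed]
permissions:
  contents: read
  actions: read
safe-outputs:
  create-issue: true
engine: claude
---

## CI Failure Investigation

If the triggering run failed, read its job logs, identify the failing
test and the most likely root cause, and open an issue summarizing the
failure, the suspected cause, and a proposed fix. If the run succeeded,
do nothing.
```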

3. Automatic Dependabot PR Triage

Current state: When Dependabot PRs pile up, I review each one individually before merging.

With Agentic Workflows: Trigger on Dependabot PRs to review changes and make judgments: "patch version + tests pass → auto-merge," "major version → add needs-manual-review label."

Assessment: Effective. Dependabot PR handling is monotonous yet requires judgment — exactly what Agentic Workflows excel at.
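A hedged sketch, again reusing only frontmatter fields from the first example (whether `pull_request: opened` is accepted in this shorthand form is an assumption). Note that it stays within safe outputs: rather than merging anything itself, it leaves a comment or label and keeps the merge a human action:

```markdown
---
on:
  pull_request: opened
permissions:
  contents: read
  pull-requests: write
safe-outputs:
  add-comment: true
  add-labels: true
engine: claude
---

## Dependabot PR Triage

Only act on pull requests authored by dependabot[bot]; otherwise do nothing.

## Criteria

- Patch or minor bump with all checks green → comment that it looks safe to merge
- Major version bump → apply the `needs-manual-review` label and summarize breaking changes
```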

4. Daily Status Report

Current state: None. Development status exists only in my head.

With Agentic Workflows: Auto-generate reports on daily issue/PR status, CI health, and outstanding items.

Assessment: Overkill for solo development. Would be effective for team development or when the project has OSS contributors.
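Even so, for completeness, here is what a minimal version could look like. The cron syntax is standard GitHub Actions, but whether the preview accepts `schedule` in frontmatter, and the `create-issue` safe output, are assumptions:

```markdown
---
on:
  schedule:
    - cron: "0 21 * * *"  # once a day
permissions:
  contents: read
  issues: read
safe-outputs:
  create-issue: true
engine: claude
---

## Daily Status Report

Summarize the last 24 hours: issues and PRs opened or closed, CI pass/fail
rates, and anything that looks stalled. Open the summary as a new issue.
```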

Application Summary

| Use Case | Impact |
|---|---|
| CI failure investigation | High |
| Dependabot PR triage | Medium |
| Issue triage | Low (solo phase) |
| Daily status report | Low (solo phase) |

Concerns About Adoption

1. Cost

Running Agentic Workflows incurs AI engine API calls.

- Copilot: ~2 premium requests per execution (agent execution + safe outputs)
- Claude Code: API billing via `ANTHROPIC_API_KEY`
- Codex: API billing via `OPENAI_API_KEY`

If AI runs on every CI failure, monthly costs become unpredictable. E2E tests especially have many jobs, so failure frequency × API cost must be estimated.
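As a purely illustrative calculation (all numbers are made up): if the E2E suite fails 30 times a month and each agent investigation costs on the order of $0.50 in API usage, that is about $15/month for one workflow; manageable, but a flakier month or a pricier model could easily multiply it.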

2. Technical Preview Instability

As of February 2026, it's still a technical preview. GitHub's official documentation explicitly states "at your own risk." Too early to integrate into production CI/CD pipelines.

Documentation is still developing — details around Markdown frontmatter specifications and engine configuration require some trial-and-error exploration.

3. Trust in Non-Deterministic Execution

In the CI/CD world, "same input → same output" is a fundamental principle. Agentic Workflows are inherently non-deterministic — AI judgment may differ each time.

Safe outputs and read-only defaults provide safety margins, but handling cases like "AI applied the wrong label" or "created an irrelevant fix PR" becomes necessary.

4. Compatibility with Self-Hosted Runners

Saru runs parallel E2E tests on 15 self-hosted runners. Whether Agentic Workflows function correctly on self-hosted runners is unverified. Official documentation mostly assumes GitHub-hosted runners.

5. Coexistence with Claude Code CLI

This is the most important consideration. Saru already uses Claude Code CLI locally for development. If Claude Code also runs automatically on GitHub, clear role separation becomes essential:

```
Local development:
  Human + Claude Code CLI → Code implementation, test creation

On GitHub:
  Copilot → PR review (already in use)
  Agentic Workflows → CI failure investigation, triage (under consideration)
```

Multiple AIs operating on the same repository with different contexts requires clearly defined roles to avoid confusion.

Next Steps

This article stays at the investigation and evaluation level. In the next article, I plan to actually implement Agentic Workflows in the Saru repository and verify:

- Building a CI failure auto-investigation workflow
- Execution with the Claude Code engine
- Operation on self-hosted runners
- Actual cost measurement

Summary

| Item | Detail |
|---|---|
| What are GitHub Agentic Workflows | A mechanism for auto-running AI agents on GitHub Actions |
| Definition method | Natural language in Markdown, not YAML |
| AI engines | Copilot CLI / Claude Code / OpenAI Codex |
| Relationship with CI/CD | Complement, not replacement ("Continuous AI") |
| Effective use cases for solo dev | CI failure investigation, Dependabot PR triage |
| Current judgment | Worth evaluating, but too early for production given technical preview status |
