DevOps Workflow Engineer

April 2, 2026

The agent generates GitHub Actions workflow YAML, analyzes existing pipelines for optimization opportunities, and creates deployment plans with strategy selection, health checks, and rollback procedures.


Quick Start

# Generate a CI workflow
python scripts/workflow_generator.py --type ci --language python --test-framework pytest

# Analyze existing pipelines for optimization
python scripts/pipeline_analyzer.py .github/workflows/ --format json

# Plan a deployment strategy
python scripts/deployment_planner.py --type webapp --environments dev,staging,prod --strategy canary

Tools Overview

| Tool | Input | Output |
| --- | --- | --- |
| workflow_generator.py | Workflow type + language | GitHub Actions YAML (ci, cd, release, security-scan, docs-check) |
| pipeline_analyzer.py | Workflow file or directory | Optimization findings, cost estimates, severity ratings |
| deployment_planner.py | Project type + environments | Deployment plan with strategy, health checks, rollback |

All tools support --format json and --output for file writing.


Workflow 1: CI Pipeline Design

The agent generates pipelines following fail-fast ordering:

  1. Lint and format (~30s) -- cheapest gate first
  2. Unit tests (~2-5m) -- matrix across versions
  3. Build verification (~3-8m)
  4. Integration tests (~5-15m, parallel with build)
  5. Security scanning (~2-5m)
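
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint

  test:
    needs: lint
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10', '3.11', '3.12']
    steps:
      # Each job starts on a fresh runner, so it needs its own checkout
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "${{ matrix.python-version }}", cache: pip }
      - run: pip install -r requirements.txt
      - run: pytest --junitxml=results.xml

  security:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # pip-audit is not preinstalled on the runner
      - run: pip install pip-audit
      - run: pip-audit -r requirements.txt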

CI targets:

| Metric | Target | Fix |
| --- | --- | --- |
| Total CI time | < 10 min | Parallelize, add caching |
| Lint step | < 1 min | Use pre-commit locally |
| Unit tests | < 5 min | Split suites, use matrix |
| Flaky rate | < 1% | Quarantine flaky tests |
| Cache hit rate | > 80% | Review cache keys |
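
To check a job graph against the 10-minute target, it can help to estimate wall-clock time as the longest chain of `needs` dependencies rather than the sum of all jobs. The Python sketch below is an illustration only and is not part of the shipped scripts; the job names, durations, and function names are assumed values chosen to resemble the pipeline above.

```python
from functools import lru_cache

# Hypothetical job graph: name -> (duration in minutes, jobs it needs).
# Durations are illustrative assumptions, not measurements.
JOBS = {
    "lint": (0.5, []),
    "test": (4.0, ["lint"]),
    "build": (5.0, ["lint"]),
    "integration": (8.0, ["lint"]),
    "security": (3.0, ["lint"]),
}


def wall_clock_minutes(jobs):
    """Return the longest dependency chain, i.e. the best-case wall-clock time."""

    @lru_cache(maxsize=None)
    def finish(name):
        duration, needs = jobs[name]
        # A job starts once every job it needs has finished.
        return duration + max((finish(dep) for dep in needs), default=0.0)

    return max(finish(name) for name in jobs)


def serial_minutes(jobs):
    """Return the total time if every job ran one after another."""
    return sum(duration for duration, _ in jobs.values())


print(wall_clock_minutes(JOBS))  # 8.5
print(serial_minutes(JOBS))      # 20.5
```

With these assumed numbers the parallel layout fits the target while the serial layout does not, which is the argument for fanning out after the lint gate.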

Workflow 2: CD Pipeline and Multi-Environment Deployment

python scripts/deployment_planner.py --type webapp --environments dev,staging,prod --format json

Environment promotion flow:

Build -> Dev (auto) -> Staging (auto) -> Production (manual approval)
                                              |
                                        Canary (10%) -> Full rollout

| Aspect | Dev | Staging | Production |
| --- | --- | --- | --- |
| Trigger | Every push | Merge to main | Manual approval |
| Replicas | 1 | 2 | 3+ (auto-scaled) |
| Secrets | Repository | Environment | Vault/OIDC |
| Monitoring | Basic logs | Full observability | Full + alerting |

Key CD rules:

  • Build once, deploy the same artifact everywhere
  • Tag artifacts with commit SHA for traceability
  • Use environment protection rules for production gates
  • Maintain rollback capability at every stage

Workflow 3: Pipeline Optimization

python scripts/pipeline_analyzer.py .github/workflows/ --format json -o report.json

The agent checks for:

  1. Missing caching -- dependencies reinstalled every run
  2. No timeouts -- stuck jobs burn budget
  3. Sequential chains that could parallelize
  4. Deprecated actions with newer versions available
  5. Security issues -- secrets in logs, missing permissions scoping
  6. Cost inefficiency -- oversized runners, no path filtering
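
Two of the checks above, missing timeouts and missing caching, can be sketched over an already-parsed workflow. This is a hypothetical illustration, not the logic of pipeline_analyzer.py: it assumes the workflow YAML has already been loaded into a plain dict, `find_issues` is an invented name, and the "pip install" heuristic is a deliberate simplification.

```python
def find_issues(workflow):
    """Return (job, message) pairs for jobs lacking a timeout or any caching."""
    issues = []
    for job_name, job in workflow.get("jobs", {}).items():
        if "timeout-minutes" not in job:
            issues.append((job_name, "no timeout-minutes"))

        steps = job.get("steps", [])
        uses_cache_action = any(
            str(step.get("uses", "")).startswith("actions/cache@") for step in steps
        )
        # setup-* actions can also cache through their `cache` input.
        uses_setup_cache = any("cache" in (step.get("with") or {}) for step in steps)
        installs_deps = any("pip install" in str(step.get("run", "")) for step in steps)

        if installs_deps and not (uses_cache_action or uses_setup_cache):
            issues.append((job_name, "dependencies installed without caching"))
    return issues


# Assumed example input: a workflow dict as a YAML loader might produce it.
workflow = {
    "jobs": {
        "test": {
            "runs-on": "ubuntu-latest",
            "steps": [
                {"uses": "actions/checkout@v4"},
                {"run": "pip install -r requirements.txt"},
            ],
        },
        "lint": {
            "runs-on": "ubuntu-latest",
            "timeout-minutes": 5,
            "steps": [{"run": "make lint"}],
        },
    }
}

for job_name, message in find_issues(workflow):
    print(f"{job_name}: {message}")
```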

Optimization techniques:

Path-based filtering -- skip CI for docs-only changes:

on:
  push:
    # paths and paths-ignore cannot be combined for the same event;
    # this allow-list already skips docs-only changes.
    paths: ['src/**', 'tests/**', 'requirements*.txt']

Concurrency cancellation -- cancel superseded runs:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

Dependency caching:

- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-deps-${{ hashFiles('**/requirements.txt') }}
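
The key above changes only when a requirements file changes, which is what keeps the cache hit rate high. The Python sketch below illustrates that idea with a content hash. It is a conceptual analogue, not GitHub's actual hashFiles() implementation; `cache_key` is a hypothetical helper and the file contents are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(os_name, files):
    """Build a key from the OS name and a digest of the files' contents."""
    digest = hashlib.sha256()
    for path in sorted(files):  # sort so the key does not depend on argument order
        digest.update(Path(path).read_bytes())
    return f"{os_name}-deps-{digest.hexdigest()[:16]}"


with tempfile.TemporaryDirectory() as tmp:
    req = Path(tmp) / "requirements.txt"

    req.write_text("example-package==1.0\n")
    first = cache_key("Linux", [req])
    second = cache_key("Linux", [req])  # unchanged file -> same key (cache hit)

    req.write_text("example-package==1.1\n")
    third = cache_key("Linux", [req])   # changed file -> new key (cache miss)

print(first == second)  # True
print(first == third)   # False
```

A key that includes a timestamp or run number would differ on every run, so the cache would never be reused.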

Deployment Strategies

Decision tree:

Zero-downtime required?
  No  -> Rolling deployment
  Yes -> Need instant rollback?
    No  -> Rolling with health checks
    Yes -> Budget for 2x infrastructure?
      Yes -> Blue-green
      No  -> Canary
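
The decision tree above maps directly onto a small function. The Python sketch below is one way to encode it; the function and parameter names are illustrative assumptions, not the interface of deployment_planner.py.

```python
def choose_strategy(zero_downtime, instant_rollback=False, double_infra_budget=False):
    """Walk the decision tree and return the suggested deployment strategy."""
    if not zero_downtime:
        return "rolling"
    if not instant_rollback:
        return "rolling with health checks"
    # Instant rollback is needed: blue-green if a second full environment
    # is affordable, otherwise canary.
    return "blue-green" if double_infra_budget else "canary"


print(choose_strategy(zero_downtime=False))
print(choose_strategy(zero_downtime=True, instant_rollback=True, double_infra_budget=True))
print(choose_strategy(zero_downtime=True, instant_rollback=True, double_infra_budget=False))
```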

Canary traffic split schedule:

| Phase | % | Duration | Gate |
| --- | --- | --- | --- |
| 1 | 5% | 15 min | Error rate < 0.1% |
| 2 | 25% | 30 min | P99 latency < 200ms |
| 3 | 50% | 60 min | Business metrics stable |
| 4 | 100% | -- | Full promotion |
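
The schedule above can be read as data plus a gate check per phase. The Python sketch below shows one possible encoding; the metric keys (`error_rate`, `p99_latency_ms`, `business_metrics_stable`) and the `next_action` helper are assumptions made for illustration, and a real rollout would read these values from a monitoring system.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Phase:
    traffic_percent: int
    duration_min: Optional[int]             # None for the final phase
    gate: Optional[Callable[[dict], bool]]  # None when there is nothing left to gate


SCHEDULE = [
    Phase(5, 15, lambda m: m["error_rate"] < 0.001),        # error rate < 0.1%
    Phase(25, 30, lambda m: m["p99_latency_ms"] < 200),     # P99 latency < 200ms
    Phase(50, 60, lambda m: m["business_metrics_stable"]),  # assumed boolean signal
    Phase(100, None, None),                                 # full promotion
]


def next_action(phase_index, metrics):
    """Decide what to do once the given phase has run for its full duration."""
    phase = SCHEDULE[phase_index]
    if phase.gate is None:
        return "done"
    if not phase.gate(metrics):
        return "rollback"
    return f"promote to {SCHEDULE[phase_index + 1].traffic_percent}%"


print(next_action(0, {"error_rate": 0.0005}))   # promote to 25%
print(next_action(1, {"p99_latency_ms": 250}))  # rollback
```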

GitHub Actions Patterns

Reusable workflows -- define once, call everywhere:

# .github/workflows/reusable-deploy.yml
on:
  workflow_call:
    inputs:
      environment: { required: true, type: string }
      image_tag: { required: true, type: string }
    secrets:
      DEPLOY_KEY: { required: true }

OIDC authentication -- no long-lived credentials:

permissions:
  id-token: write
  contents: read
steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/github-actions  # placeholder; AWS account IDs are 12 digits
      aws-region: us-east-1

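Secrets hierarchy: secrets can be defined at organization, repository, and environment level. When the same name exists at more than one level, the narrowest scope wins (environment over repository over organization). Never echo secrets; use the ::add-mask:: workflow command to mask values generated at runtime. Prefer OIDC for cloud auth.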
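
When the same secret name exists at several levels, GitHub uses the one at the lowest level, so an environment secret overrides a repository secret, which overrides an organization secret. The Python sketch below models that lookup order with a ChainMap; it only illustrates the precedence rule, and the secret names and values are placeholders.

```python
from collections import ChainMap


def resolve_secrets(organization, repository, environment):
    """Return a view where the narrowest scope wins on name collisions."""
    # ChainMap searches its mappings left to right, so environment comes first.
    return ChainMap(environment, repository, organization)


secrets = resolve_secrets(
    organization={"DEPLOY_KEY": "org-value", "NPM_TOKEN": "org-npm"},
    repository={"DEPLOY_KEY": "repo-value"},
    environment={"DEPLOY_KEY": "prod-value"},
)

print(secrets["DEPLOY_KEY"])  # prod-value
print(secrets["NPM_TOKEN"])   # org-npm
```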


Runner Cost Optimization

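| Runner | vCPU | RAM | Cost/min | Best For |
| --- | --- | --- | --- | --- |
| 2-core | 2 | 7 GB | $0.008 | Standard tasks |
| 4-core | 4 | 16 GB | $0.016 | Build-heavy |
| 8-core | 8 | 32 GB | $0.032 | Large compilations |
| 16-core | 16 | 64 GB | $0.064 | Parallel test suites |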

Monthly estimate: (runs/day) x (avg min/run) x 30 x (cost/min)

Example: 50 pushes/day x 8 min x 30 days = 12,000 min; 12,000 min x $0.008/min = $96/month.
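
The monthly estimate is simple enough to script when comparing runner sizes. The Python sketch below uses the per-minute rates from the table above; `monthly_cost` is a hypothetical helper, and it assumes the average run time stays the same across runner sizes unless a different value is passed in.

```python
# Per-minute rates copied from the runner table above.
COST_PER_MIN = {"2-core": 0.008, "4-core": 0.016, "8-core": 0.032, "16-core": 0.064}


def monthly_cost(runs_per_day, avg_minutes, runner="2-core", days=30):
    """Estimate monthly spend: runs/day x avg min/run x days x cost/min."""
    minutes = runs_per_day * avg_minutes * days
    return minutes * COST_PER_MIN[runner]


print(round(monthly_cost(50, 8), 2))            # 96.0
print(round(monthly_cost(50, 5, "4-core"), 2))  # 120.0
```

The second call shows the trade-off to check before upsizing: a 4-core runner that cuts the run from 8 to 5 minutes still costs more per month at these rates.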


Anti-Patterns

| Anti-Pattern | Problem | Fix |
| --- | --- | --- |
| Monolithic workflow | 45-min single workflow | Split into parallel jobs |
| No caching | Reinstall deps every run | Cache dependencies and builds |
| Secrets in logs | Leaked credentials | add-mask, avoid echo |
| No timeout | Stuck jobs burn budget | timeout-minutes on every job |
| Full matrix every push | 30-min matrix on every commit | Full nightly; reduced on push |
| No rollback plan | Stuck with broken deploy | Automate rollback in CD pipeline |

Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| Workflow never triggers | Wrong on: config or branch name mismatch | Verify triggers match branching strategy |
| Cache miss every run | Volatile cache key (timestamp) | Use hashFiles() on lock files |
| Matrix fails on one OS only | Platform-specific paths or deps | Use shell: bash; install OS deps per matrix entry |
| Secret not available | Wrong environment scope | Ensure job declares correct environment: |
| Health check fails after deploy | App not started before check | Add retry loop with backoff |
| Concurrency cancels needed runs | Overly broad group key | Scope to workflow-ref; separate groups for deploy |
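
For the health-check row above, a retry loop with exponential backoff might look like the following Python sketch. The `wait_until_healthy` helper and its parameters are assumptions for illustration; `check` stands in for whatever probe the deployment uses, such as an HTTP request to a health endpoint.

```python
import time


def wait_until_healthy(check, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call check() until it returns True, doubling the delay after each failure."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:  # no point sleeping after the final attempt
            sleep(base_delay * 2 ** attempt)
    return False


# Demo with a fake probe that becomes healthy on the third call.
calls = {"count": 0}


def fake_check():
    calls["count"] += 1
    return calls["count"] >= 3


delays = []  # record the delays instead of actually sleeping
healthy = wait_until_healthy(fake_check, sleep=delays.append)
print(healthy, delays)  # True [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in a pipeline the default `time.sleep` would be used.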

References

| Guide | Path |
| --- | --- |
| GitHub Actions Patterns | references/github-actions-patterns.md |
| Deployment Strategies | references/deployment-strategies.md |
| Agentic Workflows Guide | references/agentic-workflows-guide.md |

Integration Points

| Skill | Integration |
| --- | --- |
| release-orchestrator | Release workflows align with versioning and changelog |
| senior-devops | Deployment strategies complement infra automation |
| senior-secops | Security scanning steps feed SecOps dashboards |
| senior-qa | CI quality gates map to QA acceptance criteria |
| incident-commander | Rollback procedures connect to incident playbooks |

Last Updated: April 2026
Version: 1.1.0