CI/CD Recipe — Gating PRs on AI Quality

Drop-in workflows that run your AI agent under SteelSpine on every PR, fail the build on regression, and produce signed audit artifacts retained 90 days.

GitHub Actions

Drop this in .github/workflows/ai-quality.yml:

name: AI Quality Gate

on:
  pull_request:
    paths:
      - 'agents/**'
      - 'prompts/**'
      - 'tests/agent/**'

jobs:
  steelspine-eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install SteelSpine
        # Mirror the bundle you received from LemonSqueezy into your own
        # private artifact store (S3, GHCR, internal Artifactory, etc.) and
        # fetch from there in CI. The CI runner needs the same .tgz that
        # ships with your purchase.
        env:
          BUNDLE_URL: ${{ secrets.STEELSPINE_BUNDLE_URL }}  # e.g. s3:// or https:// to your mirror
        run: |
          curl -fsSL "$BUNDLE_URL" -o /tmp/steelspine.tgz
          tar -xzf /tmp/steelspine.tgz -C $HOME/
          bash $HOME/.prime/setup.sh
          echo "$HOME/.prime/bin" >> $GITHUB_PATH

      - name: Activate license
        run: steelspine license activate "${{ secrets.STEELSPINE_LICENSE_KEY }}"

      - name: Capture baseline run from main
        run: |
          git fetch origin main
          git checkout origin/main -- agents/ prompts/
          steelspine run --label baseline python3 agents/run_test_suite.py
          BASELINE_ID=$(steelspine run list --json | python3 -c "import sys,json; print(json.load(sys.stdin)[-1]['run_id'])")
          steelspine baseline set "$BASELINE_ID"
          git checkout HEAD -- agents/ prompts/

      - name: Capture PR run
        run: steelspine run --label pr-${{ github.event.pull_request.number }} \
                           python3 agents/run_test_suite.py

      - name: Hard gate on quality criteria
        run: |
          steelspine eval --last 1 \
            --min-pass-rate 0.95 \
            --max-failures 0 \
            --forbid "POLICY VIOLATION" \
            --forbid "EXCEPTION"

      - name: Regression gate (fail if PR is worse than main)
        run: steelspine compare --strict
        # exits 0 if PR is no worse than baseline
        # exits 2 if PR has more failures or fails where baseline succeeded

      - name: Generate signed audit (always — even on failure)
        if: always()
        run: |
          steelspine verify-run --compliance-html > audit.html
          steelspine verify-run > audit.txt

      - name: Upload audit artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: steelspine-audit-pr-${{ github.event.pull_request.number }}
          path: |
            audit.html
            audit.txt
          retention-days: 90

What this gives you

Behavior	Value
PRs touching `agents/` automatically capture an agent run	Catches regressions before merge
Compares PR run against `main` baseline	Quantitative quality gate
`--strict` exits non-zero on regression (Run B worse than Run A)	Hard CI fail on quality drop
`eval` enforces minimum pass rate, max failures, forbidden patterns	Tight quality control
`verify-run --compliance-html` artifact retained 90 days	EU AI Act Art.12 logging requirement, automatically

Exit codes

Command	Exit 0	Exit 1	Exit 2
`steelspine eval`	criteria met	criteria failed	—
`steelspine compare --strict`	no regression	(n/a)	regression detected
`steelspine compare --fail-on-diff`	identical	any divergence	—
`steelspine verify-run`	integrity verified	(rare — internal error)	—

GitHub Actions fails the job on any non-zero exit.

GitLab CI

ai-quality-gate:
  image: ubuntu:22.04
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      changes:
        - agents/**
        - prompts/**
  variables:
    # Mirror the bundle you received from LemonSqueezy into a private
    # artifact store (S3, GitLab Package Registry, Artifactory, etc.) and
    # set STEELSPINE_BUNDLE_URL as a masked CI/CD variable in project settings.
    STEELSPINE_BUNDLE_URL: $STEELSPINE_BUNDLE_URL
  before_script:
    - apt-get update && apt-get install -y curl python3
    - curl -fsSL "$STEELSPINE_BUNDLE_URL" -o /tmp/steelspine.tgz
    - tar -xzf /tmp/steelspine.tgz -C $HOME/
    - bash $HOME/.prime/setup.sh
    - export PATH="$HOME/.prime/bin:$PATH"
    - steelspine license activate "$STEELSPINE_LICENSE_KEY"
  script:
    - steelspine run --label pr-$CI_MERGE_REQUEST_IID python3 agents/run_test_suite.py
    - steelspine eval --last 1 --min-pass-rate 0.95 --max-failures 0
    - steelspine compare --strict
  artifacts:
    when: always
    paths:
      - audit.html
      - audit.txt
    expire_in: 90 days

Setting `STEELSPINE_LICENSE_KEY` as a CI secret

The license key activates SteelSpine on the CI runner. Each runner counts against your license seat limit, so for high-throughput CI, request a multi-seat license.

GitHub: Repository Settings → Secrets and variables → Actions → New repository secret. Name: STEELSPINE_LICENSE_KEY.

GitLab: Project Settings → CI/CD → Variables → Add variable. Key: STEELSPINE_LICENSE_KEY. Mask the variable.

Tips

Cache the install. SteelSpine extracts to ~50MB. Cache it across CI runs to save 5-10s per pipeline:

- uses: actions/cache@v4
  with:
    path: ~/.prime
    key: steelspine-${{ runner.os }}-1.0.0

Run capture is idempotent: re-running the same PR doesn't pollute history; old runs prune per config.json retention rules.

The audit.html artifact is a real EU AI Act Article 12 deliverable: download from any successful or failed run and you have a tamper-evident, auditor-verifiable record retained for 90 days.

For data-sensitive pipelines: set archive_dir in ~/.prime/config.json to a CI-isolated path so audit artifacts don't leak between projects.