Drop-in workflows that run your AI agent under SteelSpine on every PR, fail the build on regression, and produce signed audit artifacts retained 90 days.
Drop this in .github/workflows/ai-quality.yml:
name: AI Quality Gate
on:
pull_request:
paths:
- 'agents/**'
- 'prompts/**'
- 'tests/agent/**'
jobs:
steelspine-eval:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install SteelSpine
run: |
curl -fsSL https://steelspine.ai/dist/steelspine_bundle_latest_linux.tgz \
-o /tmp/steelspine.tgz
tar -xzf /tmp/steelspine.tgz -C $HOME/
bash $HOME/.prime/setup.sh
echo "$HOME/.prime/bin" >> $GITHUB_PATH
- name: Activate license
run: steelspine license activate "${{ secrets.STEELSPINE_LICENSE_KEY }}"
- name: Capture baseline run from main
run: |
git fetch origin main
git checkout origin/main -- agents/ prompts/
steelspine run --label baseline python3 agents/run_test_suite.py
BASELINE_ID=$(steelspine run list --json | python3 -c "import sys,json; print(json.load(sys.stdin)[-1]['run_id'])")
steelspine baseline set "$BASELINE_ID"
git checkout HEAD -- agents/ prompts/
- name: Capture PR run
run: steelspine run --label pr-${{ github.event.pull_request.number }} \
python3 agents/run_test_suite.py
- name: Hard gate on quality criteria
run: |
steelspine eval --last 1 \
--min-pass-rate 0.95 \
--max-failures 0 \
--forbid "POLICY VIOLATION" \
--forbid "EXCEPTION"
- name: Regression gate (fail if PR is worse than main)
run: steelspine compare --strict
# exits 0 if PR is no worse than baseline
# exits 2 if PR has more failures or fails where baseline succeeded
- name: Generate signed audit (always — even on failure)
if: always()
run: |
steelspine verify-run --compliance-html > audit.html
steelspine verify-run > audit.txt
- name: Upload audit artifact
if: always()
uses: actions/upload-artifact@v4
with:
name: steelspine-audit-pr-${{ github.event.pull_request.number }}
path: |
audit.html
audit.txt
retention-days: 90
| Behavior | Value |
|---|---|
PRs touching agents/ automatically capture an agent run | Catches regressions before merge |
Compares PR run against main baseline | Quantitative quality gate |
--strict exits non-zero on regression (Run B worse than Run A) | Hard CI fail on quality drop |
eval enforces minimum pass rate, max failures, forbidden patterns | Tight quality control |
verify-run --compliance-html artifact retained 90 days | EU AI Act Art.12 logging requirement, automatically |
| Command | Exit 0 | Exit 1 | Exit 2 |
|---|---|---|---|
steelspine eval | criteria met | criteria failed | — |
steelspine compare --strict | no regression | (n/a) | regression detected |
steelspine compare --fail-on-diff | identical | any divergence | — |
steelspine verify-run | integrity verified | (rare — internal error) | — |
GitHub Actions fails the job on any non-zero exit.
ai-quality-gate:
image: ubuntu:22.04
rules:
- if: $CI_PIPELINE_SOURCE == "merge_request_event"
changes:
- agents/**
- prompts/**
before_script:
- apt-get update && apt-get install -y curl python3
- curl -fsSL https://steelspine.ai/dist/steelspine_bundle_latest_linux.tgz -o /tmp/steelspine.tgz
- tar -xzf /tmp/steelspine.tgz -C $HOME/
- bash $HOME/.prime/setup.sh
- export PATH="$HOME/.prime/bin:$PATH"
- steelspine license activate "$STEELSPINE_LICENSE_KEY"
script:
- steelspine run --label pr-$CI_MERGE_REQUEST_IID python3 agents/run_test_suite.py
- steelspine eval --last 1 --min-pass-rate 0.95 --max-failures 0
- steelspine compare --strict
artifacts:
when: always
paths:
- audit.html
- audit.txt
expire_in: 90 days
STEELSPINE_LICENSE_KEY as a CI secretThe license key activates SteelSpine on the CI runner. Each runner counts against your license seat limit, so for high-throughput CI, request a multi-seat license.
GitHub: Repository Settings → Secrets and variables → Actions → New repository secret. Name: STEELSPINE_LICENSE_KEY.
GitLab: Project Settings → CI/CD → Variables → Add variable. Key: STEELSPINE_LICENSE_KEY. Mask the variable.
Cache the install. SteelSpine extracts to ~50MB. Cache it across CI runs to save 5-10s per pipeline:
- uses: actions/cache@v4
with:
path: ~/.prime
key: steelspine-${{ runner.os }}-1.0.0
Run capture is idempotent: re-running the same PR doesn't pollute history; old runs prune per config.json retention rules.
The audit.html artifact is a real EU AI Act Article 12 deliverable: download from any successful or failed run and you have a tamper-evident, auditor-verifiable record retained for 90 days.
For data-sensitive pipelines: set archive_dir in ~/.prime/config.json to a CI-isolated path so audit artifacts don't leak between projects.