"We have observability" is the answer engineering gives when compliance asks "can you prove what the agent did?" They are not the same answer. Observability is built to help you understand your system. Audit is built to convince someone who doesn't trust you. The gap between them is exactly where compliance lives.
Observability and audit overlap — both involve recording what happened — but they optimize for opposite things.
| | Observability | Audit-grade logging |
|---|---|---|
| Audience | Your own engineers | Auditors, regulators, customers, courts |
| Goal | Understand and fix behavior | Prove behavior to a skeptic |
| Trust model | You trust your own data | The reviewer must not have to trust you |
| Mutability | Sampled, dropped, edited, expired — fine | Every record must be tamper-evident |
| Verification | You read the dashboard | A third party verifies integrity independently |
| Retention | Days to weeks, cost-driven | A fixed period set by regulation |
An observability stack that samples 1% of traces, drops high-cardinality fields to save money, and lets an admin edit or expire data is doing its job well. That same behavior makes it useless as evidence. Sampling means the one run that mattered may not be there; mutability means even if it is, you can't prove it wasn't changed.
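The cost of sampling can be put in numbers. The sketch below assumes each trace is kept independently with a fixed probability, which is a simplification of how real samplers behave; the function names are hypothetical and exist only for this illustration.

```python
def p_all_retained(sample_rate: float, runs: int) -> float:
    """Probability that every one of `runs` traces survived sampling.

    Assumption: each trace is kept independently with probability `sample_rate`.
    """
    return sample_rate ** runs


def p_at_least_one_missing(sample_rate: float, runs: int) -> float:
    """Probability that at least one of `runs` traces was dropped."""
    return 1.0 - p_all_retained(sample_rate, runs)


# With 1% sampling, the chance that one specific run was kept is 1 in 100.
single_run = p_all_retained(0.01, 1)

# If an incident spans three runs, the chance that all three were kept
# is about one in a million.
three_runs = p_all_retained(0.01, 3)

print(f"one run kept:    {single_run:.4%}")
print(f"three runs kept: {three_runs:.6%}")
```

Under that assumption, an auditor asking for one specific run will find it missing about 99 times out of 100, regardless of how long the sampled data is retained.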
Regimes like the EU AI Act's Article 12 record-keeping obligation don't ask "can you observe your agent?" They ask, in effect:
The compliance question is not "what does your system do?" It is "prove what it did, to someone who assumes you might be lying." Observability answers the first. Only audit-grade logging answers the second.
Teams often assume audit is observability with a longer retention setting. It isn't. You can keep traces for seven years and still fail an audit, because retention doesn't add the missing property: tamper-evidence. A seven-year-old trace that any admin could have edited is no more trustworthy than a seven-day-old one. The thing compliance needs — provable integrity — is a cryptographic property the typical observability backend was never designed to provide.
SteelSpine is built for the second job. It captures agent events without sampling, signs each one (HMAC-SHA256), chains them by hash, and seals the chain with an Ed25519 signature. The result is a record that:
```shell
steelspine verify-run --compliance-html > audit.html   # the artifact an auditor reads
steelspine pack-verify run.spine.tgz                   # VERIFIED ✓ / TAMPERED !
```
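To make the idea of signing each event and chaining by hash concrete, here is a minimal sketch using only the Python standard library. It is not SteelSpine's actual record format or API: the field names (`seq`, `prev`, `payload`, `hash`, `hmac`), the `GENESIS` placeholder, and both functions are assumptions made for illustration. The final Ed25519 seal is left out because the standard library has no Ed25519 implementation.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical "previous hash" for the first record


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record body."""
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_event(chain: list, key: bytes, payload: dict) -> None:
    """Append a record linked to the previous record's hash and HMAC-signed."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"seq": len(chain), "prev": prev_hash, "payload": payload}
    body_hash = _digest(body)
    tag = hmac.new(key, body_hash.encode(), hashlib.sha256).hexdigest()
    chain.append({**body, "hash": body_hash, "hmac": tag})


def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every hash and HMAC; any edit, deletion, or reorder fails."""
    prev_hash = GENESIS
    for seq, record in enumerate(chain):
        body = {"seq": record["seq"], "prev": record["prev"],
                "payload": record["payload"]}
        expected_hash = _digest(body)
        expected_tag = hmac.new(key, expected_hash.encode(),
                                hashlib.sha256).hexdigest()
        if record["seq"] != seq or record["prev"] != prev_hash:
            return False  # record missing, inserted, or out of order
        if record["hash"] != expected_hash:
            return False  # record content changed after it was written
        if not hmac.compare_digest(record["hmac"], expected_tag):
            return False  # signature does not match this key
        prev_hash = record["hash"]
    return True


key = b"demo-key-not-for-production"
chain = []
append_event(chain, key, {"tool": "search", "input": "q1"})
append_event(chain, key, {"tool": "email", "input": "draft"})
intact = verify_chain(chain, key)

# Simulate an admin quietly editing an old record.
chain[0]["payload"]["input"] = "edited later"
tampered_detected = not verify_chain(chain, key)

print("intact before edit:", intact)
print("edit detected:", tampered_detected)
```

Because each record commits to the hash of the one before it, changing or removing any record breaks every check from that point on. That is the property a longer retention setting cannot supply.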
Audit-grade does not mean "magic." The cryptography is classical (HMAC-SHA256 + Ed25519) and is not quantum-resistant. And because the signing key is held by the operator, the chain proves only that the log wasn't altered after capture by anyone who lacks the key. It does not by itself prove the operator couldn't have produced a different log to start with. That last property (non-repudiation against the operator) is a key-custody control (HSM/KMS/third-party timestamping), separate from the logging format. We name the boundary because audit is precisely the context where an overclaim gets caught, and a caught overclaim discredits the genuine claim next to it.
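The key-custody boundary can be shown in a few lines. This is a generic HMAC illustration, not SteelSpine code; the `sign` and `verify` helpers, the key, and the messages are all hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Constant-time check that the tag matches the message under this key."""
    return hmac.compare_digest(sign(key, message), tag)


operator_key = b"held-by-the-operator"
original = b'{"action":"transfer","amount":100}'
rewritten = b'{"action":"transfer","amount":1}'

original_tag = sign(operator_key, original)

# Someone without the key edits the record but cannot produce a new tag,
# so the old tag no longer verifies.
outsider_edit_passes = verify(operator_key, rewritten, original_tag)

# The key holder can sign the rewritten record, and it verifies cleanly.
reissued_tag = sign(operator_key, rewritten)
operator_rewrite_passes = verify(operator_key, rewritten, reissued_tag)

print("edit without key passes:", outsider_edit_passes)
print("rewrite with key passes:", operator_rewrite_passes)
```

The second check succeeding is the reason the signing key belongs in an HSM or KMS, or the chain head is anchored with a third-party timestamp: those controls constrain what the key holder can do, which the log format alone cannot.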
Keep your observability stack — it's the right tool for debugging and SLOs. But don't send compliance to the observability dashboard. When the question is "prove it to someone who doesn't trust us," you need coverage, tamper-evidence, and independent verification: an audit trail, not a trace.
SteelSpine runs alongside whatever you already use. The free tier produces tamper-evident, independently verifiable audit records locally — download a sample run and verify it yourself, no signup required.
This article is general engineering guidance, not legal advice. Your exact obligations under the EU AI Act or any other regime depend on your system's classification and the current consolidated text; confirm with qualified counsel.