Compliance · June 7, 2026

AI Agent Observability vs Audit: What Compliance Actually Needs

"We have observability" is the answer engineering gives when compliance asks "can you prove what the agent did?" They are not the same answer. Observability is built to help you understand your system. Audit is built to convince someone who doesn't trust you. The gap between them is exactly where compliance lives.

Two different jobs

Observability and audit overlap — both involve recording what happened — but they optimize for opposite things.

	Observability	Audit-grade logging
Audience	Your own engineers	Auditors, regulators, customers, courts
Goal	Understand and fix behavior	Prove behavior to a skeptic
Trust model	You trust your own data	The reviewer must not have to trust you
Completeness and mutability	Sampling, dropping, editing, and expiring data are all acceptable	Every event is recorded and every record is tamper-evident
Verification	You read the dashboard	A third party verifies integrity independently
Retention	Days to weeks, driven by cost	At least the minimum period the applicable regulation sets

An observability stack that samples 1% of traces, drops high-cardinality fields to save money, and lets an admin edit or expire data is doing its job well. That same behavior makes it useless as evidence. Sampling means the one run that mattered may not be there; mutability means even if it is, you can't prove it wasn't changed.
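To see why sampling and evidence don't mix, here is a minimal sketch of deterministic head sampling at the 1% rate mentioned above. The `keep_trace` rule (hash the trace ID into 100 buckets and keep one bucket) is a hypothetical illustration, not any particular vendor's sampler.

```python
import hashlib

SAMPLE_RATE_PERCENT = 1  # assumption: keep roughly 1 in 100 traces

def keep_trace(trace_id: str) -> bool:
    """Hypothetical head sampler: keep a trace if its hashed ID lands in a kept bucket."""
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 100
    return bucket < SAMPLE_RATE_PERCENT

run_ids = [f"run-{i}" for i in range(10_000)]
kept = [r for r in run_ids if keep_trace(r)]
print(f"kept {len(kept)} of {len(run_ids)} runs")

# Whether the run under dispute survives depends only on which bucket its ID hashes to.
disputed = "run-4242"
print("disputed run retained:", keep_trace(disputed))
```

The decision is made before anyone knows which run will matter, so roughly 99 runs in 100 are gone regardless of how important they turn out to be.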

What compliance actually asks for

Regimes like the record-keeping obligation in Article 12 of the EU AI Act (which applies to high-risk AI systems) don't ask "can you observe your agent?" They ask, in effect:

Did you automatically record the relevant events over the system's lifetime? (Coverage, not sampling.)
Can you produce those records on request, for the required retention period? (Durability.)
Can their integrity be relied on — i.e., shown not to have been altered after the fact? (Tamper-evidence.)
Can that be checked without taking your word for it? (Independent verification.)
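Coverage is the easiest of these to assert and the hardest to prove. One common way to make it checkable is a contiguous sequence number on every record, so a missing record shows up as a gap. The `AuditRecord` layout and `find_gaps` helper below are hypothetical, invented for illustration; they are not a SteelSpine schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical audit record layout, for illustration only."""
    seq: int        # contiguous per run, so a missing record shows up as a gap
    timestamp: str  # ISO 8601 capture time
    event: str      # what the agent did
    prev_hash: str  # hash of the previous record (tamper-evidence)
    tag: str        # integrity tag covering the fields above, e.g. an HMAC

def find_gaps(records: list[AuditRecord]) -> list[int]:
    """Return sequence numbers missing between the lowest and highest seq present."""
    if not records:
        return []
    present = {r.seq for r in records}
    return [s for s in range(min(present), max(present) + 1) if s not in present]

# Hash and tag are left empty in this demo; only the sequence numbers matter here.
run = [
    AuditRecord(0, "2026-06-07T10:00:00Z", "plan", "", ""),
    AuditRecord(1, "2026-06-07T10:00:02Z", "tool_call", "", ""),
    AuditRecord(3, "2026-06-07T10:00:05Z", "reply", "", ""),
]
print(find_gaps(run))  # [2]
```

Two caveats: sequence numbers only help if the integrity tag covers them (otherwise whoever deletes a record can renumber the rest), and a gap check cannot see records dropped from the very end of a run; that case needs a sealed final record or head hash.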

The compliance question is not "what does your system do?" It is "prove what it did, to someone who assumes you might be lying." Observability answers the first. Only audit-grade logging answers the second.

Why you can't just "turn up retention" on observability

Teams often assume audit is observability with a longer retention setting. It isn't. You can keep traces for seven years and still fail an audit, because retention adds neither missing property: it doesn't bring back the runs that were sampled away, and it doesn't make the surviving records tamper-evident. A seven-year-old trace that any admin could have edited is no more trustworthy than a seven-day-old one. The property compliance needs, provable integrity, is a cryptographic one that the typical observability backend was never designed to provide.
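A quick way to see why the integrity has to be cryptographic, and keyed: a plain SHA-256 digest stored next to a record proves nothing, because anyone who can edit the record can recompute the digest. A keyed tag such as HMAC-SHA256 helps only if the key is held somewhere the editor can't reach. A minimal sketch, with a hypothetical record and key:

```python
import hashlib
import hmac

def unkeyed_ok(data: bytes, digest: str) -> bool:
    """Check a plain SHA-256 digest stored alongside the record."""
    return hashlib.sha256(data).hexdigest() == digest

def keyed_ok(data: bytes, tag: str, key: bytes) -> bool:
    """Check an HMAC-SHA256 tag; producing a valid tag requires the key."""
    expected = hmac.new(key, data, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

key = b"held-outside-the-log-store"  # hypothetical: log admins cannot read this key
record = b'{"run": "run-42", "action": "refund", "amount": 50}'
stored_tag = hmac.new(key, record, hashlib.sha256).hexdigest()

# An admin edits the record and overwrites the stored plain digest to match.
edited = b'{"run": "run-42", "action": "refund", "amount": 5}'
forged_digest = hashlib.sha256(edited).hexdigest()

print("unkeyed check after edit:", unkeyed_ok(edited, forged_digest))  # True: edit undetected
print("keyed check after edit:", keyed_ok(edited, stored_tag, key))    # False: edit detected
```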

What audit-grade adds

SteelSpine is built for the second job. It captures agent events without sampling, authenticates each one with an HMAC-SHA256 tag, links them into a hash chain, and seals the chain with an Ed25519 signature. The result is a record that:

covers the run rather than sampling it,
detects any later alteration, insertion, or deletion,
verifies offline with a published public key — no login to our platform required.
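The general technique behind that list can be sketched in a few dozen lines. This is a minimal illustration under our own assumptions, not SteelSpine's actual record format: each event is serialized as canonical JSON, tagged with HMAC-SHA256, and linked to its predecessor by a SHA-256 hash over the previous hash, the event, and the tag. The Ed25519 seal over the final hash is omitted because Python's standard library has no Ed25519; in practice you might sign the head with the `cryptography` package's `Ed25519PrivateKey`. Checking the HMAC tags requires the secret key; a third party without it can still recompute the hash links and check the public-key seal on the head.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed fixed predecessor for the first entry

def make_entry(key: bytes, prev_hash: str, event: dict) -> dict:
    """Tag one event with HMAC-SHA256 and link it to the previous entry's hash."""
    body = json.dumps(event, sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    entry_hash = hashlib.sha256((prev_hash + body + tag).encode()).hexdigest()
    return {"event": event, "prev": prev_hash, "tag": tag, "hash": entry_hash}

def build_chain(key: bytes, events: list[dict]) -> list[dict]:
    chain, prev = [], GENESIS
    for event in events:
        entry = make_entry(key, prev, event)
        chain.append(entry)
        prev = entry["hash"]
    return chain

def verify_chain(key: bytes, chain: list[dict], sealed_head: str) -> bool:
    """Recompute every tag and link; the final hash must equal the sealed head."""
    prev = GENESIS
    for entry in chain:
        expected = make_entry(key, prev, entry["event"])
        if (entry["prev"] != prev
                or not hmac.compare_digest(expected["tag"], entry["tag"])
                or expected["hash"] != entry["hash"]):
            return False
        prev = entry["hash"]
    # In a full design, sealed_head would come from an Ed25519-signed seal
    # whose signature is checked first; here it is passed in directly.
    return prev == sealed_head

key = b"demo-key"  # hypothetical; a real key would live in a KMS or HSM
chain = build_chain(key, [{"step": 1, "tool": "search"},
                          {"step": 2, "tool": "refund"},
                          {"step": 3, "tool": "email"}])
head = chain[-1]["hash"]

altered = list(chain)
altered[1] = {**chain[1], "event": {"step": 2, "tool": "noop"}}
inserted = chain[:1] + [make_entry(b"not-the-key", chain[0]["hash"], {"step": 99})] + chain[1:]
deleted = chain[:1] + chain[2:]
truncated = chain[:2]

for name, candidate in [("original", chain), ("altered", altered), ("inserted", inserted),
                        ("deleted", deleted), ("truncated", truncated)]:
    print(name, "VERIFIED" if verify_chain(key, candidate, head) else "TAMPERED")
```

Each tampering mode trips a different check: an edited event or an entry forged without the key fails the tag check, a removed entry breaks the `prev` link of the entry after it, and a truncated tail no longer matches the sealed head.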

steelspine verify-run --compliance-html > audit.html   # the artifact an auditor reads
steelspine pack-verify run.spine.tgz                    # VERIFIED ✓ / TAMPERED !

The honest scope

Audit-grade does not mean "magic." The cryptography is classical (HMAC-SHA256 and Ed25519) and is not quantum-resistant. And because the operator holds the signing key, the chain proves that nobody without the key altered the log after capture; it does not by itself prove that the operator couldn't have produced a different log in the first place. That last property (non-repudiation against the operator) comes from key-custody controls such as an HSM, a KMS, or third-party timestamping, which sit outside the logging format. We name the boundary because audit is precisely the context where an overclaim gets caught, and a caught overclaim discredits the genuine claim next to it.
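The key-custody boundary is easy to demonstrate. In the sketch below, an operator holding the key rewrites one event and rebuilds the chain; the forged chain is internally consistent, so checking it against itself reveals nothing. What catches it is comparison with a head hash recorded earlier by someone else. The `witness` dictionary is a hypothetical stand-in for a third-party timestamping authority (for example an RFC 3161 service) or a transparency log; it is not a SteelSpine API.

```python
import hashlib
import hmac

def chain_head(key: bytes, events: list[str]) -> str:
    """Return the final hash of an HMAC-tagged hash chain over the events."""
    prev = "0" * 64
    for event in events:
        tag = hmac.new(key, event.encode(), hashlib.sha256).hexdigest()
        prev = hashlib.sha256((prev + event + tag).encode()).hexdigest()
    return prev

operator_key = b"operator-held-key"  # hypothetical: the operator controls this key
original = ["plan", "call tool: refund(50)", "reply"]
rewritten = ["plan", "call tool: refund(5)", "reply"]

# Hypothetical external witness: records the head at capture time, outside operator control.
witness = {"run-42": chain_head(operator_key, original)}

# Later, the operator rebuilds a fully consistent chain over the rewritten events.
forged_head = chain_head(operator_key, rewritten)

print("forged chain matches witnessed head:", forged_head == witness["run-42"])
```

External anchoring narrows the operator's window to the moment the head was anchored; it does not stop an operator from writing a false log before the head is anchored.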

The practical takeaway

Keep your observability stack — it's the right tool for debugging and SLOs. But don't send compliance to the observability dashboard. When the question is "prove it to someone who doesn't trust us," you need coverage, tamper-evidence, and independent verification: an audit trail, not a trace.

Add the audit layer your observability stack doesn't have

SteelSpine runs alongside whatever you already use. The free tier produces tamper-evident, independently verifiable audit records locally — download a sample run and verify it yourself, no signup required.


This article is general engineering guidance, not legal advice. Your exact obligations under the EU AI Act or any other regime depend on your system's classification and the current consolidated text — confirm with qualified counsel.