Post-hoc audits miss 41.7% of multi-agent privacy violations.
Output-only tracing covers only the response channel.
PermForge hooks call.tool, scope.upgrade, agent.spawn, and token.forward
inline, before regulated data crosses the wrong boundary.
Source · AgentLeak benchmark · 4,979 traces · 2026
first 12 audits free · starting June 2026 · before Colorado 6/30
Three public benchmarks and one category brief pin the gap. The benchmarks are public, citable, and reproducible.
Your internal AI risk review will likely run into at least one of these in 2026 H2. We don't argue the gap with you; we measure it on your traces.
- 41.7% arxiv.org · 2502.16793 ↗
AgentLeak · visibility gap
Multi-agent privacy violations missed by output-only audits, across 4,979 production traces. The inter-agent channel carries 68.9% of leakage and is invisible to output-channel tools like Braintrust and LangSmith.
- 10.8% arxiv.org · 2406.12045 ↗
τ-bench · policy compliance gap
Even SOTA agents fail organizational policy in 1 of 10 multi-turn workflows. The gap is structural; a model upgrade won't close it.
- jailbreak arxiv.org · 2410.09024 ↗
AgentHarm · model self-defense gap
Leading LLMs comply with many malicious agent requests even without a jailbreak, and simple universal jailbreak templates adapt readily to agents. Built-in safety training is not enough at agent time; an external control plane is essential.
- late by design PermForge category brief · 2026
The category limit of post-hoc tracing
Tracing tools, and output-channel observability platforms broadly, surface what has already shipped. They are useful for debugging, eval loops, and incident review, but structurally insufficient for sub-call permission control, which has to land its decision before the call completes.
Four entry points. Miss one and you still leak.
The permission-creep stories we hear in buyer calls cluster around these four boundaries.
We instrument each one with policy-driven decisions — target p99 ≤ 18ms in SDK benchmarks, verifiable on your own workload during the audit.
- 01
call.tool · every tool invocation
agent fan-out crosses tenant / matter / patient boundaries inside one task
catches · cross-matter access · MNPI propagation · PHI cross-touch
- 02
scope.upgrade · read → write boundary transitions
agents decide mid-task to elevate privilege; OAuth scopes are session-level, not sub-call
catches · silent step-up · privilege creep · unapproved writes
- 03
agent.spawn · every sub-agent token inheritance
child agents do what parents would not; token inheritance was designed for humans
catches · lateral movement · parent-child contract drift
- 04
token.forward · cross-process credential pass
each forward expands trust radius without re-authorization
catches · persistent token leak · sub-process exfiltration
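To make two of these boundaries concrete, here is a minimal Python sketch of a call.tool check (one task touching a second tenant or matter) and an agent.spawn check (a child requesting scopes its parent does not hold). Everything here (TaskContext, check_tool_call, check_spawn, the scope strings) is hypothetical and not the PermForge SDK API; it assumes each sub-call is tagged with a tenant or matter ID and that scopes are plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class TaskContext:
    """Hypothetical per-task state (illustration only, not the PermForge SDK)."""
    task_id: str
    scopes: frozenset[str]
    tenants_touched: set[str] = field(default_factory=set)


def check_tool_call(ctx: TaskContext, tenant: str) -> bool:
    """call.tool: allow only while the task stays inside a single tenant/matter."""
    if ctx.tenants_touched and tenant not in ctx.tenants_touched:
        # A second tenant inside one task is the cross-boundary fan-out case.
        return False
    ctx.tenants_touched.add(tenant)
    return True


def check_spawn(parent: TaskContext, requested: frozenset[str]) -> bool:
    """agent.spawn: a child may only inherit a subset of its parent's scopes."""
    return requested <= parent.scopes
```

A task that reads matter-A twice passes; a later call into matter-B is refused, and a child asking for `clients:write` under a read-only parent is refused too. A real control plane would escalate rather than return a bare boolean.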
Five capabilities. Drop-in for any agent runtime.
We ship an SDK (Python · TypeScript · Go) and a policy console. No data-plane change · no model swap · works alongside Braintrust, LangSmith, and Langfuse: they handle output tracing, we handle what happens during execution.
- 01
Inline blocking · not after-the-fact trace
Decisions land before the tool call completes. Gate overhead is p99 ≤ 18ms per sub-call on our SDK, negligible at human-perceived agent speeds.
pair · replaces · post-hoc tracing for the during-execution window
- 02
Risk-graded routing · low-friction by default
Low risk auto-passes. Medium batches to async approval. High blocks with full evidence chain. Policy templates ship per regulation — not a blank rule editor.
pair · replaces · static RBAC and step-up auth at session granularity
- 03
Async batched elicitation · no approval fatigue
220 sub-call asks collapse to ≤ 5 human decisions per task. Slack · mobile · SMS · Magic Link. Timeout defaults are policy, not vibes.
pair · replaces · MCP elicitation prompts and manual review queues
- 04
Signed evidence trail · audit-grade by construction
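Every request → decision → approver → timestamp → outcome is hash-chained, so any after-the-fact edit is detectable. Structured to support EU AI Act record-keeping (Article 12) for Annex III high-risk systems, ABA Model Rule 5.3 supervision, and HIPAA Minimum Necessary documentation.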
pair · replaces · "best effort" log export at audit time
- 05
Circuit breaker · revoke and kill
Approval was wrong? Revoke. Behavior anomalous? Kill. This is the contractual "right to interrupt" your enterprise customers will start requiring in 2026 H2 procurement.
pair · replaces · informal incident response runbooks
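Capabilities 02 and 03 describe a three-lane risk router and a batcher that folds many approval asks into a handful of human decisions. The sketch below is a hypothetical illustration, not PermForge's implementation: the per-call risk score, the 0.3 / 0.8 thresholds, and grouping medium-risk asks by tool name are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    PASS = "auto-pass"
    BATCH = "async approval"
    BLOCK = "block"


@dataclass(frozen=True)
class SubCall:
    tool: str
    resource: str
    risk: float  # hypothetical 0..1 score produced by a policy template


def route(call: SubCall, low: float = 0.3, high: float = 0.8) -> Route:
    """Grade one sub-call into a lane (illustrative thresholds)."""
    if call.risk < low:
        return Route.PASS
    if call.risk < high:
        return Route.BATCH
    return Route.BLOCK


def batch_for_approval(calls: list[SubCall], max_decisions: int = 5) -> list[list[SubCall]]:
    """Group medium-risk calls by tool so one human decision covers many asks."""
    groups: dict[str, list[SubCall]] = defaultdict(list)
    for c in calls:
        if route(c) is Route.BATCH:
            groups[c.tool].append(c)
    batches = sorted(groups.values(), key=len, reverse=True)
    if len(batches) > max_decisions:
        # Fold the smallest groups into the last slot instead of exceeding the cap.
        head, tail = batches[: max_decisions - 1], batches[max_decisions - 1 :]
        batches = head + [[c for b in tail for c in b]]
    return batches
```

With 220 medium-risk asks spread evenly across ten tools, this returns five batches that together cover all 220 calls. Grouping by tool is only one choice; grouping by resource or matter trades decision count against how much each approval covers.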
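The hash-chained trail in capability 04 can be illustrated with a standard hash chain: each entry stores the SHA-256 digest of the previous entry, so editing any earlier record breaks verification. This is a generic sketch with hypothetical field and class names, not PermForge's format. Note that a hash chain alone is tamper-evident but not signed; a production trail would also need digital signatures over the entries or the chain head.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same fields always hash the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class EvidenceLog:
    """Append-only log in which each entry commits to its predecessor's hash."""
    entries: list[dict] = field(default_factory=list)

    def append(self, request: str, decision: str, approver: str, ts: str, outcome: str) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"request": request, "decision": decision, "approver": approver,
                "ts": ts, "outcome": outcome, "prev": prev}
        digest = _digest(body)
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every link; any edited field or reordered entry fails."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Changing the `outcome` of an early entry after the fact makes `verify()` return `False`, because its recomputed digest no longer matches the stored one.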
Drop-in next to your existing agent. No data plane change.
One pf.guard() wrap around your tool-call entry point is the entire integration surface.
Policy templates ship per regulation. Decisions land before the call completes, and evidence is hash-chained.
```typescript
import { PermForge } from "@permforge/sdk";

const pf = new PermForge({ apiKey: process.env.PERMFORGE_KEY });

// wrap your existing tool-call entry point
const result = await pf.guard(async () => {
  return agent.callTool("search_clients", { matter });
});
// inline policy decision · evidence written · escalation if blocked
```

```python
import os

from permforge import PermForge

pf = PermForge(api_key=os.environ["PERMFORGE_KEY"])

# wrap your existing tool-call entry point
with pf.guard():
    result = agent.call_tool("search_clients", matter=matter)
# inline policy decision · evidence written · escalation if blocked
```

```go
import (
	"os"

	permforge "github.com/permforge/sdk-go"
)

pf := permforge.New(os.Getenv("PERMFORGE_KEY"))

// wrap your existing tool-call entry point
result, err := pf.Guard(ctx, func() (any, error) {
	return agent.CallTool("search_clients", matter)
})
// inline policy decision · evidence written · escalation if blocked
```

Your customers and your auditors are about to ask. Four regulatory dates converge in 2026: two already in force, two landing by early August.
Cyber insurers and SOC 2 reviewers have already begun flagging "agent permission audit trail" as a renewal condition. The permission control gap is becoming a cash-flow event, not a research interest.
- 2026-06-30 · T-34 days
Colorado AI Act enforcement
First enforcement day for the Act's duties on developers and deployers of "high-risk AI systems", including risk management and consumer disclosure. Your customers in Colorado will start asking for the evidence trail.
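- 2026-08-02 · T-67 days
EU AI Act · high-risk systems
Article 14 requires human oversight measures commensurate with the system's risks, level of autonomy, and context of use. Annex III conformity assessments are due. Fines for breaching high-risk obligations reach €15M or 3% of global annual turnover (the €35M / 7% tier applies to prohibited practices). Mid-execution evidence is the gap.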
- 2026-02 · in force · +101 days
ABA Model Rule 5.3
Extended to AI agents under attorney supervision. Law firm AI procurement now requires sub-call audit trail conformity.
- 2026-03 · in force · +73 days
HHS · HIPAA Minimum Necessary
Clarified for AI: agent-driven PHI access must demonstrate per-call necessity. This is exactly the fan-out failure mode.
I spent the last 6 months talking to engineers shipping vertical AI agents in legal, healthcare, and financial compliance. Every one had the same gap: they could see the agent's final answer, but not what it did between calls. PermForge is the layer that closes that — sub-call, inline, with evidence you can hand to your auditor.
We're running 12 free audits starting June 2026 — before Colorado AI Act enforcement (6/30) and EU AI Act (8/2) — to calibrate the product against real regulated-agent traces. Your data and your logo stay private. We publish nothing without your written sign-off. If you'd be willing to send us 1 week of traces, please email me or grab a shadow audit below.
The questions an engineer asks in minute three.
If we missed anything, email us and we'll reply and add it here.
-
01 Does this slow down my agent?
Mean overhead is single-digit milliseconds per gated sub-call · p99 ≤ 18ms. For tool calls that themselves take 200–800ms (LLM, vector DB, external API), the gate is below user-perceivable latency. We publish a benchmark suite as part of PermBench so you can verify on your own workload.
-
02 Does my data leave my network?
No. The SDK runs in-process inside your VPC. The optional sidecar runs as a container alongside your agent — also in your VPC. Only signed audit events (no payload) reach our control plane, and even that is opt-in.
-
03 How does this compare to Braintrust / LangSmith / Langfuse?
Those are post-hoc tracing tools — they see the output channel after it shipped. PermForge is the inline control plane — it decides before the call completes. We pair with them: they continue handling output observability, we handle the during.
-
04 How does this compare to OPA / Cedar / Permit.io / Cerbos?
Those are general-purpose policy engines built for application-level authorization: an allow / deny decision per request, typically against session-level scopes. PermForge is purpose-built for agents: sub-call granularity, async batched human approval, composite-semantic detection across N calls in one task, and signed evidence in audit-ready format.
-
05 What's open-source?
PermBench (the benchmark suite + 120+ failure cases) is Apache 2.0 from day one — see github.com/permforge/permbench. The SDK is source-available under a fair-source license; the control plane is commercial. You can fork PermBench and run scoring on your own agent for free, forever.
-
06 What happens after the shadow audit?
If we find ≥ 3 silent boundary crossings worth fixing, we offer the paid pilot at $80K / year for the first 5 customers; that covers SDK + console + 4 policy packs + founder-led support through Q1 2027. If we find fewer, we tell you so and walk away. Audits run June–September 2026 and start before Colorado AI Act (6/30) and EU AI Act (8/2) enforcement.
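FAQ 01's latency claim is straightforward to check against your own measurements. A minimal sketch, assuming you have collected per-call gate overhead samples in milliseconds: compute the nearest-rank p99 and express it as a fraction of the tool call it wraps. The function names are illustrative, not part of PermBench.

```python
def p99(samples_ms: list[float]) -> float:
    """Nearest-rank 99th percentile of overhead samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # integer ceil(0.99 * n), avoids float rounding
    return ordered[rank - 1]


def overhead_ratio(gate_ms: float, tool_ms: float) -> float:
    """Fraction the gate adds to a tool call's wall-clock time."""
    return gate_ms / tool_ms
```

At the stated p99 of 18ms, the gate adds 9% to a 200ms tool call and about 2% to an 800ms one.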
Free shadow audit · 1 week.
You send 1 week of agent traces — de-identified is fine, or wire the SDK for a live capture. You get back a signed report listing every silent permission boundary your agent crossed, with regulatory exposure estimates.
first 12 audits free · June 2026 · 1-week turnaround · no commitment
- 01 You send: a 1–7 day window of agent traces (de-identified is fine), or wire our SDK for a live capture.
- 02 We run: PermBench scoring plus behavioral graph extraction, mapping every cross-tenant access, scope upgrade, and silent ethical-wall hit.
- 03 You get: a signed report with visibility gap %, regulatory exposure, and recommended controls. We quote pilot pricing only if the findings warrant it.