For engineering teams shipping regulated AI
PermForge ≠ Perforce · we are an AI agent permission runtime

Post-hoc audits miss 41.7% of what your agent actually does.

Output-only tracing covers the response channel. PermForge hooks call.tool, scope.upgrade, agent.spawn, and token.forward inline — before regulated data crosses the wrong boundary.

Source · AgentLeak benchmark · 4,979 traces · 2026

Free shadow audit · See the 5-line install

first 12 audits free · starting June 2026 · before Colorado 6/30

§02 · evidence not opinion

Four benchmarks pin the gap. Public, citable, reproducible.

Your internal AI risk review will reach at least one of these in 2026 H2. We don't argue the gap with you — we measure it on your traces.

41.7% · arxiv.org · 2502.16793 ↗

AgentLeak · visibility gap

Multi-agent privacy violations missed by output-only audits. 4,979 production traces. Inter-agent channel = 68.9% of leakage, invisible to Braintrust / LangSmith.
10.8% · arxiv.org · 2406.12045 ↗

τ-bench · policy compliance gap

Even SOTA agents fail organizational policy in 1 of 10 multi-turn workflows. The gap is structural — not a model upgrade fix.
jailbreak · arxiv.org · 2410.09024 ↗

AgentHarm · model self-defense gap

Refusal-trained LLMs are easily jailbroken when operating as browser agents. Built-in safety training is not enough at agent-time — an external control plane is essential.
late by design · PermForge category brief · 2026

The category limit of post-hoc tracing

Tracing tools — output-channel observability platforms broadly — surface what already shipped. Useful for debugging, eval loops, and incident review. Structurally insufficient for sub-call permission control, which has to land its decision before the call completes.

§03 · the hooks

Four entry points. Miss one — you still leak.

The permission-creep stories we hear in buyer calls cluster around these four boundaries. We instrument each one with policy-driven decisions — target p99 ≤ 18ms in SDK benchmarks, verifiable on your own workload during the audit.

01 call.tool

every tool invocation

agent fan-out crosses tenant / matter / patient boundaries inside one task

catches · cross-matter access · MNPI propagation · PHI cross-touch
02 scope.upgrade

read → write boundary transitions

agents decide mid-task to elevate privilege — OAuth scopes are session-level, not sub-call

catches · silent step-up · privilege creep · unapproved writes
03 agent.spawn

every sub-agent token inheritance

child agents do what parents would not — token inheritance was designed for humans

catches · lateral movement · parent-child contract drift
04 token.forward

cross-process credential pass

each forward expands trust radius without re-authorization

catches · persistent token leak · sub-process exfiltration
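To make the four boundaries concrete, here is a minimal sketch of a gate that looks at a hook event before the sub-call runs. The names `Hook`, `HookEvent`, and `check_boundary`, and the two rules shown, are hypothetical illustrations and not the PermForge SDK API or its shipped policy templates.

```python
from dataclasses import dataclass
from enum import Enum


class Hook(Enum):
    CALL_TOOL = "call.tool"
    SCOPE_UPGRADE = "scope.upgrade"
    AGENT_SPAWN = "agent.spawn"
    TOKEN_FORWARD = "token.forward"


@dataclass(frozen=True)
class HookEvent:
    hook: Hook
    task_boundary: str            # tenant / matter / patient the task was opened for
    target_boundary: str          # boundary the sub-call is about to touch
    requested_scope: str = "read"
    granted_scope: str = "read"


def check_boundary(event: HookEvent) -> str:
    """Return "allow" or "deny" before the sub-call runs (illustrative rules only)."""
    # call.tool: a task opened for one matter must not reach into another
    if event.hook is Hook.CALL_TOOL and event.target_boundary != event.task_boundary:
        return "deny"
    # scope.upgrade: asking for more than was granted is a silent step-up
    if event.hook is Hook.SCOPE_UPGRADE and event.requested_scope != event.granted_scope:
        return "deny"
    # agent.spawn and token.forward would need their own rules; omitted here
    return "allow"
```

A real policy would likely grade risk instead of returning a flat allow or deny; the sketch only shows where the decision sits relative to the call.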

§04 · what your stack gains

Five capabilities. Drop-in for any agent runtime.

We ship an SDK (Python · TypeScript · Go) + a policy console. No data plane change · no model swap · works alongside Braintrust, LangSmith, Langfuse — they handle output tracing, we handle the during.

01
Inline blocking · not after-the-fact trace

Decisions land before the tool call completes. Sub-call latency p99 ≤ 18ms on our SDK · negligible at human-perceived agent speeds.

pair · replaces · post-hoc tracing for the during-execution window
02
Risk-graded routing · low-friction by default

Low risk auto-passes. Medium batches to async approval. High blocks with full evidence chain. Policy templates ship per regulation — not a blank rule editor.

pair · replaces · static RBAC and step-up auth at session granularity
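The three-way routing described above can be sketched as a threshold function. The `Route` enum, the 0 to 1 risk score, and the threshold values are assumptions made for illustration; the source does not say how PermForge scores risk.

```python
from enum import Enum


class Route(Enum):
    AUTO_PASS = "auto_pass"            # low risk: proceed without a human
    ASYNC_APPROVAL = "async_approval"  # medium risk: queue for batched approval
    BLOCK = "block"                    # high risk: stop and attach evidence


def route(risk_score: float, medium: float = 0.3, high: float = 0.7) -> Route:
    """Map a risk score in [0, 1] to a route. Thresholds are hypothetical."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= high:
        return Route.BLOCK
    if risk_score >= medium:
        return Route.ASYNC_APPROVAL
    return Route.AUTO_PASS
```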
03
Async batched elicitation · no approval fatigue

220 sub-call asks collapse to ≤ 5 human decisions per task. Slack · mobile · SMS · Magic Link. Timeout defaults are policy, not vibes.

pair · replaces · MCP elicitation prompts and manual review queues
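One way many sub-call asks can collapse into a handful of human decisions is to group them by what the approver actually has to judge. The sketch below groups by hook and resource class; the `Ask` shape and the grouping key are assumptions, not a description of how PermForge batches.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Ask(NamedTuple):
    task_id: str
    hook: str            # e.g. "call.tool"
    resource_class: str  # e.g. "client-records" (hypothetical label)
    detail: str


def batch_asks(asks: Iterable[Ask]) -> Dict[Tuple[str, str], List[Ask]]:
    """Group asks so one human decision covers every ask with the same key."""
    groups: Dict[Tuple[str, str], List[Ask]] = defaultdict(list)
    for ask in asks:
        groups[(ask.hook, ask.resource_class)].append(ask)
    return dict(groups)
```

With this key, 220 asks that touch four resource classes through one hook become four decisions, each carrying the full list of asks it covers.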
04
Signed evidence trail · audit-grade by construction

Every request → decision → approver → timestamp → outcome is hash-chained and immutable. Maps directly to EU AI Act Annex III, ABA 5.3, and HIPAA Minimum Necessary evidence formats.

pair · replaces · "best effort" log export at audit time
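Hash chaining is what makes a trail tamper-evident: each entry's hash covers the previous entry's hash, so editing any record breaks every hash after it. A minimal sketch using SHA-256 from the Python standard library follows. The record fields and function names are hypothetical, and signing is left out to keep the example short.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes the same way
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain: list, record: dict) -> dict:
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"record": record, "prev_hash": prev_hash, "hash": _digest(prev_hash, record)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry fails the check."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects tampering but does not prevent it; anchoring the latest hash somewhere the writer cannot modify is a separate design choice.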
05
Circuit breaker · revoke and kill

Approval was wrong? Revoke. Behavior anomalous? Kill. This is the contractual "right to interrupt" your enterprise customers will start requiring in 2026 H2 procurement.

pair · replaces · informal incident response runbooks
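Revoke and kill can be sketched as two switches checked before every gated call: one per approval, one global. The `CircuitBreaker` class below is a hypothetical in-process illustration; a production version would need to propagate state across processes.

```python
import threading


class CircuitBreaker:
    """Tracks revoked approvals and a global kill switch (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._revoked = set()
        self._killed = False

    def revoke(self, approval_id: str) -> None:
        # A single approval turned out to be wrong: invalidate just that one
        with self._lock:
            self._revoked.add(approval_id)

    def kill(self) -> None:
        # Behavior is anomalous: stop everything regardless of approvals
        with self._lock:
            self._killed = True

    def permits(self, approval_id: str) -> bool:
        with self._lock:
            return not self._killed and approval_id not in self._revoked
```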

§05 · drop-in · 5 lines

Drop-in next to your existing agent. No data plane change.

One pf.guard() wrap around your tool-call entry point — that's the entire surface area. Policy templates ship per regulation. Decisions land before the call completes, evidence is hash-chained.

agent.ts

$ npm install @permforge/sdk

import { PermForge } from "@permforge/sdk";

const pf = new PermForge({ apiKey: process.env.PERMFORGE_KEY });

// wrap your existing tool-call entry point
const result = await pf.guard(async () => {
  return agent.callTool("search_clients", { matter });
});
// inline policy decision · evidence written · escalation if blocked

$ pip install permforge

import os

from permforge import PermForge

pf = PermForge(api_key=os.environ["PERMFORGE_KEY"])

# wrap your existing tool-call entry point
with pf.guard():
    result = agent.call_tool("search_clients", matter=matter)
# inline policy decision · evidence written · escalation if blocked

$ go get github.com/permforge/sdk-go

import (
    "os"

    permforge "github.com/permforge/sdk-go"
)

pf := permforge.New(os.Getenv("PERMFORGE_KEY"))

// wrap your existing tool-call entry point
result, err := pf.Guard(ctx, func() (any, error) {
    return agent.CallTool("search_clients", matter)
})
// inline policy decision · evidence written · escalation if blocked

target p99 ≤ 18ms · benched in-SDK
SDK < 200 KB · zero runtime deps
runs in your VPC · no data leaves
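For readers wondering how a `with pf.guard():` wrap can decide before the call completes, here is a sketch of the general pattern in Python: a context manager that evaluates policy on entry and raises before the body runs. `guard`, `Blocked`, the `decide` callback, and the evidence list are all hypothetical stand-ins, not the SDK's internals.

```python
from contextlib import contextmanager


class Blocked(Exception):
    """Raised on entry when the policy decision is not "allow"."""


@contextmanager
def guard(decide, evidence):
    decision = decide()  # policy runs before the wrapped call starts
    evidence.append({"phase": "decision", "result": decision})
    if decision != "allow":
        raise Blocked(decision)  # the body of the with-block never executes
    try:
        yield
    except Exception:
        evidence.append({"phase": "outcome", "result": "error"})
        raise
    else:
        evidence.append({"phase": "outcome", "result": "ok"})
```

Because the exception is raised before `yield`, the `with` statement fails on entry and the tool call inside it is never attempted.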

§06 · why this quarter

Your customers and your auditors are about to ask. Four deadlines stack in 2026 H2.

Network insurers and SOC 2 reviewers have already begun flagging "agent permission audit trail" as a renewal condition. The permission control gap is becoming a cash-flow event, not a research interest.

2026-06-30 · T-34 days

Colorado AI Act enforcement

First enforcement day for "high-risk AI" duty to disclose and manage. Your customers in Colorado will start asking for the evidence trail.
2026-08-02 · T-67 days

EU AI Act · high-risk systems

Article 14 demands proportionate real-time oversight. Annex III conformity assessments due. Fines run up to €35M or 7% of global annual turnover at the top tier. Mid-execution evidence is the gap.
2026-02 · in force · +101 days

ABA Model Rule 5.3

Extended to AI agents under attorney supervision. Law firm AI procurement now requires sub-call audit trail conformity.
2026-03 · in force · +73 days

CMS HIPAA Minimum Necessary

Clarified for AI: agent-driven PHI access must demonstrate per-call necessity. This is exactly the fan-out failure mode.

§07 · a note from the founder

Pre-launch · 2026-05
For the first 12 audits · starting June 2026

I spent the last 6 months talking to engineers shipping vertical AI agents in legal, healthcare, and financial compliance. Every one had the same gap: they could see the agent's final answer, but not what it did between calls. PermForge is the layer that closes that — sub-call, inline, with evidence you can hand to your auditor.

We're running 12 free audits starting June 2026 — before Colorado AI Act enforcement (6/30) and EU AI Act (8/2) — to calibrate the product against real regulated-agent traces. Your data and your logo stay private. We publish nothing without your written sign-off. If you'd be willing to send us 1 week of traces, please email me or grab a shadow audit below.

Founder · PermForge

contact@permforge.com · LinkedIn · GitHub

§08 · objections an engineer raises on a call

The questions an engineer asks on minute three.

Anything we miss here, just email us — we'll write you back and add it.

01 Does this slow down my agent?

Mean overhead is single-digit milliseconds per gated sub-call · p99 ≤ 18ms. For tool calls that themselves take 200–800ms (LLM, vector DB, external API), the gate is below user-perceivable latency. We publish a benchmark suite as part of PermBench so you can verify on your own workload.
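If you want to check a p99 figure on your own workload, the measurement itself is simple. The sketch below times a gate callable and reports the nearest-rank 99th percentile in milliseconds. It is a generic timing harness with hypothetical names, not part of PermBench.

```python
import time


def p99(samples):
    """Nearest-rank 99th percentile: the value at rank ceil(0.99 * n)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = -(-99 * len(ordered) // 100)  # integer ceiling of 0.99 * n
    return ordered[rank - 1]


def measure_overhead_ms(gate, runs=1000):
    """Time `gate()` repeatedly and return its p99 latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        gate()
        samples.append((time.perf_counter() - start) * 1000.0)
    return p99(samples)
```

Run it against the gate alone and against the gated tool call to see how much of the total the gate accounts for.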
02 Does my data leave my network?

No. The SDK runs in-process inside your VPC. The optional sidecar runs as a container alongside your agent — also in your VPC. Only signed audit events (no payload) reach our control plane, and even that is opt-in.
03 How does this compare to Braintrust / LangSmith / Langfuse?

Those are post-hoc tracing tools — they see the output channel after it shipped. PermForge is the inline control plane — it decides before the call completes. We pair with them: they continue handling output observability, we handle the during.
04 How does this compare to OPA / Cedar / Permit.io / Cerbos?

Those are static authorization engines designed for app-level RBAC: yes / no per request, session-level scopes. PermForge is purpose-built for agents — sub-call granularity, async batched human approval, composite-semantic detection across N calls in one task, and signed evidence in audit-ready format.
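The difference from per-request yes / no can be shown with a small example: every call below would pass a per-request check on its own, yet the task as a whole touches two matters. The function name and the `(task_id, boundary)` tuple shape are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def crossed_boundaries(calls: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return tasks whose calls, taken together, touched more than one boundary.

    `calls` is an iterable of (task_id, boundary) pairs, where boundary is the
    tenant, matter, or patient a call accessed.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for task_id, boundary in calls:
        seen[task_id].add(boundary)
    return {task: touched for task, touched in seen.items() if len(touched) > 1}
```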
05 What's open-source?

PermBench (the benchmark suite + 120+ failure cases) is Apache 2.0 from day one — see github.com/permforge/permbench. The SDK is source-available under a fair-source license; the control plane is commercial. You can fork PermBench and run scoring on your own agent for free, forever.
06 What happens after the shadow audit?

If we find ≥ 3 silent boundary crossings worth fixing, we offer the paid pilot at $80K / year for the first 5 customers. That covers SDK + console + 4 policy packs + founder-led support through Q1 2027. If we find fewer, we tell you so and walk away. Audits run June–September 2026, starting ahead of Colorado AI Act (6/30) and EU AI Act (8/2) enforcement.

§09 · shadow audit · how it works

Free shadow audit · 1 week.

You send 1 week of agent traces — de-identified is fine, or wire the SDK for a live capture. You get back a signed report listing every silent permission boundary your agent crossed, with regulatory exposure estimates.

first 12 audits free · June 2026
1-week turnaround
no commitment

01
You send: a 1–7 day window of agent traces (de-identified is fine), or wire our SDK for a live capture.
02
We run: PermBench scoring + behavioral graph extraction. We map every cross-tenant access, scope upgrade, and silent ethical-wall hit.
03
You get: a signed report with visibility gap %, regulatory exposure, and recommended controls. Pilot pricing only if it warrants it.

to contact@permforge.com

subject Shadow audit · request

Hi PermForge,

Company: [your company]
Vertical: [legal / healthcare / financial / other]
Agent runtime: [e.g. LangGraph 0.4 · OpenAI Agents SDK · custom · ...]
Daily task volume (rough): [e.g. 200 tasks / day · 5K sub-calls]
Trace format you can share: [OTel / Langfuse export / our schema / can wire SDK]
Window you'd like covered: [1 day / 1 week / Q3 2026]

Best,
[your name]
[title · linkedin]

Open this in my email

Or DM founder on LinkedIn ↗
Regulated traces stay in your VPC. We receive only what you send, and destroy our copy after sign-off unless you opt in.
