For engineering teams shipping regulated AI · PermForge ≠ Perforce · we are an AI agent permission runtime

Post-hoc audits miss 41.7% of what your agent actually does.

Output-only tracing covers only the response channel. PermForge hooks call.tool, scope.upgrade, agent.spawn, and token.forward inline, before regulated data crosses the wrong boundary.

Source · AgentLeak benchmark · 4,979 traces · 2026

first 12 audits free · starting June 2026 · before Colorado 6/30

§02 · evidence not opinion

Three public, citable, reproducible benchmarks pin the gap; one category brief explains why it exists.

Your internal AI risk review will reach at least one of these in 2026 H2. We don't argue the gap with you; we measure it on your traces.

  1. AgentLeak · visibility gap

    Output-only audits miss 41.7% of multi-agent privacy violations across 4,979 production traces. The inter-agent channel carries 68.9% of leakage and is invisible to output-channel tools such as Braintrust and LangSmith.

  2. τ-bench · policy compliance gap

    Even SOTA agents fail organizational policy in 1 of 10 multi-turn workflows. The gap is structural; a model upgrade will not fix it.

  3. AgentHarm · model self-defense gap

    Refusal-trained LLMs are readily jailbroken once they operate as tool-using agents. Built-in safety training is not enough at agent time; an external control plane is essential.

  4. PermForge category brief · 2026 · late by design

    The category limit of post-hoc tracing

    Tracing tools, and output-channel observability platforms broadly, surface what has already shipped. They are useful for debugging, eval loops, and incident review, but structurally insufficient for sub-call permission control, which has to land its decision before the call completes.
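The timing difference is easiest to see side by side. Below is a minimal Python sketch, assuming a tool is simply a function of one string argument; `traced` and `gated` are hypothetical wrappers for illustration, not any vendor's API. The traced wrapper can only record a call whose side effect has already happened; the gated wrapper can refuse it first.

```python
from typing import Callable

# Hypothetical illustration only: not PermForge's SDK or any tracing product's API.
Tool = Callable[[str], str]


def traced(tool: Tool, log: list[str]) -> Tool:
    """Post-hoc tracing: the tool runs first, then the trace entry is written."""
    def wrapper(arg: str) -> str:
        result = tool(arg)  # any side effect has already happened here
        log.append(f"traced {arg}")
        return result
    return wrapper


def gated(tool: Tool, allowed: Callable[[str], bool], log: list[str]) -> Tool:
    """Inline gating: the policy decision is made before the tool may run."""
    def wrapper(arg: str) -> str:
        if not allowed(arg):
            log.append(f"blocked {arg}")
            raise PermissionError(f"policy blocked call with {arg!r}")
        log.append(f"allowed {arg}")
        return tool(arg)
    return wrapper
```

With a policy that only allows `"matter-A"`, a traced call on `"matter-B"` still runs and is merely logged, while the gated call raises before the tool executes.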

§03 · the hooks

Four entry points. Miss one and you still leak.

The permission-creep stories we hear in buyer calls cluster around these four boundaries. We instrument each one with policy-driven decisions at a target p99 ≤ 18ms in SDK benchmarks, which you can verify on your own workload during the audit.

  1. call.tool

    every tool invocation

    agent fan-out crosses tenant / matter / patient boundaries inside one task

    catches · cross-matter access · MNPI propagation · PHI cross-touch

  2. scope.upgrade

    read → write boundary transitions

    agents decide mid-task to elevate privilege; OAuth scopes are session-level, not per sub-call

    catches · silent step-up · privilege creep · unapproved writes

  3. agent.spawn

    every sub-agent token inheritance

    child agents do what parents would not; token inheritance was designed for humans

    catches · lateral movement · parent-child contract drift

  4. token.forward

    cross-process credential pass

    each forward expands trust radius without re-authorization

    catches · persistent token leak · sub-process exfiltration
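One way to picture the four boundaries is as four event types, each with its own rule checked before the event takes effect. A minimal sketch follows; `Hook`, `Event`, `decide`, and every rule in it are hypothetical illustrations, not PermForge's SDK or its shipped policy templates.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

# Hypothetical model of the four hook points; rules are illustrative, not PermForge policy.


class Hook(Enum):
    CALL_TOOL = "call.tool"
    SCOPE_UPGRADE = "scope.upgrade"
    AGENT_SPAWN = "agent.spawn"
    TOKEN_FORWARD = "token.forward"


@dataclass(frozen=True)
class Event:
    hook: Hook
    task_boundary: str                  # tenant / matter / patient the task is bound to
    target_boundary: str | None = None  # call.tool: boundary the tool touches
    from_scope: str | None = None       # scope.upgrade: current scope
    to_scope: str | None = None         # scope.upgrade: requested scope
    parent_scopes: frozenset[str] = frozenset()  # agent.spawn
    child_scopes: frozenset[str] = frozenset()   # agent.spawn
    reauthorized: bool = False          # token.forward


def decide(e: Event) -> tuple[bool, str]:
    """Return (allowed, reason) for one event, evaluated before it takes effect."""
    if e.hook is Hook.CALL_TOOL and e.target_boundary != e.task_boundary:
        return False, "cross-boundary access"
    if e.hook is Hook.SCOPE_UPGRADE and (e.from_scope, e.to_scope) == ("read", "write"):
        return False, "silent step-up: read -> write needs approval"
    if e.hook is Hook.AGENT_SPAWN:
        extra = e.child_scopes - e.parent_scopes
        if extra:
            return False, f"child requests scopes the parent lacks: {sorted(extra)}"
    if e.hook is Hook.TOKEN_FORWARD and not e.reauthorized:
        return False, "token forwarded without re-authorization"
    return True, "ok"
```

A production policy would more likely route the scope upgrade to approval than deny it outright; the sketch collapses every outcome to allow / deny for brevity.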

§04 · what your stack gains

Five capabilities. Drop-in for any agent runtime.

We ship an SDK (Python · TypeScript · Go) + a policy console. No data plane change · no model swap · works alongside Braintrust, LangSmith, and Langfuse: they handle output tracing, we handle the during-execution window.

  1. Inline blocking · not after-the-fact trace

    Decisions land before the tool call completes. Target sub-call latency is p99 ≤ 18ms in our SDK benchmarks, negligible at human-perceived agent speeds.

    pair · replaces · post-hoc tracing for the during-execution window

  2. Risk-graded routing · low-friction by default

    Low-risk calls auto-pass. Medium-risk calls batch into async approval. High-risk calls block with the full evidence chain. Policy templates ship per regulation, not as a blank rule editor.

    pair · replaces · static RBAC and step-up auth at session granularity

  3. Async batched elicitation · no approval fatigue

    220 sub-call asks collapse to ≤ 5 human decisions per task, delivered over Slack · mobile · SMS · Magic Link. Timeout defaults are policy, not vibes.

    pair · replaces · MCP elicitation prompts and manual review queues

  4. Signed evidence trail · audit-grade by construction

    Every request → decision → approver → timestamp → outcome is hash-chained and immutable. Maps directly to EU AI Act Annex III, ABA Model Rule 5.3, and HIPAA Minimum Necessary evidence formats.

    pair · replaces · "best effort" log export at audit time

  5. Circuit breaker · revoke and kill

    Approval was wrong? Revoke. Behavior anomalous? Kill. This is the contractual "right to interrupt" your enterprise customers will start requiring in 2026 H2 procurement.

    pair · replaces · informal incident response runbooks
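A hash-chained evidence trail, like the one described under the signed evidence trail capability, can be sketched with the Python standard library alone. In this minimal version each entry's hash covers its record plus the previous entry's hash, so editing any earlier record breaks every later link. HMAC-SHA256 with a shared key stands in for a real signature scheme (an asymmetric signature such as Ed25519 would let an auditor verify without holding the key); `Entry`, `append`, `verify`, and the record fields are hypothetical. A chain like this is tamper-evident rather than tamper-proof: it detects edits, it does not prevent them.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass

# Hypothetical sketch: HMAC-SHA256 stands in for a real (asymmetric) signature.
GENESIS = "0" * 64  # prev_hash used by the first entry


@dataclass(frozen=True)
class Entry:
    record: dict      # e.g. request, decision, approver, timestamp, outcome
    prev_hash: str
    entry_hash: str
    signature: str


def _digest(record: dict, prev_hash: str) -> str:
    # Canonical JSON so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()


def _sign(entry_hash: str, key: bytes) -> str:
    return hmac.new(key, entry_hash.encode(), hashlib.sha256).hexdigest()


def append(chain: list[Entry], record: dict, key: bytes) -> Entry:
    """Link a copy of the record to the previous entry and sign the new hash."""
    prev = chain[-1].entry_hash if chain else GENESIS
    entry_hash = _digest(record, prev)
    entry = Entry(dict(record), prev, entry_hash, _sign(entry_hash, key))
    chain.append(entry)
    return entry


def verify(chain: list[Entry], key: bytes) -> bool:
    """Recompute every link and signature; edits or reorders are detected
    unless whoever made them also holds the key and rebuilds the chain."""
    prev = GENESIS
    for e in chain:
        if e.prev_hash != prev or _digest(e.record, prev) != e.entry_hash:
            return False
        if not hmac.compare_digest(_sign(e.entry_hash, key), e.signature):
            return False
        prev = e.entry_hash
    return True
```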

§05 · drop-in · 5 lines

Drop-in next to your existing agent. No data plane change.

One pf.guard() wrap around your tool-call entry point is the entire surface area. Policy templates ship per regulation. Decisions land before the call completes, and evidence is hash-chained.

agent.ts

```shell
$ npm install @permforge/sdk
```

```typescript
import { PermForge } from "@permforge/sdk";

const pf = new PermForge({ apiKey: process.env.PERMFORGE_KEY });

// wrap your existing tool-call entry point
// (`agent` and `matter` come from your own code)
const result = await pf.guard(async () => {
  return agent.callTool("search_clients", { matter });
});
// inline policy decision · evidence written · escalation if blocked
```
```shell
$ pip install permforge
```

```python
import os

from permforge import PermForge

pf = PermForge(api_key=os.environ["PERMFORGE_KEY"])

# wrap your existing tool-call entry point
# (`agent` and `matter` come from your own code)
with pf.guard():
    result = agent.call_tool("search_clients", matter=matter)
# inline policy decision · evidence written · escalation if blocked
```
```shell
$ go get github.com/permforge/sdk-go
```

```go
import (
	"os"

	permforge "github.com/permforge/sdk-go"
)

pf := permforge.New(os.Getenv("PERMFORGE_KEY"))

// wrap your existing tool-call entry point
// (`ctx`, `agent`, and `matter` come from your own code)
result, err := pf.Guard(ctx, func() (any, error) {
	return agent.CallTool("search_clients", matter)
})
// inline policy decision · evidence written · escalation if blocked
```
target p99 ≤ 18ms · benched in-SDK · SDK < 200 KB · zero runtime deps · runs in your VPC · no data leaves
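Inside the wrap, a guard has to route each sub-call before it runs. Here is a minimal sketch of the risk-graded routing and batched elicitation described in §04, assuming each ask already carries a low / medium / high risk label and that medium-risk asks sharing a (tool, scope) pair can be settled by one human decision. `Risk`, `Route`, `Ask`, `route`, and `batch_for_approval` are hypothetical names, not part of the PermForge SDK.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum

# Hypothetical sketch of risk-graded routing; not the PermForge SDK.


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Route(Enum):
    ALLOW = "allow"                    # low risk: auto-pass
    ASYNC_APPROVAL = "async_approval"  # medium risk: batched human decision
    BLOCK = "block"                    # high risk: block with evidence


@dataclass(frozen=True)
class Ask:
    tool: str
    scope: str
    risk: Risk


def route(ask: Ask) -> Route:
    if ask.risk is Risk.LOW:
        return Route.ALLOW
    if ask.risk is Risk.MEDIUM:
        return Route.ASYNC_APPROVAL
    return Route.BLOCK


def batch_for_approval(asks: list[Ask]) -> dict[tuple[str, str], list[Ask]]:
    """Group medium-risk asks by (tool, scope) so one decision covers each group."""
    batches: dict[tuple[str, str], list[Ask]] = defaultdict(list)
    for ask in asks:
        if route(ask) is Route.ASYNC_APPROVAL:
            batches[(ask.tool, ask.scope)].append(ask)
    return dict(batches)
```

The grouping key is the main design lever: a coarser key (tool only) collapses more asks into fewer decisions but gives the approver less context per decision.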
§06 · why this quarter

Your customers and your auditors are about to ask. Four deadlines stack up: two already in force, two landing in 2026 H2.

Cyber insurers and SOC 2 reviewers have already begun flagging "agent permission audit trail" as a renewal condition. The permission control gap is becoming a cash-flow event, not a research interest.

  1. 2026-06-30 · T-34 days

    Colorado AI Act enforcement

    First enforcement day for the Act's "high-risk AI system" duties to disclose and manage risk. Your customers in Colorado will start asking for the evidence trail.

  2. 2026-08-02 · T-67 days

    EU AI Act · high-risk systems

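    Article 14 requires effective human oversight while the system is in use, commensurate with its risks and level of autonomy, including the ability to interrupt it. Annex III conformity assessments are due. Fines for high-risk obligations reach €15M or 3% of global annual turnover (€35M or 7% for prohibited practices). Mid-execution evidence is the gap.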

  3. 2026-02 · in force · +101 days

    ABA Model Rule 5.3

    Extended to AI agents under attorney supervision. Law firm AI procurement now requires sub-call audit trail conformity.

  4. 2026-03 · in force · +73 days

    CMS HIPAA Minimum Necessary

    Clarified for AI: agent-driven PHI access must demonstrate per-call necessity. This is exactly the fan-out failure mode.

§07 · a note from the founder
Pre-launch · 2026-05 · for the first 12 audits, starting June 2026

I spent the last 6 months talking to engineers shipping vertical AI agents in legal, healthcare, and financial compliance. Every one had the same gap: they could see the agent's final answer, but not what it did between calls. PermForge is the layer that closes that gap: sub-call, inline, with evidence you can hand to your auditor.

We're running 12 free audits starting June 2026, ahead of Colorado AI Act enforcement (6/30) and the EU AI Act (8/2), to calibrate the product against real regulated-agent traces. Your data and your logo stay private. We publish nothing without your written sign-off. If you'd be willing to send us 1 week of traces, please email me or grab a shadow audit below.

Founder · PermForge
§08 · objections an engineer raises on a call

The questions an engineer asks in minute three.

If we missed anything here, email us; we'll write back and add it.

  1. Does this slow down my agent?

    Mean overhead is single-digit milliseconds per gated sub-call, with a target p99 ≤ 18ms. For tool calls that themselves take 200–800ms (LLM, vector DB, external API), the gate stays below user-perceivable latency. We publish a benchmark suite as part of PermBench so you can verify on your own workload.

  2. Does my data leave my network?

    No. The SDK runs in-process inside your VPC. The optional sidecar runs as a container alongside your agent — also in your VPC. Only signed audit events (no payload) reach our control plane, and even that is opt-in.

  3. How does this compare to Braintrust / LangSmith / Langfuse?

    Those are post-hoc tracing tools: they see the output channel after it has shipped. PermForge is the inline control plane: it decides before the call completes. We pair with them; they keep handling output observability, and we handle the during-execution window.

  4. How does this compare to OPA / Cedar / Permit.io / Cerbos?

    Those are static authorization engines designed for app-level RBAC: yes / no per request, session-level scopes. PermForge is purpose-built for agents — sub-call granularity, async batched human approval, composite-semantic detection across N calls in one task, and signed evidence in audit-ready format.

  5. What's open-source?

    PermBench (the benchmark suite + 120+ failure cases) is Apache 2.0 from day one — see github.com/permforge/permbench. The SDK is source-available under a fair-source license; the control plane is commercial. You can fork PermBench and run scoring on your own agent for free, forever.

  6. What happens after the shadow audit?

    If we find ≥ 3 silent boundary crossings worth fixing, we offer the paid pilot at $80K / year for the first 5 customers. That covers SDK + console + 4 policy packs + founder-led support through Q1 2027. If we find fewer, we tell you so and walk away. Audits run June–September 2026, starting ahead of Colorado AI Act (6/30) and EU AI Act (8/2) enforcement.
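Question 04 mentions composite detection across N calls in one task. One simple instance: calls that each pass a per-request check can still, taken together, touch both sides of an ethical wall. The minimal sketch below assumes each call is already tagged with a task ID and a matter, and that walls are given as sets of matters that must never meet in one task; `Call`, `composite_violations`, and the matter names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sketch of cross-call (composite) detection within one task.


@dataclass(frozen=True)
class Call:
    task_id: str
    tool: str
    matter: str


def composite_violations(
    calls: list[Call], walls: list[frozenset[str]]
) -> list[tuple[str, frozenset[str]]]:
    """Return (task_id, wall) for every task whose calls, together, cross a wall.

    Each call may pass a per-request check on its own; the violation only
    shows up when all calls in the same task are considered as a set.
    """
    touched: dict[str, set[str]] = {}
    for call in calls:
        touched.setdefault(call.task_id, set()).add(call.matter)
    return [
        (task_id, wall)
        for task_id, matters in touched.items()
        for wall in walls
        if wall <= matters  # every matter on the wall was touched in this task
    ]
```

A per-request engine sees three individually valid reads here; only the task-level view shows that `t1` bridged the wall.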

§09 · shadow audit · how it works

Free shadow audit · 1 week.

You send 1 week of agent traces (de-identified is fine), or wire the SDK for a live capture. You get back a signed report listing every silent permission boundary your agent crossed, with regulatory exposure estimates.

first 12 audits free · June 2026 · 1-week turnaround · no commitment

  1. You send · a 1–7 day window of agent traces (de-identified is fine), or wire our SDK for a live capture.
  2. We run · PermBench scoring + behavioral graph extraction, mapping every cross-tenant access, scope upgrade, and silent ethical-wall hit.
  3. You get · a signed report: visibility gap %, regulatory exposure, recommended controls. Pilot pricing only if the findings warrant it.
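The "visibility gap %" in the report can be read, roughly, as the share of boundary crossings that never surface in the output channel. A minimal sketch follows, assuming each trace event has already been labeled with its channel and whether it crosses a permission boundary (that labeling is the hard part, and is what behavioral graph extraction would supply), and assuming only `"output"` events are visible to an output-only audit. `TraceEvent` and `visibility_gap` are hypothetical names; PermBench's actual scoring may define the metric differently.

```python
from dataclasses import dataclass

# Hypothetical sketch; PermBench's real scoring may define the metric differently.


@dataclass(frozen=True)
class TraceEvent:
    channel: str            # e.g. "output", "inter_agent", "tool_call"
    crosses_boundary: bool  # labeled upstream, e.g. by behavioral graph extraction


def visibility_gap(events: list[TraceEvent]) -> float:
    """Percentage of boundary crossings an output-only audit would not see."""
    crossings = [e for e in events if e.crosses_boundary]
    if not crossings:
        return 0.0
    hidden = sum(1 for e in crossings if e.channel != "output")
    return 100.0 * hidden / len(crossings)
```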