Skip to content
Assay

Graded by a real deployment, not a judge model.

Write-isolated verification infrastructure for cloud-infrastructure agents. The reward is earned against live infrastructure, and it cannot be faked.

Frontier coding models look nearly solved on short, single-file benchmarks, and still struggle with the long-horizon infrastructure work that runs real production systems. We believe that gap is a data problem, not a capability ceiling. There is very little long-horizon training signal a lab can actually trust.

Assay exists to build that signal. We turn real cloud-engineering work into training environments where an agent provisions, migrates, and repairs actual infrastructure, and where the grade comes from the deployed system itself, not from a model's opinion of the work.

The verifier runs where the agent cannot reach it. The record it writes cannot be edited, replayed, or forged. Trust is the product; everything else follows from it.

Sam Silva

Founder, Assay

Two accounts. One wall.

The agent works and deploys in its own cloud account. The verifier observes from a separate trust domain, grades what actually exists, and signs a record the agent can never touch.

Trust domain A agent

Live infrastructure

The deployed system, observable state

Agent

Writes infrastructure as code

Trust domain B verifier

Verifier

Grades deployed state deterministically

Signed record

Immutable, independently verifiable

FIG. 1 The write-isolation boundary. Grades are computed, signed, and stored where the agent has no path.

Deploy-graded.

Success is observed from the state of live infrastructure. Deploys succeed or fail; resources exist or they do not.

Write-isolated.

Grading runs in a separate cloud account. No shared credentials, no network path, no way for the agent to reach its own reward.

Independently verifiable.

Every grade is a signed, write-once record. A lab can verify each one with a public key, without trusting our word.

We are small on purpose.

Assay is a founding team being assembled around one uncompromising design. If verification is your research problem, we should talk.