One seat is open.

Assay is a founding team being assembled deliberately. We hire for ownership of whole problems, starting with this one.

How we work

Prevention over detection

If a property matters, we make it hold by construction.

Evidence over opinion

Grades come from deployed systems and signed records, not judgment calls.

Publish the design, not the results

Credibility comes from architecture anyone can inspect.

Small team, deep ownership

Few people, whole problems, no handoffs.

Open now

Co-founder, RL and evaluation.

This is less a job posting than a search for a co-author. The seat owns the scientific half of Assay: proving, with the rigor a frontier lab would accept, that training on deploy-graded environments measurably moves a model.

What you would own

The post-training loop: supervised fine-tuning and RL on open models, with synthetic data where it earns its place.
The proof: designing and running the pre-registered eval as co-author, from baselines and seeds to the out-of-distribution splits.
The signal: the strategy for which trajectories we collect and why, so every graded episode earns its cost.

Who this fits

Eval design is your research identity, not a side skill. You have shipped applied post-training work. And something in you prefers a hard, binary, execution-grounded reward under adversarial constraints to the comfort of preference scores.

What you do not need

A cloud-infrastructure background. The harness and the domain are covered; this seat is RL and evaluation.

Write to hello@assayops.ai with the strongest thing you have built or evaluated.