One seat is open.
Assay is a founding team being assembled deliberately. We hire for ownership of whole problems, starting with this one.
How we work
Prevention over detection
If a property matters, we make it hold by construction.
Evidence over opinion
Grades come from deployed systems and signed records, not judgment calls.
Publish the design, not the results
Credibility comes from architecture anyone can inspect.
Small team, deep ownership
Few people, whole problems, no handoffs.
Open now
Co-founder, RL and evaluation.
This is less a job posting than a search for a co-author. The seat owns the scientific half of Assay: proving, with rigor a frontier lab would accept, that training on deploy-graded environments measurably improves a model.
What you would own
- The post-training loop: supervised fine-tuning and RL on open models, with synthetic data where it earns its place.
- The proof: you design and run the pre-registered eval as co-author, from baselines and seeds to the out-of-distribution splits.
- The signal: the strategy for which trajectories we collect and why, so every graded episode earns its cost.
Who this fits
Eval design is your research identity, not a side skill. You have shipped applied post-training work. And something in you prefers a hard, binary, execution-grounded reward under adversarial constraints to the comfort of preference scores.
What you do not need
A cloud-infrastructure background. The harness and the domain are covered; this seat is RL and evaluation.
Write to hello@assayops.ai with the strongest thing you have built or evaluated.