Methodology

Every Indie Bench evaluation is scored against a published rubric. Each rubric has a stable identifier (e.g. IB-CODE-2026.1), a public version history, and a transparent scoring breakdown. We publish the rubric so you can disagree with the weighting and recompute the score from the raw per-task data.
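As a rough illustration of what recomputing a score could look like, the sketch below averages per-task scores within each dimension and applies a custom set of weights that sum to 100. The weights, the 0 to 5 per-task scale, the row layout, and the function name `recompute_score` are all assumptions made for this example; they are not the published rubric values.

```python
from collections import defaultdict

# Hypothetical weights (must sum to 100); NOT the published rubric weights.
WEIGHTS = {
    "first_pass_correctness": 40,
    "error_recovery": 30,
    "pricing_reality": 30,
}

MAX_SCORE = 5  # assumed per-task scale of 0..5


def recompute_score(task_scores, weights=WEIGHTS, max_score=MAX_SCORE):
    """Return a 0..100 score from rows of (task_id, dimension, score)."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")

    # Group the raw per-task scores by dimension.
    by_dimension = defaultdict(list)
    for _task_id, dimension, score in task_scores:
        by_dimension[dimension].append(score)

    total = 0.0
    for dimension, weight in weights.items():
        scores = by_dimension.get(dimension)
        if not scores:
            raise ValueError(f"no scores for dimension {dimension!r}")
        mean = sum(scores) / len(scores)
        # Each dimension contributes its weight scaled by the mean score.
        total += weight * (mean / max_score)
    return round(total, 1)


# Made-up per-task data, for illustration only.
raw = [
    ("task-01", "first_pass_correctness", 4),
    ("task-02", "first_pass_correctness", 5),
    ("task-01", "error_recovery", 5),
    ("task-02", "error_recovery", 3),
    ("task-01", "pricing_reality", 2),
    ("task-02", "pricing_reality", 4),
]

print(recompute_score(raw))
```

If you disagree with the weighting, pass a different `weights` mapping; the per-task rows stay the same and only the aggregation changes.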

When a rubric changes substantively, we bump the version, mark the previous version superseded, and keep it published. Evaluations scored under the old version stay valid and are labelled with the version they used.
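One way to picture this policy is as a small registry in which a superseded rubric is never deleted, only marked with its successor. This is a minimal sketch; the `Rubric` and `RubricRegistry` names and their fields are hypothetical and do not describe how Indie Bench stores rubrics.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rubric:
    rubric_id: str
    superseded_by: Optional[str] = None  # set when a newer version replaces it

    @property
    def status(self) -> str:
        return "superseded" if self.superseded_by else "active"


class RubricRegistry:
    """Hypothetical in-memory registry; old versions stay published."""

    def __init__(self) -> None:
        self._rubrics: Dict[str, Rubric] = {}

    def publish(self, rubric_id: str, supersedes: Optional[str] = None) -> Rubric:
        if rubric_id in self._rubrics:
            raise ValueError(f"{rubric_id} is already published")
        if supersedes is not None:
            # Mark the old version superseded but keep it in the registry.
            self._rubrics[supersedes].superseded_by = rubric_id
        rubric = Rubric(rubric_id)
        self._rubrics[rubric_id] = rubric
        return rubric

    def get(self, rubric_id: str) -> Rubric:
        return self._rubrics[rubric_id]

    def active(self) -> List[str]:
        return [r.rubric_id for r in self._rubrics.values() if r.status == "active"]


registry = RubricRegistry()
registry.publish("IB-CODE-2026.1")
registry.publish("IB-CODE-2026.2", supersedes="IB-CODE-2026.1")

# An evaluation keeps pointing at the rubric version it was scored under.
evaluation = {"tool": "example-tool", "rubric_id": "IB-CODE-2026.1"}
print(registry.get(evaluation["rubric_id"]).status)
```

Because the old entry is retained, an evaluation scored under it can still be looked up and shown with its original version label.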

Active rubrics

IB-CODE-2026.2 — The Indie Operator Coding Rubric
Revision 2 of the Indie Operator Coding Rubric. It adds explicit evaluator-bias mitigations (cross-LLM scoring, bias-check tasks, and a public methodology repo), a brief-adherence sub-dimension under First-pass correctness, a Pricing Reality dimension split into per-task cost and predictability, an IDE-vs-CLI scope statement, and a default Error Recovery score of 5 for a clean first pass. As of v2.1.0, every eval page must also carry a cost report in dollars per task, with token footnotes where knowable.

Superseded

IB-CODE-2026.1 — The Indie Operator Coding Rubric (superseded)
A reproducible rubric for evaluating AI coding tools against the tasks indie hackers and solo SaaS operators actually do. Twelve tasks, six scoring dimensions, weighted to 100. Superseded by IB-CODE-2026.2 the same day, after the first eval exposed sources of evaluator bias and gaps in the rubric. Evaluations scored under v1.0.0 remain published but should be re-scored under v2.0.0 when revisited.