Evaluations
Every published evaluation, newest first. Each page links to the methodology version it was scored against; older evaluations remain published even when methodology versions are superseded.
- Claude Code under IB-CODE-2026.1: a methodology stress-test watching
The first run of the Indie Operator Coding Rubric is also the rubric's own stress-test: we score Claude Code on a partial task set, document where the rubric breaks, and use the run to draft IB-CODE-2026.2. Claude Code's preliminary score is 81/100 across three tasks — strong on writing-shaped tasks, weaker on SQL correctness under load, and the rubric itself missed a 'tool-driven scope creep' failure mode we now plan to score.