AI tools, evaluated under a real methodology.
Indie Bench publishes head-to-head evaluations of the AI tools indie hackers and solo operators actually buy — coding assistants, writing tools, research agents, design generators. We use a named, versioned, transparent rubric and score every tool on the same tasks.
Recent evaluations
- Claude Code under IB-CODE-2026.1: a methodology stress-test (watching)
The first run of the Indie Operator Coding Rubric is also the rubric's own stress-test: we score Claude Code on a partial task set, document where the rubric breaks, and use the run to draft IB-CODE-2026.2. Claude Code's preliminary score is 81/100 across three tasks: strong on writing-shaped tasks, weaker on SQL correctness under load. The rubric itself also missed a 'tool-driven scope creep' failure mode, which we now plan to score.