Tasks
Every tool evaluated under IB-CODE-2026.2 is run against these 15 tasks under identical conditions. Each task has its own page showing which tool currently leads on it. Use these pages when you want the best tool for one specific job rather than a general comparison.
- Build a Stripe Checkout integration
New code, integration
- Scaffold email/password + Google OAuth in Next.js
New code, framework choice
- Add soft-delete to a 12K-line TypeScript repo
Existing codebase, pattern-following
- Fix a production bug from a stack trace
Debugging, ambiguity tolerance
- Reversible SQL migration with non-locking backfill
Domain-specific correctness
- Generate a landing page with Tailwind
New code, design sense
- Refactor a 200-line function preserving tests
Refactoring, behavior preservation
- Write integration tests for a REST endpoint
Test generation
- Debug a CORS issue across frontend + backend
Real-world ambiguity, multi-file reasoning
- Write a deployment script for Vercel + Supabase
Infra, real-world tooling
- Convert a Python prototype to TypeScript + Express
Cross-language understanding
- Write a customer-facing CHANGELOG from 15 commits
Domain-specific writing for developers
- Java 3.x legacy maintenance: JSP + JDBC
Bias-check task (long-context legacy)
- Embedded C: buffer-managed UART, no heap, <4KB stack
Bias-check task (hardware-constrained)
- Rails ActiveRecord model with STI and counter cache
Bias-check task (idiom-heavy framework)