Tasks

Every tool evaluated under IB-CODE-2026.2 is run against these 15 tasks under identical conditions. Each task has its own page tracking which tool currently wins, which is useful when you want the best tool for one specific job rather than a general comparison.

  1. Build a Stripe Checkout integration

    New code, integration

  2. Scaffold email/password + Google OAuth in Next.js

    New code, framework choice

  3. Add soft-delete to a 12K-line TypeScript repo

    Existing codebase, pattern-following

  4. Fix a production bug from a stack trace

    Debugging, ambiguity tolerance

  5. Reversible SQL migration with non-locking backfill

    Domain-specific correctness

  6. Generate a landing page with Tailwind

    New code, design sense

  7. Refactor a 200-line function preserving tests

    Refactoring, behavior preservation

  8. Write integration tests for a REST endpoint

    Test generation

  9. Debug a CORS issue across frontend + backend

    Real-world ambiguity, multi-file reasoning

  10. Write a deployment script for Vercel + Supabase

    Infra, real-world tooling

  11. Convert a Python prototype to TypeScript + Express

    Cross-language understanding

  12. Write a customer-facing CHANGELOG from 15 commits

    Domain-specific writing for developers

  13. Java 3.x legacy maintenance — JSP + JDBC

    Bias-check task (long-context legacy)

  14. Embedded C: buffer-managed UART, no heap, <4KB stack

    Bias-check task (hardware-constrained)

  15. Rails ActiveRecord model with STI and counter cache

    Bias-check task (idiom-heavy framework)