Simple, honest pricing

Start free. Upgrade when you need unlimited testing.

Free

$0
forever
  • 3 test runs total
  • Compare 2 Claude versions
  • Up to 5 test cases per run
  • Pass/fail dashboard
  • Model pinning recommendation
  • Unlimited test runs
  • Unlimited test cases
  • Save & export results
  • Priority support
Start Free
MOST POPULAR

Pro

$9
/month
Cancel anytime
  • Unlimited test runs
  • Unlimited test cases per run
  • Compare 2 Claude versions
  • Pass/fail dashboard
  • Model pinning recommendation
  • Save & export results (CSV)
  • Test history (last 30 runs)
  • Priority email support
Upgrade to Pro — $9/mo

Common Questions

Do you store my API key?

Never. Your Anthropic API key lives only in your browser session. We have no backend that touches it.

Which Claude versions do you test against?

We test against claude-3-5-sonnet-20241022 (the 4.6 era model) and claude-sonnet-4-5 (the 4.7 model). These are the two versions where regressions have been most reported.

What counts as a "test run"?

One test run = clicking the Run button once. It calls both models with all your test cases. Free users get 3 runs total.

Can I test my own prompts?

Yes — that's the whole point. Add any prompt you use in production, define what the output should contain, and see which model version passes.

What if neither version passes my tests?

The dashboard will show 0% for both. This usually means your expected output string needs adjusting, or the prompt needs work regardless of model version.