Sources & Attribution
All data used in the CSI is derived from direct API measurements and publicly available pricing information. No synthetic or estimated data is used for the core index calculations.
Model APIs
| Model | Provider | Access Method | Input $/1M | Output $/1M |
|---|---|---|---|---|
| Claude Opus 4 | Anthropic | Direct API | $15.00 | $75.00 |
| Claude Sonnet 4 | Anthropic | Direct API | $3.00 | $15.00 |
| Claude Haiku 4.5 | Anthropic | Direct API | $1.00 | $5.00 |
| GPT-4o | OpenAI | Direct API | $2.50 | $10.00 |
| GPT-4o Mini | OpenAI | Direct API | $0.15 | $0.60 |
| Gemini 2.5 Flash | Google | Direct API | $0.15 | $0.60 |
| Gemini 2.5 Pro | Google | Direct API | $1.25 | $10.00 |
| Grok 3 | xAI via OpenRouter | OpenRouter | $3.00 | $15.00 |
| DeepSeek V3.2 | DeepSeek via OpenRouter | OpenRouter | $0.26 | $0.38 |
| DeepSeek R1 | DeepSeek via OpenRouter | OpenRouter | $0.45 | $2.15 |
| Cohere Command A | Cohere via OpenRouter | OpenRouter | $2.50 | $10.00 |
| Cohere Command R+ | Cohere via OpenRouter | OpenRouter | $2.50 | $10.00 |
| Llama 3.3 70B Instruct | Meta via OpenRouter | OpenRouter | $0.39 | $0.39 |
| Mistral Large | Mistral via OpenRouter | OpenRouter | $2.00 | $6.00 |
| Nemotron Super 49B | NVIDIA via OpenRouter | OpenRouter | $0.10 | $0.40 |
| Qwen 2.5 72B | Alibaba via OpenRouter | OpenRouter | $0.12 | $0.39 |
Pricing is sourced from each provider’s official pricing page and stored in the database. It is refreshed manually when providers announce changes.
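As a minimal sketch of how the stored rates translate into a per-request cost, the snippet below mirrors a few rows of the table above (USD per 1M tokens). The `PRICING` keys are illustrative identifiers, not necessarily the exact API model names used in the database.

```python
# Per-1M-token rates copied from the pricing table above (illustrative subset).
PRICING = {
    "claude-sonnet-4": {"input": 3.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "deepseek-v3.2": {"input": 0.26, "output": 0.38},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the stored per-1M-token rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply on Claude Sonnet 4
cost = request_cost("claude-sonnet-4", 2000, 500)  # (2000*3.00 + 500*15.00) / 1e6 = 0.0135
```

Because output tokens are typically several times more expensive than input tokens, response length dominates cost for most of the models listed.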
Pricing Sources
- Anthropic — Claude Opus 4, Sonnet 4, and Haiku 4.5 pricing from official documentation
- Google — Gemini 2.5 Flash and 2.5 Pro pricing from Google AI developer pricing
- OpenAI — GPT-4o and GPT-4o Mini pricing from the API pricing page
- OpenRouter — Grok 3, DeepSeek V3.2, DeepSeek R1, Cohere Command A, Cohere Command R+, Llama 3.3 70B, Mistral Large, Nemotron Super 49B, and Qwen 2.5 72B pricing from OpenRouter model pages
Infrastructure
- Database: Supabase (PostgreSQL) for measurement storage and index computation
- Benchmark harness: Python, executing from a single location to ensure consistent network latency
- Scoring: Deterministic regex and keyword-based scoring applied identically across all models
- Frontend: Static HTML/CSS/JS reading directly from the Supabase REST API
Limitations
- Latency measurements include network round-trip time and may vary by geography and time of day.
- Scoring functions use keyword and pattern matching, not semantic evaluation. Edge cases in model responses may be scored imperfectly.
- The task set (12 tasks) is intentionally small for v1. It tests breadth, not depth, within each domain.
- OpenRouter pricing and latency for routed models (Grok, DeepSeek, Cohere, Llama, Mistral, Nemotron, Qwen) include the intermediary’s margin and routing overhead.
- Token counts are as reported by each provider’s API and may use different tokenization schemes.
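The first limitation above follows from how wall-clock latency is measured: the timer brackets the entire call, so network round-trip time is unavoidably included. A minimal sketch (the helper name is an assumption, not the harness's actual API):

```python
import time

def measure_latency_ms(call):
    """Time a single zero-argument callable in milliseconds.
    Any network round-trip inside `call` is included in the measurement."""
    start = time.perf_counter()
    result = call()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Stand-in for a real API call: a 50 ms delay shows up in the measurement.
result, ms = measure_latency_ms(lambda: time.sleep(0.05) or "ok")
```

Running the harness from a single location (per Infrastructure above) keeps this network component roughly constant across models, but it cannot remove it.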
License
The CSI methodology, code, and data are provided for informational and research purposes. The benchmark results reflect point-in-time measurements and should not be used as the sole basis for procurement decisions.