Sources & Attribution
All data used in the CSI is derived from direct API measurements and publicly available pricing information. No synthetic or estimated data is used for the core index calculations.
Model APIs
| Model | Provider | Access Method | Input $/1M | Output $/1M |
|---|---|---|---|---|
| Claude Opus 4 | Anthropic | Direct API | $15.00 | $75.00 |
| Claude Sonnet 4 | Anthropic | Direct API | $3.00 | $15.00 |
| Claude Haiku 4.5 | Anthropic | Direct API | $1.00 | $5.00 |
| GPT-4o | OpenAI | Direct API | $2.50 | $10.00 |
| GPT-4o Mini | OpenAI | Direct API | $0.15 | $0.60 |
| Gemini 2.5 Flash | Google | Direct API | $0.15 | $0.60 |
| Gemini 2.5 Pro | Google | Direct API | $1.25 | $10.00 |
| Grok 3 | xAI via OpenRouter | OpenRouter | $3.00 | $15.00 |
| DeepSeek V3.2 | DeepSeek via OpenRouter | OpenRouter | $0.26 | $0.38 |
| DeepSeek R1 | DeepSeek via OpenRouter | OpenRouter | $0.45 | $2.15 |
| Cohere Command A | Cohere via OpenRouter | OpenRouter | $2.50 | $10.00 |
| Cohere Command R+ | Cohere via OpenRouter | OpenRouter | $2.50 | $10.00 |
| Llama 3.3 70B Instruct | Meta via OpenRouter | OpenRouter | $0.39 | $0.39 |
| Mistral Large | Mistral via OpenRouter | OpenRouter | $2.00 | $6.00 |
| Nemotron Super 49B | NVIDIA via OpenRouter | OpenRouter | $0.10 | $0.40 |
| Qwen 2.5 72B | Alibaba via OpenRouter | OpenRouter | $0.12 | $0.39 |
Pricing is sourced from each provider’s official pricing page and stored in the database. It is refreshed manually when providers announce changes.
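As a minimal sketch of how the stored rates translate into a per-request cost, the snippet below mirrors a few rows of the table above (USD per 1M tokens). The `PRICING` keys are illustrative identifiers, not necessarily the exact API model names used in the database.

```python
# Per-1M-token rates copied from the pricing table above (illustrative subset).
PRICING = {
    "claude-sonnet-4": {"input": 3.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "deepseek-v3.2": {"input": 0.26, "output": 0.38},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the stored per-1M-token rates."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply on Claude Sonnet 4
cost = request_cost("claude-sonnet-4", 2000, 500)  # (2000*3.00 + 500*15.00) / 1e6 = 0.0135
```

Because output tokens are typically several times more expensive than input tokens, response length dominates cost for most of the models listed.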
Pricing Sources
- Anthropic — Claude Opus 4, Sonnet 4, and Haiku 4.5 pricing from official documentation
- Google — Gemini 2.5 Flash and 2.5 Pro pricing from Google AI developer pricing
- OpenAI — GPT-4o and GPT-4o Mini pricing from the API pricing page
- OpenRouter — Grok 3, DeepSeek V3.2, DeepSeek R1, Cohere Command A, Cohere Command R+, Llama 3.3 70B, Mistral Large, Nemotron Super 49B, and Qwen 2.5 72B pricing from OpenRouter model pages
Infrastructure
- Database: Supabase (PostgreSQL) for measurement storage and index computation
- Benchmark harness: Python, executing from a single location to ensure consistent network latency
- Scoring: Deterministic regex and keyword-based scoring applied identically across all models
- Frontend: Static HTML/CSS/JS reading directly from the Supabase REST API
Limitations
- Latency measurements include network round-trip time and may vary by geography and time of day.
- Scoring functions use keyword and pattern matching, not semantic evaluation. Edge cases in model responses may be scored imperfectly.
- The task set (12 tasks) is intentionally small for v1. It tests breadth, not depth, within each domain.
- OpenRouter pricing and latency for routed models (Grok, DeepSeek, Cohere, Llama, Mistral, Nemotron, Qwen) include the intermediary’s margin and routing overhead.
- Token counts are as reported by each provider’s API and may use different tokenization schemes.
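The first limitation above follows from how wall-clock latency is measured: the timer brackets the entire call, so network round-trip time is unavoidably included. A minimal sketch (the helper name is an assumption, not the harness's actual API):

```python
import time

def measure_latency_ms(call):
    """Time a single zero-argument callable in milliseconds.
    Any network round-trip inside `call` is included in the measurement."""
    start = time.perf_counter()
    result = call()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Stand-in for a real API call: a 50 ms delay shows up in the measurement.
result, ms = measure_latency_ms(lambda: time.sleep(0.05) or "ok")
```

Running the harness from a single location (per Infrastructure above) keeps this network component roughly constant across models, but it cannot remove it.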
License
The CSI methodology, code, and data are provided for informational and research purposes. The benchmark results reflect point-in-time measurements and should not be used as the sole basis for procurement decisions.