GPT-5 token & cost calculator
OpenAI GPT-5 is the flagship of the GPT-5 family — the model OpenAI positions for hardest-task reasoning, multi-step agentic flows, and the kinds of problems where a single percentage point of quality lift translates directly into product outcomes. The pricing reflects that positioning: at $1.25 input / $10 output per million tokens, GPT-5 runs well under half Claude 4.5 Sonnet's input rate and ⅔ of its output rate, and is meaningfully cheaper than Claude 4.7 Opus or GPT-4.1 on most realistic workloads.
What makes the budgeting math distinctive on GPT-5 is the 8× input/output price ratio. Output is the cost driver to a degree most teams underestimate when they first ship — clamping max_tokens aggressively, validating structured responses, and routing routine work to GPT-5 Mini are the three controls that compound. The calculator below shows the exact token count for whatever you paste, with no approximation involved: gpt-tokenizer ships OpenAI's canonical vocab, so the number you see is the number OpenAI will bill. A short counting-and-costing sketch follows.
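To make the max_tokens point concrete, here is a minimal TypeScript sketch of the counting-and-costing path. It assumes gpt-tokenizer's o200k_base entry point; the estimateCost helper and the rate constants are ours for illustration, not the calculator's actual source.

```ts
// Minimal sketch: exact token counting plus worst-case cost, assuming
// gpt-tokenizer's o200k_base subpath export. Helper names are illustrative.
import { encode } from "gpt-tokenizer/encoding/o200k_base";

const INPUT_PER_M = 1.25; // $ per million input tokens
const OUTPUT_PER_M = 10;  // $ per million output tokens (8x the input rate)

export function estimateCost(prompt: string, maxOutputTokens: number) {
  const inputTokens = encode(prompt).length; // canonical vocab: exact count
  const inputCost = (inputTokens / 1e6) * INPUT_PER_M;
  // Budget the worst case: assume the model spends the whole output allowance.
  // Clamping max_tokens is the biggest single lever on the 8x output rate.
  const outputCost = (maxOutputTokens / 1e6) * OUTPUT_PER_M;
  return { inputTokens, inputCost, outputCost, total: inputCost + outputCost };
}
```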
Saved scenarios
Saved on this browser only — never uploaded. Up to 10 scenarios.
Tip: save a scenario when you have a prompt + model + response length you might revisit. Useful for sizing features before committing to a vendor.
Verify privacy
Open DevTools → Network. Type into the calculator. No request bodies should contain your prompt text.
Pricing
GPT-5 is flat-priced — no tiered surcharge above a context threshold. The cached-input discount is not modeled in the calculator; expect 30–50% savings on the input side once you wire up prompt caching in production.
| Tier | Input $/M | Output $/M |
|---|---|---|
| All input | $1.25 | $10 |

Context window: 400,000 tokens.
Verified against openai.com on 2026-05-09.
Worked examples
These three scenarios sit at typical chat / system-prompt / long-doc-Q&A sizes. The dollar figures are exact because the tokenizer is exact — no ±2% caveat applies for OpenAI models.
| Scenario | Input tokens | Output tokens | Cost |
|---|---|---|---|
| Short chat turn: a typical Q&A turn with a small system prompt | 800 | 400 | <$0.01 |
| System prompt + tool spec: a larger context with a tool schema, single response | 5,000 | 500 | $0.011 |
| Long document Q&A: long-form input (e.g. a transcript) with a structured response | 50,000 | 1,500 | $0.077 |
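Because the pricing is flat, each row reduces to one line of arithmetic. The helper below is illustrative, not taken from the calculator's source:

```ts
// Flat-rate cost for a single request: tokens times $/M, input plus output.
const cost = (inTok: number, outTok: number) =>
  (inTok * 1.25 + outTok * 10) / 1e6;

console.log(cost(800, 400).toFixed(4));      // "0.0050" -> rounds to <$0.01
console.log(cost(5_000, 500).toFixed(4));    // "0.0113" -> $0.011
console.log(cost(50_000, 1_500).toFixed(4)); // "0.0775" -> $0.077
```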
The instinct that pays off: route by request type, not by team affinity. GPT-5 is the default reach-for in the GPT-5 family, but if a particular request type fails reliably on GPT-5 Mini and your latency budget is tight, GPT-5 isn't always the answer — sometimes the right move is to fix the prompt or change the schema. In production deployments, cost-aware routing layers tend to send ≥80% of traffic to Mini; a minimal escalation sketch follows.
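One common shape for that routing is escalate-on-failure: try Mini first, validate the response, and re-run only the hard tail on GPT-5. A hedged sketch, where callModel and isValid are placeholder stubs rather than a real SDK surface:

```ts
// Escalation routing sketch. `callModel` and `isValid` are placeholders:
// wire them to your SDK client and your schema validator respectively.
declare function callModel(model: string, prompt: string): Promise<string>;
declare function isValid(response: string): boolean;

async function routedCall(prompt: string): Promise<string> {
  const cheap = await callModel("gpt-5-mini", prompt); // the >=80% bucket
  if (isValid(cheap)) return cheap;  // most traffic stops here
  return callModel("gpt-5", prompt); // escalate only the hard tail
}
```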
How is this counted?
We tokenize via gpt-tokenizer's o200k_base encoding — the same vocab GPT-5, GPT-4.1, and the o-series all use. Because the tokenizer is canonical, calibration factor is 1.0 and the result is exact. Inputs over 50,000 characters tokenize in a Web Worker so the page stays responsive for very long prompts. The "approx" pill that appears on Claude and Gemini calculators is suppressed here.
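A sketch of that main-thread/worker split, assuming a module worker; the file names, threshold constant, and message shape mirror the description above but are illustrative, not the site's actual source:

```ts
// tokenizer.worker.ts — runs the heavy encode off the main thread.
import { encode } from "gpt-tokenizer/encoding/o200k_base";

self.onmessage = (e: MessageEvent<string>) => {
  postMessage(encode(e.data).length);
};

// main.ts — encode inline for short inputs, hand off above the threshold.
import { encode as encodeInline } from "gpt-tokenizer/encoding/o200k_base";

const WORKER_THRESHOLD = 50_000; // characters, per the note above

function countTokens(text: string): Promise<number> {
  if (text.length < WORKER_THRESHOLD) {
    return Promise.resolve(encodeInline(text).length); // fast path
  }
  return new Promise((resolve) => {
    const worker = new Worker(
      new URL("./tokenizer.worker.ts", import.meta.url),
      { type: "module" }
    );
    worker.onmessage = (e: MessageEvent<number>) => {
      resolve(e.data);
      worker.terminate();
    };
    worker.postMessage(text);
  });
}
```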
FAQ
Is the token count exact?
Yes. Unlike Claude and Gemini, OpenAI publishes the canonical tokenizer (tiktoken). The MIT-licensed package gpt-tokenizer ships the same vocab, so the number you see here matches what OpenAI will bill you for that input — no approximation, no calibration.

How does GPT-5 compare to GPT-5 Mini?
GPT-5 sits at the top of the GPT-5 family for hardest-task quality; Mini is roughly 5× cheaper on input and Nano is roughly 25× cheaper. The right call depends on your eval set — most production workloads should run cheaper requests on Mini and route only the genuinely hard ones to GPT-5.

What is the context window?
GPT-5 supports a 400,000-token context window — larger than the Claude 4.x family (200k) but smaller than Gemini 2.5 (1M) or GPT-4.1 (1M). For long-document workloads where context length is the binding constraint, GPT-4.1 or Gemini 2.5 Pro may be a better fit even if quality on shorter prompts favors GPT-5.

Does my prompt leave the browser?
No. Tokenization runs in JavaScript on the page (or in a Web Worker for inputs over 50,000 characters). No server endpoint ever receives prompt text. The only serverless function on the site is /api/og for social preview images.

How does the cached-input rate work?
OpenAI charges a discounted rate for input tokens that match a cached prefix from a recent request. The calculator above does not currently model this — for production deployments where the same system prompt is reused thousands of times, your effective cost will be lower than the headline number, often by 50%+. A sketch of the effective-rate math follows this FAQ.
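For back-of-envelope caching math, the effective input rate is a weighted average of the cached and uncached rates. The sketch below assumes a 90%-off cached rate ($0.125/M); verify the current discount on OpenAI's pricing page before relying on it:

```ts
// Effective $/M on input as a function of cache hit rate.
// CACHED assumes a 90% discount on GPT-5 input — an assumption, not modeled above.
const FULL = 1.25;
const CACHED = 0.125;

const effectiveInputRate = (cachedFraction: number) =>
  cachedFraction * CACHED + (1 - cachedFraction) * FULL;

console.log(effectiveInputRate(0.8).toFixed(3)); // "0.350" -> 72% below headline
```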
Compare against every other model
To see this exact prompt scored against every supported model, sorted by total cost, paste it into the home calculator and toggle Compare across all models. GPT-5 numbers are exact; cross-vendor comparison against Claude and Gemini lands within ±2–3%.
Related models
The natural comparison set: GPT-5 Mini (the cheaper sibling that handles the routine 80% of traffic), GPT-4.1 (when context length matters more than reasoning), and Claude 4.5 Sonnet (cross-vendor mid-range with a similar cost profile).