GPT-5 token & cost calculator
OpenAI GPT-5 is the flagship of the GPT-5 family — the model OpenAI positions for hardest-task reasoning, multi-step agentic flows, and the kinds of problems where a single percentage point of quality lift translates directly into product outcomes. The pricing reflects that positioning: at $1.25 input / $10 output per million tokens, GPT-5 runs well under half Claude 4.5 Sonnet's input rate and ⅔ of its output rate, and is meaningfully cheaper than Claude 4.7 Opus or GPT-4.1 on most realistic workloads.
What makes the budgeting math distinctive on GPT-5 is the 8× input/output price ratio. Output is the cost driver to a degree most teams underestimate when they first ship — clamping max_tokens aggressively, validating structured responses, and routing routine work to GPT-5 Mini are the three controls that compound. The calculator below shows the exact token count for whatever you paste, with no approximation involved: gpt-tokenizer ships OpenAI's canonical vocab, so the number you see is the number OpenAI will bill. A short counting-and-costing sketch follows.
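To make the max_tokens point concrete, here is a minimal TypeScript sketch of the counting-and-costing path. It assumes gpt-tokenizer's o200k_base entry point; the estimateCost helper and the rate constants are ours for illustration, not the calculator's actual source.

```ts
// Minimal sketch: exact token counting plus worst-case cost, assuming
// gpt-tokenizer's o200k_base subpath export. Helper names are illustrative.
import { encode } from "gpt-tokenizer/encoding/o200k_base";

const INPUT_PER_M = 1.25; // $ per million input tokens
const OUTPUT_PER_M = 10;  // $ per million output tokens (8x the input rate)

export function estimateCost(prompt: string, maxOutputTokens: number) {
  const inputTokens = encode(prompt).length; // canonical vocab: exact count
  const inputCost = (inputTokens / 1e6) * INPUT_PER_M;
  // Budget the worst case: assume the model spends the whole output allowance.
  // Clamping max_tokens is the biggest single lever on the 8x output rate.
  const outputCost = (maxOutputTokens / 1e6) * OUTPUT_PER_M;
  return { inputTokens, inputCost, outputCost, total: inputCost + outputCost };
}
```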
Saved scenarios
Saved on this browser only — never uploaded. Up to 10 scenarios.
Tip: save a scenario when you have a prompt + model + response length you might revisit. Useful for sizing features before committing to a vendor.
Verify privacy
Open DevTools → Network. Type into the calculator. No request bodies should contain your prompt text.
Pricing
GPT-5 is flat-priced — no tiered surcharge above a context threshold. The cached-input discount is not modeled in the calculator; expect 30–50% savings on the input side once you wire up prompt caching in production.
| Tier | Input $/M | Output $/M |
|---|---|---|
| All input | $1.25 | $10 |

Context window: 400,000 tokens.
Verified against openai.com on 2026-05-09.
Worked examples
These three scenarios sit at typical chat / system-prompt / long-doc-Q&A sizes. The dollar figures are exact because the tokenizer is exact — no ±2% caveat applies for OpenAI models.
| Scenario | Input tokens | Output tokens | Cost |
|---|---|---|---|
| Short chat turn: a typical Q&A turn with a small system prompt | 800 | 400 | <$0.01 |
| System prompt + tool spec: a larger context with a tool schema, single response | 5,000 | 500 | $0.011 |
| Long document Q&A: long-form input (e.g. a transcript) with a structured response | 50,000 | 1,500 | $0.077 |
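Because the pricing is flat, each row reduces to one line of arithmetic. The helper below is illustrative, not taken from the calculator's source:

```ts
// Flat-rate cost for a single request: tokens times $/M, input plus output.
const cost = (inTok: number, outTok: number) =>
  (inTok * 1.25 + outTok * 10) / 1e6;

console.log(cost(800, 400).toFixed(4));      // "0.0050" -> rounds to <$0.01
console.log(cost(5_000, 500).toFixed(4));    // "0.0113" -> $0.011
console.log(cost(50_000, 1_500).toFixed(4)); // "0.0775" -> $0.077
```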
The instinct that pays off: route by request type, not by team affinity. GPT-5 is the default reach-for in the GPT-5 family, but if a particular request type fails reliably on GPT-5 Mini and your latency budget is tight, GPT-5 isn't always the answer — sometimes the right move is to fix the prompt or change the schema. In production deployments, cost-aware routing layers tend to send ≥80% of traffic to Mini; a minimal escalation sketch follows.
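One common shape for that routing is escalate-on-failure: try Mini first, validate the response, and re-run only the hard tail on GPT-5. A hedged sketch, where callModel and isValid are placeholder stubs rather than a real SDK surface:

```ts
// Escalation routing sketch. `callModel` and `isValid` are placeholders:
// wire them to your SDK client and your schema validator respectively.
declare function callModel(model: string, prompt: string): Promise<string>;
declare function isValid(response: string): boolean;

async function routedCall(prompt: string): Promise<string> {
  const cheap = await callModel("gpt-5-mini", prompt); // the >=80% bucket
  if (isValid(cheap)) return cheap;  // most traffic stops here
  return callModel("gpt-5", prompt); // escalate only the hard tail
}
```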
How is this counted?
We tokenize via gpt-tokenizer's o200k_base encoding — the same vocab GPT-5, GPT-4.1, and the o-series all use. Because the tokenizer is canonical, calibration factor is 1.0 and the result is exact. Inputs over 50,000 characters tokenize in a Web Worker so the page stays responsive for very long prompts. The "approx" pill that appears on Claude and Gemini calculators is suppressed here.
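A sketch of that main-thread/worker split, assuming a module worker; the file names, threshold constant, and message shape mirror the description above but are illustrative, not the site's actual source:

```ts
// tokenizer.worker.ts — runs the heavy encode off the main thread.
import { encode } from "gpt-tokenizer/encoding/o200k_base";

self.onmessage = (e: MessageEvent<string>) => {
  postMessage(encode(e.data).length);
};

// main.ts — encode inline for short inputs, hand off above the threshold.
import { encode as encodeInline } from "gpt-tokenizer/encoding/o200k_base";

const WORKER_THRESHOLD = 50_000; // characters, per the note above

function countTokens(text: string): Promise<number> {
  if (text.length < WORKER_THRESHOLD) {
    return Promise.resolve(encodeInline(text).length); // fast path
  }
  return new Promise((resolve) => {
    const worker = new Worker(
      new URL("./tokenizer.worker.ts", import.meta.url),
      { type: "module" }
    );
    worker.onmessage = (e: MessageEvent<number>) => {
      resolve(e.data);
      worker.terminate();
    };
    worker.postMessage(text);
  });
}
```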
FAQ
Is the token count exact?
Yes. Unlike Claude and Gemini, OpenAI publishes the canonical tokenizer (tiktoken). The MIT-licensed package gpt-tokenizer ships the same vocab, so the number you see here matches what OpenAI will bill you for that input — no approximation, no calibration.

How does GPT-5 compare to GPT-5 Mini?
GPT-5 sits at the top of the GPT-5 family for hardest-task quality; Mini is roughly 5× cheaper on input and Nano is roughly 25× cheaper. The right call depends on your eval set — most production workloads should run cheaper requests on Mini and route only the genuinely hard ones to GPT-5.

What is the context window?
GPT-5 supports a 400,000-token context window — larger than the Claude 4.x family (200k) but smaller than Gemini 2.5 (1M) or GPT-4.1 (1M). For long-document workloads where context length is the binding constraint, GPT-4.1 or Gemini 2.5 Pro may be a better fit even if quality on shorter prompts favors GPT-5.

Does my prompt leave the browser?
No. Tokenization runs in JavaScript on the page (or in a Web Worker for inputs over 50,000 characters). No server endpoint ever receives prompt text. The only serverless function on the site is /api/og for social preview images.

How does the cached-input rate work?
OpenAI charges a discounted rate for input tokens that match a cached prefix from a recent request. The calculator above does not currently model this — for production deployments where the same system prompt is reused thousands of times, your effective cost will be lower than the headline number, often by 50%+. A sketch of the effective-rate math follows this FAQ.
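For back-of-envelope caching math, the effective input rate is a weighted average of the cached and uncached rates. The sketch below assumes a 90%-off cached rate ($0.125/M); verify the current discount on OpenAI's pricing page before relying on it:

```ts
// Effective $/M on input as a function of cache hit rate.
// CACHED assumes a 90% discount on GPT-5 input — an assumption, not modeled above.
const FULL = 1.25;
const CACHED = 0.125;

const effectiveInputRate = (cachedFraction: number) =>
  cachedFraction * CACHED + (1 - cachedFraction) * FULL;

console.log(effectiveInputRate(0.8).toFixed(3)); // "0.350" -> 72% below headline
```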
Compare against every other model
To see this exact prompt scored against every supported model, sorted by total cost, paste it into the home calculator and toggle Compare across all models. GPT-5 numbers are exact; cross-vendor comparison against Claude and Gemini lands within ±2–3%.
Related models
The natural comparison set: GPT-5 Mini (the cheaper sibling that handles the routine 80% of traffic), GPT-4.1 (when context length matters more than reasoning), and Claude 4.5 Sonnet (cross-vendor mid-range with a similar cost profile).