Upload any file — .txt, .pdf, .docx, .json, .py, .md and more — and instantly see how many tokens it uses across GPT-4o, Claude 3.5, Llama 3, and Gemini 1.5. Includes context window usage bars and estimated API costs.
Open LLM Token Counter → free, no sign-in

Every call to a language model API is billed in tokens, not characters or words. Before you send a large document to GPT-4o, Claude, or Gemini, it's useful to know exactly how many tokens it contains — so you can estimate the cost, check whether it fits within the model's context window, and compare how different models would handle the same file.
The LLM Token Counter lets you drop any file and see the token breakdown across the major model families instantly. It extracts text from PDFs and Word documents, runs an accurate BPE-approximation tokenizer in your browser, and shows per-model counts alongside context window usage bars and input cost estimates at May 2025 pricing.
- Developers building LLM-powered features who need to know whether a document fits in a context window before deciding on a chunking strategy, or whether a prompt template has grown too expensive.
- AI researchers and prompt engineers comparing how efficiently different models tokenize the same corpus — useful when switching providers or optimizing for cost.
- Anyone curious about API costs before submitting a large document to a paid LLM endpoint. Drop the file, see the cost estimate, decide whether to trim it first.
Drop or select a file. The tool extracts raw text — reading code and plain-text files directly, using PDF.js for PDFs, and Mammoth for .docx files. The extracted text is then run through an in-browser BPE tokenization approximation that matches GPT-4's cl100k_base tokenizer within roughly ±5% for English prose.
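The approximation step can be sketched as a simple character-and-word heuristic. This is an illustrative function, not the tool's actual code; the blend of two classic rules of thumb (about 4 characters or about 4/3 tokens per word for English) lands in the same ±5% neighbourhood for typical prose.

```javascript
// Hypothetical sketch of a character/word-based token estimate.
// Assumption: ~4 chars per token and ~4/3 tokens per word for English prose;
// averaging the two heuristics smooths out very long or very short words.
function estimateTokens(text) {
  if (!text) return 0;
  const words = text.split(/\s+/).filter(Boolean).length;
  const chars = text.length;
  const byChars = chars / 4;          // chars-per-token heuristic
  const byWords = (words * 4) / 3;    // tokens-per-word heuristic
  return Math.round((byChars + byWords) / 2);
}
```

A real BPE tokenizer also splits on punctuation and merges common subwords, which is why this estimate drifts further from the true count on code, symbols, and non-Latin text.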
Per-model multipliers adjust the base count: Claude's tokenizer is slightly less efficient (+5%), Llama 3 even more so (+10%), while Gemini 1.5 is marginally more efficient (-5%). Each model family gets its own card showing the token count, a colour-coded context window bar (green under 50%, yellow 50–80%, red over 80%), and the estimated input cost at current pricing.
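The multipliers and colour thresholds described above reduce to a few lines. This is a sketch of the logic, not the tool's source; the model keys and function names are hypothetical, but the percentages and colour bands mirror the article.

```javascript
// Per-model efficiency multipliers relative to the cl100k_base baseline.
const MODEL_MULTIPLIERS = {
  "gpt-4o": 1.0,       // baseline (cl100k_base)
  "claude-3.5": 1.05,  // ~5% less efficient
  "llama-3": 1.1,      // ~10% less efficient
  "gemini-1.5": 0.95,  // ~5% more efficient
};

function adjustedCount(baseTokens, model) {
  return Math.round(baseTokens * MODEL_MULTIPLIERS[model]);
}

// Context window bar colour: green under 50%, yellow 50–80%, red over 80%.
function barColor(tokens, contextWindow) {
  const usage = tokens / contextWindow;
  if (usage < 0.5) return "green";
  if (usage <= 0.8) return "yellow";
  return "red";
}
```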
The "Copy report" button copies a plain-text summary covering all models — convenient for pasting into tickets, documentation, or cost spreadsheets.
| Model | Context window | Input $/1M tokens |
|---|---|---|
| GPT-4o | 128K | $2.50 |
| GPT-4o-mini | 128K | $0.15 |
| GPT-3.5 Turbo | 16K | $0.50 |
| Claude 3.5 Sonnet | 200K | $3.00 |
| Claude 3 Haiku | 200K | $0.25 |
| Llama 3 70B | 8K | $0.59 |
| Gemini 1.5 Pro | 1M | $1.25 |
| Gemini 1.5 Flash | 1M | $0.075 |
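The cost estimate is just tokens divided by one million, times the rate in the table. A worked example for a 50,000-token document (rates copied from the table above; the lookup map is illustrative):

```javascript
// Input price per 1M tokens, from the table above.
const PRICE_PER_M = {
  "GPT-4o": 2.5,
  "GPT-4o-mini": 0.15,
  "Claude 3.5 Sonnet": 3.0,
  "Gemini 1.5 Flash": 0.075,
};

function inputCost(tokens, model) {
  return (tokens / 1_000_000) * PRICE_PER_M[model];
}

// 50,000 tokens on GPT-4o: 0.05M * $2.50 = $0.125
// The same document on Gemini 1.5 Flash: 0.05M * $0.075 = $0.00375
```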
What is a token in the context of LLMs?
A token is the basic unit of text that language models process. Tokens are roughly 3–4 characters of English text — so 100 tokens ≈ 75 words. Punctuation, symbols, and non-Latin characters can each cost multiple tokens.
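Those rules of thumb make quick mental conversions easy. A minimal sketch (helper names are hypothetical):

```javascript
// ~0.75 words per token and ~4 characters per token for English text.
const tokensFromWords = (words) => Math.round(words / 0.75);
const tokensFromChars = (chars) => Math.round(chars / 4);

// A 1,500-word article is roughly 2,000 tokens;
// 8,000 characters is likewise roughly 2,000 tokens.
```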
Why do token counts differ between models?
Each model family uses a different BPE vocabulary. GPT-4o and GPT-3.5 share the cl100k_base tokenizer. Claude, Llama 3, and Gemini 1.5 each have their own vocabularies with slightly different efficiencies for English text.
Can I upload PDF and Word files?
Yes — .pdf files are processed with PDF.js and .docx files with Mammoth, both running entirely in-browser. Code, JSON, CSV, Markdown, and plain text are handled natively.
Is my file uploaded anywhere?
No — everything runs locally in your browser. Your file never leaves your device.
How accurate are the token counts?
The BPE-approximation matches the GPT-4 tokenizer within ±5% for typical English prose. For exact production counts, use tiktoken (OpenAI) or the official tokenizer libraries for each model.
- Count words, characters, sentences, and reading time
- Format, validate, and minify JSON — useful before tokenizing API payloads
- Convert .md files to HTML — then count tokens on the output
Drop your file and see the token breakdown across all major LLM providers in seconds.
Open LLM Token Counter →

Build your own tool with AI