Text picked up from the web, Word documents, PDFs, spreadsheets, or API responses often carries invisible baggage: non-breaking spaces that look like regular spaces, zero-width joiners, curly quotes that break JSON parsing, BOM characters at the start of files, or directional marks that scramble text rendering. The Special Character Identifier makes every one of those characters visible and identifiable.
How it works
Paste any block of text into the tool and it renders each character individually, colour-coded by category:
- Normal printable characters — shown as-is
- Whitespace — spaces, tabs, non-breaking spaces, zero-width spaces — highlighted so they're visible even though they have no visible glyph
- Control characters — BOM, soft hyphen, directional marks, etc. — shown with a placeholder and labelled by name
- Extended Unicode — emoji, diacritics, special punctuation — shown with their official Unicode name on hover
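The grouping above can be approximated with the Unicode general category of each character. The following is a minimal sketch, not the tool's actual logic: the function name `classify` and the exact rules (for example, treating U+200B as whitespace even though Unicode classes it as a format character) are assumptions chosen to mirror the four categories listed.

```python
import unicodedata

def classify(ch: str) -> str:
    """Return a rough display category for a single character (hypothetical rules)."""
    cat = unicodedata.category(ch)  # two-letter general category, e.g. "Zs", "Cf"
    # Spaces, tabs, no-break spaces; U+200B is added by hand because
    # Unicode files it under "Cf" (format), not under a space category.
    if ch.isspace() or ch == "\u200b":
        return "whitespace"
    # Cc = control characters, Cf = format characters (BOM, soft hyphen, directional marks).
    if cat in ("Cc", "Cf"):
        return "control"
    # Anything beyond printable ASCII: diacritics, curly quotes, emoji.
    if ord(ch) > 0x7E:
        return "extended"
    return "normal"

for ch in "a\u00a0\ufeff\u201c":
    print(f"U+{ord(ch):04X}", classify(ch))
```
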
Click any character to get its full details: code point (U+00A0), Unicode name (NO-BREAK SPACE), HTML entity (&nbsp;), UTF-8 byte sequence, and JavaScript escape (\u00A0).
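For readers who want the same details in a script, here is a rough Python sketch that derives them from the standard library. The function name `describe` and the dictionary keys are hypothetical, and the output format is only an approximation of what the tool displays.

```python
import unicodedata
from html.entities import codepoint2name

def describe(ch: str) -> dict:
    """Collect the basic identifying details of one character."""
    cp = ord(ch)
    # Use the named HTML entity when one exists, otherwise a numeric reference.
    entity = f"&{codepoint2name[cp]};" if cp in codepoint2name else f"&#x{cp:X};"
    # JavaScript's \uXXXX form covers the BMP; \u{...} is used above U+FFFF.
    js = f"\\u{cp:04X}" if cp <= 0xFFFF else f"\\u{{{cp:X}}}"
    return {
        "code_point": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),  # control characters have no name
        "html_entity": entity,
        "utf8": ch.encode("utf-8").hex(" ").upper(),
        "js_escape": js,
    }

print(describe("\u00a0"))
```
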
The most common culprits
Non-breaking space (U+00A0)
Looks exactly like a regular space (U+0020) but behaves differently: it prevents a line break at that position, and in many contexts it isn't treated as whitespace, so some trim() implementations and regex patterns miss it. Java's String.trim() and an ASCII-only \s class are two examples, and a pattern written with a literal space never matches it. Extremely common in text copied from web pages, where &nbsp; is used for layout.
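A small Python sketch of the failure and one possible fix. Note that Python's own `str.strip()` does treat U+00A0 as whitespace, so the miss is shown here with a literal-space split and an ASCII-only `\s`. The helper name `normalize_spaces` and the sample string are made up for illustration.

```python
import re

NBSP = "\u00a0"

def normalize_spaces(text: str) -> str:
    """Replace every no-break space with a regular space."""
    return text.replace(NBSP, " ")

raw = f"total:{NBSP}42"  # looks like "total: 42" on screen

# Splitting on a literal space finds nothing to split on.
print(raw.split(" "))

# With re.ASCII, \s only matches the ASCII whitespace characters.
print(re.search(r"\s", raw, re.ASCII))

# After normalising, the split behaves as expected.
print(normalize_spaces(raw).split(" "))
```
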
Zero-width space (U+200B) and zero-width non-joiner (U+200C)
Completely invisible. Zero width. Show up in text copied from certain CMSes, messaging apps, and right-to-left text editors. Can cause string comparisons to fail and search to miss matches even though the text looks identical.
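One way to make such comparisons robust is to delete the zero-width characters before comparing. The sketch below assumes that removing them is safe for your data; zero-width joiners and non-joiners are meaningful in some scripts and in emoji sequences, so this is not a universal fix. The name `strip_zero_width` is hypothetical.

```python
# Map each zero-width code point to None so str.translate() deletes it.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d"))

def strip_zero_width(text: str) -> str:
    """Remove zero-width space, non-joiner, and joiner characters."""
    return text.translate(ZERO_WIDTH)

pasted = "pass\u200bword"  # renders as "password"

print(pasted == "password")                    # False: hidden U+200B
print(len(pasted))                             # one more than the visible letters
print(strip_zero_width(pasted) == "password")  # True
```
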
Curly quotes (U+2018, U+2019, U+201C, U+201D)
Word processors auto-correct straight quotes to curly (typographic) quotes. These look fine in prose but break JSON, YAML, SQL, shell commands, and any context expecting ASCII quote characters. They're especially sneaky because many fonts render them at essentially the same visual weight as straight quotes.
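The JSON case is easy to reproduce. The following sketch shows the parse failure and a blunt repair that maps the four curly quotes to their ASCII equivalents. The helper name `straighten_quotes` and the sample payload are invented for illustration, and the replacement is deliberately naive: it also rewrites curly quotes that legitimately appear inside string values.

```python
import json

# Translation table: curly single and double quotes to ASCII quotes.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})

def straighten_quotes(text: str) -> str:
    """Replace typographic quotes with straight ASCII quotes."""
    return text.translate(QUOTE_MAP)

pasted = "{\u201cname\u201d: \u201cAda\u201d}"  # looks like {"name": "Ada"}

try:
    json.loads(pasted)
except json.JSONDecodeError as err:
    print("parse failed:", err)

print(json.loads(straighten_quotes(pasted)))
```
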
Byte Order Mark (U+FEFF)
Added by some text editors and Windows tools at the start of UTF-8 files. Invisible in most editors but causes issues when the file is read programmatically — suddenly your first field name has a hidden character prepended to it.
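In Python, the effect on a CSV header can be reproduced in a few lines. This sketch uses an in-memory byte string as a stand-in for a file (the sample data is made up) and compares plain `utf-8` decoding with the `utf-8-sig` codec, which drops a leading BOM if one is present.

```python
import csv
import io

# Bytes as they might come from a file saved "UTF-8 with BOM".
data = b"\xef\xbb\xbfid,name\n1,Ada\n"

# Plain utf-8 keeps the BOM as U+FEFF at the start of the first field.
header_plain = next(csv.reader(io.StringIO(data.decode("utf-8"))))
print(repr(header_plain[0]))

# utf-8-sig strips the BOM during decoding.
header_clean = next(csv.reader(io.StringIO(data.decode("utf-8-sig"))))
print(repr(header_clean[0]))
```

The same codec name can be passed as the `encoding` argument to `open()`, and it reads files without a BOM unchanged.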
Soft hyphen (U+00AD)
A hint to the renderer that it may break the word here if needed. Invisible when not at a line break. Shows up in text copied from typeset documents and can cause unexpected string length counts and display anomalies.
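A quick illustration of the length mismatch, with a made-up sample word. Whether it is safe to drop soft hyphens depends on whether the text will be typeset again, so treat the removal step as an assumption.

```python
SHY = "\u00ad"  # SOFT HYPHEN

word = f"co{SHY}operate"  # usually renders as "cooperate"

print(len(word))                   # counts the hidden soft hyphen too
print(word == "cooperate")         # False
print(word.replace(SHY, "") == "cooperate")
```
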
Debugging scenarios where this tool saves time
- JSON parsing fails with an unexpected token error, but you can't see anything wrong when you look at the string
- A CSV field value doesn't match a regex even though it looks correct
- A string comparison returns false for two values that appear identical on screen
- A user reports that search is not finding their text when they copy-paste it from a document
- A SQL query fails with a syntax error pointing to whitespace
- An API is returning characters that render as boxes in your UI but you don't know which ones to filter
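For the "two values that appear identical" case, a scripted check can point at the exact position where the strings diverge. This is a minimal sketch with hypothetical helper names (`first_difference`, `label`); it compares code point by code point and does not attempt any Unicode normalisation.

```python
import unicodedata
from itertools import zip_longest

def label(ch):
    """Describe a character as 'U+XXXX NAME', or mark the end of a string."""
    if ch is None:
        return "<end of string>"
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

def first_difference(a: str, b: str):
    """Return (index, label_in_a, label_in_b) for the first mismatch, or None."""
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return i, label(x), label(y)
    return None

print(first_difference("a b", "a\u00a0b"))
```
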
Bugs caused by invisible characters are among the most time-consuming to find by eye. A tool that renders them visibly turns a 20-minute investigation into a 30-second one.
Related: Unicode Character Map
If you know a character exists and want to find it deliberately — rather than identifying one that snuck into your text — the Unicode Character Map lets you search the entire Unicode database by name, block, or code point.
Identify every character in your text →
Paste in any text and instantly see what's hiding inside it.