
Unicode

This category covers Unicode text handling: East Asian ambiguous character width, wide character wrapping at line boundaries, and tab stop behavior with mixed-width text. Correct Unicode handling is essential for TUI applications to maintain proper cursor alignment and text layout across different scripts and character sets. Width handling is arguably the hardest problem in terminal emulation. The wcwidth() function (from 1988) predates emoji entirely, different terminals use different Unicode versions for their width tables, and there is no standard for grapheme cluster width.

11 features in this category

Terminal Unicode handling has three hard problems. Width calculation: is a character 1 or 2 columns wide? UAX #11 provides East_Asian_Width properties, but the rendered width of ambiguous-width characters (like certain Greek and Cyrillic symbols) varies between terminals. Grapheme clustering: a flag emoji like U+1F1F3 U+1F1F4 (two regional indicators, here the flag of Norway) should display as one 2-column glyph, not two separate characters. And variation selectors: U+FE0E forces text presentation (1 column), while U+FE0F forces emoji presentation (2 columns), so the same code point can have different widths depending on the code point that follows it.
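The width rules above can be sketched with Python's standard `unicodedata` module. This is a minimal illustration, not a replacement for a real wcwidth() table: the helper names `codepoint_width` and `sequence_width` and the `ambiguous_wide` flag are hypothetical, and treating every format character (category Cf) as zero-width is a simplifying assumption.

```python
import unicodedata

TEXT_SELECTOR = "\ufe0e"   # VS15: request text presentation
EMOJI_SELECTOR = "\ufe0f"  # VS16: request emoji presentation


def codepoint_width(ch: str, ambiguous_wide: bool = False) -> int:
    """Approximate the column width of one code point (hypothetical helper)."""
    # Combining marks and format characters take no column of their own.
    # (Assumption: all Cf characters are treated as zero-width.)
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):  # Wide or Fullwidth
        return 2
    if eaw == "A":         # Ambiguous: depends on the terminal's setting
        return 2 if ambiguous_wide else 1
    return 1


def sequence_width(base: str, selector: str = "") -> int:
    """Width of a base code point optionally followed by a variation selector."""
    if selector == EMOJI_SELECTOR:
        return 2
    if selector == TEXT_SELECTOR:
        return 1
    return codepoint_width(base)


# Greek alpha is East Asian Ambiguous, so its width depends on the flag.
print(codepoint_width("\u03b1"), codepoint_width("\u03b1", ambiguous_wide=True))
# The same heart code point reports different widths per selector.
print(sequence_width("\u2764", TEXT_SELECTOR), sequence_width("\u2764", EMOJI_SELECTOR))
```

The results also depend on the Unicode version bundled with the Python interpreter, which mirrors the version skew between terminals described above.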

The most treacherous case is the zero-width joiner (ZWJ, U+200D). A ZWJ sequence like woman + ZWJ + laptop (U+1F469 U+200D U+1F4BB) should render as a single emoji glyph if the terminal's font supports it. If it doesn't, the sequence falls back to its separate component emoji, with the invisible joiner between them, and occupies more columns. The terminal must either trust the font's ligature tables or maintain its own ZWJ sequence database, and that database changes with every Unicode release.
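The "own database" option can be sketched as a lookup with a fallback. The one-entry `KNOWN_ZWJ_SEQUENCES` set below is a hypothetical stand-in for a real sequence list, and `zwj_sequence_width` is an illustrative name, not an API of any terminal.

```python
import unicodedata

ZWJ = "\u200d"

# Hypothetical stand-in for a full ZWJ sequence database.
KNOWN_ZWJ_SEQUENCES = frozenset({
    "\U0001F469\u200d\U0001F4BB",  # woman + ZWJ + laptop
})


def zwj_sequence_width(seq: str, known=KNOWN_ZWJ_SEQUENCES) -> int:
    """Width of one ZWJ sequence: 2 if recognized, else the sum of its parts."""
    if seq in known:
        return 2  # rendered as a single wide glyph
    width = 0
    for ch in seq:
        if ch == ZWJ:
            continue  # the joiner itself is invisible
        # Fallback: each component is drawn on its own.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


woman_technologist = "\U0001F469\u200d\U0001F4BB"
print(zwj_sequence_width(woman_technologist))                     # recognized
print(zwj_sequence_width(woman_technologist, known=frozenset()))  # fallback
```

With the sequence in the database the result is 2 columns; with an empty database the same string falls back to two wide emoji, 4 columns. That gap is the disagreement between a terminal and an application that uses a different table.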

For developers, the practical test is simple: does the cursor end up in the right place after printing a string? If a terminal calculates "hello" + flag_emoji as 9 columns, counting each regional indicator as a separate 2-column character, but the font renders the flag as a single 2-column glyph (7 columns in total), every subsequent character on that line will be offset by 2 columns. This breaks table alignment, progress bars, box drawing, and any TUI that relies on precise cursor positioning. The wcwidth() function and its many implementations are the battleground where these disagreements play out.
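The cursor drift can be made concrete with a small sketch that compares two ways of measuring the same string. It handles only ASCII and regional indicators. The assumption that a per-code-point table counts each regional indicator as 2 columns is illustrative, since real tables differ, and the function names are hypothetical.

```python
def is_regional_indicator(ch: str) -> bool:
    """True for U+1F1E6..U+1F1FF, the building blocks of flag emoji."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def per_codepoint_width(text: str) -> int:
    """Width when every code point is measured on its own (assumed table)."""
    return sum(2 if is_regional_indicator(ch) else 1 for ch in text)


def paired_flag_width(text: str) -> int:
    """Width when two adjacent regional indicators form one 2-column flag."""
    width = 0
    i = 0
    while i < len(text):
        if (is_regional_indicator(text[i])
                and i + 1 < len(text)
                and is_regional_indicator(text[i + 1])):
            width += 2  # one flag glyph
            i += 2
        else:
            width += 2 if is_regional_indicator(text[i]) else 1
            i += 1
    return width


text = "hello" + "\U0001F1F3\U0001F1F4"  # "hello" + flag
calculated = per_codepoint_width(text)   # 9
rendered = paired_flag_width(text)       # 7
print(f"cursor drift: {calculated - rendered} columns")  # cursor drift: 2 columns
```

Any nonzero drift means that everything printed after the flag on that line lands in the wrong column.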

Analysis · 2026-04-06

The Unicode category covers 10 features in the terminfo.dev matrix. Average adoption across terminals: 77%. Full compliance (100%): Ghostty, iTerm2, vterm.js. Lowest: vt100.js at 40% (4/10).

Terminal Applications