
Unicode scripts and blocks

These calculators count the characters of a given text per Unicode block and per Unicode script.

The calculator below groups the characters of the input text by Unicode block and counts how many characters fall into each block.

PLANETCALC, Unicode blocks


Unicode blocks

There are 17 planes in the Unicode code space; each plane has 2^16 = 65,536 contiguous code points.
A plane may contain one or more Unicode blocks. A block holds at least 16 and at most 65,536 code points. Like a plane, a block is a contiguous range of code points that does not overlap any other block, and each block has its own unique name. The complete list of Unicode blocks can be found at http://www.unicode.org/Public/UNIDATA/Blocks.txt.
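
As a rough illustration of the definitions above, the following Python sketch derives the plane of a character from its code point and looks up its block in a small table. The table is only an excerpt written in the "start..end; name" format of Blocks.txt, and the names parse_blocks, plane_of, block_of and count_blocks are hypothetical helpers introduced for this example; it is not the code behind the calculator.

```python
from bisect import bisect_right
from collections import Counter

# A few entries in the "start..end; name" format of Blocks.txt.
# The real file lists every block; this excerpt is for illustration only.
BLOCKS_EXCERPT = """\
0000..007F; Basic Latin
0370..03FF; Greek and Coptic
0400..04FF; Cyrillic
0500..052F; Cyrillic Supplement
"""

def parse_blocks(text):
    """Return a sorted list of (start, end, name) tuples."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, name = line.split(";", 1)
        start, end = span.split("..")
        blocks.append((int(start, 16), int(end, 16), name.strip()))
    return sorted(blocks)

BLOCKS = parse_blocks(BLOCKS_EXCERPT)
STARTS = [block[0] for block in BLOCKS]

def plane_of(ch):
    """Each plane holds 2**16 code points, so the plane is cp // 65536."""
    return ord(ch) >> 16

def block_of(ch):
    """Binary-search the block whose range contains the character."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1
    if i >= 0 and BLOCKS[i][0] <= cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return "(not in excerpt)"

def count_blocks(text):
    """Count the characters of text per block."""
    return Counter(block_of(ch) for ch in text)

if __name__ == "__main__":
    for name, count in count_blocks("Привет, world").most_common():
        print(name, count)
```

Because blocks never overlap, a binary search over the sorted start points is enough; with the full Blocks.txt loaded instead of the excerpt, the same lookup would cover the whole code space.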

Unicode scripts

A Unicode script is a collection of letters and other written signs that share a common graphological style and history. The collection is used (in full, or as a subset) to represent textual information in a writing system for one or more languages.

Blocks and scripts relation

Although a block's name often corresponds to a script, not every character in the block belongs to that script. Moreover, one block may contain characters of several scripts, for example 0370..03FF Greek and Coptic, and the characters of one script can be scattered over several blocks.
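
The Greek and Coptic example can be glimpsed with Python's standard unicodedata module. The module does not expose the script property, so the sketch below uses the first word of each character name as a rough stand-in; this is an assumption made for illustration, and it does not always match the real script assignment. The helper name name_prefixes is hypothetical.

```python
import unicodedata
from collections import Counter

def name_prefixes(start, end):
    """Count assigned code points in [start, end] by the first word of their name."""
    counts = Counter()
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if name:
            counts[name.split()[0]] += 1
    return counts

if __name__ == "__main__":
    # Block 0370..03FF Greek and Coptic
    for prefix, count in name_prefixes(0x0370, 0x03FF).most_common():
        print(prefix, count)
```

Both GREEK and COPTIC names turn up inside the one block. For exact results the script property from Scripts.txt should be used instead of character names.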

The following calculator counts the number of characters belonging to Unicode scripts.

PLANETCALC, Unicode scripts

So the characters of a single script may occupy non-consecutive code points in the Unicode code space.
For example, Cyrillic characters, used for Russian and other Slavic languages, occupy the following code point ranges:
0400..0484, 0487..052F, 1C80..1C88, 1D2B, 1D78, 2DE0..2DFF, A640..A69F, FE2E..FE2F.

These characters are spread among 7 Unicode blocks:
0400..04FF Cyrillic
0500..052F Cyrillic Supplement
1C80..1C8F Cyrillic Extended-C
1D00..1D7F Phonetic Extensions
2DE0..2DFF Cyrillic Extended-A
A640..A69F Cyrillic Extended-B
FE20..FE2F Combining Half Marks
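
The two listings above can be cross-checked with a short Python sketch. It copies the Cyrillic ranges and the block boundaries exactly as given in this article and reports which blocks the ranges touch; the names blocks_touched and in_script are hypothetical helpers, not part of any library.

```python
# Cyrillic script ranges, as listed in this article.
CYRILLIC_RANGES = [
    (0x0400, 0x0484), (0x0487, 0x052F), (0x1C80, 0x1C88),
    (0x1D2B, 0x1D2B), (0x1D78, 0x1D78), (0x2DE0, 0x2DFF),
    (0xA640, 0xA69F), (0xFE2E, 0xFE2F),
]

# Block boundaries, as listed in this article.
BLOCKS = [
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0500, 0x052F, "Cyrillic Supplement"),
    (0x1C80, 0x1C8F, "Cyrillic Extended-C"),
    (0x1D00, 0x1D7F, "Phonetic Extensions"),
    (0x2DE0, 0x2DFF, "Cyrillic Extended-A"),
    (0xA640, 0xA69F, "Cyrillic Extended-B"),
    (0xFE20, 0xFE2F, "Combining Half Marks"),
]

def blocks_touched(ranges, blocks):
    """Names of the blocks that overlap at least one of the given ranges."""
    return [
        name
        for b_start, b_end, name in blocks
        if any(r_start <= b_end and b_start <= r_end for r_start, r_end in ranges)
    ]

def in_script(cp, ranges):
    """True if the code point lies inside one of the script's ranges."""
    return any(start <= cp <= end for start, end in ranges)

if __name__ == "__main__":
    print(len(blocks_touched(CYRILLIC_RANGES, BLOCKS)))  # number of blocks touched
    print(in_script(0x0416, CYRILLIC_RANGES))  # U+0416, a Cyrillic letter
    print(in_script(0x0485, CYRILLIC_RANGES))  # U+0485, inside the Cyrillic block
```

U+0485 is a convenient test case: it sits inside block 0400..04FF Cyrillic, yet it falls in the gap between 0400..0484 and 0487..052F, so it is not part of the Cyrillic script ranges.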

The code point ranges of all other scripts can be found at http://www.unicode.org/Public/UNIDATA/Scripts.txt.
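
A per-script count similar to the calculator's could be sketched in Python as follows. Scripts.txt uses lines of the form "range ; Script # comment", where the range is either a single code point or "start..end". The sample lines below follow that format, but they are condensed from the ranges quoted in this article plus the basic Latin letters rather than copied verbatim from the file, and parse_scripts, script_of and count_scripts are hypothetical helper names.

```python
from collections import Counter

# Sample lines in the Scripts.txt format (condensed for illustration;
# the real file splits ranges further and covers every script).
SCRIPTS_EXCERPT = """\
0041..005A    ; Latin # uppercase A to Z
0061..007A    ; Latin # lowercase a to z
0400..0484    ; Cyrillic
0487..052F    ; Cyrillic
1D2B          ; Cyrillic
"""

def parse_scripts(text):
    """Return a list of (start, end, script) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        span, script = (part.strip() for part in line.split(";", 1))
        start, _, end = span.partition("..")
        # A line with a single code point has no ".." part.
        ranges.append((int(start, 16), int(end or start, 16), script))
    return ranges

def script_of(ch, ranges):
    """Linear scan; adequate for a short table."""
    cp = ord(ch)
    for start, end, script in ranges:
        if start <= cp <= end:
            return script
    return "(not in excerpt)"

def count_scripts(text, ranges):
    """Count the characters of text per script."""
    return Counter(script_of(ch, ranges) for ch in text)

if __name__ == "__main__":
    ranges = parse_scripts(SCRIPTS_EXCERPT)
    for script, count in count_scripts("Привет, world", ranges).most_common():
        print(script, count)
```

With the complete file loaded, the comma and the space in the sample text would resolve to a script value as well instead of landing in the "(not in excerpt)" bucket.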

Creative Commons Attribution/Share-Alike License 3.0 (Unported) PLANETCALC, Unicode scripts and blocks
