Halfwidth and fullwidth forms

In CJK (Chinese, Japanese and Korean) computing, graphic characters are traditionally classed into fullwidth (in Taiwan and Hong Kong: 全形; in CJK and Japanese: 全角) and halfwidth (in Taiwan and Hong Kong: 半形; in CJK and Japanese: 半角) characters. With fixed-width fonts, a halfwidth character occupies half the width of a fullwidth character, hence the name.
In the days of computer terminals and text-mode computing, characters were normally laid out in a grid, often 80 columns by 24 or 25 lines. Each character was displayed as a small dot matrix, often about 8 pixels wide, and an SBCS (single-byte character set) was generally used to encode the characters of Western languages.
For a number of practical and aesthetic reasons, Han characters needed to be twice as wide as these fixed-width SBCS characters. These "fullwidth characters" were typically encoded in a DBCS (double-byte character set), although some less common systems used other variable-width character sets with more bytes per character.
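The grid layout described above can be approximated in code by counting one column for each narrow character and two for each wide or fullwidth one. The following is a minimal sketch, assuming the East Asian Width property exposed by Python's standard `unicodedata` module is an acceptable stand-in for a terminal's own rules; the function name `display_columns` is hypothetical, and the sketch ignores combining marks, control characters and ambiguous-width characters.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Estimate how many fixed-width grid columns `text` occupies.

    Characters whose East Asian Width is "W" (wide) or "F" (fullwidth)
    are counted as two columns; everything else is counted as one.
    This is a simplification, not a full terminal width algorithm.
    """
    columns = 0
    for ch in text:
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            columns += 2
        else:
            columns += 1
    return columns
```

For example, `display_columns("abc")` gives 3, while `display_columns("全角")` gives 4, because each Han character takes two columns.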
Halfwidth and Fullwidth Forms is also the name of a Unicode block U+FF00–FFEF.
In Unicode
Halfwidth and Fullwidth Forms | |
---|---|
Range | U+FF00..U+FFEF (240 code points) |
Plane | BMP |
Scripts | Hangul (52 char.), Katakana (55 char.), Latin (52 char.), Common (66 char.) |
Symbol sets | Variant width characters |
Assigned | 225 code points |
Unused | 15 reserved code points |
Unicode version history | |
1.0.0 | 216 (+216) |
1.1 | 223 (+7) |
3.2 | 225 (+2) |
Note: [1][2] |
In Unicode, if a certain grapheme can be represented as either a fullwidth character or a halfwidth character, it is said to have both a fullwidth form and a halfwidth form.
Halfwidth and Fullwidth Forms is the name of Unicode block U+FF00–FFEF, the last block of the Basic Multilingual Plane except for the short Specials block at U+FFF0–FFFF.
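The block boundaries and the assigned/reserved split can be checked against the Unicode data bundled with a programming language. The sketch below uses Python's standard `unicodedata` module; the names `BLOCK_START`, `BLOCK_END`, `in_halfwidth_fullwidth_block` and `assigned_code_points` are illustrative, and the counts depend on the Unicode version shipped with the interpreter (225 assigned code points for any data at version 3.2 or later).

```python
import unicodedata

# Block boundaries as stated in the text (inclusive).
BLOCK_START = 0xFF00
BLOCK_END = 0xFFEF


def in_halfwidth_fullwidth_block(ch: str) -> bool:
    """Return True if the single character `ch` lies in U+FF00..U+FFEF."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


def assigned_code_points() -> list[int]:
    """List the code points in the block that are assigned.

    A code point is treated as unassigned when its general category
    is "Cn" in the interpreter's Unicode database.
    """
    return [
        cp
        for cp in range(BLOCK_START, BLOCK_END + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    ]
```

With these definitions, `len(assigned_code_points())` should give 225, leaving 15 of the 240 code points reserved.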
Range U+FF01–FF5E reproduces the characters of ASCII 0x21 to 0x7E as fullwidth forms, that is, fixed-width forms used in CJK computing. This is useful for typesetting Latin characters in a CJK environment. U+FF00 is unassigned: it does not correspond to a fullwidth ASCII 0x20 (space character), since that role is already fulfilled by U+3000 "ideographic space".
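Because the fullwidth range mirrors ASCII in the same order, each fullwidth form sits at a constant offset (U+FF01 minus 0x21, that is 0xFEE0) from its ASCII counterpart. The sketch below converts in both directions on that basis; the function names `to_fullwidth` and `to_halfwidth` are hypothetical, and mapping the ASCII space to U+3000 is a choice made here to follow the text, not a requirement of the standard.

```python
# Constant distance between ASCII 0x21..0x7E and U+FF01..U+FF5E.
FULLWIDTH_OFFSET = 0xFF01 - 0x21  # 0xFEE0
IDEOGRAPHIC_SPACE = "\u3000"


def to_fullwidth(text: str) -> str:
    """Replace printable ASCII with fullwidth forms; other characters pass through."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp == 0x20:
            out.append(IDEOGRAPHIC_SPACE)  # the space has no slot at U+FF00
        elif 0x21 <= cp <= 0x7E:
            out.append(chr(cp + FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


def to_halfwidth(text: str) -> str:
    """Inverse of to_fullwidth for U+FF01..U+FF5E and U+3000."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == IDEOGRAPHIC_SPACE:
            out.append(" ")
        elif 0xFF01 <= cp <= 0xFF5E:
            out.append(chr(cp - FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)
```

For instance, `to_fullwidth("A1!")` yields the three code points U+FF21, U+FF11 and U+FF01, and `to_halfwidth` reverses the mapping.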
Range U+FF65–FFDC encodes halfwidth forms of Katakana and Hangul characters – see half-width kana. Range U+FFE0–FFEE includes fullwidth and halfwidth symbols.
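Unlike the fullwidth Latin range, the halfwidth Katakana and Hangul forms are not at a fixed offset from their ordinary counterparts, so converting them is usually done through the compatibility mappings in the Unicode Character Database. The sketch below relies on Python's standard `unicodedata` module; `fold_width` and `is_width_variant` are hypothetical names. Note that NFKC normalization is broader than width folding alone: it also rewrites other compatibility characters such as ligatures and superscripts.

```python
import unicodedata


def fold_width(text: str) -> str:
    """Map halfwidth and fullwidth forms to their ordinary counterparts.

    Uses NFKC normalization, which applies the <wide> and <narrow>
    compatibility decompositions (along with all other compatibility
    mappings) and then recomposes the result.
    """
    return unicodedata.normalize("NFKC", text)


def is_width_variant(ch: str) -> bool:
    """Return True if `ch` has a <wide> or <narrow> compatibility decomposition."""
    decomposition = unicodedata.decomposition(ch)
    return decomposition.startswith(("<wide>", "<narrow>"))
```

As an example, halfwidth KA (U+FF76) followed by the halfwidth voiced sound mark (U+FF9E) folds to the single character GA (U+30AC), and fullwidth `Ａ１` folds to ASCII `A1`.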
Halfwidth and Fullwidth Forms[1][2] Official Unicode Consortium code chart (PDF) | ||||||||||||||||
 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
U+FF0x | | ！ | ＂ | ＃ | ＄ | ％ | ＆ | ＇ | （ | ） | ＊ | ＋ | ， | － | ． | ／ |
U+FF1x | ０ | １ | ２ | ３ | ４ | ５ | ６ | ７ | ８ | ９ | ： | ； | ＜ | ＝ | ＞ | ？ |
U+FF2x | ＠ | Ａ | Ｂ | Ｃ | Ｄ | Ｅ | Ｆ | Ｇ | Ｈ | Ｉ | Ｊ | Ｋ | Ｌ | Ｍ | Ｎ | Ｏ |
U+FF3x | Ｐ | Ｑ | Ｒ | Ｓ | Ｔ | Ｕ | Ｖ | Ｗ | Ｘ | Ｙ | Ｚ | ［ | ＼ | ］ | ＾ | ＿ |
U+FF4x | ｀ | ａ | ｂ | ｃ | ｄ | ｅ | ｆ | ｇ | ｈ | ｉ | ｊ | ｋ | ｌ | ｍ | ｎ | ｏ |
U+FF5x | ｐ | ｑ | ｒ | ｓ | ｔ | ｕ | ｖ | ｗ | ｘ | ｙ | ｚ | ｛ | ｜ | ｝ | ～ | ｟ |
U+FF6x | ｠ | ｡ | ｢ | ｣ | ､ | ･ | ｦ | ｧ | ｨ | ｩ | ｪ | ｫ | ｬ | ｭ | ｮ | ｯ |
U+FF7x | ｰ | ｱ | ｲ | ｳ | ｴ | ｵ | ｶ | ｷ | ｸ | ｹ | ｺ | ｻ | ｼ | ｽ | ｾ | ｿ |
U+FF8x | ﾀ | ﾁ | ﾂ | ﾃ | ﾄ | ﾅ | ﾆ | ﾇ | ﾈ | ﾉ | ﾊ | ﾋ | ﾌ | ﾍ | ﾎ | ﾏ |
U+FF9x | ﾐ | ﾑ | ﾒ | ﾓ | ﾔ | ﾕ | ﾖ | ﾗ | ﾘ | ﾙ | ﾚ | ﾛ | ﾜ | ﾝ | ﾞ | ﾟ |
U+FFAx | HW HF | ﾡ | ﾢ | ﾣ | ﾤ | ﾥ | ﾦ | ﾧ | ﾨ | ﾩ | ﾪ | ﾫ | ﾬ | ﾭ | ﾮ | ﾯ |
U+FFBx | ﾰ | ﾱ | ﾲ | ﾳ | ﾴ | ﾵ | ﾶ | ﾷ | ﾸ | ﾹ | ﾺ | ﾻ | ﾼ | ﾽ | ﾾ | |
U+FFCx | | | ￂ | ￃ | ￄ | ￅ | ￆ | ￇ | | | ￊ | ￋ | ￌ | ￍ | ￎ | ￏ |
U+FFDx | | | ￒ | ￓ | ￔ | ￕ | ￖ | ￗ | | | ￚ | ￛ | ￜ | | | |
U+FFEx | ￠ | ￡ | ￢ | ￣ | ￤ | ￥ | ￦ | | ￨ | ￩ | ￪ | ￫ | ￬ | ￭ | ￮ | |
Notes |
See also
- CJK
- Han unification
- Half-width kana
- Monospaced font
- East Asian punctuation
- Em size - full width forms
References
- "Unicode character database". The Unicode Standard. Retrieved 22 March 2013.
- The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1991 [1990]. ISBN 0-201-56788-1.
External links
- Halfwidth and Fullwidth Forms at Alan Wood's Unicode Resources