Certain Unicode characters are shared between Far East and non-Far East scripts, so the font and language for a character must be calculated from its Unicode character code and the chp.idctHint property.
Characters are classified into one of four groups: ASCII, Far East, shared (floating), and non-Far East. Properties are calculated as follows:
Character type | Font (ftc) | Language (lid) |
ASCII | sprmCRgftc0 | sprmCRglid0 |
non-Far East | sprmCRgftc2 | sprmCRglid0 |
Far East | sprmCRgftc1 | sprmCRglid1 |
shared character | sprmCRgftc2 if chp.idctHint is 0; sprmCRgftc1 if chp.idctHint is 1 | sprmCRglid0 if chp.idctHint is 0; sprmCRglid1 if chp.idctHint is 1 |
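The selection rule in the table above can be sketched in a few lines of Python. This is only an illustration: `CharClass` and `select_sprms` are hypothetical names, the function returns the sprm names as strings rather than reading a real property list, and the font for a shared character with chp.idctHint equal to 1 is assumed to be sprmCRgftc1, mirroring the language column.

```python
from enum import Enum


class CharClass(Enum):
    """The four character groups named in the text (hypothetical helper type)."""
    ASCII = "ASCII"
    NON_FAR_EAST = "non-Far East"
    FAR_EAST = "Far East"
    SHARED = "shared"


def select_sprms(char_class, idct_hint):
    """Return (font sprm name, language sprm name) for one character."""
    if char_class is CharClass.ASCII:
        return ("sprmCRgftc0", "sprmCRglid0")
    if char_class is CharClass.NON_FAR_EAST:
        return ("sprmCRgftc2", "sprmCRglid0")
    if char_class is CharClass.FAR_EAST:
        return ("sprmCRgftc1", "sprmCRglid1")
    # Shared characters follow chp.idctHint.
    if idct_hint == 0:
        return ("sprmCRgftc2", "sprmCRglid0")
    if idct_hint == 1:
        # Assumption: the font follows the Far East slot, like the language.
        return ("sprmCRgftc1", "sprmCRglid1")
    raise ValueError("unsupported idctHint value: %r" % (idct_hint,))
```

A shared character therefore resolves like a non-Far East character when the hint is 0 and like a Far East character when the hint is 1.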
The table below defines the classification of various ranges of Unicode characters:
Unicode subrange | character range | Classification |
usrBasicLatin | 0x20->0x7f | ASCII |
usrLatin1 | 0xa0->0xff | some shared (see notes below) |
usrLatinXA | 0x100->0x17f | some shared (see notes below) |
usrLatinXB | 0x180->0x24f | some shared (see notes below) |
usrIPAExtensions | 0x250->0x2af | some shared (see notes below) |
usrSpacingModLetters | 0x2b0->0x2ff | Shared |
usrCombDiacritical | 0x300->0x36f | Shared |
usrBasicGreek | 0x370->0x3cf | Shared |
usrGreekSymbolsCop | 0x3d0->0x3ff | non-Far East |
usrCyrillic | 0x400->0x4ff | Shared |
usrArmenian | 0x500->0x58f | non-Far East |
usrBasicHebrew | 0x5d0->0x5ff | non-Far East |
usrHebrewXA | 0x590->0x5cf | non-Far East |
usrBasicArabic | 0x600->0x652 | non-Far East |
usrArabicX | 0x653->0x6ff | non-Far East |
usrDevangari | 0x900->0x97f | non-Far East |
usrBengali | 0x980->0x9ff | non-Far East |
usrGurmukhi | 0xa00->0xa7f | non-Far East |
usrGujarati | 0xa80->0xaff | non-Far East |
usrOriya | 0xb00->0xb7f | non-Far East |
usrTamil | 0x0b80->0x0bff | non-Far East |
usrTelugu | 0x0c00->0x0c7f | non-Far East |
usrKannada | 0x0c80->0x0cff | non-Far East |
usrMalayalam | 0x0d00->0x0d7f | non-Far East |
usrThai | 0x0e00->0x0e7f | non-Far East |
usrLao | 0x0e80->0x0eff | non-Far East |
usrBasicGeorgian | 0x10d0->0x10ff | non-Far East |
usrGeorgianExtended | 0x10a0->0x10cf | non-Far East |
usrHangulJamo | 0x1100->0x11ff | non-Far East |
usrLatinExtendedAdd | 0x1e00->0x1eff | Shared |
usrGreekExtended | 0x1f00->0x1fff | non-Far East |
usrGeneralPunct | 0x2000->0x206f | Shared |
usrSuperAndSubscript | 0x2070->0x209f | Shared |
usrCurrencySymbols | 0x20a0->0x20cf | Shared |
usrCombDiacriticsS | 0x20d0->0x20ff | Shared |
usrLetterlikeSymbols | 0x2100->0x214f | Shared |
usrNumberForms | 0x2150->0x218f | Shared |
usrArrows | 0x2190->0x21ff | Shared |
usrMathematicalOps | 0x2200->0x22ff | Shared |
usrMiscTechnical | 0x2300->0x23ff | Shared |
usrControlPictures | 0x2400->0x243f | Shared |
usrOpticalCharRecog | 0x2440->0x245f | Shared |
usrEnclosedAlphanum | 0x2460->0x24ff | Shared |
usrBoxDrawing | 0x2500->0x257f | Shared |
usrBlockElements | 0x2580->0x259f | Shared |
usrGeometricShapes | 0x25a0->0x25ff | Shared |
usrMiscDingbats | 0x2600->0x26ff | Shared |
usrDingbats | 0x2700->0x27bf | Shared |
usrCJKSymAndPunct | 0x3000->0x303f | Far East |
usrHiragana | 0x3040->0x309f | Far East |
usrKatakana | 0x30a0->0x30ff | Far East |
usrBopomofo | 0x3100->0x312f | Far East |
usrHangulCompatJamo | 0x3130->0x318f | Far East |
usrCJKMisc | 0x3190->0x319f | Far East |
usrEnclosedCJKLtMnth | 0x3200->0x32ff | Far East |
usrCJKCompatibility | 0x3300->0x33ff | Far East |
usrHangul | 0xac00->0xd7a3 | Far East |
usrReserved1 | | |
usrReserved2 | | |
usrCJKUnifiedIdeo | 0x4e00->0x9fff | Far East |
usrPrivateUseArea | 0xe000->0xf8ff | Shared |
usrCJKCompatibilityIdeographs | 0xf900->0xfaff | Far East |
usrAlphaPresentationForms | 0xfb00->0xfb4f | Shared |
usrArabicPresentationFormsA | 0xfb50->0xfdff | Shared |
usrCombiningHalfMarks | 0xfe20->0xfe2f | Far East |
usrCJKCompatForms | 0xfe30->0xfe4f | Far East |
usrSmallFormVariants | 0xfe50->0xfe6f | Far East |
usrArabicPresentationFormsB | 0xfe70->0xfefe | Shared |
usrHFWidthForms | 0xff00->0xffef | Far East |
usrSpecials | 0xfff0->0xfffd | non-Far East |
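One way to apply the range table is a sorted list of ranges searched with a binary search. The sketch below is an assumption about how a reader might implement the lookup, not part of the format. It copies only a few rows of the table as an excerpt, and `RANGES` and `classify_by_range` are hypothetical names. Code points outside the listed rows return None.

```python
from bisect import bisect_right

# (first, last, classification): a partial excerpt of the table above,
# sorted by the first code point of each subrange.
RANGES = [
    (0x0020, 0x007F, "ASCII"),         # usrBasicLatin
    (0x0370, 0x03CF, "shared"),        # usrBasicGreek
    (0x03D0, 0x03FF, "non-Far East"),  # usrGreekSymbolsCop
    (0x0400, 0x04FF, "shared"),        # usrCyrillic
    (0x0E00, 0x0E7F, "non-Far East"),  # usrThai
    (0x2000, 0x206F, "shared"),        # usrGeneralPunct
    (0x3040, 0x309F, "Far East"),      # usrHiragana
    (0x4E00, 0x9FFF, "Far East"),      # usrCJKUnifiedIdeo
    (0xAC00, 0xD7A3, "Far East"),      # usrHangul
    (0xE000, 0xF8FF, "shared"),        # usrPrivateUseArea
    (0xFF00, 0xFFEF, "Far East"),      # usrHFWidthForms
]
STARTS = [first for first, _, _ in RANGES]


def classify_by_range(cp):
    """Return the classification of code point cp, or None if it is not listed."""
    i = bisect_right(STARTS, cp) - 1  # last range starting at or before cp
    if i >= 0:
        first, last, classification = RANGES[i]
        if cp <= last:
            return classification
    return None
```

The subranges marked "some shared" need the per-character tables described in the notes, so they are deliberately left out of this excerpt.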
The table below describes the behavior of the Unicode subrange usrLatin1. Each entry corresponds to one character: a 1 marks a shared character and a 0 marks a "non-Far East" character. Any character in this Unicode subrange that the table does not cover is considered "non-Far East".
// 0 1 2 3 4 5 6 7 8 9 a b c d e f
0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, // 0x00a0-0x00af
1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, // 0x00b0-0x00bf
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00c0-0x00cf
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00d0-0x00df
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00e0-0x00ef
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00f0-0x00ff
};
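The usrLatin1 table can be indexed directly by code point. The sketch below transcribes the six rows above into a Python list. `LATIN1_SHARED` and `classify_latin1` are hypothetical names chosen for this illustration.

```python
# One row per 16 code points, transcribed from the usrLatin1 table above.
LATIN1_SHARED = [
    [0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1],  # 0x00a0-0x00af
    [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1],  # 0x00b0-0x00bf
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 0x00c0-0x00cf
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # 0x00d0-0x00df
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 0x00e0-0x00ef
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # 0x00f0-0x00ff
]


def classify_latin1(cp):
    """Classify a code point in usrLatin1 (0xa0->0xff)."""
    if not 0xA0 <= cp <= 0xFF:
        raise ValueError("code point is outside usrLatin1")
    row, col = divmod(cp - 0xA0, 16)  # row selects the line, col the column
    return "shared" if LATIN1_SHARED[row][col] else "non-Far East"
```

For example, 0xd7 (the multiplication sign) and 0xf7 (the division sign) are the only shared characters in the 0xc0->0xff half of the subrange.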
The table below describes the behavior of the Unicode subrange usrLatinXA. Each entry corresponds to one character: a 1 marks a shared character and a 0 marks a "non-Far East" character. The table lists 0x0100->0x016f; any character in this Unicode subrange that the table does not cover is considered "non-Far East".
// 0 1 2 3 4 5 6 7 8 9 a b c d e f
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x0100-0x010f
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, // 0x0110-0x011f
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, // 0x0120-0x012f
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x0130-0x013f
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, // 0x0140-0x014f
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x0150-0x015f
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, // 0x0160-0x016f
In usrLatinXB, the shared characters are 0x192, 0x1FA, 0x1FB, 0x1FC, 0x1FD, 0x1FE, and 0x1FF. All other characters in this Unicode subrange are considered "non-Far East".
In usrIPAExtensions, the shared characters are 0x251 and 0x261.
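Because so few characters in usrLatinXA, usrLatinXB, and usrIPAExtensions are shared, a set of code points is a compact alternative to a full table. In the sketch below, the usrLatinXA entries are read off the positions marked 1 in the table above. `SHARED_EXTENDED` and `classify_latin_ext` are hypothetical names. Treating the remaining usrIPAExtensions characters as "non-Far East" is an assumption, since the text states that rule only for the Latin subranges.

```python
# Shared code points in 0x100->0x2af, collected from the notes above.
SHARED_EXTENDED = frozenset(
    # usrLatinXA: positions marked 1 in the table
    [0x100, 0x101, 0x113, 0x11B, 0x12B, 0x144, 0x148, 0x14D, 0x16B]
    # usrLatinXB
    + [0x192, 0x1FA, 0x1FB, 0x1FC, 0x1FD, 0x1FE, 0x1FF]
    # usrIPAExtensions
    + [0x251, 0x261]
)


def classify_latin_ext(cp):
    """Classify a code point in usrLatinXA, usrLatinXB, or usrIPAExtensions."""
    if not 0x100 <= cp <= 0x2AF:
        raise ValueError("code point is outside 0x100->0x2af")
    # Assumption: unlisted usrIPAExtensions characters are non-Far East too.
    return "shared" if cp in SHARED_EXTENDED else "non-Far East"
```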
An optimization is available: if the Far East font chp.ftcFE is 0, chp.idctHint is 0, and chp.ftcAscii is equal to chp.ftcOther, then the font is chp.ftcAscii and the language is chp.lidDefault.
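The optimization amounts to a single test made before any per-character work. The sketch below assumes a simplified stand-in for the character properties. The `Chp` class here is hypothetical and carries only the five fields named in the text, and `fast_path` is likewise an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class Chp:
    """Hypothetical stand-in holding only the CHP fields used here."""
    ftcAscii: int
    ftcFE: int
    ftcOther: int
    idctHint: int
    lidDefault: int


def fast_path(chp):
    """Return (font, language) if the optimization applies, else None."""
    if chp.ftcFE == 0 and chp.idctHint == 0 and chp.ftcAscii == chp.ftcOther:
        return (chp.ftcAscii, chp.lidDefault)
    return None  # caller falls back to the per-character calculation
```

A caller would try `fast_path` first and run the full classification only when it returns None.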