Tai Languages

Unicode support for Tai languages of Assam

This project is maintained by andjc

Tai Aiton Unicode notes

The Tai Aiton language is spoken in parts of Assam, India. The writing system used by Tai Aiton, Tai Phake and Khamti has been unified with the Myanmar script, although each language has its own typographic and calligraphic traditions.

Consonants

Glyph က
Codepoint 1000 1075 1004 AA61 AA6C 107A 1010 1011 AA6B 1012
Glyph
Codepoint 1015 1078 1019 1017 101A AA7A 101C 101D AA6D 1022

U+AA6B is used to represent the phonemes /n/ and /d/. In modern usage, some Tai Aiton have started using the Burmese character U+1012 to represent /d/.

U+1019 is used to represent the phonemes /m/ and /b/. In modern usage, the Burmese character U+1017 can be observed representing /b/.
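
The codepoint rows on this page are space-separated hexadecimal values. As a minimal sketch (the helper name decode_row is hypothetical, not part of any font or library), a row can be turned into actual characters and checked against the Unicode character database that ships with Python:

```python
import unicodedata

def decode_row(row: str) -> str:
    """Turn a space-separated row of hex codepoints into a string."""
    return "".join(chr(int(cp, 16)) for cp in row.split())

# First consonant row, copied from the table above.
consonants = decode_row("1000 1075 1004 AA61 AA6C 107A 1010 1011 AA6B 1012")

for ch in consonants:
    # The second argument is a fallback for codepoints that the local
    # Unicode database does not know about.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```

The names printed depend on the Unicode version bundled with the Python interpreter in use.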

Medial consonants

Glyph –ျ –ြ –ၞ
Codepoint 103B 103C 105E

Subjoined consonants

Glyph –္က –္ꩬ –္တ –္ထ –္ပ –္ယ –္လ
Codepoint 1039 1000 1039 AA6C 1039 1010 1039 1011 1039 1015 1039 101A 1039 101C

Final consonants

The sign –် (sat) marks a final consonant. In the modern orthography sat is obligatory; in older manuscripts it will most likely be absent.

Glyph က် င် ꩡ် ၺ် တ် ꩫ် ပ် မ် ဝ်
Codepoint 1000 103A 1004 103A AA61 103A 107A 103A 1010 103A AA6B 103A 1015 103A 1019 103A 101D 103A
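
As an illustration of the modern convention, the sketch below appends a final consonant followed by sat, and checks whether a syllable ends in a marked final. The function names are hypothetical, and the check is deliberately naive: it assumes sat is the last character of the syllable, so it does not cover the vowel-consonant combinations in which another sign follows sat.

```python
SAT = "\u103A"  # MYANMAR SIGN ASAT, called "sat" in these notes

# Bare final consonants, taken from the codepoint row above.
FINAL_CONSONANTS = frozenset(
    "\u1000\u1004\uAA61\u107A\u1010\uAA6B\u1015\u1019\u101D"
)

def add_final(syllable: str, final: str) -> str:
    """Append a final consonant and mark it with sat."""
    if final not in FINAL_CONSONANTS:
        raise ValueError(f"{final!r} is not a listed final consonant")
    return syllable + final + SAT

def has_marked_final(syllable: str) -> bool:
    """True if the syllable ends in a listed final consonant plus sat."""
    return syllable[-1:] == SAT and syllable[-2:-1] in FINAL_CONSONANTS

closed = add_final("\u1000\u102D", "\u1004")  # KA + vowel sign I + NGA + sat
print([f"U+{ord(c):04X}" for c in closed])
```

A syllable from an older manuscript that omits sat would fail has_marked_final, which is one way to flag text that needs review before applying modern-orthography rules.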

Vowels

Vowels in open syllables

Glyph –ႃ –ႜ –ီ –ူ –ေ –ေႃ –ုဝ် –ိုဝ်
Codepoint 1083 109C 102E 1030 1031 1031 1083 102F 101D 103A 102D 102F 101D 103A

–ာ is not included in the modern orthography, but can be observed in older manuscripts for both Tai words and Pali loan words.

Glyph –ာ
Codepoint 102C

Vowels in closed syllables

These vowels are followed by a final consonant.

Glyph –ိ –ု –ွ –ို
Codepoint 102D 102F 103D 102D 102F

Tai Phake uses the combination <U+103D U+103F>. For completeness, support for this combination should also be added for Tai Aiton.

Vowel-consonant combinations and diphthongs

Glyph –ံ –်ံ –ႝ –ွႝ –ွေ –ိုႜ –်ွ –်ၞ
Codepoint 1036 103A 1036 109D 103D 109D 103D 1031 102D 102F 109C 103A 103D 103A 105E

Ligatures

The following ligatures do not take diacritics, but are considered as words.

Glyph
Codepoint AA77 AA78 AA79

The following ligature also occurs:

Glyph –ွု
Codepoint 103D 102F

Reduplication

At least one Tai Khamti orthography uses a single character, U+AA70, to mark reduplication. The reduplication character is functionally similar to the corresponding Thai character, U+0E46 THAI CHARACTER MAIYAMOK. It is a spacing character. It is known to ligate with certain other characters, forming a combining mark to indicate reduplication.

We are uncertain at this point what U+AA70 represents orthographically. Rather than using U+AA70 in Tai Aiton, we have therefore based the Aiton reduplication encoding on the Tai Aiton orthography instead of the established Tai Khamti model.

Tai Aiton indicates reduplication by doubling one of four final characters, and uses a ligature for each of the four doubled forms. Sat U+103A, the final vowel I U+102E, final AM U+1036 and final AI U+109D can all be doubled.

Glyph –်် –ီီ –ံံ –ႝႝ
Codepoint 103A 103A 102E 102E 1036 1036 109D 109D
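
Because reduplication is encoded as a plain doubled character, it can be located in text with a simple scan. The following is a sketch only, under the assumption that the two characters are stored adjacently and in the order shown in the table; the function name and the labels are illustrative.

```python
# The four characters that the notes list as doubling to mark reduplication.
DOUBLING_MARKS = {
    "\u103A": "sat",
    "\u102E": "final vowel I",
    "\u1036": "final AM",
    "\u109D": "final AI",
}

def find_reduplication(text: str) -> list[tuple[int, str]]:
    """Return (index, label) for each doubled mark found in text."""
    hits = []
    i = 0
    while i < len(text) - 1:
        ch = text[i]
        if ch in DOUBLING_MARKS and text[i + 1] == ch:
            hits.append((i, DOUBLING_MARKS[ch]))
            i += 2  # skip the pair so a run of three is not counted twice
        else:
            i += 1
    return hits

sample = "\u1000\u103A\u103A"  # KA followed by a doubled sat
print(find_reduplication(sample))
```

Rendering the doubled pair as a single ligature remains the job of the font; the scan only identifies where such a ligature would be expected.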

Digits

Glyph ၁ ၂ ၃ ၄ ၅ ၆ ၇ ၈ ၉ ၀
Codepoint 1041 1042 1043 1044 1045 1046 1047 1048 1049 1040
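
The digits are the Myanmar-block decimal digits, listed above in the order one to nine and then zero. As a small sketch (the helper names are hypothetical), they can be converted to and from integers in Python, whose built-in int() accepts any Unicode decimal digit:

```python
MYANMAR_ZERO = 0x1040  # MYANMAR DIGIT ZERO; U+1041..U+1049 are one to nine

# Translation table from ASCII digits to Myanmar-block digits.
TO_MYANMAR = {ord("0") + d: MYANMAR_ZERO + d for d in range(10)}

def to_myanmar_digits(n: int) -> str:
    """Format an integer using Myanmar-block digits."""
    return str(n).translate(TO_MYANMAR)

def from_myanmar_digits(s: str) -> int:
    """Parse a string of Myanmar-block digits into an integer."""
    return int(s)

encoded = to_myanmar_digits(2040)
print([f"U+{ord(c):04X}" for c in encoded])
print(from_myanmar_digits(encoded))
```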

Punctuation

Glyph ၊ ။
Codepoint 104A 104B

These notes are based on Unicode Technical Note #11 (version 4), N3492, and The Tai languages of Assam – a grammar and texts by S. Morey.