Andrew Cunningham
MARC-8 and UTF-8 are the valid encodings for MARC21 records.
Tools and libraries that use or generate MARC21 data often assume all data is either MARC-8 or UTF-8.
ệ – ệ 1EC7
ệ – ẹ◌̂ 1EB9 0302
ệ – ê◌̣ 00EA 0323
ệ – e◌̂◌̣ 0065 0302 0323
ệ – e◌̣◌̂ 0065 0323 0302
All forms are canonically equivalent.
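In JavaScript, String.prototype.normalize() makes this equivalence easy to check. The five spellings are different code point sequences, but they all normalize to the same NFC string (U+1EC7) and the same NFD string (U+0065 U+0323 U+0302). This is a minimal sketch; the helper names codePoints and canonicallyEquivalent are illustrative.

```javascript
// The five spellings of ệ listed above, written with explicit escapes.
const forms = [
  "\u1EC7",         // precomposed ệ
  "\u1EB9\u0302",   // ẹ + combining circumflex
  "\u00EA\u0323",   // ê + combining dot below
  "e\u0302\u0323",  // e + circumflex + dot below
  "e\u0323\u0302",  // e + dot below + circumflex
];

// Show a string as space-separated hex code points, e.g. "1EC7".
function codePoints(s) {
  return [...s]
    .map(c => c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"))
    .join(" ");
}

// Two strings are canonically equivalent iff their NFD forms are identical.
function canonicallyEquivalent(a, b) {
  return a.normalize("NFD") === b.normalize("NFD");
}

for (const f of forms) {
  console.log(codePoints(f), "| NFC:", codePoints(f.normalize("NFC")), "| NFD:", codePoints(f.normalize("NFD")));
}
console.log(forms.every(f => canonicallyEquivalent(f, forms[0]))); // true
```

NFD also reorders the combining marks by canonical combining class, so the dot below (class 220) always ends up before the circumflex (class 230), whatever order the input used.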
ử – ử 1EED
ử – ư◌̉ 01B0 0309
ử – ủ◌̛ 1EE7 031B
ử – u◌̛◌̉ 0075 031B 0309
ử – u◌̉◌̛ 0075 0309 031B
ử 1EED (NFC)
u◌̛◌̉ 0075 031B 0309 (NFD)
ư◌̉ 01B0 0309 (MARC21)
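The MARC21 spelling of ử is neither NFC nor NFD. NFC would compose it to U+1EED, and NFD would split ư into u + U+031B. You can test which form a string is already in by comparing it with its own normalization. In this sketch, normalizationForms is a hypothetical helper name.

```javascript
// List the normalization forms (NFC, NFD) that a string is already in.
function normalizationForms(s) {
  return ["NFC", "NFD"].filter(form => s === s.normalize(form));
}

const spellings = {
  precomposed: "\u1EED",        // ử
  decomposed: "u\u031B\u0309",  // u + combining horn + combining hook above
  MARC21: "\u01B0\u0309",       // ư + combining hook above
};

for (const [label, s] of Object.entries(spellings)) {
  console.log(label, normalizationForms(s).join(", ") || "neither");
}
// precomposed NFC
// decomposed NFD
// MARC21 neither
```

Because of this, a tool that normalizes MARC21 data to NFC or NFD on input changes the record's byte sequence, even though the text stays canonically equivalent.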
Character | MARC21 | NFD |
---|---|---|
Ơ | U+01A0 | U+004F U+031B |
ơ | U+01A1 | U+006F U+031B |
Ư | U+01AF | U+0055 U+031B |
ư | U+01B0 | U+0075 U+031B |
Languages of Vietnam, and Thai and Lao romanisation
Character | MARC21 | NFD |
---|---|---|
Ё | U+0401 | U+0415 U+0308 |
ё | U+0451 | U+0435 U+0308 |
Ѓ | U+0403 | U+0413 U+0301 |
ѓ | U+0453 | U+0433 U+0301 |
Ї | U+0407 | U+0406 U+0308 |
ї | U+0457 | U+0456 U+0308 |
Character | MARC21 | NFD |
---|---|---|
Ќ | U+040C | U+041A U+0301 |
ќ | U+045C | U+043A U+0301 |
Ў | U+040E | U+0423 U+0306 |
ў | U+045E | U+0443 U+0306 |
Й | U+0419 | U+0418 U+0306 |
й | U+0439 | U+0438 U+0306 |
Character | MARC21 | NFD |
---|---|---|
آ | U+0622 | U+0627 U+0653 |
أ | U+0623 | U+0627 U+0654 |
ؤ | U+0624 | U+0648 U+0654 |
إ | U+0625 | U+0627 U+0655 |
ئ | U+0626 | U+064A U+0654 |
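One reading of these tables is "NFD, except that the listed precomposed letters are kept". This sketch produces that form from arbitrary Unicode input under that assumption. It treats the letters in the three tables as the complete set to recompose. The real MARC-8 repertoire may contain others, so the list and the function name toMarc21Form are hypothetical. Rather than doing a blind string replacement, the sketch checks each candidate recomposition for canonical equivalence. This handles marks that NFD reorders between the base letter and the mark to be recomposed, such as an Arabic fatha (class 30) before a hamza above (class 230).

```javascript
// ASSUMPTION: precomposed letters that MARC21 keeps, taken only from the tables above.
const MARC21_PRECOMPOSED = [
  "\u01A0", "\u01A1", "\u01AF", "\u01B0",                      // Ơ ơ Ư ư
  "\u0401", "\u0451", "\u0403", "\u0453", "\u0407", "\u0457",  // Ё ё Ѓ ѓ Ї ї
  "\u040C", "\u045C", "\u040E", "\u045E", "\u0419", "\u0439",  // Ќ ќ Ў ў Й й
  "\u0622", "\u0623", "\u0624", "\u0625", "\u0626",            // آ أ ؤ إ ئ
];

// Each letter's NFD is exactly one base character plus one combining mark.
const RECOMPOSE = MARC21_PRECOMPOSED.map(ch => {
  const [base, mark] = [...ch.normalize("NFD")];
  return { ch, base, mark };
});

function toMarc21Form(text) {
  // Split the NFD string into clusters: a non-mark followed by its combining marks.
  const clusters = text.normalize("NFD").match(/\P{M}\p{M}*|\p{M}+/gu) ?? [];
  return clusters.map(cluster => {
    const chars = [...cluster];
    for (const { ch, base, mark } of RECOMPOSE) {
      const i = chars.indexOf(mark, 1);
      if (chars[0] !== base || i < 0) continue;
      // Precomposed letter + the remaining marks, in their original order.
      const candidate = ch + chars.slice(1, i).join("") + chars.slice(i + 1).join("");
      // Keep it only if it is canonically equivalent to the decomposed cluster.
      if (candidate.normalize("NFD") === cluster) return candidate;
    }
    return cluster;
  }).join("");
}

console.log(toMarc21Form("\u1EED") === "\u01B0\u0309"); // true: ử becomes ư + U+0309
```

The equivalence check also stops incorrect merges. For example, Cyrillic е + acute + diaeresis stays decomposed, because ё + acute would put the two marks in the opposite order.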
Adlam, Arabic, Garay, Hanifi Rohingya, Hebrew, Mandaic, Mende Kikakui, N'Ko, Samaritan, Syriac, Thaana, and Yezidi.
Indus script, Egyptian hieroglyphs, Cypriot syllabary, Phoenician alphabet, Imperial Aramaic, Old South Arabian, Old North Arabian, Pahlavi, Avestan, Hatran, Sogdian/Manichaean, Nabatean, Old Ge'ez, Kharosthi, Old Turkic runes (Orkhon runes), Old Hungarian runes, Old Italic alphabets (Early Etruscan), Lydian alphabet (RTL, LTR, & boustrophedon)
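The dir global attribute indicates the directionality of the element's text. It can have the following values: ltr (left-to-right), rtl (right-to-left), and auto (the browser decides from the element's content).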
<html lang="aii-Syrn" dir="rtl">
⋮
</html>
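With dir="auto", the browser takes the direction from the first character that has a strong bidi type (L, R, or AL). JavaScript regular expressions cannot test the Bidi_Class property, so this sketch approximates it. Letters in the right-to-left scripts listed above count as strong RTL, and any other letter counts as strong LTR. Garay is left out because engines without Unicode 16 data reject its script name. The HTML algorithm's rules for skipping certain descendant elements are also ignored. The function name firstStrongDirection is hypothetical.

```javascript
// Approximate strong right-to-left letters by script (assumption: not true Bidi_Class).
// Garay is omitted because it needs Unicode 16 property data in the regex engine.
const RTL_SCRIPT = new RegExp(
  "[\\p{Script=Adlam}\\p{Script=Arabic}\\p{Script=Hanifi_Rohingya}\\p{Script=Hebrew}" +
  "\\p{Script=Mandaic}\\p{Script=Mende_Kikakui}\\p{Script=Nko}\\p{Script=Samaritan}" +
  "\\p{Script=Syriac}\\p{Script=Thaana}\\p{Script=Yezidi}]",
  "u"
);
const LETTER = /\p{L}/u;

// Mimic dir="auto": the first letter decides; digits, punctuation and marks are skipped.
function firstStrongDirection(text) {
  for (const ch of text) {
    if (!LETTER.test(ch)) continue;
    return RTL_SCRIPT.test(ch) ? "rtl" : "ltr";
  }
  return null; // no letter found
}

console.log(firstStrongDirection("123 \u0710\u0720")); // rtl (Syriac letters)
```

The letter test runs first so that Arabic-Indic digits, which belong to the Arabic script but are not strong characters, do not decide the direction.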
In the first instance, rely on the Unicode Bidirectional Algorithm (UBA).
When the default bidi rendering isn't enough, use directional formatting characters.
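Of the directional formatting characters, the isolates are usually the safest choice. FSI (U+2068) ... PDI (U+2069) lets a run take its direction from its own first strong character without affecting the surrounding text. LRI (U+2066) and RLI (U+2067) force a direction. This helper is a hypothetical sketch. In HTML, the bdi element and the dir attribute do the same job and are generally preferred over control characters.

```javascript
// Unicode directional isolates (introduced in Unicode 6.3).
const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

const OPENERS = new Map([["ltr", LRI], ["rtl", RLI], ["auto", FSI]]);

// Wrap a run of text in an isolate; dir is "ltr", "rtl" or "auto" (first strong).
function isolate(text, dir = "auto") {
  const open = OPENERS.get(dir);
  if (open === undefined) throw new RangeError(`unknown direction: ${dir}`);
  return open + text + PDI;
}

// An Arabic title (الكتاب) inside an English sentence, followed by a number.
const title = "\u0627\u0644\u0643\u062A\u0627\u0628";
console.log(`${isolate(title)} - 3 copies`);
```

Without the isolate, the number and hyphen after an RTL title can be drawn into the RTL run and displayed on the wrong side of it.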
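(function() {
  'use strict';
  // Page-wide default font stack.
  document.body.style.fontFamily = "'Bibliotheca LCG', 'Noto Sans'";
  // By default, let the browser take each vernacular field's direction from its content.
  document.querySelectorAll(".vernacular").forEach(vern => {
    vern.setAttribute("dir", "auto");
  });
  /* Arabic - ar, Persian - fa */
  document.querySelectorAll(".vernacular:lang(ar), .vernacular:lang(fa)").forEach(vern => {
    vern.setAttribute("dir", "rtl");
    vern.style.fontFamily = "'Scheherazade New', Amiri";
    vern.style.textAlign = "right";
  });
  /* Russian - ru */
  // ALA-LC ligature halves (U+FE20 ... U+FE21) render poorly; replace each pair with
  // U+0361 COMBINING DOUBLE INVERTED BREVE: i︠a︡ → i͡a, i︠u︡ → i͡u, t︠s︡ → t͡s.
  const ligatures = [
    ["i\uFE20a\uFE21", "i\u0361a"],
    ["i\uFE20u\uFE21", "i\u0361u"],
    ["t\uFE20s\uFE21", "t\u0361s"]
  ];
  document.querySelectorAll(".vernacular:lang(ru)").forEach(vern => {
    vern.setAttribute("dir", "ltr");
    const parent = vern.parentElement;
    if (!parent) return;
    // Edit the parent's text nodes in place. Reassigning parent.innerHTML would detach
    // vern and its siblings, so a later match in the same parent would have no parentElement.
    const walker = document.createTreeWalker(parent, NodeFilter.SHOW_TEXT);
    for (let node = walker.nextNode(); node; node = walker.nextNode()) {
      for (const [halves, tie] of ligatures) {
        node.nodeValue = node.nodeValue.replaceAll(halves, tie);
      }
    }
  });
})();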
language-extlang-script-region-variant-extension-privateuse
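The line above gives the subtag order of a BCP 47 language tag (RFC 5646). As a sketch, a regular expression that follows that grammar can split a tag into its parts. It ignores grandfathered tags and tags that consist only of a private-use section. It checks syntax only, not whether each subtag is registered in the IANA Language Subtag Registry. The name parseLanguageTag is hypothetical. Browsers and Node also provide Intl.Locale, but it follows the Unicode locale identifier syntax (UTS #35), which has no extlang subtag, for example.

```javascript
// Syntax of an RFC 5646 "langtag", matched case-insensitively.
const LANGTAG = new RegExp(
  "^(?<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4,8})" + // language + up to 3 extlangs
  "(?:-(?<script>[a-z]{4}))?" +
  "(?:-(?<region>[a-z]{2}|\\d{3}))?" +
  "(?<variants>(?:-(?:[a-z\\d]{5,8}|\\d[a-z\\d]{3}))*)" +
  "(?<extensions>(?:-[a-wyz\\d](?:-[a-z\\d]{2,8})+)*)" +    // singleton is anything but x
  "(?:-(?<privateuse>x(?:-[a-z\\d]{1,8})+))?$",
  "i"
);

function parseLanguageTag(tag) {
  const m = LANGTAG.exec(tag);
  if (!m) return null; // not well formed (or grandfathered / private-use only)
  const g = m.groups;
  const [language, ...extlang] = g.language.split("-");
  return {
    language,
    extlang,
    script: g.script ?? null,
    region: g.region ?? null,
    variants: g.variants.split("-").filter(Boolean),
    extensions: [...g.extensions.matchAll(/-([a-wyz\d](?:-[a-z\d]{2,8})+)/gi)].map(x => x[1]),
    privateuse: g.privateuse ?? null,
  };
}

console.log(parseLanguageTag("ru-Latn-alalc97").variants); // [ 'alalc97' ]
```

The tag ru-Latn-alalc97 uses the registered variant alalc97 to mark Russian romanized according to ALA-LC (1997), which makes it a natural fit for romanized catalogue data.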
Transliteration versus transcription
Greek | Transliteration | Transcription (IPA) |
---|---|---|
Ελληνική Δημοκρατία | Ellēnikḗ Dēmokratía | elinikí ðimokratía |
Hellēnikē Dēmokratia
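Transliteration maps letters to letters, so it can be sketched as a lookup table applied after separating the accents from the base letters. The table here is a hypothetical fragment. It covers only the letters in Ελληνική Δημοκρατία and reproduces the first transliteration above. It is not a complete or standard scheme. Decomposing to NFD lets the tonos (canonically an acute, U+0301) carry over to the Latin letter, and recomposing with NFC gives ḗ and í.

```javascript
// Hypothetical fragment of a Greek-to-Latin table: only the letters needed here.
const GREEK_TO_LATIN = new Map([
  ["\u0395", "E"], ["\u0394", "D"],                 // Ε Δ
  ["\u03B1", "a"], ["\u03B7", "\u0113"],            // α η→ē
  ["\u03B9", "i"], ["\u03BA", "k"], ["\u03BB", "l"], // ι κ λ
  ["\u03BC", "m"], ["\u03BD", "n"], ["\u03BF", "o"], // μ ν ο
  ["\u03C1", "r"], ["\u03C4", "t"],                 // ρ τ
]);

function transliterate(greek) {
  let out = "";
  for (const ch of greek.normalize("NFD")) {
    // Unmapped characters (the combining acute, spaces) pass through unchanged.
    out += GREEK_TO_LATIN.get(ch) ?? ch;
  }
  return out.normalize("NFC");
}

console.log(transliterate("Ελληνική Δημοκρατία")); // Ellēnikḗ Dēmokratía
```

A transcription such as the IPA column cannot be produced this way. It depends on pronunciation (λλ is a single [l], η is [i]), not on the letters alone.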
ALA-LC Romanization isn't always one to one.
ໄຊ
ໄຊ້
ໄສ
ໄ໊ຊ
ໄສ່
ໄສ້
ໄ໋ຊ
⇒ sai
ໄ ⇒ ai ; ຊ ⇒ s ; ສ ⇒ s ; tones ⇒ ∅
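The rules on the line above can be written as a small sketch that shows why this romanization cannot be reversed. It covers only the characters listed here. The name romanizeLao and the table are hypothetical, not the full ALA-LC Lao table. Note that ໄ (U+0EC4) is written before the consonant but romanized after it. The tone marks U+0EC8 to U+0ECB are dropped wherever they appear.

```javascript
// Hypothetical fragment covering only the characters above, NOT the full ALA-LC table.
const LAO_VOWEL_AI = "\u0EC4";             // ໄ, written before its consonant
const LAO_CONSONANTS = new Map([
  ["\u0E8A", "s"],                         // ຊ
  ["\u0EAA", "s"],                         // ສ
]);
const LAO_TONE_MARKS = /[\u0EC8-\u0ECB]/g; // mai ek, mai tho, mai ti, mai catawa

function romanizeLao(text) {
  let out = "";
  let pendingVowel = "";
  for (const ch of text.replace(LAO_TONE_MARKS, "")) { // tones ⇒ ∅
    if (ch === LAO_VOWEL_AI) {
      pendingVowel = "ai"; // emitted after the following consonant
      continue;
    }
    const latin = LAO_CONSONANTS.get(ch);
    if (latin === undefined) {
      throw new RangeError(`no mapping for U+${ch.codePointAt(0).toString(16).toUpperCase()}`);
    }
    out += latin + pendingVowel;
    pendingVowel = "";
  }
  return out + pendingVowel;
}

// The seven spellings listed above.
const laoSpellings = [
  "\u0EC4\u0E8A", "\u0EC4\u0E8A\u0EC9", "\u0EC4\u0EAA", "\u0EC4\u0ECA\u0E8A",
  "\u0EC4\u0EAA\u0EC8", "\u0EC4\u0EAA\u0EC9", "\u0EC4\u0ECB\u0E8A",
];
console.log([...new Set(laoSpellings.map(romanizeLao))].join(", ")); // sai
```

Seven distinct vernacular strings collapse to one romanized string, so the vernacular cannot be recovered from the romanization alone.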
Unicode has far more characters than any single font can cover (an OpenType font holds at most 65,535 glyphs), so web typography needs a flexible approach.