Buckwalter transliteration
The Buckwalter Arabic transliteration was developed at Xerox by Tim Buckwalter in the 1990s. It is an ASCII-only transliteration scheme that represents Arabic orthography strictly one-to-one, unlike the more common romanization schemes, which add morphological information not expressed in Arabic script. For example, a wāw is transliterated as w regardless of whether it is realized as the vowel /uː/ or the consonant /w/; only when the wāw is modified by a hamzah (ؤ) does the transliteration change, to &. Most unmodified letters are straightforward to read (the exceptions are * for dhaal, E for ayin, and v for thaa), but the transliterations of letters with diacritics and of the harakat take some time to get used to: the nunated endings -un, -an, -in appear as N, F, K, and the sukūn ("no vowel") as o. Taʾ marbūṭah (ة) is p.
Since the original Buckwalter scheme was developed, several variants have emerged, not all of them standardized. The original scheme is not compatible with XML because it uses the characters < (إ), > (أ) and & (ؤ), so "XML safe" versions usually replace them; Buckwalter suggests transliterating them as I, O and W, respectively. Completely "safe" transliteration schemes replace all non-alphanumeric characters (such as $';*) with alphanumeric characters. For a complete description of the different Buckwalter schemes, as well as a more detailed discussion of the trade-offs between them, see Habash.[1]
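The XML-safe substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `to_xml_safe` and `from_xml_safe` are hypothetical, and the reverse direction assumes the input contains only transliterated Arabic, since I, O and W are otherwise unused in the Buckwalter table given in this article.

```python
# Hypothetical helpers: swap the three XML-special Buckwalter symbols
# for the replacements Buckwalter suggests (< -> I, > -> O, & -> W).
_TO_SAFE = str.maketrans({"<": "I", ">": "O", "&": "W"})
_FROM_SAFE = str.maketrans({"I": "<", "O": ">", "W": "&"})


def to_xml_safe(bw_text: str) -> str:
    """Convert original Buckwalter text to the XML-safe variant."""
    return bw_text.translate(_TO_SAFE)


def from_xml_safe(safe_text: str) -> str:
    """Convert XML-safe text back; assumes no embedded Latin text."""
    return safe_text.translate(_FROM_SAFE)


print(to_xml_safe(">Hmd"))     # hamza on alif becomes O
print(to_xml_safe("mu&min"))   # hamza on waw becomes W
```

The reverse mapping is lossless only because the three replacement letters do not collide with any other symbol in the scheme; a variant that reused an existing symbol could not be inverted this way.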
When transliterating Arabic text, several other issues may arise. First, some characters are not specified in the transliteration table, including non-alphabetic symbols such as ۞, punctuation such as ؛ and ؟, and "Hindi" or "Eastern Arabic" numerals. Similarly, Arabic text sometimes borrows non-Arabic letters from Persian, some of which are defined in the full Buckwalter table.[2] Symbols that are not defined in the transliteration table may be deleted, kept as non-Latin symbols embedded in the transliterated text, or transliterated into different (non-conflicting) Latin symbols. (For instance, it is straightforward to convert Hindi numerals to Arabic numerals.)
Second, Arabic text may contain embedded ASCII text, for instance an Arabic sentence that refers to "IBM" or that includes a quote in English. If the Latin text is not explicitly marked, it is hard to distinguish it from transliterated Arabic, and if the transliterated text is later transliterated back to Arabic, the embedded Latin text will be turned into garbage Arabic.
Finally, one must decide how much normalization of the Arabic text should be done during transliteration. This may include removing kashida (ـ), removing short vowels and/or other diacritics, and/or normalizing spelling.[1]
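Some of the pre-processing choices mentioned above (digit conversion, kashida removal, diacritic removal) can be sketched as follows. The function `normalize` and its keyword flags are hypothetical names chosen for this illustration; the set of characters treated as diacritics (U+064B to U+0652 plus the dagger alif U+0670) is an assumption matching the harakat listed in this article, and spelling normalization is not covered.

```python
import re

# Eastern Arabic ("Hindi") digits U+0660..U+0669 -> ASCII digits 0..9.
EASTERN_DIGITS = {0x0660 + i: str(i) for i in range(10)}

TATWEEL = "\u0640"  # kashida / tatwil

# Assumed diacritic set: tanwin, short vowels, shadda, sukun, dagger alif.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def normalize(text, strip_tatweel=True, strip_diacritics=False,
              convert_digits=True):
    """Optionally remove kashida and diacritics and convert digits."""
    if strip_tatweel:
        text = text.replace(TATWEEL, "")
    if strip_diacritics:
        text = DIACRITICS.sub("", text)
    if convert_digits:
        text = text.translate(EASTERN_DIGITS)
    return text


print(normalize("\u0661\u0662\u0663"))  # Eastern Arabic 1, 2, 3
```

Each step is behind its own flag because, as noted above, how much normalization to apply is a design decision: stripping diacritics, for example, makes the transliteration shorter but no longer reversible to the original text.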
Buckwalter transliteration table
Arabic letters | ا | ب | ت | ث | ج | ح | خ | د | ذ | ر | ز | س | ش | ص | ض | ط | ظ | ع | غ | ف | ق | ك | ل | م | ن | ه | و | ي / ی[3] |
DIN 31635 | ʾ / ā | b | t | ṯ | ǧ | ḥ | ḫ | d | ḏ | r | z | s | š | ṣ | ḍ | ṭ | ẓ | ʿ | ġ | f | q | k | l | m | n | h | w / ū | y / ī |
Buckwalter | A | b | t | v | j | H | x | d | * | r | z | s | $ | S | D | T | Z | E | g | f | q | k | l | m | n | h | w | y |
Qalam | ' / aa | b | t | th | j | H | kh | d | dh | r | z | s | sh | S | D | T | Z | ` | gh | f | q | k | l | m | n | h | w | y |
BATR | A / aa | b | t | c | j | H | K | d | z' | r | z | s | x | S | D | T | Z | E | g | f | q | k | l | m | n | h | w / uu | y / ii |
IPA (MSA) | ʔ, aː | b | t | θ | dʒ, ɡ, ʒ | ħ | x | d | ð | r | z | s | ʃ | sˤ | dˤ | tˤ | ðˤ, zˤ | ʕ | ɣ | f | q | k | l | m | n | h | w, uː | j, iː |
- hamza
  - lone hamza: '
  - hamza on alif: >
  - hamza below alif: <
  - hamza on waw: &
  - hamza on ya: }
- alif
  - madda on alif: |
  - alif wasla: {
  - dagger alif: `
  - alif maqsura: Y
- harakat
  - fatha: a
  - damma: u
  - kasra: i
  - fathatayn: F
  - dammatayn: N
  - kasratayn: K
  - shadda: ~
  - sukun: o
- ta marbouta: p
- tatwil: _
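Because the scheme is strictly one-to-one, the table above can be turned into a pair of lookup dictionaries. The sketch below is one possible way to do that in Python, not an official implementation: the names `AR2BW`, `BW2AR`, `to_buckwalter` and `from_buckwalter` are hypothetical, the table is built from runs of consecutive Unicode code points, and the policy of passing undefined characters through unchanged is only one of the options discussed earlier.

```python
def _build_table():
    # Each run pairs consecutive Unicode code points, starting at `start`,
    # with the Buckwalter symbols listed in this article.
    runs = [
        (0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"),  # hamza forms, alif .. ghayn
        (0x0640, "_fqklmnhwYyFNKaui~o"),          # tatwil, fa .. ya, harakat
        (0x0670, "`{"),                           # dagger alif, alif wasla
    ]
    table = {}
    for start, symbols in runs:
        for offset, symbol in enumerate(symbols):
            table[chr(start + offset)] = symbol
    return table


AR2BW = _build_table()
BW2AR = {symbol: letter for letter, symbol in AR2BW.items()}


def to_buckwalter(text: str) -> str:
    """Arabic -> Buckwalter; characters not in the table pass through."""
    return "".join(AR2BW.get(ch, ch) for ch in text)


def from_buckwalter(text: str) -> str:
    """Buckwalter -> Arabic; assumes the input has no embedded Latin text."""
    return "".join(BW2AR.get(ch, ch) for ch in text)


# kaf, kasra, ta, fatha, alif, ba, dammatayn
word = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"
print(to_buckwalter(word))  # kitaAbN
```

Since every symbol maps to exactly one Arabic character, inverting the dictionary is enough for the reverse direction. Embedded Latin text would be converted letter by letter as well, which is the "garbage Arabic" problem described in the text above.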
Notes
- [1] Habash, Nizar. Introduction to Arabic Natural Language Processing. Morgan & Claypool, 2010.
- [2] Buckwalter, Tim. Buckwalter Arabic Transliteration Table.
- [3] In Egypt, Sudan and sometimes other regions, the final form is always ی (without dots).
External links
- Information about the transliteration
- Transliteration on XRCE site
- Transliteration on Tim Buckwalter's site
- Extended Buckwalter Transliteration (for the Quran)