Internationalization and localization
Part of a series on |
Translation |
---|
Types |
Theory |
Technologies |
Localization |
Institutional |
|
Related topics |
|
In computing, internationalization and localization are means of adapting computer software to different languages, regional differences and technical requirements of a target market (locale).[1] Internationalization is the process of designing a software application so that it can potentially be adapted to various languages and regions without engineering changes. Localization is the process of adapting internationalized software for a specific region or language by adding locale-specific components and translating text. Localization (which is potentially performed multiple times, for different locales) uses the infrastructure or flexibility provided by internationalization (which is ideally performed only once, or as an integral part of ongoing development).[2]
Naming
The terms are frequently abbreviated to the numeronyms i18n (where 18 stands for the number of letters between the first i and the last n in the word “internationalization,” a usage coined at DEC in the 1970s or 80s)[3][4] and L10n for “localization,” due to the length of the words.[5]
Some companies, like IBM and Sun Microsystems, use the term "globalization", g11n, for the combination of internationalization and localization.[6] Also known as "glocalization" (a portmanteau of globalization and localization).
Microsoft[7] defines Internationalization as a combination of World-Readiness and localization. World-Readiness is a developer task, which enables a product to be used with multiple scripts and cultures (globalization) and separating user interface resources in a localizable format (localizability, abbreviated to L12y).[8]
Hewlett-Packard and HP-UX created the system called NLS (National Language Support or Native Language Support) to produce localizable software.[1]
Scope
According to Software without frontiers, the design aspects to consider when internationalizing a product are "data encoding, data and documentation, software construction, hardware device support, user interaction"; while the key design areas to consider when making a fully internationalized product from scratch are "user interaction, algorithm design and data formats, software services, documentation".[1]
Translation is typically the most time-consuming component of language localization.[1] This may involve:
- For film, video, and audio, translation of spoken words or music lyrics, often using either dubbing or subtitles
- Text translation for printed materials, digital media (possibly including error messages and documentation)
- Potentially altering images and logos containing text to contain translations or generic icons[1]
- Different translation length and differences in character sizes (e.g. between Latin alphabet letters and Chinese characters) can cause layouts that work well in one language to work poorly in others.[1]
- Consideration of differences in dialect, register or variety[1]
- Writing conventions like:
- Formatting of numbers (especially decimal separator and digit grouping)
- Date and time format, possibly including use of different calendars
Standard locale data
Computer software can encounter differences above and beyond straightforward translation of words and phrases, because computer programs can generate content dynamically. These differences may need to be taken into account by the internationalization process in preparation for translation. Many of these differences are so regular that a conversion between languages can be easily automated. The Common Locale Data Repository by Unicode provides a collection of such differences. Its data is used by major OSs, including Microsoft Windows, OS X and Debian, and by major internet companies or projects such as Google and the Wikimedia Foundation. Examples of such differences include:
- Different "scripts" in different writing systems use different characters - a different set of letters, syllograms, logograms, or symbols. Modern system use the Unicode standard to represent many different languages with a single character encoding.
- Writing direction is left to right in most European languages (e.g. German), right-to-left in Hebrew and Arabic, and optionally vertical in some Asian languages.[1]
- Complex text layout, for languages where characters change shape depending on context.
- Capitalization exists in some scripts and not in others.
- Different languages and writing systems have different text sorting rules
- Different languages have different numeral systems, which might need to be supported if Western Arabic numerals are not used
- Different languages have different pluralization rules, which can complicate programs that dynamically display numerical content.[9] Other grammar rules might also vary, e.g. genitive.
- Different languages use different punctuation (e.g. quoting text using double-quotes (" "), as in English, or guillemets (« »), as in French).
- Keyboard shortcuts can only make use of buttons actually on the keyboard layout which is being localized for. If a shortcut corresponds to a word in a particular language (e.g. Ctrl-s stands for "save" in English), it may need to be changed.[10]
National conventions
Different countries have different economic conventions, including variations in:
- Paper sizes
- Broadcast television systems and popular storage media
- Telephone number format
- Postal address format, postal codes, and choice of delivery services
- Currency (symbols, positions of currency markers, and reasonable amounts due to different inflation history) - ISO 4217 codes are often used for internationalization
- System of measurement
- Battery sizes
- Voltage and current standards
In particular, the United States and Europe differ in most of these cases. Other areas often follow one of these.
Specific third-party services, such as online maps, weather reports, or payment service providers, might not be available worldwide from the same carriers, or at all.
Time zones vary across the world, and this must be taken into account if a product originally only interacted with people in a single time zone. For internationalization, UTC is often used internally and then converted into a local time zone for display purposes.
Different countries have different legal requirements, meaning for example:
- Regulatory compliance may require customization for a particular jurisdiction, or a change to the product as a whole, such as:
- Privacy law compliance
- Additional disclaimers on a web site or packaging
- Different consumer labelling requirements
- Compliance with export restrictions and regulations on encryption
- Compliance with an Internet censorship regime or subpoena procedures
- Requirements for accessibility
- Collecting different taxes, such as sales tax, value added tax, or customs duties
- Sensitivity to different political issues, like geographical naming disputes and disputed borders shown on maps (e.g. failing to show Kashmir as Indian is a crime in India)
- Government assigned numbers have different formats (such as passports, the Social Security number and other national identification numbers)
Localization also may take into account differences in culture, such as:
- Local holidays
- Personal name and title conventions
- Aesthetics
- Comprehensibility and cultural appropriateness of images and color symbolism
- Ethnicity, clothing, and socioeconomic status of people and architecture of locations pictured
- Local customs and conventions, such as social taboos, popular local religions, or superstitions such as blood types in Japanese culture vs. astrological sign in other cultures
Business process for internationalizing software
In order to internationalize a product, it is important to look at a variety of markets that your product will foreseeably enter.[1]
Details such as field length for street addresses, unique format for the address, ability to make the postal code field optional to address countries that do not have postal codes or the state field for countries that do not have states, plus the introduction of new registration flows that adhere to local laws are just some of the examples that make internationalization a complex project.[11][12]
A broader approach takes into account cultural factors regarding for example the adaptation of the business process logic or the inclusion of individual cultural (behavioral) aspects.[13][1]
Already in the 1990s, companies such as Bull used machine translation (Systran) in large scale, for all their translation activity: human translators handled pre-editing (making the input machine-readable) and post-editing.[1]
Engineering
Both in re-engineering an existing software or designing a new internationalized software, the first step of internationalization is to split each potentially locale-dependent part (whether code, text or data) into a separate module.[1] Each module can then either rely on a standard library/dependency or be independently replaced as needed for each locale.
The current prevailing practice is for applications to place text in resource strings which are loaded during program execution as needed.[1] These strings, stored in resource files, are relatively easy to translate. Programs are often built to reference resource libraries depending on the selected locale data.
The storage for translatable and translated strings is sometimes called a message catalog[1] as the strings are called messages. The catalog generally comprises a set of files in a specific localization format and a standard library to handle said format. One software library and format that aids this is gettext.
Thus to get an application to support multiple languages one would design the application to select the relevant language resource file at runtime. The code required to manage data entry verification and many other locale-sensitive data types also must support differing locale requirements. Modern development systems and operating systems include sophisticated libraries for international support of these types, see also #Standard locale data above.
Many localization issues (e.g. writing direction, text sorting) require more profound changes in the software than text translation. For example, OpenOffice.org achieves this with compilation switches.
Process
A globalization method includes, after planning, three implementation steps: internationalization, localization and quality assurance.[1]
To some degree (e.g. for Quality assurance), the development team needs to include someone who handles the basic/central stages of the process which then enable all the others.[1] Such persons should typically understand foreign languages and cultures and have some technical background. Specialized technical writers are required to construct a culturally appropriate syntax for potentially complicated concepts, coupled with engineering resources to deploy and test the localization elements.
Once properly internationalized, software can rely on more decentralized models for localization: free and open source software usually rely on self-localization by end-users and volunteers, sometimes organized in teams.[14] The KDE3 project, for example, has been translated into over 100 languages;[15] MediaWiki in 270 languages, of which 100 mostly complete as of 2016.[16]
While translating existing text to other languages may seem easy, it is more difficult to maintain the parallel versions of texts throughout the life of the product.[17] For instance, if a message displayed to the user is modified, all of the translated versions must be changed.
Commercial considerations
In a commercial setting, the benefit from localization is access to more markets.[1] In the early 1980s, Lotus 1-2-3 took 2 years to separate program code and text and lost the market lead in Europe over Microsoft Multiplan.[1]
However, there are considerable costs involved, which go far beyond just engineering. Further, business operations must adapt to manage the production, storage and distribution of multiple discrete localized products, which are often being sold in completely different currencies, regulatory environments and tax regimes.
Finally, sales, marketing and technical support must also facilitate their own operations in the new languages, in order to support customers for the localized products. Particularly for relatively small language populations, it may thus never be economically viable to offer a localized product. Even where large language populations could justify localization for a given product, and a product's internal structure already permits localization, a given software developer/publisher may lack the size and sophistication to manage the ancillary functions associated with operating in multiple locales. For example, Microsoft Windows 7 had 96 language packs available.[18]
See also
- Bidirectional script support
- Computer accessibility
- Computer russification, localization into Russian language
- Game localization
- Globalization
- Globalization Management System
- Glocalization
- Input method editor
- International Components for Unicode
- Language code
- Language industry
- Language localization
- Pseudolocalization, a software testing method for testing a software product's readiness for localization.
- Separation of concerns
- Translation
- Website localization
References
- 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 Patrick A.V. Hall, Martyn A. Ould, eds. (1996). Software Without Frontiers: A multi-platform, multi-cultural, multi-nation approach. With contributions and leadership by Ray Hudson, Costas Spyropoulos, Timo Honkela et al. Wiley. ISBN 9780471969747.
- ↑ Bert Esselink (2003). The Evolution of Localization (PDF). Guide to Localization. Multilingual Computing and Technology.
In a nutshell, localization revolves around combining language and technology to produce a product that can cross cultural and language barriers. No more, no less.
- ↑ "Glossary of W3C Jargon". World Wide Web Consortium. Retrieved 2008-10-13.
- ↑ "Origin of the Abbreviation I18n".
- ↑ "Localization vs. Internationalization". World Wide Web Consortium.
- ↑ IBM Globalization web site
- ↑ Microsoft "Globalization Step-by-Step" guide
- ↑ MSDN.microsoft.com
- ↑ GNU.org
- ↑ languagetranslationsservices.wordpress.com Archived April 3, 2015, at the Wayback Machine.
- ↑ Internationalizing a Product: What is Internationalization (i18n), Localization (L10n) and Globalization (g11n) Archived April 2, 2015, at the Wayback Machine.
- ↑ "International Address Formats". Microsoft Developer Network. Microsoft. Retrieved 10 December 2013.
- ↑ Pawlowski, J.M. (2008): Culture Profiles: Facilitating Global Learning and Knowledge Sharing. Proc. of ICCE 2008, Taiwan, Nov. 2008. Draft Version
- ↑ Reina, Laura Arjona; Robles, Gregorio; González-Barahona, Jesús M. (2013-06-25). Petrinja, Etiel; Succi, Giancarlo; Ioini, Nabil El; Sillitti, Alberto, eds. A Preliminary Analysis of Localization in Free Software: How Translations Are Performed. IFIP Advances in Information and Communication Technology. Springer Berlin Heidelberg. pp. 153–167. doi:10.1007/978-3-642-38928-3_11. ISBN 9783642389276.
- ↑ For the current list see KDE.org
- ↑ https://translatewiki.net/wiki/Translating:Group_statistics
- ↑ How to translate a game into 20 languages and avoid going to hell
- ↑ "List of Windows language packs". "List of Windows language packs".
- Notes
- .NET Internationalization: The Developer's Guide to Building Global Windows and Web Applications, Guy Smith-Ferrier, Addison-Wesley Professional, 7 August 2006. ISBN 0-321-34138-4
- A Practical Guide to Localization, Bert Esselink, John Benjamins Publishing, [2000]. ISBN 1-58811-006-0
- Lydia Ash: The Web Testing Companion: The Insider's Guide to Efficient and Effective Tests, Wiley, May 2, 2003. ISBN 0-471-43021-8
- Business Without Borders: A Strategic Guide to Global Marketing, Donald A. DePalma, Globa Vista Press [2004]. ISBN 978-0-9765169-0-3
External links
Look up internationalization or localization in Wiktionary, the free dictionary. |
Wikibooks has a book on the topic of: FOSS Localization |
- Internationalization with Qt - Internationalization within a popular C++ GUI framework
- Globalization and Localization Association (GALA)