Computer-assisted translation

Computer-assisted translation, computer-aided translation or CAT is a form of language translation in which a human translator uses computer software to support and facilitate the translation process.

Computer-assisted translation is sometimes called machine-assisted, or machine-aided, translation (not to be confused with machine translation).

Overview

The automatic machine translation systems available today are not able to produce high-quality translations unaided: their output must be edited by a human to correct errors and improve the quality of translation. Computer-assisted translation (CAT) incorporates that manual editing stage into the software, making translation an interactive process between human and computer.[1]

Some advanced computer-assisted translation solutions include controlled machine translation (MT). Higher-priced MT modules generally offer the translator a more complex set of tools, which may include terminology management features and various other linguistic tools and utilities. Carefully customized user dictionaries based on correct terminology significantly improve the accuracy of MT and can thereby increase the efficiency of the entire translation process.

Range of tools

Computer-assisted translation is a broad and imprecise term covering a range of tools, from the fairly simple to the complicated. The main categories are described in the sections that follow.

Concepts

Translation memory software

Translation memory programs store previously translated source texts and their equivalent target texts in a database and retrieve related segments during the translation of new texts.

Such programs split the source text into manageable units known as "segments". A source-text sentence or sentence-like unit (a heading, a title or an element in a list) may be considered a segment, or texts may be segmented into larger units such as paragraphs or smaller ones such as clauses. As the translator works through a document, the software displays each source segment in turn and, if it finds a matching source segment in its database, offers the previous translation for re-use. If it does not, the program allows the translator to enter a translation for the new segment. Once the translation of a segment is completed, the program stores the new translation and moves on to the next segment.

In the dominant paradigm, the translation memory is, in principle, a simple database of fields containing the source-language segment, the translation of the segment, and other information such as segment creation date, last access and translator name. Another translation memory approach does not involve the creation of a database, relying on aligned reference documents instead.
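The segment-by-segment workflow described above can be illustrated with a minimal Python sketch. It is not how any particular product works: the `TranslationMemory` class, the `split_segments` helper, the 0.75 similarity threshold and the use of the standard-library `difflib.SequenceMatcher` as a similarity measure are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def split_segments(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


class TranslationMemory:
    """Hypothetical in-memory TM: maps source segments to target segments."""

    def __init__(self):
        self.entries = {}

    def store(self, source, target):
        """Save a finished translation so it can be re-used later."""
        self.entries[source] = target

    def lookup(self, source, threshold=0.75):
        """Return (score, stored_source, stored_target) for the best match
        at or above the threshold, or None if nothing is similar enough."""
        best = None
        for stored_source, stored_target in self.entries.items():
            score = SequenceMatcher(
                None, source.lower(), stored_source.lower()
            ).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, stored_source, stored_target)
        return best


tm = TranslationMemory()
tm.store("Press the red button.", "Appuyez sur le bouton rouge.")

# Work through a new document segment by segment.
for segment in split_segments("Press the green button. Then wait!"):
    match = tm.lookup(segment)
    if match is None:
        print("No match, translate manually:", segment)
    else:
        print("Suggested re-use for", repr(segment), "->", match[2])
```

A score of 1.0 corresponds to an exact match; lower scores correspond to partial matches that the translator would still need to edit.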

Some translation memory programs function as standalone environments, while others function as an add-on or macro to commercially available word-processing or other business software programs. Add-on programs allow source documents from other formats, such as desktop publishing files, spreadsheets, or HTML code, to be handled using the TM program.

Language search-engine software

New to the translation industry, language search-engine software is typically an Internet-based system that works similarly to Internet search engines. Rather than searching the Internet, however, a language search engine searches a large repository of translation memories to find previously translated sentence fragments, phrases, whole sentences, and even complete paragraphs that match source document segments.

Language search engines are designed to leverage modern search technology to conduct searches based on the source words in context, so that the search results match the meaning of the source segments. As with traditional TM tools, the value of a language search engine depends heavily on the translation memory repository it searches.
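As a rough illustration of searching a repository for previously translated fragments, the following Python sketch builds an inverted index over the source side of a small repository and returns entries that contain the query words as a contiguous phrase. The function names and the sample repository are hypothetical, and real language search engines use far more elaborate ranking and context handling.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def build_index(repository):
    """Map each source word to the set of entry numbers that contain it."""
    index = defaultdict(set)
    for i, (source, _target) in enumerate(repository):
        for word in tokenize(source):
            index[word].add(i)
    return index


def contains_phrase(tokens, phrase):
    """True if the token list contains the phrase as a contiguous run."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def search(query, repository, index):
    """Return (source, target) pairs whose source contains the query phrase."""
    phrase = tokenize(query)
    if not phrase:
        return []
    # Entries must contain every query word; then check the word order.
    candidates = set.intersection(*(index.get(w, set()) for w in phrase))
    return [
        repository[i]
        for i in sorted(candidates)
        if contains_phrase(tokenize(repository[i][0]), phrase)
    ]


repository = [
    ("Press the red button to stop the machine.",
     "Appuyez sur le bouton rouge pour arrêter la machine."),
    ("The red light indicates an error.",
     "Le voyant rouge indique une erreur."),
    ("Stop the machine before cleaning.",
     "Arrêtez la machine avant le nettoyage."),
]
index = build_index(repository)

for source, target in search("stop the machine", repository, index):
    print(source, "=>", target)
```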

Terminology management software

Terminology management software gives the translator a means of automatically searching a given terminology database for terms appearing in a document, either by displaying terms automatically in the translation memory software interface window or through hot keys that bring up the entry in the terminology database. Some programs offer further hotkey combinations that let the translator add new terminology pairs to the terminology database on the fly during translation. Some of the more advanced systems enable translators to check, either interactively or in batch mode, whether the correct source/target term combination has been used within and across the translation memory segments in a given project. Independent terminology management systems also exist; these can provide workflow functionality and visual taxonomy, work as a type of term checker (similar to a spell checker, flagging terms that have not been used correctly), and support other types of multilingual term facet classifications such as pictures, videos, or sound.[2]
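The batch-mode check mentioned above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the term base is a plain dictionary, `check_terminology` is a hypothetical helper, and matching is done on whole words without any handling of inflected forms, which a real terminology checker would need.

```python
import re


def has_term(text, term):
    """Case-insensitive whole-word search for a term in a text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def check_terminology(segments, termbase):
    """Flag segments whose source contains a term from the term base
    but whose target lacks the approved translation.

    segments: list of (source, target) pairs
    termbase: dict mapping source term -> approved target term
    Returns a list of (segment_number, source_term, expected_target_term).
    """
    issues = []
    for i, (source, target) in enumerate(segments):
        for source_term, target_term in termbase.items():
            if has_term(source, source_term) and not has_term(target, target_term):
                issues.append((i, source_term, target_term))
    return issues


termbase = {"hard drive": "disque dur"}
segments = [
    ("Replace the hard drive.", "Remplacez le disque dur."),
    ("The hard drive is full.", "Le lecteur est plein."),
]

for number, source_term, expected in check_terminology(segments, termbase):
    print(f"Segment {number}: expected '{expected}' for '{source_term}'")
```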

Alignment software

Alignment programs take completed translations, divide both source and target texts into segments, and attempt to determine which segments belong together in order to build a translation memory or other reference resource with the content. Many alignment programs allow translators to manually realign mismatched segments. The resulting bitext alignment can then be imported into a translation memory program for future translations or used as a reference document.
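A deliberately naive Python sketch of this idea is shown below. It pairs segments by position and flags pairs whose lengths differ suspiciously, so that a translator can realign them manually. The `align` function and the `max_ratio` threshold are assumptions for illustration; actual alignment programs use more sophisticated methods and can merge or split segments.

```python
from itertools import zip_longest


def align(source_segments, target_segments, max_ratio=2.0):
    """Pair segments by position and flag pairs that look mismatched.

    Returns a list of (source, target, needs_review) tuples. A pair needs
    review if one side is missing or if the longer segment is more than
    max_ratio times the length of the shorter one.
    """
    pairs = []
    for src, tgt in zip_longest(source_segments, target_segments, fillvalue=""):
        if not src or not tgt:
            needs_review = True
        else:
            ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
            needs_review = ratio > max_ratio
        pairs.append((src, tgt, needs_review))
    return pairs


source = [
    "Hello.",
    "This is a much longer sentence about translation tools.",
    "Goodbye.",
]
target = ["Bonjour.", "Oui."]

for src, tgt, needs_review in align(source, target):
    status = "REVIEW" if needs_review else "ok"
    print(f"[{status}] {src!r} <-> {tgt!r}")
```

Pairs that pass the check could then be stored as translation memory entries, while flagged pairs are left for manual realignment.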

Interactive machine translation

Interactive machine translation is a paradigm in which the automatic system attempts to predict the translation the human translator is going to produce by suggesting translation hypotheses. These hypotheses may be either a complete sentence or the part of the sentence that is yet to be translated.
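The prediction step can be sketched in Python as a prefix-based completion. The sketch assumes a fixed, best-first list of candidate translations and a hypothetical `suggest_completion` helper; an actual interactive system would typically generate new hypotheses conditioned on what the translator has typed rather than filter a fixed list.

```python
def suggest_completion(prefix, hypotheses):
    """Return the untranslated remainder of the first hypothesis that is
    consistent with what the translator has typed so far, or None.

    hypotheses: candidate translations, ordered from best to worst.
    """
    for hypothesis in hypotheses:
        if hypothesis.startswith(prefix):
            return hypothesis[len(prefix):]
    return None


hypotheses = [
    "Le chat dort sur le canapé.",
    "Le chat est endormi sur le canapé.",
]

# Nothing typed yet: the whole best hypothesis is suggested.
print(suggest_completion("", hypotheses))

# The translator departs from the best hypothesis; the suggestion adapts.
print(suggest_completion("Le chat est", hypotheses))
```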

Crowd translation

Crowd-assisted translation refers to employing large numbers of bilingual human translators who collaborate via social media. When Facebook needed to translate a large body of existing English language text on its graphical user interfaces, the company made use of the voluntary help of its already-existing bilingual user base, organized by Yishan Wong.

Some notable CAT tools

The list below covers only some of the available software. It is not exhaustive and is intended as a set of examples rather than a complete reference; several relevant tools are missing.

Name | Supported file formats | OS | Language / Widget tool | License
Across Language Server[3] | Microsoft Word (DOC, DOT, DOCX, and DOCM files), Microsoft Excel (XLS files, and XLSX and XLSM files), Microsoft PowerPoint (PPT and PPS files as well as PPTX, PPSX, and PPTM files), Rich Text Format (RTF files), text files (TXT files), TeX (TEX files), HTML, XHTML, XML, SGML, Adobe FrameMaker (in MIF format), Adobe InDesign (IDML files, in INX exchange format), Adobe InCopy (INCX format), BroadVision QuickSilver (ILDOC files), QuarkXPress (export to XTG format, re-import upon translation), Executable files (EXE files), Dynamic Link Library (DLL files), Resource Script files (RC, RC2, and DLL files), .NET RESX files (RESX files), .NET RESOURCE files (RESOURCE files), .NET Satellite Assemblies (RESOURCE.DLL files), Windows Installer (MSI files), configuration files (INI files), OLE Control extensions (OCX files), Screen Saver (SCR files), Control Panel Extensions (CPL files), National Language Support (NLS files), Portable Object (PO files), MC files (Message Compiler), PROPERTIES files (Java Property), Android (string XML files with HTML markup), iOS (text-based strings files (UTF-8 or UTF-16)), BlackBerry (ANSI-based text files in rrc format), XML Localization Interchange (XLIFF or XLF), Drawing Interchange Format (DXF files) | Windows |  | Proprietary
Déjà Vu[4] | Microsoft Office (Word, Excel, PowerPoint, also embedded objects, and Access), Help Contents (CNT), FrameMaker (MIF), PageMaker, QuarkXPress, QuickSilver/Interleaf ASCII, Java Properties (.properties), HTML, HTML Help, XML, RC, C/Java/C++, IBM TM/2, Trados Workbench, Trados BIF (old TagEditor), Trados TagEditor, JavaScript, VBScript, ODBC, TMX, EBU, InDesign (TXT, ITD, INX, IDML), GNU GetText (PO/POT), OpenOffice, OpenDocument, SDLX (ITD), ResX, XLIFF (XLF, XLIF, XLIFF, MQXLIFF, unsegmented and segmented SDLXLIFF), Visio (VDX), PDF, Transit NXT PPF, WordFast Pro TXML | Windows |  | Proprietary
GlobalSight | Text ANSI / ASCII / Unicode for Windows, Text for Apple Macintosh, HTML, XML (ASP.NET, ASP, JSP, XSL), SGML, MS Word for Windows, MS Excel, MS PowerPoint, RTF, RC, Adobe FrameMaker, Adobe InDesign | Java platform | Java | Apache License 2.0
gtranslator | PO | POSIX | C / GTK+ | GNU General Public License
Lokalize | Gettext PO, Qt ts, XLIFF, TMX | Cross-platform | C++ / Qt | GNU General Public License
MateCat | doc, dot, docx, dotx, docm, dotm, pdf, xls, xlt, xlsm, xlsx, xltx, pot, pps, ppt, potm, potx, ppsm, ppsx, pptm, pptx, odp, ods, odt, sxw, sxc, sxl, txt, csv, xml, rtf, htm, html, xhtml, xml, xliff, sdlxliff, tmx, ttx, itd, xlf, mif, inx, idml, icml, xtg, tag, xml, dita, properties, rc, resx, sgml, sgm | Web-based (supports Safari and Chrome) | PHP | GNU Lesser General Public License
Redokun | Adobe InDesign (IDML files) | Cross-platform |  | Proprietary
memoQ[5] | .MIF, InDesign formats (.INDD, .INX, .IDML), .XML, .DITA, .XML, .MM, .PO, .HTML, .HMT, .SHT, .properties, .DOC, .RTF, .BAK, .DOT, .DOCX, .XLS, .XML, .XLSX, .XLSM, .XLS, .XLT, .PPT, .PPS, .POT, .PPF, .PPTX, .PPSX, .POTX, .SLDX, .VDX, .HHC, .HHK, .ODT, .ODF, .TXT, .INF, .INI, .REG, .PDF, .SVG, .SDLPPX, .TTX, .SDLXLIFF, .TMX, .TXML, .RESX, .XLF, .XLIF, .XLIFF, XLIFF:doc | Windows |  | Proprietary
Memsource | .doc, .docx, .dot, .dotx, .docm, .dotm, .rtf, .ppt, .pptx, .pot, .potx, .pptm, .potm, .xls, .xlsx, .xlt, .xltx, .xlsm, .xltm, .htm, .html, .idml (.indd), .icml, .mif (version 8 and above only), .svg, .ttx (pre-segmented), .sdlxliff, .xml, .xhtm, .xhtml, Android .xml, .xliff, .xliff for WordPress, tmx, .dita, .ditamap, .pdf, .catkeys, .csv, (Magento).csv, .dbk, .desktop, (Mozilla).DTD, .epub, (Joomla).ini, .json, .lang, .Plist, .po, .properties, (Java).properties, .resx, .srt, .strings, .sub, .ts, .txt, .wiki, .yaml, .zip | Web, Windows, Mac OS X, Linux |  | Proprietary
MetaTexis | Microsoft Word, Excel and PowerPoint, all kinds of text formats, XML, HTML, XLIFF, RTF, TRADOS Studio (SDLXLIFF), TagEditor (TTX), POT/PO, Manual Maker, several further formats... | Microsoft Office Word add-in |  | Proprietary
OmegaT | Plain text, HTML, XHTML, StarOffice, OpenOffice.org, OpenDocument (ODF), MS Office Open XML, Help & Manual, HTML Help Compiler (HCC), LaTeX, DokuWiki, QuarkXPress CopyFlow Gold, DocBook, Android Resource, Java Properties, Typo3 LocManager, Mozilla DTD, Windows RC, WiX, ResX, INI files, XLIFF, PO, SubRip Subtitles, SVG Images | Java platform | Java | GNU General Public License
Open Language Tools | XLIFF, HTML/XHTML, XML, DocBook SGML, ASCII, StarOffice/OpenOffice/ODF, PO, .properties, .java (ResourceBundle), .msg/.tmsg (catgets) | Java platform | Java | CDDL
Poedit | PO | Cross-platform | C++ / GTK+ | MIT License
Pootle | PO, XLIFF, OpenOffice GSI files (.sdf), TMX, TBX, Java Properties, DTD, CSV, HTML, XHTML, Plain Text | Cross-platform | Python / Web | GNU General Public License
SDL Trados | Features four translation environments: dedicated TagEditor, MSWord Interface, SDLX, the integrated interface SDL Trados Studio 2014. Filters for translating with Trados Studio or TagEditor available: Word, Excel, PowerPoint, OpenOffice, InDesign, QuarkXPress, PageMaker, Interleaf, FrameMaker, HTML, SGML, XML, SVG, XLIFF, Legacy Trados files TTX, ITD, Word Bilingual, Wordfast, MemoQ .... Includes SDL MultiTerm for terminology management and Project Management Dashboard for automating tasks and tracking. | Windows |  | Proprietary
SmartCAT | Text documents: DOCX, DOC, TXT, RTF. PowerPoint presentations: PPTX, PPSX, PPT, PPS. Excel spreadsheets: XLSX, XLS. Scanned documents and images: PDF, JPG, TIFF, BMP, PNG, GIF, DJVU and more. HTML pages: HTML, HTM. OpenOffice files: ODP, ODS. Resource files: RESX. Bilingual files: TTX. Industry-standard formats: SDLXLIFF, XLF, XLIFF, TMX. InDesign CS4 Markup: IDML. | Cross-platform | C# / Web | Proprietary
TM-database | XLIFF, TMX, TBX, TXT, XLSX, XLS, XML, RESX, HTML, HTM, INX, UNI, JS, SMALI, PO, TS, IDML. | Windows | C++ | Proprietary
Virtaal | XLIFF, PO and MO, TMX, TBX, Wordfast TM, Qt ts; many others via converters in the Translate Toolkit | Cross-platform | Python / GTK+ | GNU General Public License
Wordfast Classic | Microsoft Word, Rich Text... | Microsoft Office Word add-in (Windows / Mac) |  | Proprietary
Wordfast Pro | MS Word, Excel, PowerPoint (all versions), PDF, SGML, HTML, XML, InDesign, FrameMaker, tagged documents, XLIFF, etc. | Java platform | Java | Proprietary
Xliffie | XLIFF | Mac | Objective-C | MIT License
XTM[6] | Microsoft Office (doc, docx, xls, xlsx, xlsx, ppt, pptx), Microsoft Visio (vdx), Open Office (sxw, odt, ods, odp), FrameMaker (MIF), Adobe InDesign (idml), Adobe Illustrator (fxg, svg), Adobe Photoshop, Wordfast, PDF, Sdf, Svg, Yml, Yalm, iPhone apps (strings), Php, Xml, Xhtml, Xht, Xlf, Xliff, Sdlxliff, Trados (ttx), Java Properties (.properties), HTML, HTM, Rtf, Po/pot, DIta, Asp, Aspx, Txt, Tpl, Resx, Ini, Json | Cross-platform |  | Proprietary

According to a 2006 survey undertaken by Imperial College of 874 translation professionals from 54 countries, primary tool usage was reported as follows: Trados (35%), Wordfast (17%), Déjà Vu (16%), SDL Trados 2006 (15%), SDLX (4%), STAR Transit (3%), OmegaT (3%), others (7%).[7]

This article is issued from Wikipedia - version of the Wednesday, April 06, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.