Computer Assisted Translations and OmegaT: Quick Start Guide

How do you translate a Word document without wrestling with the formatting? How do you avoid translating the same thing over and over again? How do you keep your wording consistent? How do you avoid buying expensive software? How do you work quickly and effectively?

If you are familiar with Trados, MemoQ or CrowdIn, skip to the installation instructions. If these names are new to you, welcome to the wonderful world of computer-aided translation.

About computer aided translation

Google Translate is machine translation: the computer translates for you. CAT works on a different principle, where the computer only helps by automating routine processes. CAT tools divide the source text into segments (lines, sentences or paragraphs). A human translates the segments one by one, and each translation is stored in a special database called a translation memory (TM). If the translator later meets a similar segment, the program shows a hint or a possible translation, and it can handle identical segments automatically.

CATs are especially good for manuals, legal documents and program interfaces, anywhere similar wording occurs again and again. In fiction the benefit is less obvious, but a CAT tool can still help you keep names and places consistent. The more texts on related subjects you translate, the more translations accumulate in the database and the more suggestions you will see. Over the years you can accumulate such a vast database that a new document will be half-ready before you even start.

When the translation is finished, the program creates a document identical to the original, retaining its structure and formatting but replacing the original text with your translation. CAT tools do not alter the original document, so it is impossible to mess it up irreversibly. The output is a fully translated file.
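To make the idea of segments more concrete, here is a minimal Python sketch of sentence-level segmentation. It is only an illustration: the regular expression and the function name segment are my own assumptions, and real CAT tools use configurable rule sets that also handle abbreviations and other exceptions.

```python
import re

# Assumed rule: break after ., ! or ? when followed by whitespace
# and an uppercase ASCII letter. Real tools use far richer rules.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

def segment(text):
    """Split a paragraph into sentence-like segments."""
    return [part.strip() for part in SENTENCE_BREAK.split(text) if part.strip()]

segments = segment("Open the file. Click Save! Is it done? Yes.")
for number, source in enumerate(segments, start=1):
    print(number, source)
```

Each printed line corresponds to one segment that a translator would handle separately.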

What kind of CATs are there?

All kinds. Trados and MemoQ are expensive corporate suites installed on your computer. CrowdIn, Tolmach and others work directly in the browser. As a rule, everything costs money or limits the volume of your projects. But it’s not all bad: for almost eight years I have been using a system called OmegaT. It is free, open-source software that runs on Windows, Mac and Linux and is constantly improved by the community. This is the one I’m going to tell you about.

What can OmegaT do?

OmegaT
www.omegat.org
Free software (GPLv3), open source
Windows, macOS, Linux

It can do everything described in the first section, plus various other little things that help a translator at work.

File Formats

  • Microsoft Word, Excel, PowerPoint (only the newer .docx, .xlsx and .pptx formats; older files must be converted first)
  • OpenOffice .odt, .ods and others
  • Text files .txt .rtf
  • Text files with a key=value structure (.ini and the like)
  • HTML
  • Files with an XML structure (you can configure these yourself)
  • And many others.

Languages

Any: almost everything that exists in Unicode. For rare languages you may need to adjust the segmentation rules, but that can be solved.


I’m not going to retell the OmegaT manual. It is thorough and informative, and well worth reading. Below I cover only the basic operations that will help you get started.


Installation

Download the distribution from omegat.org. I will use the English version 4.1.1 from the Latest branch for Windows. OmegaT requires Java to run. If you are not sure whether you have it, download the version labelled JRE. Don’t be put off by the Beta label; the program is more than stable enough.

Spell check

After installation the program is ready to work, but by default, it doesn’t have a spell checker.

  1. Run OmegaT
  2. Go to Options -> Preferences -> Spellchecker
  3. Tick Automatically check the spelling of text
  4. Press Install new dictionary
  5. Select the language (for example, ru_RU for Russian) and press Install
  6. Press Close. The Russian dictionary now appears in the list.
  7. Exit the settings.

How to create a project

OmegaT works with projects rather than individual files. A project is a set of folders with a specific structure. To translate a file, you need to create a project and add the file to it.

  1. Run OmegaT
  2. Go to Project -> New, then choose a location and a project name. I recommend giving projects meaningful names that indicate the language pair, for example Test-Project_EN-RU.
  3. In the resulting window, specify the language pair. Source Files Language is the language you are translating from; Target Language Files is the language you are translating into. Enter a two-letter or four-letter code. For example, RU means Russian in general, while ru-RU and ru-BY specify Russian as used in Russia and in Belarus. For the spell checker to work, the code must match the dictionary code chosen in the spell checker settings (if the installed dictionary is ru-RU and the project language is RU, spell checking will not work).
  4. Below, tick Enable Sentence-level Segmenting (split the text into sentences rather than paragraphs) and Auto-propagation of Translations (insert translations of identical segments automatically). Remove Tags is better left off; I’ll explain why later.
  5. Press OK.

What are these folders for?

Inside the project folder there are several subdirectories:

  • dictionary is the folder for dictionaries in StarDict format; I find this feature of little use;
  • glossary is the project’s term base, more on that later;
  • omegat holds the project’s translation memory and backups;
  • source is the folder for source files;
  • target is the folder where translations will appear;
  • tm is the folder for additional translation memories, more about that later.

There is also the file omegat.project, which holds the configuration of the current project.
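If you ever copy or move a project by hand, a quick script can confirm that nothing from the layout above is missing. This is only a sketch of my own: the folder and file names come from the list above, while the helper name missing_parts and the demo folder are hypothetical.

```python
from pathlib import Path
import tempfile

# Names taken from the folder list in this article.
EXPECTED_DIRS = ["dictionary", "glossary", "omegat", "source", "target", "tm"]
PROJECT_FILE = "omegat.project"

def missing_parts(project_dir):
    """Return the expected subfolders and files absent from project_dir."""
    root = Path(project_dir)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    if not (root / PROJECT_FILE).is_file():
        missing.append(PROJECT_FILE)
    return missing

# Demo on a throwaway folder that only has source and target.
demo_project = Path(tempfile.mkdtemp())
for name in ("source", "target"):
    (demo_project / name).mkdir()
missing = missing_parts(demo_project)
print(missing)
```

For a real project, pass the path of the folder you chose in Project -> New; an empty list means every expected part is present.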

How to add files

After creating the project, OmegaT opens a window for adding files. Click Copy Files to Source Folder and select the files you want to translate. They will be copied to the \source\ folder of the newly created project. You can also add files manually: just copy them into \source\ with Windows Explorer. As an example, I created two files, one Excel and one Word, which I will use to show how OmegaT works.

Interface

OmegaT is running and the files are added. Let’s see how they look in the program. The source document in Word contains a heading, paragraphs and formatting (bold, links, underlines). In OmegaT it looks different: the entire text is divided into sentences, the formatting is not visible, some gray markers have appeared, and the heading seems to be duplicated. What’s going on?

  1. The text is divided into segments. Each sentence has become a separate segment. Segmentation rules can be configured manually if necessary.
  2. Formatting is not visible in OmegaT; it is replaced with gray tags. They are abbreviated versions of the tags in the Word file, which would otherwise look like <t>. To preserve the original formatting, leave those tags as they are and write the translation between them, following the same logic as in the original. The Remove Tags option in the project settings removes the tags together with the formatting, so it is not recommended if the original formatting matters.
  3. The heading is not actually duplicated. The top line (in green) always displays the text in the original language and cannot be changed. Below it is an editable field, which by default is pre-filled with a copy of the same text. Delete that copy and type your translation.

In addition, the right side of the program has two panes: Fuzzy Matches and Glossary. Fuzzy Matches shows the results of searching the project’s translation memory, that is, hints based on your previous translations. Glossary shows the results of searching a project-specific glossary that you compile yourself. Unlike the translation memory, it offers not finished text but only hints about terminology. It is a powerful tool for keeping terminology consistent.

How to translate

  1. Double-click the segment you want to translate. An editable line appears under the original text, pre-filled with a copy of the original, with the cursor at its start.
  2. Enter your translation.
  3. Press Enter. The translation is saved and the cursor moves to the next segment.

Repeat until you finish the document. At any moment you can return to an earlier segment by double-clicking it. The lower right corner has a handy progress indicator. Click it to switch between its two display modes.

Current file: % of segments translated (segments remaining) / Project: % of segments translated (segments remaining), total number of segments.

In this mode, the indicator in my example shows that 5.8% of the unique segments in the current file are translated, with 1382 still to go. Across the whole project, 63% of the segments are translated, 1756 remain, and the project contains 5979 segments in total.

File: translated unique segments / total unique segments (project: translated unique segments, total unique segments, total segments in the project)

In the second mode, the indicator in my example shows that 146 of the file’s 1592 unique segments are translated, and 2992 of the project’s 4748 unique segments. The total number of segments, repeats included, is 5979. The figures 14/14 at the end are not part of the project counter. They show the length of the segment you are working on: the original is 14 characters long, and so is the translation. This is useful when you must strictly respect string length, for example when translating an interface.
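The two display modes show the same data in different ways, and the project figures quoted above can be checked with a little arithmetic. The sketch below is my own illustration (the function name progress is hypothetical); it uses only the project-level numbers from the example.

```python
def progress(translated_unique, total_unique):
    """Return (percent translated, segments remaining) for unique segments."""
    remaining = total_unique - translated_unique
    percent = 100 * translated_unique / total_unique
    return percent, remaining

# Project figures from the example: 2992 of 4748 unique segments translated,
# 5979 segments in total when repeats are included.
percent, remaining = progress(2992, 4748)
repeated = 5979 - 4748

print(f"{percent:.0f}% translated, {remaining} unique segments left")
print(f"{repeated} segments are repeats of something already counted")
```

So the 63% and 1756 in the first mode are derived from the 2992 and 4748 shown in the second mode, and the gap between 5979 and 4748 is the number of repeated segments you will never have to translate by hand.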

Fuzzy matches

This is the most important tool in any CAT program; it is the reason they exist. Let me explain with an example. In the sample document, the first sentence is very similar to the fourth. I went through the sentences one by one and translated the first. When I reached the fourth, the program showed a fuzzy match. Look closely at the Fuzzy Matches pane. At the top is the source-language text stored in the translation memory. Words highlighted in blue are present in the translation memory but missing from the current sentence; words in green are those next to the missing parts. Below that is the translation stored in the memory. Press Ctrl+R to copy it into the translation field. At the bottom are three percentages that indicate the degree of match between the current sentence and the TM entry. The OmegaT manual describes how they are calculated.
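To give a feel for what a fuzzy match is, here is a small Python sketch that scores a new sentence against a translation memory. Please treat it as an illustration only: it uses Python’s difflib, which is not how OmegaT calculates its three percentages, and the names similarity and best_match, the 50% threshold and the sample sentences are all my own assumptions.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Word-level similarity between two sentences, as a percentage."""
    ratio = SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()
    return round(ratio * 100)

def best_match(segment, memory, threshold=50):
    """Return (score, source, translation) for the closest TM entry, or None."""
    scored = [(similarity(segment, src), src, tgt) for src, tgt in memory.items()]
    scored = [item for item in scored if item[0] >= threshold]
    return max(scored, default=None)

# A one-entry translation memory: source sentence -> stored translation.
memory = {
    "Click the Save button to store the file.":
        "Нажмите кнопку «Сохранить», чтобы сохранить файл.",
}

match = best_match("Click the Open button to load the file.", memory)
print(match)
```

The new sentence shares six of its eight words with the stored one, so the stored translation is offered as a starting point that needs only a couple of words changed.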

Automatic translation of identical segments

Of course, if the fuzzy match mechanism finds a 100% match, it can insert it automatically. Take another file as an example, this time in Excel; translation jobs for software interfaces often look like this. Note that the original contained six lines reading See All. In OmegaT the duplicates are collapsed and only one such line is shown. It is enough to translate that one string, and the remaining identical segments are translated as well.
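The effect of auto-propagation can be sketched in a few lines of Python. This is a simplified model of my own, not OmegaT’s code: the function name propagate and the sample strings are hypothetical.

```python
def propagate(segments, translations):
    """Give every segment the translation stored for its source text, if any."""
    return [translations.get(source) for source in segments]

# Six identical interface strings plus one other string.
segments = ["See All"] * 6 + ["Settings"]

# The list a translator actually has to work through, duplicates collapsed.
unique_to_translate = list(dict.fromkeys(segments))

# "See All" is translated once by a human...
translations = {"See All": "Показать все"}

# ...and every identical segment receives the same translation.
result = propagate(segments, translations)
print(unique_to_translate)
print(result)
```

Segments without a stored translation come back as None here, which corresponds to segments that are still waiting for the translator.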

Glossary

The glossary works very simply. First you add words to it (original and translation). From then on, whenever one of those words occurs in the text, you will see a hint in the Glossary pane. So when a term turns up in a new sentence, you’ll know exactly how to translate it. For example, if the program interface must always say “Good” instead of “OK”, just add the word “OK” to the glossary with the translation “Good”. Adding a few hundred words to a project will greatly simplify your life. To add a word to the glossary, select it, right-click and choose Add Glossary Entry. You can also add entries in bulk by editing the file \glossary\glossary.txt, using the format “original, tab, translation” (for example, a table in Excel saved as a tab-delimited *.csv).
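If your terms already live in a script or another tool, the tab-separated glossary file can also be generated. The following Python sketch appends entries in the “original, tab, translation” format; the function name add_terms, the UTF-8 encoding and the second sample term are my assumptions, and the demo writes to a temporary folder rather than a real project.

```python
import csv
import tempfile
from pathlib import Path

def add_terms(glossary_path, entries):
    """Append (original, translation) pairs as tab-separated lines."""
    # Mode "a" adds to an existing glossary instead of overwriting it.
    with open(glossary_path, "a", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(entries)

# Demo: a temporary folder stands in for the project's glossary folder.
demo_dir = Path(tempfile.mkdtemp())
glossary_file = demo_dir / "glossary.txt"
add_terms(glossary_file, [("OK", "Good"), ("Cancel", "Abort")])

glossary_text = glossary_file.read_text(encoding="utf-8")
print(glossary_text)
```

In a real project you would point glossary_file at the glossary.txt inside the project’s glossary folder. Note that the csv module puts quotes around terms that themselves contain tabs or quote characters, so check such entries by eye.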

How to save and export

Project -> Save saves the project, that is, it writes all translations to the database file. To get the translated files, select Project -> Create translated documents. OmegaT will create a new file in the \target\ folder with the same name as the original, with all the text replaced by the translation. Any segments you forgot to translate will remain in the source language.

How to enable machine translation

In some situations, machine translation (such as Google Translate) can help you translate faster. OmegaT can be configured to display a machine translation of the current segment in its interface, which you can use as is or quickly edit. OmegaT can connect to services such as Google Translate, Microsoft Translator and Yandex.Translator. The first two are paid solutions, while Yandex.Translator provides its service free of charge within reasonable use. Here is how to set it up.

  1. Register a Yandex account, for example by creating a mailbox at mail.yandex.com.
  2. Go to the page in the “Translator” section.
  3. Click Создать новый ключ (create new key), enter a description (for yourself), press Создать (create).

Add the key in OmegaT:

  1. In OmegaT, go to Options -> Preferences -> Machine Translation
  2. Select Yandex Translate, tick it and press Configure
  3. Paste the API key into the box that appears and press OK
  4. In the window that appears you can set a password or skip this step. The password protects your API key. This is useful for paid translation services but not very important in our case.

Close the settings. Now, in the main window, click the Machine Translations tab at the bottom. To keep the machine translation pane always in view, click the small icon with two windows. From now on, whenever you move to a new segment, the program sends a request to Yandex.Translator, receives the response and shows it in the Machine Translations pane. Press Ctrl+M to insert the result into the translation.

How to create your translation memory

Let’s say you already have a high-quality bilingual file and you want to use it in your project as reference material. If it is an Excel file with the original text in one column and the corresponding translation in the adjacent cells, making a TM is very easy. OmegaT works with translation memories in the *.tmx format (Translation Memory eXchange). I use an online service at translatum.gr.

  1. Create a new Excel file (it must be *.xlsx)
  2. Put the original text in the first column and the translation in the second. Do not use formatting; it will not be preserved
  3. Follow the Converter link
  4. Select the file you created
  5. Specify the codes of the source and target languages. For example, for an English-Russian text they would be EN-US and RU-RU
  6. Click Submit
  7. This will open a page where you can download the archive with the translation memory.

To use a translation memory in a project, unzip the archive and place the *.tmx file in the project’s \tm\ subfolder (to show its entries as fuzzy matches) or in \tm\auto\ (to have 100% matches inserted automatically). There are also desktop applications for creating *.tmx files locally, such as XLS(X)tiTMX, TMBuilder and tmx-maker, but none of them worked for me without issues. The online converter has given me the most reliable results.
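For the curious, a *.tmx file is plain XML, so a small one can also be produced with a script. Below is a minimal Python sketch that builds a TMX 1.4 document from pairs of original and translated text. It is not the converter mentioned above and I make no promise that every CAT tool accepts its output: the function name build_tmx, the tool name in the header and the sample pairs are my own assumptions.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this attribute name out as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, src_lang="en-US", tgt_lang="ru-RU"):
    """Return a TMX 1.4 document as a string for (original, translation) pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch",   # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "spreadsheet",         # free-text label for the original format
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for original, translation in pairs:
        unit = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((src_lang, original), (tgt_lang, translation)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

pairs = [("See All", "Показать все"), ("OK", "Хорошо")]
tmx_text = build_tmx(pairs)
print(tmx_text)
```

To try the result, save the string as a UTF-8 file with the .tmx extension and place it in the project’s tm folder as described above. The language codes should match the ones used in your project.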

Conclusion

OmegaT is a powerful tool for processing and translating texts, and it is well suited to keen translators. It has great potential and can be used individually or in a team, for both full-time and

8 thoughts on “Computer Assisted Translations and OmegaT: Quick Start Guide”

  1. Hi, I just followed your instructions regarding the Yandex api, and indeed I have two entries on the machine translator box that says “Yandex translate”, but one of them says “API not specified” and does not display any translation (the other one does). Do you know what is happening there?

    1. Check if you somehow have two of the engines enabled in Preferences – Machine Translation. I usually have two enabled, Google Translate and Yandex Translate

  2. Hi,
    I’d like to “Translate ALL” in one click with my machine translation API. I don’t see any option for this, just Ctrl-M for each of a few thousand segments.

    1. Yup, OmegaT is not made for that. Some other CAT might allow that though, I heard that memoQ is capable of “pre-translate” everything with MT. Or maybe there’s a plugin for OmegaT that can do this.

  3. Great piece even though it’s a decade old already 😉
    I am looking for a solution where I have a source file which contains lines of raw English product data, spellcheck/proofread that, modify it and then immediately translate it and be able to modify and save that result in a separate file. Sort of LibreOffice spellchecker and OmegaT combined in one.

    1. For proofreading purpose, you can use the English as source, and English (region) as target. Then have another project set up for translation into another language, like Spanish. You can even set the source folder in the English-Spanish project to be the target of English-English(region), that way you won’t even have to copy-paste documents back and forth.

      As for its age… Well, it’s being updated frequently, and honestly, I like that no one can just turn OmegaT off or ask a subscription fee one day, and I don’t care whether or not Chinese Firewall or Russian Networking agency would ban its IP addresses.

  4. I just installed the latest version of Omega T and tried to add both the Google and Microsoft translator as well as Yandex and they all charge. So I added a credit card on all 3 but I still can’t manage to find the API key on any of them. Does anybody know the exact steps to obtain it? I got the Google API key for just a few minutes (one project) and then when I tried to translate a different project it said: (400: Bad Request). Why would it only work for one project? It’s really starting to get on my nerves 🙁

    1. They only charge a small fee to confirm your credit card, but won’t charge for volumes of an average freelancer. At least that was the situation when I wrote this article.

      400 bad request means that something might be misconfigured, although it’s hard to tell without screenshots or anything.
