
Translating with 3 open source tools: Okapi Framework (Rainbow), OmegaT, Apsic XBench

In this workshop we will show you how to use open source tools to carry out translation
processes ranging from simple to complex. We will use:

• Okapi Framework (specifically the Rainbow tool) for pre- and post-processing
• OmegaT as a CAT tool
• Apsic XBench as terminology manager and quality assurance tool

We will also use OpenOffice and Poedit for final verification.

Disclaimer: All information given here is purely for educational purposes. Although we use the
aforementioned tools on a daily basis, we cannot guarantee that they will suit each and every
project.

We will translate 3 sample files:

1. a Microsoft Word document (*.doc)
2. a Microsoft Excel spreadsheet (*.xlsx)
3. a gettext catalog (*.po), frequently used in software localization

Preprocessing

Word document (*.doc)

1. Open the *.doc file in MS Word.
2. Save as *.docx (File | Save as Word 2007 document).
o If you are on MS Office 2003, you need the Compatibility Pack to open and save Office
2007 formats.
o If you have multiple files, you can use the "Microsoft Office Migration Planning Manager",
a free command line tool that lets you convert Office 2003 documents to Office 2007
format in batch. See:
http://www.microsoft.com/downloads/details.aspx?FamilyID=13580cd7-a8bc-40ef-8281-dd2c325a5a81&DisplayLang=en#filelist
o We convert from *.doc to *.docx because Okapi filters work almost flawlessly
for Office 2007 files. The RTF filter, though very reliable, may not work well with more
complex files.
3. If a message window pops up, click Yes.

Published by Qabiria.com, September 2010 1


4. Close MS Word.

Alternatively, if you want to stick to open source software, you can:

1. Open the *.doc file in OpenOffice Writer.
2. Save as *.odt.
3. Close OpenOffice.
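
If you have many *.doc files, the open source route can also be scripted. The sketch below is not part of the original workshop: it assumes a LibreOffice-style `soffice` executable on your PATH that accepts the `--headless`, `--convert-to` and `--outdir` options (your OpenOffice build may not). The function names are hypothetical. By default it only builds the commands so that you can inspect them before running anything.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, target_format="odt", soffice="soffice"):
    """Return the argument list for one headless conversion (nothing is run).

    Assumption: `soffice` is a LibreOffice-style binary that supports
    --headless / --convert-to / --outdir.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_folder(folder, out_dir, target_format="odt", dry_run=True):
    """Build (and, if dry_run is False, run) one command per *.doc file."""
    commands = [
        build_convert_command(path, out_dir, target_format)
        for path in sorted(Path(folder).glob("*.doc"))
    ]
    if not dry_run:
        for command in commands:
            # check=True raises an error if the converter exits with a failure code
            subprocess.run(command, check=True)
    return commands
```

Calling `convert_folder("source_docs", "converted")` returns the list of commands; pass `dry_run=False` once you have confirmed they look right for your installation.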

Excel document (*.xlsx)


Clients often ask us to put translations in a separate column, next to the source text. To do
so, you have two options:

• You can simply translate the source column, whose content is replaced by the target text,
and paste the original back in afterwards.
• Or you can copy the source text into the target column and customize the filter
configuration, so that the translated file will not need any postprocessing.

We will choose the second option, to illustrate a specific feature of Rainbow.

1. Open the *.xlsx file in MS Excel.
2. Copy the source text into the target column.
3. Save the *.xlsx file.
4. Close MS Excel.

Again, if you want to use only open source tools, you can:

1. Open the *.xlsx file in OpenOffice Calc.
2. Copy the source text into the target column.
3. Save as *.ods.
4. Close OpenOffice.

gettext catalog (*.po)


No preparation is necessary.

Creating the translation package for OmegaT


1. Launch Rainbow (Okapi Tools).
2. To make sure you are starting this tutorial in a fresh environment, you should first create a
new project: Select the command New in the File menu, or press Ctrl+N.
3. It is recommended to save the Rainbow project file. This allows you to safely use relative
paths for all the parameters of your project and therefore to be able to move the project
and its files around without having to reset any paths. To save the project select the Save
command from the File menu. Enter a filename with the path of where your files are.



4. Select the Input List 1 tab. This is the list where we will enumerate all the source
documents to prepare.

5. Select the command Add Documents from the Input menu, or press Ctrl+Insert. This
opens the Add Documents dialog box where you select the source documents you want
to prepare. You can also simply drag and drop the documents on the list. All documents
must be in or below the input root directory you have chosen for the source, in our case
the directory where you saved the project. In this workshop we will drag the *.docx,
*.xlsx and *.po files onto the Input List 1 tab.
6. Select the *.xlsx file in the Input List 1.
7. Right-click on okf_openxml in the right column.
8. Choose Edit document properties (or simply double-click) to view and modify
the current filter for this file.
9. Click on Create...
10. In the window, type a name for the new filter configuration, for instance “blocked-column”.
11. Click OK.
12. Go to the Excel Options tab.
13. Next to Sheet 1 Columns to Exclude, select Column A so that the source text is
excluded from the conversion.
14. Click OK.



15. Click OK in the Input Document Properties window.
16. Go to the Languages and Encoding tab.
17. Set the source and target language codes (in our example we will use EN-US and IT-IT).



18. Go to the Other Settings tab.
19. Set your root folder with no trailing slash (usually your project folder).
20. Set Custom sub-folder, if necessary (this is where the translated file will be stored).
21. Go to Utilities | Translation package creation...
22. Select OmegaT in the first tab.

23. In the Package Location tab, set Root of the output directory (this is
where the files to be translated will be stored). Leaving the default variable, ${ProjDir},
will place the package in the Rainbow project directory.
24. Package name is the name of the folder / project in OmegaT. You can replace pack1 with
anything you want. If you use reference numbers for your projects, this is the place to
enter them.
25. Optionally, you can zip the whole package, for instance if you want to send the package to
another provider. Simply mark the option Compress the package into a ZIP
file.
26. It is recommended to mark the option Pre-segment the extracted text with the following
rules. If you don't select this option, the separation between text units is based on the
structure of the original file format. For example, the content of two <p> elements in
HTML gives you two text units. Segmentation allows you to break down the content of the
text units into smaller parts, usually corresponding to sentences. The segmentation is done
using segmentation rules defined in a standard SRX (Segmentation Rules eXchange)
document (SRX 2.0 is supported). SRX uses regular expressions to specify patterns before
and after a break or a non-break position. Okapi Framework includes a specific module,
Ratel, to create and maintain segmentation rules as SRX files.
Note: Text units flagged as non-translatable are not segmented. Text units that are already
segmented are not re-segmented.

27. You can also mark the option Pre-translate the extracted text, which uses a
translation resource to leverage matches into the prepared document (if the type of
package being generated allows it). You can choose among several web-based machine
translators or a local TM (the detailed process is explained in the Annex at the end of this
document).

28. Click Execute.
29. You will see "Processing documents" briefly on the status bar.
30. If any errors occur, they will be logged in a pop-up window.
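
To make the pre-segmentation option described above more concrete, here is a minimal Python sketch of the idea behind SRX rules: each rule pairs a "before" and an "after" regular expression with a break or no-break flag, and the first rule that matches at a position decides. This is an illustration only, not Okapi's implementation, and the two rules shown are made-up examples rather than the contents of any real SRX file.

```python
import re

# Each rule is (is_break, pattern_before, pattern_after).
# Rules are tried in order at every position; the first match decides,
# so exceptions (no-break rules) are listed before the general break rule.
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr)\.", r"\s"),  # hypothetical exception: no break after titles
    (True, r"[.?!]", r"\s"),               # break after sentence-final punctuation
]


def segment(text, rules=RULES):
    """Split text into segments using SRX-style before/after patterns."""
    compiled = [
        (is_break, re.compile("(?:" + before + r")\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # "before" must match the text ending at pos, "after" the text starting at pos
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule wins, whether break or no-break
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments
```

For example, `segment("Dr. Rossi opened the file. It was empty! Is that normal?")` yields three sentences, and the abbreviation rule prevents a break after "Dr.".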

Translation with OmegaT


The result of the previous step is an OmegaT project folder containing the following subfolders
and files:

• glossary\
• omegat\
• original\
• source\
• target\
• tm\
• manifest.xml
• omegat.project
• report.html

The converted source files are located in the source subfolder. Now you're ready to translate
the documents.
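
Before opening the package in OmegaT, you may want to confirm that it is complete. The following sketch checks a folder against the list of subfolders and files given above; the function name and the way the package path is passed in are our own choices for illustration.

```python
from pathlib import Path

# Names taken from the package listing above
EXPECTED_DIRS = ["glossary", "omegat", "original", "source", "target", "tm"]
EXPECTED_FILES = ["manifest.xml", "omegat.project", "report.html"]


def missing_items(package_dir):
    """Return the expected subfolders and files that are absent from package_dir."""
    root = Path(package_dir)
    missing = [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

An empty list from `missing_items("pack1")` means every item in the listing was found.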

1. Launch OmegaT.
2. Go to Project | Open (CTRL+O).
3. Locate the pack1 project and open it.
4. The Project files window pops up.
5. Detailed statistics about the project files are saved in:
..\pack1\omegat\project_stats.txt
6. OmegaT will open the first source file in the Editor.
7. Type your translation.
8. Press "ENTER" to advance to the next segment.
9. Once you have finished the translation, you're ready for QA.
10. Save your project.



Quality check / Quality assurance with XBench

In order to perform an automated quality check on the bilingual files, we will use Apsic Xbench.
This tool tries to find segments with the following potential problems:

• Untranslated segments
• Segments that have the same source text but a different target text
• Segments that have the same target text but a different source text
• Segments where the target text matches the source text
• Segments with tag errors
• Segments with numerical errors
• Segments with double blanks
• Segments that deviate from the key terms of the project (if key terms have been previously
defined)
• Segments that meet the search criteria of entries in the Project or the Personal Checklist
(if Project or Personal Checklists have been defined).
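
To give an idea of what some of these checks involve, the sketch below reads source/target pairs from a TMX file and flags a few of the problems listed above. It is not how Xbench works internally; it is a simplified illustration that assumes a plain TMX file whose language codes sit in a `lang` or `xml:lang` attribute, and it ignores inline tags, key terms and checklists.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_pairs(tmx_file, source_lang, target_lang):
    """Yield (source, target) text pairs; target is "" if the unit has none."""
    root = ET.parse(tmx_file).getroot()
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            texts[lang] = "".join(seg.itertext()) if seg is not None else ""
        if source_lang.lower() in texts:
            yield texts[source_lang.lower()], texts.get(target_lang.lower(), "")


def check(pairs):
    """Return a list of (problem, source_text) tuples for a few simple checks."""
    issues = []
    targets_by_source = defaultdict(set)
    for source, target in pairs:
        if not target.strip():
            issues.append(("untranslated", source))
            continue
        targets_by_source[source].add(target)
        if target == source:
            issues.append(("target equals source", source))
        if "  " in target:
            issues.append(("double blank", source))
        # Compare the numbers found in source and target, ignoring their order
        if sorted(re.findall(r"\d+", source)) != sorted(re.findall(r"\d+", target)):
            issues.append(("numeric mismatch", source))
    for source, targets in targets_by_source.items():
        if len(targets) > 1:
            issues.append(("inconsistent target", source))
    return issues
```

A call such as `check(read_pairs("project_save.tmx", "EN-US", "IT-IT"))` returns the flagged segments; the file name and language codes are the ones used in this workshop.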

1. Launch Apsic Xbench.
2. Go to Project | New.
3. The Project Properties window pops up.
4. Click Add...
5. Choose the type of file you'd like to check (TMX). We will check the TMX project file
created by OmegaT during the translation.



6. Browse to the project_save.tmx file inside the omegat folder.
7. Click Next.
8. Mark the Ongoing translation option.

9. Click OK.
10. Go to the QA tab.
11. Click Check Ongoing Translation.
12. Review the results and edit in OmegaT if necessary.



Postprocessing

1. After you're done with the editing phase in OmegaT, go to Project | Create
translated documents (CTRL+D) to create the target documents. Beware: this
creates the *.xliff target documents, not the *.doc, *.xlsx and *.po files.
2. The message Target Documents Created is displayed on the status bar.
3. Close OmegaT.

The output of this step is three target *.xliff files in the target subfolder of the pack1 folder.

Now you have to convert your files back to the original format.

1. Open Rainbow.
2. Locate the manifest.xml file in the OmegaT project root folder.
3. Drag and drop manifest.xml into Rainbow's Input List 1 tab.
4. Click on Utilities | Translation Package Post-Processing...



5. A new window (Translation Package Manifest) pops up.
6. Select the document(s) you want to post-process (usually all of them).
7. Note the path shown on the status bar. This points to the folder where the translated
document(s) are retrieved from.
8. The Output column shows the relative path of the output document that will be generated.
9. Click Execute to start the process, or Cancel to stop.
10. If any errors occur, they will be logged in a pop-up window.
11. Close Rainbow.
12. Click No when asked to save the settings.
13. Locate the done folder in the OmegaT project. If you specified any Custom subfolder, it
will be there.

Final verification

Word

1. Launch MS Word.
2. Open the Word file.
3. Review the document.
4. Save it as *.doc.

Excel

1. Launch MS Excel.
2. Open the Excel file.
3. Review the document.

gettext

1. Launch Poedit.
2. Open the .po file.
3. Review the document.
4. Save the document (saving the document will create a *.mo file, the compiled version of
the *.po file).
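
As a complement to the visual review of the gettext catalog, a short script can list any entries that were left untranslated. The sketch below is a deliberately simplified reader of our own that handles only plain `msgid` / `msgstr` entries and their continuation lines; it does not handle plural forms or message contexts, and Poedit remains the tool to use for the real check.

```python
import ast


def read_po_entries(lines):
    """Return (msgid, msgstr) pairs from the lines of a simple .po file."""
    entries, entry, field = [], None, None
    for raw in lines:
        line = raw.strip()
        if line.startswith("msgid "):
            # A new msgid starts a new entry
            entry, field = {"msgid": "", "msgstr": ""}, "msgid"
            entries.append(entry)
            line = line[len("msgid "):]
        elif line.startswith("msgstr "):
            field = "msgstr"
            line = line[len("msgstr "):]
        # Quoted text (on the keyword line or a continuation line) is appended
        # to the current field; comment lines starting with "#" are skipped.
        if entry is not None and field and line.startswith('"'):
            entry[field] += ast.literal_eval(line)
    return [(e["msgid"], e["msgstr"]) for e in entries]


def untranslated(entries):
    """List msgids with an empty msgstr, skipping the header (empty msgid)."""
    return [msgid for msgid, msgstr in entries if msgid and not msgstr]
```

Typical use would be `untranslated(read_po_entries(open("file.po", encoding="utf-8")))`, where the file name is a placeholder for your translated catalog.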

Links

Okapi Framework: http://okapi.opentag.com/
OmegaT: http://www.omegat.org/
Apsic Xbench: http://www.apsic.com/en/products_xbench.html (our review:
http://www.qabiria.com/en/resources/articles/117-strumenti-gratuiti-per-traduttori-1o-puntata-apsic-xbench.html)
XBench Plug-in for Russian:
http://www.databridge.ru/ru/article/modul-russkogo-yazyka-dlya-programmy-avtomatizirovannogo-kontrolya-kachestva-perevoda-xbench



Annex: Leveraging using a TM
The Pre-Translation tab in Rainbow allows you to leverage your translation using one or two
"translation resources". A translation resource is either a TM or an MT engine that Rainbow can
access using a "connector". The list of available connectors is on the Pre-Translation tab of the
Translation Package Creation window. In this window, check Pre-translate the
extracted text and select a connector from the list called Primary translation
resource to use.

If you want to use your own TM(s), the TM resources currently supported are:

• SimpleTM TM Engine: A very rudimentary exact-match only TM system.
• GlobalSight TM Web Services: Useful if you have access to a GlobalSight server.
• MyMemory TM Web Services: A public TM repository. You could load your TMX file
there and use this. In practice it may be a bit too slow (but it should work). Visit
http://mymemory.translated.net/ to upload your TMX files.
• OpenTran web repository: Another public TM repository. Currently, you cannot
upload your own TMX files there, but it offers a lot of open-source software strings. A bit
slow too.
• Translate Toolkit TM server: The Translate Toolkit has a local TM server that is easy to
install and run. You can use your TMX files directly with it, or you can convert them to
a PO file and use that. See documentation here:
http://translate.sourceforge.net/wiki/toolkit/tmserver
• Pensieve TM Engine: The Pensieve TM engine is Okapi's own TM engine. You simply
need to create a TM and import your TMX into it.

This is the complete procedure to create a Pensieve TM and to import a TMX into it:

1. Launch Rainbow.
2. Drop your TMX file in the Input List 1.
3. Go to Utilities | Convert file format.
4. Enter the full path of the directory where your TM should go into Directory of the
TM where to import. That directory should be just for the TM. It is recommended to
name it with a .pentm extension because other tools can detect it. For example:
"H:\TM\tm.pentm"
5. Click Execute.
6. Now you should have a "H:\TM\tm.pentm" directory which contains your indexed TM,
ready to use.
7. Go back to the Pre-Translation tab:
8. Select "Pensieve TM Engine".
9. Click "Settings".
10. In "TM Directory" enter the TM to use, for example: "H:\TM\tm.pentm"
11. Click OK.
12. Click Execute to run the Translation Package Creation.



13. Important: You will get better matches if the source text is segmented like the TM. So if
your TM is made of sentences (rather than paragraphs), make sure to enable pre-
segmentation when preparing the file. That's in the Options tab: check "Pre-segment the
extracted text" and enter the path of the SRX file you want to use. There is a default one
in the "config" sub-folder of the directory where you have unzipped the Okapi tools.
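
The effect of segmentation on leveraging can be shown with a toy exact-match lookup. The sketch below is not Pensieve (nor any other Okapi connector); it simply models a sentence-level TM as a Python dictionary with invented entries, and shows that an unsegmented paragraph finds no match while the same text split into sentences does.

```python
def exact_match_leverage(segments, tm):
    """Pair each segment with its exact TM match, or None if there is none."""
    return [(segment, tm.get(segment)) for segment in segments]


# Hypothetical sentence-level TM (EN-US -> IT-IT)
tm = {
    "Open the file.": "Aprire il file.",
    "Click OK.": "Fare clic su OK.",
}

# The whole paragraph as one text unit: no exact match in a sentence-level TM
as_paragraph = exact_match_leverage(["Open the file. Click OK."], tm)

# The same text pre-segmented into sentences: both segments are leveraged
as_sentences = exact_match_leverage(["Open the file.", "Click OK."], tm)
```

Here `as_paragraph` contains a single pair with no translation, whereas every pair in `as_sentences` carries a match.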
