In this workshop we will show you how to rely on open source tools to handle translation
processes ranging from simple to complex. We will use:
Okapi Framework (specifically the Rainbow tool) for pre- and post-processing
OmegaT as a CAT tool
ApSIC Xbench as terminology manager and quality assurance tool
Disclaimer: All information given here is purely for educational purposes. Although we use the
aforementioned tools on a daily basis, we cannot guarantee that you can approach each and every project
with them.
Preprocessing
You can simply translate the source column, which will be replaced by the target text, and
later paste the original back in.
Or you can copy the source text into the target column and customize the filter
configuration, so that the translated file will not need any postprocessing.
Again, if you want to use only open source tools, you can:
5. Select the command Add Documents from the Input menu, or press
Ctrl+Insert. This opens the Add Documents dialog box where you select the
source documents you want to prepare. You can also simply drag and drop the documents
on the list. All documents must be in or below the input root directory you have chosen
for the source, in our case the directory where you saved the project. We will choose to
drag the *.docx, *.xlsx and the *.po file to the Input List 1 tab.
6. Select the *.xlsx file in the Input List 1.
7. Right-click on okf_openxml in the right column.
8. Choose Edit document properties (or simply double-click) to view and modify
the current filter for this file.
9. Click on Create...
10. In the window, type the name for the new filter configuration: for instance, “blocked-column”.
11. Click OK.
12. Go to the Excel Options tab.
13. Select Column A next to Sheet 1 Columns to Exclude in order to exclude the
source text from the conversion.
14. Click OK.
23. In the Package Location tab, select the Root of the output directory (this is
where the files to be translated will be stored). Leaving the default variable, ${ProjDir},
places the package in the Rainbow project directory.
24. Package name is the name of the folder / project in OmegaT. You can replace pack1 with
anything you want. If you use reference numbers for your projects, this is the place to
enter them.
25. Optionally, you can zip the whole package, for instance if you want to send the package to
another provider. Simply mark the option Compress the package into a ZIP
file.
26. It is recommended to mark the option Pre-segment the extracted text with the following
rules. If you don't select this option, the separation between text units is based on the
structure of the original file format. For example, the content of two <p> elements in
HTML gives you two text units. Segmentation allows you to break down the content of the
text units into smaller parts, usually corresponding to sentences. The segmentation is done
using segmentation rules defined in a standard SRX (Segmentation Rules eXchange)
document (SRX 2.0 is supported). SRX uses regular expressions to specify patterns before
and after a break or a non-break position. Okapi Framework includes a specific module,
Ratel, to create and maintain segmentation rules as SRX files.
Note: Text units flagged as non-translatable are not segmented. Text units that are already
segmented are not re-segmented.
Published by Qabiria.com, September 2010
27. You can also mark the option Pre-translate the extracted text to use a
translation resource to leverage matches into the prepared document (if the type of package
being generated allows it). You can choose among several web-based machine translators
or a local TM (the detailed process is explained in the Annex at the end of this document).
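The break and no-break patterns described in step 26 can be illustrated with a minimal Python sketch. It only imitates the idea behind SRX rules (a pattern before the position, a pattern after it, and a flag saying whether to break); the two rules and the function name segment are assumptions made for illustration and are not taken from any SRX file shipped with Okapi.

```python
import re

# Each rule is (is_break, before_pattern, after_pattern), mirroring the
# before-break / after-break pair of an SRX rule. These two rules are
# illustrative assumptions, not real Okapi defaults.
RULES = [
    # No break after a few common abbreviations followed by whitespace.
    (False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),
    # Break after sentence-ending punctuation followed by whitespace.
    (True, r"[.?!]", r"\s+"),
]

# The "before" pattern must end exactly at the candidate position.
COMPILED = [
    (is_break, re.compile(before + r"\Z"), re.compile(after))
    for is_break, before, after in RULES
]


def segment(text):
    """Split text into segments; the first rule matching a position decides."""
    segments = []
    start = 0
    for pos in range(1, len(text)):
        for is_break, before, after in COMPILED:
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # rules are ordered: stop at the first match
    segments.append(text[start:])
    return [s.strip() for s in segments if s.strip()]


if __name__ == "__main__":
    for seg in segment("Dr. Smith arrived. He was late! Why?"):
        print(seg)
```

As in SRX, rule order matters: the no-break rule for abbreviations comes first so that it wins over the general break rule.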
glossary\
omegat\
original\
source\
target\
tm\
The converted source files are located in the source subfolder. Now you're ready to translate
the documents.
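Before opening the package, you may want to confirm that it has the layout listed above. The following is a small Python sketch under that assumption; the function names missing_subfolders and source_files are hypothetical helpers, and the folder names are simply the six shown in the listing.

```python
from pathlib import Path

# Subfolders expected in the package, as listed in this document.
EXPECTED = ["glossary", "omegat", "original", "source", "target", "tm"]


def missing_subfolders(package_dir):
    """Return the expected subfolder names that do not exist in package_dir."""
    root = Path(package_dir)
    return [name for name in EXPECTED if not (root / name).is_dir()]


def source_files(package_dir):
    """Return the sorted names of the files in the source subfolder."""
    source = Path(package_dir) / "source"
    return sorted(p.name for p in source.iterdir() if p.is_file())
```

For instance, missing_subfolders("pack1") returns an empty list when the package is complete, and source_files("pack1") lists the converted files waiting to be translated.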
1. Launch OmegaT.
2. Go to Project | Open (CTRL+O).
3. Locate the pack1 project and open it.
4. The Project files window pops up.
5. You can find detailed statistics about the project files in the file located at:
..\pack1\omegat\project_stats.txt
6. OmegaT will open the first source file in the Editor.
7. Type your translation.
8. Press "ENTER" to advance to the next segment.
9. Once you have finished the translation, you're ready for QA.
10. Save your project.
To perform an automated quality check on the bilingual files, we will use ApSIC Xbench.
This tool tries to find segments with the following potential problems:
Untranslated segments
Segments that have the same source text but a different target text
Segments that have the same target text but a different source text
Segments where the target text matches the source text
Segments with tag errors
Segments with numerical errors
Segments with double blanks
Segments that deviate from the key terms of the project (if key terms have been previously
defined)
Segments that meet the search criteria of entries in the Project or the Personal Checklist
(if Project or Personal Checklists have been defined).
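To make some of these checks concrete, here is a minimal Python sketch that applies five of them to a list of (source, target) pairs. It is not how Xbench works internally, which this document does not describe; the function name qa_check and the issue labels are assumptions chosen for illustration, and tag, key-term and checklist checks are left out.

```python
import re
from collections import defaultdict

# Digits, optionally grouped with "." or "," (for example 10, 1.5 or 1,000).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def qa_check(pairs):
    """Return a list of (segment_index, issue) for a list of (source, target)."""
    issues = []
    targets_by_source = defaultdict(set)
    for i, (src, tgt) in enumerate(pairs):
        if not tgt.strip():
            issues.append((i, "untranslated"))
            continue
        targets_by_source[src].add(tgt)
        if tgt == src:
            issues.append((i, "target same as source"))
        # Naive comparison: the same numbers must appear on both sides.
        if sorted(NUMBER.findall(src)) != sorted(NUMBER.findall(tgt)):
            issues.append((i, "numeric mismatch"))
        if "  " in tgt:
            issues.append((i, "double blank"))
    # Same source text translated in more than one way.
    for i, (src, tgt) in enumerate(pairs):
        if tgt.strip() and len(targets_by_source[src]) > 1:
            issues.append((i, "inconsistent target"))
    return issues
```

The numeric check compares numbers literally, so locale-specific formats such as 1,000 versus 1.000 would be reported as mismatches. That is acceptable for a sketch, since a report like this lists potential problems for a reviewer rather than confirmed errors.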
9. Click OK.
10. Go to the QA tab.
11. Click Check Ongoing Translation.
12. Review the results and edit in OmegaT if necessary.
1. After you're done with the Editing phase in OmegaT, go to Project | Create
translated documents (CTRL+D), to create target documents. Beware: you'll
create the *.xliff target documents, not the *.docx, *.xlsx and *.po files.
2. On the Status bar, the message Target Documents Created is displayed.
3. Close OmegaT.
The output of this step is three target *.xliff files in the target subfolder of the pack1 folder.
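If you want to double-check the target files before converting them back, a short Python sketch can list the translation units whose target is still empty. It assumes the files are XLIFF 1.2 documents that use the standard namespace, which this document does not state explicitly; the function name untranslated_units is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: the files use the XLIFF 1.2 namespace.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def untranslated_units(root):
    """Return the ids of trans-unit elements with a missing or empty target."""
    missing = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        # itertext() also collects text nested in inline or segment markup.
        text = "".join(target.itertext()) if target is not None else ""
        if not text.strip():
            missing.append(unit.get("id"))
    return missing
```

A typical call would be untranslated_units(ET.parse(path).getroot()) for each *.xliff file in the target subfolder; an empty list means every unit has some target text.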
Now you have to convert your files back to the original format.
1. Open Rainbow.
2. Locate the manifest.xml file in the OmegaT project root folder.
3. Drag and drop manifest.xml into Rainbow's Input List 1 tab.
4. Click on Utilities | Translation Package Post-Processing...
Final verification
Links
If you want to use your own TM(s), the TM resources currently supported are:
This is the complete procedure to create a Pensieve TM and to import a TMX into it:
1. Launch Rainbow.
2. Drop your TMX file in the Input List 1.
3. Go to Utilities > Convert file format.
4. In Directory of the TM where to import, enter the full path of the directory where your TM
should go. That directory should be used only for the TM. It is recommended to
name it with a .pentm extension so that other tools can detect it. For example:
"H:\TM\tm.pentm"
5. Click Execute.
6. Now you should have a "H:\TM\tm.pentm" directory which contains your indexed TM,
ready to use.
7. Go back to the Pre-Translation tab:
8. Select "Pensieve TM Engine".
9. Click "Settings".
10. In "TM Directory" enter the TM to use, for example: "H:\TM\tm.pentm"
11. Click OK.
12. Click Execute to run the Translation Package Creation.
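Before importing a TMX file, it can be useful to see how many translation units it holds and in which languages. The sketch below does this with Python's standard XML parser; it assumes a conventional TMX layout (tu elements under body, each with tuv variants carrying an xml:lang attribute, or lang in older files), and the function name tmx_summary is a hypothetical helper that is not part of Rainbow.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_summary(root):
    """Return (number of tu elements, {language code: number of tuv elements})."""
    units = root.findall("./body/tu")
    langs = Counter()
    for tu in units:
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead.
            langs[tuv.get(XML_LANG) or tuv.get("lang")] += 1
    return len(units), dict(langs)
```

For a file on disk, pass ET.parse(path).getroot() to the function. A count of zero usually means the file is empty or does not follow the layout assumed here.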