
integrated translation environment

Introduction to Regular Expressions in memoQ

2004-2013 Kilgray Translation Technologies. All rights reserved.

Regular expressions tutorial

Contents
1 Basics on regular expressions
1.1 Introduction to regular expressions
1.2 Special characters in regular expressions
2 memoQ and regular expressions
2.1 Auto-translation rules
2.1.1 Using Auto-translatables in the QA check
2.2 Segmentation rules
2.3 Cascading filters
2.4 The Regex text filter
2.5 The internal Regex Tagger

This tutorial covers the regular expression functionality of memoQ 6.2. It contains text items from the English user interface of the program. These items are under constant verification and are subject to change without prior notification.


1 Basics on regular expressions

Regular expressions are patterns that describe text strings and are used to match character combinations in text.

1.1 Introduction to regular expressions


To explain regular expressions, the following example sentence is used:

This is a regular expression.

This example sentence can be described as the following:
- a group of 5 words ending with a dot
- a string that starts with a capital T and ends with a dot
- a group of characters, followed by a space, followed by another group of characters, etc. until we meet a dot

There are countless ways to describe this sentence with a regular expression, depending on what you want to match.

1.2 Special characters in regular expressions


There are a few basic rules to observe when you create regular expressions. The first rule: Keep it simple. 10 commands are enough for basic usage. The following commands can be used to create your own expressions:

  .       any character
  ()      a group
  []      a range
  \s      any space
  \d      any digit
  ?       as few times as possible (when placed after *, + or {})
  *       matches the preceding character 0 or more times
  +       matches the preceding character 1 or more times
  {x}     exactly x times
  {x,y}   between x and y times

When you want to literally match one of the characters used as commands, precede it with a backslash (\):

  \*      an asterisk
  \[      a left bracket
  \(      a left parenthesis
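The commands listed above can be tried out outside memoQ. The following is a minimal sketch in Python, whose regex syntax agrees with .NET for these basic commands; the sample strings and variable names are illustrative assumptions, not part of memoQ.

```python
import re

sentence = "This is a regular expression."

# \s matches any space and + means "1 or more times",
# so this splits the sentence into its 5 words.
words = re.split(r"\s+", sentence)

# [] defines a range, {x,y} a repetition count, and \. a literal dot:
# a capital T, then 1 to 40 lowercase letters or spaces, then a dot.
pattern = re.compile(r"T[a-z ]{1,40}\.")
print(bool(pattern.fullmatch(sentence)))

# . matches any character; a ? after + makes the match as short as possible.
markup = "<b>bold</b>"
greedy = re.findall(r"<.+>", markup)   # one long match
lazy = re.findall(r"<.+?>", markup)    # two short matches
print(greedy, lazy)
```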


1.2.1 Example

How would you describe a date given in its numeric form, such as 31/01/2012?

31/01/2012 = \d{1,2}/\d{1,2}/\d{4}

Or, if you are sure that day and month will always be written with 2 digits:

\d{2}/\d{2}/\d{4}

You can break down the regular expression into groups:

31/01/2012 (DD/MM/YYYY) = (\d{2})/(\d{2})/(\d{4})

  DD      (\d{2})   group 1   $1
  MM      (\d{2})   group 2   $2
  YYYY    (\d{4})   group 3   $3

You can also rearrange the match, for example when the numeric date has to be written as MM/DD/YYYY. To do so, exchange the groups in the replacement:

MM/DD/YYYY = $2/$1/$3

  MM      group 2   $2
  DD      group 1   $1
  YYYY    group 3   $3

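Note: A group is whatever is enclosed in parentheses, like (\d{2}). In a replacement, the first group is referred to as $1, the second as $2, and so on: the dollar symbol followed by a number stands for the text matched by that group.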
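The group exchange described above can be sketched in Python as follows. Note that Python writes group references as \1, \2, \3 where memoQ (.NET) uses $1, $2, $3; the function name and the sample sentence are assumptions made for illustration.

```python
import re

# Same pattern as in the example: three groups for day, month and year.
DATE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

def to_month_first(text: str) -> str:
    """Rewrite DD/MM/YYYY dates as MM/DD/YYYY by exchanging groups 1 and 2."""
    # \2/\1/\3 in Python corresponds to $2/$1/$3 in memoQ.
    return DATE.sub(r"\2/\1/\3", text)

print(to_month_first("Delivery is due on 31/01/2012."))
# prints: Delivery is due on 01/31/2012.
```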


2 memoQ and regular expressions

In memoQ, you can adjust light resources (auto-translation rules, segmentation rules) to enhance the default regular expressions or to create new ones. Note for advanced users of regular expressions: memoQ uses the standard .NET implementation of regex.

You can also use the cascading filter and Regex text filter functionalities in memoQ to improve the file import, and the Regex Tagger to tag code after a document has been imported.

The following sections describe how you can edit, create and use regular expressions in memoQ. Please also see the Kilgray webinar on regular expressions on the Kilgray website > Resource Center: http://kilgray.com/webinars/regex-masses-english-011749

2.1 Auto-translation rules


In memoQ, you can define regular expressions. Open memoQ, go to Tools > Resource console. Select Auto-translation rules in the Resource categories on the left.


memoQ offers you default regular expressions for each language. You can create new expressions by clicking the Create new command link below the Auto-translation rules list, or you can clone an existing rule set. To clone a default set of auto-translatables, click the Clone command link below the Auto-translation rules list.

You can also import auto-translation rules created by another memoQ user. Click the Import new command link below the Auto-translation rules list, browse to the MQRES file which contains the regular expressions, and import the file.

Note: memoQ enables you to exchange light resources such as auto-translation rules in the MQRES file format, a proprietary memoQ format for importing and exporting resources.

Select an auto-translation rule set, and click the Edit link. The Edit auto-translation rule set dialog appears:


1. Enter your rule in the Auto-translation rules.
2. Click the Add button to add your auto-translation rule. To change an existing rule, edit it, then click the Change button. Click the Delete button to delete a rule.
3. When you enter a rule, you also need to enter, in the Replace order rules, the expression that should replace the match. In the example above:
   - Figuren is replaced by Figs.
   - ([\d]{1,4}) is the first number to be replaced; it is made of 1 to 4 digits and corresponds to group $2 in the Replace order rules
   - (bis) is replaced by to
   - ([\d]{1,4}) is the second number to be replaced; it is also made of 1 to 4 digits and corresponds to group $4 in the Replace order rules

4. Click OK to close the Edit auto-translation rule set dialog.

Note: If you want to automatically replace some words or expressions by their translated equivalents, you can enter custom translation pairs in the Translation pairs. Translation pairs can, for example, be used for translating names of months, days, names of measurement units etc.

Further information on auto-translation rule sets can be found in the memoQ Help: Functions and Settings > Edit resources > Light resources > Edit auto-translation rules. Further information on regular expressions can be found in the memoQ Help as well: Functions and Settings > Regular Expressions and Tagging.
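To see how the Figuren rule from step 3 behaves, here is a rough Python sketch. The complete source pattern and the replacement string are assumptions reconstructed from the description (four groups, of which $2 and $4 are the numbers, each written with the quantifier {1,4}); the actual rule in the dialog may differ. Python uses \2 and \4 where memoQ uses $2 and $4.

```python
import re

# Assumed source pattern: group 1 = "Figuren", group 2 = first number,
# group 3 = "bis", group 4 = second number (each number has 1 to 4 digits).
RULE = re.compile(r"(Figuren)\s([\d]{1,4})\s(bis)\s([\d]{1,4})")

# Assumed replacement; in memoQ this would read: Figs. $2 to $4
REPLACEMENT = r"Figs. \2 to \4"

def auto_translate(source: str) -> str:
    """Apply the single auto-translation rule to a source string."""
    return RULE.sub(REPLACEMENT, source)

print(auto_translate("siehe Figuren 12 bis 345"))
# prints: siehe Figs. 12 to 345
```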

2.1.1 Using Auto-translatables in the QA check


You can check whether the auto-translation rules are correctly applied in your target text by running the QA check. In Project home, go to Settings > QA settings. Select the QA settings for this project, and click the Edit command link. The Edit QA settings dialog appears:

In the Segments and terms tab, check the Check auto-translatables check box. memoQ now checks if the auto-translation rules for the target text are correctly applied when you run the QA.
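Conceptually, such a check compares what a rule would produce from the source with what is actually present in the target. The sketch below only illustrates that idea and makes no claim about how memoQ implements its QA check; the rule, the replacement and the function name are hypothetical.

```python
import re

# Hypothetical auto-translation rule: DD/MM/YYYY in the source
# should appear as MM/DD/YYYY in the target.
RULE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")
REPLACEMENT = r"\2/\1/\3"

def missing_auto_translatables(source: str, target: str) -> list:
    """Return the expected target strings that do not occur in the target."""
    expected = [m.expand(REPLACEMENT) for m in RULE.finditer(source)]
    return [item for item in expected if item not in target]

print(missing_auto_translatables("Fällig am 31/01/2012.", "Due on 31/01/2012."))
# prints: ['01/31/2012']
```

An empty list means that every auto-translatable found in the source has its expected form in the target.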

2.2 Segmentation rules


In memoQ, you can define segmentation rules. Open memoQ, and go to Tools > Resource console. Select Segmentation rules in the Resource categories on the left. Select a segmentation rule set, and click the Edit command link. The Edit segmentation rule set dialog appears:


Note: You can also import or export SRX files. SRX (Segmentation Rules eXchange) is a format for exchanging segmentation rules between different tools. This enables you to use the same segmentation rules in memoQ as well as in other tools.

In the Segmentation tab, you have 2 lists: Rules and Exceptions. You can add, change, or delete segmentation rules. Click the Preview button to display a preview of the segmentation rule set which you want to apply.

In the Custom lists tab, you can add, change, or delete Custom lists. Select a custom list to see the corresponding elements in the List items displayed:


Select, for instance, #abbr_long# in the general abbreviation list. You can see the abbreviations of this list in the List items. Items in #abbr_long# do not have to be preceded by a whitespace.

Items in #abbr_short#, in contrast, need to be preceded by a whitespace. For example, if you have eg. as an abbreviation, you should include it here, not in #abbr_long#. If you do not, memoQ treats any word ending in "eg." as an abbreviation and will not start a new sentence after "beg."

Click the Preview button to display a preview of the segmentation rule you created or changed.

IMPORTANT: Segmentation rules must be selected before you import documents into a memoQ project. To assign segmentation rules, create a project, but do not import documents. You can click Finish in the New project wizard after the first dialog. In Project home, go to Settings > Segmentation rules, and check the check box of the segmentation rules you want to apply for the document import. Then start the document import.
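The difference between the two abbreviation lists can be illustrated with a small Python sketch. It is a simplified model, not memoQ's segmentation engine: the single rule (a period followed by whitespace ends a segment), the list contents and the function names are all assumptions chosen for the illustration.

```python
import re

# Hypothetical list items. Long items count as abbreviations after any
# character; short items only when preceded by whitespace (or at the start).
ABBR_LONG = ["approx.", "e.g."]
ABBR_SHORT = ["eg.", "no."]

def is_abbreviation(text: str, end: int) -> bool:
    """True if the text up to position `end` ends with a listed abbreviation."""
    head = text[:end]
    if any(head.endswith(item) for item in ABBR_LONG):
        return True
    for item in ABBR_SHORT:
        if head.endswith(item):
            before = head[: -len(item)]
            if before == "" or before[-1].isspace():
                return True
    return False

def segment(text: str) -> list:
    segments, start = [], 0
    # Rule: a period followed by whitespace ends a segment ...
    for match in re.finditer(r"\.\s+", text):
        period_end = match.start() + 1
        # ... exception: the period belongs to a listed abbreviation.
        if is_abbreviation(text, period_end):
            continue
        segments.append(text[start:period_end])
        start = match.end()
    if start < len(text):
        segments.append(text[start:])
    return segments

print(segment("I beg. You may use eg. this one. Done."))
# prints: ['I beg.', 'You may use eg. this one.', 'Done.']
```

Moving "eg." from ABBR_SHORT to ABBR_LONG in this sketch makes "beg." count as an abbreviation, so the first two segments are no longer split.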

2.3 Cascading filters


You can use cascading filters to import a document. memoQ detects the default filter based on the file extension, but you can also select a second filter in a filter chain. A filter chain or a cascading filter is a document filter configuration where memoQ runs a second filter after the default document filter when it imports a document. This is useful when the imported text contains further markup. For example, cells in an Excel workbook may contain HTML markup, and you can turn that into sensible inline tags by applying the HTML filter or the Regex tagger after the Excel document filter.

How to add a cascading filter:
1. In the Translations pane of Project home, click the Import with options... link, then select Change filter and configuration to display the Document import settings dialog.
2. In the Document import settings dialog, click Add cascading filter....
3. In the Filter drop-down list, choose one of the filters available. Choose the one that is most appropriate for the text that is imported by the first filter. In most cases, you will use the XML filter, the HTML filter, or the Regex tagger in this place.
4. In the Filter configuration drop-down list, you can choose from the available filter configurations for the selected document type or filter. When you configure a filter, you can save the settings to be re-used or shared as a filter configuration resource. Select a filter configuration from the drop-down list, or click the folder icon to display the Load filter configuration dialog. A cascading filter selection could look like the following:


5. Click OK to return to the Document import options, and click OK to import the document using the specified cascading filter.
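The idea of a filter chain can be sketched in Python as two functions applied one after the other. This is only a conceptual model: a CSV reader stands in for the Excel document filter, a single regular expression stands in for the HTML filter or Regex tagger, and the numbered {1}, {2} placeholders are an assumed way of showing inline tags.

```python
import csv
import io
import re

def first_filter(raw: str) -> list:
    """Stand-in for the default document filter: returns the text of each cell."""
    return [cell for row in csv.reader(io.StringIO(raw)) for cell in row]

def second_filter(cell: str) -> str:
    """Stand-in for the cascaded filter: turns HTML markup into numbered inline tags."""
    counter = 0

    def to_tag(match):
        nonlocal counter
        counter += 1
        return "{%d}" % counter

    return re.sub(r"</?[A-Za-z][^>]*>", to_tag, cell)

def import_document(raw: str) -> list:
    # The second filter only sees the text that the first filter extracted.
    return [second_filter(cell) for cell in first_filter(raw)]

print(import_document("Click <b>OK</b>,Press <i>Enter</i> now\n"))
# prints: ['Click {1}OK{2}', 'Press {1}Enter{2} now']
```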

2.4 The Regex text filter


Using a Regex text filter, memoQ can process structured text files and extract translatable content from these files. memoQ can also extract context and comments for the imported content. You control the Regex text filter mainly through regular expressions.

The Regex text filter processes structured text files in three steps:
1. It breaks up the files into paragraphs.
2. It extracts the paragraphs that contain translatable text.
3. From the extracted paragraphs, it extracts translatable text, and optionally context and comments.

The options of the filter follow these three steps:
1. Specify how paragraphs are separated.
2. Specify what an imported paragraph should look like.
3. List the parts that really need to be translated.

This procedure requires writing regular expressions, which you can do through trial and error. Before you proceed with importing the file, you can always click the Preview tab to see what will be imported.

You have 2 options to configure a Regex-based text filter:

Option 1:


1. In the Translations pane of Project home, click the Import with options... command link below the document list.
2. In the Open dialog, select All files from the Files of type drop-down list. Click Open to proceed: the Document import settings dialog appears.
3. From the Filter drop-down list, choose Regex text filter.

Option 2:
1. From the Tools menu, choose Resource console > Filter configurations.
2. Click the Create new command link, and choose Regex text filter from the Filters drop-down list.
3. Enter a name for the filter, and click OK. The filter is now listed in the Filters list.
4. Select the filter you just created, and click the Edit command link.

Both options open the Edit filter configuration dialog:

Further information on how to use the different tabs to configure this filter can be found in the memoQ Help.
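The three processing steps of the Regex text filter described in this section can be sketched in Python. The file layout (an optional # comment line followed by a key = value line, with paragraphs separated by blank lines), the two regular expressions and all names are assumptions made up for this illustration; a real file needs its own expressions.

```python
import re

SAMPLE = """# Greeting shown at startup
msg.hello = Hello, world!

version = 3

# Shown on exit
msg.bye = Goodbye!
"""

# Step 1: paragraphs are separated by one or more blank lines.
PARAGRAPH_SEPARATOR = re.compile(r"\n\s*\n")

# Steps 2 and 3: an importable paragraph has an optional comment line and a
# msg.* key; the named groups mark the comment, the context and the text.
PARAGRAPH = re.compile(
    r"(?:#\s*(?P<comment>.*)\n)?(?P<context>msg\.\w+)\s*=\s*(?P<text>.+)"
)

def extract(content: str) -> list:
    units = []
    for paragraph in PARAGRAPH_SEPARATOR.split(content.strip()):
        match = PARAGRAPH.fullmatch(paragraph)
        if match is None:
            continue  # paragraph without translatable text, e.g. "version = 3"
        units.append(match.groupdict())
    return units

for unit in extract(SAMPLE):
    print(unit["context"], "|", unit["text"], "|", unit["comment"])
```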


2.5 The internal Regex Tagger


The internal Regex Tagger runs after a document has been imported. You can use the Regex Tagger to create tags out of non-translatable elements in any text. When you are working on a document in the translation editor, choose Run Regex Tagger... from the Format menu. The Tag current document dialog appears:

You can set up multiple rules in a single regular expression filter configuration. These are listed in the top box of the Rules section.

To add a pattern to turn into a tag:
1. First type a regular expression in the Regular expression text box. This can be a simple expression. For example, if you want to replace the word 'memoQ' with an empty inline tag, simply type 'memoQ' in the Regular expression text field.
2. You can also enter more complex expressions where a single pattern can represent several different character sequences. If you click the Pattern... link next to the text field, you get a menu with the most common commands available. For example, the regular expression '<[^/].*?>' matches text that starts with the '<' character, followed by one character other than '/', followed by the shortest possible sequence of any characters, and ends in a '>' character. In short, text that matches this pattern looks like an opening XML <tag>.


3. After you type the regular expression, choose the type of tag you want to see in the place of the text. You can choose to use an opening tag, a closing tag, or an empty tag. These correspond to the types of tags commonly used in XML markup.
Note: If you check the Required check box, memoQ will indicate a tagging error instead of a simple warning if the corresponding tag is not copied to the target text.
4. In the Display text text box, you can specify what label memoQ should write inside the tag. This is called a replacement rule, and you also use these in auto-translation rules. You can write any text here, but you can also use the pre-defined $0 expression, which stands for the whole matched text.
Note: If the regular expression contains groups, you can use $1, $2 etc. to refer to the first, second etc. group in the replacement rule. You can choose from available options if you click the Pattern... link next to the Display text box.
5. After you have filled in the Display text field, click Add to add the rule to the list. To edit an existing rule, click the rule in the list, make your changes, and click Change. To remove a rule from the list, click the rule, and click Delete.
6. Click Run tagger now to close the dialog, and match your patterns against the text in the active document. Click Cancel to close the dialog without making changes.
Note: The Run tagger now button does not save the rules. If you want to re-use the rule set for later tagging, click the Save icon at the top before leaving the Tag current document dialog.
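A rule set of this kind can be modelled in Python as a list of rules, each with a regular expression, a tag type and a display text. The sketch below is an illustration only: the Rule and Tag classes, the closing-tag pattern '</.*?>' and the label 'product' are assumptions, and Python writes \g<0> where memoQ uses $0.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    pattern: str       # regular expression to turn into a tag
    tag_type: str      # "open", "close" or "empty"
    display_text: str  # label inside the tag; \g<0> is the whole match

@dataclass
class Tag:
    tag_type: str
    label: str

RULES = [
    Rule(r"<[^/].*?>", "open", r"\g<0>"),   # pattern taken from the text above
    Rule(r"</.*?>", "close", r"\g<0>"),     # assumed pattern for closing tags
    Rule(r"memoQ", "empty", "product"),     # fixed label instead of the match
]

def run_tagger(text: str, rules=RULES) -> list:
    """Split text into plain strings and Tag objects, applying rules in order."""
    parts = [text]
    for rule in rules:
        next_parts = []
        for part in parts:
            if isinstance(part, Tag):
                next_parts.append(part)  # text that is already tagged stays as is
                continue
            pos = 0
            for match in re.finditer(rule.pattern, part):
                if match.start() > pos:
                    next_parts.append(part[pos:match.start()])
                next_parts.append(Tag(rule.tag_type, match.expand(rule.display_text)))
                pos = match.end()
            if pos < len(part):
                next_parts.append(part[pos:])
        parts = next_parts
    return parts

print(run_tagger("Open <b>memoQ</b> now"))
```

Keeping tagged parts as separate objects prevents a later rule from matching inside a label that an earlier rule created.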
