Contents
1 Basics on regular expressions
1.1 Introduction to regular expressions
1.2 Special characters in regular expressions
2 memoQ and regular expressions
2.1 Auto-translation rules
2.1.1 Using Auto-translatables in the QA check
2.2 Segmentation rules
2.3 Cascading filters
2.4 The Regex text filter
2.5 The internal Regex Tagger

This tutorial covers the regular expression functionality of memoQ 6.2. It contains text items from the English user interface of the program. These items are under constant verification and are subject to change without prior notification.
Regular expressions are patterns that describe text strings and are used to find matching character combinations in text.
There are countless ways to write a regular expression, depending on what you want it to match.
When you want to literally match one of the characters used as commands, put a backslash in front of it:
\\ a backslash
\* an asterisk
\[ a left bracket
\( a left parenthesis
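As a small illustration of escaping, the sketch below uses Python's `re` module. This is an assumption made for illustration only: memoQ itself uses the .NET regex engine, although these four characters are escaped in the same way there. The sample strings are invented.

```python
import re

# Each pattern escapes one command character so that it is matched literally.
samples = {
    r"\\": r"C:\temp",      # a backslash
    r"\*": "5 * 3",         # an asterisk
    r"\[": "item [1]",      # a left bracket
    r"\(": "see (above)",   # a left parenthesis
}

for pattern, text in samples.items():
    match = re.search(pattern, text)
    print(pattern, "found at index", match.start())

# re.escape builds the escaped form of a literal string automatically.
escaped = re.escape("(a*b)")
print(escaped)
```

Without the backslash, a pattern such as `(` on its own would not even compile, because the engine would read it as the start of a group.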
1.2.1 Example
How would you describe a date given in its numeric form, such as 31/01/2012?
31/01/2012 = \d{1,2}/\d{1,2}/\d{4}
Or (if you are sure day and month will always be marked using 2 digits):
\d{2}/\d{2}/\d{4}
You can break down the regular expression into groups:
31/01/2012 (DD/MM/YYYY) = (\d{2})/(\d{2})/(\d{4})
DD: group 1, $1
MM: group 2, $2
YYYY: group 3, $3
You can also transform the result when the numeric date has to be written in the form MM/DD/YYYY. To do so, exchange the groups in the replacement:
MM/DD/YYYY
MM: group 2, $2
DD: group 1, $1
YYYY: group 3, $3
The replacement is therefore $2/$1/$3.
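Note: A group is whatever is enclosed in parentheses, such as (\d{2}). Groups are numbered from left to right and can be referred to as $1, $2 and so on. The dollar sign followed by a number stands for the text matched by that group.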
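The group exchange described above can be tried out with a short script. The sketch below uses Python's `re` module as a stand-in (memoQ uses the .NET engine), so the replacement is written `\2/\1/\3` instead of `$2/$1/$3`. The function name and the sample sentence are invented for illustration.

```python
import re

# Three groups: day, month, year (DD/MM/YYYY).
date_pattern = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

def swap_day_and_month(text: str) -> str:
    # \2/\1/\3 is Python's spelling of the replacement $2/$1/$3.
    return date_pattern.sub(r"\2/\1/\3", text)

print(swap_day_and_month("Delivery on 31/01/2012."))  # Delivery on 01/31/2012.
```

Only the order of the groups changes. The pattern itself stays the same, which is why grouping is useful when source and target use different date conventions.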
In memoQ, you can adjust light resources (auto-translation rules, segmentation rules) to enhance the default regular expressions or to create new ones. If you are an advanced user of regular expressions, note that memoQ uses the standard .NET implementation of regex. You can also use the cascading filter and Regex text filter functionality in memoQ to improve the file import, and you can use the Regex Tagger to tag code after a document has been imported. The following sections describe how you can edit, create and use regular expressions in memoQ. Please also see the Kilgray webinar on regular expressions on the Kilgray website > Resource Center: http://kilgray.com/webinars/regex-masses-english-011749
memoQ offers default regular expressions for each language. You can create a new rule set by clicking the Create new command link below the Auto-translation rules list, or you can clone an existing one. To clone a default set of auto-translatables, click the Clone command link below the Auto-translation rules list. You can also import auto-translation rules created by another memoQ user: click the Import new command link below the Auto-translation rules list, browse to the MQRES file which contains the regular expressions, and import the file.
Note: memoQ enables you to exchange light resources such as auto-translation rules in the MQRES file format, a proprietary memoQ format for importing and exporting resources.
Select an auto-translation rule set, and click the Edit link. The Edit auto-translation rule set dialog appears:
1. Enter your rule under Auto-translation rules.
2. Click the Add button to add your auto-translation rule. To change an existing rule, edit it and click the Change button. To delete a rule, click the Delete button.
3. When you enter a rule, you also need to enter, under Replace order rules, the expression that is to replace the matched text. In the example above:
Figuren is replaced by Figs.
([\d]{1,4}) is the first number to be replaced; it is made of 1 to 4 digits and corresponds to group $2 in the Replace order rules
(bis) is replaced by to
([\d]{1,4}) is the second number to be replaced; it is also made of 1 to 4 digits and corresponds to group $4 in the Replace order rules
4. Click OK to close the Edit auto-translation rule set dialog.
Note: If you want to automatically replace some words or expressions by their translated equivalents, you can enter custom translation pairs in the Translation pairs. Translation pairs can, for example, be used for translating names of months, days, names of measurement units etc.
Further information on auto-translation rule sets can be found in the memoQ Help: Functions and Settings > Edit resources > Light resources > Edit auto-translation rules. Further information on regular expressions can be found in the memoQ Help as well: Functions and Settings > Regular Expressions and Tagging.
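The example rule from step 3 can be approximated outside memoQ. The screenshot with the exact rule is not reproduced here, so the pattern below is a hypothetical reconstruction in which Figuren, the first number, bis and the second number are groups 1 to 4. The sketch uses Python's `re` module, where the .NET-style replacement `Figs. $2 to $4` is written `Figs. \2 to \4`.

```python
import re

# Hypothetical reconstruction of the rule: four groups, of which
# group 2 and group 4 are the numbers (1 to 4 digits each).
rule = re.compile(r"(Figuren) (\d{1,4}) (bis) (\d{1,4})")

# Python's spelling of the .NET-style replacement "Figs. $2 to $4".
replacement = r"Figs. \2 to \4"

def auto_translate(source: str) -> str:
    # Replace every match of the rule with its target-language rendering.
    return rule.sub(replacement, source)

print(auto_translate("Figuren 12 bis 15"))  # Figs. 12 to 15
```

The numbers are carried over unchanged through their groups, while the fixed words are replaced by literal text in the replacement.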
In the Segments and terms tab, select the Check auto-translatables check box. When you run the QA, memoQ now checks whether the auto-translation rules have been applied correctly in the target text.
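How memoQ performs this check internally is not described here. As a rough idea of what such a check involves, the sketch below computes the rendering that a rule would produce for each match in the source and reports those that are missing from the target. The rule, the function name and the sample segments are all hypothetical, and Python's `re` module stands in for the .NET engine.

```python
import re
from typing import List

# Hypothetical rule, reusing the "Figuren ... bis ..." example.
rule = re.compile(r"(Figuren) (\d{1,4}) (bis) (\d{1,4})")
replacement = r"Figs. \2 to \4"

def missing_auto_translations(source: str, target: str) -> List[str]:
    """Return the expected renderings that do not appear in the target."""
    expected = [m.expand(replacement) for m in rule.finditer(source)]
    return [item for item in expected if item not in target]

# Correct target: nothing is reported.
print(missing_auto_translations("Siehe Figuren 3 bis 7.", "See Figs. 3 to 7."))
# Wrong number in the target: the expected rendering is reported.
print(missing_auto_translations("Siehe Figuren 3 bis 7.", "See Figs. 3 to 8."))
```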
Note: You can also import or export SRX files. SRX (Segmentation Rules eXchange) is a format for exchanging segmentation rules between different tools. This enables you to use the same segmentation rules in memoQ as well as in other tools.
In the Segmentation tab, there are two lists: Rules and Exceptions. You can add, change, or delete segmentation rules. Click the Preview button to display a preview of the segmentation rule set which you want to apply.
In the Custom lists tab, you can add, change, or delete custom lists. Select a custom list to see its elements under List items:
Select, for instance, #abbr_long# in the general abbreviation list. You can see the abbreviations of this list under List items. Items in #abbr_long# do not have to be preceded by a whitespace. Items in #abbr_short#, in contrast, need to be preceded by a whitespace. For example, if you have eg. as an abbreviation, you should include it in #abbr_short#, not in #abbr_long#. If you put it in #abbr_long# instead, memoQ will not start a new sentence after "beg.", because that word also ends in "eg.". Click the Preview button to display a preview of the segmentation rule you created or changed.
IMPORTANT: Segmentation rules must be selected before you import documents into a memoQ project. To assign segmentation rules, create a project, but do not import documents. You can click Finish in the New project wizard after the first dialog. In Project home, go to Settings > Segmentation rules, and check the check box of the segmentation rules you want to apply for the document import. Then start the document import.
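The difference between the two abbreviation lists described above can be illustrated with a toy sentence splitter. This is a minimal sketch, not memoQ's segmentation engine: the list contents, the function names and the single break rule (a period followed by whitespace) are assumptions made for illustration, written with Python's `re` module.

```python
import re
from typing import List

ABBR_LONG = ["approx.", "etc."]  # hypothetical items: may follow any character
ABBR_SHORT = ["eg."]             # hypothetical items: must follow whitespace

def ends_with_abbreviation(text: str) -> bool:
    """True if text ends in a listed abbreviation."""
    if any(text.endswith(abbr) for abbr in ABBR_LONG):
        return True
    # Short items count only at the start of the text or after whitespace.
    return any(re.search(r"(?:^|\s)" + re.escape(abbr) + r"$", text)
               for abbr in ABBR_SHORT)

def split_sentences(text: str) -> List[str]:
    sentences, start = [], 0
    # Candidate break: a period followed by whitespace.
    for match in re.finditer(r"\.\s+", text):
        period_end = match.start() + 1
        if ends_with_abbreviation(text[start:period_end]):
            continue  # exception: do not break after an abbreviation
        sentences.append(text[start:period_end])
        start = match.end()
    if start < len(text):
        sentences.append(text[start:])
    return sentences

print(split_sentences("They beg. We use eg. this one. Done."))
```

With "eg." in the short list, the text still breaks after "They beg." but not after "We use eg.". Moving "eg." to the long list would suppress the break after "beg." as well, which is the problem the tutorial warns about.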
4. In the Filter configuration drop-down list, choose one of the filter configurations available for the selected document type or filter. When you configure a filter, you can save the settings as a filter configuration resource, which can be re-used or shared. To load a saved configuration, click the folder icon to display the Load filter configuration dialog. A cascading filter selection could look like the following:
5. Click OK to return to the Document import options, and click OK to import the document using the specified cascading filter.
Option 1:
1. In the Translations pane of Project home, click the Import with options... command link below the document list.
2. In the Open dialog, select All files from the Files of type drop-down list. Click Open to proceed: the Document import settings dialog appears.
3. From the Filter drop-down list, choose Regex text filter.
Option 2:
1. From the Tools menu, choose Resource console > Filter configurations.
2. Click the Create new command link, and choose Regex text filter from the Filters drop-down list.
3. Enter a name for the filter, and click OK. The filter is now listed in the Filters list.
4. Select the filter you just created, and click the Edit command link.
Both options open the Edit filter configuration dialog:
Further information on how to use the different tabs to configure this filter can be found in the memoQ Help.
You can set up multiple rules in a single regular expression filter configuration. These are listed in the top box of the Rules section. To add a pattern to turn into a tag:
1. First type a regular expression in the Regular expression text box. This can be a simple expression. For example, if you want to replace the word 'memoQ' with an empty inline tag, simply type 'memoQ' in the Regular expression text box.
2. You can also enter more complex expressions where a single pattern can represent several different character sequences. If you click the Pattern... link next to the text box, you get a menu of the most common available commands. For example, the regular expression '<[^/].*?>' matches text that starts with the '<' character, followed by one character that is not '/', then the shortest possible sequence of any characters, and ends in a '>' character. In short, text that matches this pattern looks like an XML opening tag such as <tag>, but not like a closing tag such as </tag>.
3. After you type the regular expression, choose the type of tag you want to see in the place of the text. You can choose to use an opening tag, a closing tag, or an empty tag. These correspond to the types of tags commonly used in XML markup.
Note: If you check the Required check box, memoQ will indicate a tagging error instead of a simple warning if the corresponding tag is not copied to the target text.
4. In the Display text text box, you can specify what label memoQ should write inside the tag. This is called a replacement rule, and you also use these in auto-translation rules. You can write any text here, but you can also use the pre-defined $0 expression, which stands for the whole matched text.
Note: If the regular expression contains groups, you can use $1, $2 etc. to refer to the first, second etc. group in the replacement rule. You can choose from the available options if you click the Pattern... link next to the Display text box.
5. After you have filled in the Display text box, click Add to add the rule to the list. To edit an existing rule, click the rule in the list, and click Change. To remove a rule from the list, click the rule, and click Delete.
6. Click Run tagger now to close the dialog and match your patterns against the text in the active document. Click Cancel to close the dialog without making changes.
Note: The Run tagger now button does not save the rules. If you want to re-use the rule set for later tagging, click the Save icon at the top before leaving the Tag current document dialog.
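To see which pieces of text the example pattern '<[^/].*?>' actually picks up, it can be tested on a sample string. The sketch below uses Python's `re` module as a stand-in for the .NET engine. The curly braces around each match are only an invented way of showing where an inline tag with the display text $0 (group 0 in Python) would appear; they are not memoQ's tag notation.

```python
import re

# Pattern from the text: "<", one character other than "/", then the
# shortest run of characters up to the next ">".
tag_pattern = re.compile(r"<[^/].*?>")

def tag_text(text: str) -> str:
    # Wrap each match in braces to stand in for an inline tag whose
    # display text is the whole match ($0 in memoQ, group 0 in Python).
    return tag_pattern.sub(lambda m: "{" + m.group(0) + "}", text)

sample = "Click <b>here</b> or <br/> now"
print(tag_pattern.findall(sample))  # ['<b>', '<br/>']
print(tag_text(sample))             # Click {<b>}here</b> or {<br/>} now
```

The closing tag </b> is left alone because its second character is '/', while <br/> is matched: the pattern only excludes a slash directly after the '<'. A separate rule would be needed to tag closing tags.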