EmacsWiki Regular Expression

RegularExpression
A regular expression (abbreviated regexp, or sometimes just re) is a search string with wildcards and more. It is a pattern that is matched against the text to be searched. See Manual:Regexps.

Examples:

    "alex"

A plain string is a regular expression that matches the string exactly. The above regular expression matches alex.

    "alexa?"

Some characters have special meanings in a regular expression. The question mark, for example, says that the preceding expression (the character a in this case) may or may not be present. The above regular expression matches alex or alexa.

Regexps are important to Emacs users in many ways, including these:

- We search with them interactively. Try C-M-s (command isearch-forward-regexp).
- Emacs code uses them to parse text. We use regexps all the time, without knowing it, when we use Emacs.
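The two examples above can be tried from Lisp as well. The following is a minimal sketch, not part of the original page; the helper name my-regexp-demo-matches-p is hypothetical, and it relies only on the built-in function string-match-p.

```elisp
;; Hypothetical helper (not from the original page).
;; `string-match-p' returns the index of the first match, or nil.
(defun my-regexp-demo-matches-p (regexp string)
  "Return t if REGEXP matches somewhere in STRING, case-sensitively."
  (let ((case-fold-search nil))
    (and (string-match-p regexp string) t)))

(my-regexp-demo-matches-p "alexa?" "alex")   ; t: the final a is optional
(my-regexp-demo-matches-p "alexa?" "alexa")  ; t
(my-regexp-demo-matches-p "alexa?" "ale")    ; nil: the x is required
```

Evaluate each form with C-x C-e to see its value in the echo area.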
Contents
1. Regular Expression Syntax
2. Idiosyncrasies of Emacs Regular Expressions
3. Some Regexp Examples
4. Some Emacs Commands that Use Regular Expressions
5. Tools for Constructing Regexps
6. Study and Practice
   1. Use Icicles to Learn about Regexps
Regular Expression Syntax

Here is the syntax used by Emacs for regular expressions. Any character matches itself, except for the list below.
http://www.emacswiki.org/emacs/RegularExpression (1 of 7) [1/26/2012 11:56:13 AM]
The following characters are special: . * + ? ^ $ \ [
Between brackets [], the following are special: ] - ^
Many characters are special when they follow a backslash (see below).

    .          any character (but newline)
    *          previous character or group, repeated 0 or more times
    +          previous character or group, repeated 1 or more times
    ?          previous character or group, repeated 0 or 1 time
    ^          start of line
    $          end of line
    [...]      any character between brackets
    [^..]      any character not in the brackets
    [a-z]      any character between a and z
    \          prevents interpretation of following special char
    \|         or
    \w         word constituent
    \b         word boundary
    \sc        character with c syntax (e.g. \s- for whitespace char)
    \( \)      start\end of group
    \< \>      start\end of word
    \` \'      start\end of buffer
    \1         string matched by the first group
    \n         string matched by the nth group
    \{3\}      previous character or group, repeated 3 times
    \{3,\}     previous character or group, repeated 3 or more times
    \{3,6\}    previous character or group, repeated 3 to 6 times
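To see groups, alternation, intervals, and back-references from the table working together, here is a small sketch. The helper name my-regexp-demo-first-group is hypothetical; string-match and match-string are standard Emacs Lisp functions. Note that backslashes are doubled because the regexps are written as Lisp strings.

```elisp
;; Hypothetical helper (not from the original page).
(defun my-regexp-demo-first-group (regexp string)
  "Return the text of group 1 when REGEXP matches STRING, else nil."
  (save-match-data
    (when (string-match regexp string)
      (match-string 1 string))))

;; \( \) makes a group, \| is "or", \{2\} repeats exactly twice.
(my-regexp-demo-first-group "\\(ab\\|cd\\)[0-9]\\{2\\}" "xx cd42")  ; "cd"
;; \1 matches again whatever group 1 matched.
(my-regexp-demo-first-group "\\(\\w+\\) +\\1\\>" "the the end")     ; "the"
;; No match at all gives nil.
(my-regexp-demo-first-group "\\(ab\\|cd\\)[0-9]\\{2\\}" "cd4")      ; nil
```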
*?, +?, and ?? are non-greedy versions of *, +, and ? (see NonGreedyRegexp). Also, \W, \B, and \Sc are the negations of \w, \b, and \sc: each matches wherever its lowercase counterpart does not.

Characters are organized by category. Use C-u C-x = to display the category of the character under the cursor.

    \ca        ascii character
    \Ca        non-ascii character (newline included)
    \cl        latin character
    \cg        greek character
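The difference between a greedy and a non-greedy operator is easiest to see side by side. This is an illustrative sketch; the helper name my-regexp-demo-match is hypothetical.

```elisp
;; Hypothetical helper (not from the original page).
(defun my-regexp-demo-match (regexp string)
  "Return the text matched by REGEXP in STRING, or nil if there is no match."
  (save-match-data
    (when (string-match regexp string)
      (match-string 0 string))))

;; Greedy: .* takes as much as possible, up to the last >.
(my-regexp-demo-match "<.*>" "<b>bold</b>")    ; "<b>bold</b>"
;; Non-greedy: .*? takes as little as possible, up to the first >.
(my-regexp-demo-match "<.*?>" "<b>bold</b>")   ; "<b>"
```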
Here are some character classes that can be used between brackets, [].

    [:digit:]    a digit, same as [0-9]
    [:upper:]    a letter in uppercase
    [:space:]    a whitespace character, as defined by the syntax table
    [:xdigit:]   a hexadecimal digit
    [:cntrl:]    a control character
    [:ascii:]    an ascii character

Syntax classes:

    \s-    whitespace character
    \sw    word constituent
    \s_    symbol constituent
    \s.    punctuation character
    \s(    open delimiter character
    \s)    close delimiter character
    \s"    string quote character
    \s\    escape character
    \s/    character quote character
    \s$    paired delimiter
    \s'    expression prefix
    \s<    comment starter
    \s>    comment ender
    \s!    generic comment delimiter
    \s|    generic string delimiter
You can see the current syntax table by typing C-h s. The syntax table depends on the current mode. As expected, the letters a..z are listed as word constituents in text-mode. Other word constituents in this mode include A..Z, 0..9, $, %, currency units, accented letters, and kanji. See EmacsSyntaxTable for details.
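Because syntax classes depend on the syntax table in effect, a quick experiment is best run against a known table. The sketch below assumes the standard syntax table (via with-syntax-table and standard-syntax-table, both built in); the helper name my-regexp-demo-count-matches is hypothetical.

```elisp
;; Hypothetical helper (not from the original page).
(defun my-regexp-demo-count-matches (regexp string)
  "Count non-overlapping matches of REGEXP in STRING.
The standard syntax table is used, so the result does not depend
on the major mode of the current buffer."
  (with-syntax-table (standard-syntax-table)
    (let ((count 0)
          (start 0))
      (save-match-data
        (while (and (<= start (length string))
                    (string-match regexp string start))
          (setq count (1+ count))
          ;; Step past an empty match so the loop always advances.
          (setq start (if (= (match-end 0) (match-beginning 0))
                          (1+ (match-end 0))
                        (match-end 0)))))
      count)))

(my-regexp-demo-count-matches "[[:digit:]]+" "a1 b22 c333")  ; 3 runs of digits
(my-regexp-demo-count-matches "\\s-+" "a1 b22 c333")         ; 2 runs of whitespace
(my-regexp-demo-count-matches "\\sw+" "a1 b22 c333")         ; 3 words
```

In a buffer whose mode changes the syntax of some characters, the last two counts could differ, which is the point made above about the syntax table depending on the mode.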
Idiosyncrasies of Emacs Regular Expressions
In an interactive search involving a regexp, a space character stands for one or more whitespace characters (tabs are whitespace characters). Enter C-q SPC to get a single space character, or put the following in your InitFile to override this behaviour:

    (setq search-whitespace-regexp nil)

[^ ] matches all characters not in the list, even newlines. Put a newline in the list if you want it not to be matched. You can enter a newline character using C-o, C-q C-j, or C-q 012 RET. Note also that \s- matches space, tab, newline and carriage return. This can be handy in a [^ ] construct.

By default, replacement commands perform case conversion. This means that both upper and lower case match in the regexp, whereas the case of the replacement string is chosen according to the case of the matched text. Try, for example, replacing john by harry in the text below. Case conversion can be toggled on and off by typing M-c in the minibuffer during search. You can also set the variable case-fold-search to nil to disable case conversion; see CaseFoldSearch for more details. With case-fold-search set to nil, only the last line of the following example would be replaced.

    John   =>   Harry
    JOHN   =>   HARRY
    john   =>   harry
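The case-conversion behaviour described above can be reproduced from Lisp. This is a sketch under the assumption that replace-match is called with its default arguments (so it adapts the case of the replacement); the helper name my-regexp-demo-replace-all is hypothetical.

```elisp
;; Hypothetical helper (not from the original page).
(defun my-regexp-demo-replace-all (regexp replacement text fold)
  "Replace every match of REGEXP in TEXT by REPLACEMENT; return the new text.
Ignore case while matching when FOLD is non-nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search fold))
      (while (re-search-forward regexp nil t)
        ;; With no FIXEDCASE argument, `replace-match' adapts the case
        ;; of the replacement to the case of the matched text.
        (replace-match replacement)))
    (buffer-string)))

(my-regexp-demo-replace-all "john" "harry" "John JOHN john" t)
;; "Harry HARRY harry"
(my-regexp-demo-replace-all "john" "harry" "John JOHN john" nil)
;; "John JOHN harry"
```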
Backslashes must be doubled when used in Lisp code. Regular expressions are often specified using strings in EmacsLisp. Some abbreviations are available: \n for newline, \t for tab, \b for backspace, \u3501 for the character with unicode value 3501, and so on. A literal backslash must be entered as \\. Here are two ways to replace the decimal point by a comma (e.g. 1.5 -> 1,5): first by an interactive command, second by executing Lisp code (type C-x C-e after the expression to get it executed).

    M-x replace-regexp RET \([0-9]+\)\. RET \1, RET

    (while (re-search-forward "\\([0-9]+\\)\\." nil t)
      (replace-match "\\1,"))
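A short sketch may help show why the backslashes are doubled and how the same replacement looks on a string instead of a buffer. The helper name my-regexp-demo-decimal-comma is hypothetical; replace-regexp-in-string and regexp-quote are standard functions.

```elisp
;; The Lisp string "\\." contains two characters: a backslash and a period.
;; That two-character text is what the regexp engine receives.
(length "\\.")          ; 2

;; `regexp-quote' adds the backslashes needed to match a string literally.
(regexp-quote "1.5")    ; the 4-character string 1\.5

;; Hypothetical helper (not from the original page).
(defun my-regexp-demo-decimal-comma (string)
  "Return STRING with the decimal point after each digit run turned into a comma."
  (replace-regexp-in-string "\\([0-9]+\\)\\." "\\1," string))

(my-regexp-demo-decimal-comma "1.5 and 20.25")   ; "1,5 and 20,25"
```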
Some Regexp Examples

    [-+[:digit:]]                      digit or + or - sign
    \(\+\|-\)?[0-9]+\(\.[0-9]+\)?      decimal number (-2 or 1.5 but not .2 or 1.)
    \(\w+\) +\1\>                      two consecutive, identical words
    \<[[:upper:]]\w*                   word starting with an uppercase letter
     +$                                trailing whitespaces (note the starting SPC)
    \w\{20,\}                          word with 20 letters or more
    \w+phony\>                         word ending by phony
    \(19\|20\)[0-9]\{2\}               year 1900-2099
    ^.\{6,\}                           at least 6 symbols
    ^[a-zA-Z0-9_]\{3,16\}$             decent string for a user name
    <tag[^> C-q C-j ]*>\(.*?\)</tag>   html tag
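Two of the examples above can be turned into simple validators. This is a sketch only: the names are hypothetical, and the decimal regexp is wrapped in \` and \' (an addition not in the table) so that the whole string must match rather than just a part of it.

```elisp
;; Hypothetical names (not from the original page).
(defconst my-regexp-demo-decimal-regexp
  "\\`\\(\\+\\|-\\)?[0-9]+\\(\\.[0-9]+\\)?\\'"
  "The decimal-number example, anchored to the whole string.")

(defun my-regexp-demo-decimal-p (string)
  "Return t if STRING as a whole is a decimal number such as -2 or 1.5."
  (and (string-match-p my-regexp-demo-decimal-regexp string) t))

(defun my-regexp-demo-user-name-p (string)
  "Return t if STRING is 3 to 16 letters, digits or underscores."
  (and (string-match-p "^[a-zA-Z0-9_]\\{3,16\\}$" string) t))

(my-regexp-demo-decimal-p "-2")             ; t
(my-regexp-demo-decimal-p "1.5")            ; t
(my-regexp-demo-decimal-p ".2")             ; nil
(my-regexp-demo-decimal-p "1.")             ; nil
(my-regexp-demo-user-name-p "emacs_user")   ; t
(my-regexp-demo-user-name-p "ab")           ; nil
```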
Some Emacs Commands that Use Regular Expressions

    C-M-s                    incremental forward search matching regexp
    C-M-r                    incremental backward search matching regexp
    replace-regexp           replace string matching regexp
    query-replace-regexp     same, but query before each replacement
    align-regexp             align, using strings matching regexp as delimiters
    highlight-regexp         highlight strings matching regexp
    occur                    show lines containing a match
    multi-occur              show lines in all buffers containing a match
    how-many                 count the number of strings matching regexp
    keep-lines               delete all lines except those containing matches
    flush-lines              delete lines containing matches
    grep                     call unix grep command and put result in a buffer
    lgrep                    user-friendly interface to the grep command
    rgrep                    recursive grep
    dired-do-copy-regexp     copy files with names matching regexp
    dired-do-rename-regexp   rename files matching regexp
    find-grep-dired          display files containing matches for regexp with Dired
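Some of these commands can also be called as ordinary functions from Lisp. The sketch below uses flush-lines and how-many on a temporary buffer; the two wrapper names are hypothetical. When called this way, both functions work from point to the end of the buffer, so point is moved to the beginning first.

```elisp
;; Hypothetical helpers (not from the original page).
(defun my-regexp-demo-drop-comment-lines (text)
  "Return TEXT without the lines that begin with two semicolons."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (flush-lines "^;;")
    (buffer-string)))

(defun my-regexp-demo-how-many (regexp text)
  "Return the number of matches for REGEXP in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (how-many regexp)))

(my-regexp-demo-drop-comment-lines ";; a\ncode\n;; b\nmore\n")  ; "code\nmore\n"
(my-regexp-demo-how-many "[0-9]+" "a1 b22 c333")                ; 3
```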
Note that list-matching-lines is an alias for occur and delete-matching-lines is an alias for flush-lines. The command highlight-regexp is bound to C-x w h. Also, query-replace-regexp is bound by default to C-M-%, although some people prefer using an alias, like M-x qrr. Put the following in your InitFile to create such an alias.

    (defalias 'qrr 'query-replace-regexp)

See also: IncrementalSearch, ReplaceRegexp, AlignCommands, OccurBuffer, DiredPower
Tools for Constructing Regexps
Command re-builder helps you construct a regular expression. You enter the regexp in a small window at the bottom of the frame. The first 200 matches in the buffer are highlighted, so you can see whether the regexp does what you want. Use Lisp syntax, which means doubling backslashes and using \\\\ to match a literal backslash.

Macro rx provides a user-friendly syntax for regular expressions. For example, (rx (one-or-more blank) line-end) returns the regexp string "\\(?:[[:blank:]]+$\\)" (newer Emacs releases may return an equivalent string without the enclosing shy group). See rx.

SymbolicRegexp is similar in aim to rx.
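As a small illustration of rx in use, the sketch below builds the trailing-blanks regexp with rx and applies it to a string. The names are hypothetical; the exact string that rx produces can vary between Emacs versions, so the example checks only the effect of the regexp, not its text.

```elisp
(require 'rx)

;; Hypothetical names (not from the original page).
(defconst my-regexp-demo-trailing-blanks
  (rx (one-or-more blank) line-end)
  "Regexp, built by `rx', matching spaces and tabs at the end of a line.")

(defun my-regexp-demo-strip-trailing (string)
  "Return STRING with trailing spaces and tabs removed from every line."
  (replace-regexp-in-string my-regexp-demo-trailing-blanks "" string))

(my-regexp-demo-strip-trailing "foo  \nbar\t\nbaz")   ; "foo\nbar\nbaz"
```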
Study and Practice
Read about regexps in the Elisp manual (see also RegexpReferences), and study EmacsLisp code that uses regexps. Regexp searching (C-M-s) is a great way to learn about regexps (see Regexp Searches). Change your regexp on the fly and see immediately what difference the change makes. Some examples of use (see also ReplaceRegexp and EmacsCrashRegexp):
- Search for trailing whitespace: C-M-s SPC+$
- Highlight all trailing whitespace: M-x highlight-regexp RET SPC+$ RET RET
- Delete trailing whitespace: M-x replace-regexp RET SPC+$ RET RET (same as M-x delete-trailing-whitespace)
- Search for open delimiters: C-M-s \s(
- Search for duplicated words (works across lines): C-M-s \(\<\w+\>\)\s-+\1
- Count number of words in buffer: M-x how-many RET \< RET
- Align words beginning with an uppercase letter followed by a lowercase letter: M-: (setq case-fold-search nil) RET then M-x align-regexp RET \<[[:upper:]][[:lower:]] RET
- Replace word foo by bar (won't replace fool by barl): M-x replace-regexp RET \<foo\> RET bar
- Keep only the first two words on each line: M-x replace-regexp RET ^\(\W*\w+\W+\w+\).* RET \1 RET
- Suppress lines beginning with ;;: M-x flush-lines RET ^;; RET
- Remove the text after the first ; on each line: M-x replace-regexp RET \([^;]*\);.* RET \1 RET
- Keep only lines that contain an email address: M-x keep-lines RET \w+\(\.\w+\)?@\(\w\|\.\)+ RET
- Keep only one instance of consecutive empty lines: M-x replace-regexp RET ^C-q C-j\{2,\} RET C-q C-j RET
- Keep words or letters in uppercase, one per line: M-x replace-regexp RET [^[:upper:]]+ RET C-o RET
- List lines beginning with Chapter or Section: M-x occur RET ^\(Chapter\|Section\) RET
- List lines with more than 80 characters: M-x occur RET ^.\{81,\} RET
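The duplicated-words search from the list above can also be scripted. The following is a sketch with a hypothetical function name; it adds a trailing \> to the regexp (a small variation on the interactive version) so that "the then" is not reported as a duplicate.

```elisp
;; Hypothetical helper (not from the original page).
(defun my-regexp-demo-duplicated-words (text)
  "Return the list of words in TEXT that are immediately repeated."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search nil)
          (found '()))
      ;; \s-+ also matches a newline here, so a word repeated
      ;; across a line break is found as well.
      (while (re-search-forward "\\(\\<\\w+\\>\\)\\s-+\\1\\>" nil t)
        (push (match-string 1) found))
      (nreverse found))))

(my-regexp-demo-duplicated-words "this is is a test\ntest of the code")
;; ("is" "test")
```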
Use Icicles to Learn about Regexps

Icicles provides these interactive ways to learn about regexps:

C-c ` (icicle-search) shows you regexp matches, as does C-M-s, but it can also show you (that is, highlight) regexp subgroup matches. Showing matched subgroups is very helpful for learning, and Icicles is unique in this. There are two ways that you can use this feature:

- You can search for a regexp, but limit the search context, used for further searching, to a particular subgroup match. For example, you can search for and highlight Lisp argument lists, by using a regexp subgroup that matches lists, placing that subgroup after defun: (defun [^(]*\(([^(]*)\), that is, defun, followed by non-`( character(s), followed by `(, possibly followed by non-`) character(s), followed by `).
- You can search for a regexp without limiting the search context to a subgroup match. In this case, Icicles highlights each subgroup match in a different color. An example of a complex regexp whose subgroups are each highlighted this way is (\([-a-z*]+\) *\((\(([-a-z]+ *\([^)]*\))\))\).*
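Outside Icicles, the text matched by a subgroup can be inspected from Lisp with match-string. The sketch below uses the argument-list regexp quoted above; the helper name is hypothetical. Note that, as written, the second [^(]* is greedy and stops only at the next open parenthesis, so the sample definition has a body that starts with one.

```elisp
;; Hypothetical helper (not from the original page).
(defun my-regexp-demo-defun-arglist (code)
  "Return the argument list text of the first defun in CODE, or nil."
  (save-match-data
    (when (string-match "(defun [^(]*\\(([^(]*)\\)" code)
      ;; Group 1 is the part between \\( and \\): the argument list.
      (match-string 1 code))))

(my-regexp-demo-defun-arglist "(defun add (a b) (+ a b))")   ; "(a b)"
(my-regexp-demo-defun-arglist "(setq x 1)")                  ; nil
```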
C-c ` also helps you learn by letting you use two simple regexps (search within a search) as an alternative to coming up with a single, complex regexp to do the same job. And, as with incremental search, you can change the second regexp on the fly to see immediately what difference the change makes. See Icicles - Search Commands, Overview.
S-TAB during minibuffer input shows you all matches for your input string, which can be a regexp. So, just type a regexp whenever the minibuffer is active for completion and hit S-TAB to see what the regexp matches. Try this with command input (M-x), buffer switching (C-x b), file visiting (C-x C-f), help (C-h f, C-h v), and so on. Almost any time you type input in the minibuffer, you can type a regexp and use S-TAB to see what it matches (and then choose one of the matching candidates to input, if you want).
Last edited 2011-04-24 19:54 UTC by dr jerry.
This work is licensed to you under version 2 of the GNU General Public License. Alternatively, you may choose to receive this work under any other license that grants the right to use, copy, modify, and/or distribute the work, as long as that license imposes the restriction that derivative works have to grant the same rights and impose the same restriction. For example, you may choose to receive this work under the GNU Free Documentation License, the CreativeCommons ShareAlike License, the XEmacs manual license, or similar licenses.
