Escolar Documentos
Profissional Documentos
Cultura Documentos
com/grep
Nanotech news
Searching with Grep Archives
This section describes the Grep option in BBEdit's Find
Newswire Today
command, which allows you to find and change text
that matches a set of conditions you specify.
Combined with the multi-file search and replace
Page tags
features, BBEdit's grep capabilities can make many
feed nano nanotech
editing tasks quicker and easier, whether you are
nanotechnology newswire
modifying Web pages, extracting data from a file, or
just rearranging a phone list.
top12000
What Is Grep or Pattern Searching? How to join this site?
Site members
Grep patterns offer a powerful way to make changes Recent changes
to your data that "plain text" searches simply cannot. Page Tags
For example, suppose you have a list of people's
Site Manager
names that you want to alphabetize. If the names
appear last name first, you can easily put these Other Wikidot sites
names in a BBEdit window and use the Sort tool. But if
the list is arranged first name first, a simple grep Add a new page
pattern can be used to put the names in the proper
order for sorting.
1 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Note:
BBEdit-Talk
http://www.barebones.com/support/lists/index.shtml
Tech Note
2 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
/pcre/>.
Writing Search Patterns
3 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Note
Wildcard Matches…
any character except a line break (that
.
is, a carriage return)
beginning of a line (unless used in a
^
character class)
end of line (unless used in a character
$
class)
4 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
^From: Patrick
^foo$
foobar
foo
fighting foo
WARNING
5 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Escape Matches
only at the beginning of the document
(as opposed to ^, which matches at the
\A
beginning of the document and also at
the beginning of each line)
any word boundary, defined as any
\b position between a \w character and a
\W character, in either order
\B any position that is not a word boundary
at the end of the document (as opposed
to $, which matches at the end of the
\z
document and also at the end of each
line)
at the end of the document, or before a
\Z trailing return at the end of the doc, if
there is one
6 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Character
Matches
Class
any one of the characters x, y,
[xyz]
z
[^xyz] any character except x, y, z
any character in the range a to
[a-z]
z
Character
Matches
Class
[aeiou] any vowel
[^aeiou] any character that is not a vowel
any character from a-z, A-Z, or
[a-zA-Z0-9]
0-9
any character that is neither a
[^aeiou0-9]
vowel nor a digit
7 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Character
Matches
Class
[]0-9] any digit or ]
[aeiou^] a vowel or ^
[-A-Z] a dash or A - Z
any character in the range from
[—A]
- to A
[aeiou-] any vowel or -
[aei\-ou] any vowel or -
Character
Matches
Class
\r line break (carriage return)
\n Unix line break (line feed)
\t tab
\f page break (form feed)
\a alarm (hex 07)
a named control character, like \cC
\cX
for Control-C
backspace (hex 08) (only in
\b
character classes)
8 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Special
Matches
Character
any whitespace character (space,
\s tab, carriage return, line feed, form
feed)
any non-whitespace character (any
\S
character not included by \s)
any word character (a-z, A-Z, 0-9,
\w
_, and some 8-bit characters)
any non-word character (all
\W characters not included by \w,
including carriage returns)
\d any digit (0-9)
any non-digit character (including
\D
carriage return)
9 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Pattern Matches
p* zero or more p's
p+ one or more p's
p? zero or one p's
match exactly COUNT p's, where
p{COUNT}
COUNT is an integer
match at least MIN p's, where MIN
p{MIN,}
is an integer
p{MIN, match at least MIN p's, but no more
MAX} than MAX
10 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
11 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
literal plus
sign, followed
by more
digits
four digits,
followed by a
tab or a
\d{4}[\t
space, 2152 B.C.
]B\.C\.
followed by
the string
B.C.
an optional
dollar sign,
followed by
1,234.56 or
one or more
$4,296,459.19
digits and
\$?[0-9,]+\.\d* or
commas,
$3,5,6,4.0000
followed by a
or 0. (oops!)
period, then
zero or more
digits
Creating Subpatterns
Subpatterns provide a means of organizing or
grouping complex grep patterns. This is primarily
important for two reasons: for limiting the scope of
the alternation operator (which otherwise creates an
alternation of everything to its left and right), and for
changing the matched text when performing
replacements. A subpattern consists of any simple or
complex pattern, enclosed in a pair of parentheses:
Pattern Matches
(p) the pattern p and remembers it
12 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
appears:
13 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
a word character, a
second word
character, followed
(\w)(\w)\2\1 abba
by the second one
again and the first
one again
Using Alternation
The alternation operator | allows you to match any of
several patterns at a given point. To use this operator,
place it between one or more patterns x|y to match
either x or y.
14 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
IMPORTANT
<.+>
15 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Non-Greedy Quantifiers
IMPORTANT
Quantifier Matches…
+? one or more
*? zero or more
?? zero or one
{COUNT}? match exactly COUNT times
{MIN,}? match at least MIN times
{MIN, match at least MIN times, but no
MAX}? more than MAX
<.+?>
16 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
<B>BBEdit!</B>
<B>.*</B>
but for the same reasons as before, this will match the
entire line of text. The solution is similar; we will use
the non-greedy *? quantifier:
<B>.*?</B>
Pattern Matches…
the entire matched pattern
&
[replacement only]
the pattern p and remembers it
(p)
[search only]
\1, \2, …, the nth subpattern in the entire
\99 search pattern
Note
17 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
ACME [A-Za-z]+
&?
18 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
19 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Case Transformations
Replace patterns can also change the case of the
original text when using subpattern replacements. The
syntax is similar to Perl's, specifically:
Modifier Effect
\u Make the next character uppercase
Make all following characters uppercase
\U until reaching another case specifier
(\u, \L, \l ) or \E
\l Make the next character lowercase
Make all following characters lowercase
\L until reaching another case specifier
(\u, \U, \l ) or \E
End case transformation opened by \U
\E
or \L
mumbo-jumbo
20 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
(\w+)(\W)(\w+)
\U\1\E\2\3 MUMBO-jumbo
\u\1\2\u\3 Mumbo-Jumbo
\lMUMBLE\2\3 mUMBLE-jumbo
\Ufred-\uwilma FRED-Wilma
Examples
The example patterns in this section describe some
common character classes and shortcuts used for
constructing grep patterns, and addresses some
common tasks that you might find useful in your work.
Matching Identifiers
One of the most common things you will use grep
patterns for is to find and modify identifiers, such as
variables in computer source code or object names in
HTML source documents. To match an arbitrary
identifier in most programming languages, you might
use this search pattern:
[a-z][a-zA-Z0-9]*
21 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
[a-z_][a-zA-Z0-9]*
[a-zA-Z][a-zA-Z0-9_]*
You can see that there are tabs or spaces between the
labels on the left and the data on the right, but you
have no way of knowing how many spaces or tabs
there will be on any given line. Here is a character
class that means "match one or more white space
characters."
[ \t]+
22 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
\1\("\2"\)
".*"
".+?"
23 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
^(\d+\.\d+\.?\d*\.?\d*)[ \t]+([a-z
]+)
\1 \2
\.?\d*
^From:[ \t]+(.*)\r.*\rSubject:[
\t]+(.*)
24 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
\2 \1
Junior X. Potter
Jill Safai
Dylan Schuyler Goode
Walter Wang
\2, \1
Potter, Junior X.
Safai, Jill
Goode, Dylan Schuyler
Wang, Walter
25 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Matching Nulls
The grep engine used in versions of BBEdit prior to 6.5
was unable to search text that contained null
characters (ASCII value zero), but this limitation has
since been removed. Here's one way to match a null:
\x{0}
Back-references
The following charts explain the rules BBEdit uses for
determining back-references.
In Search Patterns
Modifier Effect
A backslash followed by a zero is an
octal character reference. Up to two
further octal characters are read. Thus,
"\040" will match a space character,
\0
and "\07" will match the ASCII BEL
(\x07), but "\08" will match an ASCII
null followed by the digit 8 (because
octal characters only range from 0-7).
A backslash followed by a single
decimal digit from 1 to 9 is always a
\1-9
backreference to the Nth captured
subpattern.
A backslash followed by two decimal
digits, which taken together form the
\10-99
integer N (ranging from 10 to 99), is a
backreference to the Nth captured
26 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
In Character Classes
Modifier Effect
Inside a character class, a backslash
followed by up to three octal digits
generates a single byte character
reference from the least significant
\OCTAL eight bits of the value. Thus, the
character class "[\7]" will match a
single byte with octal value 7
(equivalent to "\x07"). "[\8]" will match
a literal "8" character.
** In Replacement Patterns
Modifier Effect**
If more than two decimal digits follow
the backslash, only the first two are
considered part of the backreference.
Thus, "\111" would be interpreted as the
11th backreference, followed by a literal
"1". You may use a leading zero; for
\NNN+
example, if in your replacement pattern
you want the first backreference
followed by a literal "1", you can use
"\011". (If you use "\11", you will get
the 11th backreference, even if it is
empty.)
27 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Class Meaning
alnum letters and digits
alpha letters
ascii character codes 0-127
cntrl control characters
digit decimal digits (same as \d)
graph printing characters, excluding spaces
lower lower case letters
print printing characters, including spaces
punct punctuation characters
space white space (same as \s)
upper upper case letters
word "word" characters (same as \w)
xdigit hexadecimal digits
28 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
[[:^digit:]]+
Non-Capturing Parentheses
As described in the preceding section "Creating
Subpatterns", bare parentheses cluster and capture
the subpatterns they contain. The portion of the
matching pattern contained within the first pair of
parentheses is available in the back-reference \1, the
second in \2, and so on.
((red|white) (king|queen))
\1 "red king"
\2 "red"
\3 "king"
(?:PATTERN)
29 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
(?:(red|white) (king|queen))
\1 "red"
\2 "king"
(?KEY…)
Extension Meaning
Cluster-only parentheses, no
(?:…)
capturing
Comment, discard all text
(?#…)
between the parentheses
(?imsx-imsx) Enable/disable pattern modifiers
(?imsx- Cluster-only parentheses with
imsx:…) modifiers
30 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Comments
The sequence (?# marks the start of a comment
which continues up to the next closing parenthesis.
Nested parentheses are not permitted. The characters
that make up a comment play no part in the pattern
matching at all.
Pattern Modifiers
The settings for case sensitivity, multi-line matching,
whether the dot character can match returns, and
"extended syntax" can be turned on and off within a
pattern by including sequences of letters between "(?"
and ")".
31 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
grep patterns
Example Effect
(?imsx) Turn all four options on
(?-imsx) Turn all four options off
(?i-msx) Turn "i" on, turn "m", "s", and "x" off
32 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
(?i)abc
a(?i)bc
ab(?i)c
abc(?i)
33 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Positional Assertions
Positional assertions "anchor" a pattern, without
actually matching any characters. Simple assertions
have already been described: those which are invoked
with the escape sequences \b, \B, \A, \Z, \z, ^ and $.
For example, the pattern \bfoo\b will only match the
string "foo" if it has word breaks on both sides, but
the \b's do not themselves match any characters; the
entire text matched by this pattern are the three
characters "f", "o", and "o".
\w+(?=;)
foo(?!bar)
(?!foo)bar
34 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
effect.
(?<!foo)bar
(?<=Martin|Lewis)
is permitted, but
(?<!dogs?|cats?)
(?<=ab(c|de))
(?<=abc|abde)
35 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
(?<=\d{3})(?<!999)foo
(?<=\d{3}…)(?<!999)foo
(?<=(?<!foo)bar)baz
(?<=\d{3}(?!999)…)foo
Conditional Subpatterns
36 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
if-then: (?(condition)yes-pattern)
if-then-else: (?(condition)yes-
pattern|no-pattern)
(red|blue)
(red|blue)( car)?
(red|blue car|blue)
((blue)|(red))(?(2) car)?
37 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
Once-Only Subpatterns
With both maximizing (greedy) and minimizing
(non-greedy) repetition, failure of what follows
normally causes the repeated item to be reevaluated
to see if a different number of repeats allows the rest
of the pattern to match. Sometimes it is useful to
prevent this, either to change the nature of the match,
or to cause it to fail earlier than it otherwise might,
when the author of the pattern knows there is no
point in carrying on.
38 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
(?>\d+)bar
abcd$
39 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
^.*abcd$
^(?>.*)(?<=abcd)
(\D+|<\d+>)*[!?]
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
40 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
((?>\D+)|<\d+>)*[!?]
Recursive Patterns
Consider the problem of matching a string in
parentheses, allowing for unlimited nested, balanced
parentheses. Without the use of recursion, the best
that can be done is to use a pattern that matches up
to some fixed depth of nesting. It is not possible to
handle an arbitrary nesting depth. Perl 5.6 has
provided an experimental facility that allows regular
expressions to recurse (among other things). It does
this by interpolating Perl code in the expression at run
time, and the code can refer to the expression itself.
Obviously, BBEdit's grep engine cannot support the
interpolation of Perl code. Instead, the special item
(?R) is provided for the specific case of recursion. The
following recursive pattern solves the parentheses
problem:
\(((?>[^()]+)|(?R))*\)
41 of 42 7/21/08 5:04 PM
BioNanoTech: Grep http://bionanotech.wikidot.com/grep
(aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
Hosted by Wikidot.com — get help | terms of service | privacy | report a bug | flag as objectionable
your free wiki now!
Unless stated otherwise Content of this page is licensed under Creative Commons
Attribution-Share Alike 2.5 License.
42 of 42 7/21/08 5:04 PM