
Regular expressions, often abbreviated as regex or regexp, describe a pattern or particular sequence of characters and are used to search for and replace strings. Most characters used in a regex will represent themselves, but there are special characters (known as metacharacters) that take on special meaning in the context of the UNIX utility/tool in which they are used. Since the topic of regular expressions is quite extensive, this brief overview will only focus on two of its frequently used positional or anchor metacharacters, the caret (^) and the dollar sign ($). The caret is used to match at the beginning of a line, and the dollar sign is used to match at the end of a line. Carets will logically be found on the left-hand side of a regex, and dollar signs on the right. To demonstrate the usage of these two positional metacharacters, the same data file used for last week's tip will be used again this week. The only change made was the insertion of 4 blank lines between each line of text. The file unixfile now contains the following data:

unix training
learn unix
unix class
learning unix
unix course

Using grep, all lines in unixfile that begin with "unix" will be extracted with the help of the caret metacharacter:

# grep '^unix' unixfile
unix training
unix class
unix course

Removing the caret from the beginning of the regex and adding a dollar sign to the end will cause grep to display lines ending with "unix":

# grep 'unix$' unixfile
learn unix
learning unix

These two metacharacters can also be combined in a single regex to identify/manipulate blank lines. The -c option for grep will be used with a regex containing both the caret and the dollar sign to count the number of blank lines in unixfile:

# grep -c '^$' unixfile
4
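The same regex also lets you delete blank lines instead of counting them; a minimal sketch with grep -v (invert match), using a small throwaway file rather than the original unixfile:

```shell
# Create a small file with blank lines between entries (illustrative data)
printf 'unix training\n\nlearn unix\n\nunix class\n' > demo.txt

# -v inverts the match: print every line that is NOT blank
grep -v '^$' demo.txt
```

This prints the three non-blank lines and drops the two blank ones.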


Eliminating Duplicate Lines - The uniq Command:

Sample data file:

$ cat unixfile
unix commands
shell script
command prompt
unix commands
unix system administration
shell script
unix commands

Since there are only a few lines in unixfile, duplicate lines can be quickly identified with a visual inspection and then removed if desired. This manual process is not a viable option for files containing hundreds or thousands of lines. The tool for a job of this magnitude is the uniq (unique) command:

uniq [option] [input-file] [output-file]

uniq can be used to display, count, or delete adjacent duplicate lines from a file or standard input (stdin). If duplicate lines in a file are not adjacent to one another, uniq will not treat them as duplicates:

$ uniq unixfile
unix commands
shell script
command prompt
unix commands
unix system administration
shell script
unix commands

For this reason, uniq is often combined with the sort command to group duplicate lines prior to performing an action on them:

$ sort unixfile | uniq
command prompt
shell script
unix commands
unix system administration

To compress this combination of commands, the -u option can be used with sort to produce the same results:

$ sort -u unixfile
command prompt
shell script
unix commands
unix system administration

Two useful options for the uniq command are -d and -c.
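A sketch of those two options on the same data, assuming the seven-line unixfile shown above (recreated here so the example is self-contained):

```shell
printf '%s\n' 'unix commands' 'shell script' 'command prompt' \
  'unix commands' 'unix system administration' 'shell script' \
  'unix commands' > unixfile

# -d prints one copy of each line that appears more than once
# (sort first so duplicates become adjacent)
sort unixfile | uniq -d

# -c prefixes every distinct line with its occurrence count
sort unixfile | uniq -c
```

Here uniq -d reports "shell script" and "unix commands"; uniq -c shows that "unix commands" occurs three times.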

cat suneel1.dat

123|suneel|ASE
100|NULL |ASE
897|kumar |ASET
100|gunda |AC
110|raja |POK
11 |anil |ASE

cut -c 5-10 suneel1.dat

suneel
NULL
kumar
gunda
raja
anil

cut -d \| -f 1,3 suneel1.dat

123|ASE
100|ASE
897|ASET
100|AC
110|POK
11 |ASE

cut -d \| -f 2,3 suneel1.dat > suneeel1.dat

cat suneeel1.dat

suneel|ASE
NULL |ASE
kumar |ASET
gunda |AC
raja |POK
anil |ASE

cat suneel2.dat

100 |nokia|ASE
101|samam|ASET
102|afdfeg|AC

cut -d \| -f1 suneel2.dat | sed 's/[^0-9]//g'

100
101
102

paste suneel1.dat suneeel1.dat

123|suneel|ASE  suneel|ASE
100|NULL |ASE   NULL |ASE
897|kumar |ASET kumar |ASET
100|gunda |AC   gunda |AC
110|raja |POK   raja |POK
11 |anil |ASE   anil |ASE

paste -d \| suneel1.dat suneeel1.dat

123|suneel|ASE|suneel|ASE
100|NULL |ASE|NULL |ASE
897|kumar |ASET|kumar |ASET
100|gunda |AC|gunda |AC
110|raja |POK|raja |POK
11 |anil |ASE|anil |ASE

cat suneeel.dat

suneel kumar
suneel.gunda@tcs.com
9830156191
mohan babu
mohan.babu@yahoo.com
09848211395
anil kumar
agunda@manh.com
09894117191


paste -s -d "||\n" suneeel.dat

suneel kumar|suneel.gunda@tcs.com|9830156191
mohan babu|mohan.babu@yahoo.com|09848211395
anil kumar|agunda@manh.com|09894117191

sort suneel1.dat

100|NULL |ASE
100|gunda |AC
110|raja |POK
111|anil |ASE
123|suneel|ASE
897|kumar |ASET

sort -t"|" -k 2 suneel1.dat

111|anil |ASE
100|gunda |AC
897|kumar |ASET
100|null |ASE
110|raja |POK
123|suneel|ASE

sort -t"|" -r -k 2 suneel1.dat (or equivalently: sort -t"|" -k 2r suneel1.dat)

123|suneel|ASE
110|raja |POK
100|null |ASE
897|kumar |ASET
100|gunda |AC
111|anil |ASE
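One caveat worth knowing: sort compares keys as text unless told otherwise, so a key of 9 sorts after 110. The -n flag requests numeric order; a small sketch (illustrative data, not the file above):

```shell
printf '%s\n' '9|zed|ASE' '110|raja|POK' '11|anil|ASE' > nums.dat

# Lexical order on field 1: "11" < "110" < "9" -- not numeric order
sort -t'|' -k1,1 nums.dat

# Numeric order on field 1: 9, 11, 110
sort -t'|' -k1,1n nums.dat
```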

Positional Parameters
As we have already seen, you can define values for variables with statements of the form varname=value, e.g.:
$ fred=bob
$ print "$fred"
bob


Some environment variables are predefined by the shell when you log in. There are other built-in variables that are vital to shell programming. We will look at a few of them now and save the others for later. The most important special, built-in variables are called positional parameters. These hold the command-line arguments to scripts when they are invoked. Positional parameters have names 1, 2, 3, etc., meaning that their values are denoted by $1, $2, $3, etc. There is also a positional parameter 0, whose value is the name of the script (i.e., the command typed in to invoke it). Two special variables contain all of the positional parameters (except positional parameter 0): * and @. The difference between them is subtle but important, and it's apparent only when they are within double quotes.
"$*"

is a single string that consists of all of the positional parameters, separated by the first character in the environment variable IFS (internal field separator), which is a space, TAB, and NEWLINE by default. On the other hand, "$@" is equal to "$1" "$2"... "$N", where N is the number of positional parameters. That is, it's equal to N separate double-quoted strings, which are separated by spaces. We'll explore the ramifications of this difference in a little while. The variable # holds the number of positional parameters (as a character string). All of these variables are "read-only," meaning that you can't assign new values to them within scripts. For example, assume that you have the following simple shell script:
print "fred: $@"
print "$0: $1 and $2"
print "$# arguments"

Assume further that the script is called fred. Then if you type fred bob dave, you will see the following output:
fred: bob dave
fred: bob and dave
2 arguments

In this case, $3, $4, etc., are all unset, which means that the shell will substitute the empty (or null) string for them (unless the option nounset is turned on).


Positional parameters in functions


Shell functions use positional parameters and special variables like * and # in exactly the same way as shell scripts do. If you wanted to define fred as a function, you could put the following in your .profile or environment file:
function fred {
    print "fred: $*"
    print "$0: $1 and $2"
    print "$# arguments"
}

You will get the same result if you type fred bob dave. Typically, several shell functions are defined within a single shell script. Therefore each function will need to handle its own arguments, which in turn means that each function needs to keep track of positional parameters separately. Sure enough, each function has its own copies of these variables (even though functions don't run in their own subshells, as scripts do); we say that such variables are local to the function. However, other variables defined within functions are not local (they are global), meaning that their values are known throughout the entire shell script. For example, assume that you have a shell script called ascript that contains this:
function afunc {
    print in function $0: $1 $2
    var1="in function"
}
var1="outside of function"
print var1: $var1
print $0: $1 $2
afunc funcarg1 funcarg2
print var1: $var1
print $0: $1 $2

If you invoke this script by typing ascript arg1 arg2, you will see this output:
var1: outside of function
ascript: arg1 arg2
in function afunc: funcarg1 funcarg2
var1: in function
ascript: arg1 arg2

In other words, the function afunc changes the value of the variable var1 from "outside of function" to "in function," and that change is known outside the function, while $0, $1, and $2 have different values in the function and the main script. Figure 4.2 shows this graphically.


It is possible to make other variables local to functions by using the typeset command. Now that we have this background, let's take a closer look at "$@" and "$*". These variables are two of the shell's greatest idiosyncrasies, so we'll discuss some of the most common sources of confusion.
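A minimal sketch of that (runs in ksh, and in bash, where typeset behaves like declare; the names are illustrative):

```shell
var1="outside"

afunc() {
    typeset var1="inside"   # typeset makes var1 local to afunc
    echo "inside afunc: $var1"
}

afunc
echo "after afunc: $var1"   # still "outside"; the local copy is gone
```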

Why are the elements of "$*" separated by the first character of IFS instead of just spaces? To give you output flexibility. As a simple example, let's say you want to print a list of positional parameters separated by commas. This script would do it:
IFS=,
print "$*"

Changing IFS in a script is fairly risky, but it's probably OK as long as nothing else in the script depends on it. If this script were called arglist, then the command arglist bob dave ed would produce the output bob,dave,ed. Why does "$@" act like N separate double-quoted strings? To allow you to use them again as separate values. For example, say you want to call a function within your script with the same list of positional parameters, like this:
function countargs {
    print "$# args."
}

8|Page

Assume your script is called with the same arguments as arglist above. Then if it contains the command countargs "$*", the function will print 1 args. But if the command is countargs "$@", the function will print 3 args.
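Putting the pieces together, a self-contained sketch of that difference (bash/ksh syntax; echo stands in for ksh's print):

```shell
countargs() {
    echo "$# args."
}

demo() {
    countargs "$*"   # all positional parameters joined into ONE string
    countargs "$@"   # N separate quoted strings
}

demo bob dave ed
# 1 args.
# 3 args.
```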

Operator         True if...
-a file          file exists
-d file          file is a directory
-f file          file is a regular file (i.e., not a directory or other special type of file)
-r file          You have read permission on file
-s file          file exists and is not empty
-w file          You have write permission on file
-x file          You have execute permission on file, or directory search permission if it is a directory
-O file          You own file
-G file          Your group ID is the same as that of file
file1 -nt file2  file1 is newer than file2
file1 -ot file2  file1 is older than file2

Specifically, the -nt and -ot operators compare modification times of two files.

Test  Comparison
-lt   Less than
-le   Less than or equal
-eq   Equal
-ge   Greater than or equal
-gt   Greater than
-ne   Not equal
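These operators compare integers, not strings; a quick sketch (bash's test/[ also accepts an escaped < for string comparison, which shows the difference):

```shell
x=10
y=9

[ "$x" -gt "$y" ] && echo "numerically, 10 > 9"
[ "$x" \< "$y" ] && echo "as strings, \"10\" sorts before \"9\""
```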

String Formatting Options
The various options to typeset:

Option  Operation
-Ln     Left-justify. Remove leading blanks; if n is given, fill with blanks or truncate on right to length n.
-Rn     Right-justify. Remove trailing blanks; if n is given, fill with blanks or truncate on left to length n.
-Zn     Same as -Rn, except add leading 0's instead of blanks if needed.
-l      Convert letters to lowercase.
-u      Convert letters to uppercase.

Examples of typeset String Formatting Options

Statement                 Value of v
typeset -L v=$alpha       "aBcDeFgHiJkLmNoPqRsTuVwXyZ "
typeset -L10 v=$alpha     "aBcDeFgHiJ"
typeset -R v=$alpha       " aBcDeFgHiJkLmNoPqRsTuVwXyZ"
typeset -R16 v=$alpha     "kLmNoPqRsTuVwXyZ"
typeset -l v=$alpha       " abcdefghijklmnopqrstuvwxyz"
typeset -uR5 v=$alpha     "VWXYZ"
typeset -Z8 v="123.50"    "00123.50"
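Of these, -l and -u also work in bash (4+); -L, -R, and -Z are ksh-specific. A sketch of the case-conversion pair (variable names illustrative):

```shell
alpha="aBcDeFgH"

typeset -u upper="$alpha"   # -u: letters forced to uppercase on assignment
typeset -l lower="$alpha"   # -l: letters forced to lowercase

echo "$upper"   # ABCDEFGH
echo "$lower"   # abcdefgh
```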

Sed and Awk:


cat list

John Daggett, 341 King Road, Plymouth MA
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Eric Adams, 20 Post Road, Sudbury MA
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston MA


sed 's/MA/Massachusetts/' list

John Daggett, 341 King Road, Plymouth Massachusetts
Alice Ford, 22 East Broadway, Richmond VA
Orville Thomas, 11345 Oak Bridge Road, Tulsa OK
Terry Kalkas, 402 Lans Road, Beaver Falls PA
Eric Adams, 20 Post Road, Sudbury Massachusetts
Hubert Sims, 328A Brook Road, Roanoke VA
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA
Sal Carpenter, 73 6th Street, Boston Massachusetts

There are three ways to specify multiple instructions on the command line:

1. Separate instructions with a semicolon.


sed 's/ MA/, Massachusetts/; s/ PA/, Pennsylvania/' list

2. Precede each instruction by -e.


sed -e 's/ MA/, Massachusetts/' -e 's/ PA/, Pennsylvania/' list


3. Use the multiline entry capability of the Bourne shell. Press RETURN after entering a single quote and a secondary prompt (>) will be displayed for multiline input.
$ sed '
> s/ MA/, Massachusetts/
> s/ PA/, Pennsylvania/
> s/ CA/, California/' list

$ cat sedscr
s/ MA/, Massachusetts/
s/ PA/, Pennsylvania/
s/ CA/, California/
s/ VA/, Virginia/
s/ OK/, Oklahoma/

$ sed -f sedscr list John Daggett, 341 King Road, Plymouth, Massachusetts Alice Ford, 22 East Broadway, Richmond, Virginia Orville Thomas, 11345 Oak Bridge Road, Tulsa, Oklahoma Terry Kalkas, 402 Lans Road, Beaver Falls, Pennsylvania Eric Adams, 20 Post Road, Sudbury, Massachusetts Hubert Sims, 328A Brook Road, Roanoke, Virginia Amy Wilde, 334 Bayshore Pkwy, Mountain View, California Sal Carpenter, 73 6th Street, Boston, Massachusetts

Option  Description
-e      Editing instruction follows.
-f      Filename of script follows.
-n      Suppress automatic output of input lines.
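The -n option pairs with sed's p (print) command to make sed act like grep: suppress the automatic output, then print only what you ask for. A sketch on a two-line file (illustrative data):

```shell
printf '%s\n' 'Eric Adams, Sudbury MA' 'Alice Ford, Richmond VA' > towns.txt

# Without -n, every line would be echoed and matches would appear twice.
sed -n '/MA/p' towns.txt
# Eric Adams, Sudbury MA
```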

Awk, in the usual case, interprets each input line as a record and each word on that line, delimited by spaces or tabs, as a field. (These defaults can be changed.) One or more consecutive spaces or tabs count as a single delimiter. Awk allows you to reference these fields, in either patterns or procedures. $0 represents the entire input line. $1, $2, ... refer to the individual fields on the input line. Awk splits the input record before the script is applied. Let's look at a few examples, using the sample input file list.

The AWK built-in variables are:

FS - the input field separator
OFS - the output field separator
NF - the number of fields in the current record
NR - the number of records read so far
RS - the input record separator
ORS - the output record separator
FILENAME - the name of the current input file
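A quick sketch of NR and NF together (illustrative data):

```shell
printf '%s\n' 'one two three' 'four five' > recs.txt

# NR = record (line) number, NF = number of fields in that record
awk '{ print "record", NR, "has", NF, "fields" }' recs.txt
# record 1 has 3 fields
# record 2 has 2 fields
```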

The last variable known to regular AWK is FILENAME, which tells you the name of the file being read.

$ awk '{ print $1 }' list
John
Alice
Orville
Terry
Eric
Hubert
Amy
Sal

$ awk '$1 ~ /Amy/ {print $0}' list
Amy Wilde, 334 Bayshore Pkwy, Mountain View CA

This example prints each record in the file list whose first field contains the string Amy. The operator ~ is called a matching operator; it tests whether a string (here, the field $1) matches a given regular expression.

$ awk '/MA/' list
John Daggett, 341 King Road, Plymouth MA
Eric Adams, 20 Post Road, Sudbury MA
Sal Carpenter, 73 6th Street, Boston MA


Slashes (/) surround the string MA in the actual awk program. The slashes indicate that MA is a pattern to search for. This type of pattern is called a regular expression. There are single quotes around the awk program so that the shell won't interpret any of it as special shell characters. In an awk rule, either the pattern or the action can be omitted, but not both. If the pattern is omitted, then the action is performed for every input line. If the action is omitted, the default action is to print all lines that match the pattern.

$ awk '/MA/ { print $1 }' list
John
Eric
Sal

$ ls -l | awk '$6 == "Nov" { sum += $5 } END { print sum }'

This command prints the total number of bytes in all the files in the current directory that were last modified in November (of any year). The part outside the curly braces is called the "pattern", and the part inside is the "action".

$ awk -F, '/MA/ { print $1 }' list
John Daggett
Eric Adams
Sal Carpenter

$ awk -F, '{ print $1; print $2; print $3 }' list
John Daggett
341 King Road
Plymouth MA
Alice Ford
22 East Broadway
Richmond VA
Orville Thomas
11345 Oak Bridge Road
Tulsa OK
Terry Kalkas
402 Lans Road
Beaver Falls PA
Eric Adams
20 Post Road
Sudbury MA
Hubert Sims
328A Brook Road
Roanoke VA
Amy Wilde
334 Bayshore Pkwy
Mountain View CA
Sal Carpenter
73 6th Street
Boston MA

To compare a field to a string, use the following method:

$ awk '$1=="foo"{print $2}' filename

$ cat suneel1.dat
123|suneel|ASE
100|NULL |ASE
897|kumar |ASET
100|gunda |AC
110|raja |POK
11 |anil |ASE

$ awk -F\| '/suneel/{print $1,$3}' suneel1.dat
123 ASE

If you want only those lines where the string occurs in the second field, use the ~ ("contains") operator:

$ awk -F\| '$2~/suneel/{print $3,$1}' suneel1.dat
ASE 123

Negating the match with !~ selects every other line instead:

$ awk -F\| '$2!~/suneel/{print $3,$1}' suneel1.dat
ASE 100
ASET 897
AC 100
POK 110
ASE 11

The following prints a running total of the first column alongside each line:

$ awk -F\| '{print x+=$1,$0}' suneel1.dat
123 123|suneel|ASE
223 100|NULL |ASE
1120 897|kumar |ASET
1220 100|gunda |AC
1330 110|raja |POK
1341 11 |anil |ASE

$ awk -F\| '/suneel/{for(i=1;i<=NF;i++) print $i }' suneel1.dat
123
suneel
ASE

Option  Description
-f      Filename of script follows.
-F      Change field separator.

$ cat suneel1.dat
5|123|suneel|ASE
2|100|NULL |ASE
1|897|kumar |ASET
4|100|gunda |AC|2000
6|110|raja |POK
3|11 |anil |ASE


$ awk -F\| '{ if ($1 > max) max = $1
arr[$1] = $0 }
END { for (x = 1; x <= max; x++) print arr[x] }' suneel1.dat
1|897|kumar |ASET
2|100|NULL |ASE
3|11 |anil |ASE
4|100|gunda |AC|2000
5|123|suneel|ASE
6|110|raja |POK

$ cat nameState
s/ CA/, California/
s/ MA/, Massachusetts/
s/ OK/, Oklahoma/
s/ PA/, Pennsylvania/
s/ VA/, Virginia/

$ sed -f nameState list | awk -F, '{ print $4 }'
Massachusetts
Virginia
Oklahoma
Pennsylvania
Massachusetts
Virginia
California
Massachusetts

We want to produce a report that sorts the names by state and lists the name of the state followed by the name of each person residing in that state. The following example shows the byState program.
#! /bin/sh
awk -F, '{ print $4 ", " $0 }' $* |
sort |
awk -F, '
$1 == LastState { print "\t" $2 }
$1 != LastState { LastState = $1
                  print $1
                  print "\t" $2 }'


$ sed -f nameState list | byState
California
        Amy Wilde
Massachusetts
        Eric Adams
        John Daggett
        Sal Carpenter
Oklahoma
        Orville Thomas
Pennsylvania
        Terry Kalkas
Virginia
        Alice Ford
        Hubert Sims

To examine how the byState program works, let's look at each part separately.

$ sed -f nameState list | awk -F, '{ print $4 ", " $0 }'
Massachusetts, John Daggett, 341 King Road, Plymouth, Massachusetts
Virginia, Alice Ford, 22 East Broadway, Richmond, Virginia
Oklahoma, Orville Thomas, 11345 Oak Bridge Road, Tulsa, Oklahoma
Pennsylvania, Terry Kalkas, 402 Lans Road, Beaver Falls, Pennsylvania
Massachusetts, Eric Adams, 20 Post Road, Sudbury, Massachusetts
Virginia, Hubert Sims, 328A Brook Road, Roanoke, Virginia
California, Amy Wilde, 334 Bayshore Pkwy, Mountain View, California
Massachusetts, Sal Carpenter, 73 6th Street, Boston, Massachusetts

The sort program, by default, sorts lines in alphabetical order, looking at characters from left to right. In order to sort records by state, and not by name, we insert the state as a sort key at the beginning of the record. Now the sort program can do its work for us. (Notice that using the sort utility saves us from having to write sort routines inside awk.)

The second time awk is invoked we perform a programming task. The script looks at the first field of each record (the state) to determine if it is the same as in the previous record. If it is not the same, the name of the state is printed followed by the person's name. If it is the same, then only the person's name is printed.

$1 == LastState { print "\t" $2 }
$1 != LastState { LastState = $1
                  print $1
                  print "\t" $2 }

Special Characters and Usage

.       Matches any single character except newline. In awk, dot can match newline also.
*       Matches any number (including zero) of the single character (including a character specified by a regular expression) that immediately precedes it.
[...]   Matches any one of the class of characters enclosed between the brackets. A circumflex (^) as first character inside brackets reverses the match to all characters except newline and those listed in the class. In awk, newline will also match. A hyphen (-) is used to indicate a range of characters. The close bracket (]) as the first character in the class is a member of the class. All other metacharacters lose their meaning when specified as members of a class.
^       As the first character of a regular expression, matches the beginning of the line. Matches the beginning of a string in awk, even if the string contains embedded newlines.
$       As the last character of a regular expression, matches the end of the line. Matches the end of a string in awk, even if the string contains embedded newlines.
\{n,m\} Matches a range of occurrences of the single character (including a character specified by a regular expression) that immediately precedes it. \{n\} will match exactly n occurrences, \{n,\} will match at least n occurrences, and \{n,m\} will match any number of occurrences between n and m. (sed and grep only; may not be in some very old versions.)
\       Escapes the special character that follows.

Extended Metacharacters (egrep and awk)

+       Matches one or more occurrences of the preceding regular expression.
?       Matches zero or one occurrence of the preceding regular expression.
|       Specifies that either the preceding or following regular expression can be matched (alternation).
()      Groups regular expressions.
{n,m}   Matches a range of occurrences of the single character (including a character specified by a regular expression) that immediately precedes it. {n} will match exactly n occurrences, {n,} will match at least n occurrences, and {n,m} will match any number of occurrences between n and m. (POSIX egrep and POSIX awk; not in traditional egrep or awk.)

One of the first things you'll notice about sed commands is that sed will apply them to every input line. Sed is implicitly global, unlike ed, ex, or vi.

s/CA/California/g

If the same command were entered from the ex command prompt in vi, it would make the replacement for all occurrences on the current line only. In sed, it is as though each line has a turn at becoming the current line, and so the command is applied to every line. Line addresses are used to supply context for, or restrict, an operation. (In short: nothing gets done in vi unless you tell it which lines to work on, while sed will work on every line unless you tell it not to.) For instance, by supplying the address "Sebastopol" to the previous substitute command, we can limit the replacement of "CA" by "California" to just lines containing "Sebastopol."
/Sebastopol/s/CA/California/g

How to change the | delimiter into a cedilla (Ç, octal \0307) delimiter?

cat filename | sed 's/|/'`echo '\0307'`'/g'

How to select a column from a file having the cedilla as delimiter?

cat filename | awk -F '\307' '{print $2}'
cut -d `echo '\0307'` -f2,3 filename
cut -d "`echo '\0307'`" -f2,3 filename

for loop with two parameters in each [list] element

$ cat test
#!/bin/bash
# Planets revisited.
# Associate the name of each planet with its distance from the sun.

for planet in "Mercury 36" "Venus 67" "Earth 93" "Mars 142" "Jupiter 483"
do
  set -- $planet  # Parses variable "planet" and sets positional parameters.
  # The "--" prevents nasty surprises if $planet is null or begins with a dash.
  # May need to save original positional parameters, since they get overwritten.
  # One way of doing this is to use an array:
  #   original_params=("$@")
  echo "$1	$2,000,000 miles from the sun"  # Concatenates zeroes onto parameter $2.
done
# (Thanks, S.C., for additional clarification.)
exit 0

$ ./test


Mercury 36,000,000 miles from the sun
Venus 67,000,000 miles from the sun
Earth 93,000,000 miles from the sun
Mars 142,000,000 miles from the sun
Jupiter 483,000,000 miles from the sun

How to send an email with attachment and subject?

echo "The script has started"
if [[ -s example.dat ]]
then
    MESSAGE="The UPC descriptions for the UPCs present in the attachment are missing in the Order_UPC table."
    (echo $MESSAGE; echo $DISCLAIMER; uuencode example.dat example.dat) | mail -s "No UPC description in the Order_UPC" suneel.gunda.consultant@acnielsen.com
    if [[ $? != 0 ]]
    then
        echo "ERROR : Mail could not be sent"
    else
        echo "Mail successfully sent to Suneel Gunda"
    fi
else
    echo "Mail not sent as file is empty"
fi

Sending an attachment in Excel format:

filename=subh.csv
sender=nn.kumar@bt.com
recipient=nn.kumar@tcs.com
cat <<! | /usr/lib/sendmail -t -n
MIME-Version: 1.0
From: ${sender}
To: ${recipient}
Subject: ${subject}
Content-Type: multipart/mixed; boundary="_boundarystring"

This is a multi-part message in MIME format.
--_boundarystring
Content-Transfer-Encoding: Base64
Content-Type: application/vnd.ms-excel
Content-Disposition: attachment; filename="${filename}"

--_boundarystring--
!
if [ $? -eq 0 ]
then
    echo "mail sent success"
else
    echo "mail not sent"
fi

SCPing a folder:

To copy a folder from prod to dev, log into dev:
scp -r an02@lnx03:/ab/data_prod/file/AJF .
scp file_name an02@lnx06:/home/an02/devel/sandbox/bin

To use wildcard characters while copying:
scp -r an02@lnx03:/ab/data_prod/file_in/"*"AJF"*" .
scp an02@lnx03:/ab/data_prod/file/product_reference.dat $FILE/

The following primaries can be used to construct conditions:

-a file          True if file exists. (Not available in sh.)
-b file          True if file exists and is a block special file.
-c file          True if file exists and is a character special file.
-d file          True if file exists and is a directory.
-e file          True if file exists. (Not available in sh.)
-f file          True if file exists and is a regular file. Alternatively, if /usr/bin/sh users specify /usr/ucb before /usr/bin in their PATH environment variable, then test will return true if file exists and is not a directory. The csh test and [ built-ins always use this alternative behavior.
-g file          True if file exists and its set-group-ID flag is set.
-G file          True if file exists and its group matches the effective group ID of this process. (Not available in sh.)
-h file          True if file exists and is a symbolic link.
-k file          True if file exists and has its sticky bit set.
-L file          True if file exists and is a symbolic link.
-n string        True if the length of string is non-zero.
-o option        True if the option named option is on. (Not available in csh or sh.)
-O file          True if file exists and is owned by the effective user ID of this process. (Not available in sh.)
-p file          True if file is a named pipe (FIFO).
-r file          True if file exists and is readable.
-s file          True if file exists and has a size greater than zero.
-S file          True if file exists and is a socket. (Not available in sh.)
-t [file_descriptor]  True if the file whose file descriptor number is file_descriptor is open and is associated with a terminal. If file_descriptor is not specified, 1 is used as a default value.
-u file          True if file exists and its set-user-ID flag is set.
-w file          True if file exists and is writable. True will indicate only that the write flag is on. The file will not be writable on a read-only file system even if this test indicates true.
-x file          True if file exists and is executable. True will indicate only that the execute flag is on. If file is a directory, true indicates that file can be searched.
-z string        True if the length of string is zero.
file1 -nt file2  True if file1 exists and is newer than file2. (Not available in sh.)
file1 -ot file2  True if file1 exists and is older than file2. (Not available in sh.)
file1 -ef file2  True if file1 and file2 exist and refer to the same file. (Not available in sh.)
string           True if string is not the null string.
string1 = string2   True if the strings string1 and string2 are identical.
string1 != string2  True if the strings string1 and string2 are not identical.
n1 -eq n2        True if the integers n1 and n2 are algebraically equal.
n1 -ne n2        True if the integers n1 and n2 are not algebraically equal.
n1 -gt n2        True if the integer n1 is algebraically greater than the integer n2.
n1 -ge n2        True if the integer n1 is algebraically greater than or equal to the integer n2.
n1 -lt n2        True if the integer n1 is algebraically less than the integer n2.
n1 -le n2        True if the integer n1 is algebraically less than or equal to the integer n2.


condition1 -a condition2  True if both condition1 and condition2 are true. The -a binary primary is left associative and has higher precedence than the -o binary primary.
condition1 -o condition2  True if either condition1 or condition2 is true. The -o binary primary is left associative.
To check the space available in the UNIX box:
df -k

To check the space occupied by a file:
du -h

Untarring files:
tar -xvvf TEST1BFO.tar
tar -xzvf (add z for gzip-compressed archives)

Tarring files
tar cvf file.tar inputfile1 inputfile2

Search for file with a specific name in a set of files (-name)


find . -name "rc.conf" -print

This command will search the current directory and all subdirectories for a file named rc.conf. Note: The -print option will print out the path of any file that is found with that name. In general, -print will print out the path of any file that meets the find criteria. How to search for a directory?
find -type d -mtime +10

-type defines the type of file: f for a normal file, d for a directory, and l for a link. It is difficult to find directories that have not been accessed because the find command modifies the directory's access time. How to apply a unix command to a set of files (-exec):
find . -name "rc.conf" -exec chmod o+r '{}' \;

This command will search the current directory and all subdirectories. All files named rc.conf will be processed by the chmod o+r command. The argument '{}' inserts each found file into the chmod command line. The \; argument indicates that the -exec command line has ended.

The end result of this command is that all rc.conf files have their "other" permissions set to read access (if the operator is the owner of the file). How to apply a complex selection of files (-o and -a):
find /usr/src -not \( -name "*,v" -o -name ".*,v" \) -print

This command will search the /usr/src directory and all subdirectories. All files of the form '*,v' and '.*,v' are excluded. Important arguments to note are:

-not means the negation of the expression that follows.
\( means the start of a complex expression.
\) means the end of a complex expression.
-o means a logical or of a complex expression. In this case the complex expression is all files like '*,v' or '.*,v'.

The above example shows how to select all files that are not part of the RCS system. This is important when you want to go through a source tree and modify all the source files... but you don't want to affect the RCS version control files. How to search for a string in a selection of files (-exec grep ...):
find . -exec grep "www.athabasca" '{}' \; -print

This command will search the current directory and all subdirectories. All files that contain the string will have their path printed to standard output. If you want to just find each file and then pass it on for processing, use the -q grep option. This finds the first occurrence of the search string. It then signals success to find, and find continues searching for more files.
find . -exec grep -q "www.athabasca" '{}' \; -print
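The workflow just described — find the files, then rewrite the string — can be sketched like this (assumes GNU sed for -i in-place editing; the directory and content are made up for the demo):

```shell
# Set up a demo tree (illustrative)
mkdir -p site
printf 'visit www.athabascau.ca today\n' > site/index.html

# For each html file containing the string, rewrite it in place
find site -name '*.html' -exec grep -q 'www.athabascau.ca' '{}' \; -print |
while read -r f; do
    sed -i 's/www\.athabascau\.ca/intra.athabascau.ca/g' "$f"
done

cat site/index.html
# visit intra.athabascau.ca today
```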

This command is very useful for processing a series of files that contain a specific string, since you can then handle each file appropriately. An example is finding all HTML files that contain the string "www.athabascau.ca"; you can then process those files with a sed script to change each occurrence of "www.athabascau.ca" to "intra.athabascau.ca".

Timestamps for a file: mtime, ctime, and atime

Unix keeps 3 timestamps for each file: mtime, ctime, and atime. Most people seem to understand atime (access time): it is when the file was last read. There does seem to be some confusion between mtime and ctime, though. ctime is the inode change time, while mtime is the file modification time. "Change" and "modification" are pretty much synonymous, so there is no clue to be had by pondering those words. Instead you need to focus on what is being changed. mtime changes when you write to the file; it is the age of the data in the file. Whenever mtime changes, so does ctime. But ctime changes a few extra times. For example, it will also change if you change the owner or the permissions on the file.

Let's look at a concrete example. We run a package called Samba that lets PCs access files. To change the Samba configuration, I just edit a file called smb.conf. (This changes mtime and ctime.) I don't need to take any other action to tell Samba that I changed that file. Every now and then Samba looks at the mtime on the file. If the mtime has changed, Samba rereads the file. Later that night our backup system runs. It uses ctime, which also changed, so it backs up the file. But let's say that a couple of days later I notice that the permissions on smb.conf are 666. That's not good: anyone can edit the file. So I do a "chmod 644 smb.conf". This changes only ctime. Samba will not reread the file. But later that night, our backup program notices that ctime has changed, so it backs up the file. That way, if we lose the system and need to reload our backups, we get the new, improved permission setting.

Here is a second example. Let's say that you have a data file called employees.txt, which is a list of employees, and a program to print it out. The program not only prints the data, it obtains the mtime and prints that too. Now someone has requested an employee list from the end of the year 2000, and you found a backup tape that has that file. Many restore programs will restore the mtime as well. When you run that program it will print an mtime from the end of the year 2000. But the ctime is today, so again our backup program will see the file as needing to be backed up.

Suppose your restore program did not restore the mtime. You don't want your program to print today's date. No problem: mtime is under your control, and you can set it to whatever you want. So just do:

$ touch -t 200012311800 employees.txt

This will set mtime back to the date you want and set ctime to now. You have complete control over mtime, but the system stays in control of ctime.
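The employees.txt scenario above can be replayed directly in a shell (the file contents here are made up; the date is the one from the example):

```shell
# Create the example file, then push its mtime back to the end of 2000.
printf 'alice\nbob\n' > employees.txt
touch -t 200012311800 employees.txt

ls -l  employees.txt   # the listing shows the old mtime (Dec 31  2000)
ls -lc employees.txt   # -c shows ctime instead, which stays "now"
```

Comparing the two listings makes the mtime/ctime split visible: touch rewound one clock but not the other.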
So mtime is a little bit like the date on a letter, while ctime is like the postmark on the envelope.

Matching Patterns

${variable#pattern}   If the pattern matches the beginning of the variable's value, delete the shortest part that matches and return the rest.
${variable##pattern}  If the pattern matches the beginning of the variable's value, delete the longest part that matches and return the rest.
${variable%pattern}   If the pattern matches the end of the variable's value, delete the shortest part that matches and return the rest.
${variable%%pattern}  If the pattern matches the end of the variable's value, delete the longest part that matches and return the rest.

Expression      Result
$path           /home/billr/mem/long.file.name
${path##/*/}    long.file.name
${path#/*/}     billr/mem/long.file.name
${path%.*}      /home/billr/mem/long.file
${path%%.*}     /home/billr/mem/long
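The table rows can be checked in any POSIX shell:

```shell
path=/home/billr/mem/long.file.name

echo "${path##/*/}"   # long.file.name            (longest /*/ removed from front)
echo "${path#/*/}"    # billr/mem/long.file.name  (shortest /*/ removed from front)
echo "${path%.*}"     # /home/billr/mem/long.file (shortest .* removed from end)
echo "${path%%.*}"    # /home/billr/mem/long      (longest .* removed from end)
```

Note that the pattern is a glob, not a regular expression: * here means "any string", as in filename matching.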

SFTP (Secure File Transfer Protocol)


sftp -b /dev/stdin $FILE_SERVER <<EOF >> ${LOG_DIR}/archive_tar_AIS_${DATE}.log
cd $ARCHIVE_DIRECTORY_AI_to_FS
put $archive_file_name
bye
EOF

Interpretation of $ from a file in another script


$ cat z.dat
directory=$AI_SERIAL_ERROR

Next script:

while read line
do
    temp_line=${line%%=*}
    temp_dir=${line##*=}
    x=`echo echo $temp_dir`
    y=`eval $x`
    if [ "$temp_line" == "directory" ]; then
        echo "Y is $y."
        if [ -d $y ]
        then
            echo "Yes"
        else
            echo "No"
        fi
    fi
done < z.dat

kill "$@" $(jobs -p)

This command will kill the processes one after another. jobs -p lists the process IDs of all background jobs.
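A minimal sketch of this in bash, with sleeps standing in for real background jobs:

```shell
# Start two placeholder background jobs.
sleep 60 &
sleep 60 &

jobs -p           # prints one PID per line, one for each background job
kill $(jobs -p)   # sends SIGTERM to every listed PID in one command
```

This is a common way to clean up all of a script's background jobs on exit, for example from a trap handler.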
export -p

will show all the variables that are exported.

export -f

can be used for exporting a function.

ls -m

will list the files with comma as the separator.
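For example, a quick check of ls -m in a throwaway scratch directory (the directory and file names are illustrative):

```shell
# Make a scratch directory with three empty files.
mkdir -p /tmp/lsm_demo
cd /tmp/lsm_demo
touch alpha beta gamma

ls -m    # alpha, beta, gamma
```

ls sorts the names and joins them with ", ", wrapping only when the line would exceed the terminal width.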


chmod - change permission for a file (or directory)
Prototype: chmod [ugo][+|-][rwx] <file>
Examples:
'chmod u+w my_file'
'chmod ug+rwx my_directory'

grep - search a file for lines matching a string
Prototype: grep [options] <string> <file>
Common Options:
-v ignore lines that match (i.e., the opposite of normal functionality)
-c return a count of the number of lines that matched
-n prefix each line of output with the line number
Examples:
'grep foo my_file'
'grep -c foo *.pl'
'grep -v foo my_file'

du - show disk usage
Prototype: du [options] <file_list>
           du [options] <directory_list>
Common Options:
-s show a summary
-k show size in kilobytes
-H show size in "human friendly" format
Examples:
'du -s /home/tscheetz'
'du -sk /home/tscheetz'
'du -sk *'

df - report free disk space
Prototype: df [options]

Common Options:
-h print sizes in human readable format (KB, MB, GB)
-H print sizes in human readable format (powers of 1000)
-k print sizes in kilobytes
Examples:
'df'
'df -h'

Conditional DML

record
    string('|') type;
    if (type == 'header')
        record
            decimal('|') file_name;
            decimal('\n') no_of_records;
        end;
    else if (type == 'body')
        record
            string('|') name;
            string('|') address;
            string('\n') salary;
        end;
    else if (type == 'trailer')
        record
            string('\n') record_count;
        end;
end;

Sample data:

header|x.dat|200
body|suneel|chennai|1000
body|mansur|hyderabad|9000
trailer|200

m_env -v shows the Co>Operating System version.

force_error:

out.name :: if (in.name == 'suneel') force_error("Junk data") else in.name;

API and utility are two possible interfaces to databases from the Ab Initio software, and their uses can differ depending on the database in question.

Details

Enterprise-level database software often provides more than one interface to its data. For example, it usually provides an API (application programming interface) that allows a software developer to use database vendor-provided functions to talk directly to the program. In addition, the vendor usually provides small programs, or utilities, that allow the user to accomplish a specific task or range of tasks. For example, the vendor might provide a utility to load data into a table or extract table data, or provide a command-line interface to the database engine. The exact functionality of the utility or API varies by database vendor; for that reason, specific details are not provided here.

API and utility modes both have advantages and disadvantages:

API mode

Provides flexibility: generally, the vendor opens up a range of functions for the programmer to use, which permits a wide variety of tasks to be performed against the database. The tradeoff, however, is performance; this is often slower than using a utility. As an Ab Initio user, you might use API mode when you want to use a function that is not available through a utility. In some instances, a component will only run in API mode for just this reason: the function inherent in the component is not available through that vendor's published utilities. In general, it is useful to remember that API mode executes SQL statements.

Utility mode

Makes direct use of the vendor's utilities to access the database. These programs are generally tuned by the vendor for optimum performance. The tradeoff here is functionality. For example, you might not be able to set up a commit table; in such an instance, you must trust the ability of the utility to do its job correctly. Because the granular control given by API mode is not present in utility mode, utility mode is best when your purpose most closely resembles the purpose for which the utility was created.
For example, any support of transactionality and record locking is subject to the abilities of the utility in question. Also, unlike API mode, utility mode does not normally run SQL statements. If you cannot accomplish what you want using utility mode, you might need to use API mode.

A layout is one of the following:

A URL that specifies the location of a serial file
A URL that specifies the location of the control partition of a multifile
A list of URLs that specify the locations of:
    The partitions of an ad hoc multifile
    The working directories of a program component

When an all-to-all flow connects components with layouts containing a large number of partitions, the Co>Operating System uses many networking resources. If the number of partitions in the source and destination components is N, an all-to-all flow uses resources proportional to N squared. To save network resources, you can mark an all-to-all flow as using two-stage routing, which uses far fewer resources. For example, an all-to-all flow with 25 partitions uses 25*25 = 625 resources, but with two-stage routing uses only 2*25*5 = 250 resources.

To find duplicate emp_id values:

select emp_id, count(*) cnt
from emp
group by emp_id
having count(*) > 1;
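The arithmetic in the two-stage routing example can be checked directly in the shell, assuming the 25 partitions are split into sqrt(25) = 5 groups of 5:

```shell
N=25
echo $(( N * N ))        # 625  (plain all-to-all: every partition talks to every partition)
echo $(( 2 * N * 5 ))    # 250  (two-stage: each partition talks to 5 group routers, twice)
```

The saving grows with N, since N*N outpaces 2*N*sqrt(N) quickly.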

