Awk searches each record of the input file for a pattern and, when the pattern is found, performs an action on that record.
An awk program consists of rules. A rule consists of two things: a pattern and an action.
The pattern is what to search for and the action is what to perform when the pattern is found.
###########################################
Example 1:
Search for the pattern 81491 and print the records having this text.
bos90631:sm017r awk ' /81491/ {print $0} ' test1
sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
Note that the entire program is enclosed inside single quotes.
The pattern / / is used for regular expressions.
print $0 and print are the same. Both print the entire record.
Here print is the action, and the action is enclosed within {}.
###########################################
Example 2:
The pattern is what helps us select the rows and the action indicates
what to do with the selected rows. Actions are always written inside {}.
Either the pattern or the action can be omitted, but not both.
If the pattern is omitted, the action is performed on every line.
If the action is omitted, the default action is to print the entire record.
# skipping pattern
bos90631:sm017r awk ' {print }' test1
sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
bhanu 8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
# skipping action
bos90631:sm017r awk '/81491/ ' test1
sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
###########################################
Example 3: Empty action field
An empty action is different from no action.
Not writing {} and not writing anything inside {} are two different things.
An empty {} prints nothing for the matching records,
so there will be no output.
bos90631:sm017r awk '/81491588/ {} ' test1
[no output]
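The difference between an omitted action and an empty action can be checked side by side. This is a minimal sketch, assuming a POSIX awk and a temporary copy of the first three test1 records; the variable names are only illustrative.

```shell
# Sample data: the first three records of test1, written to a temporary file.
data=$(mktemp)
printf '%s\n' 'sukul 8149158828 mumbai 100/900/200' \
              'uma 8149122222 chennai 100/800/300' \
              'bhanu 8097123451 Jhansi 200/1000/500' > "$data"

# Action omitted: the default action prints every matching record.
no_action=$(awk '/81491/' "$data")

# Empty action: the records still match, but nothing is printed.
empty_action=$(awk '/81491/ {}' "$data")

rm -f "$data"
echo "omitted action: $no_action"
echo "empty action: [$empty_action]"
```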
###########################################
Example 4: Multiple rules in single awk program.
We can have multiple rules in an awk program.
Multiple rules can be coded one after another.
If a record satisfies multiple rules then multiple actions will be carried
out on that record.
Example:
awk '/sh/ {print $0}
/nepal/ {print $0}' test1
In the above example /sh/ is the 1st pattern and /nepal/ is the 2nd pattern.
The output below shows that one record matches both the patterns "sh" and "nepal"
and hence is printed twice (2 actions).
bos90631:sm017r awk '/sh/ {print $0}
/nepal/ {print $0}' test1
shushant 7798977047 nepal 200/9000/100
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
##########################################
Example 6: awk is case sensitive.
Patterns are matched case sensitively, so /SUKUL/ does not match the name sukul.
bos90631:sm017r awk '/SUKUL/ ' test1
[No Output]
##########################################
We can set the field separator on the command line using the -F option.
awk -F/ '{ print $2 }' test1
900
800
1000
9000
800
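The separator chosen with -F ends up in the built-in variable FS, so it can also be assigned in a BEGIN rule. A small sketch, assuming a POSIX awk and one sample record fed through a pipe instead of the test1 file:

```shell
line='sukul 8149158828 mumbai 100/900/200'

# Field separator given on the command line ...
with_flag=$(printf '%s\n' "$line" | awk -F/ '{ print $2 }')

# ... and the same separator assigned to FS before the first record is read.
with_begin=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "/" } { print $2 }')

echo "$with_flag $with_begin"
```

Both commands print 900 for this record.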
Example 19: getline command
awk implicitly reads the input file one record at a time.
We can also read the next record explicitly with the getline command (here with no arguments).
getline returns a number indicating whether it was successful:
1) 1 if a record was read.
2) 0 if end of file was encountered.
3) -1 on error, for example if the file cannot be opened.
When we execute getline without arguments, the next record is read into $0.
The record that was already in $0 is overwritten.
So we should use getline only after we are done working with
the current record, because it is lost the moment we read the next record using getline.
The values of NF, NR, FNR and $0 are also set as per the new record.
Below is an example of a file whose records are broken across lines by stray line breaks, and code that uses getline to fix it.
[test file]
test3:
sukul 8149158828
mumbai 100/900/200
uma 8149122222 chennai
100/800/300
bhanu
8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
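Fixing the broken lines:
awk '{record=$0;noofflds=split(record,a)
# read more lines until the record has 4 fields or the input ends
while (noofflds!=4 && (getline > 0))
{
record=record " " $0
noofflds=split(record,a)
}
# print the 4 fields separated by commas
print a[1] "," a[2] "," a[3] "," a[4]
}' test3
bos90631:sm017r awk '{record=$0;noofflds=split(record,a)
while (noofflds!=4 && (getline > 0))
{
record=record " " $0
noofflds=split(record,a)
}
print a[1] "," a[2] "," a[3] "," a[4]
}' test3
sukul,8149158828,mumbai,100/900/200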
uma,8149122222,chennai,100/800/300
bhanu,8097123451,Jhansi,200/1000/500
shushant,7798977047,nepal,200/9000/100
himanshu,9090909090,bokharo,100/800/300
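The return value of getline and its effect on $0 and NR can be seen on a three-line input. This is a sketch under the assumption of a POSIX awk; the input letters and the variable name first are made up for illustration.

```shell
out=$(printf 'a\nb\nc\n' | awk '{
    first = $0                      # save the current record before it is overwritten
    if ((getline) > 0)              # 1: another record was read into $0, NR was increased
        print NR ": " first "+" $0
    else                            # 0: end of input, $0 and NR are left unchanged
        print NR ": " first " (no partner)"
}')
echo "$out"
```

The first pass consumes lines a and b together, so the main rule runs only twice for three input lines.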
Example 25: Built in Variable OUTPUT RECORD SEPARATOR
ORS: output record separator.
Each print statement outputs the ORS at the end of whatever it prints.
The default value is \n, hence every print statement ends its output with a newline.
We can set this value in the BEGIN pattern so that it applies to all records.
pht022e2:/home/nemo_dev/sm017r> awk 'BEGIN{ ORS="OOOO"} ;{print $0}' test1
sukul 8149158828 mumbai 100/900/200OOOOuma 8149122222 chennai 100/800/300OOOObhanu 8097123451 Jhansi 200/1000/500OOOOshushant 7798977047 nepal 200/9000/100OOOOhimanshu 9090909090 bokharo 100/800/300OOOO
pht022e2:/home/nemo_dev/sm017r> awk 'BEGIN{ ORS="\n\n"} ;{print $0}' test1
sukul 8149158828 mumbai 100/900/200

uma 8149122222 chennai 100/800/300

bhanu 8097123451 Jhansi 200/1000/500

shushant 7798977047 nepal 200/9000/100

himanshu 9090909090 bokharo 100/800/300
Note that if the ORS does not contain a newline character, all output runs on the same line.
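ORS is often set together with OFS, the output field separator that print inserts between comma-separated items. A short sketch, assuming a POSIX awk and two made-up input records:

```shell
out=$(printf 'sukul mumbai\numa chennai\n' |
      awk 'BEGIN { OFS = "-"; ORS = ";\n" } { print $1, $2 }')
echo "$out"
```

Each comma in the print statement becomes a "-" and each record ends with ";" followed by a newline.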
Example 26: Built in variable Output Format (OFMT)
When we print numbers, awk converts them to strings before printing.
print does this conversion as if by sprintf, using the format held in the variable OFMT.
The default value of OFMT is %.6g
awk 'BEGIN {OFMT="%d"};{ print 17.35}' test1
This should print 17 for each input record.
(Does not work on my installation.)
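The effect of OFMT is easier to see with a computed number than with a constant. The following is a sketch that assumes a POSIX awk (gawk, mawk or nawk); older awk versions may behave differently, as the note above suggests.

```shell
out=$(awk 'BEGIN {
    OFMT = "%.2f"
    x = 22 / 7              # a computed, non-integer number
    print x                 # print converts numbers with OFMT
    print x ""              # concatenation converts with CONVFMT (default %.6g)
    print 17                # integer values are printed as integers
    printf "%d\n", 17.35    # printf ignores OFMT and uses its own format
}')
echo "$out"
```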
The output separators OFS and ORS do not have an effect on printf.
A format specifier starts with % and ends with a format control character.
The format control characters are:
c--> ASCII character.
d--> decimal integer.
i--> decimal integer.
e--> exponential (scientific) notation.
f--> floating point notation.
g--> prints the number in either scientific notation or floating point notation, whichever uses fewer characters.
o--> unsigned octal integer.
s--> string.
x--> unsigned hexadecimal integer.
%% --> prints a literal %.
Modifiers for printf statements:
Modifiers are specified between % and the format control letter.
1) - (minus): a minus sign before the width modifier specifies left justification.
2) 'width': specifies the desired width of the output field.
The value is the minimum width and not the maximum width.
If the item is wider, the field is as wide as necessary.
3) .prec: specifies the precision to be used.
For e and f this is the number of digits printed after the decimal point; for s it is the maximum number of characters printed.
awk 'BEGIN { printf "%-20s | %-20s | %-20s | %30s \n","NAME","PHONENO","CITY","SCORE"}
{ printf "%-20s | %-20s | %-20s | %30s \n",$1,$2,$3,$4 }' test1
pht022e2:/home/nemo_dev/sm017r> awk 'BEGIN { printf "%-20s | %-20s | %-20s | %30s \n","NAME","PHONENO","CITY","SCORE"}
{ printf "%-20s | %-20s | %-20s | %30s \n",$1,$2,$3,$4 }' test1
NAME                 | PHONENO              | CITY                 |                          SCORE
sukul                | 8149158828           | mumbai               |                    100/900/200
uma                  | 8149122222           | chennai              |                    100/800/300
bhanu                | 8097123451           | Jhansi               |                   200/1000/500
shushant             | 7798977047           | nepal                |                   200/9000/100
himanshu             | 9090909090           | bokharo              |                    100/800/300
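The width, minus and precision modifiers can be tried in isolation. A minimal sketch, assuming a POSIX awk; the square brackets are only there to make the padding visible.

```shell
out=$(awk 'BEGIN {
    printf "[%5d]\n", 42            # width 5, right-justified by default
    printf "[%-5d]\n", 42           # minus sign: left-justified
    printf "[%8.3f]\n", 3.14159     # width 8, 3 digits after the decimal point
    printf "[%.2s]\n", "awk"        # for strings, precision caps the length
    printf "[%2s]\n", "awk"         # width is a minimum, so nothing is cut off
}')
echo "$out"
```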
Example 28: Sending print/printf output to files
Normally print and printf both send output to the standard output, which is usually the screen.
But we can make the output go to other places, such as a file or another command.
To print output to a file we use print > "filename".
Note that the filename should be inside double quotes.
awk ' {print $1 >"namefile"}
{print $2 > "phonefile"} ' test1
This shows that we can create multiple files in a single read of the input file.
pht022e2:/home/nemo_dev/sm017r> more namefile
sukul
uma
bhanu
shushant
himanshu
pht022e2:/home/nemo_dev/sm017r> more phonefile
8149158828
8149122222
8097123451
7798977047
9090909090
We can also use >> to append to an existing file.
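The difference between > and >> shows up only when the file already has content. In this sketch (assuming a POSIX awk), the file name names, the temporary directory and the awk variable out are hypothetical; the file name is passed in with -v.

```shell
dir=$(mktemp -d)
printf 'old line\n' > "$dir/names"

# >> keeps what is already in the file and adds to it.
printf 'sukul x\numa y\n' | awk -v out="$dir/names" '{ print $1 >> out }'
appended=$(cat "$dir/names")

# > empties the file when it is first opened; later prints in the same run add to it.
printf 'sukul x\numa y\n' | awk -v out="$dir/names" '{ print $1 > out }'
truncated=$(cat "$dir/names")

rm -rf "$dir"
echo "$appended"
echo "$truncated"
```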
Example 29: Sending the print output to a command as input
This means giving print output as input to another command.
The command should be inside double quotes.
awk '{ print $1 > "namefile" ;print $1 | "sort -r > namefilesrt"}' test1
bos90631:sm017r more namefile
sukul
uma
bhanu
shushant
himanshu
bos90631:sm017r more namefilesrt
uma
sukul
shushant
himanshu
bhanu
Though we did not close the output file and command here, it is recommended
to close them using the close function.
The file or the pipe stays open until we close it
or awk exits.
close(filename)
close(command)
The argument to close must be exactly the same string that was used to open the file or pipe.
Reasons to close the command/file:
1) To start reading a file again from the top in the same awk program, the file should be closed and then
reopened.
2) To avoid exceeding the limit on the number of files open at any given time.
3) When we pipe data to a command, the data is buffered, and the command sees the end of its input and finishes only after we close the pipe
(or at the end of awk).
awk ' { report="mailx sukulm@gmail.com";
print "Awk script worked" | report
print " Please check logs" | report
close(report) }' test1
Only when we close report does the mail command receive the end of its input and send the mail.
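The third reason can be observed without sending mail by piping to sort instead. A sketch, assuming a POSIX awk and that sort is available on the PATH; the variable name cmd is illustrative.

```shell
out=$(awk 'BEGIN {
    cmd = "sort"
    print "uma"   | cmd
    print "bhanu" | cmd
    close(cmd)              # sort sees end of input, prints its result and finishes here
    print "after close"     # printed only after sort is done
}')
echo "$out"
```

Because close waits for the command to finish, the sorted names appear before the last line. The same string stored in cmd is used both to open and to close the pipe.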
Element value: 3
Element no: 1
Element value: 1
Note that when the element is 2, the loop continues to the next cycle and ignores the rest of the code.
Example 39: next statement.
The next statement causes awk to immediately stop processing the current record and go on to the
next record.
The rest of the code after next is not executed for the current record.
Processing starts again from the first rule with the next record.
next is different from getline: the next statement abandons the remaining code
and starts from the beginning with the new record,
whereas getline continues the remaining processing using the newly fetched record.
awk ' {if ($1=="bhanu")
next
print $1}' test1
sukul
uma
shushant
himanshu
Note that when name was bhanu processing started with next record and thus
print statement was not executed for bhanu.
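The contrast between next and getline becomes visible when a rule sits in front of the statement. A minimal sketch, assuming a POSIX awk and a three-line numeric input invented for the purpose:

```shell
# next: record 2 is dropped and record 3 starts again at the first rule.
with_next=$(printf '1\n2\n3\n' |
    awk '{ print "seen", $0 } $1 == 2 { next } { print "kept", $0 }')

# getline: record 3 replaces record 2 and continues with the remaining rules only.
with_getline=$(printf '1\n2\n3\n' |
    awk '{ print "seen", $0 } $1 == 2 { getline } { print "kept", $0 }')

echo "$with_next"
echo "---"
echo "$with_getline"
```

With getline, record 3 is never "seen" by the first rule because processing resumes after the getline statement.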
Example 40: exit statement
The exit statement makes awk stop reading input immediately and ignore the remaining records.
awk ' {if ($1=="bhanu")
exit
print $1}' test1
sukul
uma
As soon as text "bhanu" is found the program exits.
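Two details of exit are worth a quick experiment: an END rule still runs after exit is called in a main rule, and exit can carry a status code back to the shell. A sketch, assuming a POSIX awk; the status value 3 is arbitrary.

```shell
status=0
out=$(printf 'sukul\numa\nbhanu\nshushant\n' |
    awk '$1 == "bhanu" { exit 3 }
         { print $1 }
         END { print "END ran after " NR " records" }') || status=$?
echo "$out"
echo "exit status: $status"
```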
Example 41: Arrays
Arrays in awk are associative.
Each array element is identified by its index.
Awk arrays are different from arrays in other languages:
1) there is no need to specify the size of an array before using it
2) any number or string can be an index.
array1["CAT"]="meoww"
array2["DOG"]="barks"
The above arrays are valid even though the indices are not numeric.
Also, we can add elements at any position.
a[1]="Sukul"
a[2]="uma"
a[20]="shushant"
Note that we can add an element at position 20 irrespective of whether we have added elements 3,4,5...
Notice the below 2 for loops and understand why for (i in array) is used with awk arrays.
awk '{ a[1]="sukul";a[2]="uma";a[5]="bhanu";
for (i=1;i<=5;i++)
{ print a[i]
}
}' testx
sukul
uma


bhanu
Note that since we had not assigned values to a[3] and a[4], the above for loop printed blank lines for them.
Ideally nothing should be printed for them because they do not exist.
Thus the above for loop is not intelligent enough to understand whether
the element exists or not.
Instead the below for loop makes more sense.
awk '{ a[1]="sukul";a[2]="uma";a[5]="bhanu";
for (i in a )
{ print a[i]
}
}' testx
uma
bhanu
sukul
Note that this for loop visits only the array elements that exist
and prints them accordingly. The order in which the elements are visited is not fixed, as the output above shows.
This is the reason why we use the for (i in array) syntax when working with arrays in awk.
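When the numeric order matters, a counting loop can still be used if each index is first tested with the in operator, which checks for existence without creating the element. A sketch, assuming a POSIX awk and the same three elements as above:

```shell
out=$(awk 'BEGIN {
    a[1] = "sukul"; a[2] = "uma"; a[5] = "bhanu"
    for (i = 1; i <= 5; i++)
        if (i in a)             # true only for indices that were assigned
            print i, a[i]
    n = 0
    for (k in a) n++            # count the elements that exist
    print "elements:", n
}')
echo "$out"
```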
Example 42: numeric built in functions
awk ' {
print int(17.23) # gives integer part
print sqrt(900) # gives square root
print exp(2) # exponential
print log(10) # natural log
print sin(30) # sine (x in radians)
print cos(30) # cosine (x in radians)
} ' testx
17
30
7.38906
2.30259
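-0.988032
0.154251
Note: an old awk that does not implement sin and cos prints 30 for the last two lines; use nawk or gawk in that case.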
Example 43: String built in function- index
index(string1,string2): searches string1 for the 1st occurrence of string2 and returns the position where
string2 begins.
If not found it returns zero.
Below shows the position of the 1st "u" in each record of the data file.
awk '{ print index($0,"u")}' test1
2
1
5
3
8
Example 44: String built in function- length
Returns the length of the input string.
#prints the lengths of names
awk '{ print length($1)}' test1
5
3
5
8
8
Example 45: String built in function- match
match(string,regexp): searches for regexp in the string
and returns the position where the matching substring begins;
if no match is found it returns 0.
It also sets two built-in variables:
1) RSTART: the index where the matching substring begins
2) RLENGTH: the length in characters of the matched string
note: did not work on my installation.
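For an awk that does provide match (POSIX awk, nawk, gawk, mawk), the behaviour described above can be sketched as follows; the sample string is taken from the first test1 record and the variable names are illustrative.

```shell
out=$(awk 'BEGIN {
    s = "sukul 8149158828 mumbai"
    pos = match(s, /[0-9]+/)            # first run of digits
    print pos, RSTART, RLENGTH
    print substr(s, RSTART, RLENGTH)    # extract the matched text
    print match(s, /xyz/), RSTART, RLENGTH   # no match: 0, and RLENGTH is -1
}')
echo "$out"
```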
Example 46: String built in function- split
split(string,arrayname,separator)
awk splits the string 'string' into the array 'arrayname' based on the separator we provide.
split returns the number of array elements that it created.
If we skip the separator, the value of FS is used.
awk '{ numberofelements=split($0,array1,"u")
print "Record no:" NR
print "Number of array elements created:" numberofelements
print array1[1],"|",array1[2],"|",array1[3]}' test1
Record no:1
Number of array elements created:4
s | k | l 8149158828 m
Record no:2
Number of array elements created:2
| ma 8149122222 chennai 100/800/300 |
Record no:3
Number of array elements created:2
bhan | 8097123451 Jhansi 200/1000/500 |
Record no:4
Number of array elements created:2
substr(string,start,length): returns the part of 'string' that begins at position 'start' and is 'length' characters long.
pht022e2:/home/nemo_dev/sm017r> awk '{ s1=substr($0,5,10);print s1}' test1
l 81491588
8149122222
u 80971234
hant 77989
nshu 90909
Example 50: String built in function-toupper, tolower
toupper converts a string to upper case and tolower converts it to lower case.
pht022e2:/home/nemo_dev/sm017r> awk '{ record=toupper($0);print record}' test1
SUKUL 8149158828 MUMBAI 100/900/200
UMA 8149122222 CHENNAI 100/800/300
BHANU 8097123451 JHANSI 200/1000/500
SHUSHANT 7798977047 NEPAL 200/9000/100
HIMANSHU 9090909090 BOKHARO 100/800/300
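Since awk comparisons are case sensitive, these functions are a common way to compare text regardless of case. A small sketch, assuming a POSIX awk and two made-up input records:

```shell
out=$(printf 'sukul 8149158828\numa 8149122222\n' |
      awk 'toupper($1) == "SUKUL" { print $1, tolower("MUMBAI") }')
echo "$out"
```

The record is selected even though the name is stored in lower case, and the original field is printed unchanged.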
Example 51: system builtin function- system
Used to execute any system command from awk itself.
The system command is run and control comes back to awk.
pht022e2:/home/nemo_dev/sm017r> awk '{ record=toupper($0);print record}
END { system("ls -lrt test*")}' test1
SUKUL 8149158828 MUMBAI 100/900/200
UMA 8149122222 CHENNAI 100/800/300
BHANU 8097123451 JHANSI 200/1000/500
SHUSHANT 7798977047 NEPAL 200/9000/100
HIMANSHU 9090909090 BOKHARO 100/800/300
-rw-r----- 1 sm017r nemo_dev 187 Aug 8 05:31 test1
Note the last line of the output. It contains the result of ls -lrt test*, which was run
from within awk.
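system also returns the exit status of the command, which lets the awk program react to failures. A sketch, assuming a POSIX awk; the commands "exit 0" and "exit 3" are stand-ins for a command that succeeds and one that fails.

```shell
out=$(awk 'BEGIN {
    ok  = system("exit 0")     # command succeeds: status 0
    bad = system("exit 3")     # command fails: its exit status is returned
    print ok, bad
}')
echo "$out"
```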
Example 52 : understanding ARGV and ARGC.
The command line arguments that we pass to an awk program are stored in an array called ARGV.
ARGC: contains the number of command line arguments.
ARGV is indexed from 0 to ARGC-1.
awk '{print ARGC;
print ARGV[0]
print ARGV[1]}' test1
This prints all 3 values for each line in the input file.
Note that ARGV[0] is the command name (awk) and ARGV[1] is the name of the input file.
2
awk
test1
2
awk
test1
2
awk
test1
2
awk
test1
2
awk
test1
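To print the arguments only once, the loop can be placed in a BEGIN rule. This is a sketch assuming a POSIX awk; because the program has only a BEGIN rule, the named files are never opened and do not have to exist.

```shell
out=$(awk 'BEGIN {
    print ARGC                      # program name plus two file names
    for (i = 1; i < ARGC; i++)      # ARGV[0] is the awk command itself
        print i, ARGV[i]
}' test1 test3)
echo "$out"
```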
Example 52: Built-in variables ENVIRON and FILENAME
awk also has an array ENVIRON which contains the values of the environment variables.
The index for this array is the name of the variable.
FILENAME variable gives the name of the input file.
If the data is read from standard input the value is set to "-".
awk '{print ENVIRON["HOME"], ENVIRON["SHELL"], FILENAME }' test1
/home/nemo_dev/sm017r /usr/bin/ksh test1
/home/nemo_dev/sm017r /usr/bin/ksh test1
/home/nemo_dev/sm017r /usr/bin/ksh test1
/home/nemo_dev/sm017r /usr/bin/ksh test1
/home/nemo_dev/sm017r /usr/bin/ksh test1
We can see that ENVIRON["HOME"] prints the value of the HOME
environment variable, and the same applies to ENVIRON["SHELL"].
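ENVIRON is also a convenient way to hand a shell value to an awk program. In this sketch (assuming a POSIX awk), GREETING is a hypothetical variable and demo.txt is a throwaway file created in a temporary directory.

```shell
# GREETING is set only in the environment of this one awk process.
greeting=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')

# FILENAME holds the name exactly as it was given on the command line.
dir=$(mktemp -d)
printf 'sukul\n' > "$dir/demo.txt"
fname=$(cd "$dir" && awk '{ print FILENAME }' demo.txt)
rm -rf "$dir"

echo "$greeting $fname"
```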