Thursday, January 26, 2017

AWK - Built-in Variables

AWK provides several built-in variables. They play an important role while writing AWK scripts. This chapter demonstrates the usage of built-in variables.

Standard AWK variables

The standard AWK variables are discussed below.

ARGC

It holds the number of arguments provided at the command line.
Example
[jerry]$ awk 'BEGIN {print "Arguments =", ARGC}' One Two Three Four
On executing this code, you get the following result −
Output
Arguments = 5
But why does AWK show 5 when you passed only 4 arguments? The ARGV example that follows answers this.

ARGV

It is an array that stores the command-line arguments. The array's valid index ranges from 0 to ARGC-1.
Example
[jerry]$ awk 'BEGIN {
   for (i = 0; i < ARGC; ++i) {
      printf "ARGV[%d] = %s\n", i, ARGV[i]
   }
}' one two three four
On executing this code, you get the following result −
Output
ARGV[0] = awk
ARGV[1] = one
ARGV[2] = two
ARGV[3] = three
ARGV[4] = four
ARGV[0] holds the command name, which is why ARGC is one more than the number of arguments you typed.

CONVFMT

It represents the conversion format used when a number is converted to a string. Its default value is %.6g.
Example
[jerry]$ awk 'BEGIN { print "Conversion Format =", CONVFMT }'
On executing this code, you get the following result −
Output
Conversion Format = %.6g

ENVIRON

It is an associative array of environment variables.
Example
[jerry]$ awk 'BEGIN { print ENVIRON["USER"] }'
On executing this code, you get the following result −
Output
jerry
To find the names of other environment variables, use the env command.
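As a small sketch of how ENVIRON picks up variables from the calling shell, the snippet below sets a variable for a single awk invocation and reads it back. The variable name GREETING is made up for illustration.

```shell
# GREETING is a hypothetical variable, set only for this one awk invocation.
out=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
echo "$out"
```

This should print hello, since the assignment in front of the command places GREETING in awk's environment.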

FILENAME

It represents the current file name.
Example
[jerry]$ awk 'END {print FILENAME}' marks.txt
On executing this code, you get the following result −
Output
marks.txt
Please note that FILENAME is undefined in the BEGIN block.

FS

It represents the (input) field separator, and its default value is a single space, which makes AWK split fields on runs of whitespace. You can also change it by using the -F command-line option.
Example
[jerry]$ awk 'BEGIN {print "FS = " FS}' | cat -vte
On executing this code, you get the following result −
Output
FS =  $
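A minimal sketch of changing the field separator, first with the -F option and then by assigning FS in a BEGIN block. The sample input strings are made up for illustration.

```shell
# Split on ":" using the -F command-line option.
by_option=$(echo "One:Two:Three" | awk -F: '{ print $2 }')

# Split on "," by assigning FS before any input is read.
by_begin=$(echo "One,Two,Three" | awk 'BEGIN { FS = "," } { print $3 }')

echo "$by_option"
echo "$by_begin"
```

The first command should print Two and the second Three.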

NF

It represents the number of fields in the current record. For instance, the following example prints only those lines that contain more than two fields.
Example
[jerry]$ echo -e "One Two\nOne Two Three\nOne Two Three Four" | awk 'NF > 2'
On executing this code, you get the following result −
Output
One Two Three
One Two Three Four
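Because NF holds the field count, $NF refers to the last field of the current record. A short sketch with made-up input:

```shell
# Print the field count followed by the last field of each record.
out=$(printf 'One Two\nOne Two Three\n' | awk '{ print NF, $NF }')
echo "$out"
```

For this input the two lines printed should be "2 Two" and "3 Three".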

NR

It represents the number of the current record. For instance, the following example prints only those records whose record number is less than three, that is, the first two lines.
Example
[jerry]$ echo -e "One Two\nOne Two Three\nOne Two Three Four" | awk 'NR < 3'
On executing this code, you get the following result −
Output
One Two
One Two Three

FNR

It is similar to NR, but relative to the current file. It is useful when AWK is operating on multiple files. The value of FNR restarts at 1 with each new file.
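The difference between NR and FNR can be sketched with two small files. The file names first.txt and second.txt, and their contents, are made up for illustration and created in a temporary directory.

```shell
# Create two throwaway input files with two lines each.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first.txt"
printf 'c\nd\n' > "$tmpdir/second.txt"

# NR keeps counting across files, while FNR restarts at 1 for each file.
out=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' first.txt second.txt)
echo "$out"

rm -rf "$tmpdir"
```

The last two output lines should read "second.txt 3 1" and "second.txt 4 2": NR continues from the first file, and FNR starts over.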

OFMT

It represents the output format for numbers, used when print outputs a numeric value. Its default value is %.6g.
Example
[jerry]$ awk 'BEGIN {print "OFMT = " OFMT}'
On executing this code, you get the following result −
Output
OFMT = %.6g
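OFMT and CONVFMT share the same default, so the difference only shows once they are set apart. The following sketch uses an arbitrary value and arbitrary formats to show which one applies where.

```shell
out=$(awk 'BEGIN {
   x = 3.14159265
   OFMT = "%.2f"      # used when print outputs a number
   CONVFMT = "%.4f"   # used when a number is converted to a string
   print x            # printed as a number, so OFMT applies
   print x ""         # concatenation converts x to a string first, so CONVFMT applies
}')
echo "$out"
```

The first line printed should be 3.14 and the second 3.1416. Integer values are not affected by either format.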

OFS

It represents the output field separator and its default value is space.
Example
[jerry]$ awk 'BEGIN {print "OFS = " OFS}' | cat -vte
On executing this code, you get the following result −
Output
OFS =  $
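As a brief sketch of OFS in action, the commands below set it to a hyphen. The input string is made up for illustration.

```shell
# The commas in print are replaced by OFS in the output.
with_commas=$(echo "One Two Three" | awk 'BEGIN { OFS = "-" } { print $1, $2, $3 }')

# Assigning to a field makes AWK rebuild $0 with OFS between the fields.
rebuilt=$(echo "One Two Three" | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')

echo "$with_commas"
echo "$rebuilt"
```

Both commands should print One-Two-Three. Without the assignment to $1, a plain print would output the record unchanged.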

ORS

It represents the output record separator and its default value is newline.
Example
[jerry]$ awk 'BEGIN {print "ORS = " ORS}' | cat -vte
On executing the above code, you get the following result −
Output
ORS = $
$

RLENGTH

It represents the length of the string matched by the match function. AWK's match function searches the input string for a regular expression.
Example
[jerry]$ awk 'BEGIN { if (match("One Two Three", "re")) { print RLENGTH } }'
On executing this code, you get the following result −
Output
2

RS

It represents the (input) record separator and its default value is newline.
Example
[jerry]$ awk 'BEGIN {print "RS = " RS}' | cat -vte
On executing this code, you get the following result −
Output
RS = $
$
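Changing RS lets AWK read records that are not line-based. A minimal sketch, assuming semicolon-separated input made up for illustration:

```shell
# printf emits no trailing newline, so the last record is exactly "Three".
out=$(printf 'One;Two;Three' | awk 'BEGIN { RS = ";" } { print NR, $0 }')
echo "$out"
```

Each semicolon-delimited chunk becomes its own record, so this should print "1 One", "2 Two", and "3 Three" on separate lines.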

RSTART

It represents the position, counting from 1, at which the substring matched by the match function begins.
Example
[jerry]$ awk 'BEGIN { if (match("One Two Three", "Thre")) { print RSTART } }'
On executing this code, you get the following result −
Output
9
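RSTART and RLENGTH are typically used together with substr to extract the matched text. A small sketch, with a regular expression chosen only for illustration:

```shell
out=$(awk 'BEGIN {
   s = "One Two Three"
   # match() sets RSTART and RLENGTH as a side effect and returns RSTART.
   if (match(s, /T[a-z]+/)) {
      print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
   }
}')
echo "$out"
```

The leftmost match is "Two", which starts at position 5 and is 3 characters long, so the output should be "5 3 Two".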

SUBSEP

It represents the separator character for array subscripts and its default value is \034.
Example
[jerry]$ awk 'BEGIN { print "SUBSEP = " SUBSEP }' | cat -vte
On executing this code, you get the following result −
Output
SUBSEP = ^\$
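SUBSEP matters when you use comma-separated subscripts: AWK joins them into a single key with SUBSEP in between. The sketch below uses a made-up array named grid and splits the key apart again.

```shell
out=$(awk 'BEGIN {
   # grid[1, 2] is stored under the single key "1" SUBSEP "2".
   grid[1, 2] = "x"
   for (key in grid) {
      split(key, parts, SUBSEP)
      print parts[1], parts[2], grid[key]
   }
}')
echo "$out"
```

This should print "1 2 x", showing that the two subscripts can be recovered from the combined key.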

$0

It represents the entire input record.
Example
[jerry]$ awk '{print $0}' marks.txt
On executing this code, you get the following result −
Output
1) Amit     Physics   80
2) Rahul    Maths     90
3) Shyam    Biology   87
4) Kedar    English   85
5) Hari     History   89

$n

It represents the nth field in the current record where the fields are separated by FS.
Example
[jerry]$ awk '{print $3 "\t" $4}' marks.txt
On executing this code, you get the following result −
Output
Physics   80
Maths     90
Biology   87
English   85
History   89

GNU AWK Specific Variables

GNU AWK specific variables are as follows −

ARGIND

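It represents the index in ARGV of the current file being processed. The output below corresponds to junk1, junk2, and junk3 each containing a single line, since the action runs once per record.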
Example
[jerry]$ awk '{ 
   print "ARGIND   = ", ARGIND; print "Filename = ", ARGV[ARGIND] 
}' junk1 junk2 junk3
On executing this code, you get the following result −
Output
ARGIND   =  1
Filename =  junk1
ARGIND   =  2
Filename =  junk2
ARGIND   =  3
Filename =  junk3

BINMODE

It is used to specify binary mode for all file I/O on non-POSIX systems. Numeric values of 1, 2, or 3 specify that input files, output files, or all files, respectively, should use binary I/O. String values of r or w specify that input files or output files, respectively, should use binary I/O. String values of rw or wr specify that all files should use binary I/O.

ERRNO

It is a string that describes the error when a redirection fails for getline or when a close call fails.
Example
[jerry]$ awk 'BEGIN { ret = getline < "junk.txt"; if (ret == -1) print "Error:", ERRNO }'
On executing this code, you get the following result −
Output
Error: No such file or directory

FIELDWIDTHS

It is a space-separated list of field widths. When this variable is set, GAWK parses the input into fields of fixed width, instead of using the value of the FS variable as the field separator.
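A minimal sketch of fixed-width parsing, assuming widths of 2, 3, and 4 characters and a made-up input line. Because FIELDWIDTHS is a GAWK extension, the snippet calls gawk explicitly and falls back to a message when it is not installed.

```shell
if command -v gawk >/dev/null 2>&1; then
   # The record is cut into fields of 2, 3, and 4 characters.
   out=$(printf 'abcdefghi\n' | gawk 'BEGIN { FIELDWIDTHS = "2 3 4" } { print $1, $2, $3 }')
else
   out="gawk not found"
fi
echo "$out"
```

With gawk available, this should print "ab cde fghi".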

IGNORECASE

When this variable is set to a nonzero value, GAWK performs regular expression matching and string comparisons without regard to case. The following example demonstrates this −
Example
[jerry]$ awk 'BEGIN{IGNORECASE = 1} /amit/' marks.txt
On executing this code, you get the following result −
Output
1) Amit     Physics   80

LINT

It provides dynamic control of the --lint option from the GAWK program. When this variable is set, GAWK prints lint warnings. When assigned the string value fatal, lint warnings become fatal errors, exactly like --lint=fatal.
Example
[jerry]$ awk 'BEGIN {LINT = 1; a}'
On executing this code, you get the following result −
Output
awk: cmd. line:1: warning: reference to uninitialized variable `a'
awk: cmd. line:1: warning: statement has no effect

PROCINFO

This is an associative array containing information about the process, such as real and effective UID numbers, process ID number, and so on.
Example
[jerry]$ awk 'BEGIN { print PROCINFO["pid"] }'
On executing this code, you get the following result −
Output
4316

TEXTDOMAIN

It represents the text domain of the AWK program. It is used to find the localized translations for the program's strings.
Example
[jerry]$ awk 'BEGIN { print TEXTDOMAIN }'
On executing this code, you get the following result −
Output
messages
The output above is the default text domain, messages.
