
# Built-in Functions

Built-in functions are functions that are always available for your `awk` program to call. This chapter defines all the built-in functions in `awk`; some of them are mentioned in other sections, but they are summarized here for your convenience. (You can also define new functions yourself. See section User-defined Functions.)

## Calling Built-in Functions

To call a built-in function, write the name of the function followed by arguments in parentheses. For example, `atan2(y + z, 1)` is a call to the function `atan2`, with two arguments.

Whitespace is ignored between the built-in function name and the open-parenthesis, but we recommend that you avoid using whitespace there. User-defined functions do not permit whitespace in this way, and you will find it easier to avoid mistakes by following a simple convention which always works: no whitespace after a function name.

Each built-in function accepts a certain number of arguments. In most cases, any extra arguments given to built-in functions are ignored. The defaults for omitted arguments vary from function to function and are described under the individual functions.

When a function is called, expressions that create the function's actual parameters are evaluated completely before the function call is performed. For example, in the code fragment:

```
i = 4
j = sqrt(i++)
```

the variable `i` is set to 5 before `sqrt` is called with a value of 4 for its actual parameter.
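The evaluation order described above can be checked with a short sketch. The shell variable `out` is only an illustrative name used to capture the output:

```shell
out=$(awk 'BEGIN {
    i = 4
    j = sqrt(i++)   # sqrt receives the old value 4; i is then 5
    print i, j
}')
echo "$out"
```

This should print `5 2`: the argument expression was fully evaluated, including the increment, before `sqrt` ran.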

## Numeric Built-in Functions

Here is a full list of built-in functions that work with numbers:

`int(x)`
This gives you the integer part of x, truncated toward 0. This produces the nearest integer to x, located between x and 0.

For example, `int(3)` is 3, `int(3.9)` is 3, `int(-3.9)` is -3, and `int(-3)` is -3 as well.
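Because `int` truncates rather than rounds, rounding to the nearest integer needs a little extra work. The following is a minimal sketch; `round` is a hypothetical user-defined helper, not a built-in function, and `out` is an illustrative shell variable:

```shell
out=$(awk '
# round is a hypothetical helper: shift by one half away from zero, then truncate
function round(x) { return x < 0 ? int(x - 0.5) : int(x + 0.5) }

BEGIN { print int(3.9), int(-3.9), round(3.9), round(-3.9) }')
echo "$out"
```

The first two values show truncation toward 0, and the last two show the rounded results.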

`sqrt(x)`
This gives you the positive square root of x. It reports an error if x is negative. Thus, `sqrt(4)` is 2.

`exp(x)`
This gives you the exponential of x, or reports an error if x is out of range. The range of values x can have depends on your machine's floating point representation.

`log(x)`
This gives you the natural logarithm of x, if x is positive; otherwise, it reports an error.
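Since `log` only provides natural logarithms, a logarithm in another base can be obtained by dividing by the logarithm of that base. Here is a small sketch, with `out` as an illustrative shell variable:

```shell
out=$(awk 'BEGIN {
    # exp(1) is the constant e; log(1000) / log(10) is the base-10 logarithm of 1000
    printf("%.4f %.4f\n", exp(1), log(1000) / log(10))
}')
echo "$out"
```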

`sin(x)`
This gives you the sine of x, with x in radians.

`cos(x)`
This gives you the cosine of x, with x in radians.

`atan2(y, x)`
This gives you the arctangent of `y / x`, in radians.
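There is no built-in constant for pi, but `atan2` can supply one. This sketch shows two common ways of computing it; `out` is an illustrative shell variable:

```shell
out=$(awk 'BEGIN {
    # The angle of the point (-1, 0) is pi; the arctangent of 1 is pi / 4.
    printf("%.4f %.4f\n", atan2(0, -1), 4 * atan2(1, 1))
}')
echo "$out"
```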

`rand()`
This gives you a random number. The values of `rand` are uniformly distributed between 0 and 1. The value is never 1; depending on the implementation, it may be 0.

Often you want random integers instead. Here is a user-defined function you can use to obtain a random nonnegative integer less than n:

```
function randint(n) {
    return int(n * rand())
}
```

The multiplication produces a random real number that is at least 0 and less than n. We then make it an integer (using `int`) between 0 and `n - 1`.

Here is an example where a similar function is used to produce random integers between 1 and n:

```
awk '
# Function to roll a simulated die.
function roll(n) { return 1 + int(rand() * n) }

# Roll 3 six-sided dice and print total number of points.
{
    printf("%d points\n", roll(6)+roll(6)+roll(6))
}'
```

Note: `rand` starts generating numbers from the same point, or seed, each time you run `awk`. This means that a program will produce the same results each time you run it. The numbers are random within one `awk` run, but predictable from run to run. This is convenient for debugging, but if you want a program to do different things each time it is used, you must change the seed to a value that will be different in each run. To do this, use `srand`.

`srand(x)`
The function `srand` sets the starting point, or seed, for generating random numbers to the value x.

Each seed value leads to a particular sequence of "random" numbers. Thus, if you set the seed to the same value a second time, you will get the same sequence of "random" numbers again.

If you omit the argument x, as in `srand()`, then the current date and time of day are used for a seed. This is the way to get random numbers that differ from one run to the next.

The return value of `srand` is the previous seed. This makes it easy to keep track of the seeds for use in consistently reproducing sequences of random numbers.
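The effect of reusing a seed can be seen in a short sketch. The seed value 42 is arbitrary, and `out` is an illustrative shell variable:

```shell
out=$(awk 'BEGIN {
    srand(42); a = rand(); b = rand()
    srand(42); c = rand(); d = rand()   # same seed, so the sequence restarts
    if (a == c && b == d)
        print "same sequence"
    else
        print "different sequence"
}')
echo "$out"
```

Setting the seed a second time to the same value replays the same numbers, so this prints `same sequence`.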

`time()`
The function `time` (not in all versions of `awk`) returns the current time in seconds since January 1, 1970.

`ctime(then)`
The function `ctime` (not in all versions of `awk`) takes a numeric argument in seconds and returns a string representing the corresponding date, suitable for printing or further processing.

## Built-in Functions for String Manipulation

The functions in this section look at the text of one or more strings.

`index(in, find)`
This searches the string in for the first occurrence of the string find, and returns the position where that occurrence begins in the string in. For example:

```
awk 'BEGIN { print index("peanut", "an") }'
```

prints `3`. If find is not found, `index` returns 0.

`length(string)`
This gives you the number of characters in string. If string is a number, the length of the digit string representing that number is returned. For example, `length("abcde")` is 5. By contrast, `length(15 * 35)` works out to 3. How? Well, 15 * 35 = 525, and 525 is then converted to the string `"525"`, which has three characters.

If no argument is supplied, `length` returns the length of `$0`.
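The three cases above (no argument, a string argument, and a numeric argument) can be compared side by side. This is a small sketch on two made-up input lines, with `out` as an illustrative shell variable:

```shell
out=$(printf 'hello\nhi there\n' | awk '{
    # length() measures $0, length($1) the first field, length(15 * 35) the string "525"
    print length(), length($1), length(15 * 35)
}')
echo "$out"
```

For the first line this gives `5 5 3`, and for the second `8 2 3`.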

`match(string, regexp)`
The `match` function searches the string, string, for the longest, leftmost substring matched by the regular expression, regexp. It returns the character position, or index, of where that substring begins (1, if it starts at the beginning of string). If no match is found, it returns 0.

The `match` function sets the built-in variable `RSTART` to the index. It also sets the built-in variable `RLENGTH` to the length of the matched substring. If no match is found, `RSTART` is set to 0, and `RLENGTH` to -1.

For example:

```
awk '{
    if ($1 == "FIND")
        regex = $2
    else {
        where = match($0, regex)
        if (where)
            print "Match of", regex, "found at", where, "in", $0
    }
}'
```

This program looks for lines that match the regular expression stored in the variable `regex`. This regular expression can be changed. If the first word on a line is `FIND`, `regex` is changed to be the second word on that line. Therefore, given:

```
FIND fo*bar
My program was a foobar
But none of it would doobar
FIND Melvin
JF+KM
This line is property of The Reality Engineering Co.
This file created by Melvin.
```

`awk` prints:

```
Match of fo*bar found at 18 in My program was a foobar
Match of Melvin found at 22 in This file created by Melvin.
```
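`RSTART` and `RLENGTH` are convenient together with `substr` for pulling out the text that actually matched. Here is a minimal sketch using one of the input lines above; `out` is an illustrative shell variable:

```shell
out=$(echo "My program was a foobar" | awk '{
    if (match($0, /fo*bar/))
        # RSTART is where the match begins, RLENGTH how many characters it spans
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
}')
echo "$out"
```

This prints `18 6 foobar`.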

`split(string, array, fieldsep)`
This divides string up into pieces separated by fieldsep, and stores the pieces in array. The first piece is stored in `array[1]`, the second piece in `array[2]`, and so forth. The string value of the third argument, fieldsep, is used as a regexp to find the places to split string. If fieldsep is omitted, the value of `FS` is used. `split` returns the number of elements created.

The `split` function, then, splits strings into pieces in a manner similar to the way input lines are split into fields. For example:

```
split("auto-da-fe", a, "-")
```

splits the string `auto-da-fe` into three fields using `-` as the separator. It sets the contents of the array `a` as follows:

```
a[1] = "auto"
a[2] = "da"
a[3] = "fe"
```

The value returned by this call to `split` is 3.
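Because `split` returns the number of pieces, its return value is a natural loop bound for walking the array in order. A short sketch, with `n`, `i`, and the shell variable `out` chosen only for illustration:

```shell
out=$(awk 'BEGIN {
    n = split("auto-da-fe", a, "-")
    # Visit the pieces in order, from a[1] up to a[n].
    for (i = 1; i <= n; i++)
        printf("a[%d] = %s\n", i, a[i])
}')
echo "$out"
```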

`sprintf(format, expression1,...)`
This returns (without printing) the string that `printf` would have printed out with the same arguments (see section Using `printf` Statements For Fancier Printing). For example:

```
sprintf("pi = %.2f (approx.)", 22/7)
```

returns the string `"pi = 3.14 (approx.)"`.
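One typical use of `sprintf` is building a string for later use rather than printing it at once, for instance a zero-padded file name. In this sketch the `page-NNN.txt` naming scheme is purely hypothetical, and `out` is an illustrative shell variable:

```shell
out=$(awk 'BEGIN {
    # %03d pads the number with leading zeros to a width of three digits
    name = sprintf("page-%03d.txt", 7)
    print name
}')
echo "$out"
```

This prints `page-007.txt`.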

`sub(regexp, replacement, target)`
The `sub` function alters the value of target. It searches this value, which should be a string, for the leftmost substring matched by the regular expression, regexp, extending this match as far as possible. Then the entire string is changed by replacing the matched text with replacement. The modified string becomes the new value of target.

This function is peculiar because target is not simply used to compute a value, and not just any expression will do: it must be a variable, field or array reference, so that `sub` can store a modified value there. If this argument is omitted, then the default is to use and alter `$0`.

For example:

```
str = "water, water, everywhere"
sub(/at/, "ith", str)
```

sets `str` to `"wither, water, everywhere"`, by replacing the leftmost, longest occurrence of `at` with `ith`.

The `sub` function returns the number of substitutions made (either one or zero).

If the special character `&` appears in replacement, it stands for the precise substring that was matched by regexp. (If the regexp can match more than one string, then this precise substring may vary.) For example:

```
awk '{ sub(/candidate/, "& and his wife"); print }'
```

changes the first occurrence of `candidate` to `candidate and his wife` on each input line.

The effect of this special character can be turned off by putting a backslash before it in the string. As usual, to insert one backslash in the string, you must write two backslashes. Therefore, write `\\&` in a string constant to include a literal `&` in the replacement. For example, here is how to replace the first `|` on each line with an `&`:

```
awk '{ sub(/\|/, "\\&"); print }'
```

Note: as mentioned above, the third argument to `sub` must be an lvalue. Some versions of `awk` allow the third argument to be an expression which is not an lvalue. In such a case, `sub` would still search for the pattern and return 0 or 1, but the result of the substitution (if any) would be thrown away because there is no place to put it. Such versions of `awk` accept expressions like this:

```
sub(/USA/, "United States", "the USA and Canada")
```

But that is considered erroneous in `gawk`.

`gsub(regexp, replacement, target)`
This is similar to the `sub` function, except `gsub` replaces all of the longest, leftmost, nonoverlapping matching substrings it can find. The `g` in `gsub` stands for "global", which means replace everywhere. For example:

```
awk '{ gsub(/Britain/, "United Kingdom"); print }'
```

replaces all occurrences of the string `Britain` with `United Kingdom` for all input records.

The `gsub` function returns the number of substitutions made. If the variable to be searched and altered, target, is omitted, then the entire input record, `$0`, is used.

As in `sub`, the characters `&` and `\` are special, and the third argument must be an lvalue.
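The return value of `gsub` and the special `&` character can be combined, for example to mark every match and count the matches at the same time. A minimal sketch, with `n` and the shell variable `out` as illustrative names:

```shell
out=$(echo "water, water, everywhere" | awk '{
    # Wrap each occurrence of "at" in brackets; n receives the number of replacements.
    n = gsub(/at/, "[&]")
    print n, $0
}')
echo "$out"
```

This prints `2 w[at]er, w[at]er, everywhere`.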

`substr(string, start, length)`
This returns a length-character-long substring of string, starting at character number start. The first character of a string is character number one. For example, `substr("washington", 5, 3)` returns `"ing"`.

If length is not present, this function returns the whole suffix of string that begins at character number start. For example, `substr("washington", 5)` returns `"ington"`.
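`substr` is often paired with `index` to cut a string at a delimiter. The sketch below splits a made-up `key=value` string at its first `=`; the names `s`, `eq`, and the shell variable `out` are illustrative only:

```shell
out=$(awk 'BEGIN {
    s = "color=blue"
    eq = index(s, "=")               # position of the first "="
    # Everything before the "=" is the key; everything after it is the value.
    print substr(s, 1, eq - 1), substr(s, eq + 1)
}')
echo "$out"
```

This prints `color blue`.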

`tolower(string)`
This returns a copy of string, with each upper-case character in the string replaced with its corresponding lower-case character. Nonalphabetic characters are left unchanged. For example, `tolower("MiXeD cAsE 123")` returns `"mixed case 123"`.

`toupper(string)`
This returns a copy of string, with each lower-case character in the string replaced with its corresponding upper-case character. Nonalphabetic characters are left unchanged. For example, `toupper("MiXeD cAsE 123")` returns `"MIXED CASE 123"`.
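A common use of these two functions is comparing text without regard to case: convert one side to a known case before testing it. Here is a small sketch on made-up input, with `out` as an illustrative shell variable:

```shell
out=$(printf 'Yes\nNO\nyes\n' | awk '
# Count the lines that say "yes" in any mixture of upper and lower case.
tolower($0) == "yes" { n++ }
END { print n }')
echo "$out"
```

This prints `2`, since `Yes` and `yes` both match.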

## Built-in Functions For Input/Output

`close(filename)`
Close the file filename, for input or output. The argument may alternatively be a shell command that was used for redirecting to or from a pipe; then the pipe is closed.

See section Closing Input Files and Pipes, regarding closing input files and pipes. See section Closing Output Files and Pipes, regarding closing output files and pipes.
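As a rough sketch of closing an output pipe, the program below sends two lines to a `sort` command and then closes the pipe. It assumes a `sort` utility is available on the system; `cmd` and the shell variable `out` are illustrative names:

```shell
out=$(awk 'BEGIN {
    cmd = "sort"
    print "pear"  | cmd
    print "apple" | cmd
    close(cmd)       # sort sees end of input and writes its sorted output now
    print "done"
}')
echo "$out"
```

Because the pipe is closed before the final `print`, the sorted lines `apple` and `pear` appear ahead of `done`. The string passed to `close` must be the same command string that was used to open the pipe.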

`system(command)`
The `system` function lets you execute operating system commands and then return to the `awk` program. It executes the command given by the string command and returns, as its value, the status returned by the command that was executed.

For example, if the following fragment of code is put in your `awk` program:

```
END {
    system("mail -s 'awk run done' operator < /dev/null")
}
```

the system operator will be sent mail when the `awk` program finishes processing input and begins its end-of-input processing.

Note that much the same result can be obtained by redirecting `print` or `printf` into a pipe. However, if your `awk` program is interactive, `system` is useful for cranking up large self-contained programs, such as a shell or an editor.

Some operating systems cannot implement the `system` function. `system` causes a fatal error if it is not supported.
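The return value of `system` can be used to test whether a command succeeded. The following sketch assumes a Unix-like system where the standard `true` and `false` utilities exist; `ok`, `bad`, and the shell variable `out` are illustrative names:

```shell
out=$(awk 'BEGIN {
    ok  = system("true")    # a command that succeeds reports status 0
    bad = system("false")   # a command that fails reports a nonzero status
    print ok, bad
}')
echo "$out"
```

On such a system this prints `0 1`.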
