While using my video scripts, I came across a subtle problem with awk pipe statements. If you forget to close a pipe command, a later one may deliver wrong results under certain circumstances (which is what makes this problem subtle).
This is about GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0) on Ubuntu 20.04.1 with Linux 5.4.0-59.
AWK is an ancestor of Perl and a great tool for quick data interpretation, better than Perl because it is simpler. But, like most scripting languages, it has peculiarities that may cause undetected bugs. In my case a video length was reported to be too short to contain a given timestamp, which was not true, so I had to check the responsible awk script.
Example:
file = "a.txt"
sizeCommand = "stat --printf='%s' " file
sizeCommand | getline size
print "Size of " file " is " size
This code fetches the size of the file a.txt through an external command whose output is piped into getline, which reads the first line of that output.
The GNU documentation puts a close() immediately after the pipe statement. So the correct form would be:
....
sizeCommand | getline size
close(sizeCommand)
....
→ If you forget the close(), you may experience strange results!
Here is a reproduction of what I encountered. You need two text files: a.txt containing 'a' (size = 1), and ab.txt containing 'ab' (size = 2). Then put the following AWK script pipe-statement-problem.awk into the same directory and make it executable:
#!/usr/bin/awk -f
....
The shebang #!/usr/bin/awk -f in the first line tells the operating system to run the script with /usr/bin/awk.
As no data are processed by this script, everything happens in the BEGIN rule, which is executed on script start.
The script builds two arrays, one for file names and one for the expected sizes of these files.
The for-loop goes through all files in the array, which are a.txt, ab.txt, and again a.txt, and fetches their sizes. An ERROR message is printed if a size doesn't match what was expected.
This script doesn't make any practical sense, but it is something like a unit test for the pipe-statement.
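A script along these lines could look like the sketch below. The array names files and sizes, the loop layout, and the printf commands that create the test files are assumptions for illustration, so line numbers will not match the original listing; stat --printf is the GNU coreutils form used in the example above.

```shell
# Create the two test files without a trailing newline (1 and 2 bytes).
printf 'a'  > a.txt
printf 'ab' > ab.txt

# Write the AWK script; names and layout are assumptions, not the original listing.
cat > pipe-statement-problem.awk <<'EOF'
#!/usr/bin/awk -f
BEGIN {
    # File names and the sizes expected for them (hypothetical array names).
    files[1] = "a.txt";  sizes[1] = 1
    files[2] = "ab.txt"; sizes[2] = 2
    files[3] = "a.txt";  sizes[3] = 1

    for (i = 1; i <= 3; i++) {
        sizeCommand = "stat --printf='%s' " files[i]
        sizeCommand | getline size    # deliberately no close() afterwards
        print "size of " files[i] " = " size
        if (size != sizes[i])
            print "ERROR in size of " files[i] ", should be " sizes[i]
    }
}
EOF

# Run it through awk directly, so the executable bit is not needed here.
awk -f pipe-statement-problem.awk
```

Note that the command string for the third iteration is identical to the one for the first iteration, which is the situation in which the problem shows up.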
Output is ('$' is the UNIX command prompt):
$ pipe-statement-problem.awk
size of a.txt = 1
size of ab.txt = 2
size of a.txt = 2
ERROR in size of a.txt, should be 1
The error happens when the pipe for file a.txt is executed a second time. For some reason awk then delivers the size of file ab.txt, which is the one that preceded this pipe statement.
The bug can be fixed by inserting a close() immediately after the pipe statement on line 14:
#!/usr/bin/awk -f
....
Running the fixed script you see:
$ pipe-statement-problem.awk
size of a.txt = 1
size of ab.txt = 2
size of a.txt = 1
This is the right output. Now the size of file a.txt has been read correctly.
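One way to make the close() hard to forget is to wrap the pipe statement and its close() in a small function. The following is only a sketch: the function name fileSize, the script name pipe-with-close.awk, and the use of split() to build the file list are assumptions, not part of the original script.

```shell
# Create the two test files without a trailing newline (1 and 2 bytes).
printf 'a'  > a.txt
printf 'ab' > ab.txt

cat > pipe-with-close.awk <<'EOF'
# Hypothetical helper: run the command, read its first line, always close the pipe.
# cmd and size are local variables (extra parameters, by awk convention).
function fileSize(file,    cmd, size) {
    cmd = "stat --printf='%s' " file
    cmd | getline size
    close(cmd)
    return size
}
BEGIN {
    n = split("a.txt ab.txt a.txt", files, " ")
    for (i = 1; i <= n; i++)
        print "size of " files[i] " = " fileSize(files[i])
}
EOF

awk -f pipe-with-close.awk
```

Because size is local to the function, a failed read returns an empty string instead of a value left over from an earlier call.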
When I got to know the AWK pipe statement, I didn't even know that you can (or must) close it. The resulting problems may stay undetected for a long time, because there is no warning and no error message; the result of getline is simply wrong. I didn't find out why the result is always that of the preceding pipe, and why it happens only when repeating a pipe that was already executed once.
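A possible explanation, going by the gawk manual's description of getline: awk identifies a pipe by its command string, so repeating the same string does not run the command again but keeps reading from the pipe that is still open. That pipe is already at end of input, so getline returns 0 and leaves the variable untouched, which means it still holds the value from the previous pipe. The sketch below makes this visible by printing getline's return value; the script name getline-status.awk and the marker value "stale" are made up for the demonstration.

```shell
# One test file of 1 byte, no trailing newline.
printf 'a' > a.txt

cat > getline-status.awk <<'EOF'
BEGIN {
    cmd = "stat --printf='%s' a.txt"

    status = (cmd | getline size)   # first use: the command is started
    print "first read:  status=" status " size=" size

    size = "stale"                  # stands in for a value from another pipe
    status = (cmd | getline size)   # same command string, pipe still open
    print "second read: status=" status " size=" size

    close(cmd)
    status = (cmd | getline size)   # after close() the command runs again
    print "after close: status=" status " size=" size
    close(cmd)
}
EOF

awk -f getline-status.awk
```

Checking that the return value is greater than 0 before using the variable would turn the silent wrong result into a detectable error.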
ɔ⃝ Fritz Ritzberger, 2021-01-08