The simplest use, like egrep (output every line that matches a regular expression), is this. The commands listed below are all equivalent.

    cat | egrep 'pattern'
    cat | awk '/pattern/ { print }'
    cat | sed -n '/pattern/ p'
    cat | perl -ne 'print if /pattern/'

Here we look at the Perl case more closely. The statement print if /pattern/ is certainly valid Perl code. The language is carefully designed so that:
- The syntax "statement if expression" is the same as 'if (expression) { statement; }'.
- The expression for pattern matching, usually written as '$value =~ /pattern/', can be abbreviated as simply '/pattern/' or 'm{pattern}'. The default value is drawn from $_ (a built-in variable).
- If the argument to print is missing, it prints the value of $_.
The Perl command can also be written as

    cat | perl -ne '/pattern/ and print'

which is the same thing, relying on the fact that the 'and' operator short-circuits: print only runs when the match succeeds.
The command line flags -n and -e, and the related -p, accomplish the following:
- -e is used to specify the expression to evaluate.
- -n wraps the expression inside a while loop that places each input line into $_ and evaluates the expression.
- -p wraps the expression inside a while loop that places each input line into $_, evaluates the expression (which typically manipulates $_), and then prints the resulting $_.
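The loops that -n and -p add can be written out by hand, which may make the flags easier to remember. The following is a rough sketch of the equivalence rather than the exact code Perl generates; the sample input and variable names are invented for the example.

```shell
# Made-up two-line input for illustration.
input='one
two'

# -n: loop over the input; nothing is printed unless the code prints it.
with_n=$(printf '%s\n' "$input" | perl -ne 'print if /t/')
loop_n=$(printf '%s\n' "$input" | perl -e 'while (<>) { print if /t/ }')

# -p: the same loop, but $_ is printed at the end of every iteration.
with_p=$(printf '%s\n' "$input" | perl -pe 'tr/a-z/A-Z/')
loop_p=$(printf '%s\n' "$input" | perl -e 'while (<>) { tr/a-z/A-Z/ } continue { print }')

printf '%s\n' "$with_n" "$loop_n"   # both are: two
printf '%s\n' "$with_p" "$loop_p"   # both are: ONE and TWO on separate lines
```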
The -p flag makes sed-style substitution straightforward:

    cat | sed 's/pattern/replacement/flags'
    cat | perl -pe 's/pattern/replacement/flags'

Again, this works because regular expression substitution in Perl, normally written as '$value =~ s/pattern/replacement/flags' or '$value =~ s{pattern}{replacement}flags', operates on $_ by default.
Here are a few flags that make Perl more awk-like, with field separators.
- -l strips the input record separator (newline by default) from each line as it is read, and makes each print statement append it again as the output record separator.
- -Fpattern is used to specify the input field separator, much like awk's -F option.
- -a turns on autosplit mode, so the input fields are placed into the @F array.
For example, to print the first field of each line in /etc/passwd:

    cat /etc/passwd | awk -F: '{ print $1 }'
    cat /etc/passwd | perl -F: -lane 'print $F[0]'

Note that Perl fields are $F[0], $F[1], ... (counting from zero), while awk fields are $1, $2, ... (counting from one). However, awk's $0 (the whole input line) corresponds to $_ in Perl.
If we want to combine regular expression matching and field separation, we might have something like:
    find . | awk -F/ '/hw[0-9]+/ { print $1 }'
    find . | perl -F/ -lane 'print $F[0] if /hw[0-9]+/'

Many awk variables have their Perl equivalents as well. However, in order to use them under their awk-like names, the -MEnglish flag must be passed to Perl like this:
    cat | awk '{ print NR, $0 }'
    cat | perl -MEnglish -ne 'print $NR, " ", $_'

Note that the commas in the Perl print statement do not normally print an output field separator, which is why the space is passed explicitly above. To get behavior more like awk, do this:
    cat | awk 'BEGIN { OFS = ": " } { print NR, $0 }'
    cat | perl -MEnglish -ne 'BEGIN { $OFS = ": " } print $NR, $_'

In conclusion, Perl clearly goes out of its way to be awk-like and sed-like. Both sed and awk also come with fairly comprehensive programming constructs, but it is nice how Perl works as a grand unified text processing and reporting tool.
3 comments:
Very useful post!
Thanks!!
Perl can be made even more Awk-like using this:
$ echo foo bar baz | awk '/foo/ { print $2; }'
$ echo foo bar baz | perl -lane '/foo/ and do { print $F[2]; }'
Isn't Perl great? :-)
ha, copy&paste gotcha, the Awk example would need to use $3 of course :-)