Life of a Computer Scientist: October 2008

Friday, October 10, 2008

Using Perl like awk and sed

It looks like the designer of Perl really wanted to make it a viable awk and sed alternative. It is possible to run perl with command line flags that make it behave much like awk and sed.

The simplest use is like egrep, which outputs all lines that match a regular expression. The commands listed below are all equivalent.

cat | egrep 'pattern'
cat | awk '/pattern/ { print }'
cat | sed -n '/pattern/ p'
cat | perl -ne 'print if /pattern/'

Here we look at the Perl case more closely. The statement print if /pattern/ is perfectly valid Perl code, and it works because of three deliberate design choices:

The syntax "statement if expression" is the same as 'if (expression) { statement; }'.
The pattern match, usually written as '$value =~ /pattern/', can be abbreviated to simply '/pattern/' (or 'm{pattern}' with other delimiters); the string being matched then defaults to $_ (a built-in variable).
If the argument to print is missing, it prints the value of $_.

Alternatively, we can write instead:

cat | perl -ne '/pattern/ and print'

which does the same thing, relying on the fact that the 'and' operator short-circuits: print only runs when the match succeeds.

The command line flags -ne accomplish the following:

-e is used to specify the expression to evaluate.
-n wraps the expression in a while loop that reads each input line into $_ and evaluates the expression.

There is also a -p flag, used instead of -n, which lets Perl simulate sed:

-p wraps the expression in a while loop that places each input line into $_, evaluates the expression (which typically modifies $_), and then prints $_.

Here is an example (note that awk, sed and Perl have slightly different regular expression syntax and flags):

cat | sed 's/pattern/replacement/flags'
cat | perl -pe 's/pattern/replacement/flags'

Again, this works because regular expression substitution in Perl, normally written as '$value =~ s/pattern/replacement/flags' or '$value =~ s{pattern}{replacement}flags', operates on $_ by default.

Here are a few flags that make Perl more awk-like, with field separators.

-l makes each print statement output a record separator that is the same as the input record separator (newline by default); together with -n or -p, it also strips (chomps) the trailing newline from each input line.
-Fpattern specifies the input field separator, much like awk's -F option.
-a turns on the autosplit mode, so input fields are placed into @F array.

A good mnemonic is perl -Fpattern -lane 'expression'. Example:

cat /etc/passwd | awk -F: '{ print $1 }'
cat /etc/passwd | perl -F: -lane 'print $F[0]'

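Note that Perl fields are $F[0], $F[1], ... (numbered from zero), whereas awk fields are $1, $2, ..., so awk's $1 is Perl's $F[0]. (Writing @F[0] also works, but it is a one-element array slice, and perl -w warns that it is better written as $F[0].) Likewise, awk's $0 (the whole input line) corresponds to $_ in Perl.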

If we want to combine regular expression matching and field separation, we might have something like:

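find . | awk -F/ '/hw[0-9]+/ { print $2 }'
find . | perl -F/ -lane 'print $F[1] if /hw[0-9]+/'

Because find . prints paths such as ./hw1/notes.txt, the first field is always "."; the top-level directory name is the second field ($2 in awk, $F[1] in Perl).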

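Many awk variables have Perl equivalents as well. Perl's own names for them are terse punctuation variables (for example, $. is awk's NR and $, is awk's OFS); to use awk-style names such as $NR and $OFS, pass the -MEnglish flag to Perl like this: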

cat | awk '{ print NR, $0 }'
cat | perl -MEnglish -ne 'print $NR, " ", $_'

Notably, the commas in a Perl print statement do not normally print an output field separator, because $, is undefined by default. To get behavior more like awk, do this:

cat | awk 'BEGIN { OFS = ": " } { print NR, $0 }'
cat | perl -MEnglish -ne 'BEGIN { $OFS = ": " } print $NR, $_'

In conclusion, Perl does seem very ambitious about being awk- and sed-like. Both sed and awk come with fairly comprehensive programming constructs of their own, but it is nice that Perl works as a grand unified text processing and reporting tool.

SSH agent on Leopard

I've been using this SSH Agent (featuring Puffy in a businessman's suit with a briefcase) since Mac OS X Panther (10.3). It kept working on Leopard until, I believe, the 10.5.5 update broke it: it can no longer access the Keychain for the passphrase.

It turns out that Leopard has native ssh-agent Keychain support. I had to remember to remove the SSH_AUTH_SOCK entry from my ~/.MacOSX/environment.plist file (I had previously set it to point at the old SSH Agent). After logging out and back in, SSH_AUTH_SOCK now points to the socket of Leopard's built-in ssh-agent, and running ssh-add -K (the -K option is Mac OS X specific and stores the passphrase in the Keychain) works.

Dropping my former SSH Agent has the added benefit that my machine no longer has to start Rosetta when I log in, since SSH Agent was compiled for PowerPC, not for Intel Macs.