Documentation
A distinctive feature of Perl is that it doesn't come with just one man page but with a whole set of them, which describe various aspects of Perl and serve as tutorials, reference manuals, and FAQ pages. Here are some of the most useful of these pages for beginners:

    man perl
Main man page; lists the various auxiliary pages available.

    man perlintro
Perl introduction for beginners.

    man perlrequick
Perl regular expressions quick start.

    man perlcheat
Perl cheat sheet (very neat!).

    man perlfaq1
The first Perl FAQ page. (There are 9 such pages, perlfaq1 through perlfaq9.)

    perldoc -q keyword
Extracts entries matching "keyword" from the perlfaq pages. Example: "perldoc -q reverse".

    perldoc -f function
Documentation for the Perl built-in function "function". Example: "perldoc -f reverse".

    perldoc -q books
The Perl books section of the Perl FAQ pages. A large listing of recommended books, classified by level. My own top three recommendations are "Learning Perl" by Schwartz/Phoenix/Foy; "Programming Perl" by Wall/Christiansen/Orwant; and "The Perl Cookbook" by Christiansen/Torkington, all published by O'Reilly. The first is a beginner's tutorial; the second is a "must have" reference for anyone seriously into Perl; and the third is icing on the cake, with lots of nifty tricks.
Line-based operation
The simplest way to use perl on the command line is as a filter that reads a file (or standard input), processes it one line at a time, and writes the result to standard output, much as standard Unix utilities such as grep, sed, and awk do.
The basic structure of the command is one of the following:

    perl -lape'...' file
For each line in "file", apply the command(s) '...' (e.g., a substitution), then print the line to standard output.

    perl -lane'...' file
For each line in "file", apply the command(s) '...', but do not print the line. In this case, '...' will usually contain a print command (possibly conditional), and only output generated by such an explicit print command gets printed.
The commandline options used here have the following meaning:
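The “e” option indicates that the following string is to be interpreted as a perl script (i.e., a sequence of commands). To keep the shell from interpreting special characters in the script, it is best to enclose the script in single quotes (').
The “l” (the letter “ell”, not the digit one or the bar symbol) option ensures proper end-of-line handling: it strips the linebreak from each input line and adds one back to everything that is printed. Without it, the linebreak stays part of the line (which affects matching and length computations), and output from an explicit print command does not end in a linebreak.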
The “a” option causes perl to autosplit each line into an array of fields $F[0], $F[1], …, with blank space acting as default field separator. Note that in Perl, array indices start at 0, so the first array element has index 0.
The “p” and “n” options indicate whether or not each line is printed by default.
The following examples illustrate the use of Perl for line-by-line processing of files.

    perl -lane'print $F[1]' file
Print the second field of each line (i.e., the output consists of the second column of the file). Note that $F[0] is the first field, $F[1] the second, etc.

    perl -lane'print $F[-1]' file
Print the last column of the file. In Perl, negative array indices denote array elements counted from the right. Thus $F[-1] denotes the last field (column), $F[-2] the second to last, etc.

    perl -lane'print "$F[2],$F[1]"' file
Print the third column followed by the second column (i.e., the second and third columns in reverse order), separated by a comma.

    perl -lpe's/\s+/,/g' file
Replace any sequence of consecutive whitespace characters by a comma. This converts a tabular list with fields separated by blanks to one in which the fields are separated by commas. (The latter format is the csv format, a common spreadsheet format that can be used to import files into Excel.) The s/.../.../g syntax is similar to that of sed. The "g" modifier in the substitution command denotes a "global" substitution; without it, only the first occurrence of the substitution pattern would get substituted. "\s" stands for any whitespace character (blank, tab, etc.). The plus sign "+" indicates that the substitution pattern should match one or more instances of "\s"; thus, any chunk of consecutive whitespace characters gets replaced by a single comma. (The "a" (autosplit) option is not needed here since no use of the field array $F[...] is made; however, it would not hurt to include it.)

    perl -pe 's/3/1/g' file
Replace every occurrence of the digit 3 in the file by 1. (Here the "a" (autosplit) option is not needed, nor is the "l" (end-of-line processing) option, though one could, of course, leave those options in.)

    perl -i.bak -pe 's/3/1/g' file
The same, but with "in place" editing. The "i" option is a powerful option of Perl that causes the commands to be performed on the file itself. Thus, there is no need to save the modified file under a temporary filename and then copy that file over the original file. In the above form of this option, the original version of the file is saved in a file with the extension ".bak". Saving the original version in a backup file is a safety mechanism; the name of the backup file can be changed by replacing the string ".bak" by something else. If no such string is provided in the "-i" option, then the file is modified without backing up.

    perl -lpe's/\d+/NNN/g' file
Replace any string of digits by "NNN". Here "\d" stands for any single digit, and the plus sign indicates one or more instances of whatever precedes it. Thus, \d+ stands for any string of digits.

    perl -lpe's/^/$. /' file
Print the file, with line numbers prepended to each line. The "$." variable denotes the line number; the caret symbol (^) denotes a match at the beginning of the line. In this case the substitution pattern in s/.../.../ matches the empty string at the start of the line, so the "substitution" simply amounts to tacking on the replacement string at the beginning of the line.

    perl -lpe's/^\s+//' file
Delete any whitespace at the beginning of each line.

    perl -lpe's/^\s+//;s/\s+$//' file
Same, but also delete any whitespace at the end of each line. The two substitutions specified by the s/.../.../ commands are separated by a semicolon and are executed sequentially. In the second substitution command, the dollar sign ($) plays a role analogous to the caret sign and denotes the end of the line.

    perl -lane'print if (/\d\d\d\d/)' file
Print all lines in file that contain (at least) four consecutive digits. The string enclosed in /.../ is interpreted as a pattern that needs to be matched in order for the if clause to evaluate as true. The string \d\d\d\d stands for 4 consecutive digits. (This is a grep-like operation. Plain grep does not understand the \d shorthand, although an extended-regex call such as grep -E '[0-9]{4}' achieves the same result.)

    perl -lane'print if (/\S/)' file
Print any line that contains a non-whitespace character. This effectively deletes blank lines (or lines containing only whitespace) from the file. "\S" stands for the complement of "\s", i.e., any character that is not whitespace.

    perl -lane'print length($_)' file
Print the length (measured in characters) of each input line.

    perl -lane'print if (length($_) > 40)' file
Print all lines in file that have length (measured in characters) greater than 40.
Operating on entire files
Perl’s power really shines when one wants to perform operations on chunks of files that extend over multiple lines (e.g., deleting line breaks in paragraphs). Line-oriented Unix utilities like sed or awk are poorly suited to that, but in Perl it is easy: the ‘-0’ option changes the record separator (which defaults to a linebreak) to something else. Of particular interest are the following cases:

Slurp mode:
    perl -0777
The "0777" string (note that "0" here is the digit 0) causes the record separator to be set to "undefined", which in turn causes Perl to operate on the entire file as if it were one line ("slurp mode").

Paragraph mode:
    perl -00
The "00" string (two digits 0) causes Perl to interpret one or more consecutive blank lines as the record separator. Thus Perl operates on each paragraph as if it were a line.
Here are some examples using these modes:

    perl -00 -lpe's/\n/ /g' file
Delete all linebreaks within each paragraph, replacing them by a single blank space. The net effect is that each paragraph becomes a single line. Here "\n" stands for a linebreak character.

    perl -00 -lpe's/\n/ /g; s/\.\s*/\.\n/g' file
Same, but after having deleted all linebreaks within paragraphs, reinsert linebreaks at the end of each sentence. As a result, each sentence gets its own line. The asterisk (*) in "\s*" denotes 0 or more instances of "\s". Thus, "\.\s*" matches a period, plus any whitespace following it. The period is used here as an end-of-sentence marker. In the pattern it must be escaped with a backslash (\.), since an unescaped period in a regular expression matches any character.

    perl -00 -lpe's/\n/ /g; s/\.\s*/\.\n/g' file | perl -lane'print $#F+1'
Same as before, but pipe the output into another command that prints out the number of "words" (in the sense of any consecutive string of nonblanks) for each sentence. $#F denotes the last index in the array $F[...]. Since the indexing starts with 0, one has to add 1 to obtain the number of elements in this array. The expression $#F+1 must not be enclosed in double quotes; inside a quoted string the addition would not be carried out, and the text "+1" would be printed literally.

    perl -0777 -lape's/\s+/,/g' file
Replace all whitespace in file by commas, crossing line boundaries. The resulting file consists of a single long line, with fields separated by commas. (The final linebreak of the file is whitespace too, so the output ends with a comma.) Such a format may be useful for importing to other programs.

    perl -0777 -lape's/\s+/\n/g' file
Replace each chunk of one or more whitespace characters in file by a single newline. The resulting file consists of one "word" per line. This is useful for getting word statistics, as in the next example.

    perl -0777 -lape's/\s+/\n/g' file | sort | uniq -c | sort -nr | less
A one-line word frequency counter: it generates a list of all distinct "words" in the file, with their frequency of occurrence, sorted from the most frequent to the least frequent. The "sort" command sorts the words alphabetically. The "uniq" command eliminates duplicate words; with the "-c" option it also prints the number of occurrences. The second "sort" command, with the "-nr" option, sorts the resulting list numerically in descending order. Finally, the "less" command shows the result one page at a time.
Cool stuff

    perl -lne 'print if "$_" eq reverse' file
Finds all palindromic lines in file. In particular, if each line contains a single number, it displays all palindromes among these numbers. If applied to a dictionary file (such as /usr/share/lib/dict/words), with one word per line, it displays the palindromes among the words.

    perl -e '$n=1;while ($n++){sleep 1;print "\n$n is prime" if (("p" x $n) !~ /^((p)\2+)\1+$/)}'
Test the integers 2, 3, 4, ... one per second, and print each one that is prime; the loop runs until interrupted. (Note that the entire command must be on a single line.)

    perl -lape'tr/a-z/n-za-m/' file
A one-line encrypter. Rotate all (lower case) letters by 13 characters: a -> n, b -> o, etc.

    perl -lape's/(\w)/\U$1/g' file
Change all letters in file to upper case.

    perl -lape's/(\w)/\L$1/g' file
Change all letters in file to lower case.