The Perl Tutorial: What’s Perl?

Perl is short for Practical Extraction and Report Language. It’s a language that is available free over the Web, and it’s used for a variety of things, from writing CGI scripts to assisting administrators in maintaining their systems. Perl was created, and is still maintained, by Larry Wall. It’s slower than C, but faster than a typical interpreted language, because a Perl program is not interpreted line by line: it’s compiled into an internal form when executed, and that form is then interpreted. A standalone Perl compiler does exist but it’s still under development. For more information on compiled Perl go to the Perl Home Page.

As I mentioned earlier, one of the nice things about Perl is that it’s free. It’s distributed under the GNU General Public License (or, alternatively, the Artistic License), and the source is available from the Perl Home Page. This, along with the fact that it’s flexible and has very few constraints, helps make Perl a popular language. It’s also highly platform-independent, and has been ported to many different platforms including Unix, Windows, and even DOS. The code modifications needed to move a program between platforms can be minor or major, depending on which system-specific functions it uses.

Perl’s popularity has also been encouraged by the fact that you can find support in a variety of places. At the top of the list is the comp.lang.perl newsgroup, which is moderated by a number of Perl experts.

Perl and CGI

This is a brief explanation of Perl and CGI. If you have your own web site, or want to put one up, you’ll have heard of HTML, JavaScript, Java, and CGI. CGI stands for the Common Gateway Interface. Unlike HTML, JavaScript, and client-side Java, which are handled by the browser, CGI provides a standard method for programs written in any language to run on the server side, and to communicate with the Web server software in response to requests for Web pages. In other words, CGI is the part of your web site that communicates with the other programs running on your server.

The main purpose of writing CGI programs or scripts is to afford more interactivity on your web site. Many web sites are static, and don’t allow much user interaction. Others have guestbooks and image counters, or allow you to place orders, access databases, and use other helpful applications.

This is where CGI comes in. In order to implement these types of interactivity, you must have an application running on the server that, for example, knows how to process certain types of user input. As I mentioned, CGI scripts can be written in any language, but the most common are Perl and C. Perl is the preferred language for writing CGI scripts because of its strength in string manipulation. And now other programming languages such as Java and PHP are also gaining popularity, as they provide interactivity without the use of CGI.

CGI programming is nothing more than programming with some special types of input, and a few very strict rules for program output. When a user fills out a FORM (a collection of HTML tags that allow them to submit input) the server sends the form input to the specified CGI program, which in turn parses the entered data and uses it in a specified manner, and returns HTML with an answer to the user’s request.
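
The request cycle described above can be sketched as a tiny CGI script. This is only an illustration under stated assumptions: it handles GET requests only, the form field name is invented for the example, and parse_query is a hypothetical helper, not a standard function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a URL-encoded query string such as "name=Ann+Lee&lang=Perl" into a hash.
# parse_query is a hypothetical helper written for this sketch.
sub parse_query {
    my ($query) = @_;
    my %field;
    foreach my $pair (split /&/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $value = '' unless defined $value;
        foreach ($key, $value) {
            tr/+/ /;                              # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # "%41" encodes "A"
        }
        $field{$key} = $value;
    }
    return %field;
}

# For GET requests the Web server passes form data in QUERY_STRING.
my $query = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
my %form  = parse_query($query);
my $name  = exists $form{name} ? $form{name} : 'stranger';

# Escape characters that are special in HTML before echoing user input.
$name =~ s/&/&amp;/g;
$name =~ s/</&lt;/g;
$name =~ s/>/&gt;/g;

# Output rule: a header line, then a blank line, then the document itself.
print "Content-type: text/html\n\n";
print "<html><body><p>Hello, $name!</p></body></html>\n";
```

Run from the command line with no QUERY_STRING set, the script greets "stranger"; under a Web server, a request ending in ?name=Ann would greet Ann instead.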

When you’re choosing a language in which to write CGI scripts, ensure that the language you select:

  • Makes text manipulation easy
  • Is able to interface with other software utilities and libraries
  • Is able to access your operating system’s environment variables

Perl satisfies all the above requirements, making it a very good language for CGI programming. If you want to write your own CGI scripts, I’d suggest that you use Perl — you’ll find that it will make your life a lot easier. If you want to get more detail on CGI and how it works, try the CGI specification on the NCSA site (be aware, this is a pretty technical description aimed at experienced programmers).

Running a Perl Program

Note: This will not teach you everything about Perl but it should give you a good idea of how the language works, and how to program in Perl.

Running a Perl program is relatively simple. Once you have a program written out, all you have to do is to invoke the Perl interpreter, along with the filename.

perl filename.pl

If you are running on a Unix system, you can make the file executable using the chmod +rx command. In order to do this, you must have the following line as the first line in your Perl program.

#! /usr/local/bin/perl

Where “/usr/local/bin/perl” is the path of your Perl interpreter. In order to find this path on UNIX just type:

which perl

on the command line.

Let’s practice writing a small program at our console. Using any editor, type the following code:

#! /usr/local/bin/perl

print "Hello World!\n";

Now save your program with a .pl extension. Let’s call it hello.pl. From the command line then type:

perl hello.pl

You should then see the following output:

Hello World!

You have just written your first Perl program. Congratulations!

Literals

Perl literals are values that are represented as-is. The value is considered to be hard-coded into the program. Perl supports literals of the following two types:

  • Numbers
  • Strings

Numeric literals represent numbers. String literals are used to identify names, text, or messages that are displayed in the program. String literals are usually enclosed in single quotes, which means that any variable names (more on variables below) appearing in them are not interpolated. Double-quoted string literals are also supported, and they support what is called variable interpolation, meaning that variable names are replaced by their actual values. Backquoted string literals are also supported. These allow you to run a command-line command and return its output to your program.

Perl also supports comments in your code. A Perl comment is signified by a # sign. When you see a # sign, it means that everything after that is a comment up to the end of the line.
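
As a rough illustration of the three quoting styles, here is a short sketch. The variable names are made up for the example, and the backquote line assumes a Unix-like shell where the echo command exists.

```perl
use strict;
use warnings;

my $name  = 'World';             # string literal in single quotes
my $count = 3;                   # numeric literal

my $single = 'Hello, $name\n';   # no interpolation: $name and \n stay as typed
my $double = "Hello, $name\n";   # interpolation: $name becomes World, \n a newline

print $single, "\n";
print $double;
print "The count is $count\n";

# Backquotes run a command and return whatever it printed.
# Assumption: a Unix-like shell with an echo command.
my $output = `echo hi`;
print "The command printed: $output";
```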

Variables

All programming languages support variables. Unlike literals, whose values are fixed in the code, variables are named containers that hold a value or values that can change while the program runs. Perl supports three types of variables:

  • Scalars: A scalar variable holds only one value, which may be a number or a string. Scalar variable names always begin with a $ sign.
  • Arrays: An array variable can hold a list, or an array of values. The values can be numbers or strings. A key or subscript value is automatically assigned, and the values are kept in the order entered. Array variable names start with an @ sign.
  • Associative Arrays: Associative arrays, or hashes as they’re also known, are similar to arrays. The major difference is that you, the programmer, assign the key and value to the hash, while an array automatically assigns a key. Hashes do not keep the entered items in the same order as they were entered. Hash variables always start with the % sign.

As mentioned above, scalar variables are used to track a single piece of information. Perl’s most commonly used variable is $_. This special variable is the default variable for many of Perl’s functions, so become very familiar with it.

We can see a value being assigned to a scalar variable, along with interpolation, in the following example.

$var="my Scalar Variable";

print "This is $var";

The output of the above code would be: This is my Scalar Variable

Arrays hold a list of variables and are handled in a variety of different ways. You can give the values to the array during initialization:

@array=('one','two','three');

The above statement assigns three values to the array. The values are separated by commas. The first subscript of an array is always 0, unless you change it with the special variable $[. This is generally not good practice, because you can confuse people reading your code (and $[ is deprecated in modern versions of Perl). In order to read a value, you would refer to the array name and the subscript that you want. For example, if we wanted the value ‘two’ we would say:

print $array[1];

We say $array because we are referring to only one value, followed by an open bracket, the subscript that we want, and then the close bracket. This would return the second value in your array.

Hashes can be initialized in a similar manner to arrays.

%hash=('key1','value1','key2','value2','key3','value3');

The major difference is that you have to place the values in a key-value order. So the first value you type would be the first key; the second, the associated value; the third, the second key; and the fourth, the value associated with that key. The process of retrieving a value from a hash is similar to that for an array, except that instead of using [ ] (square brackets) we would use { } (braces), and instead of the subscript number, we would use the key. If we wanted to retrieve ‘value2’, we would use the following:

print $hash{'key2'};

Be aware that Perl variables are case-sensitive, so $me, $Me, $mE and $ME are four separate variables. Similarly, if you had a scalar variable called $me, an array called @me, and a hash called %me, they would be interpreted as three separate variables, not the same one.

Operators

Perl supports the same set of operators as most other computer languages. Operators allow a computer language to perform actions on operands. As in every computer language, Perl operators also have a certain precedence.

We’ve already encountered the assignment operator (=). This operator assigns a value from one variable to another:

$string1 = $string2;

The above code assigns the value of $string2 to $string1. We will now consider binary and unary arithmetic operators.

  • op1 + op2: The addition operator adds two numbers.
  • op1 - op2: The subtraction operator subtracts op2 from op1.
  • op1 * op2: The multiplication operator multiplies two numbers.
  • op1 / op2: The division operator divides op1 by op2.
  • op1 % op2: The modulus operator returns the remainder of the division of two integer operands.
  • op1 ** op2: The exponentiation operator raises op1 to the power of op2.
  • ++op1: The pre-increment operator increases the value of op1 before its value is used (for example, in an assignment).
  • op1++: The post-increment operator increases the value of op1 after its value is used.
  • --op1: The pre-decrement operator decreases the value of op1 before its value is used.
  • op1--: The post-decrement operator decreases the value of op1 after its value is used.
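
A small sketch of the arithmetic operators in action; the numbers are arbitrary. Note that exponentiation is written ** in Perl, and that the only difference between the pre- and post- forms is which value the expression hands back.

```perl
use strict;
use warnings;

my $sum       = 7 + 2;    # 9
my $remainder = 7 % 2;    # 1
my $power     = 7 ** 2;   # 49

my $i    = 5;
my $pre  = ++$i;   # $i becomes 6 first, so $pre is 6
my $post = $i++;   # $post gets the old value 6, then $i becomes 7

print "$sum $remainder $power $pre $post $i\n";   # prints: 9 1 49 6 6 7
```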

The logical and comparison operators are used mainly to control the flow of the program. Some of those that Perl supports include:

  • &&: The AND operator takes two values, and returns true only if both values are true.
  • ||: The OR operator takes two values, and returns true if at least one value is true.
  • !: The NOT operator negates a value.
  • op1 == op2: The equals operator checks for the equality of two numerical values.
  • op1 != op2: The not-equals operator checks for the inequality of two numerical values.
  • op1 < op2: This numerical operator returns true if op1 is less than op2.
  • op1 > op2: This numerical operator returns true if op1 is greater than op2.
  • op1 <= op2: This numerical operator returns true if op1 is less than or equal to op2.
  • op1 >= op2: This numerical operator returns true if op1 is greater than or equal to op2.
  • op1 <=> op2: This numerical operator returns -1 if op1 is less than op2, 0 if they are equal, and 1 if op1 is greater than op2.
  • op1 eq op2: This string operator returns true if the two strings are equal.
  • op1 ne op2: This string operator returns true if the two strings are not equal.
  • op1 lt op2: This string operator returns true if op1 sorts before op2 in string order.
  • op1 le op2: This string operator returns true if op1 sorts before, or is equal to, op2.
  • op1 gt op2: This string operator returns true if op1 sorts after op2 in string order.
  • op1 ge op2: This string operator returns true if op1 sorts after, or is equal to, op2.
  • op1 cmp op2: This string operator functions in the same manner as the numerical <=> operator described above.
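
The difference between the numeric and string families is easiest to see when sorting. The sketch below uses Perl's built-in sort with <=> and cmp on the same made-up list.

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100);

my @by_number = sort { $a <=> $b } @numbers;   # compared as numbers
my @by_string = sort { $a cmp $b } @numbers;   # compared as strings

print "numeric: @by_number\n";   # numeric: 9 10 100
print "string:  @by_string\n";   # string:  10 100 9

my $same_number = (10 == 10.0)     ? 1 : 0;   # 1: the same number
my $same_string = ('10' eq '10.0') ? 1 : 0;   # 0: different strings
print "$same_number $same_string\n";
```

Choosing the wrong family is a common source of subtle bugs, so it helps to decide up front whether a value is being treated as a number or as text.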

There are other operators which can also be used in Perl. We have the concatenation operator for strings (.):

$string1 . $string2

The above code will concatenate $string1 with $string2 to form a new string.

We also have the repetition operator (x), which repeats a string a specified number of times:

$string1 x 2;

The above code will repeat $string1 twice.

The range operator allows us to use ranges in arrays or patterns:

@array = (1..50);

The above code will assign 50 elements, the numbers 1 through 50, to the array.

As we have already seen, we have a wide variety of operators that work with scalar values, but if we wanted to work with arrays we could do what is called “array slicing”. Let’s say we have an array with 10 values. If we want to assign values 5 and 6 to two scalar variables we could do the following:

($one,$two) = @array[4,5]; #remember that arrays start at subscript 0.

In the above code we just sliced two values from the array. When we do array slicing we use the @ sign, followed by the array name, followed by brackets and the subscripts that we want, separated by commas. So instead of having two lines for the above code we have one. We need to enclose the two scalar variables in parentheses in order to group them.
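
Here is a slightly fuller sketch of the same idea. It also shows that a range can supply the subscripts and that hashes support the same notation with braces; the hash contents are invented for the example.

```perl
use strict;
use warnings;

my @array = (1..10);

my ($one, $two) = @array[4,5];    # subscripts 4 and 5 hold the values 5 and 6
my @first_three = @array[0..2];   # a range works as a list of subscripts

my %hash   = (key1 => 'value1', key2 => 'value2', key3 => 'value3');
my @values = @hash{'key1', 'key3'};   # hash slice: @ sign plus braces

print "$one $two\n";        # 5 6
print "@first_three\n";     # 1 2 3
print "@values\n";          # value1 value3
```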

Functions

Functions are used to organize your code into small pieces which can be reused. Perl declares functions using the sub keyword followed by the function name, with the { sign to start the function body and the } to end it:

sub function1 {
    # CODE HERE
}

In order to call a Perl function, you can use the & sign followed by the function name (in modern Perl the & is optional, and function1() is the more common form). If you have parameters, they can also be passed; it’s suggested that you enclose them in parentheses:

&function1;

In Perl, functions can serve as procedures which do not return a value (in reality they do, but that will be discussed later), or functions which do return a value. The above example is of a function that acts as a procedure. Below is an example of a function that acts as a function, because it returns a value (we are also passing values to the function):

$answer = &add(1,2);

You aren’t limited to passing only scalar values to a function. You can also pass it arrays, but because all the parameters are flattened into a single list, it will be easier to pass an array as the last parameter.

Perl functions receive their values in an array: @_

This array holds all the values of the parameters passed to the function. So in the above example, where we had 2 values passed, we would retrieve them in the following manner:

sub function1 {
    ($val1, $val2) = @_;
}

or we could do it in this manner:

sub function1 {
    $val1=$_[0];
    $val2=$_[1];
}

Because the values are copied out of @_ into new variables, the parameters are effectively passed by value: you work with copies of the original parameters, not the actual parameters themselves. The elements of @_ themselves refer to the caller’s variables, so if you wish to modify the originals (pass by reference), you have to work directly with $_[subscript].
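
To make this concrete, here is one way the add function called earlier might be written, together with a pair of functions that contrast working on a copy with working on $_[0]. The bodies are assumptions made for illustration; the tutorial itself does not define them.

```perl
use strict;
use warnings;

# A possible body for add(): copy the two parameters, return their sum.
sub add {
    my ($val1, $val2) = @_;
    return $val1 + $val2;
}

# Works on a copy: the caller's variable is left alone.
sub double_copy {
    my ($value) = @_;
    $value *= 2;
    return $value;
}

# Works on $_[0] directly: the caller's variable is changed.
sub double_in_place {
    $_[0] *= 2;
    return;
}

my $answer = add(1, 2);         # 3

my $n      = 5;
my $result = double_copy($n);   # $result is 10, $n is still 5
my $before = $n;
double_in_place($n);            # $n is now 10

print "$answer $result $before $n\n";   # prints: 3 10 5 10
```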

Perl’s scope of variables is different than most other languages. You don’t have to declare your variables at the beginning of the program, or in the functions: you just use them. By default, the scope of the variables is global. If you use a variable in a function, it is global to the rest of the program. If you want the variable to be seen only by code within that particular function (and any other functions it may call), then you must declare it as a local variable:

sub function1 {
    local($myvar);
}

The above code will declare $myvar as local, so the value it is given inside the function can’t be seen by the rest of the program.
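
The phrase "and any other functions it may call" is the important part of local. The sketch below contrasts it with my, which Perl 5 provides for variables that are private to the enclosing block only. The function and variable names are hypothetical.

```perl
use strict;
use warnings;

our $level = 'global';   # a global (package) variable

sub show_level { return $level; }   # reads the global $level

sub with_local {
    local $level = 'local';   # temporary value, also seen by called functions
    return show_level();
}

sub with_my {
    my $level = 'my';         # private variable, invisible to show_level
    return show_level();
}

my @seen = (with_local(), with_my(), show_level());
print "@seen\n";   # prints: local global global
```

In new code, my is usually the safer choice, since a my variable cannot be changed by accident from another function.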

Functions can also call each other, and you can create recursive functions that call themselves.

Control Statements

Perl has a variety of control statements, which allow you to control how your program runs, and which statements are executed at each step.

The first statement that we’ll cover is the if statement. The if statement allows you to set a condition. The syntax is:

if (condition) {
    STATEMENTS
}

The logic is as follows: if the condition returns a true value, then execute the statements in the block, otherwise do not execute anything. The if statement goes beyond that, allowing you to have an else statement:

if (condition) {
    STATEMENTS
} else {
    STATEMENTS
}

The else part of the above if statement will be executed only if the condition in our initial if statement is false. We can also improve this, and add more flexibility, by having multiple if statements:

if (condition) {
    STATEMENTS
} elsif (condition) {
    STATEMENTS
} else {
    STATEMENTS
}

Now we can run multiple tests, and choose from a variety of options as to what to execute.
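
As a concrete instance of the pattern, this sketch classifies a number. The function name describe is made up for the example.

```perl
use strict;
use warnings;

# Return a one-word description of a number.
sub describe {
    my ($n) = @_;
    if ($n < 0) {
        return 'negative';
    } elsif ($n == 0) {
        return 'zero';
    } else {
        return 'positive';
    }
}

foreach my $n (-5, 0, 12) {
    print "$n is ", describe($n), "\n";
}
```

The conditions are tested from top to bottom, and only the block belonging to the first true condition is executed.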

Another type of control statement that Perl offers is the loop statement, which allows us to execute a block of instructions repeatedly. Let’s take a look at some of these statements:

While loops can be used in a couple of ways. We can test for our condition at the beginning or end of the loop. The syntax for testing the condition at the beginning of the loop is:

while (condition) {
    STATEMENTS
}

The condition will be tested first. If it is true then the statements will be executed, otherwise, they will be skipped. If the statements are executed, then the condition is tested again to see if they should be executed again. This continues until the condition is false.

The syntax for testing the condition at the end is:

do {
    STATEMENTS
} while (condition);

If you decide to test at the end of the loop, the statements will execute at least once.

Another type of loop is the until loop. Unlike the while loop, the until loop is used to execute certain statements while some condition is false. But, just like the while loop, we can test the condition at the beginning and at the end of the loop:

until (condition) {
    STATEMENTS
}

If we want to test the condition at the end of the loop:

do {
    STATEMENTS
} until (condition);

Another very powerful loop is the for loop. Unlike the while and until loops, where we often don’t know in advance how many times the statements will be repeated, the for loop is normally used to repeat the statements a specific number of times. The syntax is:

for (INITIALIZATION; CONDITION; INCREMENT/DECREMENT) {
    STATEMENTS
}

The initialization expression is executed first. It can be used to initialize the variables inside the loop. The condition expression is used as the test that will tell us whether we should exit the loop or not, and the increment/decrement operator is used to increase our initial value or any other value we choose.
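
The following sketch counts from 1 to 3 with a for loop, a while loop and an until loop, so the three forms can be compared side by side. It finishes with a do/while loop whose condition is false from the start, to show that its block still runs once. All variable names are arbitrary.

```perl
use strict;
use warnings;

my @from_for;
for (my $i = 1; $i <= 3; $i++) {
    push @from_for, $i;
}

my @from_while;
my $n = 1;
while ($n <= 3) {          # repeat while the condition is true
    push @from_while, $n;
    $n++;
}

my @from_until;
my $m = 1;
until ($m > 3) {           # repeat while the condition is false
    push @from_until, $m;
    $m++;
}

# The condition is tested only after the block has run.
my $runs = 0;
do {
    $runs++;
} while ($runs > 5);

print "@from_for | @from_while | @from_until | $runs\n";   # 1 2 3 | 1 2 3 | 1 2 3 | 1
```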

The last type of loop that we’ll discuss is the foreach loop. Since one of the most powerful things in Perl is the manipulation of arrays, we must, of course, have a loop that allows us to cycle through an array or hash. The syntax for the foreach loop is:

foreach LOOP_VAR (ARRAY) {
    STATEMENTS
}

The LOOP_VAR variable is optional. If none is indicated, then the default variable $_ is used. ARRAY is the name of our array. If we’re using a hash, we’ll need to be able to retrieve a scalar value, so we’d have to use the function “keys” or “values” along with the hash, in order to be able to retrieve a scalar value. The value retrieved is stored in LOOP_VAR (or $_), which we can then manipulate in our statements.
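
A brief sketch of foreach over a hash, once with a named loop variable and keys, and once with the default $_ and values. The hash contents are invented, and keys is wrapped in sort only because hashes do not preserve insertion order.

```perl
use strict;
use warnings;

my %age = (alice => 30, bob => 25);

# Named loop variable, walking the keys in sorted order.
my @report;
foreach my $name (sort keys %age) {
    push @report, "$name is $age{$name}";
}

# No loop variable: each value arrives in $_.
my $total = 0;
foreach (values %age) {
    $total += $_;
}

print "$_\n" foreach @report;   # alice is 30, then bob is 25
print "Total: $total\n";        # Total: 55
```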

There are also certain keywords that we can use within loops:

last: jumps out of the loop immediately

next: skips the rest of the statement block and continues with the next iteration of the loop

redo: restarts the statement block without testing the condition or doing the increment/decrement operation

goto: jumps to a specified label
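
For example, next and last can be combined to collect the even numbers up to 6 from a longer range; this is only a sketch to show the control flow.

```perl
use strict;
use warnings;

my @seen;
foreach my $n (1..10) {
    next if $n % 2;   # odd number: skip to the next iteration
    last if $n > 6;   # past 6: leave the loop entirely
    push @seen, $n;
}

print "@seen\n";   # prints: 2 4 6
```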

File Manipulation

Like any programming language, Perl provides input and output (I/O) facilities. But how is this applied to files? A filehandle is a name in a Perl program that connects I/O between your program and the outside world: a file, a device, or another program.

Perl has three default file handles which are automatically opened: STDIN (standard input), STDOUT (standard output), and STDERR (standard error). Filehandles have their own name space, and do not conflict with Perl variables. The recommendation from Larry Wall in declaring filehandles is to use a name that is all uppercase.

In order to open a filehandle you would use the open function:

open(FILEHANDLE,"filename");

FILEHANDLE is the name of our handle and “filename” is the name of the file that we wish to open. The function returns true if it is successful and false if it is not.

To open a file for reading you may use either of the following two syntaxes:

open(FILEHANDLE,"filename");

or

open(FILEHANDLE,"<filename");

To create a file you would use the following syntax:

open(FILEHANDLE,">filename");

To append to an existing file you would use the following:

open(FILEHANDLE,">>filename");

Reading from a filehandle is pretty simple:

open(FILEHANDLE,"test.txt") or die "Cannot open test.txt: $!";

while(<FILEHANDLE>){
    print "line: $_";    # $_ already ends with the newline read from the file
}

or

@entire_file = <FILEHANDLE>; # this will read in the entire      
                            # file into the array

or

$one_line = <FILEHANDLE>; # this will read one line into      
                         # the scalar variable.

To write to a filehandle which has been opened for writing, you would do the following:

print FILEHANDLE "This line will go into our output file";

To open a file in binary mode, use the open function in the same manner as before, except this time, use one additional function:

open(FILEHANDLE,"filename");

binmode(FILEHANDLE);

The binmode() function forces binary mode treatment of the given file handle on systems that distinguish between text and binary files.

Once we are finished using a file handle we should close it:

close(FILEHANDLE);

If you reopen a filehandle before it has been closed it will close the previous file automatically. The same will happen when you exit the program.
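
Putting the pieces together, the sketch below creates a file, appends to it, and reads it back. The file name is hypothetical and the file is created in the current directory, then removed at the end. Since open returns false on failure, each call is followed by "or die", with the special variable $! supplying the system's error message.

```perl
use strict;
use warnings;

my $file = "demo_output.txt";   # hypothetical file name for this sketch

open(OUT, ">$file") or die "Cannot create $file: $!";
print OUT "first line\n";
close(OUT);

open(OUT, ">>$file") or die "Cannot append to $file: $!";
print OUT "second line\n";
close(OUT);

open(IN, "<$file") or die "Cannot read $file: $!";
my @lines = <IN>;               # list context: read every line at once
close(IN);

chomp(@lines);                  # strip the trailing newline from each line
print "Read ", scalar(@lines), " lines: @lines\n";

unlink($file);                  # remove the demo file again
```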

There are also some file tests that you can run on a specific file handle or file name. Below are a few:

File Test    Meaning       
  -r        File or directory is readable      
  -w        File or directory is writable      
  -x        File or directory is executable      
  -o        File or directory is owned by user      
  -e        File or directory exists      
  -z        File or directory exists and has zero size      
  -f        Entry is a plain file      
  -d        Entry is a directory      
  -T        File is "Text"      
  -B        File is "Binary"

An example of using one of these tests might be:

open(FILEHANDLE,"text.txt");

if(-e FILEHANDLE) {
    print "File exists.";
}

or

$filename = "/users/abc123/text.txt";

if(-e $filename){
    print "File exists.";
}

As you can see, we can perform these file tests on either the file handle, or the file name we want.
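
Several of the tests from the table can be tried on a file the script creates itself. The file name is made up, and the file is deleted again at the end of the sketch.

```perl
use strict;
use warnings;

my $file = "demo_filetest.txt";   # hypothetical name, created in the current directory

open(OUT, ">$file") or die "Cannot create $file: $!";
close(OUT);

my $exists   = (-e $file) ? 1 : 0;   # 1: we just created it
my $is_empty = (-z $file) ? 1 : 0;   # 1: nothing was written to it
my $is_plain = (-f $file) ? 1 : 0;   # 1: a plain file
my $is_dir   = (-d $file) ? 1 : 0;   # 0: not a directory

unlink($file) or die "Cannot remove $file: $!";
my $still_there = (-e $file) ? 1 : 0;   # 0: the file is gone

print "$exists $is_empty $is_plain $is_dir $still_there\n";   # prints: 1 1 1 0 0
```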

Directory Manipulation

Besides allowing you to manipulate files very easily, Perl also allows you to manipulate directories in a very straightforward manner, almost like files.

Let’s start with some basic things. In order to change to a particular directory you would use the chdir() function. This function takes one argument: the name of the directory to which you wish to change.

chdir("/home/abc123");

If Perl is able to change to the specified directory, then this function returns true; otherwise, it returns false.

You can generate a list of files in a directory using the * operator in conjunction with the diamond operator (<>). This is called globbing. If I’m in /home/abc123/ and I want a list of all the files that exist, I’d do the following:

@files_found = </home/abc123/*>;

or, to list only Perl files,

@files_found = </home/abc123/*.pl>;

This will give you the names of all the files in the directory that match the specified pattern.

You can also open a directory using directory handles. To open a directory you would use the function opendir():

opendir(DIRHANDLE,"/home/abc123");

Now you can use the directory handle to read the directory’s contents with the readdir() function, which gives you access to the filenames. If you invoke readdir() in a scalar context, it will return the next filename in the list; if you call it in an array context, it will return a list of all the files in the directory.

$filename = readdir(DIRHANDLE); #returns one filename

@filenames = readdir(DIRHANDLE); #returns a list of filenames

Once you are done with the directory, you can close it using the closedir() function:

closedir(DIRHANDLE);
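
A complete sketch of the open, read, close sequence, run against the current directory so that it works anywhere. Note that readdir also returns the special entries . and .., which are filtered out here along with other names that start with a dot.

```perl
use strict;
use warnings;

opendir(DIR, ".") or die "Cannot open the current directory: $!";
my @entries = readdir(DIR);     # list context: every entry, including . and ..
closedir(DIR);

my @visible = sort grep { !/^\./ } @entries;   # drop names starting with a dot
my @subdirs = grep { -d $_ } @visible;         # keep only the directories

print scalar(@visible), " visible entries, ",
      scalar(@subdirs), " of them directories\n";
```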

Pattern Matching

Pattern matching involves searching for a sequence of characters within a character string. When carrying out pattern matching, if a pattern is found then a match is said to have occurred.

Perl uses three main operators for pattern matching (although patterns can also be used in functions such as split()). They are m//, s///, and tr///.

The m// operator is the match operator. This operator will let us know if a match was found, and the syntax for using this operator is:

m/PATTERN/OPTIONS

PATTERN refers to the character sequence we’re searching for, and OPTIONS are optional modifiers that change how the match is carried out. When using the match operator, you can omit the m if you’re using forward slashes as the delimiters. If you don’t wish to use forward slashes, you can substitute another character, but then you must include the m to direct Perl to use the match operator. If you do use a pattern delimiter that’s normally a special-pattern character, then you won’t be able to use that special-pattern character within the pattern you specify for matching. Here’s an example:

m!PATTERN!OPTIONS

The match operator has certain options that can be used:

OPTION DESCRIPTION        
 g    Match all possible patterns        
 i    Ignore case        
 m    Test string as multiple lines        
 o    Only evaluate once        
 s    Treat string as single line        
 x    Ignore white space in pattern

The s/// operator is known as the substitution operator. The syntax for this operator is:

s/PATTERN/REPLACEMENT/OPTIONS

PATTERN holds the pattern that we want to search for, and REPLACEMENT holds the value that we want to use as our replacement value when the pattern that we’re searching for is found. For example:

s/abc/xyz/

In the above example we identify that we want to search for “abc” in that order, and replace it with “xyz”. You can also use Pattern-Sequence variables in substitutions, which will be discussed later. The substitution operator also has options that can be used:

OPTION DESCRIPTION        
 g    Change all occurrences of the pattern        
 i    Ignore case in pattern        
 e    Evaluate replacement strings as expression        
 m    Treat string to be matched as multiple lines        
 o    Evaluate only once        
 s    Treat string to be matched as single line        
 x    Ignore white space in pattern

Finally, the Translation operator provides us with another method to substitute one group of characters for another. The translation operator syntax is:

tr/STRING1/STRING2/OPTIONS

Here STRING1 contains a list of characters to be replaced, and STRING2 contains the characters that replace them. The first character in STRING1 is replaced by the first character in STRING2, the second character by the second character in STRING2, and so on.

tr/abc/def/

In the above example, abc is STRING1. a is being replaced by d, b is being replaced by e, and c is being replaced by f. If you wanted to convert all the characters from uppercase to lowercase you would use:

tr/A-Z/a-z/;

As you can see, the range operator is supported in the pattern matching operations. Once again the translation operator also has options which can be used:

OPTION DESCRIPTION       
 c    Translate all characters not specified        
 d    Delete all specified characters        
 s    Replace multiple identical output characters      
      with a single character
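
The substitution and translation operators can be tried out on a few sample strings. The sketch uses the binding operator =~ to apply them to a named variable instead of $_, and copies each string first, writing (my $copy = $text) =~ ..., so that the original is left untouched.

```perl
use strict;
use warnings;

my $text = "abc abc";

(my $first = $text) =~ s/abc/xyz/;     # first occurrence only
(my $all   = $text) =~ s/abc/xyz/g;    # g: every occurrence

(my $lower = "HELLO World") =~ tr/A-Z/a-z/;   # uppercase to lowercase

# With an empty STRING2, tr changes nothing and returns the number of matches.
my $count = ($text =~ tr/a//);

# s option: squeeze runs of the same character down to one.
(my $squeezed = "aaabbb") =~ tr/a-z//s;

print "$first | $all | $lower | $count | $squeezed\n";
# prints: xyz abc | xyz xyz | hello world | 2 | ab
```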

Remember that if you are using the slash as your delimiter and your pattern also contains a forward slash, then you must escape it using the escape character “\”.

Now let’s see how we build patterns:

When doing pattern matching, the pattern is by default matched against the contents of the default variable ($_). To use a different variable, we have to use a binding operator (=~ or !~) along with one of the three operators I mentioned before.

The =~ operator binds the pattern to the string on the left hand side of the operator. This says that the pattern should be searched for in the scalar variable.

$string =~ m/hello/;

As the above example demonstrates, we’re searching for the “hello” string in the $string variable. If the pattern is found then true (a non-zero value) is returned, otherwise false is returned.

The !~ operator binds the pattern to the string on the left hand side of the operator, and will return true when the pattern is not found.
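
A short sketch of both binding operators, plus a match against the default variable; the sample strings are arbitrary.

```perl
use strict;
use warnings;

my $string = "Say hello to Perl";

my $has_hello   = ($string =~ m/hello/)   ? 1 : 0;   # pattern found
my $has_goodbye = ($string =~ m/goodbye/) ? 1 : 0;   # pattern not found
my $no_goodbye  = ($string !~ m/goodbye/) ? 1 : 0;   # true because it is absent

# Without a binding operator the match is applied to $_.
my $in_default = 0;
for ("hello again") {
    $in_default = 1 if /hello/;
}

print "$has_hello $has_goodbye $no_goodbye $in_default\n";   # prints: 1 0 1 1
```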

Now let’s discuss special characters in patterns:

The + character means “one or more of the preceding”. That means if we have a pattern:

m/abc+/

it should return ‘true’ if a match is found on abc, abcc, abccc, abcccc and so on.

The [ ] characters allow you to define patterns that match a group. This means that whatever is contained in the brackets is treated as a group from which we can take our pick:

m/a[bc]d/

The above pattern says that we will find a match if we find either “abd” or “acd”.

Another special character is the * character, which means “zero or more of the preceding”. This means that if we have a pattern:

m/ab*/

We will find a match for “a”, “ab”, “abb”, “abbb”, and so on.

The ? character means “zero or one occurrence of the preceding”. This pattern character works the same way as the * operator, except that the maximum number of characters that are accepted is 1.

You can also anchor patterns using the ^ and $ operators.

The ^ operator anchors a pattern to the beginning of a line:

m/^The/

In the above example we’ll find a match if the string starts with “The” (or if any line in the string starts with “The”, when the ‘m’ option is in use). If the ^ character is used inside square brackets, and is at the beginning, it means ‘anything not in that group.’ So be careful: the meaning of the ^ character changes when it is enclosed in brackets.

The $ operator anchors a pattern at the end of the line, therefore:

m/end$/

In the above example, we’ll find a match only if the string ends with “end” (or if any line in the string ends with “end”, when the ‘m’ option is in use).

Word-boundary pattern anchors specify whether a matched pattern must be on a word boundary, or inside a word. The \b pattern anchor matches only if the specified pattern is at the beginning or end of a word, while the \B pattern anchor matches if the specified pattern is inside a word.

When using character ranges within brackets, you can shorten up the process by using special character-range escape sequences:

Sequence  Description                     Range Equiv.
 \d      Any digit                       [0-9]
 \D      Not a digit                     [^0-9]
 \w      Any word character              [_0-9a-zA-Z]
 \W      Anything not a word character   [^_0-9a-zA-Z]
 \s      White space                     [ \r\t\n\f]
 \S      Anything other than white space [^ \r\t\n\f]

If you wanted to match a single character, use the ‘.’ (period) character. This will match any character (except the newline character when the ‘s’ option is not in use).

If you want to match a specified number of occurrences, then use the {x,y} characters after the character for which you want to specify the number of occurrences. Here, x is the minimum and y is the maximum.

To specify alternatives, use the ‘|’ character.

You can group portions of characters using the ( ) characters. These also allow you to reuse the matched pattern later on. To do this, use \number, where number is the number of the ( ) in the order in which they were entered. Characters grouped with ( ) may also have operators such as +, *, ? and {x,y} applied to them as a group.
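
The sketch below combines several of these special characters. It relies on one feature not covered above: after a successful match, the text captured by each ( ) group is also available outside the pattern, either as the list returned by the match in a list context (used here) or in the variables $1, $2 and so on. The sample strings are invented.

```perl
use strict;
use warnings;

# Anchors, \d and {x,y}: groups capture the three parts of a date.
my $date = "2024-01-15";
my ($year, $month, $day) = $date =~ m/^(\d{4})-(\d{1,2})-(\d{1,2})$/;

# \1 refers back to whatever the first group matched: here, a doubled letter.
my ($doubled) = "balloon" =~ m/(\w)\1/;

# | separates alternatives; ? makes the preceding s optional.
my @pets = grep { m/^(cat|dog)s?$/ } ('cat', 'dogs', 'bird', 'catalog');

print "$year $month $day\n";   # 2024 01 15
print "$doubled\n";            # l
print "@pets\n";               # cat dogs
```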

This is not intended to be a complete tutorial in pattern matching, but an introduction to the subject. For more details on pattern matching, try Mastering Regular Expressions by J. Friedl from O’Reilly.

Report Writing

As we discussed at the beginning of the tutorial, Perl stands for Practical Extraction and Report Language, and we’ll now discuss using Perl to write reports.

Perl uses a writing template called a ‘format’ to output reports. To use the format feature of Perl, you must:

  1. Define a Format
  2. Pass the data that will be displayed on the format
  3. Invoke the Format

Define the format as follows:

format FormatName =
fieldline
value_one, value_two, value_three
fieldline
value_one, value_two
.

FormatName represents the name of the format. The fieldline is the specific way the data should be formatted. The value lines represent the values that will be entered into the field line. You end the format with a single period on a line by itself.

fieldline can contain any text or fieldholders. Fieldholders hold space for data that will be placed there later. A fieldholder has the format:

@<<<<

This fieldholder is left-justified, with a field space of 5. You must count the @ sign and the < signs to know the number of spaces in the field. Other field holders include:

@>>>> right-justified
@|||| centered
@####.## numeric field holder
@* multiline field holder

An example format would be:

format EMPLOYEE =
===================================
@<<<<<<<<<<<<<<<<<<<<<< @<<
$name,                  $age
@#####.##
$salary
===================================
.

Only scalar variables can be used – not arrays or hashes. In order to invoke this format declaration we would use the write keyword:

write EMPLOYEE; #send to the output

The problem is that, by default, the format name is expected to be the name of an open file handle, and the write statement will send the output to this file handle. As we want the data sent to STDOUT, we must associate EMPLOYEE with the STDOUT filehandle. First, however, we must make sure that STDOUT is our selected file handle, using the select() function:

select(STDOUT);

We would then associate EMPLOYEE with STDOUT by setting the new format name with STDOUT, using the special variable $~:

$~ = "EMPLOYEE";

When we now do a write(), the data will be sent to STDOUT. Remember: if STDOUT wasn’t your default file handle to begin with, you can revert to the original file handle afterwards. To do this, save the return value of select in a scalar variable, and call select again with that variable once the format name has been assigned to $~ and the report has been written.
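
The whole sequence can be sketched as one small program. The employee data is invented for the example, and the three variables are declared with our so that the format can see them under use strict.

```perl
use strict;
use warnings;

our ($name, $age, $salary);   # variables the format refers to

format EMPLOYEE =
===================================
@<<<<<<<<<<<<<<<<<<<<<< @<<
$name,                  $age
@#####.##
$salary
===================================
.

my $old_handle = select(STDOUT);   # select STDOUT, remembering the previous handle
$~ = "EMPLOYEE";                   # use the EMPLOYEE format for the selected handle

($name, $age, $salary) = ("Ann Example", 41, 52000.5);   # made-up sample record
write;                             # no argument: write to the selected handle

select($old_handle);               # restore the previously selected handle
```

Calling write again after assigning new values to $name, $age and $salary prints another record in the same layout.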

Boris Mordkovich
Boris is the CTO of EchoMedia Inc. Boris is an experienced Web developer and has over 30 different tutorials and articles.
