Perl Notes (I)

Part I    Programming Perl

1 Perl Data Types

1.1 Funny Characters
Type        Character  Example    Is a name for:
Scalar      $          $cents     An individual value (number or string)
Array       @          @large     A list of values, keyed by number
Hash        %          %interest  A group of values, keyed by string
Subroutine  &          &how       A callable chunk of Perl code
Typeglob    *          *struck    Everything named struck
1.2 Singularities
Strings and numbers are singular pieces of data, while lists of strings or numbers are plural. Scalar variables can be assigned any form of scalar value: integers, floating-point numbers, strings, and even esoteric things like references to other variables, or to objects.

As in the Unix shell, you can use different quoting mechanisms to make different kinds of values. Double quotation marks (double quotes) do variable interpolation and backslash interpolation (such as turning \n into a newline), while single quotes suppress interpolation. Backquotes (the ones leaning to the left, ``) execute an external program and return its output, so you can capture it as a single string containing all the lines of output.

$answer = 42; # an integer
$pi = 3.14159265; # a "real" number
$avocados = 6.02e23; # scientific notation
$pet = "Camel"; # string
$sign = "I love my $pet"; # string with interpolation
$cost = 'It costs $100'; # string without interpolation
$thence = $whence; # another variable's value
$salsa = $moles * $avocados; # a gastrochemical expression
$exit = system("vi $file"); # numeric status of a command
$cwd = `pwd`; # string output from a command
And while we haven't covered fancy values yet, we should point out that scalars may also hold references to other data structures, including subroutines and objects.
$ary = \@myarray; # reference to a named array
$hsh = \%myhash; # reference to a named hash
$sub = \&mysub; # reference to a named subroutine

$ary = [1,2,3,4,5]; # reference to an unnamed array
$hsh = {Na => 19, Cl => 35}; # reference to an unnamed hash
$sub = sub { print $state }; # reference to an unnamed subroutine

$fido = new Camel "Amelia"; # reference to an object
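A reference only becomes useful once you follow it back to the data. The sketch below is a minimal illustration of dereferencing; the sample data and the body of mysub are made up for the example and are not part of the original notes.

```perl
use strict;
use warnings;

my @myarray = (1, 2, 3);
my %myhash  = (Na => 19, Cl => 35);
sub mysub { return "called with @_" }   # illustrative body

my $ary = \@myarray;    # reference to a named array
my $hsh = \%myhash;     # reference to a named hash
my $sub = \&mysub;      # reference to a named subroutine

my $first  = $ary->[0];         # one element through the reference
my $count  = scalar @$ary;      # whole array, evaluated in scalar context
my $sodium = $hsh->{Na};        # one hash value through the reference
my $result = $sub->("hello");   # call the subroutine through the reference

print "$first $count $sodium $result\n";
```

The arrow (->) follows the reference; @$ary treats the reference as the whole array it points to.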
Variables do not have to be declared before use; a variable is created the first time you mention it. Following the principle of least surprise, the variable is created with a null value, either "" or 0. Depending on where you use them, variables will be interpreted automatically as strings, as numbers, or as "true" and "false" values (commonly called Boolean values). Perl will automatically convert the data into the form required by the current context, within reason. For example, suppose you said this:
$camels = '123';
print $camels + 1, "\n";
The original value of  $camels is a string, but it is converted to a number to add  1 to it, and then converted back to a string to be printed out as  124.
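To see that the operator, not the variable, decides how a value is interpreted, here is a small sketch using the same $camels value with a numeric operator and with the string concatenation operator. The variable names $sum and $joined are illustrative.

```perl
use strict;
use warnings;

my $camels = '123';
my $sum    = $camels + 1;    # numeric context: the string is treated as 123
my $joined = $camels . 1;    # string context: the number 1 is treated as "1"

print "$sum $joined\n";      # prints "124 1231"
```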

Similarly, a reference behaves as a reference when you give it a "dereference" context, but otherwise acts like a simple scalar value. For example, we might say:

$fido = new Camel "Amelia";
if (not $fido) { die "dead camel"; }
$fido->saddle();
1.3 Pluralities
Perl has two types of multivalued variables: arrays and hashes.
1.3.1 Array
An array is an ordered list of scalars, accessed by the scalar's position in the list. To assign a list value to an array, you simply group the values together (with a set of parentheses):
@home = ("couch", "chair", "table", "stove");
Conversely, if you use  @home in a list context, such as on the right side of a list assignment, you get back out the same list you put in. So you could set four scalar variables from the array like this:
($potato, $lift, $tennis, $pipe) = @home;
These are called list assignments. They logically happen in parallel, so you can swap two variables by saying:
($alpha,$omega) = ($omega,$alpha);
Arrays are zero-based. Array subscripts are enclosed in square brackets [like this], so if you want to select an individual array element, you would refer to it as  $home[ n ].

Since arrays are ordered, you can do various useful operations on them, such as the stack operations  push and  pop. Perl regards the  end of your array as the top of a stack.
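A short sketch of subscripting and the stack operations, reusing the @home array from above. The extra element "lamp" and the result variable names are only for illustration.

```perl
use strict;
use warnings;

my @home = ("couch", "chair", "table", "stove");

my $first = $home[0];        # "couch": subscripts start at 0
my $last  = $home[3];        # "stove": the fourth element

push @home, "lamp";          # add to the end (the top of the stack)
my $popped = pop @home;      # remove from the end again: "lamp"
my $size   = @home;          # an array in scalar context gives its length

print "$first $last $popped $size\n";
```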
1.3.2 Hash
A hash is an unordered set of scalars, accessed by some string value that is associated with each scalar. For this reason hashes are often called associative arrays. A hash has no beginning or end. Since the keys to a hash are not automatically implied by their position, you must supply the key as well as the value when populating a hash.

Suppose you wanted to translate abbreviated day names to the corresponding full names. You could write the following list assignment:

%longday = ("Sun", "Sunday", "Mon", "Monday", "Tue", "Tuesday",
 "Wed", "Wednesday", "Thu", "Thursday", "Fri",
 "Friday", "Sat", "Saturday");
But that's rather difficult to read, so Perl provides the  => (equals sign, greater-than sign) sequence as an alternative separator to the comma. Using this syntactic sugar (and some creative formatting), it is much easier to see which strings are the keys and which strings are the associated values.
%longday = (
 "Sun" => "Sunday",
 "Mon" => "Monday",
 "Tue" => "Tuesday",
 "Wed" => "Wednesday",
 "Thu" => "Thursday",
 "Fri" => "Friday",
 "Sat" => "Saturday",
);
You can select an individual hash element by enclosing the key in braces (those fancy brackets also known as "curlies"). For example, if you want to find out the value associated with  Wed in the hash above, you would use  $longday{"Wed"}. Note again that you are dealing with a scalar value, so you use  $ on the front, not  %, which would indicate the entire hash.

Linguistically, the relationship encoded in a hash is genitive or possessive, like the word "of" in English, or like "'s". The wife of Adam is Eve, so we write:

$wife{"Adam"} = "Eve";
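The following sketch pulls the hash operations together, using a shortened version of %longday. The exists test and the sort keys call are standard Perl built-ins that the notes have not introduced yet; they are shown here only as a likely next step.

```perl
use strict;
use warnings;

my %longday = (Sun => "Sunday", Mon => "Monday", Wed => "Wednesday");

my $full = $longday{"Wed"};                     # look up one value
$longday{"Fri"} = "Friday";                     # add a new key/value pair
my $has_tue = exists $longday{"Tue"} ? 1 : 0;   # 0: there is no such key
my @abbrevs = sort keys %longday;               # keys are unordered, so sort them

print "$full: @abbrevs\n";
```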
1.3.3 Complexities
Perl lets you manipulate simple scalar references that happen to refer to complicated arrays and hashes. To extend our previous example, suppose we want to switch from talking about Adam's wife to Jacob's wife. Now, as it happens, Jacob had four wives. (Don't try this at home.) In trying to represent this in Perl, we find ourselves in the odd situation where we'd like to pretend that Jacob's four wives were really one wife. (Don't try this at home, either.) You might think you could write it like this:
$wife{"Jacob"} = ("Leah", "Rachel", "Bilhah", "Zilpah"); # WRONG
But that wouldn't do what you want, because even parentheses and commas are not powerful enough to turn a list into a scalar in Perl. (Parentheses are used for syntactic grouping, and commas for syntactic separation.) Rather, you need to tell Perl explicitly that you want to pretend that a list is a scalar. It turns out that square brackets are powerful enough to do that:  
$wife{"Jacob"} = ["Leah", "Rachel", "Bilhah", "Zilpah"]; # ok
That statement creates an unnamed array and puts a reference to it into the hash element  $wife{"Jacob"}.
Suppose we wanted to list not only Jacob's wives but all the sons of each of his wives. In this case we want to treat a hash as a scalar. We can use braces for that. (Inside each hash value we'll use square brackets to represent arrays, just as we did earlier. But now we have an array in a hash in a hash.)
$kids_of_wife{"Jacob"} = {
 "Leah" => ["Reuben", "Simeon", "Levi", "Judah", "Issachar", "Zebulun"],
 "Rachel" => ["Joseph", "Benjamin"],
 "Bilhah" => ["Dan", "Naphtali"],
 "Zilpah" => ["Gad", "Asher"],
};
That would be more or less equivalent to saying:
$kids_of_wife{"Jacob"}{"Leah"}[0] = "Reuben";
$kids_of_wife{"Jacob"}{"Leah"}[1] = "Simeon";
$kids_of_wife{"Jacob"}{"Leah"}[2] = "Levi";
$kids_of_wife{"Jacob"}{"Leah"}[3] = "Judah";
$kids_of_wife{"Jacob"}{"Leah"}[4] = "Issachar";
$kids_of_wife{"Jacob"}{"Leah"}[5] = "Zebulun";
$kids_of_wife{"Jacob"}{"Rachel"}[0] = "Joseph";
$kids_of_wife{"Jacob"}{"Rachel"}[1] = "Benjamin";
$kids_of_wife{"Jacob"}{"Bilhah"}[0] = "Dan";
$kids_of_wife{"Jacob"}{"Bilhah"}[1] = "Naphtali";
$kids_of_wife{"Jacob"}{"Zilpah"}[0] = "Gad";
$kids_of_wife{"Jacob"}{"Zilpah"}[1] = "Asher";
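Reading the structure back works the same way in reverse. This sketch walks the nested data; the loop variable names and the $total counter are illustrative.

```perl
use strict;
use warnings;

my %kids_of_wife;
$kids_of_wife{"Jacob"} = {
    "Leah"   => ["Reuben", "Simeon", "Levi", "Judah", "Issachar", "Zebulun"],
    "Rachel" => ["Joseph", "Benjamin"],
    "Bilhah" => ["Dan", "Naphtali"],
    "Zilpah" => ["Gad", "Asher"],
};

my $second = $kids_of_wife{"Jacob"}{"Rachel"}[1];    # a single leaf value

my $total = 0;
foreach my $wife (sort keys %{ $kids_of_wife{"Jacob"} }) {
    my @sons = @{ $kids_of_wife{"Jacob"}{$wife} };   # dereference the inner array
    $total += @sons;                                 # array length in numeric context
    print "$wife: @sons\n";
}
print "Total sons: $total\n";
```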
1.3.4 Simplicities

Perl also has several ways of topicalizing. One important topicalizer is the package declaration. Suppose you want to talk about Camels in Perl. You'd likely start off your Camel module by saying:

package Camel;
Perl will assume from this point on that any unspecified verbs or nouns are about  Camels. It does this by automatically prefixing any global name with the module name " Camel::".
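A minimal sketch of that prefixing, assuming a toy Camel package defined in the same file. The variable $humps and the subroutine describe are invented for the example.

```perl
use strict;
use warnings;

package Camel;
our $humps = 1;                           # really the global $Camel::humps
sub describe { return "humps: $humps" }   # really Camel::describe

package main;
our $humps = 2;                           # a different variable: $main::humps

my $from_camel = $Camel::humps;           # reach into the other package by full name
my $text       = Camel::describe();

print "$from_camel $humps $text\n";       # prints "1 2 humps: 1"
```

The two $humps variables never collide because each lives under its own package prefix.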

When you say package Camel, you're starting a new package. But sometimes you just want to borrow the nouns and verbs of an existing package. Perl lets you do that with a use declaration, which not only borrows verbs from another package, but also checks that the module you name is loaded in from disk. In fact, you must say something like:

use Camel;
before you say:
$fido = new Camel "Amelia";

In fact, some of the built-in modules don't actually introduce verbs at all, but simply warp the Perl language in various useful ways. These special modules we call pragmas. For instance, you'll often see people use the pragma strict, like this:

use strict;
1.4 Verbs
Many of the verbs in Perl are commands: they tell the Perl interpreter to do something. A statement starting with a verb is generally purely imperative and evaluated entirely for its side effects. (We sometimes call these verbs  procedures, especially when they're user-defined.)
Other verbs translate their input parameters into return values, just as a recipe tells you how to turn raw ingredients into something (hopefully) edible. We tend to call these verbs  functions.
Verbs are also sometimes called operators (when built-in), or subroutines (when user-defined). Historically, Perl required you to put an ampersand character (&) on any calls to user-defined subroutines, as in $fido = &fetch();. But with Perl version 5, the ampersand became optional, so that user-defined verbs can now be called with the same syntax as built-in verbs ($fido = fetch();). We still use the ampersand when talking about the name of the routine, such as when we take a reference to it ($fetcher = \&fetch;).
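The three spellings can be compared side by side. In this sketch the body of fetch is a placeholder, since the notes never define it.

```perl
use strict;
use warnings;

sub fetch { return "stick" }     # placeholder body for the user-defined verb

my $old_style = &fetch();        # ampersand form, still legal
my $new_style = fetch();         # same call, written like a built-in
my $fetcher   = \&fetch;         # a reference to the routine, taken by name
my $via_ref   = $fetcher->();    # call through the reference

print "$old_style $new_style $via_ref\n";
```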
1.5 Filehandles
A filehandle is just a name you give to a file, device, socket, or pipe to help you remember which one you're talking about, and to hide some of the complexities of buffering and such. (Internally, filehandles are similar to streams from a language like C++ or I/O channels from BASIC.)
You create a filehandle and attach it to a file by using   open. The  open function takes at least two parameters: the filehandle and filename you want to associate it with. Perl also gives you some predefined (and preopened) filehandles.  STDIN is your program's normal input channel, while  STDOUT is your program's normal output channel. And  STDERR is an additional output channel that allows your program to make snide remarks off to the side while it transforms (or attempts to transform) your input into your output.
open(SESAME, "filename");               # read from existing file
open(SESAME, "<filename");              # (same thing, explicitly)
open(SESAME, ">filename");              # create file and write to it
open(SESAME, ">>filename");             # append to existing file
open(SESAME, "| output-pipe-command");  # set up an output filter
open(SESAME, "input-pipe-command |");   # set up an input filter
As you can see, the name you pick for the filehandle is arbitrary. Once opened, the filehandle  SESAME can be used to access the file or pipe until it is explicitly closed (with, you guessed it,  close(SESAME)), or until the filehandle is attached to another file by a subsequent  open on the same filehandle.
Once you've opened a filehandle for input, you can read a line using the line-reading operator, <>. The angle operator encloses the filehandle (<SESAME>) you want to read lines from. The empty angle operator, <>, will read lines from all the files specified on the command line, or STDIN, if none were specified.

An example using the STDIN filehandle to read an answer supplied by the user would look something like this:

print STDOUT "Enter a number: "; # ask for a number
$number = <STDIN>; # input the number
print STDOUT "The number is $number.\n"; # print the number

If you try the previous example, you may notice that you get an extra blank line. This happens because the line-reading operation does not automatically remove the newline from your input line (your input would be, for example, "9\n"). For those times when you do want to remove the newline, Perl provides the chop and chomp functions. chop will indiscriminately remove (and return) the last character of the string, while chomp will only remove the end-of-record marker (generally, "\n") and return the number of characters so removed. You'll often see this idiom for inputting a single line:

chomp($number = <STDIN>); # input number and remove newline
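Putting open, the angle operator, chomp, and close together gives a complete round trip. This is only a sketch: it writes to a scratch file created with the core File::Temp module, and it uses the three-argument form of open (mode and filename as separate arguments), which is a variation on the two-argument calls shown above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the example does not touch real data.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close($tmp_fh);

open(SESAME, ">", $filename) or die "Can't write $filename: $!";
print SESAME "first line\n", "second line\n";
close(SESAME);

open(SESAME, "<", $filename) or die "Can't read $filename: $!";
my @lines;
while (my $line = <SESAME>) {
    chomp($line);                 # strip the trailing newline
    push @lines, $line;
}
close(SESAME);

print scalar(@lines), " lines: @lines\n";
```

Checking the return value of open with "or die" is a common habit, since a failed open otherwise goes unnoticed until the first read or write.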

2 Operators

2.1 Some Binary Arithmetic Operators
Example    Name            Result
$a + $b    Addition        Sum of $a and $b
$a * $b    Multiplication  Product of $a and $b
$a % $b    Modulus         Remainder of $a divided by $b
$a ** $b   Exponentiation  $a to the power of $b
2.2 String Operators

Perl defines a separate operator (.) for string concatenation:

$a = 123;
$b = 456;
print $a + $b; # prints 579
print $a . $b; # prints 123456

There's also a "multiply" operator for strings, called the repeat operator. Again, it's a separate operator (x) to keep it distinct from numeric multiplication:

$a = 123;
$b = 3;
print $a * $b; # prints 369
print $a x $b; # prints 123123123

The x operator may seem relatively worthless at first glance, but it is quite useful at times, especially for things like this:

print "-" x $scrwid, "\n";
which draws a line across your screen, presuming  $scrwid contains your screen width, and not your screw identifier.
2.3 Assignment Operators
You can use a shortcut assignment form with almost any binary operator in Perl, even some that C doesn't support:
$line .= "\n"; # Append newline to $line.
$fill x= 80; # Make string $fill into 80 repeats of itself.
$val ||= "2"; # Set $val to 2 if it isn't already "true".
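Here is a small runnable sketch of those three shortcuts. The starting values and the extra variable $kept are chosen for illustration, and the repeat count is reduced to 5 to keep the result readable.

```perl
use strict;
use warnings;

my $line = "text";
$line .= "\n";            # now "text\n"

my $fill = "-";
$fill x= 5;               # now "-----"

my $val;                  # undefined, therefore false
$val ||= "2";             # false, so it is set to "2"

my $kept = "7";
$kept ||= "2";            # already true, so it stays "7"

print length($line), " $fill $val $kept\n";
```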
2.4 Unary Arithmetic Operators
Example     Name           Result
++$a, $a++  Autoincrement  Add 1 to $a
--$a, $a--  Autodecrement  Subtract 1 from $a
These work the same as in C.
2.5 Logical Operators
Example     Name  Result
$a && $b    And   $a if $a is false, $b otherwise
$a || $b    Or    $a if $a is true, $b otherwise
! $a        Not   True if $a is not true
$a and $b   And   $a if $a is false, $b otherwise
$a or $b    Or    $a if $a is true, $b otherwise
not $a      Not   True if $a is not true
$a xor $b   Xor   True if $a or $b is true, but not both
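Because && and || return one of their operands rather than a bare true or false, they are often used to supply defaults. The sketch below assumes a hypothetical %config hash; note how a legitimate value of 0 is also replaced, which follows directly from the table.

```perl
use strict;
use warnings;

my %config = (name => "", retries => 0);        # hypothetical settings

my $name    = $config{name}    || "anonymous";  # "" is false, so the default wins
my $retries = $config{retries} || 3;            # 0 is false too, so this yields 3
my $both    = (1 && "second");                  # left side true, so the right side is returned
my $either  = (1 xor 1) ? "yes" : "no";         # both sides true, so xor is false

print "$name $retries $both $either\n";
```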
2.6 Some Numeric and String Comparison Operators
Comparison          Numeric  String  Return Value
Equal               ==       eq      True if $a is equal to $b
Not equal           !=       ne      True if $a is not equal to $b
Less than           <        lt      True if $a is less than $b
Greater than        >        gt      True if $a is greater than $b
Less than or equal  <=       le      True if $a is not greater than $b
Comparison          <=>      cmp     0 if equal, 1 if $a greater, -1 if $b greater
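The difference between the numeric and string columns shows up clearly when sorting. A sketch with made-up values (sort with a comparison block is a standard built-in not covered in these notes):

```perl
use strict;
use warnings;

my $num_cmp = (10 <=> 9);        # 1: as numbers, 10 is greater than 9
my $str_cmp = ("10" cmp "9");    # -1: as strings, "1" sorts before "9"

my @values  = (10, 9, 100);
my @numeric = sort { $a <=> $b } @values;   # 9 10 100
my @stringy = sort { $a cmp $b } @values;   # 10 100 9

print "$num_cmp $str_cmp | @numeric | @stringy\n";
```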
2.7 Some File Test Operators

Here are a few of the file test operators:

Example  Name       Result
-e $a    Exists     True if file named in $a exists
-r $a    Readable   True if file named in $a is readable
-w $a    Writable   True if file named in $a is writable
-d $a    Directory  True if file named in $a is a directory
-f $a    File       True if file named in $a is a regular file
-T $a    Text File  True if file named in $a is a text file

You might use them like this:

-e "/usr/bin/perl" or warn "Perl is improperly installed\n";
-f "/vmlinuz" and print "I see you are a friend of Linus\n";
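Since the paths above depend on the machine, the following sketch runs the same tests against a scratch directory and file created with the core File::Temp module. The name no-such-file is an arbitrary missing path chosen for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

my $dir = tempdir(CLEANUP => 1);            # scratch directory, removed at exit
my ($fh, $file) = tempfile(DIR => $dir);    # scratch file inside it
close($fh);

my $file_exists = -e $file ? 1 : 0;                 # 1
my $is_regular  = -f $file ? 1 : 0;                 # 1
my $dir_is_dir  = -d $dir  ? 1 : 0;                 # 1
my $file_is_dir = -d $file ? 1 : 0;                 # 0
my $missing     = -e "$dir/no-such-file" ? 1 : 0;   # 0

print "$file_exists $is_regular $dir_is_dir $file_is_dir $missing\n";
```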

3 Control Structures

3.1 Truth

Truth in Perl is always evaluated in a scalar context. Other than that, no type coercion is done. So here are the rules for the various kinds of values a scalar can hold:

  1. Any string is true except for "" and "0".

  2. Any number is true except for 0.

  3. Any reference is true.

  4. Any undefined value is false. 

For example:
"0.00" + 0 # would become the number 0 (coerced by the +), so false.
\$a # is a reference to $a, so true, even if $a is false.
undef() # is a function returning the undefined value, so false.
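The four rules can be checked directly. In this sketch the helper truth and the keys of %result are invented names; the interesting case is the string "0.0", which is true because it is neither "" nor "0".

```perl
use strict;
use warnings;

# Report how Perl evaluates a value in Boolean context.
sub truth { return $_[0] ? "true" : "false" }

my %result = (
    empty_string => truth(""),       # false (rule 1)
    string_zero  => truth("0"),      # false (rule 1)
    string_0_0   => truth("0.0"),    # true: a string other than "" and "0"
    number_0_0   => truth(0.0),      # false: the number 0 (rule 2)
    reference    => truth(\0),       # true (rule 3)
    undefined    => truth(undef),    # false (rule 4)
);

print "$_: $result{$_}\n" for sort keys %result;
```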
3.2 The if and unless statements
if ($city eq "New York") {
    print "New York is northeast of Washington, D.C.\n";
}
elsif ($city eq "Chicago") {
    print "Chicago is northwest of Washington, D.C.\n";
}
elsif ($city eq "Miami") {
    print "Miami is south of Washington, D.C. And much warmer!\n";
}
else {
    print "I don't know where $city is, sorry.\n";
}
Note that braces are optional in C if you have a single statement, but they are required in Perl.
If you want to do something only if a condition is false, you can use the unless statement:
unless ($destination eq $home) {
    print "I'm not going home.\n";
}
3.3 Iterative (Looping) Constructs

Perl has four main iterative statement types: while, until, for, and foreach. These statements allow a Perl program to repeatedly execute the same code.

3.3.1 while & until

In fact, almost everything is designed to work smoothly in a conditional (Boolean) context. If you mention an array in a scalar context, the length of the array is returned. So you often see command-line arguments processed like this:

while (@ARGV) {
 process(shift @ARGV);
}
The  shift operator removes one element from the argument list each time through the loop (and returns that element). The loop automatically exits when array  @ARGV is exhausted, that is, when its length goes to 0. And 0 is already false in Perl. In a sense, the array itself has become "false". [21]

 

[21] This is how Perl programmers think. So there's no need to compare 0 to 0 to see if it's false. Despite the fact that other languages force you to, don't go out of your way to write explicit comparisons like while (@ARGV != 0). That's just inefficient for both you and the computer. And anyone who has to maintain your code.
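The same pattern works on any array. This sketch uses a stand-in array @queue instead of @ARGV so it runs without command-line arguments, and adds a small until loop, which runs while its condition is false. All names here are illustrative.

```perl
use strict;
use warnings;

my @queue = ("-v", "input.txt", "output.txt");   # stand-in for @ARGV
my @seen;

while (@queue) {                 # true as long as the array is not empty
    my $arg = shift @queue;      # remove and return the first element
    push @seen, $arg;
}

my $countdown = 3;
my $ticks     = 0;
until ($countdown == 0) {        # loops until the condition becomes true
    $countdown--;
    $ticks++;
}

print scalar(@seen), " arguments seen, $ticks ticks\n";
```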

3.3.2 for & foreach
"for" example:
for ($sold = 0; $sold < 10000; $sold += $purchase) {
    $available = 10000 - $sold;
    print "$available tickets are available. How many would you like: ";
    $purchase = <STDIN>;
    chomp($purchase);
}
The foreach statement is used to execute the same code for each of a known set of scalars, such as an array:
foreach $user (@users) {
    if (-f "$home{$user}/.nexrc") {
        print "$user is cool... they use a perl-aware vi!\n";
    }
}
Unlike the  if and  while statements, which provide scalar context to a conditional expression, the  foreach statement provides a list context to the expression in parentheses. So the expression is evaluated to produce a list (not a scalar, even if there's only one scalar in the list).
3.4 Breaking out: next and last
The  next operator would allow you to skip to the end of your current loop iteration, and start the next iteration. The  last operator would allow you to skip to the end of your block, as if your loop's test condition had returned false.
foreach $user (@users) {
    if ($user eq "root" or $user eq "lp") {
        next;
    }
    if ($user eq "special") {
        print "Found the special account.\n";
        # do some processing
        last;
    }
}
It's possible to break out of multilevel loops by labeling your loops and specifying which loop you want to break out of. Together with statement modifiers (another form of conditional which we'll talk about later), this can make for extremely readable loop exits (if you happen to think English is readable):
LINE: while ($line = <ARTICLE>) {
    last LINE if $line eq "\n"; # stop on first blank line
    next LINE if $line =~ /^#/; # skip comment lines
    # your ad here
}
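Labels matter most with nested loops, where a bare next or last would only affect the innermost one. A sketch with invented data (the label ROW and the array names are illustrative):

```perl
use strict;
use warnings;

my @rows = ([1, 2, 3], [4, -1, 6], [7, 8, 9]);
my @kept;

ROW: foreach my $row (@rows) {
    foreach my $cell (@$row) {
        next ROW if $cell < 0;     # abandon the rest of this row
        last ROW if $cell == 8;    # leave both loops at once
        push @kept, $cell;
    }
}

print "@kept\n";                   # prints "1 2 3 4 7"
```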

4 Regular Expressions

Regular expressions are used in several ways in Perl. First and foremost, they're used in conditionals to determine whether a string matches a particular pattern, because in a Boolean context they return true and false. So when you see something that looks like /foo/ in a conditional, you know you're looking at an ordinary pattern-matching operator:

if (/Windows 95/) { print "Time to upgrade?\n" }

 

Second, if you can locate patterns within a string, you can replace them with something else. So when you see something that looks like s/foo/bar/, you know it's asking Perl to substitute "bar" for "foo", if possible. We call that the substitution operator. It also happens to return true or false depending on whether it succeeded, but usually it's evaluated for its side effect:

s/Windows/Linux/;
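Both operators above act on the default variable $_. To apply them to a named variable, bind it with =~. A minimal sketch, with $text as an illustrative variable:

```perl
use strict;
use warnings;

my $text = "Windows 95 and Windows 98";

my $is_match = ($text =~ /Windows 95/) ? 1 : 0;   # pattern match in Boolean context
my $changed  = ($text =~ s/Windows/Linux/);       # replaces the first occurrence only

print "$is_match $changed $text\n";   # prints "1 1 Linux 95 and Windows 98"
```

Without the /g modifier, a substitution stops after the first match, which is why the second "Windows" survives.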

 

Finally, patterns can specify not only where something is, but also where it isn't. So the split operator uses a regular expression to specify where the data isn't. That is, the regular expression defines the separators that delimit the fields of data. The simplest use is splitting a string on the space character to return a list of words, but you can split on any separator you can specify with a regular expression:

($good, $bad, $ugly) = split(/,/, "vi,emacs,teco");
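A few more separators, as a sketch. The input strings are invented, and the patterns use character classes and quantifiers that are explained later in this section.

```perl
use strict;
use warnings;

my ($good, $bad, $ugly) = split(/,/, "vi,emacs,teco");

# The separator can be a run of whitespace rather than a single space.
my @words = split(/\s+/, "the quick  brown fox");

# Or a comma or semicolon with optional whitespace around it.
my @fields = split(/\s*[;,]\s*/, "a ; b,c");

print scalar(@words), " words; fields: @fields\n";
```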

Because certain classes like the alphabetics are so commonly used, Perl defines shortcuts for them:

Name            ASCII Definition  Code
Whitespace      [ \t\n\r\f]       \s
Word character  [a-zA-Z_0-9]      \w
Digit           [0-9]             \d
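The shortcuts in the table can be tried out as follows. This sketch uses capturing parentheses and the + quantifier to pull out what each class matched; the sample string and variable names are illustrative.

```perl
use strict;
use warnings;

my $s = "order 66 shipped";

my ($word)     = $s =~ /(\w+)/;    # first run of word characters
my ($digits)   = $s =~ /(\d+)/;    # first run of digits
my ($nondigit) = $s =~ /(\D+)/;    # first run of nondigits, space included
my ($space)    = $s =~ /(\s)/;     # first whitespace character

print "[$word] [$digits] [$nondigit] [$space]\n";
```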

Note that these match single characters. A \w will match any single word character, not an entire word. (Combined with the + quantifier, \w+ matches a whole word.) Perl also provides the negation of these classes by using the uppercased character, such as \D for a nondigit character.

The dot (.) matches any single character, except that by default it does not match a newline.
4.1 Quantifiers

Certain combinations of minimum and maximum occur frequently, so Perl defines special quantifiers for them. We've already seen +, which is the same as {1,}, or "at least one of the preceding item". There is also *, which is the same as {0,}, or "zero or more of the preceding item", and ?, which is the same as {0,1}, or "zero or one of the preceding item" (that is, the preceding item is optional).

 

You need to be careful of a couple of things about quantification. First of all, Perl quantifiers are by default greedy. This means that they will attempt to match as much as they can as long as the whole pattern still matches. The other point to be careful about is that regular expressions will try to match as early as possible. This even takes precedence over being greedy. Since scanning happens left-to-right, this means that the pattern will match as far left as possible, even if there is some other place where it could match longer.
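Both behaviors are easy to observe with capturing parentheses. The strings below are made up for the illustration.

```perl
use strict;
use warnings;

# Greedy: .* runs as far as it can while still leaving an X to match.
my ($greedy) = "aXbXc" =~ /(.*X)/;

# Leftmost wins over longest: the earlier, shorter match is chosen.
my ($leftmost) = "aab aaaab" =~ /(a+b)/;

print "$greedy $leftmost\n";   # prints "aXbX aab"
```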

 

 

There's one other thing you need to know. By default, quantifiers apply to a single preceding character, so /bam{2}/ will match "bamm" but not "bambam". To apply a quantifier to more than one character, use parentheses. So to match "bambam", use the pattern /(bam){2}/.
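The three cases from the paragraph above, written as a quick check (variable names are illustrative):

```perl
use strict;
use warnings;

my $single   = ("bamm"   =~ /bam{2}/)   ? 1 : 0;   # {2} applies to "m" only
my $no_group = ("bambam" =~ /bam{2}/)   ? 1 : 0;   # no "bamm" in the string
my $grouped  = ("bambam" =~ /(bam){2}/) ? 1 : 0;   # {2} applies to the group

print "$single $no_group $grouped\n";   # prints "1 0 1"
```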

4.2 Minimal Matching

You can force nongreedy, minimal matching by placing a question mark after any quantifier. For example, .*? will try to match as few characters as possible, rather than as many as possible.
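A side-by-side comparison on an invented string shows the difference:

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

my ($greedy)  = $html =~ /<(.*)>/;    # runs to the last ">" in the string
my ($minimal) = $html =~ /<(.*?)>/;   # stops at the first ">"

print "greedy:  $greedy\n";
print "minimal: $minimal\n";
```

The greedy version captures everything between the first "<" and the final ">", while the minimal version captures just "b".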

4.3 Nailing Things Down

The special symbol \b matches at a word boundary.

If it is the first character of a pattern, the caret (^) matches the "nothing" at the beginning of the string.

The dollar sign ($) works like the caret, except that it matches the "nothing" at the end of the string instead of the beginning.
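The three anchors can be compared on one sample string. This sketch counts matches with the /g modifier and the "= () =" counting idiom, neither of which is covered in these notes; the string and variable names are illustrative.

```perl
use strict;
use warnings;

my $s = "cat concatenate cat";

my $anywhere  = () = $s =~ /cat/g;        # every "cat", even inside a word
my $as_word   = () = $s =~ /\bcat\b/g;    # only "cat" standing alone
my $at_start  = ($s =~ /^cat/)    ? 1 : 0;
my $at_end    = ($s =~ /cat$/)    ? 1 : 0;
my $not_start = ($s =~ /^concat/) ? 1 : 0;

print "$anywhere $as_word $at_start $at_end $not_start\n";   # prints "3 2 1 1 0"
```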
