Subroutines and References in Perl

Introduction

This page discusses both subroutines and references. They're on the same page because references are often passed into and out of subroutines.

References

In Perl, you can pass only one kind of argument to a subroutine: a scalar. To pass any other kind of argument, you need to convert it to a scalar. You do that by passing a reference to it. A reference to anything is a scalar. If you're a C programmer you can think of a reference as a pointer (sort of).

The following table discusses the referencing and de-referencing of variables. Note that in the case of lists and hashes, you reference and dereference the list or hash as a whole, not individual elements (at least not for the purposes of this discussion).
 

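Scalar ($scalar)
   Instantiating the variable:       $scalar = "steve";
   Instantiating a reference to it:  $ref = \"steve";
   Referencing it:                   $ref = \$scalar
   Dereferencing it:                 $$ref or ${$ref}
   Accessing an element:             N/A

List (@list)
   Instantiating the variable:       @list = ("steve", "fred");
   Instantiating a reference to it:  $ref = ["steve", "fred"];
   Referencing it:                   $ref = \@list
   Dereferencing it:                 @{$ref}
   Accessing an element:             ${$ref}[3] or $ref->[3]

Hash (%hash)
   Instantiating the variable:       %hash = ("name" => "steve",
                                        "job" => "Troubleshooter");
   Instantiating a reference to it:  $ref = {"name" => "steve",
                                        "job" => "Troubleshooter"};
   Referencing it:                   $ref = \%hash
   Dereferencing it:                 %{$ref}
   Accessing an element:             ${$ref}{"president"} or $ref->{"president"}

File handle (FILE)
   Referencing it:                   $ref = \*FILE
   Dereferencing it:                 {$ref} or scalar <$ref>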

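These principles are demonstrated in the source code below. Note the following anomaly:

  • A variable with a % sign is not interpolated when placed in double quotes. Variables with @ and $ are. This is by design: Perl interpolates only scalar and array expressions inside double-quoted strings, so a hash has to be printed outside the quotes.
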
sub doscalar
   {
   my($scalar) = "This is the scalar";
   my($ref) = \$scalar;
   print "${$ref}\n";   # Prints "This is the scalar".
   }

sub dolist
   {
   my(@list) = ("Element 0", "Element 1", "Element 2");
   my($ref) = \@list;
   print "@{$ref}\n";    # Prints "Element 0 Element 1 Element 2".
   print "${$ref}[1]\n"; # Prints "Element 1".
   }

sub dohash
   {
   my(%hash) = ("president"=>"Clinton",
                "vice president" => "Gore",
                "intern" => "Lewinsky");
   my($ref) = \%hash;

   # NOTE: Can't put %{$ref} inside doublequotes!!! Doesn't work!!!
   # Prints "internLewinskyvice presidentGorepresidentClinton".
   # NOTE: Hash elements might print in any order!
   print %{$ref}; print "\n";

   # NOTE: OK to put ${$ref}{} in doublequotes.
   # NOTE: Prints "Gore".
   print "${$ref}{'vice president'}\n";
   }

&doscalar;
&dolist;
&dohash;
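
The table also lists anonymous references ([...] and {...}) and the arrow form of element access, which the code above doesn't exercise. Here is a minimal sketch of those two rows; the variable names $listref and $hashref are made up for illustration:

```perl
use strict;
use warnings;

# Anonymous list and hash: the brackets and braces hand back references directly.
my $listref = ["steve", "fred"];
my $hashref = {"name" => "steve", "job" => "Troubleshooter"};

# Both spellings below reach the same element.
print "${$listref}[1]\n";        # prints "fred"
print "$listref->[1]\n";         # prints "fred"
print "${$hashref}{'job'}\n";    # prints "Troubleshooter"
print "$hashref->{'job'}\n";     # prints "Troubleshooter"

# Dereferencing the list or hash as a whole.
my $count = scalar @{$listref};
my @keys  = sort keys %{$hashref};
print "$count elements; keys: @keys\n";   # prints "2 elements; keys: job name"
```

The arrow form is usually easier to read once references start to nest.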

Subroutines: A Discussion

Subroutines are the basic computer science methodology for dividing tasks into subtasks. They take zero or more scalar arguments as input (and possibly output), and they return a scalar or a list of scalars as a return value. Note that the scalar arguments and/or return values can be references to lists, hashes, or any other type of complex data, so the possibilities are limitless.

In computer science, there are two methods of passing arguments to a subroutine:

  • By value
  • By reference

When passing by value, the language makes a copy of the argument, and all access inside the subroutine is to that copy. Therefore, changes made inside the subroutine do not affect the calling routine. Such arguments cannot be used as output from the subroutine. The preferred method of outputting from a subroutine is via the return value. Unfortunately, the Perl language doesn't support passing by value. Instead, the programmer must explicitly make the copy inside the subroutine.

In general, I believe it's best to use arguments as input-only.

When passing by reference, the language makes the argument's exact variable available inside the subroutine, so any changes the subroutine makes to the argument affect the argument's value in the calling procedure (after the subroutine call, of course). This tends to reduce encapsulation, as there's no way of telling in the calling routine that the called routine changed it. Passing by reference harkens back to the days of global values, and in general creates less robust code.

All arguments in Perl are passed by reference! If the programmer wishes to make a copy of the argument to simulate passing by value (and I believe in most cases that's the right choice), the copy must be made explicitly in the subroutine, and the original arguments must not otherwise be accessed.
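
The same rule holds when a whole list is passed: the list is flattened into @_, and each element of @_ still refers to the caller's element. The following is a minimal sketch of both behaviors; upcase_in_place() and upcase_copy() are hypothetical helpers invented for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: changes its arguments through the @_ aliases.
sub upcase_in_place {
    foreach my $arg (@_) {   # $arg aliases each element of @_
        $arg = uc $arg;
    }
}

# Hypothetical helper: copies @_ first, so the caller's data is untouched.
sub upcase_copy {
    my @copy = @_;
    foreach my $item (@copy) {
        $item = uc $item;
    }
    return @copy;
}

my @names   = ("steve", "fred");
my @shouted = upcase_copy(@names);
print "@names\n";     # prints "steve fred"
print "@shouted\n";   # prints "STEVE FRED"

upcase_in_place(@names);
print "@names\n";     # prints "STEVE FRED"
```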


NOTE: Modern Perl versions (5.003 and newer) enable you to do function prototyping somewhat similar to C. Doing so lessens the chance for weird runtime errors. Because this page was created before Perl prototyping was common, much of its code is old school. This will change as time goes on.

Danger! Warning! Peligro! Achtung! Watchit!
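As you would probably imagine, subroutine order matters when prototyping. A subroutine call must call a subroutine whose prototype Perl has already seen. The danger lies in the fact that if it has not, Perl ignores the prototype for that call, so you get a non-obvious runtime error rather than a compile error. (With warnings enabled, Perl does report that the subroutine was called too early to check its prototype.)
SUBROUTINE ORDER MATTERS IN PROTOTYPING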
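
If you'd rather keep the subroutine bodies at the bottom of the file, one way around the ordering problem is a forward declaration. This is a minimal sketch, assuming a hypothetical describeFile() subroutine that is not part of the original page:

```perl
use strict;
use warnings;

# Forward declaration: Perl now knows the prototype before the first call.
sub describeFile($$);

# The call comes before the body, yet it is still checked against ($$).
my $description = describeFile("report.txt", "Quarterly Report");
print "$description\n";   # prints "Quarterly Report (report.txt)"

sub describeFile($$) {
    my ($filename, $title) = @_;
    return "$title ($filename)";
}
```

The declaration and the definition must carry the same prototype.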

Bare Bones Subroutine Syntax

Old school, no prototyping
Calling the subroutine:
&mysub();

Constructing the subroutine:
sub mysub
  {
  }

Note that in the above the ampersand (&) is used before the subroutine call, and that no parentheses are used in the function definition.
 
Prototyping, no arguments
Constructing the subroutine:
sub mysub()
  {
  }

Calling the subroutine:
mysub();

The preceding is prototyped. Note that there is no ampersand before the function. Note also that the function definition has parentheses, but because there are no args expected those parens are empty. Contrast that with the following, which expects two scalars. Experiment and note that Perl gripes when your prototype and call don't match.
 
Prototyping, two string arguments
Constructing the subroutine:
sub mysub($$)
  {
  }

Calling the subroutine:
mysub($filename, $title);
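
To watch Perl gripe without killing the whole script, the mismatched call can be compiled inside a string eval. The following is only a sketch: the body given to mysub() and the file name are assumptions made up for the demonstration.

```perl
use strict;
use warnings;

# Prototyped definition placed before any call, as the warning above requires.
sub mysub($$) {
    my ($filename, $title) = @_;
    return "$title ($filename)";
}

# A string eval compiles each call at run time, so a mismatch can be caught
# and printed instead of stopping the whole script at compile time.
my $good = eval 'mysub("notes.txt", "Notes")';
print "Matching call returned: $good\n";

my $bad = eval 'mysub("notes.txt")';   # one argument short of ($$)
print "Mismatched call failed: $@" unless defined $bad;
```

In a normal script the second call would simply refuse to compile, which is exactly the protection prototyping is meant to give.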

Returning a Scalar

Use the return statement.
 
Calling the subroutine:
my($name) = &getName();
print "$name\n";
# Prints "Bill Clinton"

Constructing the subroutine:
sub getName
    {
    return("Bill Clinton");
    }

NOTE: In C++ there are cases where the calling code can "reach into" the function via the returned pointer or reference. This is apparently not true of passed back scalars. Check out this code:

$GlobalName = "Clinton";

sub getGlobalName
    {
    return($GlobalName);
    }

print "Before: " . &getGlobalName() . "\n";
$ref = \&getGlobalName();
$$ref = "Gore";
print "After: " . &getGlobalName() . "\n";
#All print statements printed "Clinton"
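
I have been unable to hack into a subroutine via its scalar return. In the code above, \&getGlobalName() calls the subroutine and takes a reference to the value it returned, which is a copy, so assigning through $ref never touches $GlobalName. If you know of a way it can be done, please let me know, as this would be a horrid violation of encapsulation.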

Returning a List

Calling the subroutine:
my($first, $last) = &getFnameLname();
print "$last, $first\n";
# Prints "Clinton, Bill"

Constructing the subroutine:
sub getFnameLname
    {
    return("Bill", "Clinton");
    }

Returning a Hash

Calling the subroutine:
my(%officers) = &getOfficers();
print $officers{"vice president"};
# prints Al Gore

Constructing the subroutine:
sub getOfficers
    {
    return("president"=>"Bill Clinton",
           "vice president"=>"Al Gore",
           "intern"=>"Monica Lewinsky"
           );
    }
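
The subroutine above really returns a flat list that the caller reassembles into a hash. Since a reference is a scalar, the hash can also be handed back as a single reference. The following is a minimal sketch, assuming a hypothetical getOfficersRef() variant:

```perl
use strict;
use warnings;

# Hypothetical variant of getOfficers() that returns one scalar:
# a reference to a hash built inside the subroutine.
sub getOfficersRef {
    my %officers = ("president"      => "Bill Clinton",
                    "vice president" => "Al Gore");
    return \%officers;
}

my $officers = getOfficersRef();
print $officers->{"vice president"}, "\n";   # prints "Al Gore"

# Copy into a real hash if that is more convenient for the caller.
my %copy = %{$officers};
print scalar(keys %copy), " officers\n";     # prints "2 officers"
```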

Subroutine With Scalar Input/Output Arguments

Arguments to a subroutine are accessible inside the subroutine as list @_. Any change the subroutine performs to @_ or any of its members like $_[0], $_[1], etc, is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.

For readability, on output or input/output arguments it is therefore important to use the output argument as $_[] or @_ throughout the function to let the reader know it's an output argument.

Below is how to change the value of an argument outside the function.
 

Calling the subroutine:
my($mm, $dd, $yyyy) = ("12", "10", "1998");
print "Before: $mm/$dd/$yyyy\n";
&firstOfNextMonth($mm, $dd, $yyyy);
print "After : $mm/$dd/$yyyy\n";
# Second print will print 01/01/1999

Constructing the subroutine:
sub firstOfNextMonth
    {
    $_[1] = "01";
    $_[0] = $_[0] + 1;
    if($_[0] > 12)
      {
      $_[0] = "01";
      $_[2]++;
      }
    }

By the way, the above is an excellent example of the advantages of a loosely typed language. Note the implicit conversions between string and integer.

Subroutine With Scalar Input-Only Arguments

Arguments to a subroutine are accessible inside the subroutine as list @_. Any change the subroutine performs to @_ or any of its members like $_[0], $_[1], etc, is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.

For readability, it is therefore important to immediately assign the input-only arguments to local variables, and only work on the local variables.

Below is how to print changed values without changing the arguments outside the function's scope.
 

Calling the subroutine:
my($mm, $dd, $yyyy) = ("12", "10", "1998");
print "Before: $mm/$dd/$yyyy\n";
&printFirstOfNextMonth($mm, $dd, $yyyy);
print "After : $mm/$dd/$yyyy\n";
# Before and after will print 12/10/1998.
# Inside will print 01/01/1999

Constructing the subroutine:
sub printFirstOfNextMonth
    {
    my($mm, $dd, $yyyy) = @_;
    $dd = "01";
    $mm = $mm + 1;
    if($mm > 12)
      {
      $mm = "01";
      $yyyy++;
      }
    print "Inside: $mm/$dd/$yyyy\n";
    }

Subroutine With List Input/Output Arguments

Arguments to a subroutine are accessible inside the subroutine as list @_, which is a list of scalars. Any change the subroutine performs to @_ or any of its members like $_[0], $_[1], etc, is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.

For readability, on output or input/output arguments it is therefore important to use the output argument as $_[] or @_ throughout the function to let the reader know it's an output argument.

If a member of @_ (in other words, an argument) is a reference to a list, it can be dereferenced and used inside the subroutine.

Here's an example of a listcat() function which appends the second list to the first. From that point forward the caller will see the new value of the first argument:
 

Calling the subroutine:
my(@languages) = ("C","C++","Delphi");
my(@newlanguages) = ("Java","Perl");
print "Before: @languages\n";
&listcat(\@languages, \@newlanguages);
print "After : @languages\n";

# Before prints "C C++ Delphi"
# After prints "C C++ Delphi Java Perl"

Constructing the subroutine:
sub listcat
   {
   # Purpose of @append is only to
   # self-document input-only status
   my(@append) = @{$_[1]};

   my($temp);
   foreach $temp (@append)
      {
      # note direct usage of arg0
      push(@{$_[0]}, $temp);  
      }
   }
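
One nuance is worth a sketch here: when an argument is a reference, assigning it to a local variable copies only the reference, not the list it points to, so the caller's list is still changed through the copy. The listcatByRef() name below is hypothetical and not part of the original page:

```perl
use strict;
use warnings;

# Hypothetical variant of listcat(): the references are copied into locals,
# but both locals still point at the caller's arrays.
sub listcatByRef {
    my ($target, $source) = @_;
    push(@{$target}, @{$source});   # appends to the caller's first list
}

my @languages    = ("C", "C++", "Delphi");
my @newlanguages = ("Java", "Perl");
listcatByRef(\@languages, \@newlanguages);
print "@languages\n";   # prints "C C++ Delphi Java Perl"
```

To protect the caller's list, the list itself has to be copied, as in my(@append) = @{$_[1]} above.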

Subroutine With List Input-Only Arguments

Arguments to a subroutine are accessible inside the subroutine as list @_. Any change the subroutine performs to @_ or any of its members like $_[0], $_[1], etc, is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.

For readability, it is therefore important to immediately assign the input-only arguments to local variables, and only work on the local variables.

If a member of @_ (in other words, an argument) is a reference to a list, it can be dereferenced and used inside the subroutine.

Here's an example of an improved listcat() function which appends the second list to the first without affecting the first outside the subroutine. Instead, it returns the combined list.
 

Calling the subroutine:
my(@languages) = ("C","C++","Delphi");
my(@newlanguages) = ("Java","PERL");
print "Before: @languages\n";
print "Inside: ";
print &listcat(\@languages,\@newlanguages);
print "\n";
print "After : @languages\n";

# Before and after prints "C C++ Delphi"
# Inside prints "CC++DelphiJavaPERL"

Constructing the subroutine:
sub listcat
   {
   # Purpose of @original and @append is
   # to self-document input-only status
   my(@original) = @{$_[0]};
   my(@append) = @{$_[1]};
   my($temp);
   foreach $temp (@append)
      {
      push(@original, $temp);  # changes only the local copy
      }
   return(@original);
   }

Use parentheses with the shift command!

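The following fails (under use strict it is a compile error), because Perl treats the bare word inside @{shift} as a variable name, the array @shift, rather than as a call to shift: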
sub handleArray
  {
  my(@localArray) = @{shift};
  my($element);
  foreach $element (@localArray) {print $element . "\n";}
  }
&handleArray(\@globalArray);


But once you place the shift command in parens, everything's fine:

sub handleArray
  {
  my(@localArray) = @{(shift)};
  my($element);
  foreach $element (@localArray) {print $element . "\n";}
  }
&handleArray(\@globalArray);
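
Parentheses are not the only cure. Anything that stops Perl from reading shift as a variable name will do. Here is a minimal sketch of three equivalent spellings; the countWith...() names are invented for the comparison:

```perl
use strict;
use warnings;

# Three spellings that all force shift to be parsed as a function call.
sub countWithParens { my(@localArray) = @{(shift)}; return scalar(@localArray); }
sub countWithCall   { my(@localArray) = @{shift()}; return scalar(@localArray); }
sub countWithPlus   { my(@localArray) = @{+shift};  return scalar(@localArray); }

my @globalArray = ("one", "two", "three");
print countWithParens(\@globalArray), "\n";   # prints 3
print countWithCall(\@globalArray), "\n";     # prints 3
print countWithPlus(\@globalArray), "\n";     # prints 3
```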

Using prototypes

Be careful prototyping with lists:
sub printList(@$) {print @{(shift)}; print shift; print "\n";};
printList(\@globalArray);
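The preceding gives some runtime warnings. But the call is missing an arg, so it shouldn't run at all. The reason is that an unbackslashed @ in a prototype accepts the entire remaining argument list, so the $ after it is never checked. Instead, use \@ for the list in the prototype, and pass just the list in the call, as follows: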
sub printList(\@$) {print @{(shift)}; print shift; print "\n";};
printList(@globalArray);
Now it gives you a "not enough arguments" error and ends with a compile error, which is what you want. Place an additional scalar in the call so the call matches the prototype, and it runs perfectly:
sub printList(\@$) {print @{(shift)}; print shift; print "\n";};
printList(@globalArray, "Hello World");
Remember, using an unbackslashed @ in the prototype defeats the purpose of prototyping. Precede the @ with a backslash. Note that this is also true for passed hashes (%). Unless you have a very good reason to do otherwise, precede all @ and % with backslashes in the prototype.

Subroutine With Hash Input/Output Arguments

Arguments to a subroutine are accessible inside the subroutine as list @_, which is a list of scalars. Any change the subroutine performs to @_ or any of its members like $_[0], $_[1], etc, is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.

For readability, on output or input/output arguments it is therefore important to use the output argument as $_[] or @_ throughout the function to let the reader know it's an output argument.

If a member of @_ (in other words, an argument) is a reference to a hash, it can be dereferenced and used inside the subroutine.

Here's an example of a setGlobals() function which takes an existing %globals passed in as a reference argument and sets the proper elements. From that point forward the caller will see the new value of the elements:
 

Calling the subroutine:
my(%globals);
&setGlobals(\%globals);
&printGlobals(\%globals);   # printGlobals() is shown in the next section

Constructing the subroutine:
sub setGlobals
   {
   ${$_[0]}{"currentdir"} = "/corporate/data";
   ${$_[0]}{"programdir"} = "/corporate/bin";
   ${$_[0]}{"programver"} = "5.21";
   ${$_[0]}{"accesslevel"} = "root";
   }

Subroutine With Hash Input-Only Arguments

Arguments to a subroutine are accessible inside the subroutine as list @_. Any change the subroutine performs to @_ or any of its members like $_[0], $_[1], etc, is a change to the original argument. HOWEVER, assigning @_ or its elements to other variables makes a separate copy. Changes to the separate copy are unknown outside of the subroutine.

For readability, it is therefore important to immediately assign the input-only arguments to local variables, and only work on the local variables.

If a member of @_ (in other words, an argument) is a reference to a hash, it can be dereferenced and used inside the subroutine.

Here's an example of a printGlobals() function which takes %globals passed in as a reference argument, copies it into a local hash, and prints its elements. Because it works only on the copy, it cannot change the caller's hash.
 

Calling the subroutine:
my(%globals);
# ...
# set globals
# ...
# now print globals
&printGlobals(\%globals);

Constructing the subroutine:
sub printGlobals
   {
   # copy of argument precludes extra-scope change
   my(%globals) = %{$_[0]};
   print "Current Dir: $globals{'currentdir'}\n";
   print "Program Dir: $globals{'programdir'}\n";
   print "Version    : $globals{'programver'}\n";
   print "Accesslevel: $globals{'accesslevel'}\n";
   }


Dereferencing in Place: The -> Operator

By FAR the easiest way to handle references, especially when they're being passed into and out of subroutines, is the -> operator. This operator works the same as it does in C. It means "element so and so of the dereferenced reference". This is ABSOLUTELY vital when using objects, because most Perl objects are references to a hash. Nest a few of those, and without the -> operator you're dead meat. The -> operator also enables you to easily modify arguments in place, which is vital in typical OOP applications.

One typical usage is an object containing a list of hashes. The list of hashes could easily represent a data table, with array elements being rows (records) and hash elements being columns (fields). Here's how it's easily done in Perl:

#!/usr/bin/perl -w
use strict;

package Me;

sub new
	{
	my($type) = $_[0];
	my($self) = {};
	$self->{'name'} = 'Bill Brown';

	### Make a reference to an empty array of jobs
	$self->{'jobs'} = [];

	### Now make each element of array referenced by
	### $self->{'jobs'} a REFERENCE to a hash!
	$self->{'jobs'}->[0]={'ystart'=>'1998','yend'=>'1999','desc'=>'Bus driver'};
	$self->{'jobs'}->[1]={'ystart'=>'1999','yend'=>'1999','desc'=>'Bus mechanic'};
	$self->{'jobs'}->[2]={'ystart'=>'1999','yend'=>'2001','desc'=>'Software Developer'};

	bless($self, $type);
	return($self);
	}

### showResume is coded to show off the -> operator. In real
### life you'd probably use a foreach loop, but the following
### while(1) loop better demonstrates nested -> operators.
sub showResume
	{
	my($self)=$_[0];
	print "Resume of " . $self->{'name'} . "\n\n";
	print "Start\tEnd\tDescription\n";
	my $ss = 0;

	# Loop through array referenced by $self->{'jobs'},
	# and for each subscript, print the value corresponding
	# to the hash key. In other words, print every field of
	# every record of the jobs array
	while (1)
		{
		last unless defined $self->{'jobs'}->[$ss];
		print "$self->{'jobs'}->[$ss]->{'ystart'}\t";
		print "$self->{'jobs'}->[$ss]->{'yend'}\t";
		print "$self->{'jobs'}->[$ss]->{'desc'}\n";
		$ss++;
		}
	}

package main;

my $me = Me->new();
$me->showResume();
print "\nFirst job was $me->{'jobs'}->[0]->{'desc'}.\n";

I think you'll agree that the reference nesting in the preceding code would have been extremely hard to understand without the in-place dereferencing provided by the -> operator. The following is the resulting output:

[slitt@mydesk slitt]$ ./test.pl
Resume of Bill Brown

Start   End     Description
1998    1999    Bus driver
1999    1999    Bus mechanic
1999    2001    Software Developer

First job was Bus driver.
[slitt@mydesk slitt]$
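
The comment in showResume mentions that in real life you'd probably use a foreach loop. Here is a minimal sketch of what that might look like. It uses a stand-in $self hash with the same shape as the object above (fewer rows, not blessed), so the names here are illustrative only:

```perl
use strict;
use warnings;

# Stand-in for the object above: a hash reference holding a list of hashes.
my $self = {
    'name' => 'Bill Brown',
    'jobs' => [
        {'ystart' => '1998', 'yend' => '1999', 'desc' => 'Bus driver'},
        {'ystart' => '1999', 'yend' => '2001', 'desc' => 'Software Developer'},
    ],
};

# Dereference the whole array once, then use -> on each row's hash reference.
my @lines;
foreach my $job (@{ $self->{'jobs'} }) {
    push(@lines, "$job->{'ystart'}\t$job->{'yend'}\t$job->{'desc'}");
}
print "$_\n" foreach @lines;
```

Each $job is itself a reference to a hash, so only one arrow is needed per field inside the loop.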

Reposted from: http://www.troubleshooters.com/codecorn/littperl/perlsub.htm




