Perl Frequently Asked Questions: FAQ (4), Data: Strings

Hank_Huang

How do I validate input?

The answer to this question is usually a regular expression, perhaps with auxiliary logic. See the more specific questions (numbers, email addresses, etc.) for details.

How do I unescape a string?

It depends on just what you mean by "escape". URL escapes are dealt with in the perlfaq9 manpage. Shell escapes with the backslash (\) character are removed with:

    s/\\(.)/$1/g;

Note that this won't expand \n or \t or any other special escapes.
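
As a rough illustration, the sketch below runs that substitution on a made-up sample string. The variable names and the sample text are assumptions for the demo, not part of the FAQ.

```perl
use strict;
use warnings;

# Hypothetical sample: single quotes keep the backslashes literal.
my $escaped = 'a\ file\ name\$1 and\ta tab';

# Copy the string, then drop each backslash while keeping the character after it.
(my $plain = $escaped) =~ s/\\(.)/$1/g;

print "$plain\n";
```

The escaped \t in the sample collapses to a plain letter t rather than a tab, which is the limitation noted above.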

How do I remove consecutive pairs of characters?

To turn "abbcccd" into "abccd":

    s/(.)\1/$1/g;
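
For comparison, here is a small sketch that contrasts pair removal with squeezing whole runs via tr///s. The variable names are illustrative, and the squeeze is a different technique shown only to make the distinction clear.

```perl
use strict;
use warnings;

my $word = 'abbcccd';

# Collapse each adjacent pair of identical characters into one character.
(my $pairs_removed = $word) =~ s/(.)\1/$1/g;

# Squeeze every run of repeated lowercase letters down to a single letter.
(my $squeezed = $word) =~ tr/a-z//s;

print "$pairs_removed $squeezed\n";
```

Pair removal leaves "abccd" because only two of the three c characters form a pair, while the squeeze yields "abcd".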

How do I expand function calls in a string?

This is documented in the perlref manpage. In general, this is fraught with quoting and readability problems, but it is possible. To interpolate a subroutine call (in a list context) into a string:

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

If you prefer scalar context, similar chicanery is also useful for arbitrary expressions:

    print "That yields ${\($n + 5)} widgets\n";

See also "How can I expand variables in text strings?" in this section of the FAQ.
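
A minimal runnable sketch of both forms follows. The body of mysub and the value of $n are made up for the demo.

```perl
use strict;
use warnings;

# Hypothetical subroutine used only for illustration: doubles its arguments.
sub mysub { return map { $_ * 2 } @_ }

my $n = 3;

# @{[ ... ]} builds an anonymous array and dereferences it (list context);
# the elements are joined with a space when interpolated.
my $list_msg = "My sub returned @{[ mysub(1, 2, 3) ]} that time.";

# ${\( ... )} takes a reference to the expression's value and dereferences it.
my $scalar_msg = "That yields ${\($n + 5)} widgets";

print "$list_msg\n$scalar_msg\n";
```

When readability matters more than brevity, plain concatenation or sprintf is usually the clearer choice.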

How do I find matching/nesting anything?

This isn't something that can be tackled with one traditional regular expression, no matter how complicated. To find something between two single characters, a pattern like /x([^x]*)x/ will get the intervening bits in $1. For multi-character delimiters, something more like /alpha(.*?)omega/ would be needed. But neither of these deals with nested patterns, nor can they. For that you'll have to write a parser (or, on Perl 5.10 and later, use recursive patterns such as (?R); the Text::Balanced module is another option).
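
To give a feel for what "write a parser" can mean in the simplest case, here is a sketch of a depth counter that extracts the first balanced parenthesized group. The function name first_balanced is hypothetical, and the approach ignores quoting and escapes.

```perl
use strict;
use warnings;

# Return the text inside the first balanced (...) group, or undef if there is none.
sub first_balanced {
    my ($text) = @_;
    my ($depth, $start) = (0, undef);
    for my $i (0 .. length($text) - 1) {
        my $ch = substr($text, $i, 1);
        if ($ch eq '(') {
            $start = $i if $depth == 0;   # remember where the outermost group opens
            $depth++;
        }
        elsif ($ch eq ')' && $depth > 0) {
            $depth--;
            # Back at depth 0: the outermost group just closed.
            return substr($text, $start + 1, $i - $start - 1) if $depth == 0;
        }
    }
    return;   # unbalanced, or no group at all
}

my $inner = first_balanced('f(a, g(b, c), d) + h(e)');
print "$inner\n";
```

The nested g(b, c) stays intact inside the result, which is exactly what the non-greedy pattern above cannot guarantee.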

How do I reverse a string?

Use reverse() in a scalar context, as documented under reverse in the perlfunc manpage.

    $reversed = reverse $string;
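
The context matters more than it may seem. A short sketch (variable names are illustrative) of how the same call behaves in scalar and list context:

```perl
use strict;
use warnings;

my $string = 'hello';

my $reversed = reverse $string;          # scalar assignment: characters reversed
my @as_list  = reverse $string;          # list context: reverses a one-element list
my $forced   = scalar reverse $string;   # scalar() forces the string behaviour

print "$reversed @as_list $forced\n";
```

This is why print reverse $string; appears to do nothing: print supplies list context, so use scalar reverse there.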

How do I expand tabs in a string?

You can do it the old-fashioned way:

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard perl distribution). 
 
    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);
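
As a quick check that the two approaches agree, the sketch below expands the same sample line both ways. The sample string and variable names are assumptions for the demo.

```perl
use strict;
use warnings;
use Text::Tabs;

my $line = "a\tbc\td";

# Module version; tab stops default to every 8 columns.
my ($with_module) = expand($line);

# Regex version: each run of tabs becomes enough spaces to reach the next tab stop.
my $by_hand = $line;
1 while $by_hand =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

print "same result\n" if $with_module eq $by_hand;
```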

How do I reformat a paragraph?

Use Text::Wrap (part of the standard perl distribution):

    use Text::Wrap;
    print wrap("\t", '  ', @paragraphs);

The paragraphs you give to Text::Wrap may not contain embedded newlines. Text::Wrap doesn't justify the lines (flush-right).
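
Here is a small sketch of wrap() on one paragraph. The 40-column width and the sample sentence are assumptions chosen to make the wrapping visible.

```perl
use strict;
use warnings;
use Text::Wrap;

$Text::Wrap::columns = 40;   # assumed width for the demo; the module default is 76

my $paragraph = 'The quick brown fox jumps over the lazy dog, '
              . 'then trots home for supper and a long nap.';

# The first line gets a four-space indent, continuation lines get two spaces.
my $wrapped = wrap('    ', '  ', $paragraph);

print "$wrapped\n";
```

The first two arguments are the initial and subsequent indents, so a hanging indent or a leading tab is just a matter of swapping those strings.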
 
How can I access/change the first N letters of a string?

There are many ways. If you just want to grab a copy, use substr:

    $first_byte = substr($a, 0, 1);

If you want to modify part of a string, the simplest way is often to use substr() as an lvalue:

    substr($a, 0, 3) = "Tom";

Although those with a regexp kind of thought process will likely prefer

    $a =~ s/^.../Tom/;
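
A brief sketch of the substr() forms side by side, including the four-argument variant that replaces and returns the old text in one call. The sample name is made up.

```perl
use strict;
use warnings;

my $name = 'Bob Smith';

my $first_three = substr($name, 0, 3);       # copy only; $name is unchanged

substr($name, 0, 3) = 'Tom';                 # lvalue form
my $after_lvalue = $name;

# Four-argument form: replaces the span and returns what was there before.
my $old = substr($name, 0, 3, 'Timothy');

print "$first_three, $after_lvalue, $old, $name\n";
```

The replacement does not have to be the same length as the span it replaces; the string grows or shrinks as needed.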

How do I change the Nth occurrence of something?

You have to keep track. For example, let's say you want to change the fifth occurrence of "whoever" or "whomever" into "whosoever" or "whomsoever", case insensitively.

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5           # is it the 5th?
            ? "${2}soever"      # yes, swap
            : $1                # renege and leave it there
    }igex;
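
The same counting idea can be wrapped in a helper. The sketch below uses a hypothetical function name, replace_nth, and a made-up sample sentence.

```perl
use strict;
use warnings;

# Hypothetical helper: replace only the $n-th match of $pattern in $text.
sub replace_nth {
    my ($text, $pattern, $n, $replacement) = @_;
    my $count = 0;
    # Every match is visited; all but the $n-th are put back unchanged.
    $text =~ s{($pattern)}{ ++$count == $n ? $replacement : $1 }ge;
    return $text;
}

my $result = replace_nth('one fish two fish red fish blue fish', qr/fish/, 3, 'herring');
print "$result\n";
```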

How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency. If you want a count of a certain single character (X) within a string, you can use the tr/// function like so:

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character. However, if you are trying to count multiple-character substrings within a larger string, tr/// won't work. What you can do is wrap a while() loop around a global pattern match. For example, let's count negative integers:

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    $count = 0;    # reset the counter left over from the previous example
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";
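
A compact alternative to the while loop is the list-assignment idiom, sketched here on the same sample data. The variable names are illustrative.

```perl
use strict;
use warnings;

my $string = '-9 55 48 -2 23 -76 4 14 -44';

# Assigning the match list to an empty list, then to a scalar,
# yields the number of matches.
my $negatives = () = $string =~ /-\d+/g;

# tr/// with an empty replacement counts characters without changing the string.
my $minus_signs = ($string =~ tr/-//);

print "$negatives negative numbers, $minus_signs minus signs\n";
```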

How do I capitalize all the words on one line?

To make the first letter of each word upper case:

    $line =~ s/\b(\w)/\U$1/g;

This has the strange effect of turning "don't do it" into "Don'T Do It". Sometimes you might want this, instead (suggested by Brian Foy <comdog@computerdog.com>):

    $string =~ s/ (
                 (^\w)    # at the beginning of the line
                   |      # or
                 (\s\w)   # preceded by whitespace
                   )
                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;

To make the whole line upper case:

    $line = uc($line);

To force each word to be lower case, with the first letter upper case:

    $line =~ s/(\w+)/\u\L$1/g;
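
The differences between these substitutions are easiest to see on a sample with an apostrophe. This sketch uses made-up input strings and variable names.

```perl
use strict;
use warnings;

my $line = "don't do it";

# Word-boundary version: the letter after the apostrophe also counts as a word start.
(my $naive = $line) =~ s/\b(\w)/\U$1/g;

# Whitespace-based version: only line start or a preceding space triggers the change.
(my $by_space = $line) =~ s/(^\w|\s\w)/\U$1/g;

# Lower-case each word first, then upper-case its first letter.
my $shouty = "dON'T dO iT";
(my $title = $shouty) =~ s/([\w']+)/\u\L$1/g;

print "$naive | $by_space | $title\n";
```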

How can I split a [character] delimited string except when inside [character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated into its different fields. (We'll pretend you said comma-separated, not comma-delimited, which is different and almost never what you mean.) You can't use split(/,/) because you shouldn't split if the comma is inside quotes. For example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex problem. Thankfully, we have Jeffrey Friedl, author of a highly recommended book on regular expressions, to handle these for us. He suggests (assuming your string is contained in $text):

     @new = ();
     push(@new, $+) while $text =~ m{
         "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
       | ([^,]+),?
       | ,
     }gx;
     push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a quotation-mark-delimited field, escape them with backslashes (e.g., "like \"this\""). Unescaping them is a task addressed earlier in this section.

Alternatively, the Text::ParseWords module (part of the standard perl distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);
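
Here is a runnable sketch of the Text::ParseWords approach applied to the sample data line from above. The variable names are illustrative.

```perl
use strict;
use warnings;
use Text::ParseWords;

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# Second argument 0 means: strip the quotes from the returned fields.
my @fields = quotewords(',', 0, $text);

print scalar(@fields), " fields\n";
print "$_\n" for @fields;
```

Note that quotewords() also treats single quotes as quoting characters, so data containing apostrophes may need a dedicated CSV parser instead.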

How do I strip blank space from the beginning/end of a string?

The simplest approach, albeit not the fastest, is probably like this:

    $string =~ s/^\s*(.*?)\s*$/$1/;

It would be faster to do this in two steps:

    $string =~ s/^\s+//;
    $string =~ s/\s+$//;

Or more nicely written as:

    for ($string) {
        s/^\s+//;
        s/\s+$//;
    }
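
The two-step version is easy to package as a small function. The name trim is an assumption for this sketch, not something the FAQ defines.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string without leading or trailing whitespace.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

my $clean = trim("  \t hello world \n");
print "[$clean]\n";
```

Interior whitespace is left alone; only the two ends are touched.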

How do I extract selected columns from a string?

Use substr() or unpack(), both documented in the perlfunc manpage.
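
A minimal sketch of both functions on a fixed-width record follows. The column layout (10, 6, then the rest) and the sample values are assumptions for the demo.

```perl
use strict;
use warnings;

# Build a fixed-width record: 10-column name, 6-column pid, then the command.
my $record = sprintf('%-10s%-6s%s', 'alice', 4242, '/bin/sh');

# substr() takes an offset and a length; the padding has to be removed by hand.
my $pid = substr($record, 10, 6);
$pid =~ s/\s+$//;

# unpack() with "A" fields strips trailing spaces from each column.
my ($user, $pid_again, $command) = unpack('A10 A6 A*', $record);

print "$user $pid $command\n";
```

unpack() tends to read better once more than one or two columns are involved, since the whole layout sits in a single template string.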
 
How do I find the soundex value of a string?

Use the Text::Soundex module. It was distributed with older versions of perl; recent releases no longer bundle it, so you may need to install it from CPAN.
 
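How can I expand variables in text strings?

Let's assume that you have a string like the one below. If $foo and $bar are global (package) variables, a symbolic-reference substitution will expand them:

    $text = 'this has a $foo in it and a $bar';
    $text =~ s/\$(\w+)/${$1}/g;

Because this relies on symbolic references, it cannot see lexical (my) variables and will not run under use strict 'refs'.

Before version 5 of perl, this had to be done with a double-eval substitution:

    $text =~ s/(\$\w+)/$1/eeg;

Which is bizarre enough that you'll probably actually need an EEG afterwards. :-)

See also "How do I expand function calls in a string?" in this section of the FAQ.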
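
A different technique from the ones above, and one that works under strict, is to look the names up in a hash instead of in real variables. The hash %vars and its contents are assumptions made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical lookup table standing in for the variables to expand.
my %vars = (foo => 'apple', bar => 'pear');

my $text = 'this has a $foo in it and a $bar, but no $baz';

# Replace $name with its value when known; leave unknown names untouched.
(my $expanded = $text) =~ s/\$(\w+)/exists $vars{$1} ? $vars{$1} : "\$$1"/ge;

print "$expanded\n";
```

Restricting expansion to a known table also avoids evaluating arbitrary text as code, which the double-eval form does.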
 
What's wrong with always quoting "$vars"?

The problem is that those double-quotes force stringification, coercing numbers and references into strings, even when you don't want them to be.

If you get used to writing odd things like these:

    print "$var";       # BAD
    $new = "$old";      # BAD
    somefunc("$var");   # BAD

You'll be in trouble. Those should (in 99.8% of the cases) be the simpler and more direct:

    print $var;
    $new = $old;
    somefunc($var);

Otherwise, besides slowing you down, you're going to break code when the thing in the scalar is actually neither a string nor a number, but a reference:

    func(\@array);
    sub func {
        my $aref = shift;
        my $oref = "$aref";  # WRONG
    }

You can also get into subtle problems on those few operations in Perl that actually do care about the difference between a string and a number, such as the magical ++ autoincrement operator or the syscall() function.
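
The reference case can be demonstrated in a few lines. This sketch uses illustrative variable names and shows that the quoted copy is only a string that looks like a reference.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;
my $copy  = "$aref";          # stringified: something like ARRAY(0x...)

my $real_kind = ref($aref);   # "ARRAY"
my $copy_kind = ref($copy);   # "" because $copy is a plain string

# Under strict refs, dereferencing the string copy dies, so eval returns undef.
my $worked = eval { my @items = @$copy; 1 };

print "real: $real_kind, copy: '$copy_kind', deref ",
      ($worked ? "worked" : "failed"), "\n";
```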
 
Why don't my <<HERE documents work?

Check for these three things:

1. There must be no space after the << part.
2. There (probably) should be a semicolon at the end.
3. You can't (easily) have any space in front of the tag.
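
A minimal here-document that satisfies all three points might look like the sketch below. The tag name END_MESSAGE and the variable names are made up for the demo.

```perl
use strict;
use warnings;

my $name = 'world';

# No space after <<, a semicolon ends the statement,
# and the closing tag starts in column one.
my $message = <<"END_MESSAGE";
Hello, $name.
This text runs until the terminator line.
END_MESSAGE

print $message;
```

Perl 5.26 and later also accept the <<~ form, which allows the body and the closing tag to be indented.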