Perl Frequently Asked Questions: FAQ (4), Data: Strings

Reposted 2006-06-19 17:33:00

How do I validate input?

The answer to this question is usually a regular expression, perhaps with auxiliary logic. See the more specific questions (numbers, email addresses, etc.) for details.

 

 


How do I unescape a string?

It depends on just what you mean by ``escape''. URL escapes are dealt with in the perlfaq9 manpage. Shell escapes with the backslash (\) character are removed with:

    s/\\(.)/$1/g;

Note that this won't expand \n or \t or any other special escapes.
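If you do want a few of the special escapes expanded, one possible approach is a lookup table consulted from the replacement side of the substitution. This is only a sketch: the %escape table and the sample string are made up for illustration, and any escape not listed simply loses its backslash, as in the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical table: only these two escapes are translated.
my %escape = (n => "\n", t => "\t");

# Single quotes keep the backslashes literal in the input.
my $string = 'one\ttwo\nthree \$ four';

# Known escapes are expanded; for anything else the backslash is dropped.
(my $expanded = $string) =~ s/\\(.)/exists $escape{$1} ? $escape{$1} : $1/ge;

print $expanded, "\n";
```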

 

 


How do I remove consecutive pairs of characters?

To turn ``abbcccd'' into ``abccd'':

 

    s/(.)\1/$1/g;

 

 


How do I expand function calls in a string?

This is documented in the perlref manpage. In general, this is fraught with quoting and readability problems, but it is possible. To interpolate a subroutine call (in a list context) into a string:

 

    print "My sub returned @{[mysub(1,2,3)]} that time.\n";

If you prefer scalar context, similar chicanery is also useful for arbitrary expressions:

 

    print "That yields ${\($n + 5)} widgets\n";

See also ``How can I expand variables in text strings?'' in this section of the FAQ.

 

 


How do I find matching/nesting anything?

This isn't something that can be tackled in one regular expression, no matter how complicated. To find something between two single characters, a pattern like /x([^x]*)x/ will get the intervening bits in $1. For multiple ones, then something more like /alpha(.*?)omega/ would be needed. But none of these deals with nested patterns, nor can they. For that you'll have to write a parser.
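For the simplest nested case, balanced parentheses, the "parser" can be as small as a loop that tracks nesting depth. The sketch below is one way to do it, not the FAQ's own code; first_balanced() is a hypothetical helper name, and it ignores complications such as parentheses inside quoted strings.

```perl
use strict;
use warnings;

# Return the first balanced (...) group in $text, or undef if there is none.
sub first_balanced {
    my ($text) = @_;
    my $depth = 0;
    my $start;
    for my $i (0 .. length($text) - 1) {
        my $ch = substr($text, $i, 1);
        if ($ch eq '(') {
            $start = $i if $depth == 0;    # remember where the outermost group opens
            $depth++;
        }
        elsif ($ch eq ')' && $depth > 0) {
            $depth--;
            # Depth back at zero means the outermost group just closed.
            return substr($text, $start, $i - $start + 1) if $depth == 0;
        }
    }
    return;    # unbalanced, or no parentheses at all
}

my $group = first_balanced("f(a, g(b, c), d) + h(e)");
print "$group\n";
```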

 

 


How do I reverse a string?

Use reverse() in a scalar context, as documented under reverse in the perlfunc manpage.

 

    $reversed = reverse $string;
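The context matters more than it may appear. In list context, reverse() reverses the order of the list elements rather than the characters, so a one-element list comes back unchanged. A short illustration (the variable names are just for this example):

```perl
use strict;
use warnings;

my $string = "stressed";

my $reversed = reverse $string;          # scalar assignment: characters reversed
my @as_list  = reverse $string;          # list context: one-element list, unchanged
my $forced   = scalar reverse $string;   # scalar() forces scalar context explicitly

# print supplies list context, so scalar() is needed here too.
print scalar reverse($string), "\n";
```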

 

 


How do I expand tabs in a string?

You can do it the old-fashioned way:

 

    1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;

Or you can just use the Text::Tabs module (part of the standard perl distribution).

 

    use Text::Tabs;
    @expanded_lines = expand(@lines_with_tabs);
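As a quick check of what expand() produces, here is a small sketch with made-up input lines. It also sets $Text::Tabs::tabstop, the module's package variable for the tab width, which defaults to 8.

```perl
use strict;
use warnings;
use Text::Tabs;

my @lines_with_tabs = ("a\tb", "abc\tdef");

# Default tab stops are every 8 columns.
my @expanded_lines = expand(@lines_with_tabs);

# Narrower tab stops, every 4 columns.
$Text::Tabs::tabstop = 4;
my @narrow_lines = expand(@lines_with_tabs);

print "[$_]\n" for @expanded_lines, @narrow_lines;
```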

 

 


How do I reformat a paragraph?

Use Text::Wrap (part of the standard perl distribution):

 

    use Text::Wrap;
    print wrap("\t", '  ', @paragraphs);

The paragraphs you give to Text::Wrap may not contain embedded newlines. Text::Wrap doesn't justify the lines (flush-right).
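The line width is controlled by the $Text::Wrap::columns package variable. The following sketch wraps a made-up paragraph with no indentation; the sample text and the width of 40 are assumptions chosen for the example.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Resulting lines are at most $columns - 1 characters long.
$Text::Wrap::columns = 40;

my $paragraph = join " ", ("The quick brown fox jumps over the lazy dog.") x 4;

# First argument: indent for the first line; second: indent for later lines.
my $wrapped = wrap('', '', $paragraph);

print $wrapped, "\n";
```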

 

 


How can I access/change the first N letters of a string?

There are many ways. If you just want to grab a copy, use substr:

 

    $first_byte = substr($a, 0, 1);

If you want to modify part of a string, the simplest way is often to use substr() as an lvalue:

 

    substr($a, 0, 3) = "Tom";

Although those with a regexp kind of thought process will likely prefer

 

    $a =~ s/^.../Tom/;

 

 


How do I change the Nth occurrence of something?

You have to keep track. For example, let's say you want to change the fifth occurrence of ``whoever'' or ``whomever'' into ``whosoever'' or ``whomsoever'', case insensitively.

 

    $count = 0;
    s{((whom?)ever)}{
        ++$count == 5           # is it the 5th?
            ? "${2}soever"      # yes, swap
            : $1                # renege and leave it there
    }igex;
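The same counting trick can be wrapped in a small subroutine. This is a sketch under the assumption that you want a fixed replacement string; replace_nth() is a hypothetical name, not a built-in.

```perl
use strict;
use warnings;

# Replace only the $n-th match of $pattern in $text with $replacement.
sub replace_nth {
    my ($text, $pattern, $n, $replacement) = @_;
    my $count = 0;
    # The outer parentheses put the whole match in $1, so it can be put back.
    $text =~ s{($pattern)}{ ++$count == $n ? $replacement : $1 }ge;
    return $text;
}

my $result = replace_nth("a-b-c-d", qr/-/, 2, "+");
print "$result\n";
```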

 

 


How can I count the number of occurrences of a substring within a string?

There are a number of ways, with varying efficiency. If you want a count of a certain single character (X) within a string, you can use the tr/// operator like so:

 

    $string = "ThisXlineXhasXsomeXx'sXinXit";
    $count = ($string =~ tr/X//);
    print "There are $count X characters in the string";

This is fine if you are just looking for a single character. However, if you are trying to count multiple character substrings within a larger string, tr/// won't work. What you can do is wrap a while() loop around a global pattern match. For example, let's count negative integers:

 

    $string = "-9 55 48 -2 23 -76 4 14 -44";
    while ($string =~ /-\d+/g) { $count++ }
    print "There are $count negative numbers in the string";
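A common alternative to the while loop is to evaluate the global match in list context and count the resulting list, using an assignment to an empty list. The sketch below applies both counting styles to the sample strings from this answer.

```perl
use strict;
use warnings;

my $numbers = "-9 55 48 -2 23 -76 4 14 -44";

# The match runs in list context; assigning that list to a scalar yields its size.
my $negatives = () = $numbers =~ /-\d+/g;

my $line = "ThisXlineXhasXsomeXx'sXinXit";

# tr/// with an empty replacement list counts without modifying the string.
my $x_count = ($line =~ tr/X//);

print "$negatives negative numbers, $x_count X characters\n";
```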

 

 


How do I capitalize all the words on one line?

To make the first letter of each word upper case:

 

    $line =~ s/\b(\w)/\U$1/g;

This has the strange effect of turning ``don't do it'' into ``Don'T Do It''. Sometimes you might want this, instead (Suggested by Brian Foy <comdog@computerdog.com>):

 

    $string =~ s/ (
                 (^\w)    # at the beginning of the line
                   |      # or
                 (\s\w)   # preceded by whitespace
                   )
                /\U$1/xg;
    $string =~ s/([\w']+)/\u\L$1/g;

To make the whole line upper case:

 

        $line = uc($line);

To force each word to be lower case, with the first letter upper case:

 

    $line =~ s/(\w+)/\u\L$1/g;

 

 


How can I split a [character] delimited string except when inside [character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated into its different fields. (We'll pretend you said comma-separated, not comma-delimited, which is different and almost never what you mean.) You can't use split(/,/) because you shouldn't split if the comma is inside quotes. For example, take a data line like this:

 

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex problem. Thankfully, we have Jeffrey Friedl, author of a highly recommended book on regular expressions, to handle these for us. He suggests (assuming your string is contained in $text):

 

     @new = ();
     push(@new, $+) while $text =~ m{
         "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
       | ([^,]+),?
       | ,
     }gx;
     push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a quotation-mark-delimited field, escape them with backslashes (eg, "like \"this\""). Unescaping them is a task addressed earlier in this section.

Alternatively, the Text::ParseWords module (part of the standard perl distribution) lets you say:

 

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);
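Applied to the sample data line above, quotewords() with a false second argument strips the quotes and keeps the embedded commas. A short sketch follows; note that Text::ParseWords also treats single quotes as quoting characters, so data containing apostrophes may need different handling.

```perl
use strict;
use warnings;
use Text::ParseWords;

my $text = 'SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"';

# Second argument 0: do not keep the quotes in the returned fields.
my @new = quotewords(",", 0, $text);

printf "%2d: [%s]\n", $_, $new[$_] for 0 .. $#new;
```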

 

 


How do I strip blank space from the beginning/end of a string?

The simplest approach, albeit not the fastest, is probably like this:

 

    $string =~ s/^\s*(.*?)\s*$/$1/;

It would be faster to do this in two steps:

 

    $string =~ s/^\s+//;
    $string =~ s/\s+$//;

Or more nicely written as:

 

    for ($string) {
        s/^\s+//;
        s/\s+$//;
    }
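If you need this in several places, the two-step version fits naturally in a small subroutine that returns a trimmed copy and leaves its argument alone. trim() here is a hypothetical helper name, not a built-in function.

```perl
use strict;
use warnings;

# Return a copy of the argument with leading and trailing whitespace removed.
sub trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

my $raw     = "  \t hello world \n";
my $trimmed = trim($raw);
print "[$trimmed]\n";
```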

 

 


How do I extract selected columns from a string?

Use substr() or unpack(), both documented in the perlfunc manpage.
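For fixed-width records, a single unpack() template can pull out every column at once, while substr() suits one-off extractions. The record layout below (two 10-character fields followed by a 2-character field) is an assumption made up for this sketch.

```perl
use strict;
use warnings;

# Build a sample fixed-width record: 10 + 10 + 2 characters.
my $line = sprintf "%-10s%-10s%2d", "Smith", "John", 42;

# "A10" reads 10 characters and strips trailing spaces.
my ($last, $first, $age) = unpack("A10 A10 A2", $line);

# substr() returns the raw column, padding included.
my $raw_first = substr($line, 10, 10);

print "$first $last, $age\n";
```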

 

 


How do I find the soundex value of a string?

Use the standard Text::Soundex module distributed with perl.

 

 


How can I expand variables in text strings?

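Let's assume that you have a string like the one below, and that $foo and $bar are package (global) variables. The substitution on the second line then expands them through symbolic references, which do not see lexical (my) variables and are disallowed under use strict 'refs':

    $text = 'this has a $foo in it and a $bar';
    $text =~ s/\$(\w+)/${$1}/g;

Before version 5 of perl, this had to be done with a double-eval substitution:

    $text =~ s/(\$\w+)/$1/eeg;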

Which is bizarre enough that you'll probably actually need an EEG afterwards. :-)

See also ``How do I expand function calls in a string?'' in this section of the FAQ.
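A variation that avoids both symbolic references and eval is to keep the replacement values in a hash and look each name up there. This is a sketch of that alternative, not the FAQ's own method; the %vars hash and its contents are made up, and unknown names are left untouched.

```perl
use strict;
use warnings;

# Hypothetical table of values available to the template.
my %vars = (foo => "apple", bar => "pear");

my $text = 'this has a $foo in it and a $bar, but no $baz';

# Known names are replaced; unknown ones are put back with their dollar sign.
(my $expanded = $text) =~ s/\$(\w+)/exists $vars{$1} ? $vars{$1} : "\$$1"/ge;

print "$expanded\n";
```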

 

 


What's wrong with always quoting "$vars"?

The problem is that those double-quotes force stringification, coercing numbers and references into strings, even when you don't want them to be.

If you get used to writing odd things like these:

 

    print "$var";       # BAD
    $new = "$old";      # BAD
    somefunc("$var");   # BAD

You'll be in trouble. Those should (in 99.8% of the cases) be the simpler and more direct:

 

    print $var;
    $new = $old;
    somefunc($var);

Otherwise, besides slowing you down, you're going to break code when the thing in the scalar is actually neither a string nor a number, but a reference:

 

    func(\@array);
    sub func {
        my $aref = shift;
        my $oref = "$aref";  # WRONG
    }

You can also get into subtle problems on those few operations in Perl that actually do care about the difference between a string and a number, such as the magical ++ autoincrement operator or the syscall() function.
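The reference case is easy to demonstrate. In the sketch below (variable names are illustrative), the quoted copy is just a string such as ARRAY(0x...), and trying to use it as a reference fails under use strict.

```perl
use strict;
use warnings;

my @array = (1, 2, 3);
my $aref  = \@array;

my $good = $aref;      # still a reference
my $bad  = "$aref";    # stringified: no longer a reference

print "good is a ", ref($good), " reference\n";
print "bad is the plain string $bad\n";

# Dereferencing the string dies under strict refs; eval traps the error.
my $ok = eval { my @copy = @$bad; 1 };
print "dereferencing the string failed: $@" unless $ok;
```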

 

 


Why don't my <<HERE documents work?

Check for these three things:

  1. There must be no space after the << part.
  2. There (probably) should be a semicolon at the end.
  3. You can't (easily) have any space in front of the tag.
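A here-document that follows these rules looks like the sketch below: the quoted tag comes immediately after the <<, the statement ends with a semicolon, and the closing tag sits alone at the start of its line. The tag name END_TEXT and the variable names are arbitrary choices for this example.

```perl
use strict;
use warnings;

my $name = "world";

# No space between << and the tag; semicolon at the end of the statement.
my $text = <<"END_TEXT";
Hello, $name.
This is a here-document.
END_TEXT

print $text;
```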

 

