Learning Perl: 7.1. What Are Regular Expressions?

7.1. What Are Regular Expressions?

A regular expression, often called a pattern in Perl, is a template that matches or doesn't match a given string.[†] An infinite number of possible text strings exist, and a given pattern divides that infinite set into two groups: the ones that match and the ones that don't. There's never any kinda-sorta-almost-up-to-here wishy-washy matching: either it matches or it doesn't.

[†] Purists would ask for a more rigorous definition. But then again, purists say that Perl's patterns aren't really regular expressions. If you're serious about regular expressions, we recommend the book Mastering Regular Expressions by Jeffrey Friedl (O'Reilly).

A pattern may match one possible string, two or three, a dozen, a hundred, or an infinite number. It may match all strings except for one, except for some, or except for an infinite number.[*] We've referred to regular expressions as being little programs in their own simple programming language. It's a simple language because the programs have one task: to look at a string and say "it matches" or "it doesn't match".[†] That's all they do.

[*] As you'll see, you could have a pattern that always matches or that never does. In rare cases, even these may be useful, but generally they're mistakes.

[†] The programs also pass back some information that Perl can use later. One such piece of information is the "regular expression memories" that you'll learn about a little later.

One of the places you're likely to have seen regular expressions is in the Unix grep command, which prints out text lines matching a given pattern. For example, if you wanted to see which lines in a set of files mention flint and, somewhere later on the same line, stone, you might do something like this:

    $ grep 'flint.*stone' chapter*.txt
    chapter3.txt:a piece of flint, a stone which may be used to start a fire by striking
    chapter3.txt:found obsidian, flint, granite, and small stones of basaltic rock, which
    chapter9.txt:a flintlock rifle in poor condition. The sandstone mantle held several
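
To see the either-it-matches-or-it-doesn't nature of a pattern for yourself, here's a small shell sketch that hands a few strings to grep one at a time. The classify function and the sample strings are ours, made up for illustration; the pattern is the one from the example above.

```shell
pattern='flint.*stone'

# Report which side of the divide a string falls on.
# grep -q prints nothing; its exit status is the whole answer
# (0 when the pattern matches, 1 when it doesn't).
classify() {
    if printf '%s\n' "$1" | grep -q -- "$pattern"; then
        echo "matches"
    else
        echo "doesn't match"
    fi
}

# A few hypothetical strings to sort into the two groups.
for s in 'flint and stone' \
         'stone and flint' \
         'flintstone' \
         'a flintlock rifle by the sandstone mantle' \
         'flint'
do
    printf '%-45s %s\n' "$s" "$(classify "$s")"
done
```

Every string lands in exactly one group. Notice that the flintlock line lands in the matching group, just as the chapter9.txt line did above: the pattern looks for sequences of characters, not for whole words.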

Don't confuse regular expressions with shell filename-matching patterns, called globs. A typical glob is what you use when you type *.pm to the Unix shell to match all filenames that end in .pm. The previous example uses a glob of chapter*.txt. (You may have noticed that you had to quote the pattern to prevent the shell from treating it like a glob.) Though globs use many of the same characters you use in regular expressions, those characters are used in different ways.[†] You'll visit globs in Chapter 12.

[†] Globs are also (alas) sometimes called patterns. What's worse is that some bad Unix books for beginners (and possibly written by beginners) have taken to calling globs "regular expressions," which they certainly are not. This confuses many folks at the start of their work with Unix.
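
As a rough illustration of how the same characters behave differently, the following sketch tests one filename against the text chapter*.txt twice: once as a glob and once as a regular expression. The helper names glob_match and regex_match are our own inventions for this example; the first leans on the shell's case statement, which uses glob rules, and the second on grep.

```shell
# Glob matching via the shell's case statement (no files needed).
glob_match() {      # usage: glob_match STRING GLOB
    case "$1" in
        $2) echo yes ;;   # unquoted, so $2 is treated as a glob
        *)  echo no ;;
    esac
}

# Regular expression matching via grep's exit status.
regex_match() {     # usage: regex_match STRING REGEX
    if printf '%s\n' "$1" | grep -q -- "$2"; then
        echo yes
    else
        echo no
    fi
}

name='chapter3.txt'

# As a glob, * stands for any run of characters.
echo "glob  chapter*.txt    : $(glob_match  "$name" 'chapter*.txt')"

# As a regular expression, * means "zero or more of the preceding r"
# and . means "any one character", so the meaning is quite different.
echo "regex chapter*.txt    : $(regex_match "$name" 'chapter*.txt')"

# One way to say what the glob meant, written as a regular expression.
echo "regex chapter.*\\.txt  : $(regex_match "$name" 'chapter.*\.txt')"
```

The glob accepts chapter3.txt, but the same text read as a regular expression rejects it. To get the glob's meaning from grep, you have to write .* for "any run of characters" and backslash the dot that you want taken literally.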
