Validate With Regular Expressions

Simple validations often aren't adequate for some kinds of user input—for example, credit card numbers and Social Security numbers. The fact that users like to enter data in different formats complicates matters. For example, users might enter a credit card number as 1234 5678 9012 3456, 1234567890123456, or 1234-5678-9012-3456. You can parse any of these as a valid credit card number. Even simple requirements, such as ZIP codes, have a regular format, but some users might include spaces. 
 
I'll show you how to use regular expressions to validate user input against a variety of common formats. You can find many of these formats in the ASP.NET regular expression validator wizard (see Figure 1 and Additional Resources). I'll start by showing you how to create a validator for credit card numbers in all possible formats. Then, you can create a WinForms application that lets you enter a possible credit card number or domain name and click on a button to validate it (see Figure 2, and download the source code).

Your goal is to create a regular expression you can use in the Regex.IsMatch() method. This method returns TRUE if the input is a match, and FALSE if it's not. To match a regular expression, you create a regular-expression string that describes the expected input string. The key is that regular expressions provide a language to describe pattern matches. As a simple example, *.doc in the DOS dir command matches every file with a .doc extension, such as Word documents. You can create similar and far more complex patterns with regular expressions.

When you build any regular expression to validate input, you want to match the entire input string, not just part of it. All your regular expressions should start with ^ to match the beginning of the input string and end with $ to match the end of the input string (see Table 1 for the most common RegEx elements for validating user input).

I'll show you how to build the regular expression that matches only the forms of a credit card number that I listed previously (see Listing 1). All the forms use four groups of four digits each. You use \d to match a single digit, so the regular expression ^\d$ matches exactly one digit. However, you want four digits for each credit card number group. Regular expressions use braces ({}) to describe how many times a pattern is repeated, so \d{4} matches four digits. You use the expression ^\d{16}$ to match a credit card number with no intervening spaces.
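The effect of the anchors and the repeat count can be checked with a small sketch. It is written in Python purely for illustration (the article itself targets the .NET Regex class), and the helper name is_match is hypothetical. It searches anywhere in the input, so the anchors are what force a whole-string match.

```python
import re

def is_match(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

# Without anchors, one digit anywhere in the input is enough.
one_digit_loose = is_match(r"\d", "abc1def")

# With ^ and $, the entire input must be exactly one digit.
one_digit_strict = is_match(r"^\d$", "abc1def")

# Sixteen digits and nothing else.
sixteen_plain = is_match(r"^\d{16}$", "1234567890123456")

# Spaces are not digits, so the spaced form fails this pattern.
sixteen_spaced = is_match(r"^\d{16}$", "1234 5678 9012 3456")

print(one_digit_loose, one_digit_strict, sixteen_plain, sixteen_spaced)
```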

Match Optional Delimiters
You must modify this expression now so that the user can insert optional spacing delimiters between the groups of four digits. The delimiters can be hyphens (-), spaces, or nothing. To match a single character from a set, you place the set of characters inside square brackets ([]); [ab] matches either a or b. You use \s to match any whitespace character. The expression [-\s] matches the possible delimiters (a hyphen placed first inside the brackets is taken literally rather than as part of a range).

But wait: there's a bit more. You want the user to place zero delimiters or one delimiter. You could use {0,1}, as in [-\s]{0,1}, but matching zero or one copy of a substring is so common that there's a simpler way to specify it: the question mark (?). You use [-\s]? to match zero or one delimiter.

Your version of the regular expression (which splits here because of line-width constraints) now looks like this:

^[\d]{4}[-\s]?[\d]{4}[-\s]?
[\d]{4}[-\s]?[\d]{4}$
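As a quick check of this version, the following sketch (Python, used only for illustration; the names CARD_LOOSE and samples are hypothetical) joins the two halves into one pattern and runs it against several inputs. Each optional delimiter is matched independently of the others.

```python
import re

# Four groups of four digits, each gap filled by an optional
# hyphen or whitespace character.
CARD_LOOSE = re.compile(r"^[\d]{4}[-\s]?[\d]{4}[-\s]?[\d]{4}[-\s]?[\d]{4}$")

samples = [
    "1234 5678 9012 3456",   # spaces
    "1234567890123456",      # no delimiters
    "1234-5678-9012-3456",   # hyphens
    "1234 5678-90123456",    # a different choice in each gap
    "1234_5678_9012_3456",   # underscore is not in the delimiter set
]

# Map each sample to whether the pattern accepts it.
results = {s: CARD_LOOSE.search(s) is not None for s in samples}

for text, ok in results.items():
    print(f"{text!r}: {ok}")
```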

You're almost there, but the preceding regular expression has one small bug: The user could use different delimiter characters between different four-digit groups. For example, 1234 5678-90123456 would be valid input. You need to make sure the user places the same delimiter between each of the groups. You use two features of regular expressions—grouping and backreferences—to do this. A grouping is a set of characters that the regular expression processor remembers from the input string. A backreference is a copy of the remembered text. First, modify your expression to remember which delimiter the user typed first:

^[\d]{4}([-\s]?)

A group is any expression in parentheses. Simply place a substring in parentheses to create a numbered group. The entire match is number 0, and each group is numbered from left to right, starting at 1. However, I prefer to avoid numbered groups, because they can be difficult to understand later, especially if they involve multiple or nested groups. You can use named groups instead. You name a group by adding a question mark and a name in angle brackets after the opening parenthesis:

^[\d]{4}(?<grpdel>[-\s]?)

The remembered delimiter is named grpdel now. You must match the remembered group for each delimiter in order to limit the user to using the same delimiter in each group. Use a backreference to match a remembered group:

^[\d]{4}(?<grpdel>[-\s]?)
[\d]{4}\k<grpdel>

The \k<grpdel> string matches the text remembered from the group named grpdel. (To reference a numbered group, use \1 and \2 for the numbered groups 1 and 2, respectively.) Here's the final regular expression (split once again to fit this column width):

^[\d]{4}(?<grpdel>[-\s]?)[\d]{4}
\k<grpdel>[\d]{4}\k<grpdel>[\d]{4}$

If you're confused at this point, walk through each step. ^ matches the beginning of the input. [\d]{4} matches exactly four digits. (?<grpdel>[-\s]?) matches a hyphen or a whitespace character, repeated either zero or one time, and remembers it as grpdel. The next set matches four digits again. \k<grpdel> matches whatever was captured for the group named grpdel. Each of these repeats, and, finally, $ matches the end of the input string.
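The walk-through can be tried out with a short sketch. It uses Python only for illustration, and Python spells the two constructs differently from the .NET syntax above: a named group is (?P<grpdel>...) and its backreference is (?P=grpdel). The names CARD_STRICT and is_card_format are hypothetical.

```python
import re

# Same structure as the final expression: remember the first delimiter
# (possibly empty) and require that exact text in the other two gaps.
CARD_STRICT = re.compile(
    r"^[\d]{4}(?P<grpdel>[-\s]?)[\d]{4}(?P=grpdel)[\d]{4}(?P=grpdel)[\d]{4}$"
)

def is_card_format(text: str) -> bool:
    """Return True if text is four groups of four digits with one consistent delimiter."""
    return CARD_STRICT.search(text) is not None

for candidate in (
    "1234 5678 9012 3456",
    "1234567890123456",
    "1234-5678-9012-3456",
    "1234 5678-90123456",   # mixed delimiters
):
    print(f"{candidate!r}: {is_card_format(candidate)}")
```

When the first delimiter is absent, the group captures an empty string, so the backreferences also match nothing and the undelimited form is still accepted.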

Validate a Domain Address
The example I've just shown you demonstrates the most common constructs you use when you write regular expressions to validate user input. Here's another simple example that shows how you can use other constructs to validate input (see Listing 2). Suppose you want to validate a domain address. The addresses fawcette.com, microsoft.com, and srtsolutions.com are all valid. However, any address that includes a protocol prefix (such as ftp://) or contains invalid characters is invalid. For this example, a valid domain name has from 1 to 63 characters drawn only from a–z, 0–9, and -, followed by exactly one period (.) and a suffix. I'll limit the list of valid suffixes to .com, .net, .org, .edu, .gov, and .mil, to keep this example reasonably simple.

Once again, you build the regular expression by using ^ and $ for the start and end of the entire input string. You need to find between 1 and 63 characters in the set of a–z, 0–9, and -. You place the set of valid character ranges inside the brackets. This set includes the range a–z, the range 0–9, and the single hyphen character. The {1,63} construct matches from 1 to 63 repeats of the preceding range. This expression builds on a construct you saw previously:

[a-z0-9-]{1,63}

Next, you must find a single period. This might seem simple, but a period is a special character in the regular expression language (it matches any character), so you need to escape it by preceding it with a backslash: \. matches a literal period. Finally, you must find one of the approved extensions. You find one of a set of phrases by placing all the phrases between parentheses, separated by the pipe character (|): (com|net|org|edu|gov|mil). The complete expression is:

^[a-z0-9-]{1,63}\.
(com|net|org|edu|gov|mil)$
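Here is a minimal sketch of the domain check, again in Python purely for illustration, with the hypothetical names DOMAIN and is_valid_domain. It assumes a character class written without commas (a comma inside brackets is matched literally, so it would let commas through) and all six suffixes from the list above.

```python
import re

# 1 to 63 characters from a-z, 0-9, and hyphen; one literal period;
# then one of the approved suffixes.
DOMAIN = re.compile(r"^[a-z0-9-]{1,63}\.(com|net|org|edu|gov|mil)$")

def is_valid_domain(text: str) -> bool:
    """Return True if text fits the simplified domain format described above."""
    return DOMAIN.search(text) is not None

for candidate in (
    "fawcette.com",
    "srtsolutions.com",
    "ftp://fawcette.com",   # protocol prefix
    "www.fawcette.com",     # more than one period
    "fawcette.info",        # suffix not on the list
    "a,b.com",              # comma is not a valid character
):
    print(f"{candidate!r}: {is_valid_domain(candidate)}")
```

The pattern is case-sensitive as written, so an input such as Microsoft.com fails unless you lowercase it first or compile the pattern with a case-insensitive option.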

Note that you need only two lines of code for each expression in the examples I've shown you. Putting together regular expressions can take work, but it pays off handsomely. I've yet to see an input format that you can't validate with regular expressions. Look at the samples provided in the ASP.NET regular expression validator to learn more about using and forming your own regular expressions. Try to understand how each one works. Then, build expressions for your own validations. Test all the small subexpressions individually and include comments in your code, because debugging a long expression can be tricky.
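One way to follow that advice is sketched below in Python, used only for illustration. Its re.VERBOSE flag ignores whitespace in the pattern and allows # comments; .NET offers a comparable RegexOptions.IgnorePatternWhitespace option. The name CARD_COMMENTED is hypothetical, and the named-group syntax is Python's (?P<name>...) form.

```python
import re

# Check the small pieces on their own before combining them.
digits_ok = re.fullmatch(r"\d{4}", "1234") is not None
digits_short = re.fullmatch(r"\d{4}", "123") is not None
delim_hyphen = re.fullmatch(r"[-\s]?", "-") is not None
delim_empty = re.fullmatch(r"[-\s]?", "") is not None

# The combined pattern, laid out one step per line with comments.
CARD_COMMENTED = re.compile(
    r"""
    ^\d{4}                # first group of four digits
    (?P<grpdel>[-\s]?)    # optional delimiter, remembered as grpdel
    \d{4} (?P=grpdel)     # second group, then the same delimiter
    \d{4} (?P=grpdel)     # third group, then the same delimiter
    \d{4}$                # last group, then end of input
    """,
    re.VERBOSE,
)

print(digits_ok, digits_short, delim_hyphen, delim_empty)
print(CARD_COMMENTED.search("1234-5678-9012-3456") is not None)
```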