1. NSString's built-in regex search and replace methods
Regex search methods
– rangeOfString:options:
– rangeOfString:options:range:
– rangeOfString:options:range:locale:
Regex replace method
– stringByReplacingOccurrencesOfString:withString:options:range:
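The options parameter specifies the search options. Its type is NSStringCompareOptions, and it can be a bitwise OR of options such as NSCaseInsensitiveSearch, NSLiteralSearch, NSBackwardsSearch, and NSAnchoredSearch.
If NSRegularExpressionSearch is specified, the search string is treated as an ICU-compatible regular expression. When this option is set, the only other options that can be combined with it are NSCaseInsensitiveSearch and NSAnchoredSearch.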
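The two kinds of call described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names FirstRegexMatch and ReplaceRegexMatches are made up for this example, and only the NSString methods and option constants listed above are assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first substring of `text` that matches
// the ICU pattern (ignoring case), or nil when nothing matches.
static NSString *FirstRegexMatch(NSString *text, NSString *pattern) {
    NSRange range = [text rangeOfString:pattern
                                options:NSRegularExpressionSearch | NSCaseInsensitiveSearch];
    if (range.location == NSNotFound) {
        return nil;
    }
    return [text substringWithRange:range];
}

// Hypothetical helper: replaces every match of the ICU pattern in `text`.
static NSString *ReplaceRegexMatches(NSString *text, NSString *pattern, NSString *replacement) {
    return [text stringByReplacingOccurrencesOfString:pattern
                                           withString:replacement
                                              options:NSRegularExpressionSearch
                                                range:NSMakeRange(0, text.length)];
}
```

For example, FirstRegexMatch(@"Order ID: ab123", @"[A-Z]+[0-9]+") finds ab123 because the search is case-insensitive, and ReplaceRegexMatches(@"a1b22c333", @"[0-9]+", @"#") collapses each run of digits into a single #.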
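2. Using RegexKitLite
RegexKitLite adds many methods to the standard NSString class for working with regular expressions. It uses the ICU (International Components for Unicode) regex engine that ships with iOS, so its regex syntax is ICU syntax, and using it requires linking against libicucore.dylib.
Setting up RegexKitLite is simple: add RegexKitLite.h and RegexKitLite.m to your project, then link libicucore.dylib.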
RegexKitLite.h RegexKitLite.m
RegexKitLite NSString method reference:
RegexKitLite NSString Additions Reference
For RegexKitLite usage instructions, see:
Using RegexKitLite
The ICU regex syntax is documented in:
ICU Syntax
ICU User Guide – Regular Expressions
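3. Using RegexKit.framework
RegexKit Framework comes from the same family as RegexKitLite, but it is more complex and more powerful. Rather than using the ICU regex library that ships with iOS, it bundles its own PCRE (Perl Compatible Regular Expressions) library, so its regex syntax is PCRE syntax.
RegexKit Framework is feature-rich: it adds regular expression support to NSArray, NSData, NSDictionary, NSSet, and NSString objects.
Differences between RegexKit.framework and RegexKitLite
Aspect | RegexKit.framework | RegexKitLite |
---|---|---|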
Regex Library | PCRE | ICU |
Library Included | Yes, built into framework object file. | No, provided by Mac OS X. |
Library Linked As | Statically linked into framework. | Dynamically linked to /usr/lib/libicucore.dylib. |
Compiled Size | Approximately 371KB per architecture. | Very small, approximately 16KB to 20KB per architecture. |
Style | External, linked to framework. | Compiled directly in to final executable. |
Feature Set | Large, with additions to many classes. | Minimal, NSString only. |
4. Common ICU regex patterns
Commonly used ICU regex patterns are collected in:
RegexKitLite Cookbook
Numbers
Description | Regex | Examples |
---|---|---|
Integer | [+\-]?[0-9]+ | 123 -42 +23 |
Hex Number | 0[xX][0-9a-fA-F]+ | 0x0 0xdeadbeef 0xF3 |
Floating Point | [+\-]?(?:[0-9]*\.[0-9]+|[0-9]+\.) | 123. .123 +.42 |
Floating Point with Exponent | [+\-]?(?:[0-9]*\.[0-9]+|[0-9]+\.)(?:[eE][+\-]?[0-9]+)? | 123. .123 10.0E13 1.23e-7 |
Comma Separated Number | [0-9]{1,3}(?:,[0-9]{3})* | 42 1,234 1,234,567 |
Comma Separated Number | [0-9]{1,3}(?:,[0-9]{3})*(?:\.[0-9]+)? | 42 1,234 1,234,567.89 |
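As a rough sketch of how the number patterns above might be used for validation, the following checks whether an entire string matches a pattern. The helper IsWholeMatch and the pattern constants are hypothetical names for this example. Wrapping the pattern in \A(?:…)\z is an assumption added here so that partial matches such as the 12 in 12a are rejected; the table's patterns on their own match anywhere in the input.

```objc
#import <Foundation/Foundation.h>

// Patterns from the table above, with each \ doubled for the string literal.
static NSString * const kIntegerPattern = @"[+\\-]?[0-9]+";
static NSString * const kFloatExpPattern =
    @"[+\\-]?(?:[0-9]*\\.[0-9]+|[0-9]+\\.)(?:[eE][+\\-]?[0-9]+)?";
static NSString * const kCommaNumberPattern = @"[0-9]{1,3}(?:,[0-9]{3})*";

// Hypothetical helper: YES only when the whole of `text` matches `pattern`.
// \A and \z anchor the match to the very start and very end of the input.
static BOOL IsWholeMatch(NSString *text, NSString *pattern) {
    NSString *anchored = [NSString stringWithFormat:@"\\A(?:%@)\\z", pattern];
    NSRange range = [text rangeOfString:anchored options:NSRegularExpressionSearch];
    return range.location != NSNotFound;
}
```

Note that the floating point patterns require a decimal point, so a bare 123 is an integer but not a floating point match.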
Text Files
Description | Regex |
---|---|
Empty Line | (?m:^$) |
Empty or Whitespace Only Line | (?m-s:^\s*$) |
Strip Leading Whitespace | (?m-s:^\s*(.*?)$) |
Strip Trailing Whitespace | (?m-s:^(.*?)\s*$) |
Strip Leading and Trailing Whitespace | (?m-s:^\s*(.*?)\s*$) |
Quoted String, Can Span Multiple Lines, May Contain \" | "(?:[^"\\]*+|\\.)*" |
Quoted String, Single Line Only, May Contain \" | "(?:[^"\\\r\n]*+|\\[^\r\n])*" |
HTML Comment | (?s:<!--.*?-->) |
Perl / Shell Comment | (?m-s:#.*$) |
C, C++, or ObjC Comment | (?m-s://.*$) |
C, C++, or ObjC Comment and Leading Whitespace | (?m-s:\s*//.*$) |
C, C++, or ObjC Comment | (?s:/\*.*?\*/) |
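Two of the text patterns above can be applied with NSString's regex replace, sketched below. The function names are invented for this example. The $1 in the replacement string is assumed to be treated as a reference to the first capture group, which is how replacement templates behave under NSRegularExpressionSearch.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: trims leading and trailing whitespace on each line
// by replacing the whole match with capture group 1.
static NSString *StripLineWhitespace(NSString *text) {
    return [text stringByReplacingOccurrencesOfString:@"(?m-s:^\\s*(.*?)\\s*$)"
                                           withString:@"$1"
                                              options:NSRegularExpressionSearch
                                                range:NSMakeRange(0, text.length)];
}

// Hypothetical helper: removes // comments together with the whitespace
// that precedes them.
static NSString *StripLineComments(NSString *text) {
    return [text stringByReplacingOccurrencesOfString:@"(?m-s:\\s*//.*$)"
                                           withString:@""
                                              options:NSRegularExpressionSearch
                                                range:NSMakeRange(0, text.length)];
}
```

The comment pattern is purely textual: it would also strip a // that appears inside a string literal, so it is only suitable for simple input.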
Network and URL
Description | Regex |
---|---|
HTTP | \bhttps?://[a-zA-Z0-9\-.]+(?:(?:/[a-zA-Z0-9\-._?,'+\&%$=~*!():@\\]*)+)? |
HTTP | \b(https?)://([a-zA-Z0-9\-.]+)((?:/[a-zA-Z0-9\-._?,'+\&%$=~*!():@\\]*)+)? |
HTTP | \b(https?)://(?:(\S+?)(?::(\S+?))?@)?([a-zA-Z0-9\-.]+)(?::(\d+))?((?:/[a-zA-Z0-9\-._?,'+\&%$=~*!():@\\]*)+)? |
E-Mail | \b([a-zA-Z0-9%_.+\-]+)@([a-zA-Z0-9.\-]+?\.[a-zA-Z]{2,6})\b |
Hostname | \b(?:[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}?[a-zA-Z0-9]\.)+[a-zA-Z]{2,6}\b |
IP | \b(?:\d{1,3}\.){3}\d{1,3}\b |
IP with Optional Netmask | \b((?:\d{1,3}\.){3}\d{1,3})(?:/(\d{1,2}))?\b |
IP or Hostname | \b(?:(?:\d{1,3}\.){3}\d{1,3}|(?:[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}?[a-zA-Z0-9]\.)+[a-zA-Z]{2,6})\b |
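Several of the patterns above contain capture groups, which the NSString search methods from section 1 do not expose directly. One way to read them is Foundation's NSRegularExpression class, which this article does not otherwise cover; the sketch below assumes it is available and uses a hypothetical helper name, FirstIPAndNetmask. Keep in mind that the IP pattern only checks the shape of the address, so it would also accept 999.999.999.999.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns @[address, netmask] for the first IP address
// found in `text`. The netmask is @"" when absent. Returns nil if no match.
static NSArray *FirstIPAndNetmask(NSString *text) {
    NSString *pattern = @"\\b((?:\\d{1,3}\\.){3}\\d{1,3})(?:/(\\d{1,2}))?\\b";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        return nil; // the pattern failed to compile
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    if (match == nil) {
        return nil;
    }
    NSString *address = [text substringWithRange:[match rangeAtIndex:1]];
    // A group that did not take part in the match has location NSNotFound.
    NSRange maskRange = [match rangeAtIndex:2];
    NSString *mask = (maskRange.location == NSNotFound)
        ? @""
        : [text substringWithRange:maskRange];
    return @[address, mask];
}
```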
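5. Greedy matching and minimal matching
When * or + is used on its own in a regular expression, it matches as much input as possible by default. This is greedy matching.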
* Match zero or more times. Match as many times as possible.
+ Match one or more times. Match as many times as possible.
For example, matching abcdefgabcdefg against abc(.*)g captures defgabcdef.
To stop at the first g and capture only def, use minimal matching by appending ? to the * or +, that is, match with abc(.*?)g.
*? Match zero or more times. Match as few times as possible.
+? Match one or more times. Match as few times as possible.
In addition, content enclosed in (…) is captured. To use parentheses only for grouping, without capturing, use (?:…).
(…) Capturing parentheses. Range of input that matched the parenthesized subexpression is available after the match.
(?:…) Non-capturing parentheses. Groups the included pattern, but does not provide capturing of matching text. Somewhat more efficient than capturing parentheses.
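The difference between greedy and minimal matching can be checked with a small sketch. It assumes Foundation's NSRegularExpression class for reading capture groups, which this article does not otherwise cover, and FirstCapture is a hypothetical helper name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of capture group 1 from the first
// match of `pattern` in `text`, or nil when there is no match.
static NSString *FirstCapture(NSString *text, NSString *pattern) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        return nil; // the pattern failed to compile
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:text options:0 range:NSMakeRange(0, text.length)];
    if (match == nil || [match rangeAtIndex:1].location == NSNotFound) {
        return nil;
    }
    return [text substringWithRange:[match rangeAtIndex:1]];
}
```

With the input abcdefgabcdefg, FirstCapture with abc(.*)g yields defgabcdef because .* runs to the last g, while abc(.*?)g yields def because .*? stops at the first g.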
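6. Writing regular expressions in source code
When a regular expression is written as an Objective-C string literal, every \ must be escaped, that is, written as \\.
For example, the regular expression that matches an IP address is \b(?:\d{1,3}\.){3}\d{1,3}\b, which in source code becomes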
NSString *regex = @"\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b";
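A short sketch of the escaped pattern in use, with a hypothetical helper name, ContainsIPAddress. Every backslash in the pattern is doubled, including the one before the dot: an undoubled \. is not a valid escape sequence in a C string literal, so the regex engine would typically receive a bare . that matches any character.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES when `text` contains something shaped like an
// IPv4 address. Each \ of the regex is written as \\ in the literal.
static BOOL ContainsIPAddress(NSString *text) {
    NSString *regex = @"\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b";
    NSRange range = [text rangeOfString:regex options:NSRegularExpressionSearch];
    return range.location != NSNotFound;
}
```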
References:
Using Regular Expressions in iOS Development, with a Guide to the RegexKitLite Library
RegexKitLite Documentation
[perl] Understanding the difference between greedy matching and minimal matching
NSString Class Reference
ICU – International Components for Unicode