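Strictly speaking, regular expressions are defined by XPath 2.0, but I'll cover them under XSLT for now.
The easiest way to show how they are used is with an example.
The XML source file can be anything. The XSLT file: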
<?xml version='1.0'?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/XMLSchema">
  <xsl:template match="/">
    <xsl:variable name="file" select="'aa.txt'"/>
    <!-- the encoding name is a string, so it must be quoted -->
    <xsl:variable name="string" select="unparsed-text($file, 'iso-8859-1')"/>
    <!-- outer pass: newlines are the matches, so each line is a non-matching substring -->
    <xsl:analyze-string select="$string" regex="\n">
      <xsl:non-matching-substring>
        <row>
          <!-- inner pass: a cell is either a quoted string (group 2) or a run of non-commas (group 3) -->
          <xsl:analyze-string select="." regex='("([^"]*)")|([^,]+)'>
            <xsl:matching-substring>
              <cell>
                <xsl:value-of select="regex-group(2)"/>
                <xsl:value-of select="regex-group(3)"/>
              </cell>
            </xsl:matching-substring>
          </xsl:analyze-string>
        </row>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
Adapted from XSLT 2.0 Programmer's Reference, with minor changes.
Contents of aa.txt:
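To make the two nested passes of the stylesheet easier to follow, here is a rough Python sketch of the same idea. It is not XSLT: Python's re module stands in for the XPath 2.0 regex engine, and the function name rows_to_cells and the sample data are made up for illustration.

```python
import re

# Same cell pattern as the stylesheet: a quoted string (group 2)
# or a run of characters that are not commas (group 3).
CELL = re.compile(r'("([^"]*)")|([^,]+)')

def rows_to_cells(text):
    """Split text into rows on newlines, then each row into cells."""
    rows = []
    for line in text.split("\n"):      # outer pass: "\n" is the separator
        if not line:
            continue                   # skip empty lines, e.g. after a trailing newline
        cells = []
        for m in CELL.finditer(line):  # inner pass: every match is one cell
            # Only one of groups 2 and 3 takes part in a given match.
            # The other is None here (an empty string in XSLT),
            # so concatenating both yields the cell content.
            cells.append((m.group(2) or "") + (m.group(3) or ""))
        rows.append(cells)
    return rows

sample = ('123,"Mary Jones","IBM","USA",1997-05-14\n'
          '423,"Barbara Smith","General Motors","USA",1996-03-12\n')
for row in rows_to_cells(sample):
    print(row)
```

Writing out both regex-group(2) and regex-group(3) in the stylesheet plays the same role as the concatenation here: whichever alternative did not match contributes nothing.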
123,"Mary Jones","IBM","USA",1997-05-14
423,"Barbara Smith","General Motors","USA",1996-03-12
6721,"Martin McDougall","British Airways","UK",2001-01-15
830,"Jonathan Perkins","Springer Verlag","Germany",2000-11-17
The resulting output:
<?xml version='1.0' ?>
<row xmlns:xs="http://www.w3.org/XMLSchema">
  <cell>123</cell>
  <cell>Mary Jones</cell>
  <cell>IBM</cell>
  <cell>USA</cell>
  <cell>1997-05-14</cell>
</row>
<row xmlns:xs="http://www.w3.org/XMLSchema">
  <cell>423</cell>
  <cell>Barbara Smith</cell>
  <cell>General Motors</cell>
  <cell>USA</cell>
  <cell>1996-03-12</cell>
</row>
<row xmlns:xs="http://www.w3.org/XMLSchema">
  <cell>6721</cell>
  <cell>Martin McDougall</cell>
  <cell>British Airways</cell>
  <cell>UK</cell>
  <cell>2001-01-15</cell>
</row>
<row xmlns:xs="http://www.w3.org/XMLSchema">
  <cell>830</cell>
  <cell>Jonathan Perkins</cell>
  <cell>Springer Verlag</cell>
  <cell>Germany</cell>
  <cell>2000-11-17</cell>
</row>
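From this we can see how the search for substrings matching the regex proceeds. It starts at the first character of the input and tries to find a match there. If one is found, that matching substring is cut off and the search resumes at the character right after it. If none is found, the first character is marked as non-matching and the search starts again from the second character, and so on until every character has been examined.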
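The scanning procedure just described can be sketched in a few lines of Python. This is only an illustration under assumptions: Python's re module stands in for the XPath 2.0 regex engine, the function name analyze_string is borrowed from the XSLT instruction but is my own hypothetical helper, and zero-length matches are simply treated as "no match here".

```python
import re

def analyze_string(text, pattern):
    """Return (is_match, substring) segments, scanning text left to right."""
    regex = re.compile(pattern)
    segments = []
    pending = []   # consecutive characters that no match has covered
    pos = 0
    while pos < len(text):
        m = regex.match(text, pos)     # try to match starting exactly at pos
        if m and m.end() > pos:
            if pending:                # flush the non-matching run first
                segments.append((False, "".join(pending)))
                pending = []
            segments.append((True, m.group(0)))
            pos = m.end()              # resume right after the match
        else:
            pending.append(text[pos])  # this character is non-matching
            pos += 1
    if pending:
        segments.append((False, "".join(pending)))
    return segments

line = '123,"Mary Jones","IBM","USA",1997-05-14'
for is_match, part in analyze_string(line, r'("([^"]*)")|([^,]+)'):
    print("match    " if is_match else "non-match", repr(part))
```

Run on the first line of aa.txt, the matching segments are the five cells and the non-matching segments are the commas between them, which is why the stylesheet only needs an xsl:matching-substring branch in the inner pass.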
The pattern used in the original stylesheet from the book was:
<xsl:analyze-string select="." regex='("([^"]*?)")|([^,]+?),'>
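A * followed by a question mark (?) makes the quantifier non-greedy: the match ends as soon as a string of the form "xxxxx" has been found. Here, though, it is not needed. The xxxx in the middle cannot contain a double quote ("), and that alone already prevents a greedy match from running across something like "xxx""xxxx"xxx". Consider this a small question put to the master. Heh.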
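A quick way to check this claim is to compare the greedy and non-greedy forms side by side. The sketch below uses Python's re module as a stand-in for XPath 2.0 regular expressions (both support the *? quantifier); the sample string is made up for illustration.

```python
import re

line = '"Mary Jones","IBM","USA"'

# With [^"] between the quotes, greedy and non-greedy find the same cells,
# because the character class can never run past a closing quote.
greedy = re.findall(r'"([^"]*)"', line)
lazy = re.findall(r'"([^"]*?)"', line)
print(greedy == lazy)   # True
print(greedy)           # ['Mary Jones', 'IBM', 'USA']

# With "." instead, greediness does matter: the greedy form swallows
# everything up to the last quote on the line.
print(re.findall(r'"(.*)"', line))    # ['Mary Jones","IBM","USA']
print(re.findall(r'"(.*?)"', line))   # ['Mary Jones', 'IBM', 'USA']
```

So the question mark only changes the result when the repeated item is able to match the closing delimiter, which is not the case in the pattern above.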
Reposted from: https://blog.51cto.com/electiger/19722