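I'm trying to use SimpleXML to fetch data from http://rates.fxcm.com/RatesXML
With simplexml_load_file() I sometimes get errors, because this site always puts strange strings/numbers before and after the XML document.
Example: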
2000<?xml version="1.0" encoding="UTF-8"?>
1.27595
1.2762
1.27748
1.27385
-1
23:29:11
0
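I then decided to fetch the data with file_get_contents, use substr() to strip the strings before and after the document, and parse the result as a string with simplexml_load_string(). However, sometimes the random strings also appear between nodes, like this: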
2.29443
2.29562
2.29841
2.28999
137b
1
23:29:11
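My question: is there any way to handle all of these random strings with a regex function, no matter where they are placed? (I think that would be a better idea than contacting the site and getting them to serve a correct XML file.)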
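Solution:
Here is a preg_replace call that removes the junk from the beginning of the string, from the end of the string, and all non-whitespace characters that follow a closing or self-closing tag: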
$string = preg_replace( '~
    (?|           # start of alternation where capturing group count starts from
                  # 1 for each alternative
      ^[^<]*      # match non-< characters at the beginning of the string
    |             # OR
      [^>]*$      # match non-> characters at the end of the string
    |             # OR
      (           # start of capturing group $1: closing tag
        </[^>]++> # match a closing tag; note the possessive quantifier (++); it
                  # suppresses backtracking, which is a convenient optimization,
                  # the following bit is mutually exclusive anyway (this will be
                  # used throughout the regex)
        \s++      # and the following whitespace
      )           # end of $1
      [^<\s]*+    # match non-<, non-whitespace characters (the "bad" ones)
      (?:         # start subgroup to repeat for more whitespace/non-whitespace
                  # sequences
        \s++      # match whitespace
        [^<\s]++  # match at least one "bad" character
      )*          # repeat
                  # note that this kind of pattern keeps all whitespace
                  # before the first and after the last "bad" character
    |             # OR
      (           # start of capturing group $1: self-closing tag
        <[^>/]+/> # match a self-closing tag
        \s++      # and the following whitespace
      )
      [^<\s]*+(?:\s++[^<\s]++)*
                  # same as before
    )             # end of alternation
    ~x',
    '$1',
    $input);
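The replacement '$1' then simply writes back the closing or self-closing tag (if there was one) together with the whitespace that followed it.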
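To see the replacement at work, here is a minimal sketch that runs a condensed version of the pattern over a small sample. The function name strip_stray_text and the element names (Rates, Rate, Bid, Ask, High) are assumptions made up for illustration, and the sample only imitates the kind of junk shown in the question.

```php
<?php
// Hypothetical helper: the pattern from the answer, condensed.
function strip_stray_text(string $input): string
{
    $result = preg_replace('~
        (?|                               # branch reset: every alternative uses $1
            ^[^<]*                        # junk before the first tag
          |
            [^>]*$                        # junk after the last tag
          |
            ( </[^>]++> \s++ )            # closing tag plus whitespace, kept as $1
            [^<\s]*+ (?: \s++ [^<\s]++ )* # stray text after it, dropped
          |
            ( <[^>/]+/> \s++ )            # self-closing tag plus whitespace, kept as $1
            [^<\s]*+ (?: \s++ [^<\s]++ )* # stray text after it, dropped
        )
        ~x', '$1', $input);

    if ($result === null) {
        throw new RuntimeException('preg_replace failed with error code ' . preg_last_error());
    }
    return $result;
}

// Made-up sample: junk before the document, between two nodes, and at the end.
$raw = "2000<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
     . "<Rates>\n"
     . "<Rate Symbol=\"EURTRY\">\n"
     . "<Bid>2.29443</Bid>\n"
     . "<Ask>2.29562</Ask>\n"
     . "\n137b\n\n"
     . "<High>2.29841</High>\n"
     . "</Rate>\n"
     . "</Rates>\n"
     . "0\n";

$clean = strip_stray_text($raw);
$xml = simplexml_load_string($clean);

echo $xml->Rate->High, PHP_EOL; // prints 2.29841
```

Note that stray text is only removed when whitespace separates it from the preceding tag, because both tag alternatives require \s++ after the tag.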
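One reason this approach is not safe is that a closing or self-closing tag may also appear inside a comment or an attribute string, where the pattern would match it just the same. But I can hardly suggest that you use an XML parser instead, since your XML parser cannot parse this XML either.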
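Because the cleanup is only a heuristic, one cautious option is to check whether the cleaned string actually parses before using it. The sketch below is one possible way to do that with libxml_use_internal_errors() and simplexml_load_string(). The helper name parse_or_null and the sample element names are hypothetical, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the parsed document, or null if it is not well-formed.
function parse_or_null(string $xmlText): ?SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlText);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Compare strictly with false: an empty element is also "falsy".
    return $xml === false ? null : $xml;
}

// Made-up samples: one well-formed, one with leading junk left in place.
$good = parse_or_null('<Rates><Rate Symbol="EURUSD"><Bid>1.27595</Bid></Rate></Rates>');
$bad  = parse_or_null('2000<Rates><Rate><Bid>1.27595</Bid></Rate></Rates>');

var_dump($good !== null); // bool(true)
var_dump($bad === null);  // bool(true)
```

A null result can then be treated as a signal to download the feed again rather than to trust the data.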
Tags: php, regex, parsing, simplexml, preg-replace
Source: https://codeday.me/bug/20190709/1412620.html