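I have a PHP application that interfaces with a payment processor to handle credit cards. Occasionally the post-back response from the processor fails (e.g. a glitch in the matrix), and we get no automatic notification of the payment. In those cases we fall back to entering the data by hand from the confirmation email, which is always sent. I'd like my code to parse the text of that email to get the data, and this seems like a perfect use case for preg_match_all. The problem is that the email is poorly formatted: it has name : value pairs, but they are all on one line and the values are often blank, which has me stumped.
I'm comfortable with regex basics (quantifiers, grouping, character classes, anchors, modifiers), but I have no real experience with lookaheads and backreferences, and it isn't obvious to me whether they could help.
Sample data might look like this (again, this would all be on one line; it is only wrapped here for readability):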
bypass_first_page : x_company : x_cust_id : 12345 x_customer_ip :
x_customer_tax_id : x_description : 98765 x_duty : x_email_customer :
an_example@example.com x_fax : x_footer_email_receipt : x_fp_hash :
747ffeddfe4e106a9c67363ebff996ad x_fp_timestamp : 1525100766
x_invoice_num : R000098765 x_login : MY-LOGIN-ID x_logo_url :
x_merchant_email : x_method : x_phone : (416) 555-1212 x_po_num :
x_receipt_link_method : GET x_reference_3 : 1234 x_relay_response :
TRUE x_relay_url :
The output I want looks like this:
[
[bypass_first_page] =>
[x_company] =>
[x_cust_id] => 12345
[x_customer_ip] =>
[x_customer_tax_id] =>
[x_description] => 98765
[x_duty] =>
[x_email_customer] => an_example@example.com
[x_fax] =>
[x_footer_email_receipt] =>
[x_fp_hash] => 747ffeddfe4e106a9c67363ebff996ad
[x_fp_timestamp] => 1525100766
[x_invoice_num] => R000098765
[x_login] => MY-LOGIN-ID
[x_logo_url] =>
[x_merchant_email] =>
[x_method] =>
[x_phone] => (416) 555-1212
[x_po_num] =>
[x_receipt_link_method] => GET
[x_reference_3] => 1234
[x_relay_response] => TRUE
[x_relay_url] =>
]
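Important things to note:
- Field names mostly start with x_, but not all of them do. If the only workable solution requires that prefix, I could probably live with it.
- Field names contain no spaces.
- Some field names contain digits.
- Values can contain spaces (e.g. the phone number) and underscores (e.g. the email address).
- If there is no value, there is only a single space between the colon and the next field name.
The closest I've gotten is: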
/([\w\d_]+) ?: ([^:]+)/
But this produces output like:
[
[bypass_first_page] => x_company
[x_cust_id] => 12345 x_customer_ip
[x_customer_tax_id] => x_description
...
]
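As you can see at this regex101 link, this fails because the value group cannot match nothing: [^:]+ needs at least one character and runs up to the next colon, so field names end up in the values (either alone or joined to the actual value). I feel this could be solved quite easily if there were a modifier requiring the whole string to match, or an anchor indicating that a match must start where the previous one ended, but I can't find any mention of such a thing anywhere. Could it just be that I don't know what it would be called?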
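To make the failure concrete, here is a minimal reproduction on a shortened version of the sample line. The variable names $line and $attempt are mine, chosen only for illustration, and the values are trimmed to mirror the output shown above.

```php
<?php
// Shortened sample line taken from the example data above.
$line = 'bypass_first_page : x_company : x_cust_id : 12345 x_customer_ip :';

// The attempted pattern: [^:]+ must consume at least one character
// and keeps going until just before the next colon.
preg_match_all('/([\w\d_]+) ?: ([^:]+)/', $line, $m);

// Pair each captured name with its (trimmed) captured value.
$attempt = array_combine($m[1], array_map('trim', $m[2]));

print_r($attempt);
```

Only two pairs come back: the empty fields are swallowed as values of the field before them.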
Solution:
The simplest solution I have found (so far) is this:
(\w+) : ?(.*?)(?= ?\w+ :|$)
Finally, adding an optional space ( ?) at the very end, as Alan suggested, makes the output cleaner:
(\w+) : ?(.*?)(?= ?\w+ :|$) ?
Output:
[0] => Array
(
[0] => bypass_first_page :
[1] => x_company :
[2] => x_cust_id : 12345
[3] => x_customer_ip :
[4] => x_customer_tax_id :
[5] => x_description : 98765
[6] => x_duty :
[7] => x_email_customer : an_example@example.com
[8] => x_fax :
[9] => x_footer_email_receipt :
[10] => x_fp_hash : 747ffeddfe4e106a9c67363ebff996ad
[11] => x_fp_timestamp : 1525100766
[12] => x_invoice_num : R000098765
[13] => x_login : MY-LOGIN-ID
[14] => x_logo_url :
[15] => x_merchant_email :
[16] => x_method :
[17] => x_phone : (416) 555-1212
[18] => x_po_num :
[19] => x_receipt_link_method : GET
[20] => x_reference_3 : 1234
[21] => x_relay_response : TRUE
[22] => x_relay_url :
)
[1] => Array
(
[0] => bypass_first_page
[1] => x_company
[2] => x_cust_id
[3] => x_customer_ip
[4] => x_customer_tax_id
[5] => x_description
[6] => x_duty
[7] => x_email_customer
[8] => x_fax
[9] => x_footer_email_receipt
[10] => x_fp_hash
[11] => x_fp_timestamp
[12] => x_invoice_num
[13] => x_login
[14] => x_logo_url
[15] => x_merchant_email
[16] => x_method
[17] => x_phone
[18] => x_po_num
[19] => x_receipt_link_method
[20] => x_reference_3
[21] => x_relay_response
[22] => x_relay_url
)
[2] => Array
(
[0] =>
[1] =>
[2] => 12345
[3] =>
[4] =>
[5] => 98765
[6] =>
[7] => an_example@example.com
[8] =>
[9] =>
[10] => 747ffeddfe4e106a9c67363ebff996ad
[11] => 1525100766
[12] => R000098765
[13] => MY-LOGIN-ID
[14] =>
[15] =>
[16] =>
[17] => (416) 555-1212
[18] =>
[19] => GET
[20] => 1234
[21] => TRUE
[22] =>
)
I did some testing and I think this should meet the requirements.
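As a rough sketch of how this pattern could be wired into PHP to get the name => value array the question asks for: the function name parse_pairs and the variable names $email and $fields are assumptions of mine, not part of the original code. It also assumes field names are unique (array_combine would keep only the last duplicate) and that no value itself contains something shaped like "word :".

```php
<?php
// Sample email body from the question, joined into a single line.
$email = 'bypass_first_page : x_company : x_cust_id : 12345 x_customer_ip : '
    . 'x_customer_tax_id : x_description : 98765 x_duty : x_email_customer : '
    . 'an_example@example.com x_fax : x_footer_email_receipt : x_fp_hash : '
    . '747ffeddfe4e106a9c67363ebff996ad x_fp_timestamp : 1525100766 '
    . 'x_invoice_num : R000098765 x_login : MY-LOGIN-ID x_logo_url : '
    . 'x_merchant_email : x_method : x_phone : (416) 555-1212 x_po_num : '
    . 'x_receipt_link_method : GET x_reference_3 : 1234 x_relay_response : '
    . 'TRUE x_relay_url :';

/**
 * Hypothetical helper: split a one-line "name : value" string into an array.
 */
function parse_pairs(string $text): array
{
    // Group 1 is the field name, group 2 the (possibly empty) value.
    // The lookahead stops the lazy value just before the next "name :" or at the end.
    $pattern = '/(\w+) : ?(.*?)(?= ?\w+ :|$) ?/';

    // preg_match_all returns 0 when nothing matches and false on error.
    if (!preg_match_all($pattern, $text, $m)) {
        return [];
    }

    // $m[1] holds all names and $m[2] all values, in matching order.
    return array_combine($m[1], $m[2]);
}

$fields = parse_pairs($email);
print_r($fields);
```

Empty fields come back as empty strings, so a check such as $fields['x_company'] === '' distinguishes "present but blank" from a missing key.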
PS: The first solution I came up with was this:
(?:^| )(\w+) : ?(?!\w+ : )(?:(.*?)(?= \w+ :|$))?
It is a bit more verbose, but it may also help you.
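If you want to try this longer pattern, a minimal sketch might look like the following. The shortened input and the names $line, parse_pairs_verbose and $pairs are illustrative assumptions only.

```php
<?php
// A shortened line in the same format as the question's sample data.
$line = 'x_cust_id : 12345 x_customer_ip : x_phone : (416) 555-1212 '
    . 'x_po_num : x_relay_response : TRUE x_relay_url :';

/**
 * Hypothetical helper using the longer pattern.
 */
function parse_pairs_verbose(string $text): array
{
    // (?:^| ) anchors each name at the start of the string or after a space.
    // (?!\w+ : ) refuses to treat the next "name : " as the start of a value.
    $pattern = '/(?:^| )(\w+) : ?(?!\w+ : )(?:(.*?)(?= \w+ :|$))?/';

    if (!preg_match_all($pattern, $text, $m)) {
        return [];
    }

    // Names are in $m[1], values (empty string when blank) in $m[2].
    return array_combine($m[1], $m[2]);
}

$pairs = parse_pairs_verbose($line);
print_r($pairs);
```

One difference I noticed while tracing it by hand: the negative lookahead expects a space after the colon, so an empty field sitting directly before a final field with no trailing space (for example "a : b :") appears to be read as a having the value "b :". The shorter pattern does not seem to have that problem.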
Tags: php, regex, pcre
Source: https://codeday.me/bug/20190607/1194728.html