HALCON: tuple_regexp_match

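tuple_regexp_match (Operator). I searched for a long time without finding the material I needed, so I worked through the operator reference myself. Please go easy on me if anything is wrong; comments and corrections are welcome. Thank you!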

Name

tuple_regexp_match — Extract substrings using regular expressions.

Signature

tuple_regexp_match( : : Data, Expression : Matches)

Description

tuple_regexp_match applies the regular expression in Expression to one or more input strings in Data, and in each case returns the first matching substring in Matches. Normally, one output string is returned for each input string, the output string being empty if no match was found. However, if the regular expression contains capturing groups (see below), the behavior depends on the number of input strings: If there is only a single input string, the result is a tuple of all captured submatches. If there are multiple input strings, the output strings represent the matched pattern of the first capturing group.

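The rules above can be modelled in a few lines of Python. This is only a sketch built on Python's re module, not HALCON code: the helper name regexp_match_sketch is hypothetical, and returning an empty string for a capturing group that did not take part in a match is an assumption, since the text does not cover that case.

```python
import re


def regexp_match_sketch(data, expression):
    """Model of the matching rules described above (not the HALCON operator)."""
    strings = [data] if isinstance(data, str) else list(data)
    pattern = re.compile(expression)
    results = []
    for s in strings:
        m = pattern.search(s)              # first match only
        if m is None:
            results.append("")             # no match: empty output string
        elif pattern.groups == 0:
            results.append(m.group(0))     # no capturing groups: the matched substring
        elif len(strings) == 1:
            # Single input string: all captured submatches.
            # Assumption: a group that did not participate yields ''.
            results.extend(g or "" for g in m.groups())
        else:
            # Several input strings: only the first capturing group.
            results.append(m.group(1) or "")
    return results


print(regexp_match_sketch("abba", "a*b*"))
print(regexp_match_sketch(["img123", "img124"], "img(.*)"))
print(regexp_match_sketch("mydir/img001.bmp", r"img(.*)\.(.*)"))
```

With a single input string and two capturing groups, the last call yields both submatches ('001' and 'bmp'), while the two-string call yields one entry per input.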

A summary of regular expression syntax is provided here. Basically, each character in the regular expression represents a literal to match, except for the following symbols which have a special meaning (the described syntax is compatible with Perl):

^      Matches start of string
$      Matches end of string (a trailing newline is allowed)
.      Matches any character except newline
[...]  Matches any character literal listed in the brackets.
       If the first character is a '^', this matches any character
       except those in the list. You can use the '-' character as
       in '[A-Z0-9]' to select character ranges. Other characters
       lose their special meaning in brackets, except '\'.
       Within these brackets it is possible to use the following
       POSIX character classes (note that the additional brackets are
       needed, e.g. '[[:digit:]]'):

       [:alnum:]   alphabetic and numeric characters
       [:alpha:]   alphabetic characters
       [:blank:]   space and tab
       [:cntrl:]   control characters
       [:digit:]   digits
       [:graph:]   visible characters (excludes spaces and control characters)
       [:lower:]   lowercase alphabetic characters
       [:print:]   like [:graph:] but including spaces
       [:punct:]   punctuation characters
       [:space:]   all whitespace characters ([:blank:], newline, ...)
       [:upper:]   uppercase alphabetic characters
       [:xdigit:]  digits allowed in hexadecimal numbers (0-9a-fA-F)
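The anchors and bracket expressions above can be tried out with Python's re module, whose syntax for these constructs follows the same Perl conventions. This is an illustration rather than HALCON code; Python's re does not support the POSIX classes such as [:digit:], so the sketch uses explicit ranges, and the sample string is made up.

```python
import re

part = "AB12-cd34"

# Character range and its negation.
upper_or_digit = re.search(r"[A-Z0-9]+", part).group(0)      # 'AB12'
not_upper_digit = re.search(r"[^A-Z0-9]+", part).group(0)    # '-cd'

# '^' anchors the match at the start of the string.
starts_with_ab = re.search(r"^AB", part) is not None         # True
starts_with_cd = re.search(r"^cd", part) is not None         # False

# '$' still matches when a single trailing newline follows.
ends_with_34 = re.search(r"34$", part + "\n") is not None    # True

# '.' does not match a newline, so the run stops at the line break.
dot_run = re.search(r".+", "ab\ncd").group(0)                # 'ab'

print(upper_or_digit, not_upper_digit, starts_with_ab,
      starts_with_cd, ends_with_34, dot_run)
```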

*      Allows 0 or more repetitions of preceding literal or group
+      Allows 1 or more repetitions
?      Allows 0 or 1 repetitions
{n,m}  Allows n to m repetitions
{n}    Allows exactly n repetitions

       The repeat quantifiers above are greedy by default, i.e., they
       attempt to maximize the length of the match. Appending ? attempts
       to find a minimal match, e.g., +?
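The difference between greedy and minimal matching is easiest to see side by side. The sketch below uses Python's re module, which treats these quantifiers the same way; it is not HALCON code, and the input strings are invented for the illustration.

```python
import re

tags = "<a><b>"
greedy = re.search(r"<.+>", tags).group(0)     # '<a><b>': longest possible match
lazy = re.search(r"<.+?>", tags).group(0)      # '<a>': shortest possible match

digits = "12345"
two_to_three = re.search(r"\d{2,3}", digits).group(0)   # '123': greedy, takes 3
exactly_two = re.search(r"\d{2}", digits).group(0)      # '12'

# '?' makes the preceding literal optional; '*' also accepts zero repetitions.
optional_s = [m.group(0) for m in re.finditer(r"files?", "file files")]
zero_b = re.search(r"ab*", "a").group(0)                # 'a'

print(greedy, lazy, two_to_three, exactly_two, optional_s, zero_b)
```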

|      Separates alternative matching expressions.
( )    Groups a subpattern and creates a capturing group.
       The substrings captured by this group will be stored separately.
       (?: )   Groups a subpattern without creating a capturing group
       (?= )   Positive lookahead (required condition to the right of the match)
       (?! )   Negative lookahead (forbidden condition to the right of the match)
       (?<= )  Positive lookbehind (required condition to the left of the match)
       (?<! )  Negative lookbehind (forbidden condition to the left of the match)
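Grouping and lookaround are easier to follow with concrete inputs. The following sketch uses Python's re module, which shares this Perl-style syntax; it illustrates the constructs only and is not HALCON code. The sample strings are made up.

```python
import re

# (?: ) groups without capturing, so only (cd) is reported as a submatch.
only_cd = re.search(r"(?:ab)+(cd)", "ababcd").groups()                 # ('cd',)

# Lookbehind and lookahead constrain the context but are not part of the match.
number = re.search(r"(?<=img)\d+(?=\.bmp)", "img001.bmp").group(0)     # '001'

# Negative lookahead: 'foo' that is not followed by 'bar'.
foo_pos = re.search(r"foo(?!bar)", "foobar foobaz").start()            # 7

# Negative lookbehind: a number that is not preceded by '$'.
unpriced = re.search(r"(?<!\$)\b\d+", "$10 20").group(0)               # '20'

# '|' separates alternatives.
ext = re.search(r"bmp|png", "img001.png").group(0)                     # 'png'

print(only_cd, number, foo_pos, unpriced, ext)
```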

\      Escapes any special symbol to treat it as a literal. Note that
       some host languages like HDevelop and C/C++ already use the backslash
       as a general escape character. In this case, '\\.' matches a
       literal dot while '\\\\' matches a literal backslash.
       Furthermore, there are some special codes (the capitalized
       version of each denoting the negation):
       \d,\D  Matches a digit
       \w,\W  Matches a letter, digit or underscore
       \s,\S  Matches a white space character
       \b,\B  Matches a word boundary

If the specified expression is syntactically incorrect, you will receive an error stating that the value of control parameter 2 is wrong. Additional details are displayed in a message box if set_system('do_low_error', 'true') is set and in HDevelop's Output Console.
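The doubled backslashes described for HDevelop and C/C++ come from the host language's string literals, and Python's ordinary (non-raw) string literals behave the same way. The sketch below uses that as an analogy: it is Python with the re module, not HALCON code, and the sample strings are invented.

```python
import re

dot_pattern = "\\."           # the regex engine sees two characters: \ and .
backslash_pattern = "\\\\"    # the regex engine sees two characters: \ and \

dot_index = re.search(dot_pattern, "img001.bmp").start()               # 6
found_backslash = re.search(backslash_pattern, "C:\\temp").group(0)    # one backslash

# Special codes; the capitalized form negates the class.
digits = re.search("\\d+", "img001.bmp").group(0)         # '001'
non_digits = re.search("\\D+", "img001.bmp").group(0)     # 'img'
word = re.search("\\w+", "  part_7 ok").group(0)          # 'part_7'
whole_word = re.search("\\bcat\\b", "concat cat").start() # 7: skips 'cat' inside 'concat'

print(len(dot_pattern), dot_index, found_backslash, digits, non_digits, word, whole_word)
```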

Furthermore, you can set some options by passing a string tuple for Expression. In this case, the first element is used as the expression, and each additional element is treated as an option.

'ignore_case': Perform case-insensitive matching

'multiline': '^' and '$' match start and end of individual lines

'dot_matches_all': Allow the '.' character to also match newlines

'newline_lf', 'newline_crlf', 'newline_cr': Specify the encoding of newlines in the input data. The default is LF on all systems (even though in Windows files usually CRLF is used as line break, when reading a file into memory the read operators return for every line break just '\n', which is the same as LF).
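To make the "first element is the expression, the rest are options" convention concrete, here is a Python sketch that maps three of the options onto the closest flags of Python's re module. It is a model, not HALCON code: the names OPTION_FLAGS and compile_expression are hypothetical, the newline options are deliberately not modelled because Python's re has no equivalent setting, and rejecting unknown options with ValueError is an assumption of the sketch.

```python
import re

# Assumed mapping from the option names above to Python's re flags.
OPTION_FLAGS = {
    "ignore_case": re.IGNORECASE,
    "multiline": re.MULTILINE,
    "dot_matches_all": re.DOTALL,
}


def compile_expression(expression):
    """Split [pattern, option, ...] and compile the pattern with matching flags."""
    parts = [expression] if isinstance(expression, str) else list(expression)
    if not parts:
        raise ValueError("Expression must not be empty")
    pattern, *options = parts
    flags = 0
    for option in options:
        if option not in OPTION_FLAGS:
            raise ValueError(f"option not modelled in this sketch: {option!r}")
        flags |= OPTION_FLAGS[option]
    return re.compile(pattern, flags)


text = "First line\nsecond LINE"
both_cases = compile_expression(["line", "ignore_case"]).findall(text)
line_start = compile_expression(["^second", "multiline"]).search(text)
no_multiline = compile_expression("^second").search(text)
across_lines = compile_expression(["line.second", "dot_matches_all"]).search(text)

print(both_cases, line_start.group(0), no_multiline, across_lines.group(0))
```

Without 'multiline', '^second' finds nothing here because '^' only matches at the very start of the string.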

For general information about string operations see Tuple / String Operations.

If the input parameter Data is an empty tuple, the operator returns an empty tuple. If Expression is an empty tuple, an exception is raised.

Unicode code points versus bytes

Regular expression matching operates on Unicode code points. One Unicode code point may be composed of multiple bytes in the UTF-8 string. If regular expression matching should only match on bytes, this operator can be switched to byte mode with set_system('tsp_tuple_string_operator_mode', 'byte'). If 'filename_encoding' is set to 'locale' (legacy), this operator always uses the byte mode.
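The distinction between code points and bytes can be illustrated with Python, which separates text strings from byte strings. This is an analogy only: it says nothing about how HALCON implements the two modes, and the sample character is chosen just because its UTF-8 encoding takes two bytes.

```python
import re

text = "\u00e9"                     # 'é', a single Unicode code point
encoded = text.encode("utf-8")      # two bytes: 0xC3 0xA9

# Matching on code points: '.' consumes the whole character.
code_point_match = re.search(".", text).group(0)

# Matching on bytes: '.' consumes only the first byte of the encoded character.
byte_match = re.search(b".", encoded).group(0)

print(len(text), len(encoded), code_point_match == text, byte_match)
```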

HDevelop In-line Operation

HDevelop provides an in-line operation for tuple_regexp_match, which can be used in an expression in the following syntax:

Matches := regexp_match(Data, Expression)

Execution Information

Multithreading type: independent (runs in parallel even with exclusive operators).
Multithreading scope: global (may be called from any thread).
Processed without parallelization.

Parameters

Data (input_control)  string(-array) → (string)
Input strings to match.
Expression (input_control)  string(-array) → (string)
Regular expression.
Default value: '.*'
Suggested values: '.*', 'ignore_case', 'multiline', 'dot_matches_all', 'newline_lf', 'newline_crlf', 'newline_cr'
Matches (output_control)  string(-array) → (string)
Found matches.
