std::regex regular expressions

back2childhood

Last modified: 2023-04-27 23:37:25


Column: STL cpp | Tags: c++

First published: 2023-04-14 00:49:36

Article link: https://blog.csdn.net/weixin_44609676/article/details/130142163


std::match_results

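(The results of a match are stored in it.)
result[0] holds the entire matched text, and result[1] holds what the first group (marked sub-expression) matched. If the regular expression has n groups, a successful match leaves match_results with size() == n + 1; after a failed match it is empty (size() == 0).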

This is a specialized allocator-aware container. It can only be default created, obtained from std::regex_iterator, or modified by std::regex_search or std::regex_match. Because std::match_results holds std::sub_matches, each of which is a pair of iterators into the original character sequence that was matched, it’s undefined behavior to examine std::match_results if the original character sequence was destroyed or iterators to it were invalidated for other reasons.

Type	Definition
std::cmatch	std::match_results<const char*>
std::wcmatch	std::match_results<const wchar_t*>
std::smatch	std::match_results<std::string::const_iterator>
std::wsmatch	std::match_results<std::wstring::const_iterator>
std::pmr::cmatch (C++17)	std::pmr::match_results<const char*>
std::pmr::wcmatch (C++17)	std::pmr::match_results<const wchar_t*>
std::pmr::smatch (C++17)	std::pmr::match_results<std::string::const_iterator>
std::pmr::wsmatch (C++17)	std::pmr::match_results<std::wstring::const_iterator>

std::sub_match

Used to inspect the individual results held in a match_results.
The class template std::sub_match is used by the regular expression engine to denote sequences of characters matched by marked sub-expressions.

regex_match

Returns true if the regular expression matches the entire target character sequence, false otherwise.

#include <iostream>
#include <regex>
#include <string>
 
int main()
{
    // Simple regular expression matching
    const std::string fnames[] = {"foo.txt", "bar.txt", "baz.dat", "zoidberg"};
    const std::regex txt_regex("[a-z]+\\.txt");
 
    for (const auto &fname : fnames)
        std::cout << fname << ": " << std::regex_match(fname, txt_regex) << '\n';
/*
foo.txt: 1
bar.txt: 1
baz.dat: 0
zoidberg: 0
*/
 
    // Extraction of a sub-match
    const std::regex base_regex("([a-z]+)\\.txt");
    std::smatch base_match;
 
    for (const auto &fname : fnames){
        if (std::regex_match(fname, base_match, base_regex)){
            // The first sub_match is the whole string; the next
            // sub_match is the first parenthesized expression.
            if (base_match.size() == 2){
                std::ssub_match base_sub_match = base_match[1];
                std::string base = base_sub_match.str();
                std::cout << fname << " has a base of " << base << '\n';
            }
        }
    }
/*
foo.txt has a base of foo
bar.txt has a base of bar
*/
 
    // Extraction of several sub-matches
    const std::regex pieces_regex("([a-z]+)\\.([a-z]+)");
    std::smatch pieces_match;
 
    for (const auto &fname : fnames){
        if (std::regex_match(fname, pieces_match, pieces_regex)){
            std::cout << fname << '\n';
            for (size_t i = 0; i < pieces_match.size(); ++i){
                std::ssub_match sub_match = pieces_match[i];
                std::string piece = sub_match.str();
                std::cout << "  submatch " << i << ": " << piece << '\n';
            }
        }
    }
}
/*
foo.txt
  submatch 0: foo.txt
  submatch 1: foo
  submatch 2: txt
bar.txt
  submatch 0: bar.txt
  submatch 1: bar
  submatch 2: txt
baz.dat
  submatch 0: baz.dat
  submatch 1: baz
  submatch 2: dat
*/

regex_search

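std::regex_search searches the target for the regular expression, but unlike std::regex_match it does not require the entire character sequence to match. It also performs only a single search: it stops at the first match and does not keep searching for further matches.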
Determines if there is a match between the regular expression e and some subsequence in the target character sequence.
1) Analyzes the generic range [first, last). Match results are returned in m.
2) Analyzes a null-terminated string pointed to by str. Match results are returned in m.
3) Analyzes a string s. Match results are returned in m.
4-6) Equivalent to 1-3), but omit the match results.
7) Overload 3) is prohibited from accepting temporary strings; otherwise this function would populate match_results m with string iterators that become invalid immediately.

regex_search will successfully match any subsequence of the given sequence, whereas std::regex_match will only return true if the regular expression matches the entire sequence.

#include <iostream>
#include <regex>
#include <string>
 
int main()
{
    std::string lines[] = {"Roses are #ff0000",
                           "violets are #0000ff",
                           "all of my base are belong to you"};
 
    std::regex color_regex("#([a-f0-9]{2})"
                            "([a-f0-9]{2})"
                            "([a-f0-9]{2})");
 
    // simple match
    for (const auto &line : lines) {
        std::cout << line << ": " << std::boolalpha
                  << std::regex_search(line, color_regex) << '\n';
    }   
    std::cout << '\n';
 
    // show contents of marked subexpressions within each match
    std::smatch color_match;
    for (const auto& line : lines) {
        if(std::regex_search(line, color_match, color_regex)) {
            std::cout << "matches for '" << line << "'\n";
            std::cout << "Prefix: '" << color_match.prefix() << "'\n";
            for (size_t i = 0; i < color_match.size(); ++i) 
                std::cout << i << ": " << color_match[i] << '\n';
            std::cout << "Suffix: '" << color_match.suffix() << "\'\n\n";
        }
    }
 
    // repeated search (see also std::regex_iterator)
    std::string log(R"(
        Speed:	366
        Mass:	35
        Speed:	378
        Mass:	32
        Speed:	400
	Mass:	30)");
    std::regex r(R"(Speed:\t\d*)");
    std::smatch sm;
    while(std::regex_search(log, sm, r))
    {
        std::cout << sm.str() << '\n';
        log = sm.suffix();
    }
 
    // C-style string demo
    std::cmatch cm;
    if(std::regex_search("this is a test", cm, std::regex("test"))) 
        std::cout << "\nFound " << cm[0] << " at position " << cm.prefix().length() << '\n';
}

std::regex_replace

1) Copies characters in the range [first, last) to out, replacing any sequences that match re with characters formatted by fmt. In other words:
   - Constructs a std::regex_iterator object i as if by std::regex_iterator<BidirIt, CharT, traits> i(first, last, re, flags), and uses it to step through every match of re within the sequence [first, last).
   - For each such match m, copies the non-matched subsequence (m.prefix()) into out as if by out = std::copy(m.prefix().first, m.prefix().second, out) and then replaces the matched subsequence with the formatted replacement string as if by calling out = m.format(out, fmt, flags).
   - When no more matches are found, copies the remaining non-matched characters to out as if by out = std::copy(last_m.suffix().first, last_m.suffix().second, out), where last_m is a copy of the last match found.
   - If there are no matches, copies the entire sequence into out as-is, by out = std::copy(first, last, out).
   - If flags contains std::regex_constants::format_no_copy, the non-matched subsequences are not copied into out.
   - If flags contains std::regex_constants::format_first_only, only the first match is replaced.
2) Same as 1), but the formatted replacement is performed as if by calling out = m.format(out, fmt, fmt + std::char_traits<CharT>::length(fmt), flags).

3-4) Constructs an empty string result of type std::basic_string<CharT, ST, SA> and calls std::regex_replace(std::back_inserter(result), s.begin(), s.end(), re, fmt, flags).
5-6) Constructs an empty string result of type std::basic_string<CharT> and calls std::regex_replace(std::back_inserter(result), s, s + std::char_traits<CharT>::length(s), re, fmt, flags).

Return value
1-2) Returns a copy of the output iterator out after all the insertions.
3-6) Returns the string result which contains the output.

#include <iostream>
#include <iterator>
#include <regex>
#include <string>
 
int main()
{
   std::string text = "Quick brown fox";
   std::regex vowel_re("a|e|i|o|u");
 
   // write the results to an output iterator
   std::regex_replace(std::ostreambuf_iterator<char>(std::cout),
                      text.begin(), text.end(), vowel_re, "*");
 
   // construct a string holding the results
   std::cout << '\n' << std::regex_replace(text, vowel_re, "[$&]") << '\n';
/*
Q**ck br*wn f*x
Q[u][i]ck br[o]wn f[o]x
*/
}

std::regex_iterator

It is the programmer’s responsibility to ensure that the std::basic_regex object passed to the iterator’s constructor outlives the iterator. Because the iterator stores a pointer to the regex, incrementing the iterator after the regex was destroyed accesses a dangling pointer.
If the part of the regular expression that matched is just an assertion (^, $, \b, \B), the match stored in the iterator is a zero-length match, that is, match[0].first == match[0].second.

#include <iostream>
#include <iterator>
#include <regex>
#include <string>
 
int main()
{
    const std::string s = "Quick brown fox.";
 
    std::regex words_regex("[^\\s]+");
    auto words_begin =
        std::sregex_iterator(s.begin(), s.end(), words_regex);
    auto words_end = std::sregex_iterator();
 
    std::cout << "Found "
              << std::distance(words_begin, words_end)
              << " words:\n";
 
    for (std::sregex_iterator i = words_begin; i != words_end; ++i)
    {
        std::smatch match = *i;
        std::string match_str = match.str();
        std::cout << match_str << '\n';
    }
}
/*
Found 3 words:
Quick
brown
fox.
*/
