Fixing the memory leaks reported under VC++ when using ANTLR for C++

Background

After I replaced Flex/Bison with ANTLR4 for the parsing stage, the program reported memory leaks every time it exited (see this issue):

Detected memory leaks!
Dumping objects ->
{11750} normal block at 0x0000000000436E90, 128 bytes long.
 Data: <  C       C     > 90 04 43 00 00 00 00 00 90 04 43 00 00 00 00 00 
{11749} normal block at 0x00000000004202B0, 16 bytes long.
 Data: <`iC             > 60 69 43 00 00 00 00 00 00 00 00 00 00 00 00 00 
{11748} normal block at 0x0000000000430490, 24 bytes long.
 Data: <  C       C     > 90 04 43 00 00 00 00 00 90 04 43 00 00 00 00 00 
{11747} normal block at 0x000000000041FDB0, 16 bytes long.
 Data: <HiC             > 48 69 43 00 00 00 00 00 00 00 00 00 00 00 00 00 
{11746} normal block at 0x0000000000436F50, 128 bytes long.
 Data: <  C       C     > D0 09 43 00 00 00 00 00 D0 09 43 00 00 00 00 00 
{11745} normal block at 0x000000000041FB30, 16 bytes long.
 Data: < hC             > E8 68 43 00 00 00 00 00 00 00 00 00 00 00 00 00 
{11744} normal block at 0x00000000004309D0, 24 bytes long.
 Data: <  C       C     > D0 09 43 00 00 00 00 00 D0 09 43 00 00 00 00 00 
{11743} normal block at 0x000000000041F310, 16 bytes long.
 Data: < hC             > D0 68 43 00 00 00 00 00 00 00 00 00 00 00 00 00 
{11742} normal block at 0x0000000000439B90, 128 bytes long.
 Data: <0 C     0 C     > 30 04 43 00 00 00 00 00 30 04 43 00 00 00 00 00 
{11741} normal block at 0x000000000041FBD0, 16 bytes long.
 Data: <phC             > 70 68 43 00 00 00 00 00 00 00 00 00 00 00 00 00 
(...)

Locating and fixing the problem

After adding VC++ memory-leak detection support to ANTLR and recompiling, I traced the leaks to the following source locations:

runtime/Cpp/runtime/src/atn/LexerATNSimulator.cpp(192)
runtime/Cpp/runtime/src/atn/LexerATNSimulator.cpp(295)
runtime/Cpp/runtime/src/atn/LexerATNSimulator.cpp(536)
runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp(299)
runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp(465)
runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp(531)
runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp(618)
runtime/Cpp/runtime/src/atn/ParserATNSimulator.cpp(636)
runtime/Cpp/runtime/src/dfa/DFA.cpp(29)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(179)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(182)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(185)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(188)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(191)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(194)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(197)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(200)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(203)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(206)
runtime/Cpp/runtime/src/atn/ATNDeserializer.cpp(212)
runtime/Cpp/runtime/src/atn/ATNDeserializationOptions.cpp(17)
runtime/Cpp/runtime/src/atn/LexerMoreAction.cpp(16)
runtime/Cpp/runtime/src/atn/LexerSkipAction.cpp(16)
runtime/Cpp/runtime/src/atn/LexerPopModeAction.cpp(16)
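
The file(line) locations listed above are the kind of output the MSVC CRT debug heap produces when allocations are tagged with their source position. The exact change made to the ANTLR runtime is not shown in this article, so the following is only a minimal sketch of the usual approach, assuming a debug build with MSVC. The names DBG_NEW and enableLeakReport are illustrative, not part of ANTLR; on other compilers the sketch degrades to plain new and a no-op.

```cpp
// Sketch only: typical CRT debug-heap setup for MSVC debug builds.
#ifdef _MSC_VER
#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>
#endif

// Illustrative macro (assumption): records file and line for each allocation
// so the leak dump can print "file(line)" instead of only a block number.
#if defined(_MSC_VER) && defined(_DEBUG)
#define DBG_NEW new (_NORMAL_BLOCK, __FILE__, __LINE__)
#else
#define DBG_NEW new
#endif

// Illustrative helper (assumption): call once at the start of main().
// Returns true if the automatic leak report at exit was enabled.
inline bool enableLeakReport() {
#if defined(_MSC_VER) && defined(_DEBUG)
    // Track allocations and dump any block still allocated at process exit.
    _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);
    return true;
#else
    return false;  // no CRT debug heap available
#endif
}
```

With this in place, allocations made through DBG_NEW in a source file show up in the dump with that file's name and line number, which is how a list like the one above can be obtained.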

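Most of the leaks are related to the ATN. Following the references shows that the ATN objects reported as leaked all end up stored in the DFA cache, and the DFA cache in turn lives in the static data of the lexer and the parser (it is shared by the lexer and parser instances and is initialized only once each, the first time a lexer or parser is constructed):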

// Every parser in this article is named MyParser; this is not repeated below.
// Struct that holds the parser's static data; the lexer has one too.
struct MyParserStaticData final {
  MyParserStaticData(std::vector<std::string> ruleNames,
                        std::vector<std::string> literalNames,
                        std::vector<std::string> symbolicNames)
      : ruleNames(std::move(ruleNames)), literalNames(std::move(literalNames)),
        symbolicNames(std::move(symbolicNames)),
        vocabulary(this->literalNames, this->symbolicNames) {}

  MyParserStaticData(const MyParserStaticData&) = delete;
  MyParserStaticData(MyParserStaticData&&) = delete;
  MyParserStaticData& operator=(const MyParserStaticData&) = delete;
  MyParserStaticData& operator=(MyParserStaticData&&) = delete;

  std::vector<antlr4::dfa::DFA> decisionToDFA; // the DFA cache is stored here
  antlr4::atn::PredictionContextCache sharedContextCache;
  const std::vector<std::string> ruleNames;
  const std::vector<std::string> literalNames;
  const std::vector<std::string> symbolicNames;
  const antlr4::dfa::Vocabulary vocabulary;
  antlr4::atn::SerializedATNView serializedATN;
  std::unique_ptr<antlr4::atn::ATN> atn;
};

// The parser's static data; initialized only once, when the first parser is constructed
MyParserStaticData* myparserParserStaticData;

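Digging further, the static data of the parser and the lexer is initialized by a function named xxxxParserInitialize (where xxxx is the name of your parser). The static data is held through a raw pointer, and I could not find any code elsewhere that frees it: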

MyParserStaticData* myparserParserStaticData;

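To verify that the leaks really come from the static data never being freed, I replaced the raw pointer with a unique_ptr for the static data of both the parser and the lexer (only the parser is shown below; the lexer is analogous):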

std::unique_ptr<MyParserStaticData> myparserParserStaticData;

For the modified code to work, the xxxxParserInitialize function has to be changed as well:

myparserParserStaticData = staticData.release();

This line has to be changed to:

myparserParserStaticData = std::move(staticData);
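
To make the effect of this change easier to see outside of ANTLR, here is a small self-contained sketch of the same ownership pattern. Everything in it (DemoStaticData, demoLiveCount, demoInitialize, demoGetStaticData) is a hypothetical stand-in rather than generated ANTLR code, and the once-only, thread-safe initialization is reduced to a plain null check to keep the example single-threaded.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Counts live DemoStaticData objects so destruction can be observed.
int demoLiveCount = 0;

// Hypothetical stand-in for a generated XxxStaticData struct.
struct DemoStaticData final {
    explicit DemoStaticData(std::vector<std::string> names)
        : ruleNames(std::move(names)) { ++demoLiveCount; }
    ~DemoStaticData() { --demoLiveCount; }

    DemoStaticData(const DemoStaticData&) = delete;
    DemoStaticData& operator=(const DemoStaticData&) = delete;

    const std::vector<std::string> ruleNames;
};

// Owning pointer at namespace scope: its destructor runs during static
// destruction at program exit and frees the data.
std::unique_ptr<DemoStaticData> demoStaticData;

// Stand-in for xxxxParserInitialize.
void demoInitialize() {
    auto staticData = std::make_unique<DemoStaticData>(
        std::vector<std::string>{"expr", "stmt"});
    // With a raw global pointer this would be `staticData.release()`,
    // after which nothing ever deletes the object.
    demoStaticData = std::move(staticData);
}

// Lazily initializes the data on first use (not thread-safe in this sketch).
const DemoStaticData& demoGetStaticData() {
    if (!demoStaticData) {
        demoInitialize();
    }
    return *demoStaticData;
}
```

The only difference from the raw-pointer version is who owns the object: release() hands ownership to nobody, while std::move transfers it to the global unique_ptr, which deletes the data when the program shuts down.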

After applying the change to both the lexer and the parser, rebuilding, and running again, most of the leaks are gone:

Detected memory leaks!
Dumping objects ->
C:\Users\Pyxherb\Desktop\antlr4\runtime\Cpp\runtime\src\atn\ATNDeserializationOptions.cpp(17) : {434} normal block at 0x000002891574C2C0, 3 bytes long.
 Data: <   > 00 01 00 
Object dump complete.

(The one remaining leak comes from a method inside ANTLR that allocates an object with new so that it can return an lvalue; it has not been fixed yet.)
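
One possible direction for a leak of this kind is a function-local static object instead of a heap allocation. This is only a sketch and has not been checked against the ANTLR sources: DemoOptions and both functions are hypothetical, and it assumes the leaking method has roughly the shape of leakyDefaultOptions below.

```cpp
// Hypothetical options type; not ANTLR's actual class.
class DemoOptions {
public:
    bool verify() const { return verify_; }

private:
    bool verify_ = true;
};

// Assumed shape of the leaky pattern: the object is allocated once with new
// and never deleted, so the debug heap reports it at exit.
const DemoOptions& leakyDefaultOptions() {
    static const DemoOptions* const instance = new DemoOptions();
    return *instance;
}

// Alternative: a function-local static object. It is constructed on first
// call and destroyed during static destruction, so nothing is left on the heap.
const DemoOptions& defaultOptions() {
    static const DemoOptions instance{};
    return instance;
}
```

The trade-off is destruction order: a deliberately leaked object stays valid until the process ends, whereas a function-local static can be destroyed before other static objects that still refer to it. Whether that matters here would need to be checked in the runtime itself.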

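To make the fix permanent (so that the generated sources do not have to be edited by hand every time the parser and lexer are regenerated from the .g4 file), the template file that ANTLR uses to generate the C++ parser and lexer has to be modified. I did this by patching the template and building my own jar: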

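  1. Clone the ANTLR repo
  2. Locate tool/resources/org/antlr/v4/tool/templates/codegen/Cpp/Cpp.stg (the template file used to generate the C++ parser and lexer)
  3. Apply to the template the same changes described above for the C++ code; see this commit for the exact changes
  4. Rebuild ANTLR and replace the original jar with the newly built one (preferably the one with -complete in its name)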