How to test C code with Python: detecting recursion in C files using Python

I need to detect direct and indirect recursion in a rather large set (5,000 to 15,000) of C (not C++) files.

The files are already preprocessed.

The code is pretty "old school" for safety reasons, so there are no fancy things like function pointers in there: only functions that pass variables around and some function-like macros that do the same.

The most natural way to detect recursion is to make a directed call-graph, considering each function a node with an edge going to all the other functions that it calls. If the graph has any cycles, then we have recursion.
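A minimal sketch of that cycle check, assuming the call graph is a dict mapping each function name to a set of the names it calls. It uses an iterative depth-first search (so a deep call chain cannot hit Python's recursion limit) and returns the first cycle it finds, or `None`. `find_cycle` is a hypothetical helper name, not part of any library.

```python
def find_cycle(graph):
    """Return one call cycle as a list of names (first == last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    for root in graph:
        if color.get(root, WHITE) != WHITE:
            continue
        color[root] = GREY
        path = [root]
        # Each stack entry pairs a function with an iterator over its callees,
        # so we can resume where we left off after exploring a child.
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, callees = stack[-1]
            for callee in callees:
                state = color.get(callee, WHITE)
                if state == GREY:
                    # Back edge: callee is already on the current path -> cycle
                    return path[path.index(callee):] + [callee]
                if state == WHITE:
                    color[callee] = GREY
                    path.append(callee)
                    stack.append((callee, iter(graph.get(callee, ()))))
                    break
            else:
                # All callees explored: this function is not on any new cycle
                color[node] = BLACK
                path.pop()
                stack.pop()
    return None
```

For example, `find_cycle({"a": {"b"}, "b": {"a"}})` returns `["a", "b", "a"]`, and a function that calls itself shows up as `["f", "f"]`. Callees that are not keys of the dict (library functions, for instance) are simply treated as leaves.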

A regex to find function calls is trivial to make but I also need to know which function did the calling.

PyCParser was nice, but it complains about a lot of things that are completely irrelevant in my use case, such as undefined variables, or typedefs whose source type is not defined or is defined in a different file. The project uses a custom dependency management system, so some includes and the like are added automatically. I would need PyCParser to care about nothing other than FuncCall and FuncDef nodes, and I don't think there is a way to limit the parsing process itself to just that.

I would rather not implement a parser, as I do not really have the time to learn how to do that in Python and then implement the solution.

Back to the issue: how would I go about parsing the functions in a C file? Basically, I want a dict with strings (the names of the functions defined in the file) as the keys, and lists of strings (the functions called by each function) as the values. A regex seems to be the most natural solution.

Sadly, using Python is not optional.
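For reference, a rough regex-based scan can build the dict described in the question straight from preprocessed sources. This is a heuristic sketch, not a C parser: it assumes ANSI-style (not K&R) function definitions, no function pointers, and that every `name(` inside a function body is a call unless the name is in the hypothetical `NOT_CALLS` set. It returns sets rather than lists. `call_map`, `NOT_CALLS` and `TOKEN_RE` are illustrative names.

```python
import re
from collections import defaultdict

# Words that may be followed by "(" without being a function call (assumed list).
NOT_CALLS = {"if", "while", "for", "switch", "return", "sizeof", "__attribute__"}

# Tokens we care about: string/char literals (matched so their contents are
# skipped), numbers, identifiers, braces and parentheses. Everything else is ignored.
STRING = r'"(?:\\.|[^"\\])*"'
CHAR = r"'(?:\\.|[^'\\])*'"
TOKEN_RE = re.compile("|".join([STRING, CHAR, r"\d\w*", r"[A-Za-z_]\w*", r"[{}()]"]))


def call_map(source):
    """Heuristically map each function defined in preprocessed C `source`
    to the set of names it calls."""
    tokens = [t for t in TOKEN_RE.findall(source) if t[0] not in "\"'"]
    calls = defaultdict(set)
    depth = 0         # current brace nesting depth
    current = None    # function whose body we are inside, if any
    candidate = None  # last "name(" seen at file scope
    prev = None
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "{":
            # "name(...) {" at file scope starts a function body
            if depth == 0 and prev == ")" and candidate:
                current = candidate
                calls[current]  # touch the key so leaf functions are listed too
            depth += 1
        elif tok == "}":
            depth -= 1
            if depth == 0:
                current = candidate = None
        elif nxt == "(" and (tok[0].isalpha() or tok[0] == "_") and tok not in NOT_CALLS:
            if depth == 0:
                candidate = tok  # a declaration or the header of a definition
            elif current is not None:
                calls[current].add(tok)
        prev = tok
    return dict(calls)
```

Since the result is keyed by name, per-file results can be merged into one graph (for example with `graph.setdefault(name, set()).update(callees)`) before looking for cycles across files. Note that `static` functions with the same name in different files would be merged too, which can produce false positives.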

Solution

Why not just use objdump on your compiled code then parse the generated assembly to build your graph?

test1.c file:

```c
extern void test2();

void test1()
{
    test2();
}
```

test2.c file:

```c
extern void test1();

void test2()
{
    test1();
}

int main()
{
    test2();
}
```

Now build it:

```shell
gcc -g test1.c test2.c -o myprog
```

Now disassemble it:

```shell
objdump -d myprog > myprog.asm
```

Look up all function calls with a couple of simple regexes, while keeping track of which function you are currently in. A sample of the disassembly shows how easy it should be:

```
00401630 <_test1>:
401630: 55 push %ebp
401631: 89 e5 mov %esp,%ebp
401633: 83 ec 08 sub $0x8,%esp
401636: e8 05 00 00 00 call 401640 <_test2>
40163b: c9 leave
40163c: c3 ret
40163d: 90 nop
40163e: 90 nop
40163f: 90 nop

00401640 <_test2>:
401640: 55 push %ebp
401641: 89 e5 mov %esp,%ebp
401643: 83 ec 08 sub $0x8,%esp
401646: e8 e5 ff ff ff call 401630 <_test1>
40164b: c9 leave
40164c: c3 ret
```

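Then use Python to post-process the disassembly and build a dictionary mapping each function to the functions it calls:

```python
import re
import collections

# Maps each function name to the set of functions it calls
calldict = collections.defaultdict(set)

# Function header lines look like: "00401630 <_test1>:"
funcre = re.compile(r"^[0-9a-f]+\s+<([^>]+)>:")
# Call lines look like: "401636: e8 05 00 00 00 call 401640 <_test2>"
# ("callq" is what older binutils print for x86-64 code)
callre = re.compile(r"\scallq?\s+[0-9a-f]+\s+<([^>+]+)")

current_function = ""

with open("myprog.asm") as f:
    for l in f:
        m = funcre.match(l)
        if m:
            # Entering a new function: remember its name
            current_function = m.group(1)
        else:
            m = callre.search(l)
            if m:
                called = m.group(1)
                calldict[current_function].add(called)
```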

I didn't write the full graph search, but you can detect "ping-pong" recursion (and direct self-recursion) with simple code like:

```python
for function, called_set in calldict.items():
    for called in called_set:
        # .get() avoids inserting new keys into the defaultdict while iterating
        callset = calldict.get(called)
        if callset and function in callset:
            print(function, called)
```

which gives me:

```
_test2 _test1
_test1 _test2
```
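The answer stops at the pairwise check. A full graph search that lists every function taking part in direct or indirect recursion could look like the sketch below, assuming `calldict` has the shape built above (function name mapped to a set of callee names). `recursive_functions` is a hypothetical name; the per-function breadth-first search is simple rather than fast, since it traverses the graph once per function.

```python
from collections import deque


def recursive_functions(calls):
    """Return the set of functions that can call themselves, directly or indirectly."""
    recursive = set()
    for start in calls:
        seen = set()
        queue = deque(calls.get(start, ()))
        while queue:
            fn = queue.popleft()
            if fn == start:
                # We got back to where we started: start is recursive
                recursive.add(start)
                break
            if fn in seen:
                continue
            seen.add(fn)
            queue.extend(calls.get(fn, ()))
    return recursive
```

For the test1/test2 example it should report `_test1` and `_test2`, while a function like `_main` that only calls into the cycle is not reported.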

This symbol/asm analysis technique is also used by callcatcher to detect unused C functions (which can be done very easily here as well, by checking for keys that aren't in any of the sets, with a bit of filtering of compiler symbols).
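A sketch of that unused-function check on the same `calldict` shape. The filter below is an assumption: `COMPILER_SYMBOLS`, the `__` prefix rule and the `@` rule are example heuristics for toolchain-generated symbols commonly seen in GCC/Linux `objdump` output (and PLT stubs such as `printf@plt`), not a complete list, and the entry-point names are likewise assumed. `unused_functions` is a hypothetical name.

```python
# Hypothetical filter: toolchain-generated symbols often seen in GCC/Linux
# objdump output. Extend or replace this for your own toolchain.
COMPILER_SYMBOLS = {"_start", "_init", "_fini", "frame_dummy",
                    "register_tm_clones", "deregister_tm_clones"}


def unused_functions(calls, entry_points=("main", "_main")):
    """Functions that appear as keys but are never called by any other function."""
    called = set().union(*calls.values())
    return {
        fn for fn in calls
        if fn not in called
        and fn not in entry_points
        and fn not in COMPILER_SYMBOLS
        and not fn.startswith("__")  # e.g. __do_global_dtors_aux
        and "@" not in fn            # PLT stubs such as printf@plt
    }
```

Because the question's code base has no function pointers, "never called by name" is a reasonable proxy for "unused" here; with function pointers it would report false positives.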
