Extending Clang: Custom Pragma Handling

Latest recommended article published 2024-01-02 10:51:48

cwg2552298



Tags: OLLVM, Clang, extending Clang, pragma, custom pragma

Article link: https://blog.csdn.net/cwg2552298/article/details/95654719


Preface:

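Obfuscating code with LLVM costs performance, so you may want to obfuscate only specific code. For example, you might want to obfuscate just one core section inside a function. To do that, you need some way to mark the section so your obfuscator can tell which code to obfuscate and leave the unmarked code alone. This article shows how to extend Clang with custom pragmas that mark code regions, so that LLVM passes can obfuscate or transform only those regions.

This article does not teach code obfuscation itself. To learn more about it, see the book 《软件加密与解密》 (Software Encryption and Decryption) and OLLVM, an open-source obfuscation framework built on LLVM. The idea here comes from the way OLLVM obfuscates selected functions. Analyses of how OLLVM does this can be found online, or you can read the OLLVM source directly: the readAnnotate function in /obfuscation/Utils.cpp.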

Background:

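First, a quick introduction to the LLVM compiler framework. It consists mainly of three parts:

- the Clang frontend,
- the IR processing layer (LLVM passes),
- the backends (target-specific code generators).

A typical compilation goes from source code through Clang to LLVM IR, then through the LLVM passes, and finally to a backend that emits target code. This is my own understanding; corrections are welcome.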

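The Clang frontend lexes and parses the source and performs semantic analysis to build an AST. It then lowers the AST to LLVM IR and hands that IR to LLVM's pass pipeline.

The obfuscator works at the pass level, so all the code it "sees" is LLVM IR (intermediate representation). To recognize specific code, we therefore have to add "markers" to the source before it is turned into LLVM IR. They are called markers here, but in practice they are annotations. The markers let us tell which code regions (basic blocks) need obfuscation and which do not. OLLVM's per-function obfuscation uses the same idea.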

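Adding an annotation to the code attaches a special node (an annotate attribute) to the AST. Clang's code generator carries this node into the IR:

- for functions and global variables, as entries in the llvm.global.annotations array;
- for local variables, as calls to the llvm.var.annotation intrinsic.

Later LLVM passes can then apply custom processing based on this information.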
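To make the idea concrete, here is a toy model of what the pass side might do once the markers reach the IR. It does not use the LLVM API. The BasicBlock struct and the blocksToObfuscate function are hypothetical stand-ins. Each block simply lists the annotation strings found in it; in real IR these would come from llvm.var.annotation calls. The function selects every block from the one containing begin_obf through the one containing end_obf.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR basic block (not llvm::BasicBlock): a name
// plus the annotation strings found in the block, e.g. the string arguments
// of llvm.var.annotation calls.
struct BasicBlock {
  std::string Name;
  std::vector<std::string> Annotations;
};

// Walks the blocks in layout order and returns the names of the blocks from
// the one annotated "begin_obf" through the one annotated "end_obf"
// (both marker blocks included).
std::vector<std::string> blocksToObfuscate(const std::vector<BasicBlock> &F) {
  std::vector<std::string> Selected;
  bool Inside = false;
  for (const BasicBlock &BB : F) {
    for (const std::string &A : BB.Annotations)
      if (A == "begin_obf")
        Inside = true;
    if (Inside)
      Selected.push_back(BB.Name);
    for (const std::string &A : BB.Annotations)
      if (A == "end_obf")
        Inside = false;
  }
  return Selected;
}
```

Walking blocks in layout order is a simplification. Before transforming anything, a real pass would also have to reason about control flow, for example branches that leave the marked region.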

[Figure: the "annotate" attribute node in the AST]

The approach for recognizing specific code regions:

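First, define two pragmas, one marking the start of a region and one marking its end: #pragma begin_obf and #pragma end_obf. These names must match the strings passed to PragmaHandler in step 3 below.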

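When the compiler encounters one of these custom pragmas, it inserts an annotation into the AST. The annotation's content is begin_obf or end_obf, which identifies where the region begins and where it ends.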
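For illustration, this is how the markers are meant to appear in source code, using the names begin_obf and end_obf that the handlers in step 3 register. The function checksum is a hypothetical example. A stock compiler ignores unknown pragmas, possibly with a -Wunknown-pragmas warning, so this code also builds without the patched Clang. The patched Clang turns each pragma into an annotated marker declaration.

```cpp
#include <cassert>

// Hypothetical example: only the loop between the two pragmas is meant to be
// obfuscated. Unmarked code before and after it is left alone.
int checksum(const unsigned char *data, int len) {
  assert(len >= 0);
  int sum = 0;
#pragma begin_obf
  for (int i = 0; i < len; ++i)
    sum = (sum * 31 + data[i]) & 0xFFFF;
#pragma end_obf
  return sum;
}
```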

With the approach in place, we now need to extend Clang and add handlers for these two custom pragmas.

Main steps:

1. Add definitions for the custom pragmas.

2. Add handlers for the custom pragmas.

Source files involved:

ollvm\tools\clang\lib\Sema\SemaStmt.cpp //semantic analysis of statements (Stmt)
ollvm\tools\clang\lib\Parse\ParseStmt.cpp //statement (Stmt) parsing
ollvm\tools\clang\lib\Parse\ParsePragma.cpp //pragma parsing
ollvm\tools\clang\include\clang\Parse\Parser.h //the Parser class (syntax analysis)
ollvm\tools\clang\include\clang\Basic\TokenKinds.def //token kind definitions
ollvm\tools\clang\include\clang\Sema\Sema.h //declarations of the semantic-analysis (Sema) functions

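Extending Clang:

1. In TokenKinds.def, add annotation token definitions for the custom pragmas. Each ANNOTATION(X) entry becomes a token kind named tok::annot_X: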

ANNOTATION(pragma_begin_obf)
ANNOTATION(pragma_end_obf)
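Why are these two lines enough to create tok::annot_pragma_begin_obf and tok::annot_pragma_end_obf? TokenKinds.def is an "X-macro" file. Clang's headers define TOK and ANNOTATION before including it, and by default ANNOTATION(X) expands to TOK(annot_ ## X). The same list therefore generates both the TokenKind enum and helpers such as tok::isAnnotation.

The single-file sketch below reproduces that pattern. MY_TOKEN_LIST and its four entries are a hypothetical stand-in for the real file, which contains many more token kinds.

```cpp
#include <cassert>

// Hypothetical stand-in for the list in TokenKinds.def.
#define MY_TOKEN_LIST          \
  TOK(unknown)                 \
  TOK(eod)                     \
  ANNOTATION(pragma_begin_obf) \
  ANNOTATION(pragma_end_obf)

namespace tok {

// Expansion 1: every entry becomes an enumerator. ANNOTATION(X) is defined
// as TOK(annot_##X), so it yields annot_pragma_begin_obf, etc.
enum TokenKind : unsigned short {
#define TOK(X) X,
#define ANNOTATION(X) TOK(annot_##X)
  MY_TOKEN_LIST
#undef ANNOTATION
#undef TOK
  NUM_TOKENS
};

// Expansion 2: only ANNOTATION entries produce case labels, so this returns
// true exactly for the annotation kinds.
inline bool isAnnotation(TokenKind K) {
  switch (K) {
#define TOK(X)
#define ANNOTATION(X) case annot_##X: return true;
    MY_TOKEN_LIST
#undef ANNOTATION
#undef TOK
  default:
    return false;
  }
}

}  // namespace tok
```

The second expansion also explains why, in step 3, setKind() has to come before setAnnotationValue(): whether a token counts as an annotation is decided purely by its kind.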

2. In the parser (Parser.h), add two members that own the new pragma handlers:

  std::unique_ptr<PragmaHandler> BeginObfHandler;
  std::unique_ptr<PragmaHandler> EndObfHandler;
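These members only own the handler objects. A handler takes effect once it is registered with the preprocessor, which then calls it whenever it reads #pragma followed by the handler's name. In Clang, PP.AddPragmaHandler performs the registration and PP.RemovePragmaHandler undoes it.

The plain C++ sketch below models this ownership and dispatch. ToyPragmaHandler, ToyPreprocessor, CountingHandler and ToyParser are hypothetical names, not Clang classes.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class standing in for clang::PragmaHandler.
struct ToyPragmaHandler {
  explicit ToyPragmaHandler(std::string N) : Name(std::move(N)) {}
  virtual ~ToyPragmaHandler() = default;
  virtual void handle() = 0;
  std::string Name;
};

// Hypothetical pragma table: maps pragma names to handlers it does not own.
struct ToyPreprocessor {
  std::map<std::string, ToyPragmaHandler *> Handlers;

  void addPragmaHandler(ToyPragmaHandler *H) {
    assert(H != nullptr);
    Handlers[H->Name] = H;
  }
  void removePragmaHandler(ToyPragmaHandler *H) { Handlers.erase(H->Name); }

  // Called for "#pragma Name ...". Returns false for an unknown pragma,
  // which a real compiler ignores (usually with a warning).
  bool handlePragma(const std::string &Name) {
    auto It = Handlers.find(Name);
    if (It == Handlers.end())
      return false;
    It->second->handle();
    return true;
  }
};

// Example handler that just counts how often it was invoked.
struct CountingHandler : ToyPragmaHandler {
  using ToyPragmaHandler::ToyPragmaHandler;
  void handle() override { ++Calls; }
  int Calls = 0;
};

// The parser owns its handlers through unique_ptr members and keeps them
// registered for its whole lifetime.
struct ToyParser {
  explicit ToyParser(ToyPreprocessor &P) : PP(P) {
    BeginObfHandler = std::make_unique<CountingHandler>("begin_obf");
    EndObfHandler = std::make_unique<CountingHandler>("end_obf");
    PP.addPragmaHandler(BeginObfHandler.get());
    PP.addPragmaHandler(EndObfHandler.get());
  }
  ~ToyParser() {
    PP.removePragmaHandler(BeginObfHandler.get());
    PP.removePragmaHandler(EndObfHandler.get());
  }
  ToyParser(const ToyParser &) = delete;
  ToyParser &operator=(const ToyParser &) = delete;

  ToyPreprocessor &PP;
  std::unique_ptr<CountingHandler> BeginObfHandler;
  std::unique_ptr<CountingHandler> EndObfHandler;
};
```

While a ToyParser is alive, handlePragma("begin_obf") reaches its handler. After the parser is destroyed, the same call returns false, just as for an unknown pragma.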

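3. In ParsePragma.cpp, add the handlers for the two new pragmas. Each handler consumes the rest of the pragma line from the token stream and replaces it with a token of the annotation kind defined earlier in TokenKinds.def.

As with Clang's built-in handlers, these handlers must also be created and registered in Parser::initializePragmaHandlers() with PP.AddPragmaHandler, and unregistered in Parser::resetPragmaHandlers() with PP.RemovePragmaHandler. That part is not shown here.

The code is as follows: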

struct PragmaBeginObfHandler : public PragmaHandler{
    PragmaBeginObfHandler() : PragmaHandler("begin_obf"){}
    void HandlePragma(Preprocessor &PP, PragmaIntroducerKind Introducer,
                    Token &FirstToken) override; 
};

struct PragmaEndObfHandler : public PragmaHandler{
    PragmaEndObfHandler() : PragmaHandler("end_obf"){}
    void HandlePragma(Preprocessor &PP, PragmaIntroducerKind Introducer,
                    Token &FirstToken) override; 
};

static void MyPragmaHandler(Preprocessor &PP, Token &FirstToken, bool IsBegin)
{
    // Eat the rest of the pragma line, up to and including the end-of-directive
    // token. Lex before testing, so that we never read an uninitialized Token.
    Token tok;
    do {
        PP.Lex(tok);
    } while (tok.isNot(tok::eod));

    // Build our annotation token and enter it into the token stream.
    // Fixed 2019-9-1: setKind() must be called before setAnnotationValue().
    // startToken() resets the kind to tok::unknown, so calling
    // setAnnotationValue() first trips its check
    // assert(isAnnotation() && "Used AnnotVal on non-annotation token").
    tok.startToken();
    if (IsBegin)
    {
      tok.setKind(tok::annot_pragma_begin_obf);        // token kind defined in TokenKinds.def
      tok.setAnnotationValue(strdup("begin_obf"));
    }
    else
    {
      tok.setKind(tok::annot_pragma_end_obf);
      tok.setAnnotationValue(strdup("end_obf"));
    }
    tok.setLocation(FirstToken.getLocation());
    tok.setAnnotationEndLoc(FirstToken.getLocation());
    PP.EnterToken(tok);   // newer Clang releases also take an IsReinject argument
}
void PragmaBeginObfHandler::HandlePragma(Preprocessor &PP,
                                             PragmaIntroducerKind Introducer,
                                             Token &FirstToken){
    MyPragmaHandler(PP,FirstToken,true);
}

void PragmaEndObfHandler::HandlePragma(Preprocessor &PP,
                                             PragmaIntroducerKind Introducer,
                                             Token &FirstToken){
    MyPragmaHandler(PP,FirstToken,false);
}
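The effect of MyPragmaHandler on the token stream can be modelled without Clang. The model drops everything up to and including the end-of-directive token. It then pushes one annotation token to the front of the stream, so the parser sees that token next, which is what PP.EnterToken achieves. ToyToken, Kind and handleObfPragma are hypothetical stand-ins for clang::Token, tok::TokenKind and the real handler. The stream starts right after the pragma name, as it does when HandlePragma is called.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical stand-ins for tok::TokenKind and clang::Token.
enum class Kind { Identifier, Other, Eod, AnnotBeginObf, AnnotEndObf };

struct ToyToken {
  Kind K;
  std::string Text;
};

// Models MyPragmaHandler. Stream holds the tokens that follow the pragma
// name (e.g. after "#pragma begin_obf").
void handleObfPragma(std::deque<ToyToken> &Stream, bool IsBegin) {
  // Discard the rest of the directive, including the end-of-directive token.
  while (!Stream.empty()) {
    bool WasEod = Stream.front().K == Kind::Eod;
    Stream.pop_front();
    if (WasEod)
      break;
  }
  // Put a single annotation token where the parser will read it next.
  ToyToken Annot{IsBegin ? Kind::AnnotBeginObf : Kind::AnnotEndObf,
                 IsBegin ? "begin_obf" : "end_obf"};
  Stream.push_front(Annot);
}
```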

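4. The pragma handlers above insert our annotation tokens into the token stream. Next, extend the statement parser (ParseStmt.cpp). The new member function HandlePragmaMyAnnotate must also be declared in the Parser class in Parser.h: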

StmtResult
Parser::HandlePragmaMyAnnotate(bool IsBegin)
{
    tok::TokenKind AnnotKind =
        IsBegin ? tok::annot_pragma_begin_obf : tok::annot_pragma_end_obf;
    assert(Tok.is(AnnotKind) && "expected an obf pragma annotation token");
    auto Where = Tok.getLocation();

    // Synthesize the tokens "int beginObf;" (or "int endObf;"), enter them
    // into the token stream, and let Clang parse them as a normal declaration.
    SmallVector<Token, 32> TokenList;

    Token tokINT;
    tokINT.startToken();
    tokINT.setKind(tok::kw_int);

    Token tokDeclarator;
    tokDeclarator.startToken();
    tokDeclarator.setKind(tok::identifier);
    tokDeclarator.setIdentifierInfo(
        PP.getIdentifierInfo(IsBegin ? "beginObf" : "endObf"));

    Token semi;
    semi.startToken();
    semi.setKind(tok::semi);

    TokenList.push_back(tokINT);
    TokenList.push_back(tokDeclarator);
    TokenList.push_back(semi);

    // Lazily give every synthesized token the location of the pragma.
    for (Token &tok : TokenList)
      tok.setLocation(Where);

    // 1. Enter the tokens right after the pragma directive; the preprocessor
    //    takes ownership of the heap array (OwnsTokens = true).
    Token *tokenArray = new Token[TokenList.size()];
    std::copy(TokenList.begin(), TokenList.end(), tokenArray);
    PP.EnterTokenStream(tokenArray, TokenList.size(), false, true);

    // Eat the annotation token itself (newer Clang: ConsumeAnnotationToken()).
    while (Tok.is(AnnotKind))
      ConsumeToken();

    // 2. Build a GNU attribute named "annotate" whose argument is the string
    //    literal "begin_obf" (or "end_obf").
    IdentifierInfo *idfAnnotate = PP.getIdentifierInfo("annotate");
    StringRef ann = IsBegin ? "\"begin_obf\"" : "\"end_obf\"";
    Token idfAnnName;
    idfAnnName.startToken();
    idfAnnName.setKind(tok::string_literal);
    idfAnnName.setLocation(Where);
    idfAnnName.setLiteralData(ann.data());
    idfAnnName.setLength(ann.size());

    ParsedAttributesWithRange TempAttrs(AttrFactory);
    SourceRange sr(Where, Where);
    ArgsVector ArgExprs;
    SmallVector<Token, 4> StringToks;
    StringToks.push_back(idfAnnName);

    // Create the string argument of the attribute.
    ExprResult ArgExpr(Actions.ActOnStringLiteral(StringToks, getCurScope()));
    if (ArgExpr.isInvalid())
      return StmtError();
    ArgExprs.push_back(ArgExpr.get());
    TempAttrs.addNew(idfAnnotate, sr, (IdentifierInfo *)nullptr, Where,
                     ArgExprs.data(), ArgExprs.size(), AttributeList::AS_GNU);

    // 3. Let Clang parse the synthesized declaration with the annotation attached.
    SourceLocation DeclStart = Tok.getLocation(), DeclEnd;
    DeclGroupPtrTy Decl = ParseDeclaration(Declarator::BlockContext, DeclEnd, TempAttrs);
    return Actions.ActOnDeclStmt(Decl, DeclStart, DeclEnd);
}
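After HandlePragmaMyAnnotate runs, the parser sees the same thing as a hand-written declaration: #pragma begin_obf behaves like int beginObf __attribute__((annotate("begin_obf")));, and likewise for end_obf. The hand-written version below illustrates this; sumCore is a hypothetical example function. Clang accepts the annotate attribute and lowers it, for local variables, to a call to the llvm.var.annotation intrinsic. GCC ignores the attribute with a warning.

```cpp
#include <cassert>

// Hand-written equivalent of what the patched parser synthesizes for a
// region marked with #pragma begin_obf ... #pragma end_obf.
int sumCore(const int *v, int n) {
  assert(n >= 0);
  int beginObf __attribute__((annotate("begin_obf")));  // from #pragma begin_obf
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += v[i] * v[i];
  int endObf __attribute__((annotate("end_obf")));      // from #pragma end_obf
  (void)beginObf;  // the markers are never read; silence unused-variable warnings
  (void)endObf;
  return total;
}
```

One consequence of this design is that each marker declares a variable with a fixed name. Two begin_obf regions in the same block scope would therefore redeclare beginObf and fail to compile. Putting each region in its own scope avoids that.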


// Only the key changes are shown below.
StmtResult
Parser::ParseStatementOrDeclarationAfterAttributes(StmtVector &Stmts,
          AllowedContsructsKind Allowed, SourceLocation *TrailingElseLoc,
          ParsedAttributesWithRange &Attrs) {
  const char *SemiError = nullptr;
  StmtResult Res;

  // Cases in this switch statement should fall through if the parser expects
  // the token to end in a semicolon (in which case SemiError should be set),
  // or they directly 'return;' if not.
Retry:
  tok::TokenKind Kind  = Tok.getKind();
  SourceLocation AtLoc;
  switch (Kind) {
  //------------- modification -----------//
  case tok::annot_pragma_begin_obf:
  {
      ProhibitAttributes(Attrs);
      return HandlePragmaMyAnnotate(true);
  }
  case tok::annot_pragma_end_obf:
  {
      ProhibitAttributes(Attrs);
      return HandlePragmaMyAnnotate(false);
  }
  // ... handling of the other token kinds (unchanged) ...

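5. The last step is semantic analysis (SemaStmt.cpp). Note that HandlePragmaMyAnnotate above returns through the existing Actions.ActOnDeclStmt, so it does not call the two functions below. They are simplified versions of ActOnDeclStmt. They only take effect if you call them instead, and in that case they must also be declared in Sema.h: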

StmtResult Sema::ActOnBeginObfStmt(ArrayRef<Decl *> Group,
                                   SourceLocation StartLoc,
                                   SourceLocation EndLoc) {
  // Wrap the synthesized "int beginObf;" declaration, which already carries
  // the annotate attribute, in a DeclStmt, so that the AST contains an
  // annotated int variable at this point.
  if (Group.empty() || !Group[0])
    return StmtError();
  DeclGroupRef dgr(Group[0]);
  return new (Context) DeclStmt(dgr, StartLoc, EndLoc);
}

StmtResult Sema::ActOnEndObfStmt(ArrayRef<Decl *> Group,
                                 SourceLocation StartLoc,
                                 SourceLocation EndLoc) {
  // Same as above, for the "int endObf;" declaration.
  if (Group.empty() || !Group[0])
    return StmtError();
  DeclGroupRef dgr(Group[0]);
  return new (Context) DeclStmt(dgr, StartLoc, EndLoc);
}

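The code is commented, so I won't explain it further. It touches many concepts and is fairly involved. If anything is unclear, consult the official Doxygen documentation.

OK, thanks for reading!

Appendix:

An English-language reference article:

https://blog.quarkslab.com/implementing-a-custom-directive-handler-in-clang.html

The commit containing the code above (use git diff to see which code was changed):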

https://github.com/cwg2205195/ExtClangPragma
