First steps with tree-sitter

Installation

I used Anaconda to set up the virtual environments and ran everything from Visual Studio 2022.

I first configured a Python 3.6 environment. Installing with the command pip3 install tree_sitter failed with an error:

I then configured Python 3.8 (screenshot) and 3.9 environments, and the installation succeeded in both:

I chose to use tree-sitter in the 3.9 environment.

I still don't know why it would not install on 3.6... but in the end it got installed!
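
When juggling several conda environments like this, it helps to confirm which interpreter is actually running and whether the package can be found from it. The following is only a minimal sketch using the standard library; the helper name describe_environment is made up for this illustration and is not part of tree-sitter.

```python
import importlib.util
import sys


def describe_environment(module_name="tree_sitter"):
    """Report the running Python version and where `module_name` would load from.

    Hypothetical helper for illustration; it only looks the module up,
    it does not import it.
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "installed": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    print(describe_environment())
```

Running this inside each environment shows at a glance whether tree_sitter is visible to that interpreter.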

Usage

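I followed the tutorial on the official tree-sitter site.

First I cloned the official git repository locally, then ran the example from inside that checkout.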

from tree_sitter import Language, Parser

Language.build_library(
  # Store the library in the `build` directory
  'build/my-languages.so',

  # Include one or more languages
  [
    'vendor/tree-sitter-go',
    'vendor/tree-sitter-javascript',
    'vendor/tree-sitter-python'
  ]
)

GO_LANGUAGE = Language('build/my-languages.so', 'go')
JS_LANGUAGE = Language('build/my-languages.so', 'javascript')
PY_LANGUAGE = Language('build/my-languages.so', 'python')

Running this example produced an error:

Traceback (most recent call last):
  File "/Users/symbolk/coding/analysis/treesitter/py-tree-sitter/builder.py", line 1, in <module>
    from tree_sitter import Language, Parser
  File "/Users/symbolk/coding/analysis/treesitter/py-tree-sitter/tree_sitter/__init__.py", line 9, in <module>
    from tree_sitter.binding import _language_field_id_for_name, _language_query
ModuleNotFoundError: No module named 'tree_sitter.binding'

After searching online, I found that the example cannot be run from inside the py-tree-sitter source directory:

import tree_sitter will try to import the git clone and not the version installed from PyPI (because . is the first entry in sys.path). But this does not work because tree_sitter.binding is a native module that has to be compiled first.

After I created a fresh project outside the checkout, the error really did go away!
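
The shadowing described in the quote can be checked before running anything. Below is a minimal sketch, assuming the usual rule that the directory of the script being run (or the current directory in an interactive session) comes first on sys.path. The helper local_shadow and the throwaway directory layout are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def local_shadow(module_name, directory):
    """Return the path inside `directory` that would be imported as
    `module_name` if `directory` came first on sys.path, or None."""
    package_init = os.path.join(directory, module_name, "__init__.py")
    if os.path.isfile(package_init):
        return package_init
    module_file = os.path.join(directory, module_name + ".py")
    if os.path.isfile(module_file):
        return module_file
    return None


# Demo: a temporary directory that mimics a source checkout containing
# an uncompiled tree_sitter/ package (hypothetical layout).
with tempfile.TemporaryDirectory() as checkout:
    os.makedirs(os.path.join(checkout, "tree_sitter"))
    with open(os.path.join(checkout, "tree_sitter", "__init__.py"), "w") as f:
        f.write("")
    found_in_checkout = local_shadow("tree_sitter", checkout) is not None
    found_elsewhere = local_shadow("tree_sitter", os.path.join(checkout, "tree_sitter"))

print(found_in_checkout)  # True: the local package would win over the installed one
print(found_elsewhere)    # None: nothing in this directory shadows the module
```

Calling local_shadow("tree_sitter", os.getcwd()) from the project directory gives a quick hint about whether the import will pick up a local copy instead of the version installed from PyPI.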

Tokenizing with the syntax tree in GraphCodeBERT

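While looking for tutorials online I came across the code below. I had read the GraphCodeBERT paper before (but had forgotten it...), so I tried running this code.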

from tree_sitter import Language, Parser


def tree_to_token_index(root_node):
    if (len(root_node.children) == 0 or root_node.type.find('string') != -1) and root_node.type != 'comment':
        return [(root_node.start_point, root_node.end_point)]
    else:
        code_tokens = []
        for child in root_node.children:
            code_tokens += tree_to_token_index(child)
        return code_tokens


def index_to_code_token(index, code):
    start_point = index[0]
    end_point = index[1]
    if start_point[0] == end_point[0]:
        s = code[start_point[0]][start_point[1]:end_point[1]]
    else:
        s = ""
        s += code[start_point[0]][start_point[1]:]
        for i in range(start_point[0]+1, end_point[0]):
            s += code[i]
        s += code[end_point[0]][:end_point[1]]
    return s


if __name__ == '__main__':
    # Compile the grammars cloned under vendor/ into one shared library.
    Language.build_library(
        # Store the library in the `build1` directory
        'build1/my-languages.so',

        # Include one or more languages
        [
            'vendor/tree-sitter-go',
            'vendor/tree-sitter-javascript',
            'vendor/tree-sitter-cpp',
            'vendor/tree-sitter-python',
        ]
    )
    GO_LANGUAGE = Language('build1/my-languages.so', 'go')
    JS_LANGUAGE = Language('build1/my-languages.so', 'javascript')
    PY_LANGUAGE = Language('build1/my-languages.so', 'python')
    CPP_LANGUAGE = Language('build1/my-languages.so', 'cpp')

    cpp_parser = Parser()
    cpp_parser.set_language(CPP_LANGUAGE)

    cpp_code_snippet = '''
    int mian{
        piantf("hell world");
        remake O;
    }
    '''

    # Parse the snippet and take the root of the syntax tree.
    tree = cpp_parser.parse(bytes(cpp_code_snippet, "utf8"))
    root_node = tree.root_node

    # Collect the (start_point, end_point) span of every leaf token,
    # then slice the source lines to recover the token text.
    tokens_index = tree_to_token_index(root_node)
    cpp_loc = cpp_code_snippet.split('\n')

    code_tokens = [index_to_code_token(x, cpp_loc) for x in tokens_index]

    print(code_tokens)

Running it produces the result:
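
Separately from that run, the span-to-text step can be tried without a compiled grammar. The sketch below repeats the logic of index_to_code_token from the listing above and feeds it hand-written (row, column) spans. The source string and the spans are made up for illustration and were not produced by a parser.

```python
def index_to_code_token(index, code):
    """Same logic as in the listing above: slice the (row, column) span
    out of a list of source lines."""
    (start_row, start_col), (end_row, end_col) = index
    if start_row == end_row:
        return code[start_row][start_col:end_col]
    s = code[start_row][start_col:]
    for i in range(start_row + 1, end_row):
        s += code[i]
    s += code[end_row][:end_col]
    return s


# Hypothetical input: two lines of code and four hand-written spans.
source = 'int x = 1;\nreturn x;'
lines = source.split('\n')
spans = [
    ((0, 0), (0, 3)),  # "int"
    ((0, 4), (0, 5)),  # "x"
    ((1, 0), (1, 6)),  # "return"
    ((0, 8), (1, 6)),  # a span crossing a line break
]
tokens = [index_to_code_token(span, lines) for span in spans]
print(tokens)  # ['int', 'x', 'return', '1;return']
```

The last token shows that spans crossing a line break are joined without the newline, because the lines were produced by split('\n'). Also note that tree-sitter reports columns as byte offsets, so slicing str lines this way is only exact for ASCII source.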
