DeepSmith Source Code Study

Latest recommended article published on 2021-07-29 10:29:34

lln_lln


Article link: https://blog.csdn.net/lln_lln/article/details/82931476


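1. Obtain the dataset (\phd-master\datasets\github\scrape_repos)
Execution order: scraper.py -> cloner.py -> importer.py -> export_corpus.py
2. Generate the model (\phd-master\deeplearning\clgen\clgen.py)
a. Preprocess and encode the dataset.
b. Define and train a machine learning model on the corpus.
c. Sample the trained model to generate new programs.
3. Test from the CLI (\phd-master\deeplearning\deepsmith\cli)
Execution order: generate_testcases.py -> import.py (imports the generated testcases into the DataStore) -> run_testcases.py -> explore.py (prints the results)
4. Demo (\phd-master\docs\2018_07_issta\artifact_evaluation)
Execution order: 01_evaluate_generator.py (trains a neural network model to generate programs) -> 02_evaluate_harness.py (runs the DeepSmith test cases on an OpenCL testbed and records the results) -> 03_evaluate_results.py (differentially tests and evaluates the DeepSmith test cases across different devices)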
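The last demo step relies on differential testing: the same test case runs on several devices, and a device whose result disagrees with the others is suspicious. The following is only a minimal sketch of that idea using majority voting. The function names and the device names are hypothetical, and this is not the code in 03_evaluate_results.py.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_output(outputs: Dict[str, str]) -> Optional[str]:
    """Return the output produced by a strict majority of testbeds, if any."""
    if not outputs:
        return None
    value, count = Counter(outputs.values()).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def find_anomalies(outputs: Dict[str, str]) -> List[str]:
    """Return the testbeds whose output differs from the majority output."""
    expected = majority_output(outputs)
    if expected is None:
        return []  # No consensus, so no testbed can be singled out.
    return sorted(name for name, out in outputs.items() if out != expected)


# Hypothetical results of one test case on three devices.
results = {
    "device_a": "42\n",
    "device_b": "42\n",
    "device_c": "41\n",
}
print(find_anomalies(results))  # ['device_c']
```

When no strict majority exists, the sketch reports nothing rather than guessing which device is wrong.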

1. Datasets
A. Benchmarks (benchmarks.proto):
\phd-master\datasets\benchmarks contains:
LLVM test suite (benchmark): run clang.py
(@pytest.mark.parametrize('benchmark', llvm_test_suite.BENCHMARKS)) parameterizes the test over every benchmark
bzip2 (compression and decompression): run clang_format.py
\phd-master\datasets\benchmarks\proto\benchmarks.proto: the proto definitions for the benchmarks

B. GitHub (scrape_repos.proto):
README.md (Clone Popular GitHub Repos by Language)
\phd-master\datasets\github\scrape_repos contains:
Create a "clone list" file (scrape_repos.proto) -> Scrape GitHub to create GitHubRepositoryMeta messages of repos -> Run the cloner to download the repos scraped -> Extract individual source files from the cloned repos and import them into a contentfiles database -> Export the source files from the corpus database to a directory
That is, the execution order is: scraper.py -> cloner.py -> importer.py -> export_corpus.py
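To make the last two stages of that order more concrete, here is a rough sketch of "import source files into a contentfiles database" followed by "export them to a directory". It assumes an SQLite table with a made-up schema and uses a temporary directory in place of a cloned repo. The function names, the schema and the output file naming are all assumptions for illustration, not what importer.py and export_corpus.py actually do.

```python
import pathlib
import sqlite3
import tempfile
from typing import Iterable


def import_contentfiles(repo_dir: pathlib.Path, db: sqlite3.Connection,
                        suffixes: Iterable[str]) -> int:
    """Copy every source file with a matching suffix into the database."""
    db.execute("CREATE TABLE IF NOT EXISTS contentfiles "
               "(relpath TEXT PRIMARY KEY, text TEXT NOT NULL)")
    wanted = tuple(suffixes)
    count = 0
    for path in sorted(repo_dir.rglob("*")):
        if path.is_file() and path.name.endswith(wanted):
            relpath = path.relative_to(repo_dir).as_posix()
            db.execute("INSERT OR REPLACE INTO contentfiles VALUES (?, ?)",
                       (relpath, path.read_text(errors="replace")))
            count += 1
    db.commit()
    return count


def export_corpus(db: sqlite3.Connection, out_dir: pathlib.Path) -> int:
    """Write every stored file to out_dir, one file per database row."""
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = db.execute("SELECT relpath, text FROM contentfiles").fetchall()
    for i, (relpath, text) in enumerate(rows):
        suffix = pathlib.Path(relpath).suffix
        (out_dir / f"{i:06d}{suffix}").write_text(text)
    return len(rows)


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    repo = root / "repo"  # Stand-in for a repo downloaded by the cloner.
    (repo / "src").mkdir(parents=True)
    (repo / "src" / "kernel.cl").write_text("kernel void A() {}\n")
    (repo / "README.md").write_text("not source code\n")

    connection = sqlite3.connect(":memory:")
    imported = import_contentfiles(repo, connection, [".cl"])
    exported = export_corpus(connection, root / "corpus")
    exported_names = sorted(p.name for p in (root / "corpus").iterdir())
    print(imported, exported, exported_names)
    connection.close()
```

Keeping the database between the two steps means the export can be repeated without walking the cloned repos again.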

\phd-master\datasets\github\scrape_repos\proto\scrape_repos.proto: Protocol buffers for organizing the bulk cloning of GitHub repos. It describes GitHubCredentials, LanguageCloneList, LanguageToClone, GitHubRepositoryQuery, ContentFilesImporterConfig, GitHubRepoMetadata, ImportWorker, ContentFile and MethodsList.

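\phd-master\datasets\github\scrape_repos\preprocessors contains:
extractors.py: extract from source code
inliners.py: inline includes
public.py: defines the decorator used to mark dataset preprocessors.
preprocessors.py: preprocesses the files in a dataset.
conftest.py: shared fixture functions. If, while writing tests, you need a fixture function in several test files, you can move it into conftest.py. You do not need to import the fixture you want to use in a test; pytest discovers it automatically. Fixture discovery starts with the test class, then the test module, then conftest.py files, and finally builtin and third-party plugins.
JavaMethodsExtractor.java: extracts the methods from a Java source file and returns them as a MethodsList proto.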
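The "decorator that marks dataset preprocessors" in public.py can be pictured roughly as follows. This is a sketch under assumptions: the decorator name, the attribute it sets and the one-argument preprocessor signature are chosen here for illustration and may differ from the real module.

```python
from typing import Callable, List

PreprocessorFn = Callable[[str], List[str]]


def dataset_preprocessor(func: PreprocessorFn) -> PreprocessorFn:
    """Mark func as a dataset preprocessor by tagging it with an attribute."""
    func.__dict__["is_dataset_preprocessor"] = True
    return func


def run_preprocessor(func: PreprocessorFn, text: str) -> List[str]:
    """Run func on text, refusing functions that were never marked."""
    if not getattr(func, "is_dataset_preprocessor", False):
        raise ValueError(f"{func.__name__} is not a dataset preprocessor")
    return func(text)


@dataset_preprocessor
def split_on_blank_lines(text: str) -> List[str]:
    """Toy preprocessor: one input text becomes several output texts."""
    return [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]


def unmarked(text: str) -> List[str]:
    """A function without the decorator, which run_preprocessor rejects."""
    return [text]


print(run_preprocessor(split_on_blank_lines, "int a;\n\nint b;\n"))
# ['int a;', 'int b;']
```

Marking functions this way lets a config file name preprocessors as strings while the runner still rejects arbitrary functions.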

2.deeplearning
Pull a docker image containing a pre-trained neural network and OpenCL environment for fuzz testing Oclgrind（README.md）
A.CLgen
\phd-master\deeplearning\clgen contains the following files:
__init__.py:
clgen.py: a deep learning program generator.
The core operations of CLgen are:

Preprocess and encode a corpus of handwritten example programs.
Define and train a machine learning model on the corpus.
Sample the trained model to generate new programs.
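This program automates all three stages of the pipeline. The pipeline can be interrupted and resumed at any time, and results are cached as they are produced. Note that many steps in the pipeline are computationally intensive and highly parallelized. If CUDA support is configured, any available NVIDIA GPUs are used to improve performance where possible.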
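The three core operations (encode a corpus, train a model, sample the model) can be illustrated with a toy stand-in. The sketch below deliberately replaces CLgen's neural network with a character-level bigram table so that it runs in plain Python. All function names are hypothetical, and start_text and max_length stand in for the kind of settings a sampler would carry.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

END = "\0"  # Sentinel marking the end of a program.


def encode_corpus(programs: List[str]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Stage 1: build a character vocabulary and encode each program."""
    chars = sorted(set("".join(programs)) | {END})
    vocab = {c: i for i, c in enumerate(chars)}
    return vocab, [[vocab[c] for c in p + END] for p in programs]


def train(encoded: List[List[int]]) -> Dict[int, List[int]]:
    """Stage 2: record which token follows which (a bigram table)."""
    model: Dict[int, List[int]] = defaultdict(list)
    for program in encoded:
        for current, following in zip(program, program[1:]):
            model[current].append(following)
    return dict(model)


def sample(model: Dict[int, List[int]], vocab: Dict[str, int],
           start_text: str, max_length: int, seed: int) -> str:
    """Stage 3: extend start_text token by token until END or max_length."""
    rng = random.Random(seed)
    decode = {i: c for c, i in vocab.items()}
    out = list(start_text)  # start_text is assumed to be non-empty.
    while len(out) < max_length:
        choices = model.get(vocab.get(out[-1], -1))
        if not choices:
            break  # Unknown character or no observed successor.
        token = decode[rng.choice(choices)]
        if token == END:
            break
        out.append(token)
    return "".join(out)


corpus = ["kernel void A() {}", "kernel void B() {}"]
vocab, encoded = encode_corpus(corpus)
model = train(encoded)
program = sample(model, vocab, start_text="kernel", max_length=40, seed=0)
print(program)
```

A bigram table has none of the modelling power of the real network; it only shows how the three stages hand data to one another.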
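The execution flow is as follows:
Get config (a path pointing to a CLgen instance), then call RunWithErrorHandling(DoFlagsAction), which runs the given function DoFlagsAction as the program entry point. DoFlagsAction performs the action requested on the command line (optionally printing the corpus, model, sampler, clgen_path, preprocessed output and so on); the default action is create corpus -> train -> export_model -> sample.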
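The entry-point pattern described here, an error-handling wrapper around a function that dispatches on command-line flags, might look roughly like this. It is a sketch only: it uses argparse, and the flag names, the config file name clgen.pbtxt and the step labels are illustrative assumptions rather than CLgen's real flags.

```python
import argparse
import sys
from typing import Callable, List


def run_with_error_handling(fn: Callable[..., None], *args) -> int:
    """Call fn and turn any exception into a short message and exit code 1."""
    try:
        fn(*args)
        return 0
    except Exception as e:
        print(f"error: {e}", file=sys.stderr)
        return 1


def do_flags_action(argv: List[str]) -> List[str]:
    """Return the steps selected by the flags (hypothetical flag names)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    parser.add_argument("--print_cache_path",
                        choices=["corpus", "model", "sampler"])
    args = parser.parse_args(argv)
    if args.print_cache_path:
        # A "print" request replaces the default pipeline.
        return [f"print {args.print_cache_path} cache path"]
    # Default action, in the order given in the text above.
    return ["create corpus", "train", "export_model", "sample"]


def main(argv: List[str]) -> None:
    for step in do_flags_action(argv):
        print(step)


exit_code = run_with_error_handling(main, ["--config", "clgen.pbtxt"])
print("exit code:", exit_code)
```

Wrapping the entry point keeps stack traces away from ordinary users while still returning a non-zero exit code on failure.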
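cache.py: contains the logic for managing CLgen's filesystem caches.
ls_models.py: lists the cached models.
sample.py: a lightweight version of the CLgen module, for reference only.
samplers.py: the CLgen language model samplers. A sampler is an object that, when passed to a model's Sample() method, determines the shape of the generated samples.
telemetry.py: defines the collection of telemetry data. Telemetry is gathered after every training epoch and consists of a timestamp, the model's loss, and the time spent training the epoch.
Reference: https://keras.io/callbacks/#lambdacallback
errors.py: the custom exception hierarchy used by CLgen.
conftest.py: Pytest fixtures.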
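As an illustration of the per-epoch telemetry described for telemetry.py, the sketch below records a timestamp, the loss and the epoch duration. The class and field names are assumptions. The two hook methods take (epoch, logs), the signature Keras's LambdaCallback uses for on_epoch_begin and on_epoch_end, but Keras itself is not imported and a fake clock drives the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EpochTelemetry:
    epoch_num: int
    timestamp: float      # Clock reading when the epoch finished.
    loss: float
    epoch_seconds: float  # Time spent training this epoch.


class TelemetryLogger:
    """Collects one EpochTelemetry record per training epoch."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._epoch_start = 0.0
        self.records: List[EpochTelemetry] = []

    def on_epoch_begin(self, epoch: int, logs: dict) -> None:
        self._epoch_start = self._clock()

    def on_epoch_end(self, epoch: int, logs: dict) -> None:
        now = self._clock()
        self.records.append(EpochTelemetry(
            epoch_num=epoch + 1,
            timestamp=now,
            loss=logs["loss"],
            epoch_seconds=now - self._epoch_start,
        ))


# Simulate two epochs with a fake clock instead of real training.
fake_times = iter([10.0, 12.5, 12.5, 14.0])
logger = TelemetryLogger(clock=lambda: next(fake_times))
for epoch, loss in enumerate([0.9, 0.7]):
    logger.on_epoch_begin(epoch, {})
    logger.on_epoch_end(epoch, {"loss": loss})

for record in logger.records:
    print(record)
```

Injecting the clock keeps the example deterministic; real training would use the default time.time.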

\phd-master\deeplearning\clgen\data

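\phd-master\deeplearning\clgen\proto contains:
clgen.proto: describes CLgen instances.
corpus.proto: describes the corpus used to train a particular model. Includes Corpus and GreedyMulticharAtomizer (a list of multi-character tokens).
internal.proto: defines messages internal to the CLgen components, including CorpusMeta, ModelMeta and SamplerMeta (META.pbtxt).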
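A greedy multi-character atomizer takes a list of multi-character tokens and splits text by always preferring the longest listed token at the current position. The sketch below shows that idea in isolation. The function name and the token list are hypothetical, and the real CLgen atomizer may differ in detail.

```python
from typing import List, Set


def greedy_tokenize(text: str, multichar_tokens: Set[str]) -> List[str]:
    """Split text, taking the longest listed token at each position.

    Falls back to a single character when no multi-character token matches.
    """
    longest = max((len(t) for t in multichar_tokens), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try candidate lengths from longest down to 2 characters.
        for size in range(min(longest, len(text) - i), 1, -1):
            if text[i:i + size] in multichar_tokens:
                tokens.append(text[i:i + size])
                i += size
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens


tokens = greedy_tokenize("kernel void", {"kernel", "void", "__kernel"})
print(tokens)  # ['kernel', ' ', 'void']
```

Joining the tokens always reproduces the input, so the encoding loses no information.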
