Introducing Prof. Yingfei Xiong (software engineering) and his 2017-18 papers: Part 2

宇内虹游

Article link: https://blog.csdn.net/weixin_39278265/article/details/82315522

Preface

This post introduces the second of Prof. Yingfei Xiong's 2017-18 papers in software engineering.

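2. "An Empirical Study on TensorFlow Program Bugs", ISSTA 18

Author information

First author: Yuhao Zhang, an undergraduate student of Prof. Xiong, “Enrolled at 2015, Worked with me since 2017” [3]
So he is only a third-year undergraduate now? Impressive.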

Brief summary of the work:

I do not know much about TensorFlow, but I think the work in this paper can be summarized by its contributions:
1) a dataset of TensorFlow bugs collected from StackOverflow and GitHub.
our empirical study focuses on the defects in TensorFlow programs.
we extracted information from QA pages

2) a study of the symptoms and root causes of the bugs, which could assist future studies on TensorFlow application testing and detecting techniques.
four types of symptoms, seven types of root causes.

3) a study of the new challenges in detecting and localizing the bugs and the current strategies to address them, which opens new problems for future research.
five challenges in detection and fault localization:
- stochastic nature of learning process
- huge computation model
- non-determinism
- dense inter-dependence within a neural network (traditional debugging techniques such as slicing provide little help.)
- black-box nature of neural networks.

five strategies that the TF users have adopted to address the challenges.

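It reads as a fairly interesting analysis of TensorFlow, the mainstream framework among DL applications: first extract the bugs, then analyze their external symptoms and underlying causes, then analyze the challenges that currently exist in detection and localization, and give some strategies. I have not read much in the DL area, but as the paper says, it fills a gap (the characteristic of deep learning defects have never been studied). The authors also found more than 30,000 applications on GitHub that use TensorFlow and sampled over a hundred bugs from them, which is quite cool.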

section 2: background of programming over the TensorFlow framework

This is the introductory part.

deep learning (DL) is an artificial intelligence computational paradigm that makes classification based on hierarchical layers of neurons that are interconnected to form a neural network.

An extremely complex sentence: two "that" clauses plus "based on".

In supervised learning, the cost function quantifies the error, known as “loss”, between the labeled values from the training data and the classified values outputted by the model.

"from" and "by" are interchangeable to some extent.

In unsupervised learning, the cost function can quantify the distance between the encoded and decoded representations in the underlying autoencoder neural network.
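
To make the two cost functions quoted above concrete, here is a minimal sketch in plain Python. It assumes mean squared error as the distance in both cases (the quoted sentences do not fix a particular loss), and it reads the autoencoder case as comparing an input with its reconstruction, which is a common interpretation rather than something the paper states. All function names and toy numbers are illustrative.

```python
def mean_squared_error(expected, actual):
    """Average squared difference between two equal-length sequences."""
    if len(expected) != len(actual):
        raise ValueError("sequences must have the same length")
    return sum((e - a) ** 2 for e, a in zip(expected, actual)) / len(expected)


def supervised_loss(labels, predictions):
    """Supervised case: error between labeled values and model outputs."""
    return mean_squared_error(labels, predictions)


def autoencoder_loss(inputs, reconstructions):
    """Unsupervised case (assumed reading): input vs. decoded reconstruction."""
    return mean_squared_error(inputs, reconstructions)


# Toy values, made up for illustration.
labels = [1.0, 0.0, 1.0]
predictions = [0.5, 0.0, 1.0]
loss = supervised_loss(labels, predictions)  # (0.25 + 0 + 0) / 3
```

In both cases training tries to drive this number down; the difference lies only in what the model output is compared against.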

section 3: three research questions

• RQ1: What are the symptoms and root causes of the bugs?
• RQ2: What new challenges exist to detect the bugs and how do TF users handle them?
• RQ3: What new challenges exist to localize the bugs and how do TF users handle them?

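My impression is that the study uses analysis to characterize these bugs, to identify the difficulties in detecting and localizing them, and to record how TF users usually deal with those difficulties.
It reads like a survey.

But the significance is clear and important: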
When answering the questions about challenges, we are also concerned about the solutions currently used by TF users. Understanding these solutions helps the development of new fault detection and localization techniques.

Really well written. The significance is exactly as stated, and a study like this genuinely makes sense.

section 4: how we collected our data

This part feels technically solid:

To collect bugs from stackoverflow pages, we used a search term “tensorflow answers -how -install -build” in stackoverflow search engine.
To collect bugs from GitHub commits, we searched for projects with keyword “tensorflow” in GitHub’s search engine.
Among the search results, we selected 11 target projects that are well-maintained with the highest numbers of commits and stars for further examination.

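Very well written. I noticed that ", which" is rarely used; "that" clauses are the usual choice.

Note: the overall search process is roughly: filter by keyword in the search engine first -> then screen manually.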
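
The filter-then-screen process can be sketched as a small pipeline. This is only an illustration: the `Project` record, the sample data in the usage note, and the choice to rank by commits first and stars second are assumptions. The paper only says the selected projects had "the highest numbers of commits and stars", and the final screening was done by hand.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical record for one search result.
    name: str
    description: str
    commits: int
    stars: int


def keyword_filter(projects, keyword="tensorflow"):
    """Step 1: keep projects whose name or description mentions the keyword."""
    keyword = keyword.lower()
    return [p for p in projects
            if keyword in p.name.lower() or keyword in p.description.lower()]


def shortlist(projects, limit=11):
    """Step 2: rank by (commits, stars) so a person can inspect the top ones."""
    ranked = sorted(projects, key=lambda p: (p.commits, p.stars), reverse=True)
    return ranked[:limit]
```

A call such as `shortlist(keyword_filter(all_projects))` would produce the candidate list that then goes to manual inspection; the code does not replace that manual step.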

Here the authors strengthen their position further by comparing dataset sizes (worth learning from):

Putting together, we got a dataset of 175 bugs, 87 collected from stackoverflow and 88 collected from github. The scale of our dataset is similar to other existing studies that require manual inspection, e.g., Jin et al. conducted a study of performance bugs and inspected 109 performance bugs [19], and Nasehi et al. conducted a study on what makes a good code example and analyzed 163 stackoverflow QA pages [26].

[19] Guoliang Jin, Linhai Song, Xiaoming Shi, Joel Scherpelz, and Shan Lu. 2012. Understanding and detecting real-world performance bugs. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’12, Beijing, China - June 11 - 16, 2012. 77–88. https://doi.org/10.1145/2254064.2254075
[26] Seyed Mehdi Nasehi, Jonathan Sillito, Frank Maurer, and Chris Burns. 2012. What makes a good code example?: A study of programming Q&A in StackOverflow. In 28th IEEE International Conference on Software Maintenance, ICSM 2012, Trento, Italy, September 23-28, 2012. 25–34. https://doi.org/10.1109/ICSM.2012.6405249

section 5, 6, 7: answer these three research questions respectively

RQ 1:
One paragraph I have to write down. It is really incisive and once again strengthens the authors' argument:

We observed that the last bug-including causes, some of which are unrelated to TensorFlow, are only accountable for 12 (13.6%) of real issues we found in GitHub projects. This suggests that TF-related issues are the main reason for bugs in TF applications, calling for new testing and debugging techniques to specifically address TF-related bugs.

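There is really nothing to criticize in the paper. This paragraph sits at the end of section 5 and stands out.

RQ 3:
I actually find this study quite interesting, and it clearly involved real effort. Analyzing 170-plus bugs would be a lot of work for me.

for bugs in the “error” type, we used trace dependency distance to measure the difficulty of fault localization quantitatively.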
Trace dependency distance is the smallest number of nodes on the trace dependency graph from the reported error location to the root cause of the bug, and was suggested by prior studies to measure the difficulty of fault localization [30].

[30] Manos Renieris and Steven P. Reiss. 2003. Fault Localization With Nearest Neighbor Queries. In 18th IEEE International Conference on Automated Software Engineering (ASE 2003), 6-10 October 2003, Montreal, Canada. 30–39. https://doi.org/10.1109/ASE.2003.1240292
Once again I can feel how deep the authors' background reading is: they even dug up a paper from ASE 2003.
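
The trace dependency distance quoted above is essentially a shortest-path query, which can be sketched with breadth-first search. The graph encoding (a dict from each statement to the statements it depends on), the node names, and the decision to count edges rather than nodes are assumptions made for illustration; the paper's exact counting convention is not given in these notes.

```python
from collections import deque


def trace_dependency_distance(graph, error_location, root_cause):
    """Shortest dependency path length in edges, or None if unreachable."""
    if error_location == root_cause:
        return 0
    visited = {error_location}
    queue = deque([(error_location, 0)])
    while queue:
        node, dist = queue.popleft()
        for dep in graph.get(node, ()):
            if dep == root_cause:
                return dist + 1
            if dep not in visited:
                visited.add(dep)
                queue.append((dep, dist + 1))
    return None


# Hypothetical trace: each statement maps to the statements it depends on.
graph = {
    "loss": ["logits", "labels"],
    "logits": ["weights", "inputs"],
    "weights": ["init"],
}
distance = trace_dependency_distance(graph, "loss", "init")
```

A larger distance means the reported error sits further from the real cause, which is why the measure serves as a proxy for how hard localization is.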

Threats to validity

Well explained.

our study investigated 175 bugs from stackoverflow and github, and it is not clear how much our findings generalize beyond the dataset, especially considering the fact that TF is growing fast. However, it is not easy to expand this dataset. First, since TensorFlow is an emerging framework, there were not many well-maintained popular github projects at the time we conducted this empirical study. Second, the manual efforts required to analyze the bugs were large. To collect and analyze the bugs, we spent approximately 400 person-hours, leading to an average 2.3 person-hours per bug.

"person-hours": a new word for me.
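
As a quick sanity check, the figures quoted in these notes are mutually consistent. The snippet below only redoes the arithmetic on numbers that appear in the quotes; the variable names are mine.

```python
stackoverflow_bugs = 87
github_bugs = 88
total_bugs = stackoverflow_bugs + github_bugs      # dataset size

person_hours = 400
hours_per_bug = person_hours / total_bugs          # average effort per bug

# Share of GitHub issues attributed to the last cause category in RQ 1.
other_cause_share = 12 / github_bugs * 100

print(total_bugs, round(hours_per_bug, 1), round(other_cause_share, 1))
```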

Discussion

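Unsurprisingly, a paper on repairing DL applications will probably follow. This paper can also be said to have established a preliminary benchmark, because the authors were able to reproduce these bugs, which means that in the future we may see:
1) a new benchmark (for deep learning)
2) new repair tools.

Impressive.

Many leading researchers are cited, for example: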

[40] Xiaoyuan Xie, Joshua W. K. Ho, Christian Murphy, Gail E. Kaiser, Baowen Xu, and Tsong Yueh Chen. 2011. Testing and validating machine learning classifiers by metamorphic testing. Journal of Systems and Software 84, 4 (2011), 544–558. https://doi.org/10.1016/j.jss.2010.11.920

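Also worth learning from: there clearly was earlier DL-related work, yet the authors always manage to explain the relationship clearly. That is something I cannot do yet.

Conclusion

Also very well written, with real practical significance.
Honestly, this is the first time I have seen a paper stress the practical significance of its own work this much.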
Two groups of people can benefit from this study. For TF users, we summarized five strategies used by other TF users to detect and debug the bugs in TF programs. For software engineering researchers, we pointed out five challenges which call for more research efforts. Our classification of causes and symptoms offers both TF users and software engineering researchers a better understanding of deep learning program bugs.