[Paper Notes] Learning to Log

Latest recommended article published on 2023-12-09 00:39:43

Preke

Category: Paper Notes. Tags: logging

Article link: https://blog.csdn.net/u013398398/article/details/77606329

Paper link: http://www.academia.edu/download/36281506/jmzhu_icse2015.pdf

Abstract

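The authors first give some background,
then propose a "learning to log" framework that aims to provide guidance on logging.

One implementation is their tool, LogAdvisor, which learns where to log from existing logging instances.
Three kinds of features are extracted for this problem:

Structural features
Textual features
Syntactic features

Machine learning (feature selection, classifier training) is then applied to solve it.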

Evaluation:

2 systems from Microsoft
2 projects from GitHub

In total: 19.1M LOC (lines of code) and 100.6K logging statements.
The reported results are good.

Introduction

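The opening argues that logging should be neither too sparse nor too excessive, with examples; it is rather long-winded.

The authors had done a survey earlier:
even Microsoft has no clear, strict standard for logging.

Some forum posts do discuss logging best practices, but
developers still need to make their own decisions on where to log and what to log,
which in most cases depend on their own domain knowledge

So logging is an important problem.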

Observations and Motivation

The logging in the systems above has stood the test of practice.

Observations:

Pervasiveness of logging
Mainly covers…
a line of logging code in every 58 LOC
where to log

exceptions
return-value-check snippets

why not to log everything

They even asked this question on StackOverflow...

Logging decision and the context

logging decision is
highly dependent on the context of this code snippet, including
the exception type

Motivation

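Existing logging relies on the developer's expertise.
The goal is a tool that gives more valuable logging suggestions and lowers the expertise required of developers.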

Learning to Log

Overview

Instance collection (selecting the training set)

exception snippets
records the exception context after an exception is captured in the catch block
(inside the catch block, log the exception information once an exception has been caught)
return-value-check snippets
the situation where an unexpected value (e.g., -1/null/false/empty) is returned from a function call
(log when a function call returns an abnormal value)
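To make the two snippet kinds concrete, here is a small illustration. It is written in Python purely for readability (the studied systems are not Python), and the function names and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("demo")  # hypothetical logger name


def parse_port(text):
    """Exception snippet: the except/catch block is the candidate logging point."""
    try:
        return int(text)
    except ValueError as exc:
        # Candidate logging point: record the exception context.
        logger.error("cannot parse port %r: %s", text, exc)
        return None


def find_user(users, name):
    """Return-value-check snippet: the check on the returned value is the candidate point."""
    user = users.get(name)  # dict.get returns None when the key is missing
    if user is None:
        # Candidate logging point: an unexpected value came back from the call.
        logger.warning("user %r not found", name)
        return "anonymous"
    return user
```

Whether each of these two points should actually contain a logging statement is exactly the decision the classifier is trained to make.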

Label identification (labeling)

logged: the snippet contains a logging statement
unlogged: the snippet contains no logging statement

searching some keywords in all method names, such as

log/logging, trace, write/writeline
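The keyword search can be sketched roughly as below. This is only a guess at the mechanics: the function name, the case-insensitive substring match, and matching on the short method name are all assumptions, not details from the paper.

```python
# Keywords taken from the list above.
LOG_KEYWORDS = ("log", "logging", "trace", "write", "writeline")


def label_snippet(called_methods):
    """Return 'logged' if any called method name contains a logging keyword.

    called_methods: method names invoked inside the snippet (hypothetical input format).
    """
    for name in called_methods:
        short = name.rsplit(".", 1)[-1].lower()  # keep only the short method name
        if any(keyword in short for keyword in LOG_KEYWORDS):
            return "logged"
    return "unlogged"
```

Plain substring matching is crude: a method called ShowDialog would also match "log", so a real labeling step would need extra filtering.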

Feature extraction

The details on feature extraction are described
in Section III-B

Feature selection

Model training

classification model
Decision Tree

Logging suggestion (prediction)

predictive model to perform accurate logging predictions

Structural features

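error type
(the frequency of each error type is used as a feature)
associated methods
help in understanding the functionality and operations of the code
method names are used as features
collected in call order via BFS
for example, System.IO.Path.GetFullPath consists of:

namespace,
class name,
its (short) method name.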
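Splitting a fully qualified method name into these three parts could look like this. It is a minimal sketch; the function name and the dictionary keys are made up for illustration.

```python
def split_qualified_name(qualified_name):
    """Split 'Namespace.Class.Method' into namespace, class name, and short method name."""
    parts = qualified_name.split(".")
    method_name = parts[-1]
    class_name = parts[-2] if len(parts) >= 2 else ""
    namespace = ".".join(parts[:-2])  # everything before the class name
    return {"namespace": namespace, "class": class_name, "method": method_name}
```

For System.IO.Path.GetFullPath this gives namespace System.IO, class Path, and method GetFullPath.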

Textual features

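Variable names, variable types, method names, and so on from the code,
combined with the structural features above to form a sentence

bag-of-words model
tokenization, stop-word removal, tf-idf…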
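The bag-of-words pipeline can be sketched in plain Python as follows. The camelCase tokenizer, the stop-word list, and the plain tf * log(N/df) weighting are assumptions chosen for brevity; the notes do not say which variants the paper uses.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"get", "set", "the", "a", "of"}  # hypothetical stop-word list


def tokenize(text):
    """Split identifiers on camelCase and non-letter boundaries, lowercase, drop stop words."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)
    return [w.lower() for w in words if w.lower() not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

With this weighting, a term that occurs in every snippet gets weight 0, so only terms that distinguish snippets from each other carry signal.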

Syntactic features

SettingFlag. We identify whether there is an assignment statement
with an assigned value like -1/null/false/empty.
Throw. We identify whether there is a throw statement.
Return. We identify whether any special value (e.g., -1/null/false/empty)
is returned.
RecoverFlag. We check whether there is a new try statement inside.
OtherOperation. We check whether there are any other operations included except the above five
ones.
EmptyBlock. We find that the developers sometimes catch and then do nothing. We thus identify whether the catch block is empty.

All of the above are Boolean features; in addition there is NumOfMethods.
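A very rough regex-based approximation of these flags for the body of a C#-style catch block is shown below. Everything here is an assumption: a real extractor would work on a parsed syntax tree, NumOfMethods is read as the number of method calls in the block, and OtherOperation is left out because it needs real parsing.

```python
import re

SPECIAL = r'(?:-1|null|false|"")'  # the "special" values named above
# Keywords that can be followed by "(" but are not method calls.
KEYWORDS = {"if", "for", "foreach", "while", "switch", "catch", "using", "lock"}


def syntactic_features(block_body):
    """Approximate the syntactic features of a catch-block body given as text."""
    body = block_body.strip()
    calls = [name for name in re.findall(r"\b([A-Za-z_]\w*)\s*\(", body)
             if name not in KEYWORDS]
    return {
        # An assignment (not ==, !=, <=, >=) of a special value.
        "SettingFlag": bool(re.search(r"(?<![=!<>])=\s*" + SPECIAL + r"\s*;", body)),
        "Throw": bool(re.search(r"\bthrow\b", body)),
        "Return": bool(re.search(r"\breturn\s+" + SPECIAL + r"\s*;", body)),
        "RecoverFlag": bool(re.search(r"\btry\b", body)),
        "EmptyBlock": body == "",
        "NumOfMethods": len(calls),  # assumed meaning: count of method calls
    }
```

For the body `Cleanup(); throw;` this reports Throw as true and NumOfMethods as 1, with every other flag false.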

Feature selection

Too many features, so the dimensionality is too high

set a minimum frequency threshold
information gain (as in decision trees)

reduce the feature dimensionality to around 1000
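The two-stage selection (frequency threshold, then ranking by information gain) can be sketched like this. The input format, the function names, and the default min_count are assumptions; only the target of roughly 1000 features comes from the notes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_column, labels):
    """Entropy of the labels minus the entropy left after splitting on the feature."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(feature_column):
        subset = [lab for f, lab in zip(feature_column, labels) if f == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain


def select_features(rows, labels, min_count=2, top_k=1000):
    """rows: one set of present feature names per instance (hypothetical format)."""
    counts = Counter(name for row in rows for name in row)
    # Stage 1: drop features that occur fewer than min_count times.
    frequent = [name for name, n in counts.items() if n >= min_count]
    # Stage 2: rank the survivors by information gain and keep the top_k.
    scored = [(information_gain([name in row for row in rows], labels), name)
              for name in frequent]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]
```

A feature that splits the instances perfectly into logged and unlogged gets the maximum gain (1 bit for balanced classes), while one that is equally common in both classes gets 0.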

Noise Handling

implicitly assume good logging quality in the training data

CLNI

The ratio of logged to unlogged instances is severely imbalanced

SMOTE synthesizes additional logged instances to restore the balance
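The core idea of SMOTE is to create each synthetic minority point by interpolating between a minority instance and one of its nearest minority neighbours. Below is a stripped-down sketch, assuming numeric feature vectors and at least two minority instances; the function name and parameters are illustrative, and this is not a full implementation.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class feature vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of base by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

Each synthetic point lies on the line segment between two real logged instances, so the logged class grows without exact duplicates.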

Evaluation

RQ1: What is the accuracy of LogAdvisor?
RQ2: What is the effect of different learning models?
RQ3: What is the effect of noise handling?
RQ4: How does LogAdvisor perform in the cross-project learning scenario?

Why DT (decision tree):

good performance
ease of interpretation

10-fold cross-validation
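As a reminder of what the 10-fold protocol does, here is a minimal index splitter. It neither shuffles nor stratifies, which a real evaluation on imbalanced data probably would; the function name is made up.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs; each sample is tested exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

The model is trained on nine folds and tested on the remaining one, and the metric is averaged over the ten rounds.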

balanced accuracy (BA) [19], which
is the average of the proportion of logged instances and the
proportion of unlogged instances that are correctly classified.
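Written out with logged as the positive class, this is BA = 1/2 * (TP / (TP + FN) + TN / (TN + FP)). A small sketch follows; the function name and the label strings are assumptions.

```python
def balanced_accuracy(y_true, y_pred, positive="logged"):
    """Average of the recall on the positive class and the recall on the negative class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

Unlike plain accuracy, this does not reward a classifier that always predicts the majority class: on imbalanced data such a classifier scores only 0.5.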