CCFinder Operation Guide

Latest recommended article published on 2024-09-04 07:34:10

mml8654352

Original page: http://www.ccfinder.net/doc/10.2/en/tutorial-ccfx.html#options-of-execution-mode-p

The available execution modes are: d , p , m , s

You can obtain the list of execution modes by the following command line:

ccfx -h

You can also obtain the help message of each execution mode with command lines such as:

ccfx d -h
ccfx m -h

When a directory that stores the target source files is given, code clones can be detected as follows:

ccfx d java -dn c:\target\src

The argument java means that the target source files are written in Java. That is, with this argument, ccfx will search for files having the extension ".java" in (sub-directories of) the target directories and apply a preprocessing for the Java programming language. (The preprocessing is a kind of normalization, which depends on the syntax of each programming language, to improve the correctness of clone detection. The detection algorithm itself is independent of programming languages; a single detection algorithm is used for source code written in any programming language.)

The execution-mode d (detection of code clones) requires the name of a preprocess script as the first argument of the command line. The name of the preprocess script is also stored in the clone-data file, so you can look up the preprocess script of a given clone-data file with execution-mode p. The applicable names are:

The option -dn, roughly speaking, specifies a directory that stores the target source files. By default, the detected code clones are stored in a file named a.ccfxd. To specify the name of the output file, use the option "-o file".

The clone-data file (output file) is a binary file. To print its contents as text, use execution-mode p of ccfx (p stands for pretty printing):

ccfx p a.ccfxd

The execution-mode p has some options that extract part of the information from a clone-data file. For example, each target source file has a file ID (a kind of serial number), and a table containing each file path and its file ID can be obtained with the following command line:

ccfx p -ln a.ccfxd

Filtering by metric values

Filtering a clone-data file by metrics requires two steps. First, make a list of the file IDs (or clone IDs) that should remain in the data. Second, modify the clone-data file using that list.

Step 2

The execution-mode s is used to perform filtering with the list of file IDs (or clone IDs) that should be kept. (The s stands for subset or scope.)

The following command line filters by file ID and saves the result to a file filtered.ccfxd:

ccfx s a.ccfxd -o filtered.ccfxd -fi remainfiles.txt

Here, the option "-fi file" means to keep the source files whose file IDs appear in the given file (remainfiles.txt) and to remove all other source files from the clone-data file.

To filter by clone ID, use the option "-ci file" in place of the option "-fi file".

Summary of command lines for filtering by file metrics

Filter the input clone-data file a.ccfxd, and save the result to a clone-data file filtered.ccfxd:

ccfx m a.ccfxd -f -o filemetrics.tsv
picosel -o remainfiles.txt from filemetrics.tsv select FID where "CONDITION"
ccfx s a.ccfxd -o filtered.ccfxd -fi remainfiles.txt
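The selection step in the middle of this pipeline can be pictured with a small Python sketch. It is not picosel and makes several assumptions that the text above does not confirm: that the metrics file is tab-separated with a header row, that the ID list is written one ID per line, and that a column named METRIC exists (a hypothetical stand-in for whichever metric the "CONDITION" refers to). The sample data is invented for illustration.

```python
import csv
import io

def select_ids(tsv_text, id_column, keep_row):
    """Return the IDs of the rows for which keep_row(row) is true."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row[id_column] for row in reader if keep_row(row)]

# Hypothetical metrics table; real columns and values come from "ccfx m".
SAMPLE_TSV = "FID\tMETRIC\n1\t0.9\n2\t0.2\n3\t0.6\n"

# Keep the files whose (hypothetical) metric is at least 0.5.
ids = select_ids(SAMPLE_TSV, "FID", lambda row: float(row["METRIC"]) >= 0.5)

# One ID per line is an assumed format for the list passed to "-fi".
print("\n".join(ids))
```

Passing "CID" as the ID column would give the corresponding sketch for clone metrics.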

Summary of command lines for filtering by clone metrics

Filter the input clone-data file a.ccfxd, and save the result to a clone-data file filtered.ccfxd:

ccfx m a.ccfxd -c -o clonemetrics.tsv
picosel -o remainclones.txt from clonemetrics.tsv select CID where "CONDITION"
ccfx s a.ccfxd -o filtered.ccfxd -ci remainclones.txt

4. File list

This section shows how to generate a file list (that is, a list of the input source files for code-clone detection), and how to use a file list in the detection and analysis of code clones.

A file list is used to specify paths of target source files, in an explicit way, one-by-one. Such an explicit specification of files is useful in the following cases.

Excluding some source files from the list of input source files

CCFinderX (ccfx) cannot identify tool-generated source files, because there is no standard method for marking or identifying such files. (I am looking forward to javax.annotation.Generated in the Java programming language, or similar programming-language-level solutions.)

Including some source files having a non-standard extension

By default, a source file with a non-standard extension is not regarded as a target (in execution-mode d's option -d, or in the file search of execution-mode f). If you use such extensions (for example, .inl in VC++) and want to include those files in the target of clone detection, use a file list to specify them explicitly.

Modifying the order of source files

By default, the source files are ordered in a kind of lexical order, obtained by comparing the paths of the source files encoded in UTF-8. When you want, for example, to place two particular directories next to each other in a clone scatter plot, you can edit the file list.
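As a rough illustration of the default ordering described above, the following Python sketch sorts paths by their UTF-8 byte sequences. It only approximates "a kind of lexical order"; details of the comparison that ccfx actually performs (such as case or path-separator handling) are not stated here, and the sample paths are hypothetical.

```python
def utf8_order(paths):
    """Return the paths sorted by comparing their UTF-8 encoded bytes."""
    return sorted(paths, key=lambda p: p.encode("utf-8"))

# Hypothetical paths, used only for illustration.
sample = [
    "src/util/Text.java",
    "src/app/Main.java",
    "src/Zeta.java",
    "src/app/Helper.java",
]

ordered = utf8_order(sample)
for path in ordered:
    print(path)
```

Under a byte-wise comparison, an uppercase ASCII letter sorts before any lowercase one, so "src/Zeta.java" comes before everything under "src/app/".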

Generating a file list

As in the preceding section, the source files are written in Java and the target directory is c:\target\src .

To find the Java source files in the target directory and save the file list as a file filelist.txt , type the following command line:

ccfx f java -a -l n c:\target\src -o filelist.txt

Here, the option -a stores each file path as an absolute path in the resulting file list. The option -l n adds a preprocessed-directory option to the file list; that is, a line describing an option -n is inserted as the first line of the file list. In a later clone detection, the -n line in a file list works as if it were a command-line option -n of execution-mode d.

Clone detection itself works without these options ( -a and -l n ). However, they are recommended as preparation for displaying the clone-data file with GemX afterwards, and to keep the preprocessed files out of the directory that holds the target source files.

The file list is a text file, so you can edit it with a text editor and freely add or delete names of source files. Naturally, any text file in the same format can be used as a file list, even if it was not generated by execution-mode f.
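Because the file list is plain text, it can also be extended by a script. The following Python sketch collects files with a non-standard extension (here .inl) and appends them to an existing list. It assumes the file list holds one path per line, which is an assumption about the format rather than something stated above; the function names and the temporary directory are hypothetical and exist only for the demonstration.

```python
import os
import tempfile

def find_sources(root, extensions):
    """Collect absolute paths under root whose extension is in extensions."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in extensions:
                found.append(os.path.abspath(os.path.join(dirpath, name)))
    return sorted(found)

def extend_file_list(lines, new_paths):
    """Append paths that are not listed yet, keeping existing lines in order."""
    existing = set(lines)
    return lines + [p for p in new_paths if p not in existing]

# Demonstration on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as root:
    for name in ("a.cpp", "b.inl", "notes.txt"):
        open(os.path.join(root, name), "w").close()
    current_list = [os.path.join(root, "a.cpp")]   # lines already in the list
    file_list = extend_file_list(current_list, find_sources(root, {".inl"}))

print("\n".join(file_list))
```

Writing the resulting lines back to filelist.txt, one per line, would give a list that can be passed to option -i, provided the one-path-per-line assumption holds.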

When a file list is ready, use option -i of execution-mode d, as in the following command line, to detect code clones in the source files listed in the file list:

ccfx d java -i filelist.txt

When you specify multiple file lists on the command line, ccfx works as if it were given a single file list that is the concatenation of them:

ccfx d java -i filelist1.txt -i filelist2.txt

In addition to paths of source files and the option -n , you can also put the option -is in a file list. When a file list contains a line consisting only of -is, the source files before that line and the source files after it belong to distinct file groups. For file groups, see the next section, 5. File group .
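The effect of an -is line can be sketched by building such a file list from groups of paths. This Python example only assembles text lines; it assumes one path per line (not confirmed above), and the paths are hypothetical.

```python
def grouped_file_list(groups):
    """Join groups of paths into file-list lines, separated by "-is" lines."""
    lines = []
    for index, paths in enumerate(groups):
        if index > 0:
            lines.append("-is")   # a line consisting only of -is starts a new group
        lines.extend(paths)
    return lines

# Hypothetical paths for two file groups.
old_version = ["old/A.java", "old/B.java"]
new_version = ["new/A.java"]

lines = grouped_file_list([old_version, new_version])
print("\n".join(lines))
```

Here the two files listed before the -is line form one file group and the file after it forms another.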

5. File group

File groups are used to separate the target source files into groups and to detect code clones only between the groups.

The execution-mode d has two options related to file groups. To separate source files into groups, use option -is . To detect code clones only between the groups, and not between two files in the same group or within a single file, use option -w .

Code clone detection between versions with file group

This subsection presents an example in which the target source code consists of two versions of a product, and code clones are detected between the versions (and not inside each version).

The source files of the older version are stored in a directory c:\oldsrc, and those of the newer version in c:\newsrc . The following command line will detect code clones only between the versions:

ccfx d java -dn c:\oldsrc -is -dn c:\newsrc -w f-w-g+

Here, each argument means:

the first d means execution-mode d.
the next java means the target source files are written in Java.
the next -dn c:\oldsrc specifies a directory to search for source files. In this case, the source files of the older version.
the next -is is a group separator, that is, the source files before this option and the source files after this option will belong to distinct groups.
the next -dn c:\newsrc specifies a directory to search for source files. In this case, the source files of the newer version.
the last -w f-w-g+ means "do not detect code clones between files in the same file group", "do not detect code clones within a file", and "detect code clones between files from distinct file groups".
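The three ranges controlled by -w can be modeled with a small Python sketch. It illustrates the semantics as described in the list above and is not how ccfx implements or parses the option; the function name, parameters, and the integer IDs are hypothetical.

```python
def is_reported(file_a, group_a, file_b, group_b,
                within_file=False, same_group=False, between_groups=True):
    """Decide whether a clone pair falls into an enabled detection range."""
    if file_a == file_b:
        return within_file       # both fragments are in one file
    if group_a == group_b:
        return same_group        # different files, same file group
    return between_groups        # files from distinct file groups

# Hypothetical IDs: group 0 is the older version, group 1 the newer one.
results = [
    is_reported(1, 0, 1, 0),     # clone pair inside file 1
    is_reported(1, 0, 2, 0),     # files 1 and 2, both in the older version
    is_reported(1, 0, 3, 1),     # file 1 (older) and file 3 (newer)
]
print(results)
```

With the default arguments, which mirror the setting used in the command line above, only the pair that spans the two versions is kept.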

By comparing two versions in terms of code clones, you can analyze them from the viewpoint of similarity rather than difference. For example, you can observe the case where a code fragment was copied and pasted many times and has spread over the product, or the case where duplicated code in the older version has been cleaned up in the newer version.