A Novel Approach To Effective Detection and Removal of Code Clones

A Novel Approach to Effective Detection
and Analysis of Code Clones

Ms. Kavitha Esther Rajakumari
Dept. of Computer Science and Engineering
Sathyabama University
Chennai, India.

Dr. T. Jebarajan
Dept. of Computer Science and Engineering
Rajalakshmi Engineering College
Chennai, India.

Abstract - Code clones are found in most software systems, and they play a major role in the field of software engineering. The presence of clones in a particular module can either improve or degrade the quality of the overall software system, and poor-quality software in turn leads to strenuous software maintenance. Detecting code clones also paves the way for analyzing them. Existing approaches detect clones efficiently, but although some tools also analyze the clones, their analysis still lacks accuracy. In this paper, a novel method is proposed that uses an efficient data mining technique in the analysis phase. Based on the outcome of the analysis, the clones are either removed from or retained in the software system.

Keywords - Code clones; Bad smells; Software Engineering; Data Mining Technique.

978-1-4799-0048-0/13/$31.00 2013 IEEE

I. INTRODUCTION
Research on code clones has proved it to be a challenging area. Clones occur within a software system for various reasons. They are generated purposefully or accidentally through software reuse. In most cases, copy and paste and other similar techniques are used to develop the software system because of time constraints. Sometimes clones result from the programmer's limited knowledge. In some cases, when software modules are reused, a lack of quality testing leads to unexpected harmful clones.

Code clones are also known as duplicated code. They form a part of bad smells. Bad smells are code characteristics that affect the software system and thus result in poor quality. Some of the bad smells are [8]:
- Duplicated code
- Long method
- Large class
- Useless method
- Useless class

Though clones are considered bad smells, they prove to be useful in some cases. Beneficial functions developed by skilled programmers can be made into library functions and used within a software system wherever necessary. Therefore, if code clones exist in a software system, they have to be analyzed to check whether they are harmful or beneficial. After analysis, harmful clones have to be refactored and beneficial clones should be maintained.

The rest of the paper is structured as follows: Section 2 presents the concept of code clones. Section 3 describes the data mining technique used in the proposed method. Section 4 presents the proposed method. Section 5 discusses the implementation and results. Finally, Section 6 concludes the paper and outlines future work.

II. DEFINITIONS AND TYPES OF CODE CLONES
Definition 1: Code Fragment.
A code fragment is any sequence of code lines. It can be of any granularity, e.g., a function definition, a begin-end block, or a sequence of statements.

Definition 2: Code Clone.
A code clone is a code fragment that is similar to another code fragment. Two fragments that are similar to each other form a clone pair, and when many fragments are similar, they form a clone class or clone group [3].

A. Code Clone Types
There are four types of clones, divided into two subgroups. The first subgroup is based on text and the second subgroup is based on function. Type-1, Type-2 and Type-3 belong to the textual type, and Type-4 belongs to the functional type. They are listed below.
1) Textual Type:
a) Type-1: Identical code fragments except for variations in whitespace, layout and comments.
b) Type-2: Syntactically identical fragments except for variations in identifiers, literals, types, whitespace, layout and comments.
c) Type-3: Copied fragments with further modifications such as changed, added or removed statements, in addition to variations in identifiers, literals, types, whitespace, layout and comments.
2) Functional Type:
a) Type-4: Two or more code fragments that perform the same computation but are implemented by different syntactic variants.
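
To make the Type-1 definition concrete, the following is a minimal sketch in Python, not part of the proposed method. It treats two fragments as Type-1 clones when they are identical after comments and whitespace are dropped. The function names and the sample fragments are hypothetical, and the comment stripping is deliberately naive (it assumes C-style comments and does not protect comment markers inside string literals).

```python
import re


def normalize(fragment: str) -> str:
    """Drop C-style comments and all whitespace from a code fragment."""
    # Remove /* ... */ block comments first, then // line comments.
    no_block = re.sub(r"/\*.*?\*/", "", fragment, flags=re.DOTALL)
    no_line = re.sub(r"//[^\n]*", "", no_block)
    # Removing every whitespace character ignores layout differences.
    # This is a simplification: it can also merge adjacent tokens.
    return re.sub(r"\s+", "", no_line)


def is_type1_clone(a: str, b: str) -> bool:
    """True when the fragments differ only in whitespace, layout and comments."""
    return normalize(a) == normalize(b)


# Hypothetical fragments, invented for illustration only.
frag_a = """
int Total(int x, int y) {
    // add the two values
    return x + y;
}
"""
frag_b = "int Total(int x, int y) { return x + y; /* sum */ }"
frag_c = "int Sum(int a, int b) { return a + b; }"

print(is_type1_clone(frag_a, frag_b))  # layout and comments differ only
print(is_type1_clone(frag_a, frag_c))  # identifiers differ (Type-2, not Type-1)
```

The third fragment differs from the first only in identifier names, so it would count as a Type-2 clone under the definitions above and is rejected by this exact comparison.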
III. FREQUENT PATTERN GROWTH METHOD (FP-GROWTH)
The key to the proposed approach is the underlying data mining technique, the FP-Growth method. This technique helps in scrutinizing the data effectively, and it also helps in clone detection. The method is based on the divide-and-conquer strategy and is widely used on transaction databases. It mines frequent patterns using an FP-Tree, which allows frequent itemsets to be discovered without candidate itemset generation. The technique has two phases. The first phase derives the frequent items, and the second phase constructs an FP-Tree based on the results of the first phase. The technique works as follows: in the first scan of the database, the frequent items (1-itemsets) and their support counts are derived. The frequent items are then listed in descending order of support count, and an FP-Tree is constructed. The FP-Tree is mined by calling FP_growth(FP_tree, null), which is implemented as follows [5].

Procedure FP_growth(Tree, α)
if Tree contains a single path P then
    for each combination (denoted as β) of the nodes in the path P
        generate pattern β ∪ α with support = minimum support of nodes in β;
else for each a_i in the header of Tree {
    generate pattern β = a_i ∪ α with support = a_i.support;
    construct β's conditional pattern base and then β's conditional FP_tree Tree_β;
    if Tree_β ≠ ∅ then
        call FP_growth(Tree_β, β); }

Though the FP-Growth procedure is widely used in data mining, it can also be used in code clone research. The proposed approach applies only the first half of the FP-Growth procedure.
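
The first phase described above (one scan that derives the frequent 1-itemsets and lists them in descending order of support count) can be sketched in Python as follows. This is an illustrative sketch only: the function name, the sample transactions and the minimum-support threshold are assumptions, and the FP-Tree construction of the second phase is not shown.

```python
from collections import Counter


def frequent_items(transactions, min_support):
    """First FP-Growth scan: frequent 1-itemsets in descending support order."""
    counts = Counter()
    for transaction in transactions:
        # An item contributes at most once per transaction to its support.
        counts.update(set(transaction))
    frequent = [(item, n) for item, n in counts.items() if n >= min_support]
    # Descending support count; ties are broken by item name for a stable order.
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical transaction database, invented for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "butter"],
]

result = frequent_items(transactions, min_support=2)
print(result)  # [('bread', 3), ('butter', 3), ('milk', 2)]
```

The item "jam" appears in only one transaction, so it falls below the assumed threshold of 2 and is dropped before the ordered list is produced.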
IV. PROPOSED METHOD
The proposed method consists of five stages. The input data passes through these stages, which enhances its quality. With this configuration, a software system can be checked for code clones as well as bad smells. In most of the existing approaches, detection and analysis of code clones are separated: a clone detection tool is used for detecting clones and a distinct environment is used for analyzing the detected clones. The proposed approach supports both, which is more convenient for the programmer. It is appropriate for Type-1 and Type-4 clones; Type-1 clones are simple clones, whereas Type-4 clones are functional clones. Below is the schematic representation of the proposed method.

[Figure: schematic representation of the proposed method (flowchart). Box labels: Input the software module; Data Mining Technique; Filtration; Extraction; Manual Screening; Do the codes contain Clones / Bad Smells?; Rectify the problematic codes; Regression Testing; Refined Software Module; Ready for Testing Phase.]

Following are the steps involved in the novel approach:
1. Data mining technique scans the modules of the software system.
2. Displays the cloned and uncloned information.
3. Filtration of unwanted elements.
4. Extraction of cloned methods and classes.
5. Manual checking of the cloned methods and classes.
6. Make changes if necessary
   - Retain the clones if beneficial (or)
   - Eliminate the clones if harmful.
7. Record the modifications (if any).
8. Perform regression testing.

The input can be a software system written in any programming language. The framework is applied to software that comes out of the coding phase of the software development life cycle. Every module of the software system is examined for clones and bad smells. Reviewing all the modules within the given software system leads to effortless software maintenance. Furthermore, if the software is free from unwanted clones and bad smells, the time consumed during the testing phase is considerably reduced. Each of the five phases in the proposed block diagram is described below.

A. Data Mining Technique
The FP-Growth method described in the previous section is used here, but only its first phase is applied. After the software module is scanned, the elements of that module, such as classes, namespaces, variables and methods, are displayed together with their numbers of occurrences. The elements of the module are the items and the occurrences are the support counts. The elements are displayed in descending order of support count. Both cloned and uncloned elements are displayed. The above-mentioned steps (1) and (2) are therefore carried out here.

B. Filtration
In this stage, all elements except methods and classes are filtered out. Here too, the number of occurrences of the classes and methods within a specific module is shown. This is done to determine whether a repeated element is a class or a method, since the same name is sometimes used for a class and for a method within that class. Step (3) is carried out here. Type-1 clones are detected in this stage.

C. Extraction
In this phase, the classes and methods are displayed with a link to the source code of the module. When the user clicks on a method or class, the relevant code is displayed and highlighted. Step (4) is executed here. Type-4 clones are detected, and bad smells can also be detected in this stage.

D. Manual Screening
Here the displayed code is examined for clones as well as for bad smells. Any problematic code that is found is rectified. Clones that contain a bug or that change the operation of a function are considered harmful clones and are removed, since these types of clones cannot be rectified. On the other hand, clones that are reasonable and beneficial are maintained in the module itself. The clones detected in the filtration and extraction stages are thoroughly analyzed here. In this stage, steps (5), (6) and (7) are performed.

E. Regression Testing
Regression means retesting the unchanged parts of the application. Regression testing verifies that the existing functionality works as expected and that the new changes have not introduced any defect in functionality that was working before the change. The software module undergoes this testing phase only if modifications were made to it. This testing is performed in order to know whether the changes made during manual checking have affected the other parts of the module. This testing is a method of verification. The last step (8) is executed here.

The software module that comes out after passing all these phases is considered a refined module and is now ready for the testing phase. Since the module is error free, the time taken to conduct all types of tests on it will be greatly reduced. The implementation and results of the proposed approach are discussed in the next section.

V. IMPLEMENTATION AND RESULTS
To illustrate the proposed method, a Bank Management System is considered. The Bank Management System used here is application software developed in C#.net, and the proposed method was also implemented using C#.net. The software system contains a number of modules. Each module is taken as the input and undergoes the five stages described above. In this illustration, the Account Summary module is selected. Below are the screenshots that were obtained during the process of detection and analysis of code clones.

First, the file path is entered. As soon as the browse file option is clicked, the data mining technique is applied to the source code. Frequent items are displayed along with their support counts in descending order. Here the frequent items are the elements of the program and the support count is the number of occurrences of each element in the source code of that specific module.

Next, the filtration option is clicked. This option filters out all the elements of the program except classes and methods, and displays the classes and methods separately. Consider the element Account Summary. In the previous screenshot, the frequency count for this element is 4, but here only two occurrences are shown: one is the class and the other is a method. The remaining two occurrences therefore have to be checked in the code. Each may be a legitimate or an unwanted element, so careful analysis has to be carried out during manual screening, and the count difference should be noted for later verification. Mostly, the remaining occurrences of this element fall in the Type-1 category.
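
The count difference discussed above can be illustrated with a small Python sketch. It is not the C#.net implementation used in this work. The sample source, the function names and the regular expressions are all hypothetical, and the declaration matching is deliberately naive (it only recognises `class` declarations and `public` methods or constructors).

```python
import re
from collections import Counter

# Hypothetical C#-like module source, invented for illustration only.
SOURCE = """
class AccountSummary {
    public AccountSummary() { }
    public void Show() {
        AccountSummary copy = new AccountSummary();
    }
}
"""


def support_counts(source):
    """Count every identifier-like token (items and their support counts)."""
    return Counter(re.findall(r"[A-Za-z_]\w*", source))


def declarations(source):
    """Filtration: keep only names declared as a class or as a public method."""
    classes = re.findall(r"\bclass\s+(\w+)", source)
    # Naive pattern: 'public', an optional return type, a name, '(...)', '{'.
    methods = re.findall(
        r"\bpublic\s+(?:\w+\s+)?(\w+)\s*\([^)]*\)\s*\{", source
    )
    return Counter(classes) + Counter(methods)


def count_differences(source):
    """Occurrences of each declared name not explained by its declarations."""
    total = support_counts(source)
    declared = declarations(source)
    return {name: total[name] - n for name, n in declared.items()}


print(support_counts(SOURCE)["AccountSummary"])  # 4
print(count_differences(SOURCE))  # {'AccountSummary': 2, 'Show': 0}
```

In this invented sample, AccountSummary has a support count of 4 but only two declarations (the class and one method), so a difference of 2 is reported. Those are the occurrences that would be noted and examined during manual screening.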
289
manual. Based on the interpretations, the clones will either be

removed or retained. Bad smells if any are carefully
eradicated. Perform the necessary changes and document it.
Next the module undergoes the last stage i.e. regression
testing. The module will undergo this phase only if changes
were made in the codes. This is executed to make sure, the
other parts of the module is intact.
VI. CONCLUSION AND FUTURE WORK
Now the extraction stage comes into the picture. Here too
both the classes and methods are displayed. The source code
of the classes and methods will be shown when clicked upon
it. The codes of the selected method or class alone will be
highlighted to make the programmer view instantly. This
phase helps in detecting Type-4 clones and bad smells.
Earlier the code clones were detected by using the

traditional string-matching algorithms. In this paper, the first
half of the FP-Growth method was used. This FP-Growth
method helps in two ways: 1. Detection of code clones and
bad smells and 2. To analyze the detected clones. Though FPGrowth method is already an existing one, the area of
application of this procedure is new. An innovative attempt was
made to apply this procedure in the area of software
engineering. It is a new approach towards the research of
clones.
The analysis process is done manually, because there are
chances of eliminating beneficial or reasonable clones in case
of automated tools. The projected approach has a wide range
of benefits, but it consumes much of the time during manual
screening which is unavoidable. Useful measures should be
taken to minimize time consumption.
The future work is, 1. To study and investigate the second
half of the FP-Growth method 2. To examine its role in the
field of code clones 3. To evaluate the effectiveness of the FPGrowth method as a whole in the field of software
engineering.
REFERENCES
Manual screening is now carried out. The highlighted

codes are first checked for clones and bad smells. Next the
doubtful clones traced during filtration and extraction is
checked. If clones are present they are well analyzed, to know
their purpose of existence in the module. This stage is entirely
[1] Andrian Marcus and Jonathan I. Maletic, Kent State University, Identification of High-Level Concept Clones in Source Code, IEEE, 2001.
[2] Lerina Aversano, Luigi Cerulo, Massimiliano Di Penta, University of Sannio, How Clones are Maintained: An Empirical Study, IEEE, CSMR 2007.
[3] Yoshihiko, Raula, Shinji, Kyohei, Masataka and Hajimu, Code clone graph metrics for detecting diffused code clones, Software engineering conference, 2009.
[4] Jablonski, P. and Daqing Hou, Clarkson University, Renaming parts of identifiers consistently within code clones, ICPC, pages 38-39, IEEE Computer Society, 2010.
[5] Anil Patro, Raj Sekhar, C.P.V.N.J. Mohan Rao, A Technique for Identifying and Testing Structural Clones in Large Scale Systems, International Journal of Computer Science and Information Technologies, Vol. 2 (4), 1403-1406, 2011.
[6] Hamid A. Basit, Usman Ali, Stan Jarzabek, Lahore University of Management, Viewing Simple Clones from Structural Clones Perspective, ACM, IWSC 2011.
[7] Jiachen Yang, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, Shinji Kusumoto, Osaka University, Filtering Clones for Individual User Based on Machine Learning Analysis, IEEE, 2012.
[8] Hui Liu, Zhiyi Ma, Weizhong Shao, and Zhendong Niu, Schedule of Bad Smell Detection and Resolution: A New Way to Save Effort, IEEE Transactions on Software Engineering, 2012.
[9] Michael Pradel, Thomas R. Gross, Members of IEEE, Name-based Analysis of Equally Typed Method Arguments, IEEE, 2013.