Insights into Deep Learning and Non-Deep Learning Techniques for Code Clone Detection

Abstract

A source code clone is a type of bad smell that arises when pieces of code share the same functional semantics but differ in their syntactic representation. In the past few years, several studies have addressed code clone detection using machine learning models, software techniques, and other mathematical measures. This paper aims to conduct an impartial comparative study of the existing literature on Deep Learning (DL) and non-Deep Learning (non-DL) techniques. Because little work has compared previous and current state-of-the-art tools in code clone detection, there is no concrete evidence to underpin the use of Deep Learning approaches in clone detection, except for a preference from the evolutionary point of view. We address and investigate a few research questions related to the motivations for using DL techniques for code clone detection compared to those for non-DL approaches (based on tokens, text, abstract syntax trees (ASTs), metrics, and others). Furthermore, we discuss the challenges faced in implementing Deep Learning for clone detection and, where feasible, their potential resolutions. This review should help the audience understand how different approaches aid the clone detection process, along with their performance measures, limitations, issues, and challenges.

Keywords: Code analysis, Code clone, Deep learning, Machine learning, Review, Source transformation, Tokenization.

Cite as