Deep code search efficiency based on clustering

DOI: 10.1002/cpe.8027 ISSN: 1532-0626

Kun Liu, Jianxun Liu, Haize Hu

Computational Theory and Mathematics
Computer Networks and Communications
Computer Science Applications
Theoretical Computer Science
Software

Abstract

Deep‐learning‐based code search models typically treat accuracy as the sole criterion for judging model performance and ignore the efficiency of the search itself. This article proposes a clustering‐based code search model (C‐DCS). C‐DCS uses K‐Means to partition the code vector base into K clusters and obtains the center vector of each cluster. At search time, C‐DCS first matches the query vector against the K center vectors to find the best‐matching center. It then compares the query vector one by one with the code vectors in the cluster belonging to that center and returns the best‐matching code snippet vector. To verify the efficiency of C‐DCS on the code search task, we ran experiments on a large dataset. The results show that C‐DCS saves 92.2% of the search time compared with the baseline model while maintaining accuracy. In the experimental evaluation section, we further optimize the K‐Means algorithm to improve the search efficiency of C‐DCS, raising the search time saved to 93.8% relative to the baseline model. Hence, C‐DCS greatly reduces code search time without affecting accuracy, improving the efficiency of software development.
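The two‐stage lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`build_index`, `search`), the use of cosine similarity as the matching score, and the deterministic farthest‐point initialisation of K‐Means are all assumptions made for this example, and the vectors are plain Python lists rather than embeddings from a trained model.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance, used here for the K-Means steps."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity, assumed here as the query/code matching score."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iters=20):
    """Plain Lloyd-style K-Means; returns (centers, assignment per vector)."""
    # Farthest-point initialisation: a deterministic choice for this sketch.
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest center.
        assign = [min(range(k), key=lambda c: sq_dist(v, centers[c]))
                  for v in vectors]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign


def build_index(vectors, k):
    """Offline step: cluster the code vector base into k groups."""
    centers, assign = kmeans(vectors, k)
    clusters = [[] for _ in range(k)]
    for idx, c in enumerate(assign):
        clusters[c].append(idx)
    return centers, clusters


def search(query, vectors, centers, clusters):
    """Online step: pick the best center, then scan only that cluster."""
    non_empty = [c for c in range(len(centers)) if clusters[c]]
    best = max(non_empty, key=lambda c: cosine(query, centers[c]))
    return max(clusters[best], key=lambda i: cosine(query, vectors[i]))
```

In this sketch a query is compared with the K centers and then only with the members of one cluster, instead of with every vector in the base, which is where the time saving in the abstract would come from. A call such as `search(q, vectors, *build_index(vectors, k))` returns the index of the best match within the selected cluster. That match can differ from the global best when the true nearest vector lies in a neighbouring cluster.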
