An Automatic Near-Duplicate Video Data Cleaning Method Based on a Consistent Feature Hash Ring

doi:10.3390/electronics13081522

DOI: 10.3390/electronics13081522 ISSN: 2079-9292

An Automatic Near-Duplicate Video Data Cleaning Method Based on a Consistent Feature Hash Ring

Yi Qin, Ou Ye, Yan Fu

Electrical and Electronic Engineering
Computer Networks and Communications
Hardware and Architecture
Signal Processing
Control and Systems Engineering

In recent decades, with the ever-growing scale of video data, near-duplicate videos continue to emerge. Data quality issues caused by near-duplicate videos are becoming more and more prominent, which has affected the application of normal videos. Although current studies on near-duplicate video detection can help uncover data quality issues for videos, they still lack a process of automatic merging for the video data represented by high-dimensional features, which makes it difficult to automatically clean the near-duplicate videos to improve data quality for video datasets. At present, there are few studies on near-duplicate video data cleaning. The existing studies have the sensitive problems of video data orderliness and initial clustering centers under a condition that prior distribution is unknown, which seriously affects the accuracy of near-duplicate video data cleaning. To address the above issues, an automatic near-duplicate video data cleaning method based on a consistent feature hash ring is proposed in this paper. First, a residual network with convolutional block attention modules, a long short-term memory deep network, and an attention model are integrated to construct an RCLA deep network with the multi-head attention mechanism to extract spatiotemporal features of video data. Then, a consistent feature hash ring is constructed, which can effectively alleviate the sensitivity of video data orderliness while providing a condition of near-duplicate video merging. To reduce the sensitivity of the initial cluster centers to the results of near-duplicate video cleansing, an optimized feature distance-means clustering algorithm is constructed by utilizing a mountain peak function on a consistent feature hash ring, which can implement automatic cleaning of near-duplicate video data. Finally, experiments are conducted based on a commonly used dataset named CC_WEB_VIDEO and a coal mining video dataset. Compared with some existing studies, simulation results demonstrate the performance of the proposed method.

Outline

An Automatic Near-Duplicate Video Data Cleaning Method Based on a Consistent Feature Hash Ring

More from our Archive