基于多模态对比学习的输电线路螺栓缺陷分类
Transmission Line Bolt Defects Classification Based on Multi-modal Contrastive Learning
【Abstract】 Bolt images collected during transmission line inspection are low in resolution and carry limited visual information, so traditional image classification models struggle to learn semantically rich visual representations from them. This paper proposes a bolt defect classification method based on multi-modal contrastive learning. First, to inject bolt-related semantic information and prior knowledge from text into the visual representation in a cross-modal manner, a two-stage training algorithm that combines multi-modal contrastive pre-training with supervised fine-tuning is proposed. Second, to alleviate overfitting during multi-modal contrastive pre-training, an info noise contrastive estimation loss with label smoothing (infoNCE-LS) is proposed to improve the generalization of the pre-trained visual representation. Finally, to address the mismatch between the upstream and downstream tasks, three types of classification heads based on text prompts are designed to improve the transfer learning performance of the pre-trained visual representation in the supervised fine-tuning stage. Experimental results show that the two models built on ResNet50 and ViT achieve accuracies of 92.3% and 97.4%, respectively, on the bolt defect classification dataset, which are 2.4% and 5.8% higher than the corresponding baselines. The study realizes a cross-modal supplement of semantic information from text to image and offers a new approach to research on bolt defect identification.
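The abstract names infoNCE-LS but does not give its formula. The following is a minimal sketch of one plausible reading: a symmetric image-text InfoNCE loss whose one-hot matching targets are replaced by label-smoothed targets. The function names, the smoothing scheme `(1 - eps) * one_hot + eps / N`, the temperature of 0.07, and the symmetric averaging are all assumptions made for illustration, not the authors' published definition.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def smoothed_cross_entropy(logits, eps):
    # Cross-entropy where row i's true match is column i.
    # Assumed smoothing: target = (1 - eps) * one_hot + eps / N.
    n = len(logits)
    total = 0.0
    for i, row in enumerate(logits):
        for j, lp in enumerate(log_softmax(row)):
            target = (1.0 - eps) * (1.0 if i == j else 0.0) + eps / n
            total -= target * lp
    return total / n

def info_nce_ls(image_emb, text_emb, temperature=0.07, eps=0.1):
    # image_emb[i] and text_emb[i] are assumed to form a matched pair.
    img = [l2_normalize(v) for v in image_emb]
    txt = [l2_normalize(v) for v in text_emb]
    # Cosine similarity matrix scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(u, w)) / temperature for w in txt]
              for u in img]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (smoothed_cross_entropy(logits, eps)
                  + smoothed_cross_entropy(logits_t, eps))
```

With `eps=0.0` the sketch reduces to the ordinary symmetric InfoNCE loss. A positive `eps` keeps the loss from reaching zero even for perfectly matched pairs, which discourages overconfident similarity scores; this is the usual motivation for label smoothing as a regularizer against overfitting.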
【Key words】 transmission line; bolt defect classification; multi-modal pre-training; contrastive learning; transfer learning
- 【Source】 高电压技术 (High Voltage Engineering), 2025, Issue 02
- 【Classification No.】 TM75; TP391.41