Research on Recognition Method of Vegetable Diseases in Open Environment Based on Multimodal Data and Knowledge Fusion

【Author】 Wang Chunshan

【Supervisor】 Zhao Chunjiang

【Author Information】 Hebei Agricultural University, Agricultural Informatization Technology, 2023, PhD

【Abstract】 The sustained, stable development of the vegetable industry is a major livelihood project that benefits the public, maintains stability, and promotes social harmony. Outbreaks of vegetable diseases can cause large-scale losses in yield and quality, resulting in irreversible economic damage. Accurate identification of diseases is therefore the key to effective prevention and control: it helps reduce yield losses, cut pesticide use, safeguard the quality and safety of agricultural products, and keep the vegetable industry healthy and sustainable. Traditional identification methods are slow, highly subjective, and prone to misdiagnosis, and can no longer meet the needs of modern agricultural production. Methods that combine image processing with machine learning are fast, accurate, and capable of real-time feedback, giving them clear advantages over manual diagnosis, and they have become the development trend of modern vegetable disease diagnosis. In agricultural scenes that vary in time and space, however, existing methods still face three problems: disease symptoms are captured incompletely, the features behind a decision cannot be clearly explained, and domain knowledge is difficult to integrate. To address these problems, this dissertation takes common infectious leaf diseases of tomato and cucumber as examples and studies disease identification along three dimensions: multimodal joint representation learning, automatic captioning of disease images, and domain knowledge fusion. The main research contents and conclusions are as follows:

1. To remedy the incomplete capture of disease features caused by insufficient use of multimodal information, a method is proposed that feeds diagnostic information beyond the disease image into the model as text and identifies vegetable leaf diseases through multimodal joint representation learning; fusing image and text features lets the two modalities complement each other. To verify the method, two multimodal representation learning models were built, one based on probability fusion and one based on feature-space fusion. The former uses an image classifier and a text classifier to extract image and text features separately, obtains a class-probability matrix for each modality, and fuses the matrices with weight coefficients to make the final prediction; this dual-branch, late-fusion design is architecturally flexible and computationally cheap (see the sketch below). The latter uses a Transformer to map disease images and description texts into a shared feature space, fuses the two kinds of features there, and predicts from the fused representation; its advantage is a higher degree of integration. Experiments show that both multimodal fusion models achieve higher accuracy than single-modality models based on images or text alone.
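As a concrete illustration of the probability-fusion branch, the sketch below combines the class-probability outputs of an image branch and a text branch with a weight coefficient. It is a minimal PyTorch sketch under assumed names, not the dissertation's implementation: `image_branch`, `text_branch`, and the value of `alpha` are illustrative placeholders, and in practice the weight would be tuned on validation data.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Decision-level (probability) fusion of an image branch and a text branch.

    Hypothetical sketch: each branch is any model mapping its input to class
    logits; the fused prediction is a weighted sum of the two per-modality
    probability distributions.
    """

    def __init__(self, image_branch: nn.Module, text_branch: nn.Module, alpha: float = 0.6):
        super().__init__()
        self.image_branch = image_branch  # e.g. a CNN over leaf images
        self.text_branch = text_branch    # e.g. a text encoder over symptom descriptions
        self.alpha = alpha                # weight coefficient for the image modality

    def forward(self, image, text):
        p_img = torch.softmax(self.image_branch(image), dim=-1)
        p_txt = torch.softmax(self.text_branch(text), dim=-1)
        # Weighted fusion of the two probability matrices.
        return self.alpha * p_img + (1.0 - self.alpha) * p_txt
```

The feature-space variant would instead fuse the modalities earlier, for example by feeding image and text tokens jointly through a Transformer encoder and classifying from the fused representation.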

2. Deep-learning identification models lack a clear link between disease features and their outputs, so the decision basis remains unexplained. To address this, a dense captioning method for disease-image features is proposed: the model renders the visual disease features it observes as natural-language descriptions that help users understand and judge the plausibility of the identification result. On this basis, Veg-DenseCap, a Chinese dense captioning model for vegetable leaf disease images, was built. It consists of two parts: the two-stage, region-based detector Faster R-CNN extracts features containing visual information about the lesions, and an LSTM language generator takes those features as input and produces sentences with detailed lesion information (a sketch of this detect-then-describe pipeline follows). Experiments show that the sentences generated by Veg-DenseCap are grammatically correct and reasonably diverse, and that they describe the leaf-disease features accurately. Such semantic descriptions of visual features help users understand the model's decision basis, improve its transparency, and build trust between the model and its users.
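The detect-then-describe structure can be sketched as a region detector feeding an LSTM decoder. The PyTorch sketch below is hypothetical: `region_feat` stands in for the fixed-size per-region vectors a Faster R-CNN ROI head would produce, and the vocabulary size and layer dimensions are assumptions rather than Veg-DenseCap's actual configuration.

```python
import torch
import torch.nn as nn

class LesionCaptioner(nn.Module):
    """Generates a caption for one detected lesion region (hypothetical sketch).

    A Faster R-CNN ROI head would supply `region_feat`, one fixed-size feature
    vector per region proposal; the LSTM decodes it into a sentence.
    """

    def __init__(self, feat_dim=1024, embed_dim=256, hidden_dim=512, vocab_size=5000):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)  # region feature -> initial hidden state
        self.init_c = nn.Linear(feat_dim, hidden_dim)  # region feature -> initial cell state
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, region_feat, caption_tokens):
        # Teacher-forced training pass: condition the LSTM on the region feature.
        h0 = self.init_h(region_feat).unsqueeze(0)   # (1, batch, hidden)
        c0 = self.init_c(region_feat).unsqueeze(0)
        emb = self.embed(caption_tokens)             # (batch, seq, embed)
        hidden, _ = self.lstm(emb, (h0, c0))
        return self.out(hidden)                      # per-step vocabulary logits
```

At inference time the same module would decode greedily or with beam search, feeding each predicted token back in as the next input.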
3. Existing identification models integrate data and domain knowledge poorly, so they depend heavily on labeled data while making little use of domain knowledge. To address this, a vegetable leaf disease diagnosis method that fuses a domain knowledge graph with deep learning is proposed. First, feature words are extracted from disease description texts and converted into word vectors by word embedding. Second, the entities and relations related to disease features are extracted from the disease-domain knowledge graph through structured knowledge extraction and converted into low-dimensional continuous vectors by knowledge-graph embedding. Finally, the feature-word vectors and the related entity vectors are fed to a CNN as multi-channel inputs (sketched below), so that fusing information across channels lets the network learn richer disease features. Experiments show that, with domain knowledge added, the model learns more standardized, more comprehensive, and more directly relevant disease features from the description texts, improving diagnostic accuracy across disease types. Moreover, linking disease feature words to knowledge-graph triples produces visualizable reasoning paths on the graph, which helps users understand the decision basis and enhances the model's interpretability.
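The multi-channel input can be sketched as a two-channel text CNN: one channel holds the word vectors of the disease feature words, the other the knowledge-graph entity vectors linked to those words. The sketch below is a minimal illustration under assumptions; the class `MultiChannelTextCNN`, its dimensions, and the zero-padding convention for words without a linked entity are not taken from the dissertation.

```python
import torch
import torch.nn as nn

class MultiChannelTextCNN(nn.Module):
    """Two-channel text CNN (hypothetical sketch): channel 0 holds word
    embeddings of disease feature words, channel 1 holds knowledge-graph
    entity embeddings aligned to those words (zeros where no entity links).
    """

    def __init__(self, embed_dim=100, num_classes=10, kernel_sizes=(2, 3, 4), num_filters=64):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(2, num_filters, (k, embed_dim)) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, word_vecs, entity_vecs):
        # word_vecs, entity_vecs: (batch, seq_len, embed_dim)
        x = torch.stack([word_vecs, entity_vecs], dim=1)   # (batch, 2, seq, dim)
        pooled = []
        for conv in self.convs:
            feat = torch.relu(conv(x)).squeeze(3)          # (batch, filters, seq-k+1)
            pooled.append(torch.max(feat, dim=2).values)   # global max pooling
        return self.fc(torch.cat(pooled, dim=1))           # class logits
```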
In summary, this dissertation exploits the correlation and complementarity of multimodal data to learn fused representations of disease features, acquiring discriminative features from multiple perspectives and improving identification accuracy and generalization in open environments; it uses dense captioning to map features from the disease-image feature space into a human-understandable natural-language semantic space, helping users follow the model's decision basis and increasing its credibility; and it proposes a domain-knowledge fusion method that improves both identification accuracy and interpretability. These results provide a new approach to disease identification that combines multimodal data with domain knowledge, and they carry theoretical significance and practical value for raising the intelligence level of crop disease identification and advancing smart agriculture.

  • 【CLC Number】 TP391.41; S436.3