
Origin Traceability Method for Angelica dahurica Based on Near-Infrared Spectroscopy Combined with a Data-Augmented CNN Algorithm


【Authors】 GUO Zhaohua; WEN Shizhao; LI Sifan; WANG Qi; WANG Yingxin; WANG Xinguo; NIU Liying; LI Yawei; FENG Wei

【Corresponding authors】 LI Yawei; FENG Wei

【Affiliations】 Microwave Scatter Communication Department, Network Communication Research Institute, China Electronics Technology Group Corporation; School of Statistics and Data Science, Nankai University; College of Sciences, Northeastern University; School of Pharmaceutical Sciences, Hebei University of Chinese Medicine, and Hebei Province Engineering Research Center for Quality Evaluation & Standardization of Traditional Chinese Medicine; Liaoning Inspection, Examination and Certification Centre, Liaoning Academy of Analytical Sciences

【Abstract】 OBJECTIVE To establish an origin classification model for Angelica dahurica from a class-imbalanced sample set, based on near-infrared (NIR) spectroscopy combined with a data-augmented convolutional neural network (CNN) algorithm, a problem of both theoretical and practical value for the origin traceability of Chinese medicinal materials. METHODS A total of 95 Angelica dahurica samples were collected, and their NIR spectra were acquired over the wavenumber range 12 500 to 4 000 cm⁻¹. The resulting dataset is small and unevenly distributed across origin classes. To improve model generalization, three data augmentation algorithms were proposed: spectral shifting, spectral noise addition, and spectral combination. To address the class imbalance, Focal Loss was used as the loss function for training the CNN model. RESULTS When the three data augmentation algorithms were applied to a support vector machine (SVM) model, adding Gaussian noise with a signal-to-noise ratio of 20 to the spectral data performed best, raising the model accuracy to 84.2%. On the imbalanced samples, the CNN model trained with Focal Loss reached an accuracy of 94.7%. CONCLUSION NIR spectroscopy combined with the data-augmented CNN algorithm provides a rapid, non-destructive detection method and a reliable data analysis method for the origin traceability of Angelica dahurica, and offers a new methodological reference for the origin traceability of Chinese medicinal materials.
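
The abstract names the three augmentations and Focal Loss but gives no implementation details. The following is a minimal NumPy sketch under stated assumptions, not the authors' code: spectral shifting is taken to be a shift of a few points along the wavenumber axis with edge padding, the signal-to-noise ratio of 20 is read as a linear power ratio (the abstract does not say whether it is linear or in dB), spectral combination is taken to be a weighted average of two spectra from the same origin class, and the Focal Loss is the standard multi-class form with a hypothetical default of `gamma=2.0`. All function and parameter names are illustrative.

```python
import numpy as np


def shift_spectrum(spectrum, offset):
    """Shift a 1-D spectrum by `offset` points along the wavenumber axis.

    Assumption: vacated positions are padded with the nearest edge value.
    """
    if offset == 0:
        return spectrum.copy()
    shifted = np.roll(spectrum, offset)
    if offset > 0:
        shifted[:offset] = spectrum[0]
    else:
        shifted[offset:] = spectrum[-1]
    return shifted


def add_gaussian_noise(spectrum, snr, rng):
    """Add zero-mean Gaussian noise at a target signal-to-noise ratio.

    Assumption: `snr` is a linear power ratio (signal power / noise power).
    """
    signal_power = np.mean(spectrum ** 2)
    noise_std = np.sqrt(signal_power / snr)
    return spectrum + rng.normal(0.0, noise_std, size=spectrum.shape)


def combine_spectra(spec_a, spec_b, weight):
    """Weighted average of two spectra assumed to share the same origin label."""
    return weight * spec_a + (1.0 - weight) * spec_b


def focal_loss(probs, labels, gamma=2.0, alpha=None):
    """Mean multi-class focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs  : (n_samples, n_classes) predicted class probabilities
    labels : (n_samples,) integer class indices
    alpha  : optional per-class weights; None means equal weights
    """
    p_t = probs[np.arange(len(labels)), labels]
    weights = np.ones_like(p_t) if alpha is None else np.asarray(alpha)[labels]
    eps = 1e-12  # guards against log(0)
    return float(np.mean(-weights * (1.0 - p_t) ** gamma * np.log(p_t + eps)))
```

With `gamma=0` and no class weights, `focal_loss` reduces to ordinary cross-entropy; larger `gamma` down-weights samples the model already classifies confidently, which is why the loss is commonly used when some classes have few samples. In a typical workflow the augmentations would be applied to training spectra only, for example `add_gaussian_noise(x, 20.0, np.random.default_rng(0))`.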

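【Funding】 Hebei Provincial Science and Technology Program (21372503D); College Students' Innovation and Entrepreneurship Training Program (202414432007)
  • 【Source】 Chinese Pharmaceutical Journal, 2024, Issue 21
  • 【Classification No.】 R284.1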