Origin Traceability Method for Angelica dahurica Based on Near-Infrared Spectroscopy Combined with a Data-Augmented CNN Algorithm
【Abstract】 OBJECTIVE To establish an origin classification model for Angelica dahurica from a class-imbalanced sample set, using near-infrared spectroscopy combined with a data-augmented convolutional neural network (CNN) algorithm, a task of both theoretical and practical value in the origin traceability of Chinese medicinal materials. METHODS A total of 95 Angelica dahurica samples were collected, and their near-infrared spectra were acquired over the wavenumber range of 12 500 to 4 000 cm⁻¹. The resulting dataset is small, and its samples are unevenly distributed across origin classes. To improve the generalization ability of the model, three data augmentation algorithms were proposed: spectral shifting, spectral noise addition, and spectral combination. To address the class imbalance, Focal Loss was used as the loss function for training the CNN model. RESULTS When the three data augmentation algorithms were applied to a support vector machine (SVM) model, adding Gaussian noise with a signal-to-noise ratio of 20 to the spectral data performed best, raising the model accuracy to 84.2%. Under class imbalance, the CNN model trained with Focal Loss as the loss function reached an accuracy of 94.7%. CONCLUSION Near-infrared spectroscopy combined with the data-augmented CNN algorithm provides a rapid, non-destructive detection method and a reliable data analysis method for the origin traceability of Angelica dahurica, and offers a new methodological reference for the origin traceability of Chinese medicinal materials.
【Key words】 near-infrared spectroscopy; Angelica dahurica; origin traceability; data augmentation; convolutional neural network
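The three augmentation operations and the Focal Loss named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, since the abstract gives no formulas. The following are assumptions made here for illustration: spectral shifting is taken to be a shift of a few points along the wavenumber axis with edge padding, the signal-to-noise ratio of 20 is interpreted in decibels, spectral combination is a weighted average of two spectra from the same origin, and the function names (`shift_spectrum`, `add_gaussian_noise`, `combine_spectra`, `focal_loss`) and the focusing parameter `gamma=2.0` are hypothetical.

```python
import numpy as np


def shift_spectrum(x, k):
    """Shift a 1-D spectrum by k points along its axis (assumed meaning of
    'spectral shifting'); vacated positions are padded with the edge value."""
    if k == 0:
        return x.copy()
    out = np.empty_like(x)
    if k > 0:
        out[k:] = x[:-k]
        out[:k] = x[0]
    else:
        out[:k] = x[-k:]
        out[k:] = x[-1]
    return out


def add_gaussian_noise(x, snr_db, rng):
    """Add zero-mean Gaussian noise so that signal power / noise power
    matches snr_db (the dB interpretation is an assumption)."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)


def combine_spectra(a, b, w):
    """Weighted average of two spectra assumed to share the same origin label."""
    return w * a + (1.0 - w) * b


def focal_loss(probs, labels, gamma=2.0, eps=1e-12):
    """Mean multi-class focal loss: -(1 - p_t)^gamma * log(p_t), where p_t is
    the predicted probability of the true class. gamma = 0 gives cross-entropy."""
    p_t = probs[np.arange(len(labels)), labels]
    p_t = np.clip(p_t, eps, 1.0)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for one spectrum; real spectra are not available here.
    spectrum = 1.0 + 0.5 * np.sin(np.linspace(0.0, 6.0, 200))
    other = 1.0 + 0.4 * np.sin(np.linspace(0.0, 6.0, 200))

    augmented = [
        shift_spectrum(spectrum, 3),
        add_gaussian_noise(spectrum, 20.0, rng),
        combine_spectra(spectrum, other, 0.5),
    ]
    print([a.shape for a in augmented])

    # Two toy predictions over three origin classes.
    probs = np.array([[0.9, 0.05, 0.05], [0.2, 0.3, 0.5]])
    labels = np.array([0, 1])
    print(focal_loss(probs, labels, gamma=2.0))
```

The factor `(1 - p_t)^gamma` shrinks the contribution of samples the model already classifies confidently, so training concentrates on hard samples, which in an imbalanced dataset tend to come from under-represented classes.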
- 【Source】 Chinese Pharmaceutical Journal (中国药学杂志), 2024, Issue 21
- 【Classification No.】 R284.1