
A bacteriocin mining and classification model based on data fusion and machine learning


【Authors】 Li Bolin; Guo Shiqing; Yue Jiahao; Han Jianchun; Ren Haibin; Chen Hao; Li Hang; Wang Yutang

【Affiliations】 College of Life Science, Northeast Agricultural University; College of Food Science, Northeast Agricultural University; Key Laboratory of Dairy Science, Ministry of Education, Northeast Agricultural University

【Abstract】 Bacteriocins have important research and application value in the food and pharmaceutical industries. Traditional methods of finding and screening bacteriocins are time-consuming and laborious. On the assumption that the structure of a polypeptide determines its properties, mathematical models can be used to discover bacteriocins efficiently. In this study, a bacteriocin information database was first established. The tertiary structures and amino acid sequences of the bacteriocin proteins were then converted into mathematical descriptors related to spatial structure and physicochemical properties, and the descriptors retained after screening were integrated by data fusion. Machine learning algorithms were used to build mathematical models relating antibacterial activity to these descriptors, and the results obtained with different algorithms were compared. The bacteriocin mining model built with the random forest algorithm gave the best recognition performance, with an accuracy of 0.9787 and the highest area under the receiver operating characteristic curve (AUC), 0.992. The bacteriocin class determination model built with the k-nearest neighbor algorithm gave the best classification performance, with an accuracy of 0.9000. Using these models, 8 candidate bacteriocins were screened from proteins of unknown function and assigned to classes, and 2 of them had antibacterial activity. This study established an efficient bacteriocin discovery model and points out a direction for further exploration of the mechanisms of action of specific bacteriocins.
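The abstract does not say how the descriptors were fused or how the classifiers were configured. As a rough illustration only, the following sketch assumes feature-level fusion (each descriptor block is standardized, then the blocks are concatenated per sample) followed by a plain k-nearest neighbor vote scored by leave-one-out accuracy. The function names, the toy descriptor values, and the labels "A" and "B" are all hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def fuse(structural, physicochemical):
    """Feature-level fusion (assumed): scale each block, then concatenate per sample."""
    a = zscore_columns(structural)
    b = zscore_columns(physicochemical)
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(x, y, k=3):
    """Hold out each sample in turn and predict it from the remaining ones."""
    hits = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_x, train_y, x[i], k) == y[i]
    return hits / len(x)


# Made-up toy descriptors: two structural values and one physicochemical value per protein.
structural = [[1.0, 0.2], [1.1, 0.1], [0.9, 0.3], [3.0, 2.0], [3.2, 2.1], [2.9, 1.9]]
physicochemical = [[10.0], [11.0], [9.5], [30.0], [31.0], [29.0]]
labels = ["A", "A", "A", "B", "B", "B"]  # hypothetical class labels

fused = fuse(structural, physicochemical)
accuracy = leave_one_out_accuracy(fused, labels, k=3)
```

For brevity the sketch standardizes the descriptors over the whole toy set before the leave-one-out loop. In a real evaluation the scaling parameters would normally be estimated on the training portion only, so that no information from the held-out sample leaks into the model.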

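  • 【Proceedings title】 Abstracts of the 17th Annual Meeting of the Chinese Institute of Food Science and Technology
  • 【Conference name】 17th Annual Meeting of the Chinese Institute of Food Science and Technology
  • 【Conference date】 2020-10-28
  • 【Conference location】 Xi'an, Shaanxi, China
  • 【Classification codes】 TS201.2; TP181
  • 【Organizer】 Chinese Institute of Food Science and Technology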