Adaptive Sequence Length Deep Method for Predicting RNA Secondary Structure
【Abstract】 RNA secondary structure prediction is an important problem in structural bioinformatics. Predicting secondary structures that contain pseudoknots is harder still because of the complexity of pseudoknot structures. Traditional machine learning methods are constrained by the structure of the learning model and require a fixed number of input features, so most methods truncate sequences of different lengths to a uniform length before training. This both discards useful information and destroys the integrity of the biological sequence. To address this problem, a deep recurrent neural network (LSTM) model that adapts to sequence length is proposed, together with a sequence-length adaptive module and a training algorithm, so that truncation is no longer needed. Because the sample proportions in real data are imbalanced, a dynamic weighting method is used to mitigate the imbalance. Four groups of comparative experiments against four well-performing methods were then conducted on the authoritative RNA STRAND data set. The results show that the accuracy and Matthews correlation coefficient of the proposed method are 1.6% and 3.3% higher, respectively, than those of the fixed-length LSTM method, and 13.6% and 14.8% higher than those of the other four typical methods.
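The two metrics reported in the abstract, accuracy and the Matthews correlation coefficient (MCC), can be illustrated with a minimal sketch. It assumes a binary per-base labelling (1 = paired, 0 = unpaired), which the abstract does not spell out; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels; 1 = paired is an assumed encoding."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of positions that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC; returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy, imbalanced example (made-up labels, not data from the paper).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts), matthews_corrcoef(*counts))
```

On this toy input the accuracy is 0.75 while the MCC is only about 0.33. This gap shows why MCC is commonly reported alongside accuracy when the classes are imbalanced.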
【Key words】 RNA secondary structure prediction; recurrent neural network; dynamic weighting; pseudoknots; bases
- 【Source】 小型微型计算机系统 (Journal of Chinese Computer Systems), 2019, Issue 08
- 【Classification codes】 TP183; Q522; Q811.4
- 【Times cited】 1
- 【Downloads】 169