A Short Text Classification Approach Based on Deep Learning and Topic Model
【Abstract】 To address the semantic sparsity of short texts and the difficulty of extracting their features, this paper proposes a short text classification method based on deep learning. First, a BiLSTM channel augmented with a self-attention mechanism produces feature word vectors for the short text, and the external knowledge base (KB) CN-DBpedia is introduced to mine the semantics of the short text more deeply and relieve semantic sparsity. Second, the BTM topic model extracts topic information from the short text dataset, and a hyperparameter δ is introduced to make the word vector concatenation more accurate. Finally, the semantic cosine similarity between the feature word vectors and the knowledge vectors is computed and the vectors are concatenated; the concatenated result, together with the topic information, is fed to a Softmax classifier. Experiments are conducted on a Chinese microblog sentiment analysis dataset, a product review dataset, a Chinese news headline dataset, and the Sogou news dataset. Compared with TextCNN, TextRNN, TextRNN_Att, BiLSTM-MP, and KPCNN, the proposed method achieves somewhat higher classification accuracy.
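The fusion and classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how δ enters the concatenation, so the gating rule below (keep the knowledge vector only when its cosine similarity to the feature vector is at least δ, otherwise substitute zeros) is an assumption made for illustration, not the authors' method. The function names `fuse` and `classify`, and the weights and bias passed to `classify`, are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse(feature_vec, knowledge_vec, topic_vec, delta):
    """Concatenate feature, knowledge, and topic vectors.

    ASSUMPTION: the knowledge vector is kept only if its cosine similarity
    to the feature vector reaches the threshold delta; otherwise it is
    replaced by zeros. The source abstract does not specify this rule.
    """
    sim = cosine_similarity(feature_vec, knowledge_vec)
    kept = list(knowledge_vec) if sim >= delta else [0.0] * len(knowledge_vec)
    return list(feature_vec) + kept + list(topic_vec)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused_vec, weights, bias):
    """Linear layer followed by softmax; weights has one row per class (hypothetical parameters)."""
    logits = [
        sum(w * x for w, x in zip(row, fused_vec)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)
```

In a full system the feature vector would come from the self-attention BiLSTM, the knowledge vector from CN-DBpedia concept embeddings, and the topic vector from the BTM topic distribution of the text; here they are plain lists so the fusion logic can be read in isolation.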
【Key words】 short text classification; attention mechanism; external knowledge base; BTM topic model; semantic cosine similarity
- 【Source】 Journal of Liaoning University (Natural Science Edition), 2022, Issue 02
- 【Classification code】 TP391.1; TP18
- 【Downloads】 169