Research on Sentence Semantic Similarity Based on DTW and an Improved Hungarian Algorithm
【Abstract】 Research on sentence semantic similarity plays an important role in natural language processing and related fields. Existing work on Chinese sentence similarity has difficulty analyzing semantic features and is affected by word order. To address both problems, a sentence semantic similarity model that combines DTW with the Hungarian algorithm is proposed. The model first trains a Word2vec deep learning model on a Baidu news corpus to obtain a dictionary of 200-dimensional word vectors that carry semantic features, and builds a word vector space. Each sentence is treated as a curve in the multidimensional space formed by its word vectors, and sentence semantic similarity is expressed through the distance and complexity of transforming one sentence curve into the other. The model uses a DTW matrix together with an improved Hungarian algorithm, and performs shortest-path planning on the DTW matrix. Experimental results show that, compared with existing sentence similarity methods such as cosine similarity, the method still yields relatively accurate similarity values when word order is scrambled but the semantics are close.
【Key words】 word vector; DTW; Hungarian algorithm; semantic similarity; semantic feature;
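The abstract names two building blocks, a DTW matrix over word vectors and an assignment step solved by the Hungarian algorithm, without giving their exact formulation. The sketch below is only an illustration of why the two are complementary, not the authors' model: it assumes cosine distance between word vectors, uses hand-made 2-dimensional toy vectors in place of the 200-dimensional Word2vec vectors, and solves the assignment problem by brute force over permutations as a stand-in for the (improved) Hungarian algorithm, which finds the same minimum cost more efficiently. All function and variable names are hypothetical.

```python
import math
from itertools import permutations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def dtw_distance(s, t):
    """Classic DTW cost between two sequences of word vectors.

    Order-sensitive: the warping path must move monotonically
    through both sentences.
    """
    n, m = len(s), len(t)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_distance(s[i - 1], t[j - 1])
            # Cheapest way to reach cell (i, j): insertion, deletion, or match.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def assignment_distance(s, t):
    """Minimum-cost one-to-one matching of words, ignoring word order.

    Assumption: brute force over permutations, feasible only for short
    sentences. The Hungarian algorithm solves the same problem in
    polynomial time.
    """
    if len(s) > len(t):
        s, t = t, s  # match every word of the shorter sentence
    return min(
        sum(cosine_distance(s[i], t[p[i]]) for i in range(len(s)))
        for p in permutations(range(len(t)), len(s))
    )


# Toy "word vectors" (assumed values, for illustration only).
a, b, c = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
sent1 = [a, b, c]
sent2 = [c, a, b]  # same words, different order

dtw_cost = dtw_distance(sent1, sent2)
match_cost = assignment_distance(sent1, sent2)
print(dtw_cost, match_cost)
```

On this toy input the order-insensitive matching cost is essentially zero, while the DTW cost is clearly positive (about 0.59), which shows how an assignment step can compensate for reordered words that DTW alone would penalize.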
- 【Source】 计算机与数字工程 (Computer & Digital Engineering), 2021, Issue 02
- 【Classification No.】 TP391.1
- 【Citations】 1
- 【Downloads】 139