Modified Edit Distance Algorithm and Its Application in Web Search (original title: 一种编辑距离算法及其在网页搜索中的应用)
【Abstract】 Traditional methods do not rank short web page fields well by their relevance to a user query. To address this, a ranking algorithm based on a modified edit distance, referred to as MED, is proposed. The algorithm encodes the word-level user query and the short web page field into two strings according to word matches, and then uses the MED to compute the similarity between the two strings. Because the encoding captures the position, order, and distance of the query words within the field, information that is important for expressing the modification relationships between query words, the similarity between the encoded strings can measure the relevance between the corresponding query and short field. Experiments on large-scale data from a real search engine show that the proposed algorithm significantly outperforms traditional relevance ranking algorithms on short web page fields, and is especially suited to very short fields.
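The encode-then-compare idea described in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the paper's MED: the encoding scheme (one symbol per distinct query word, `*` for field words that are not in the query), the function names `encode`, `levenshtein`, and `similarity`, and the use of the standard unit-cost Levenshtein distance are all hypothetical choices, since the abstract does not specify the modified cost model.

```python
def encode(query_words, field_words, filler="*"):
    """Encode a query and a field as two strings over a shared alphabet.

    Hypothetical scheme: each distinct query word gets one symbol
    ('a', 'b', ...); field words that are not query words become `filler`.
    """
    symbols = {}
    for word in query_words:
        if word not in symbols:
            symbols[word] = chr(ord("a") + len(symbols))
    query_code = "".join(symbols[w] for w in query_words)
    field_code = "".join(symbols.get(w, filler) for w in field_words)
    return query_code, field_code


def levenshtein(s, t):
    """Standard unit-cost edit distance (NOT the paper's modified variant)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(query, field):
    """Normalized similarity in [0, 1] between a query and a short field."""
    q, f = encode(query.lower().split(), field.lower().split())
    longest = max(len(q), len(f))
    return 1.0 if longest == 0 else 1.0 - levenshtein(q, f) / longest
```

With this sketch, `similarity("edit distance", "edit distance")` scores higher than `similarity("edit distance", "distance edit")`, which shows how comparing encoded strings makes the score sensitive to word order and not only to word overlap. A faithful implementation would replace the unit costs with the paper's modified costs.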
【Key words】 web search; relevance ranking; edit distance; string match
- 【Source】 Journal of Xi’an Jiaotong University (西安交通大学学报), 2008, Issue 12
- 【Classification No.】 TP391.41
- 【Times cited】 31
- 【Downloads】 335