Research on Unstructured Information Management for Digital TV (数字电视中非结构化信息管理研究)

【Author】 Zhao Liang (赵亮)

【Advisor】 Chen Ying (陈英)

【Author information】 Shanghai Jiao Tong University, Computer Software and Theory, 2007, Master's thesis

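【Abstract (translated from the Chinese)】 As digital television becomes widespread, the number of digital TV programs keeps growing, so how to quickly and effectively find the programs a user likes among a large number of programs has become an urgent research task. One effective approach is to index the text. Neither conventional structured databases nor ordinary full-text indexes are suitable for indexing digital TV program description text. Traditional databases are not suited to managing unstructured text, and ordinary full-text indexes do not meet the needs of embedded media information management. This is because the inverted index model commonly used in full-text indexing requires word segmentation, while the Japanese segmentation tool (Chasen) has a large space overhead (23M) and cannot extract the words that characterize the meaning of the text, so many meaningless words inevitably get indexed at great cost. In addition, because digital TV programs are updated very quickly, managing them requires good dynamic performance, yet dynamic index updating has so far received relatively little research attention in China or abroad. This thesis studies retrieval models, segmentation tools, and dynamic index update strategies in text retrieval, and proposes a Japanese segmentation tool that has some feature extraction capability and is suitable for embedded use, together with a hybrid update strategy with good dynamic performance. The main research content and results of this thesis include the following: 1. A comparative study of the indexing techniques and retrieval models commonly used in text indexing.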

【Abstract】 With the spread of digital television, the number of TV programs has grown significantly, so quickly finding the programs a user likes has become an urgent task. An effective approach is to index the program description text. However, neither a conventional structured database nor ordinary full-text indexing is suitable for this. First, traditional databases are not suited to managing unstructured text. Second, ordinary full-text indexing does not meet the needs of a digital TV media database that runs on an embedded system. For example, the inverted index model, which performs well, requires the text to be segmented into words. The most commonly used Japanese segmentation tool, Chasen, has a large space cost (more than 23M) and cannot extract the words that represent the text's meaning, so every word in the text is indexed, which increases the space cost further. In addition, a digital TV program database requires good dynamic performance, but there has been little research on improving the dynamic performance of indexes. The structure of this thesis is as follows. First, we discuss index models and the factors that affect dynamic performance. Second, we present a Japanese segmentation tool with some feature extraction capability that is suitable for use in an embedded system. Third, we present an improved hybrid index update strategy for the inverted index model; theoretical analysis shows that it has better dynamic performance. The main content and results of this thesis are as follows: 1. Compare and analyze the strengths and shortcomings of commonly used index models. 2. Present a Japanese segmentation tool with some feature extraction
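
As an illustration of the ideas named in the abstract, the following is a minimal sketch of an inverted index that supports dynamic updates (adding and removing program descriptions). It is not the thesis's method: the abstract does not describe the hybrid update strategy or the segmentation tool in detail, so the class `InvertedIndex`, the whitespace tokenizer `keyword_tokenize`, and the stopword list are all hypothetical stand-ins. The tokenizer only mimics the role of a feature-extracting segmenter by skipping words that carry little meaning, which keeps the index smaller.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index: term -> set of document ids."""

    def __init__(self, tokenize):
        self.tokenize = tokenize          # stand-in for a real segmenter
        self.postings = defaultdict(set)  # term -> {doc_id, ...}
        self.doc_terms = {}               # doc_id -> terms indexed for it

    def add(self, doc_id, text):
        """Index a program description, replacing any older version."""
        if doc_id in self.doc_terms:
            self.remove(doc_id)
        terms = set(self.tokenize(text))
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def remove(self, doc_id):
        """Drop an expired program and delete posting lists left empty."""
        for term in self.doc_terms.pop(doc_id, set()):
            docs = self.postings[term]
            docs.discard(doc_id)
            if not docs:
                del self.postings[term]

    def search(self, query):
        """Return the ids of documents that contain every query term."""
        terms = set(self.tokenize(query))
        if not terms:
            return set()
        result = None
        for term in terms:
            docs = self.postings.get(term, set())
            result = set(docs) if result is None else result & docs
        return result


# Assumed stopword list, for illustration only.
STOPWORDS = {"the", "a", "of", "and"}


def keyword_tokenize(text):
    """Hypothetical tokenizer: lowercase, split on whitespace, drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


index = InvertedIndex(keyword_tokenize)
index.add(1, "The history of jazz")
index.add(2, "Jazz and blues live concert")
index.add(3, "Evening news")
index.remove(3)  # the program has aired, so its entry expires
```

Keeping a per-document term set makes removal proportional to the size of one description rather than the whole index, which is one simple way to handle a program guide whose entries change often. A real embedded implementation would also need to bound memory and persist the posting lists.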

  • 【Classification No.】TN949.197
  • 【Downloads】43