Key Issues in Pattern Classification of Boolean Vector Data and TCM Diagnosis Scale Development
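【Author】 Wang Zhenhua (王振华);
【Supervisor】 Hou Zhongsheng (侯忠生);
【Author information】 Beijing Jiaotong University, Systems Analysis and Integration, 2009, PhD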
【Abstract】 This dissertation identifies and studies key issues in the pattern classification of Boolean vector data and applies the results to the development of traditional Chinese medicine (TCM) diagnosis scales. Three issues are studied systematically: similarity coefficients, dimension reduction, and feature weighting. The work lays preliminary groundwork for further research on pattern classification of Boolean vector data. The main work and contributions are summarized as follows:
1. Similarity coefficients (SCs) for Boolean vectors and their properties are studied, and on this basis the properties of commonly used SCs are compared and analyzed. To address the choice of SC in pattern classification problems, multi-parameter SC families and a method for optimizing them are proposed. Experiments on real data show that the proposed SC families are effective for SC optimization in Boolean vector classification.
2. Dimension reduction for Boolean vector data is studied from two angles, feature extraction and feature selection. First, for feature extraction, a dimension reduction algorithm based on piecewise summation is proposed that exploits the characteristics of Boolean vector data; theoretical analysis and experiments on real data verify its effectiveness. Second, for feature selection, filter and hybrid feature selection algorithms based on Boolean vector SCs are proposed for mutually exclusive two-class problems and non-mutually exclusive multi-class problems respectively; experiments on real data verify the effectiveness of these algorithms on both kinds of problem.
3. Building on a review of existing feature weighting methods, an improved algorithm is proposed to address the heavy computation and slow speed of the k-NN feature weighting algorithm. For medical diagnostic tests, an improved threshold determination method based on Fisher linear discriminant is proposed. Theoretical analysis and experiments on real data demonstrate the effectiveness of both improved methods.
4. The algorithms proposed in the dissertation are applied systematically to the development of a TCM syndrome diagnosis scale for stroke, mainly to solve the item selection and item weighting problems in scale development. This provides new methods and ideas for developing TCM scales based on Boolean vector data.
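As a rough illustration of what a similarity coefficient for Boolean vectors looks like, the sketch below computes two textbook coefficients (Jaccard and simple matching) from the 2x2 match counts, plus a one-parameter family in the style of Gower and Legendre. This is an assumption for illustration only: the abstract does not specify the multi-parameter families proposed in the dissertation, and the function names and the `theta` parameter here are hypothetical.

```python
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d): counts of 1/1, 1/0, 0/1 and 0/0 positions."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard coefficient a / (a + b + c); joint absences are ignored."""
    a, b, c, _ = contingency(x, y)
    denom = a + b + c
    # Convention assumed here: two all-zero vectors count as identical.
    return a / denom if denom else 1.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient (a + d) / (a + b + c + d)."""
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    return (a + d) / n if n else 1.0


def parametric_similarity(
    x: Sequence[int],
    y: Sequence[int],
    theta: float = 1.0,
    count_absences: bool = False,
) -> float:
    """Illustrative one-parameter family (not the dissertation's).

    agree / (agree + theta * (b + c)), where agree is a, or a + d when
    count_absences is True. theta weights the mismatches: theta = 1 gives
    Jaccard (or simple matching), theta = 0.5 gives the Dice coefficient.
    """
    a, b, c, d = contingency(x, y)
    agree = a + d if count_absences else a
    denom = agree + theta * (b + c)
    return agree / denom if denom else 1.0


if __name__ == "__main__":
    x = [1, 1, 0, 0, 1]
    y = [1, 0, 0, 1, 1]
    print(jaccard(x, y), simple_matching(x, y))
    print(parametric_similarity(x, y, theta=0.5))
```

Tuning a parameter such as `theta` against classification accuracy on labelled data is one plausible reading of "SC optimization", but the optimization method actually used in the dissertation is not described in the abstract.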