基于数据挖掘的红色籽用西瓜经济性状遗传规律的研究
Research on Hereditary Regularity of Economic Characters for Red Seed-using Watermelon Based on Data Mining
【Author】 Fan Jianfeng (樊建峰);
【Supervisor】 Li Shaowen (李绍稳);
【Author information】 Anhui Agricultural University, Pomology, 2007, Master's degree
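【Abstract (translated from the Chinese)】 With the development of digital agriculture, large volumes of data on agricultural germplasm resources have been collected. These data contain rich information on hereditary regularities, yet the knowledge hidden in them has so far not been mined effectively, and the situation is increasingly one of "data explosion but knowledge poverty". How to extract useful information from such massive data, characterize accurately the genetic features of parental resources and inbred lines, analyze their genetic relationships and predict trends, and thereby screen inbred lines and select parents effectively and guide crop genetic breeding and agricultural production management, has become a concern for breeders. Data mining offers an effective way to extract knowledge of interest from massive data. Supported by the Anhui Provincial Natural Science Foundation project "Research on the hereditary regularity of germplasm resource traits and superior combinations of seed-using watermelon", this study obtained data on the main economic traits of red seed-using watermelon inbred lines over 9 consecutive generations through extensive field trials and laboratory analysis. Using this dataset and the algorithm modules provided by the SPSS statistical analysis system and its data mining platform Clementine, the study explored the application of data mining techniques, including decision trees, clustering, rough sets and canonical correlation, to quantitative genetic breeding of red seed-using watermelon. It revealed the relationships among the main economic traits and the hereditary regularity of the inbred lines, providing a theoretical basis for inbred-line purification, parent selection and the development of superior hybrid combinations in variety breeding. The main contents and results are as follows. The thesis reviews the theory, methods and current state of data mining and describes the functions and features of the SPSS statistical analysis system and its data mining platform Clementine; it analyzes the complexity and diversity of agricultural germplasm resource data and discusses the feasibility and importance of applying data mining to crop genetic analysis. Addressing the relative importance of economic traits in variety breeding, path analysis was used to mine the relationships among the main economic traits of red seed-using watermelon, further clarifying the direct and indirect relationships between seed weight per fruit and traits such as thousand-seed weight, seed number and seed volume. Addressing the classification of economic traits and the relationships among factors, principal component analysis and canonical correlation analysis were used to group the economic traits into a yield factor, a seed-weight factor, a growth factor, a seed-production factor and a quality factor. Analysis of the trait indicators within each factor showed significant or highly significant positive correlations among them. Using a decision tree algorithm, knowledge discovery was performed on the dataset of the main economic traits of the inbred lines to explore the mutually constraining relationships among economic traits, and a decision tree model for seed weight per fruit was built. With this model the economic traits of inbred lines can be analyzed quantitatively, providing decision support to breeders for screening superior inbred lines and selecting parents. Average-linkage clustering (UPGMA, Unweighted Pair-Group Method using the Average approach) was applied for genetic analysis of the inbred lines of each generation. It showed that the variation in average genetic distance among the inbred lines declined across generations, and the cophenetic correlation coefficients reached significant and highly significant levels, indicating that the clustering results are highly reliable.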
【Abstract】 With the rapid development of digital agriculture, a great deal of agricultural germplasm resource data has been acquired. These data contain abundant information on hereditary regularity, but so far few techniques have been able to uncover the knowledge hidden in them: there is "too much data, too little knowledge". Breeders are now concerned with how to extract useful information and knowledge from such data in order to guide agricultural management and plant breeding, to understand the genetic characteristics of inbred lines and parental resources, to analyze genetic relationships, to predict trends, and to select inbred lines and parents successfully. Data mining provides an effective method for extracting knowledge of interest from large volumes of data. This research was supported by the Natural Science Foundation of Anhui Province project "Research on the hereditary regularity of germplasm resource traits and superior combinations of seed-using watermelon". A dataset covering 9 consecutive generations of red seed-using watermelon inbred lines was acquired through extensive field experiments and laboratory analysis. SPSS and the decision tree, clustering, rough set, canonical correlation and other techniques available in Clementine were applied to quantitative genetic breeding of red seed-using watermelon. The research revealed the genetic regularity of the inbred lines and the relationships among economic characters, providing a theoretical basis for inbred-line purification, parent selection and the cultivation of superior hybrid combinations during variety breeding. The main contents and results of the research are as follows. The thesis introduces the concepts, methods, current status and development of data mining, and describes the features and functions of Clementine and SPSS. It also analyzes the complexity and diversity of agricultural germplasm resource data and examines the feasibility and importance of data mining in crop genetic analysis. Considering the relative importance of economic characters in variety breeding, the relationships among the main economic traits of red seed-using watermelon were mined. Path analysis was used to interpret the direct and indirect relationships between single-fruit seed weight and thousand-seed weight, seed number, seed volume and other traits. To classify the economic characters of red seed-using watermelon and examine the relationships among factors, principal component analysis and canonical correlation analysis were used to divide the economic characters into a yield factor, a seed-weight factor, a growth factor, a seed-production factor and a quality factor. Analysis of the characters within each factor showed significant or highly significant positive correlations among them. A decision tree algorithm was used to discover knowledge about the mutually constraining relationships among the main economic characters in the inbred-line dataset, and a decision tree model for single-fruit seed weight was constructed. The model supports quantitative analysis of the economic characters of red seed-using watermelon inbred lines and provides decision support for breeders when selecting inbred lines and parents. Genetic analysis of the inbred lines of each generation with the UPGMA method showed that the variation in average genetic distance among the inbred lines had a downward trend, and that the cophenetic correlation coefficients reached significant and highly significant levels, which indicates that the clustering results are reliable.
【Key words】 Data Mining; Red Seed-using Watermelon; Economic Character; Hereditary Regularity;
- 【Online publication contributor】 Anhui Agricultural University 【Online publication year and issue】 2008, Issue 09
- 【Classification number】 S651