Research on the Application of Data Mining Technology in Agricultural Information Services

A Research on Data Mining in Agricultural Information Service

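【Author】 Gong Jian

【Supervisor】 Huang Shixiang

【Author information】 Anhui Agricultural University, Agricultural Economics and Management, 2010, Master's thesis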

【Abstract】 China is a major agricultural country, but because its agricultural foundation is weak, its level of agricultural informatization has long lagged behind that of developed Western countries. Since China joined the WTO, agricultural informatization has entered a phase of rapid development, and agricultural websites have sprung up like mushrooms after rain. These websites hold large amounts of agricultural data, and the volume is still growing exponentially every day. This mass of data is only raw information, however. Much of it is fuzzy, incomplete, or noisy, and it cannot be used directly as knowledge.

Agricultural information is abundant. It includes macro-level, scientific and technical, market, and scientific and cultural information, all of which play important roles in agriculture. By demand and characteristics, it can be further classified as seasonal, regional, comprehensive, time-sensitive, multi-level, and innovative information.

Information is transmitted through carriers, and agricultural information has many of them. At present, the main carriers in China are the mass media: television, radio, newspapers and periodicals, books, and the Internet. Each carrier has its own characteristics, and they complement one another. Compared with traditional carriers such as radio, television, and the telephone, the Internet is a new and rapidly growing medium. Agricultural websites have become an important carrier of agricultural information, transmitting a volume of data that no other carrier can match. They are also one of the main means of conducting agricultural e-commerce, and a platform where farmers and other practitioners exchange experience and learn.

Agricultural websites contain large amounts of raw information on market and soil conditions, crops, practical techniques, and policies and regulations. Applying data mining to this information can address the problem of "information explosion but knowledge poverty" and raise the utilization rate of agricultural information.

Web data mining is the process of applying data mining techniques to Web pages to automatically extract and process their data and discover knowledge in it. Depending on the object mined, it falls into three categories: Web content mining, Web structure mining, and Web usage (user access pattern) mining. Unlike data in a traditional database, data on the Web is not fully structured, and each website organizes its information differently. Web-oriented data mining must therefore first deal with heterogeneous data sources and semi-structured data.

This thesis uses agricultural websites as its mining data source. To manage and use the mined agricultural data more effectively, it proposes a star-schema agricultural data warehouse model. It also presents a model system for mining agricultural websites.

Automatic data extraction from agricultural websites is one of the main difficulties of the study. After analyzing the structure of agricultural Web pages, the thesis explains the principle of data extraction in terms of HTTP characteristics and designs an extraction algorithm based on regular expressions. An extraction experiment on the vegetable price pages of the Hefei Zhougudui Wholesale Market achieved automatic batch extraction of vegetable price data. A short-term time-series forecasting model was then applied to the extracted data for predictive analysis.
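The abstract mentions a star-structured agricultural data warehouse but does not list its tables. The sketch below uses Python's built-in `sqlite3`. It assumes a single fact table of daily vegetable prices, surrounded by date, product and market dimensions; all table and column names are hypothetical. It shows how such a schema and a typical star join could look.

```python
import sqlite3

# Hypothetical star schema: a central fact table of daily vegetable prices
# joined to three dimension tables (date, product, market). All names are
# illustrative assumptions, not taken from the thesis.
SCHEMA = """
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, day  TEXT NOT NULL UNIQUE);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE dim_market  (market_id  INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE fact_price (
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    market_id  INTEGER NOT NULL REFERENCES dim_market(market_id),
    min_price  REAL,
    max_price  REAL,
    avg_price  REAL,
    PRIMARY KEY (date_id, product_id, market_id)
);
"""


def build_warehouse(path: str = ":memory:") -> sqlite3.Connection:
    """Create the star schema in a SQLite database (in memory by default)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def dim_key(conn, table, id_col, value_col, value):
    """Return the surrogate key of a dimension member, inserting it if new.

    Table and column names come from constants in this module, never from
    scraped page data, so formatting them into the SQL is safe here.
    """
    conn.execute(f"INSERT OR IGNORE INTO {table} ({value_col}) VALUES (?)", (value,))
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE {value_col} = ?", (value,)
    ).fetchone()
    return row[0]


def load_price(conn, day, product, market, min_price, max_price, avg_price):
    """Insert one fact row after resolving each dimension to its key."""
    d = dim_key(conn, "dim_date", "date_id", "day", day)
    p = dim_key(conn, "dim_product", "product_id", "name", product)
    m = dim_key(conn, "dim_market", "market_id", "name", market)
    conn.execute(
        "INSERT OR REPLACE INTO fact_price VALUES (?, ?, ?, ?, ?, ?)",
        (d, p, m, min_price, max_price, avg_price),
    )


def avg_price_by_product(conn, market):
    """Star join: filter on the market dimension, group by the product dimension."""
    return conn.execute(
        """
        SELECT p.name, AVG(f.avg_price)
        FROM fact_price AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_market  AS m ON m.market_id  = f.market_id
        WHERE m.name = ?
        GROUP BY p.name
        ORDER BY p.name
        """,
        (market,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_warehouse()
    # Made-up sample rows, for illustration only.
    load_price(conn, "2010-03-01", "tomato", "Zhougudui", 1.0, 1.5, 1.25)
    load_price(conn, "2010-03-02", "tomato", "Zhougudui", 1.5, 2.0, 1.75)
    load_price(conn, "2010-03-01", "cabbage", "Zhougudui", 0.4, 0.6, 0.5)
    print(avg_price_by_product(conn, "Zhougudui"))  # [('cabbage', 0.5), ('tomato', 1.5)]
```

Each dimension lives in its own table. New attributes, such as a product category or a market's region, can therefore be added to a dimension without changing the fact table.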
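The abstract says the extraction principle is explained in terms of HTTP characteristics, but it does not detail which ones. One characteristic that matters in practice for Chinese-language sites is the character set declared in the `Content-Type` response header. The minimal sketch below uses only the Python standard library to fetch pages in batch and decode them according to that header. Several details are assumptions, not taken from the thesis:

- the paging URL `PAGE_URL`;
- the GB18030 fallback;
- the helper names.

```python
from __future__ import annotations

import codecs
from collections.abc import Iterable
from email.message import Message
from urllib.request import Request, urlopen

# Hypothetical paging pattern for a price-listing site; substitute the real one.
PAGE_URL = "http://example.com/prices?page={page}"


def charset_from_content_type(content_type: str, default: str = "gb18030") -> str:
    """Read the charset parameter from an HTTP Content-Type header value.

    Falls back to `default` when none is declared; defaulting to a Chinese
    encoding is an assumption about the target sites.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_body(body: bytes, content_type: str) -> str:
    """Decode a response body using the charset the server declared."""
    charset = charset_from_content_type(content_type)
    if charset in ("gb2312", "gbk"):
        # GB18030 is a superset of both and also decodes the extra characters
        # that pages labelled GB2312 sometimes contain.
        charset = "gb18030"
    try:
        codecs.lookup(charset)
    except LookupError:
        charset = "gb18030"  # unknown label: use the assumed default
    return body.decode(charset, errors="replace")


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Send one HTTP GET and return the decoded HTML."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Type", ""))


def fetch_batch(pages: Iterable[int]) -> dict[int, str]:
    """Fetch several listing pages in sequence as input for batch extraction."""
    return {n: fetch_page(PAGE_URL.format(page=n)) for n in pages}
```

Decoding with the wrong charset does not raise an error on most GB-encoded pages. It simply produces garbled product names, so reading the header explicitly is cheaper than debugging mismatched text later.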
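The regular-expression extraction step can be sketched as follows. The page layout assumed here is hypothetical and is not the actual layout of the Zhougudui market pages:

- one `<tr>` per vegetable, with cells for name, minimum, maximum and average price;
- a `YYYY-M-D` date somewhere on the page.

The patterns would need adjusting to the real markup.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed (hypothetical) page layout: a date somewhere on the page and one
# table row per vegetable with name, minimum, maximum and average price.
DATE_RE = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")
ROW_RE = re.compile(
    r"<tr[^>]*>\s*"
    r"<td[^>]*>\s*(?P<name>[^<]+?)\s*</td>\s*"
    r"<td[^>]*>\s*(?P<low>\d+(?:\.\d+)?)\s*</td>\s*"
    r"<td[^>]*>\s*(?P<high>\d+(?:\.\d+)?)\s*</td>\s*"
    r"<td[^>]*>\s*(?P<avg>\d+(?:\.\d+)?)\s*</td>\s*"
    r"</tr>",
    re.IGNORECASE,
)


@dataclass
class PriceRecord:
    day: str      # ISO date, '' if the page shows none
    product: str
    low: float    # minimum price
    high: float   # maximum price
    avg: float    # average price


def extract_prices(html: str) -> list[PriceRecord]:
    """Pull every price row out of one listing page with regular expressions.

    Header rows (<th> cells) and rows with non-numeric cells do not match
    ROW_RE and are skipped.
    """
    m = DATE_RE.search(html)
    day = f"{int(m[1]):04d}-{int(m[2]):02d}-{int(m[3]):02d}" if m else ""
    return [
        PriceRecord(day, r["name"], float(r["low"]), float(r["high"]), float(r["avg"]))
        for r in ROW_RE.finditer(html)
    ]


if __name__ == "__main__":
    sample = """
    <p>日期：2010-3-1</p>
    <table>
      <tr><th>品名</th><th>最低价</th><th>最高价</th><th>平均价</th></tr>
      <tr><td>白菜</td><td>0.40</td><td>0.60</td><td>0.50</td></tr>
      <tr class="odd"><td> 番茄 </td><td>1.00</td><td>1.60</td><td>1.30</td></tr>
    </table>
    """
    for rec in extract_prices(sample):
        print(rec)
```

Regular expressions suit this kind of page because the price table is generated from a template and changes little from day to day. The trade-off is that any change to the markup silently breaks the pattern. An empty result is therefore better treated as an extraction error than as a day without prices.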
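The abstract does not name the short-term time-series model used for forecasting. As a stand-in, here is a minimal sketch of simple exponential smoothing. The smoothing constant is chosen by a grid search over the in-sample one-step-ahead squared error. The choice of model, the grid, and the function names are assumptions for illustration.

```python
from __future__ import annotations


def ses_level(series: list[float], alpha: float) -> float:
    """Final smoothed level of simple exponential smoothing.

    s_1 = y_1,  s_t = alpha * y_t + (1 - alpha) * s_{t-1}
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def one_step_sse(series: list[float], alpha: float) -> float:
    """Sum of squared one-step-ahead errors, used to choose alpha."""
    level = series[0]
    sse = 0.0
    for y in series[1:]:
        sse += (y - level) ** 2  # the forecast for y was the previous level
        level = alpha * y + (1 - alpha) * level
    return sse


def forecast(series: list[float], horizon: int = 3, grid: list[float] | None = None):
    """Pick alpha from a grid by in-sample SSE; return (alpha, h-step forecast).

    SES has no trend term, so every future step gets the same value.
    """
    if not series:
        raise ValueError("series must not be empty")
    grid = grid or [i / 10 for i in range(1, 11)]
    alpha = min(grid, key=lambda a: one_step_sse(series, a))
    return alpha, [ses_level(series, alpha)] * horizon
```

Simple exponential smoothing has no trend or seasonal term, so it is only reasonable for very short horizons. A model such as Holt-Winters would be needed to capture the seasonal swings typical of vegetable prices.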
