基于垂直搜索引擎的农业信息推荐关键技术研究

Research on Key Technology of Agricultural Information Recommendation Based on Vertical Search Engine

【Author】 Li Na (李娜)

【Supervisor】 Piao Zailin (朴在林)

【Author information】 Shenyang Agricultural University, Agricultural Informatization Technology, 2016, PhD

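【Summary】 The Internet has become an important channel through which people obtain information. Faced with the vast volume of information online, personalized information recommendation is the direction in which information services are heading. Meanwhile, governments and departments at all levels have invested heavily in information platforms covering agricultural science and technology, animal husbandry, aquaculture, land reclamation, agricultural machinery, and other fields. Yet because rural areas lack information infrastructure, and because most agricultural producers and operators have limited ability to analyze and process information, this information, which is of real guiding value to agricultural production, cannot reach them in a targeted way. Relying only on mass media, agricultural information agencies, and word of mouth, people working in agriculture find it hard to obtain personalized agricultural information services. The goal of this research is to collect, analyze, and process the large amount of agriculture-related information scattered across the Internet, to capture the intentions and needs of agricultural users accurately, and to deliver the information they need proactively and precisely, thereby strengthening the guiding role of agricultural information in production and increasing its social and economic benefits.

Existing recommender systems face three main problems when applied to agriculture: first, they are not sufficiently focused on agricultural information; second, they suffer from overfitting of agricultural users' interests and from cold start; third, they do not classify users or personalize recommendations according to the distinctive attributes of agriculture. To address these problems, this research studies in depth the key technologies behind three essential components of an agricultural information recommender system: the data source, the user interest model, and the recommendation algorithm. These technologies include agricultural information collection and analysis, user interest model construction, recommendation model construction and algorithm improvement, and a software autonomous decision-making mechanism, which together provide technical support for personalized agricultural information recommendation services. The main research work of the dissertation is summarized as follows:

1. After a comparative study of the functions and retrieval performance of search engines, a Nutch-based agricultural vertical search engine was designed to collect, filter, and analyze agricultural information on the Internet, and an agricultural information recommendation resource base was built. To address the characteristics and shortcomings of vertical search in agriculture, the engine's word segmentation module was improved using character-tagging word segmentation together with new-word recognition that draws on a corpus of agricultural terminology. Experiments show that, compared with other segmentation systems, this module segments agricultural text more accurately; combined with quality control of the seed URLs, it improves the precision and depth with which agriculture-related pages are crawled.

2. Agricultural web resources express spatial attributes inconsistently and often lack explicit spatial expressions. To address this, methods for extracting spatial attribute information in the agricultural domain were studied, and a method was proposed that identifies and extracts agricultural spatial attributes with the help of an ontology of administrative divisions. An explicit spatial attribute extraction algorithm and an implicit spatial attribute extraction algorithm based on a general-purpose search engine were designed, and a chi-square test was used to resolve cases where the implicit method returns more than one candidate spatial attribute. The two algorithms effectively annotate the spatial attributes in web page content and extract the regional features of users and items, supplying the information needed to build regional labels in the agricultural user interest model and to implement region-based personalized agricultural information recommendation.

3. A questionnaire survey was used to study the agricultural information needs of people working in agriculture and the ways they obtain information. Because existing agricultural information services cannot deliver personalized service, a model that comprehensively reflects agricultural users' interests, ATBUIM, was built. Explicit and implicit sources of user information were selected, and methods and weights for estimating user interest from user background and browsing behavior were studied. A Bayesian network interest model for agricultural users was then constructed on the basis of mutual information and the classification labels of agricultural resources: the mutual information between agricultural labels serves as the node conditional probabilities, and structure learning is used to update and optimize the model. The model weights the user's interest information to reflect the share of each type of information in model construction, so it reflects the user's areas of interest more comprehensively and accurately and lays the groundwork for precise, effective agricultural information recommendation algorithms.

4. Three recommendation algorithms were analyzed and compared, methods and strategies were proposed for the cold-start and data-sparsity problems of traditional recommendation algorithms, and an efficient combined recommendation model was designed. Adding feature labels to improve the similarity computation was proposed, which solves the problem that traditional content-based recommendation cannot serve new users. For the data sparsity in collaborative filtering, a collaborative filtering algorithm was proposed that combines the ratings and feature factors of agricultural users with the ratings and feature factors of agricultural items. In this algorithm, the predicted rating for the target user and the predicted rating for the target item are each derived from nearest neighbors selected by a combination of rating similarity and feature similarity, and the two predictions are weighted together to give the final recommendation. Experiments show that, at the same level of data sparsity, the improved collaborative filtering algorithm has a smaller mean absolute error and better recommendation accuracy. To overcome the limitations of any single recommendation algorithm, a combined recommendation algorithm based on functional networks was proposed and a combined recommendation model was built. Experiments show that the ratings predicted by the combined algorithm are closer to users' actual ratings of items.

5. Information recommendation services in new network environments need to adjust their own structure, state, and behavior proactively. To meet this need, a software autonomous decision-making mechanism for the agricultural domain was proposed. Domain knowledge, messages, and service information in agricultural web content were modeled on the basis of an ontology, and an agricultural-knowledge-oriented decision model, AKDM, was designed. AKDM converts environmental information into sets of beliefs, desires, and intentions and uses the decision-inference relations among belief, desire, and intention to guide the Agent in carrying out agricultural information recommendation. Analysis and experiments show that, under the constraints of agricultural domain knowledge and rules, the mechanism realizes an autonomous decision process and completes the recommendation of agricultural information.

In summary, the research in this dissertation on effective search of agricultural information on the Internet, construction of the agricultural user interest model, precise agricultural information recommendation algorithms, and the software autonomous decision-making mechanism can provide technical support for personalized information recommendation services in agriculture.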
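Item 3 relies on the mutual information between agricultural classification labels. As a minimal sketch of that quantity only, the function below computes the standard mutual information, in bits, between two discrete variables from paired observations. The function name and the idea of observing label pairs per browsed page are illustrative assumptions; the record does not say how the dissertation estimates mutual information or how it maps the result onto the node conditional probabilities of the Bayesian network.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information (in bits) between two discrete variables.

    `pairs` is a list of (x, y) observations. In this hypothetical setting,
    x and y could record whether a browsed page carries label A and label B.
    """
    n = len(pairs)
    joint = Counter(pairs)                 # counts of each (x, y) combination
    px = Counter(x for x, _ in pairs)      # marginal counts of x
    py = Counter(y for _, y in pairs)      # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Only observed combinations are summed, so p_xy > 0 here.
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi
```

Two labels that always appear together score high, while labels that co-occur no more often than chance score zero, which is what makes the measure a plausible signal of how strongly two interest labels are linked.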

【Abstract】 The Internet has become an important channel for obtaining information in daily life. Given the sheer volume of information online, personalized information recommendation is the direction in which information services are developing. Governments and departments at various levels have invested substantial resources in information platforms for agricultural technology, animal husbandry, fisheries, agricultural machinery, and related fields. However, because rural areas lack information infrastructure and agricultural producers have limited ability to analyze and process information, information that matters for agricultural production does not reach the producers and marketers who need it. Relying only on mass media, agricultural information organizations, and word of mouth, people working in agriculture find it difficult to obtain personalized information services. The objective of this study is to collect, analyze, and process the large amount of agriculture-related information distributed across the Internet, to understand the intentions and needs of users, and to deliver the information they need accurately, thereby strengthening the guiding role of agricultural information in production and increasing its social and economic benefits.

Existing recommender systems face three main problems when applied to agriculture. First, they are not sufficiently focused on agricultural information. Second, they suffer from overfitting of agricultural users' interests and from cold start. Third, they do not provide personalized classification and recommendation based on the characteristics of agriculture. This dissertation therefore studies the key technologies of the data source, the user interest model, and the recommendation algorithm in an agricultural information recommender system. These include agricultural information collection and analysis, user interest model construction, recommendation model construction, recommendation algorithm improvement, and a software autonomous decision-making mechanism, which together provide technical support for personalized agricultural information recommendation services. The main research work of the dissertation is summarized as follows:

1. Based on a comparison of the functions and search results of search engines, an agricultural vertical search engine built on Nutch is designed to collect, filter, and analyze agricultural information on the Internet, and an agricultural information recommendation database is constructed. In view of the characteristics and shortcomings of vertical search in agriculture, the search engine's segmentation module is improved with character-tagging word segmentation and a new-word recognition method that refers to an agricultural terminology corpus. Experimental results show that, compared with other word segmentation systems, this module segments agricultural text more accurately. Combined with quality control of the seed URLs, this improves the precision and depth with which the engine crawls agriculture-related web pages.

2. To address the inconsistent representation and missing explicit expression of spatial attributes in agricultural network resources, a method is proposed that identifies and extracts agricultural spatial attributes using an ontology base of administrative divisions. For the different types of attribute information, an explicit spatial attribute extraction algorithm and an implicit spatial attribute extraction algorithm based on a general-purpose search engine are designed. Both can effectively label the spatial attribute information in text and extract the regional features of users and items. They supply the information needed to build regional labels in the interest model of agricultural users and form the basis for personalized agricultural information recommendation based on regional characteristics.

3. A questionnaire survey was used to study the demand for agricultural information among people working in agriculture and the ways they access it. Because existing agricultural information services cannot provide personalized service, the ATBUIM model, which comprehensively reflects the interests of agricultural users, is constructed. Sources of explicit and implicit user information are selected, and methods and weights for estimating user interest from user background and browsing behavior are studied. On this basis, a Bayesian network user interest model is constructed from mutual information and agricultural resource classification labels; it uses the mutual information between agricultural labels as the node conditional probabilities and applies structure learning to update and optimize the model. The model weights the different channels of user interest information to reflect the share of each type of information in model construction, so it reflects the fields that most interest the user more comprehensively and accurately and lays the foundation for accurate, effective agricultural information recommendation algorithms.

4. Three kinds of recommendation algorithms are analyzed and compared. Methods and strategies are proposed for the cold-start and data-sparsity problems of traditional recommendation algorithms, and an efficient combined recommendation algorithm is designed. A method that improves the similarity computation by adding feature labels is developed, which makes recommendation possible for new users. For the data sparsity problem in collaborative filtering, a collaborative filtering algorithm is proposed that combines the ratings and feature factors of agricultural users with the ratings and feature factors of agricultural items. In this algorithm, the predicted ratings for the target user and for the target item are both computed from nearest neighbors chosen by combining rating similarity and feature similarity, and the final recommendation is obtained by weighting the two predicted ratings. Experiments show that, at the same level of data sparsity, the improved collaborative filtering algorithm has a smaller mean absolute error and better recommendation accuracy. To address the shortcomings of single recommendation algorithms, a combined recommendation algorithm based on functional networks is proposed and a combined recommendation model is constructed. Experiments show that the ratings predicted by the combined algorithm are closer to users' actual ratings of items.

5. Information recommendation services in new network environments need to adjust their own structure, state, and behavior autonomously. To meet this need, an agriculture-oriented software autonomous decision-making mechanism is developed. Domain knowledge, messages, and service information in agricultural network information are modeled on the basis of an ontology, and the AKDM decision model, oriented to agricultural domain knowledge, is designed. AKDM turns environmental information into sets of beliefs, desires, and intentions and uses the decision-inference relations among the three to guide the Agent in carrying out agricultural information recommendation. Analysis and experiments show that, under the constraints of agricultural knowledge and rules, the mechanism realizes autonomous decision-making and completes the recommendation of agricultural information.

In summary, the research in this dissertation on effective search of agricultural information on the Internet, construction of the user interest model, precise personalized recommendation algorithms for agricultural information, and the software autonomous decision-making mechanism can provide technical support for personalized information recommendation services in agriculture.
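The collaborative filtering idea in item 4, choosing neighbors by a blend of rating similarity and feature similarity and scoring the result by mean absolute error, can be illustrated with a small sketch. This is not the dissertation's algorithm: the cosine measure, the blending weight `alpha`, the neighborhood size `k`, and all function names are assumptions made for illustration, and only the user-side prediction is shown (the abstract also describes an item-side prediction that is weighted together with it).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is empty or all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rating_similarity(ra, rb):
    """Cosine similarity restricted to the items both users have rated."""
    common = sorted(set(ra) & set(rb))
    return cosine([ra[i] for i in common], [rb[i] for i in common])


def predict(user, item, ratings, features, alpha=0.5, k=2):
    """Predict `user`'s rating of `item` from the k most similar users who rated it.

    Similarity = alpha * rating similarity + (1 - alpha) * feature similarity
    (assumed blend). Ratings and feature values are assumed non-negative.
    """
    scored = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = (alpha * rating_similarity(ratings[user], other_ratings)
               + (1 - alpha) * cosine(features[user], features[other]))
        scored.append((sim, other_ratings[item]))
    top = sorted(scored, reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total <= 0:
        return None  # no usable neighbor
    return sum(sim * rating for sim, rating in top) / total


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

Because the feature term does not depend on rating history, a new user with no ratings still gets neighbors through feature similarity alone, which is one way the cold-start case described above can be handled.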
