Research and Analysis on Link Prediction in Social Networks

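【Author】 Du Peng (杜鹏)

【Advisor】 Cheng Weihu (程维虎)

【Author information】 Beijing University of Technology, Applied Statistics (professional degree), 2021, Master's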

【Abstract】 A social network is a specific form of complex network, characterized by self-organization, self-similarity, and self-attraction. As technology advances, the information encountered in daily life grows ever richer, which yields large volumes of network data and drives demand for research on social networks. Link prediction addresses this demand by offering a practical way to make predictions about social networks: it forecasts new links from the relationships between nodes and links in the existing network topology. In theory, link prediction can greatly simplify the study of complex network evolution, since the direction of evolution can be anticipated from the predicted links. In practice, it is useful in many settings, such as recommending potential friends from an existing user base or making personalized recommendations from existing user preferences. As an important tool of the big data era, link prediction is therefore of considerable research significance for data mining and practical applications. This thesis performs link prediction on a YouTube user data set. It first gives a comprehensive introduction to existing link prediction methods and then proposes an improved algorithm based on Ada-Boost. Traditional Ada-Boost iterates over weak classifiers, with each later weak classifier correcting the errors of the earlier ones, so that together they form a strong classifier. However, traditional Ada-Boost cannot identify noisy data and has no effective remedy when the data are skewed. This thesis therefore proposes an improved Ada-Boost algorithm that converts the error rate into positive and negative errors and adjusts the weights so that the error-mean ratio stays within a fixed interval, which effectively mitigates the loss of accuracy and stability caused by data skew. The study confirms that the improved Ada-Boost algorithm offers substantially better prediction accuracy and stability than traditional Ada-Boost. In addition, the thesis analyzes in detail the prediction methods used by the weak classifiers in Ada-Boost, which fall into two main classes: proximity-based methods and centrality-based methods. The former focus mainly on local information in the network, while the latter lean toward global information and predict from the network topology.
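
As a rough illustration of what a proximity-based (local-information) predictor can look like, the Python sketch below scores unlinked node pairs by common neighbors and by the Jaccard coefficient. The abstract does not name the specific indices used in the thesis; these two are standard examples chosen here as assumptions, and the toy graph and all function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def rank_candidate_links(edges, score):
    """Rank all currently unlinked node pairs by a proximity score, highest first."""
    adj = build_adjacency(edges)
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)


# Hypothetical toy friendship graph, not taken from the YouTube data set.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
toy_adj = build_adjacency(toy_edges)
ranking = rank_candidate_links(toy_edges, common_neighbors)
```

In a boosting setup such as the one the abstract describes, scores like these could serve as the features on which weak classifiers are trained; that pairing is only a plausible reading of the abstract, not a confirmed detail of the thesis.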

  • 【Classification number】O157.5