节点文献
面向多类信息网络的链路预测方法研究
Research on Link Prediction Methods for Multiple Types of Information Networks
【作者】 李冬;
【导师】 申德荣;
【作者基本信息】 东北大学 , 计算机软件与理论, 2020, 博士
【摘要】 在现实世界中,实体连同关联关系构成了一种网络关系结构即信息网络。这里的实体可以是商品、文章、会议、人、图片、电影或者导演等个体,关联关系可以是购买、发表、观看、出演或者指导等关联。链路预测是指通过已知的网络节点以及网络结构等信息来预测网络中尚未产生连边的两个节点之间产生链路或产生某种符号链路的可能性。这些被预测出的链路可以是实际存在但未被观察到的链路,也可以是未来可能会出现的链路。链路预测已经成为数据挖掘领域中一个重要的研究方向,它可以更好地帮助用户理解网络的结构生成和演化规律。尽管当前的链路预测技术有很多,但面对信息网络的异构性、不完备性及互补性等特点,仍存在不足:首先,这些技术通常将所有实体和关联关系都同等对待或独立分析,而忽略了不同实体类型及关联关系之间的相关性,且缺乏对多种特征的有效融合。其次,现有的链路预测方法通常只关注已标注符号的链路来提取特征,而大多数信息网络中已标注的链路较少,缺乏足够的样本数据进行模型训练。第三,现有技术主要依赖于社交链路的特征,而忽略了用户的行为特征。另外,当前的链路预测技术大多倾向于单一任务,而忽略多个任务之间的协调促进作用。针对上述问题,本文面向四种信息网络(包括:异构信息网络、不完备符号网络、社交推荐网络以及多信息网络),综合运用图论、信息论、机器学习等理论和技术,分别提出了相应的链路预测模型和算法。本文的主要创新点如下:(1)针对异构信息网络,本文提出了一种基于层次化混合特征图(Hierarchical Hybrid Feature Graph,简称HHFG)的链路预测方法。首先,提出了一种层次化混合特征图模型——HHFG。不同于传统的链路预测技术,HHFG利用实体特征和边特征来区分不同类型的实体及关联关系,并将异构信息网络的拓扑特征、语义特征和时序特征进行层次化表示,对异构信息网络所表达的丰富特征进行有效组织。其次,提出了一种基于HHFG的链路预测算法:一方面,基于混合特征在HHFG上做随机游走,通过计算随机游走概率来评估节点之间产生链路的可能性;另一方面,采用梯度下降法学习特征权重和转移系数等参数,有效地保证了链路预测的准确性。最后,通过与其它链路预测方法进行对比实验,验证了基于HHFG的链路预测方法的可行性和有效性。(2)针对不完备符号网络,本文提出了一种基于非标注边的链路预测方法(Unlabeled Ties based Link Prediction,简称UTLP),用于预测非标注边的符号。首先,与传统符号链路预测方法不同,UTLP模型基于社交感知理论来进行预测,利用该理论可同时获取已标注边和非标注边的特征。其次,提出了一种基于迁移学习的链路预测算法,对源网络的训练样本采用加权的方式加以区分,有效解决了目标样本数据不足的问题。最后,将Epinions、Slashdot和Wiki-RfA作为实验数据集进行实验,实验结果表明,与传统方法相比,UTLP模型具有较高的预测准确度。(3)针对社交推荐网络,本文提出了一种符号预测与行为预测相结合的链路预测方法(Sign Prediction with Behavior Prediction,简称 SPBP)。首先,采用基于深度学习的节点嵌入技术来学习节点的嵌入表示,在此基础上提出了两种相关性量化模型,分别计算用户之间的社交相关性和行为相关性。其次,提出了一种符号预测与行为预测相结合的迭代式预测算法,充分利用符号预测与行为预测的相互促进作用来提高预测的准确性:一方面,针对符号预测问题,基于社交心理学理论将已标注链路和非标注链路的特征相结合,提出了一种基于迁移学习的符号预测算法;另一方面,针对行为预测问题,充分考虑了用户间的社交特征——正链路和负链路,为行为预测提供了更详实的证据。最后,通过对真实的社交推荐网络进行实验可知,与传统方法相比,SPBP模型具有较高的预测准确度。(4)针对多信息网络,本文提出了一种基于多任务协调的信息网络融合模型(Information Networks Fusion Model based on Multi-task Coordination,简称MC-INFM)。与传统模型不同,MC-INFM模型将融合问题转化为概率推理问题,通过执行多个任务(包括:实体识别、链路预测和关系匹配)来推断融合后的最终结果。首先,分别定义了任务内特征和任务间特征,并将这些特征建模为因子图。其次,使用条件随机场(Conditional Random Field,简称CRF)模型来学习每个任务内特征和任务间特征的权重。第三,提出了一种基于多任务协调的迭代推理算法,通过执行最大概率推理来迭代地推断这些任务的结果。最后,通过在真实数据集上的实验,验证了本文提出的MC-INFM模型的有效性。
【Abstract】 Entities in real world are often interconnected,forming information networks.Here entities can be items,papers,meetings,authors,pictures,movies,directors etc.,and relationships can be purchase,publication,watching,acting,directing etc.Link prediction is a necessary technique to predict links among nodes or to predict signs of unlabeled links via the observed nodes and network structure in information networks.These predicted links can be links that actually exist but have not been observed,or the links that might appear in the future.Link prediction has become an important research area in the field of data mining.It is useful to make users understand the generation and evolution of networks better.Although there are a lot of studies on link prediction recently,they are still insufficient due to the heterogeneity,incompleteness and complementarity of information networks.Firstly,existing link prediction techniques regard all nodes and relationships equally or analyze nodes and relationships independently.The types of entities or relationships are often ignored.Current techniques lack the effective fusion of multiple features.Secondly,most of existing methods for link prediction in signed networks tend to rely on the features only from labeled ties.However,in incomplete signed networks,little information about labeled ties is available.Often,there is not enough sample data to train by machine learning models.Thirdly,existing methods mainly rely on the features from social links but ignore users’ behavior.Moreover,traditional work defines the features only to capture the dependencies between a single predicted result and the evidence within one task.The interaction among multiple tasks is often ignored.Therefore,as for four types of information networks(i.e.heterogeneous information networks,incomplete signed networks,social recommendation networks and multiple information networks),based on graph theory,information theory,machine learning etc.we present corresponding link prediction models and algorithms respectively.We make the following contributions:(1)For heterogeneous information networks,we present a Hierarchical Hybrid Feature Graph(HHFG)based link prediction method.Firstly,we present a HHFG model.Unlike traditional link prediction techniques,HHFG model distinguishes different entities and relationships by capturing both entity features and tie features.The model fully considers structure features,semantic features and time features and represents them hierarchically.Abundant features from information networks are organized effectively.Secondly,a HHFG-based link prediction algorithm is proposed.On one hand,it performs random walk on HHFG based on the hybrid features.We estimate possibilities of forming links among nodes by calculating the probability of random walk.On the other hand,the parameters such as feature weights and transition coefficients are learned by gradient descent method to effectively guarantee the accuracy of link prediction.Finally,the experiments demonstrate the feasibility and effectiveness of our HHFG-based link prediction method by comparing it with other link prediction methods.(2)For incomplete signed networks,we present an Unlabeled Ties based Link Prediction(UTLP)method,in order to predict the signs of unlabeled ties.Firstly,unlike traditional link prediction methods,based on social awareness theories,UTLP model can utilize the features of both labeled ties and unlabeled ties.Secondly,we adopt the transfer learning algorithm with instance weighting to utilize more useful training instances in the source network to train the model and predict the signs of links,which can effectively make up for the incompleteness of target samples.Finally,we conduct experimental studies based upon Epinions,Slashdot and Wiki-RfA.Experiments demonstrate the effectiveness and the efficiency of our proposed method compared with traditional methods.(3)For social recommendation networks,we present a model by integrating Sign Prediction with Behavior Prediction(SPBP).Firstly,we adopt deep learning-based embedding technique to extract users’ representations.Then we use these representations and users’ behavior to estimate social correlation and behavioral correlation between users respectively.Secondly,we propose an iterative prediction algorithm by integrating Sign Prediction with Behavior Prediction.It improves the accuracy of prediction by taking full advantage of interaction between sign prediction and behavior prediction.On one hand,for sign prediction,we utilize the features from both labeled links and unlabeled links to extract social psychology theories-based features.We propose a sign prediction algorithm based on transfer learning.On the other hand,for behavior prediction,we take into account the social features of users and distinguish between positive links and negative links which can provide more informative evidence for behavior prediction.Finally,extensive experiments conducted on real-world social recommendation networks(i.e.Epinions,Slashdot and Wiki-RfA)demonstrate that SPBP can effectively solve both the sign prediction problem and the behavior prediction problem.(4)For multiple information networks,we present an Information Networks Fusion Model based on Multi-task Coordination(MC-INFM).Different from traditional models,MC-INFM casts the fusion problem as a probabilistic inference problem,and collectively performs multiple tasks(including entity resolution,link prediction and relation matching)to infer the final result of fusion.Firstly,we define the intra-features and the inter-features respectively and model them as factor graphs.Secondly,we use Conditional Random Field(CRF)to learn the weights of intra-features and inter-features.Thirdly,we propose an iterative inference algorithm based on multi-task coordination.We infer the results of these tasks simultaneously by performing the maximum probabilistic inference.Finally,experiments demonstrate the effectiveness of our proposed MC-INFM model.