Travel Behavior Analysis and Similarity Measurement Based on Heterogeneous Information Networks

【Author】 Chen Jun (陈君)

【Advisor】 Tang Lei (唐蕾)

【Author information】 Chang'an University, Computer System Architecture, 2018, Master's thesis

【Abstract】 In recent years, urban computing and intelligent transportation have become research hotspots in the transportation field. Analyzing and modeling users' travel behavior is the foundation of this work and supports many value-added studies, such as travel demand analysis, traffic flow forecasting, and group service recommendation. Existing studies mainly mine a limited set of independent travel features from GPS data; they ignore the semantic links between features and yield models that are hard to extend. It is therefore important to build a travel model that fully depicts users' travel behavior and captures the connections between travel features. This thesis reviews the current state of research on user travel behavior, extracts objects of different types (travel time, location, and service) from GPS data, analyzes the relationship types and semantic information among these objects, and constructs a heterogeneous travel network model. On this basis, it defines travel meta-paths to represent travel features and their relationships and to analyze their similarity, and applies machine learning methods to automatically discover users with similar travel features. Experimental results show that the proposed composite meta-path algorithm performs better than conventional algorithms. The main contributions are as follows:

(1) Considering that user travel is location-sensitive, the semantic information of stay locations is mined, and user travel feature data is analyzed and extracted.

(2) Heterogeneous information network theory is studied and a heterogeneous travel network model is constructed. The model extracts objects of different types, including travel time, location, and service, from GPS data and analyzes the relationship types among them. Travel meta-paths are defined to represent the semantic relationships among objects under a star network schema and to analyze travel features.

(3) Feature similarity is defined over meta-paths. PathSim, SimRank, and random walk are used to construct similarity vectors and to determine the measurement method. Linear regression then automatically fuses the similarity results obtained under different meta-paths to identify similar users.

(4) 10-fold cross-validation is used to calibrate similarity on the Geolife sample dataset. From a dataset of 1000 user pairs, 80% of the data is fed to the classifier for training. Using the user pairs in the test set, the similarity algorithms under single meta-paths and the similarity algorithm under composite multiple meta-paths are evaluated by recall, precision, and AUC (area under the ROC curve). On every metric, the composite meta-path similarity algorithm performs far better than the single meta-path similarity algorithms.
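As an illustration of the meta-path similarity named in the abstract, the following is a minimal sketch of PathSim along a symmetric user-location-user meta-path. It is not the thesis's implementation: the visit records, the location labels, and the helper names `build_counts`, `path_count`, and `pathsim` are hypothetical, and the choice of a user-location-user path is an assumption made for the example. PathSim scores a pair (x, y) as twice the number of path instances between x and y, divided by the sum of the path instances from x to itself and from y to itself.

```python
from collections import defaultdict

# Hypothetical toy data: one (user, semantic location) record per visit.
visits = [
    ("u1", "office"), ("u1", "office"), ("u1", "gym"),
    ("u2", "office"), ("u2", "gym"), ("u2", "gym"),
    ("u3", "mall"),
]


def build_counts(records):
    """Count visits per user and location (the user-location adjacency)."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, location in records:
        counts[user][location] += 1
    # Convert to plain dicts so later lookups never create new keys.
    return {user: dict(locs) for user, locs in counts.items()}


def path_count(counts, x, y):
    """Number of user-location-user path instances between x and y."""
    visits_x = counts.get(x, {})
    visits_y = counts.get(y, {})
    return sum(n * visits_y.get(loc, 0) for loc, n in visits_x.items())


def pathsim(counts, x, y):
    """PathSim score of x and y under the symmetric meta-path."""
    denominator = path_count(counts, x, x) + path_count(counts, y, y)
    if denominator == 0:
        return 0.0
    return 2 * path_count(counts, x, y) / denominator


counts = build_counts(visits)
for pair in [("u1", "u2"), ("u1", "u3"), ("u1", "u1")]:
    print(pair, pathsim(counts, *pair))
```

In this toy data, u1 and u2 share both locations and score high, while u1 and u3 share none and score zero. Scores computed under several meta-paths (for example, paths through time or service objects) could then be stacked into a feature vector per user pair and fused by a regression model, which is the role the abstract assigns to linear regression.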

  • 【Online publication contributor】 Chang'an University
  • 【Online publication issue】 2019, No. 01
  • 【Classification codes】 TP391.3; U495
  • 【Times cited】 1
  • 【Downloads】 133