A Fuzzy Collaborative Filtering Recommendation Algorithm Based on Time Factor
【Author】 Chen Juan (陈娟)
【Author Information】 Wuhan University, Software Engineering, 2017, Master's thesis
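【Abstract (translated from the Chinese)】 Collaborative filtering recommendation algorithms are theoretically mature and widely used. However, they suffer from data sparsity, and their accuracy is affected by users' rating habits. To address these shortcomings, researchers have extended traditional collaborative filtering by introducing user attributes and item attributes as supplementary information. For anonymous users, or for recommendation based on IP addresses, such user information is very hard to obtain. Time attributes, by contrast, are easy to obtain for both items and users, and they are more objective than user ratings. Collaborative filtering based on a time decay function can reflect temporal changes, but its accuracy is still affected by users' rating habits. To address these problems, this thesis makes the following contributions:

1. For user-based collaborative filtering, a method that incorporates a time feature is proposed. The length of a user's response time is used to describe how much the user likes an item, which extends the traditional criterion that characterizes user preference from ratings alone. Updating user ratings with a response-time-based preference degree reduces, to some extent, the adverse effect that differing rating habits have on recommendation results. The thesis also discusses how this algorithm differs between pre-filtering and post-filtering.
2. The difference between users' rating times is used to reflect latent similarity between users, indirectly capturing latent factors such as different age groups and different activity levels. This enriches the traditional criterion that measures user similarity from ratings alone and partly overcomes the neglect of users' own attributes in traditional collaborative filtering. Because latent attributes are inferred from time, the approach also partly relieves the burden of obtaining users' real attributes.
3. Because describing differences in users' rating times with crisp sets may introduce bias, fuzzy sets are introduced. Following the idea of the LCLC (Learning from Common Local Clusters) algorithm from active learning, users are divided by response time into four categories: RP (Reliable Positive), LP (Likely Positive), LN (Likely Negative), and RN (Reliable Negative). A Gaussian fuzzy model is built on this division and used to compute distances between users, and thus their similarity. The rating-based user similarity is updated accordingly, and ratings are predicted from the updated similarity.

Each of the three algorithms is compared with traditional collaborative filtering based on Pearson similarity on the MovieLens dataset. The response-time algorithm shows a clear advantage under pre-filtering and some advantage under post-filtering. The algorithm that introduces the rating time difference shows a slight advantage within a certain range. The algorithm that introduces fuzzy sets shows a clear advantage. The experimental results indicate that introducing time factors and fuzzy sets can improve the performance of the recommendation algorithm to some extent.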
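The abstract does not give the thesis's formulas. The following Python sketch therefore only illustrates one plausible reading of the three ideas:

- a response-time preference used to adjust ratings;
- a similarity derived from the gap between users' mean rating times;
- Gaussian fuzzy memberships in the RP/LP/LN/RN categories, used to update a rating-based similarity.

Every functional form and parameter is a hypothetical assumption, not the author's method. This covers all function names, the exponential preference, `tau`, `alpha`, `scale`, `lam`, and the category centers and widths.

```python
import math

# Hypothetical category prototypes on a normalized response-time scale [0, 1]
# (0 = rated immediately, 1 = rated very late). The (center, width) values
# are illustrative assumptions, not values from the thesis.
CATEGORIES = {
    "RP": (0.0, 0.15),   # reliable positive: very fast response
    "LP": (0.33, 0.15),  # likely positive
    "LN": (0.67, 0.15),  # likely negative
    "RN": (1.0, 0.15),   # reliable negative: very slow response
}

def preference_from_response_time(response_days, tau=30.0):
    """Map a response time in days to a preference in (0, 1]; faster is higher.
    The exponential form and tau are assumptions."""
    return math.exp(-response_days / tau)

def adjust_rating(rating, preference, alpha=0.3, r_min=1.0, r_max=5.0):
    """Blend the explicit rating with a time-based rating on the same scale."""
    time_rating = r_min + preference * (r_max - r_min)
    return (1 - alpha) * rating + alpha * time_rating

def time_gap_similarity(mean_time_u, mean_time_v, scale=365.0):
    """Crisp similarity from the gap (in days) between two users' mean rating times."""
    return math.exp(-abs(mean_time_u - mean_time_v) / scale)

def fuzzy_memberships(norm_response_time):
    """Normalized Gaussian memberships of one user in RP, LP, LN and RN."""
    t = min(max(norm_response_time, 0.0), 1.0)  # clamp to the assumed scale
    raw = {k: math.exp(-((t - c) ** 2) / (2 * s ** 2))
           for k, (c, s) in CATEGORIES.items()}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}

def fuzzy_similarity(t_u, t_v):
    """Similarity in (0, 1] from the Euclidean distance between membership vectors."""
    mu, mv = fuzzy_memberships(t_u), fuzzy_memberships(t_v)
    dist = math.sqrt(sum((mu[k] - mv[k]) ** 2 for k in CATEGORIES))
    return 1.0 / (1.0 + dist)

def combined_similarity(pearson_sim, t_u, t_v, lam=0.3):
    """Update a rating-based (e.g. Pearson) similarity with the fuzzy time similarity."""
    return (1 - lam) * pearson_sim + lam * fuzzy_similarity(t_u, t_v)

print(fuzzy_similarity(0.1, 0.1))  # 1.0
print(fuzzy_similarity(0.1, 0.9) < fuzzy_similarity(0.1, 0.2))  # True
```

Here a user's normalized response time is assumed to be, for example, the user's mean response time divided by the largest mean response time in the data. Two users with the same normalized response time get a fuzzy similarity of 1.0. `combined_similarity` moves the Pearson value toward the time-based value in proportion to `lam`.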
【Abstract】 Collaborative filtering is one of the most classic and widely used recommendation algorithms. However, it suffers from data sparsity and is easily influenced by users' rating habits. To address these problems, researchers have introduced user attributes and item attributes into collaborative filtering. For anonymous users, or for recommendation based on IP addresses, user information is difficult to obtain. The time attributes of items and users, by contrast, are easy to obtain and are more objective than user ratings. Collaborative filtering based on a time decay function can reflect changes over time, but it still cannot overcome the influence of users' rating habits.

To improve the accuracy of user-based collaborative filtering, this thesis proposes an algorithm that combines a time factor with the rating factor. We use a user's response time to describe the user's preference for an item, which extends the traditional criterion that depends on ratings alone. Updating the ratings with a response-time weight reduces the adverse influence of users' rating habits. This thesis also discusses the difference between pre-filtering and post-filtering.

The difference between users' rating times can also reflect the similarity between users. We assume that the rating-time difference is related to latent factors such as age level and activity level. This method enriches the traditional rating-based similarity criterion and overcomes the shortcoming that traditional collaborative filtering ignores users' attributes. It also relieves, to some extent, the burden of obtaining users' real attributes.

To describe the possible deviation in users' rating times, fuzzy sets are introduced into the algorithm. Based on the idea of LCLC (Learning from Common Local Clusters), an active learning method, users are divided by response time into four categories: RP (Reliable Positive), LP (Likely Positive), LN (Likely Negative), and RN (Reliable Negative). A Gaussian fuzzy model is constructed from this division. We then use this model to compute the distance between users, measure their similarity, and update the rating-based user similarity.

Our experiments use the MovieLens dataset, and each of the three algorithms is compared with traditional collaborative filtering based on Pearson similarity. The MAE values of the proposed algorithms are lower than those of the Pearson-based baseline. The experimental results show that introducing the time factor and fuzzy sets can improve the accuracy of the recommendation algorithm to some extent.
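For reference, the experimental baseline is user-based collaborative filtering with Pearson similarity, evaluated by MAE. The sketch below is a generic textbook version of that baseline: a mean-centered prediction over positively correlated neighbours, with ratings held in nested dicts. The toy data, the neighbourhood size `k`, and the `similarity` hook are illustrative assumptions, not the thesis's experimental setup. The hook is where an updated similarity, such as a time-adjusted one, could be plugged in.

```python
import math

def pearson(ratings, u, v):
    """Pearson correlation of users u and v over their co-rated items."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    du = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    dv = math.sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    if du == 0 or dv == 0:
        return 0.0
    return num / (du * dv)

def mean_rating(ratings, u):
    """Mean of all ratings given by user u."""
    return sum(ratings[u].values()) / len(ratings[u])

def predict(ratings, u, item, k=20, similarity=pearson):
    """Mean-centered prediction from the k most similar users who rated item.
    Only positively correlated neighbours are kept (a common design choice)."""
    neighbours = []
    for v in ratings:
        if v == u or item not in ratings[v]:
            continue
        s = similarity(ratings, u, v)
        if s > 0:
            neighbours.append((s, v))
    neighbours.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbours[:k]
    base = mean_rating(ratings, u)
    den = sum(s for s, _ in top)
    if den == 0:
        return base  # no usable neighbours: fall back to the user's mean
    num = sum(s * (ratings[v][item] - mean_rating(ratings, v)) for s, v in top)
    return base + num / den

def mae(ratings, test):
    """Mean absolute error over (user, item, true_rating) triples."""
    errors = [abs(predict(ratings, u, i) - r) for u, i, r in test]
    return sum(errors) / len(errors)

# Hypothetical toy data (not MovieLens).
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 3, "d": 4},
    "u3": {"a": 1, "b": 5, "c": 2, "d": 2},
}
test = [("u1", "d", 4)]
print(round(mae(ratings, test), 2))  # 0.75
```

In the toy data, u2 correlates perfectly with u1, while u3 correlates negatively and is excluded. The prediction for u1 on item d is therefore u1's mean (4) plus u2's deviation on d (4 - 3.25), giving an error of 0.75.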
【Key words】 Collaborative Filtering; Time Factor; Recommendation System; Fuzzy Set;