基于概率矩阵分解和谱聚类的协同过滤推荐算法

Collaborative Filtering Recommendation Algorithm Based on Probability Matrix Factorization and Spectral Clustering

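【Author】 Wang Liang (王亮)

【Advisor】 Guan Fei (关菲)

【Author information】 Hebei University of Economics and Business, Master of Applied Statistics (professional degree), 2022, master's thesis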

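【Summary】 With the arrival of the big data era, data volumes are growing explosively. The data are not only large in volume but also redundant. Recommender systems can extract the information a user is interested in from massive data and use it to generate better recommendation lists. They are now widely used in commercial settings, where they both uncover potential business value and better meet customers' personalized needs. Collaborative filtering is the most fundamental algorithm used in recommender systems, but it faces three pressing problems. The first is sparsity, one of the main problems of collaborative filtering: the user-item rating matrix is sparse because only some users rate some items, so little data is available and recommendation accuracy suffers. The second is cold start: when a new user or item enters the system, the system has no prior record of it, which directly leads to inaccurate recommendations. The third is scalability: as new users and items are added, the data volume and the computational complexity grow, and whether existing algorithms can still recommend in real time becomes a major concern. To address these three problems and to generate better personalized recommendation lists, this thesis does the following. First, to alleviate data sparsity, it fills the sparse user-item matrix using probabilistic matrix factorization (PMF). On the MovieLens 100K dataset, four filling methods are compared (PMF, global average, Slope One, and non-negative matrix factorization) with root mean square error (RMSE) as the evaluation metric. PMF achieves the lowest RMSE, 0.9177, which indicates that it fills the sparse matrix best among the four methods and improves the accuracy of the predicted ratings. Second, to personalize recommendations further, spectral clustering is applied to the filled user-item matrix, which narrows the search range for the target user's nearest neighbors and yields a more accurate neighborhood. Third, collaborative filtering is run within each cluster: the similarity between the target user and the other users in the cluster is computed, the nearest-neighbor set is determined, and the predicted rating is computed from those neighbors with the rating prediction formula. Finally, to verify the proposed algorithm, five experiments are run on the public MovieLens 100K dataset, with RMSE and mean absolute error (MAE) as evaluation metrics. Experiment 1 determines the PMF regularization parameter λ using RMSE; RMSE is smallest at λ = 0.1, so λ is fixed at 0.1 in the later experiments. Experiment 2 determines the number of PMF latent features: with 50 iterations, the number of latent features is set to 5. Experiment 3 combines PMF with different clustering algorithms and compares the results for different numbers of clusters, using MAE to determine the optimal number of clusters. Experiment 4 compares the RMSE of PMF combined with spectral clustering and PMF combined with K-Means against traditional collaborative filtering; the spectral clustering variant has the lowest RMSE, indicating that it performs best and can effectively mitigate the cold-start problem. Experiment 5 uses the parameters determined in the preceding four experiments to produce predicted ratings for different numbers of nearest neighbors, and compares traditional collaborative filtering, collaborative filtering with unclustered PMF, and the proposed algorithm (PMF_SC) using RMSE and MAE. The proposed algorithm achieves lower MAE and RMSE than both traditional collaborative filtering and unclustered PMF, which indicates some improvement in prediction accuracy and gives the method some reference value.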
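
The matrix-filling step and the two evaluation metrics described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it treats PMF as regularized matrix factorization fitted by stochastic gradient descent (the usual MAP view of PMF with Gaussian priors), and the names `pmf_fill`, `rmse`, and `mae`, the learning rate `lr`, the centering on the global mean, and the random initialization are all assumptions made for the example. The defaults `k=5`, `lam=0.1`, and `epochs=50` mirror the settings reported for Experiments 1 and 2.

```python
import math
import random


def _dot(a, b):
    """Inner product of two equal-length factor vectors."""
    return sum(x * y for x, y in zip(a, b))


def pmf_fill(ratings, n_users, n_items, k=5, lam=0.1, lr=0.01, epochs=50, seed=0):
    """Fill a sparse rating matrix with PMF-style predictions.

    ratings: list of (user_index, item_index, rating) for observed entries only.
    Returns a dense n_users x n_items list of lists in which observed ratings
    are kept and missing entries are replaced by predictions.
    """
    rng = random.Random(seed)
    # Small random initial factors; the Gaussian priors of PMF appear as the
    # L2 penalty weighted by lam in the updates below.
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    # Assumption: ratings are centered on the global mean before factorizing.
    mean = sum(r for _, _, r in ratings) / len(ratings)
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - (mean + _dot(U[u], V[i]))
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - lam * uf)
                V[i][f] += lr * (err * uf - lam * vf)
    observed = {(u, i): r for u, i, r in ratings}
    return [[observed.get((u, i), mean + _dot(U[u], V[i])) for i in range(n_items)]
            for u in range(n_users)]


def rmse(predicted, actual):
    """Root mean square error between two equal-length rating lists."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error between two equal-length rating lists."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

In a setup like the one the abstract describes, `pmf_fill` would be fitted on training ratings, and `rmse` and `mae` would be computed on held-out ratings by comparing them with the corresponding predicted entries.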

【Abstract】 With the arrival of the big data era, data is growing explosively. The data are not only large in volume but also redundant. A recommender system can extract the information a user is interested in from massive data and use it to generate a better recommendation list. Recommender systems are now widely applied in commercial settings, where they can both tap potential commercial value and better meet users' personalized needs. Collaborative filtering is the most basic algorithm in recommender systems, and it has three main problems. The "data sparsity" problem is one of the main problems of collaborative filtering: the user-item rating matrix is relatively sparse because only some users rate some items, so little data is available, which lowers the accuracy of the recommendations. The "cold start" problem arises when a new user or item appears in the system: because the system has no prior record of that user or item, the recommendations become inaccurate. The "scalability" problem arises as new users and items are gradually added: the amount of data grows, and whether existing recommendation algorithms can still perform real-time recommendation becomes a major concern. To address these problems and generate better personalized recommendation lists, the main work of this thesis is as follows. First, to alleviate the "data sparsity" problem, the thesis uses probabilistic matrix factorization (PMF) to fill the sparse user-item matrix. The MovieLens 100K dataset is used as the experimental dataset, and four methods (PMF, global average, Slope One, and non-negative matrix factorization) are used to fill the sparse matrix, with root mean square error (RMSE) as the evaluation metric. PMF has the lowest RMSE, 0.9177, which shows that it fills the sparse matrix best among the four methods. Second, to better perform personalized recommendation, spectral clustering is applied to the filled user-item matrix, which narrows the search range for the target user's nearest neighbors and gives a more accurate neighborhood. Then, the traditional collaborative filtering algorithm is run within each cluster: user similarity is calculated and the predicted rating is obtained. Finally, to verify the proposed algorithm, the public MovieLens 100K dataset is used as the experimental dataset, with RMSE and mean absolute error (MAE) as evaluation metrics, and five experiments are conducted. Experiment 1 determines the PMF regularization parameter λ with RMSE as the metric; RMSE is smallest at λ = 0.1, so λ is fixed at 0.1 in the subsequent experiments. Experiment 2 determines the number of PMF latent features: with 50 iterations, the number of latent features is 5. Experiment 3 combines PMF with different clustering algorithms, compares the results for different numbers of clusters, and uses MAE to determine the optimal number of clusters. Experiment 4 compares the RMSE of PMF combined with spectral clustering and PMF combined with K-Means against traditional collaborative filtering (CF); the spectral clustering variant has the lowest RMSE, indicating that it performs best and can effectively mitigate the cold-start problem. Experiment 5 uses the parameters determined in the preceding four experiments to output predicted ratings for different numbers of neighbors, and compares traditional collaborative filtering, collaborative filtering with unclustered PMF, and the improved algorithm of this thesis (PMF_SC), with RMSE and MAE as evaluation metrics. The RMSE of the proposed algorithm is lower, indicating that it improves prediction accuracy and has some reference value.
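
The within-cluster prediction step described above can be sketched as follows. This is an illustration under stated assumptions rather than the thesis's method: the abstract does not give the similarity measure or the prediction formula, so the sketch assumes Pearson correlation and a mean-centered, similarity-weighted average, and the names `pearson`, `predict_in_cluster`, and `n_neighbors` are hypothetical. The cluster labels are taken as given; in practice they could come from a spectral clustering routine such as scikit-learn's `SpectralClustering` applied to the filled matrix.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant vector carries no similarity signal
    return cov / math.sqrt(var_a * var_b)


def predict_in_cluster(filled, labels, user, item, n_neighbors=2):
    """Predict a rating from the target user's nearest neighbors in its cluster.

    filled: dense user x item matrix (for example, the output of a PMF fill).
    labels: one cluster label per user (assumed to come from spectral clustering).
    """
    mean_u = sum(filled[user]) / len(filled[user])
    # Only users that share the target user's cluster are neighbor candidates.
    candidates = [v for v in range(len(filled))
                  if v != user and labels[v] == labels[user]]
    sims = sorted(((pearson(filled[user], filled[v]), v) for v in candidates),
                  reverse=True)
    # Assumption: keep only positively correlated neighbors among the top ones.
    top = [(s, v) for s, v in sims[:n_neighbors] if s > 0]
    denom = sum(s for s, _ in top)
    if denom == 0:
        return mean_u  # no usable neighbor: fall back to the user's own mean
    num = sum(s * (filled[v][item] - sum(filled[v]) / len(filled[v]))
              for s, v in top)
    return mean_u + num / denom
```

Restricting `candidates` to one cluster is what shrinks the neighbor search: similarities are computed only against users in the same cluster instead of against every user in the matrix.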
