
Mashup Service Clustering Method via Integrating Functional Semantic Association Calculation and Density Peak Detection


【Authors】 LU Jia-Wei; WU Han; ZHANG Yuan-Ming; LIANG Qian-Hui; XIAO Gang

【Corresponding author】 XIAO Gang

【Affiliations】 School of Computer Science and Technology, Zhejiang University of Technology; School of Computer Science and Engineering, Nanyang Technological University

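【Summary】 With the sharp growth in the number and variety of Mashup services on the Internet, quickly and accurately discovering the Mashup services that meet user needs within these massive service collections has become a challenging problem. To address it, this paper proposes a Mashup service clustering method that integrates functional semantic association calculation and density peak detection, in order to reduce the service search space and improve the accuracy and efficiency of service discovery. First, the method extracts meta-information from the Mashup services, cleans up their description texts, and expands each Mashup service's tags with the tags of the Web APIs it composes. Then, based on the Functional Semantic Association Calculation method (FSAC), it extracts the set of functional nouns from each service description and constructs Mashup semantic feature vectors from the semantic weights of those nouns. Finally, the Clustering Center Detection method based on Density Information (CCD-DI) detects the K most suitable Mashup semantic feature vectors, which serve as the initial centers of the K-means algorithm for cluster partitioning. Experiments on real data from ProgrammableWeb show that the proposed clustering method performs well on purity, precision, recall, entropy, and other metrics.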
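The final step, seeding K-means with centers chosen from density information, can be illustrated with a minimal sketch. The record does not specify how CCD-DI works internally, so the code below substitutes the generic density-peak heuristic (score each point by its local density times its distance to the nearest denser point) and is not the authors' algorithm. The function names, the Gaussian kernel, the `cutoff` parameter, and the toy 2-D points standing in for Mashup semantic feature vectors are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_peak_centers(points, k, cutoff):
    """Return indices of the k points with the largest density-peak score.

    Assumed scoring (generic density-peak heuristic, not CCD-DI itself):
    rho   = Gaussian-kernel local density with bandwidth `cutoff`
    delta = distance to the nearest point of strictly higher density
    score = rho * delta
    """
    n = len(points)
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    rho = [
        sum(math.exp(-((dist[i][j] / cutoff) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbour gets the largest distance it has.
        delta.append(min(higher) if higher else max(dist[i]))
    score = [rho[i] * delta[i] for i in range(n)]
    ranked = sorted(range(n), key=lambda i: score[i], reverse=True)
    return ranked[:k]


def kmeans(points, initial_centers, iterations=20):
    """Plain K-means (Lloyd's iterations) started from the given centers."""
    centers = [list(c) for c in initial_centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda c: euclidean(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy data: two well-separated groups of hypothetical feature vectors.
points = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0], [10.5, 10.5],
]
seeds = density_peak_centers(points, k=2, cutoff=1.0)
labels, centers = kmeans(points, [points[i] for i in seeds])
```

Choosing seeds that are both dense and far from any denser point tends to place one initial center in each natural group, which is the motivation for replacing random K-means initialization.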

【Abstract】 With the rapid growth in the number and variety of Mashup services on the Internet, quickly and accurately finding Mashup services that meet user needs among this mass of services has become a challenging problem. Service clustering can simplify the Web API recommendation process, and many approaches have been proposed. Most of them use semantic similarities mined from Web service documents to guide clustering. However, they usually cluster the services with K-means or one of its improved variants and do not offer an effective solution to the problem of selecting the initial cluster centers for K-means. Moreover, most service description documents are short texts with limited contextual information; they are sparse, noisy, and ambiguous, so automatically mining the hidden functional information from them remains an important challenge. Traditional mining algorithms such as LDA have difficulty representing short texts and yield unsatisfactory clustering results on them. To address these problems, we investigate services and their compositions in ProgrammableWeb, which characterizes services as APIs and their compositions as Mashups. This paper proposes a Mashup service clustering method that integrates functional semantic association calculation and density peak detection, in order to reduce the search space of services and improve the accuracy and efficiency of service discovery. In the initial stage of the method, each Mashup service description text is normalized, and the tags of each Mashup service are extended with the tags of the Web APIs it composes. Then, using the functional semantic association calculation method (FSAC), the set of functional nouns in each service description is extracted and the functional semantic weights of these nouns are calculated. Clustering Mashup services by functional similarity would greatly boost the ability of service search engines to retrieve the most relevant Web services. Next, the nouns with higher functional semantic weights are used to build the Mashup semantic feature vectors. Finally, the clustering center detection method based on density information (CCD-DI) detects the K most suitable Mashup semantic feature vectors, which serve as the initial centers of the K-means algorithm for clustering. A series of experiments is carried out on real data from ProgrammableWeb. We first mine the contents of the Mashup service documents to extract the functional words that describe the meaning of the services, and compare the extracted words with manual extraction to verify the effectiveness of FSAC. We then evaluate the performance of our approach with two criteria, precision and recall: precision measures exactness or fidelity, whereas recall measures completeness. We also use purity and entropy to evaluate clustering accuracy. Compared with existing methods, the proposed clustering method performs well in terms of precision, recall, purity, and entropy. In addition, we use t-SNE to visualize the Mashup vectors produced by the TF-IDF, LDA, and MFSF methods, respectively. The visualization results demonstrate the interpretability of our method, which discovers related and consistent clusters.
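The four evaluation measures named above can be sketched as follows. The record does not give the paper's exact formulas, so this uses common textbook definitions as an assumption: purity is the fraction of items that belong to the majority class of their cluster, entropy is the size-weighted Shannon entropy (base 2, unnormalized) of the class distribution inside each cluster, and per-cluster precision and recall are computed against each cluster's majority class. All function names and the toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def contingency(pred, truth):
    """Map each predicted cluster to a Counter of true class labels."""
    table = defaultdict(Counter)
    for cluster, label in zip(pred, truth):
        table[cluster][label] += 1
    return table


def purity(pred, truth):
    """Fraction of items that fall in the majority class of their cluster."""
    table = contingency(pred, truth)
    return sum(max(counts.values()) for counts in table.values()) / len(pred)


def entropy(pred, truth):
    """Size-weighted Shannon entropy (base 2) of classes within clusters.

    Lower is better; 0 means every cluster contains a single class.
    """
    table = contingency(pred, truth)
    n = len(pred)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        h = -sum((v / size) * math.log2(v / size) for v in counts.values())
        total += (size / n) * h
    return total


def precision_recall(pred, truth):
    """Per-cluster (precision, recall) against the cluster's majority class."""
    table = contingency(pred, truth)
    class_sizes = Counter(truth)
    scores = {}
    for cluster, counts in table.items():
        label, hits = counts.most_common(1)[0]
        precision = hits / sum(counts.values())  # exactness of the cluster
        recall = hits / class_sizes[label]       # completeness for the class
        scores[cluster] = (precision, recall)
    return scores


# Toy example: six services, two predicted clusters, two true categories.
pred = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "b", "b", "b", "b"]
p = purity(pred, truth)
e = entropy(pred, truth)
pr = precision_recall(pred, truth)
```

Higher purity, precision, and recall indicate better agreement with the reference categories, while entropy moves in the opposite direction, so the two pairs of measures are usually reported together.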

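【Funding】 Supported by the National Natural Science Foundation of China (61976193), the Natural Science Foundation of Zhejiang Province (LY19F020034), and the Key Research and Development Program of Zhejiang Province (2021C03136)
  • 【Source】 计算机学报 (Chinese Journal of Computers), 2021, Issue 07
  • 【Classification number】 TP311.13
  • 【Times cited】 5
  • 【Downloads】 233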