Research on Quality of Answers in Q&A Community with Sentiment Analysis (基于情感分析的问答社区回答质量研究)

【Author】 Zhang Lei (张磊)

【Advisor】 Deng Sanhong (邓三鸿)

【Author Information】 Nanjing University, Information Science, 2016, Master's degree

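【Abstract (translated from the Chinese)】 A Q&A community is a socialized platform for exchanging knowledge and experience, combining social and knowledge-oriented characteristics. Its social character lies in the exchanges among users about questions and experience. This aspect has been studied extensively, mainly by using social network analysis and link analysis to construct the social network within a Q&A community and to understand, from a structural perspective, its operating mechanism, its propagation mechanism, and its user relationships. Beyond this, the content of a Q&A community is itself a product of socialization: topics, questions, answers, and comments all have a distinctly social character. All topics are created and maintained by users, all questions are edited and answered by users, all answers are upvoted and commented on by users, and all comments can in turn be replied to and upvoted. This social character is the driving force behind the development of Q&A communities and also a challenge they face. This thesis examines that social character, taking a typical instance of it, the evaluation of answer quality, as its target, and attempts to find a set of intuitive, easily obtained indicators for quantifying it numerically. Rather than starting from the features of the answer itself, the thesis approaches the problem through the comments on answers. It examines the features of comments and extracts two content-related features, expertise and sentiment value, to represent each comment. The weighted sum of the comments then represents one aspect of an answer's content, while the answerer's expertise serves as another measure. These variables are used to build a regression prediction model intended to quantify answer quality simply and objectively. First, the thesis builds a data crawling framework, defines the data features to be collected, and uses web crawling to store the collected pages as Excel spreadsheets. After crawling, the overall distribution of the data is analyzed. Differences in topic popularity are found to produce differences, across topics, in the statistics of answers per question, upvotes per answer, comments per answer, and upvotes per comment. The data also show that users in the Q&A community upvote answers more often than comments. The collected data are then preprocessed using techniques including Chinese text analysis, stop-word filtering, and sentiment analysis. From the preprocessed data, the thesis derives the predicted variable, the number of upvotes on an answer, and the observed variables, including the answerer's expertise, the number of comments on the answer, and the answer's sentiment score. Modeling is carried out in SPSS, combining scatter plots, correlation analysis, and regression analysis. The scatter plots reveal that the feature distributions in the collected data are sparse and, as a result, not clearly normal. In the correlation analysis, the magnitude of the correlation coefficient is used to reflect the influence of each observed variable on the predicted variable, with comparisons focusing on whether stop words are removed, whether answers are weighted by user, and positive versus negative sentiment. Regression is then used to fit models on the strongly correlated observed variables, starting with a binary sentiment classification and then analyzing seven specific sentiment categories in detail. The thesis concludes that answer quality can be measured jointly by the answerer's expertise, the number of comments, and some of the positive sentiment in the comments. Across topics, answer quality is generally influenced more by positive sentiment, while the effect of negative sentiment varies by topic. Beyond the sentiment factors, the answerer's own authority and the number of comments on an answer have a greater influence on predicting answer quality and provide more supporting information.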

【Abstract】 A Q&A community is a socialized platform where knowledge and experience are exchanged. Its sociality lies in users' conversations about questions and experience. Much research has been done on this aspect, mainly using social network analysis and link analysis to reconstruct the network of the community, and this work has improved understanding of the community's operating mechanism, communication mechanism and user relationships. Beyond its social side, the content of the community is also a product of socialization. The contents, whether topics and questions or answers and comments, are all socialized. All topics are created and maintained by users. All questions are edited and answered by users. All answers are upvoted and commented on by users, and all comments can in turn be upvoted and replied to. This social character both drives the community's development and challenges it. This thesis studies the sociality of the Q&A community. It takes the evaluation of answer quality as its target and looks for indicators that quantify that evaluation. The thesis does not focus on the features of the answers themselves but turns to the comments on the answers. It extracts two features of each comment: its degree of professionalism and its sentiment value. It uses the weighted sum of comment values and the professionalism of the answer's author as predictors of quantified answer quality. First, the thesis builds a framework for crawling data and stores the crawled data in Excel format. It then describes the overall feature distribution using statistical methods and finds that topics of different popularity vary in the number of answers, answer votes, comments and comment votes. Answers receive more votes than comments. After that, it preprocesses the crawled data with natural language processing and summarizes the processed data into predicted and observed variables. The scatter plots show that the data features are sparse, so the data are not clearly normally distributed. Through correlation analysis, some closely related factors are selected as candidates for the final model. Finally, the thesis models the data with regression and concludes that answer quality can be measured jointly by the professionalism of the author, the number of comments and some of the positive sentiment scores. In detail, positive sentiment in comments has more influence on answer quality, while the effect of negative sentiment varies with the nature of the topic. Apart from the sentiment factors, the number of comments on an answer and the authority of the answerer also play essential parts and provide complementary information.
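The steps the abstract outlines (a weighted sum of comment sentiment, correlation screening, then regression) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, which used SPSS. The `Comment` fields, the use of commenter expertise as the weight, the score ranges, and the single-predictor least-squares fit are all assumptions made for illustration. The abstract does not specify the weighting scheme, and the thesis fits several predictors together.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Comment:
    sentiment: float  # assumed sentiment score, e.g. in [-1, 1]
    expertise: float  # assumed non-negative weight for the commenter


def weighted_sentiment(comments):
    """Sum of comment sentiment scores, each weighted by commenter expertise."""
    return sum(c.expertise * c.sentiment for c in comments)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


if __name__ == "__main__":
    # Made-up toy data: one list of comments per answer, plus answer upvotes.
    answers = [
        [Comment(0.8, 1.0), Comment(0.4, 2.0)],
        [Comment(-0.5, 1.0)],
        [Comment(0.9, 3.0), Comment(0.1, 1.0)],
    ]
    upvotes = [12, 2, 30]
    scores = [weighted_sentiment(cs) for cs in answers]
    print("correlation:", pearson(scores, upvotes))
    print("slope, intercept:", fit_line(scores, upvotes))
```

In a fuller version, the answerer's expertise and the comment count would enter as additional predictors in a multiple regression, and positive and negative sentiment would be aggregated separately so that their effects can be compared.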

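  • 【Online Publication Contributor】 Nanjing University
  • 【Online Publication Year/Issue】 2020, Issue 05
  • 【Classification Number】 G252
  • 【Citation Count】 1
  • 【Download Count】 121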