Fusing multiple features to identify product review spam
【Abstract】 The logistic-regression-based method for identifying product review spam recently proposed by Jindal N. et al. uses too many product review features. To address this problem, we analyze possible solutions and propose applying significance testing to the features. Experiments on the Amazon dataset show that the regression model built on the significant features is better than the model built on all features. The new model not only solves the problem above and reduces the computational cost, but also leaves overall performance unchanged; this indicates that modeling on the significant features helps improve detection quality.
【Key words】 logistic regression (LR); product review spam; significance testing
- 【Source】 微型机与应用 (Microcomputer & Its Applications), 2012, Issue 22
- 【Classification code】 TP393.09
- 【Times cited】 11
- 【Downloads】 286
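
The approach summarized in the abstract (fit a logistic regression on all features, test each feature for significance, then refit on the significant ones) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic rather than the Amazon dataset, the feature names `feature_a`, `feature_b`, and `feature_noise` are hypothetical, the Wald z-test is one common choice of significance test (the abstract does not say which test was used), and the 0.05 threshold is likewise assumed.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def hessian(X, beta):
    # Observed information matrix X^T W X with W = diag(p * (1 - p)).
    k = len(beta)
    p = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
    return [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(X, p))
             for j in range(k)] for i in range(k)]

def fit_logistic(X, y, iters=25):
    """Newton-Raphson fit. Each row of X starts with the intercept term 1.0.
    Returns (coefficients, estimated covariance matrix of the coefficients)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        inv_h = invert(hessian(X, beta))
        step = [sum(inv_h[j][i] * grad[i] for i in range(k)) for j in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
    return beta, invert(hessian(X, beta))

def wald_p_values(beta, cov):
    # Two-sided p-value of z = beta_j / se_j under a standard normal.
    return [math.erfc(abs(b) / math.sqrt(cov[j][j]) / math.sqrt(2.0))
            for j, b in enumerate(beta)]

# Synthetic stand-in data (assumption): two informative features, one pure noise.
random.seed(0)
names = ["feature_a", "feature_b", "feature_noise"]
true_w = [2.0, -1.5, 0.0]
rows, labels = [], []
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in names]
    logit = -0.5 + sum(w * v for w, v in zip(true_w, x))
    rows.append(x)
    labels.append(1 if random.random() < sigmoid(logit) else 0)

# Step 1: fit on all features and test each coefficient (intercept skipped).
X_full = [[1.0] + x for x in rows]
beta_full, cov_full = fit_logistic(X_full, labels)
p_values = wald_p_values(beta_full, cov_full)[1:]

# Step 2: keep features below the (assumed) 0.05 level and refit.
ALPHA = 0.05
selected = [n for n, p in zip(names, p_values) if p < ALPHA]
keep = [names.index(n) for n in selected]
X_reduced = [[1.0] + [x[j] for j in keep] for x in rows]
beta_reduced, _ = fit_logistic(X_reduced, labels)

print("selected features:", selected)
```

The reduced model has fewer coefficients to estimate and evaluate, which is where the lower computational cost mentioned in the abstract would come from. Whether detection quality is preserved has to be checked separately on held-out data, which this sketch does not do.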