GridOF:面向大规模数据集的高效离群点检测算法
GridOF: An Efficient Outlier Detection Algorithm for Very Large Datasets
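【Abstract (translated from Chinese)】 Outlier detection is an important technique in knowledge discovery in databases, but existing outlier detection algorithms are unsatisfactory in both time and space efficiency when applied to large datasets. Starting from an analysis of how outliers are distributed within a dataset, and building on a grid partition of the data space, this paper studies approximate density computation at the level of data hypercubes together with a strategy for filtering out the dense main body of the data. It gives a method that replaces the costly point-to-point computation of density function values with a simple adjusted approximation. The resulting outlier detection algorithm, GridOF, significantly reduces time and space complexity while retaining sufficient detection accuracy, and is both applicable and effective for outlier detection on large-scale datasets.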
【Abstract】 Identifying rare instances in datasets can lead to the discovery of unexpected and useful knowledge. However, existing outlier detection algorithms are not efficient on large datasets. Following a detailed discussion of the features of outliers in datasets, a novel grid-based algorithm called GridOF is presented, which first filters out crowded grid cells and then finds outliers by computing an adjusted mean approximation of the density function. While keeping the desired outlier detection accuracy, the algorithm is highly efficient in both space and time. Experimental results also demonstrate that the approach is applicable in practice.
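The two stages named in the abstract (filter out crowded grid cells, then judge the remaining points by a cell-level density estimate) can be illustrated with a minimal sketch. This is not the GridOF algorithm itself: the abstract does not specify the adjusted mean approximation, so the sketch substitutes a plain count over a cell and its adjacent cells. The function name `grid_outlier_candidates` and the parameters `cell_size` and `dense_threshold` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import product
from math import floor


def grid_outlier_candidates(points, cell_size, dense_threshold):
    """Return points lying in sparse regions of a uniform grid.

    Illustrative stand-in only: density is approximated by counting the
    points in a cell and its adjacent cells, NOT by the paper's adjusted
    mean approximation (which the abstract does not define).
    """
    if not points:
        return []

    def cell_of(p):
        # Integer coordinates of the hypercube that contains point p.
        return tuple(floor(x / cell_size) for x in p)

    # One pass over the data: number of points per occupied cell.
    counts = Counter(cell_of(p) for p in points)

    dim = len(points[0])
    offsets = list(product((-1, 0, 1), repeat=dim))

    def neighbourhood_count(cell):
        # Points in the cell itself plus all adjacent cells.
        return sum(
            counts.get(tuple(c + o for c, o in zip(cell, off)), 0)
            for off in offsets
        )

    candidates = []
    for p in points:
        cell = cell_of(p)
        # Stage 1 (assumed form): skip points in crowded cells.
        if counts[cell] >= dense_threshold:
            continue
        # Stage 2 (assumed form): keep points whose neighbourhood is sparse.
        if neighbourhood_count(cell) < dense_threshold:
            candidates.append(p)
    return candidates


# Twenty clustered points fall in one cell; one distant point stands alone.
cluster = [(0.1 + 0.04 * i, 0.2 + 0.03 * i) for i in range(20)]
data = cluster + [(10.5, 10.5)]
found = grid_outlier_candidates(data, cell_size=1.0, dense_threshold=3)
```

Because the dense cells are discarded from per-cell counts alone, no point-to-point distance is ever computed for the bulk of the data, which is the source of the efficiency gain the abstract describes.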
【Key words】 outlier detection; adjusted mean approximation; GridOF algorithm
- 【Source】 计算机研究与发展 (Journal of Computer Research and Development), 2003, Issue 11
- 【Classification code】 TP311.13
- 【Times cited】 54
- 【Downloads】 468