
A Classification Method of Malicious Code Family Based on FastText


【Authors】 ZHANG Yudi (张宇迪); FENG Yongxin (冯永新); ZHAO Yuntao (赵运弢); Shenyang Ligong University

【Corresponding author】 FENG Yongxin

【Affiliation】 School of Information Science and Engineering, Shenyang Ligong University

【Abstract】 Traditional methods for classifying malicious code families rely mainly on statistical analysis of shallow correlation features among code families. As packing, obfuscation, and polymorphism techniques have developed, the limitations of these methods have become increasingly apparent. One behavioral feature nevertheless remains invariant: malicious code must call API functions to achieve its malicious purpose. Traditional approaches based on embedding layers and word2vec models lack the ability to extract features from low-frequency API functions, are prone to mapping distortion when representing the local order of API sequences, and are weak at extending to and inferring the behavior of out-of-vocabulary APIs, all of which reduce classification accuracy. To address this, a FastText framework optimized with negative sampling is introduced to map API sequences more accurately, and a malicious code family classification method based on FastText is proposed. The FastText framework converts the API sequence of each code sample into multidimensional vectors that represent it precisely, and a one-dimensional convolution combined with a long short-term memory (LSTM) network further extracts local features of API behavior. Experimental results show that the model outperforms the traditional embedding method and the word2vec framework, reaching an accuracy above 99%.
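The abstract's point about out-of-vocabulary and low-frequency APIs can be illustrated with a minimal sketch of FastText-style subword embedding. This is not the authors' implementation: the n-gram range, vector dimension, bucket count, CRC32 hash (the fastText library uses its own hash), randomly initialised table, and the names `char_ngrams`, `SubwordEmbedder`, and `embed_sequence` are all assumptions made for illustration. The sketch also omits the whole-word vector that FastText adds for in-vocabulary tokens, and it omits training entirely.

```python
import math
import random
import zlib


def char_ngrams(token, n_min=3, n_max=6):
    """Return character n-grams of a token wrapped in '<' and '>' boundary marks."""
    wrapped = f"<{token}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


class SubwordEmbedder:
    """Toy subword embedder: a token vector is the mean of its hashed n-gram vectors."""

    def __init__(self, dim=8, buckets=2000, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.buckets = buckets
        # Assumption: random values stand in for n-gram vectors that a real
        # model would learn (for example with negative sampling).
        self.table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                      for _ in range(buckets)]

    def _bucket(self, gram):
        # Assumption: CRC32 is used here only as a convenient stable hash.
        return zlib.crc32(gram.encode("utf-8")) % self.buckets

    def vector(self, token):
        grams = char_ngrams(token)
        if not grams:
            return [0.0] * self.dim
        vec = [0.0] * self.dim
        for gram in grams:
            row = self.table[self._bucket(gram)]
            for k in range(self.dim):
                vec[k] += row[k]
        return [v / len(grams) for v in vec]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embed_sequence(embedder, api_calls):
    """Map an API call sequence to a list of vectors (one row per call)."""
    return [embedder.vector(name) for name in api_calls]


# Hypothetical API trace; "NtCreateFileEx" plays the role of an unseen API name.
embedder = SubwordEmbedder()
trace = ["NtCreateFile", "NtWriteFile", "NtCreateFileEx"]
matrix = embed_sequence(embedder, trace)
shared = set(char_ngrams("NtCreateFile")) & set(char_ngrams("NtCreateFileEx"))
```

Because an unseen name such as `NtCreateFileEx` shares many n-grams with `NtCreateFile`, it still receives a vector built largely from the same table rows, which a lookup-table embedding or word2vec vocabulary cannot provide. The resulting `matrix` (sequence length by dimension) has the shape that a one-dimensional convolution and LSTM stack would consume.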

【Funding】 National Natural Science Foundation of China (61971291)
  • 【Source】 Journal of Shenyang Ligong University (沈阳理工大学学报), 2024, Issue 01
  • 【Classification codes】 TP311.52; TP309