Privacy Masking of Text Sequence Datasets Based on Generative Adversarial Networks
Differentially private sequence generative adversarial networks for data privacy masking
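【Abstract】 A masking model for text sequence datasets, the differentially private sequence generative adversarial network (DP-SeqGAN), is proposed based on generative adversarial networks and differential privacy. DP-SeqGAN uses a generative adversarial network to automatically extract the important features of a dataset and to generate a new dataset whose distribution is close to that of the original. Guided by differential privacy, it randomly perturbs the model, which improves the privacy of the generated dataset and further reduces overfitting of the discriminator. DP-SeqGAN is intuitive and general-purpose: there is no need to design masking rules tailored to a specific dataset or to adapt the model to it. Experiments show that after masking with DP-SeqGAN, the privacy and usability of the dataset improve markedly and the success rate of membership inference attacks drops markedly.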
【Abstract】 Based on generative adversarial networks and the differential privacy mechanism, a differentially private sequence generative adversarial network (DP-SeqGAN) is proposed for masking private information in text sequence datasets. DP-SeqGAN automatically extracts the important features of a dataset and then generates a new dataset whose distribution is close to that of the original. Based on differential privacy, randomness is introduced into the model, which improves the privacy of the generated dataset and further reduces overfitting of the discriminator. DP-SeqGAN is general-purpose, so there is no need to adapt the model to individual datasets or to design complex masking rules around dataset characteristics. Experiments show that both the privacy and the usability of a sequence dataset improve significantly after it is processed by DP-SeqGAN, and that DP-SeqGAN greatly reduces the success rate of membership inference attacks against the generated dataset.
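The abstract says only that randomness is introduced into the model "based on differential privacy"; it does not state the exact mechanism. A common way to do this for a GAN discriminator is a DP-SGD-style update: clip each training example's gradient to a fixed L2 norm, sum, add Gaussian noise scaled to that norm, and average. The sketch below assumes that scheme purely for illustration. The function names (`clip`, `dp_noisy_gradient`) and parameters (`max_norm`, `noise_multiplier`) are hypothetical, not taken from the paper, and privacy accounting (computing the resulting epsilon) is omitted.

```python
import math
import random
from typing import List, Optional

Vector = List[float]


def l2_norm(v: Vector) -> float:
    """Euclidean length of a gradient vector."""
    return math.sqrt(sum(x * x for x in v))


def clip(v: Vector, max_norm: float) -> Vector:
    """Scale v down so its L2 norm is at most max_norm (unchanged if already smaller)."""
    norm = l2_norm(v)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in v]


def dp_noisy_gradient(
    per_example_grads: List[Vector],
    max_norm: float,
    noise_multiplier: float,
    rng: Optional[random.Random] = None,
) -> Vector:
    """DP-SGD-style aggregate (assumed scheme, not the paper's confirmed method).

    1. Clip every per-example gradient to L2 norm <= max_norm, which bounds
       how much any single training record can influence the update.
    2. Sum the clipped gradients.
    3. Add Gaussian noise with std = noise_multiplier * max_norm per coordinate.
    4. Divide by the batch size.
    All gradients are assumed to have the same dimension.
    """
    if not per_example_grads:
        raise ValueError("need at least one per-example gradient")
    rng = rng or random.Random()
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        for i, x in enumerate(clip(g, max_norm)):
            total[i] += x
    sigma = noise_multiplier * max_norm
    batch = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch for t in total]


if __name__ == "__main__":
    # Hypothetical per-example discriminator gradients for a batch of two records.
    grads = [[3.0, 4.0], [0.3, 0.4]]
    # Noise switched off: just the clipped mean, roughly [0.45, 0.6].
    print(dp_noisy_gradient(grads, max_norm=1.0, noise_multiplier=0.0))
    # Noise switched on: the same update with Gaussian perturbation added.
    print(dp_noisy_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)))
```

Under this assumption, the noisy gradient would replace the plain gradient in the discriminator's optimizer step. Clipping limits each record's contribution, and the added noise masks whatever contribution remains. The same noise also acts as a regularizer, which is consistent with the abstract's remark that the perturbation reduces discriminator overfitting.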
【Key words】 privacy preserving; data privacy masking; generative adversarial network; differential privacy
- 【Source】 网络与信息安全学报 (Chinese Journal of Network and Information Security), 2020, Issue 04
- 【CLC number】 TP309.2
- 【Times cited】 7
- 【Downloads】 322