Research on a Blockchain-Based Incremental Learning Scheme for User Information

【Author】 Liu Wen (刘文)

【Advisor】 Qin Jing (秦静)

【Author information】 Shandong University, Fundamental Mathematics, 2019, Master's thesis

【Abstract】 With the rapid development of the Internet, the Internet of Things, and related technologies, the rate at which people generate data has grown sharply. Many business decisions and production activities now depend on data, and extracting the required information from it is an active research topic. Machine learning is a key technology for exploiting the value of data: by learning from large volumes of known cases, it can find regularities that are hard for humans to detect and use them to make predictions. Traditional approaches to extracting value from user information, however, have two problems. First, the party that needs the model must be able to store and process massive amounts of data, and it faces the risk that the model becomes outdated. Second, users cannot control how the value of their personal data flows.

In the traditional setting, an enterprise or institution must collect, store, and process massive amounts of user data in order to extract its value. This places high demands on the computing and storage capacity of the party that needs the model. Moreover, because information is generated and updated ever faster, a model whose training has finished cannot effectively absorb new incremental data, so its generalization ability gradually declines. To address the heavy computing and storage demands and the tendency of models to become outdated, we propose a blockchain-based incremental learning scheme for user information. Model training that previously depended on an enterprise's centralized database is moved, by means of blockchain technology, onto users' local devices, while an incremental learning algorithm keeps the model updated in real time. Because training is carried out independently by users in the network, enterprises and institutions need not bear large storage and computing costs, which lowers the barrier to extracting value from user information. To support online learning on streaming data, the scheme uses incremental learning so that the model integrates the information in new incremental data in real time and does not become outdated as data keeps arriving. The scheme also ensures that user data is never collected by the party that trains the model, which protects the privacy of user information and gives users control over their personal data. Enterprises, in turn, can obtain the value in user information while avoiding the commercial risk of user data leaks.

Users are the producers of the data, yet their personal information is collected and used on a large scale. Besides bearing the risk that private data is maliciously exploited, they see the entire information dividend contained in their data captured by enterprises or institutions and receive no benefit from it. To address users' inability to control the flow of their data's value, while ensuring that enterprises can still complete model training without obtaining user data once control is returned to users, this thesis presents a scheme for circulating the value of user data. The scheme deploys smart contracts on the blockchain and, without introducing an authoritative third party, protects the rights of users who contribute their own data to model training. This gives users an incentive to join the blockchain network and provide personal data to update and pass on the model. Users decide for themselves whether to take part in training, which realizes the circulation of the value of user data. The thesis concludes by summarizing the work, analyzing several problems the scheme faces, and suggesting directions for future research.
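
The abstract does not specify the learning algorithm or the smart contract, so the following is only a minimal sketch of the general idea under stated assumptions: a model is passed from user to user, each user updates it incrementally on data that stays on the local device, and a record of each contribution is appended to a shared log. The logistic-regression model, the synthetic data, the learning rate, and the `ledger` list (a plain Python stand-in for on-chain records) are all hypothetical choices made for illustration and are not taken from the thesis.

```python
import math
import random

def predict(weights, x):
    """Logistic prediction for feature vector x; weights[0] is the bias."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def local_update(weights, samples, lr=0.1):
    """One incremental (SGD) pass over a user's local samples.

    Only the updated weights are returned; the raw samples stay local.
    """
    w = list(weights)
    for x, y in samples:
        err = predict(w, x) - y
        w[0] -= lr * err
        for i, xi in enumerate(x):
            w[i + 1] -= lr * err * xi
    return w

def make_user_data(rng, n):
    """Hypothetical local data: the label is 1 when x0 + x1 > 0."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    return data

def accuracy(weights, samples):
    """Fraction of samples classified correctly at threshold 0.5."""
    hits = sum((predict(weights, x) > 0.5) == (y == 1) for x, y in samples)
    return hits / len(samples)

rng = random.Random(0)
users = {f"user{i}": make_user_data(rng, 200) for i in range(5)}
test_set = make_user_data(rng, 500)

weights = [0.0, 0.0, 0.0]  # shared model: bias plus two feature weights
ledger = []                # stand-in for records a smart contract might keep

# The model travels from user to user; each one refines it on local data.
for name, samples in users.items():
    weights = local_update(weights, samples)
    ledger.append({"contributor": name,
                   "n_samples": len(samples),
                   "weights": list(weights)})

final_accuracy = accuracy(weights, test_set)
print(f"accuracy after {len(ledger)} contributions: {final_accuracy:.3f}")
```

In a design along these lines, the log of contributions is what an incentive mechanism could read to decide how to reward participants. How the thesis actually verifies and rewards contributions is not stated in the abstract.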

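  • 【Online publication contributor】 Shandong University
  • 【Online publication year/issue】 2019, Issue 09
  • 【Classification number】 TP18; TP311.13
  • 【Citation count】 1
  • 【Download count】 281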