基于海量数据的用户行为数据分析系统研究与实现 (Research and Implementation of a User Behavior Data Analysis System Based on Massive Data)
Design and Realization of the Massive User Behavior Analysis System
【Author】 孙宇 (Sun Yu);
【Advisor】 杨公平 (Yang Gongping);
【Author information】 Shandong University, Software Engineering (professional degree), 2017, Master's
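【Abstract (translated from the Chinese)】 As the Internet industry has developed, daily life has come to depend more and more on the convenience the Internet provides. Government support for "Internet+" has driven many traditional industries to integrate with the Internet, and this integration has not only changed how those industries develop but also produced massive amounts of data. Everything people do on the Internet is recorded as formatted data, and this behavior data is highly valuable for analyzing user behavior, improving service value, and predicting purchase intent and trends in public opinion. Internet companies generate large volumes of behavior log data every day, often measured in terabytes, so collecting, processing, storing, and presenting massive user behavior data better, faster, and more accurately has become an urgent need. In the field of user behavior, the scientific application of data analysis methods, backed by theoretical derivation, can reveal the underlying patterns of user behavior relatively completely. On this basis, enterprises can carry out multidimensional cross analysis and build agile business intelligence decision-making that responds quickly and adapts to change. User behavior data can be used to deliver better services, which is valuable to any organization. From the moment it is generated, massive user data must pass through several processing stages before it can be offered as a cloud service that gives analysts and decision makers reference value, or that supplies senior management directly with data-backed decision support. How to efficiently collect, process, and store massive, heterogeneous, real-time, and diverse user data, so that large-scale, complex queries over it are answered quickly, accurately, and promptly, has therefore become a valuable research direction. The value of user behavior data is being uncovered over time, and its importance is widely recognized. This thesis takes a system-level view. It analyzes the approaches and technologies used in the industry's user behavior data analysis systems and, drawing on the strengths and characteristics of current emerging technologies, proposes a system design capable of analyzing massive user behavior data. The design is a complete solution covering data collection and reception, data processing, data warehousing, data analysis, and data visualization. The main work studied and completed is as follows: (1) A complete data collection and reception subsystem is established, including the data collection strategy, solutions to various abnormal conditions, and the scalability design of the system. (2) A data processing pipeline is built that preprocesses and cleans the log formats of behavior data from different sources to produce data that meets the standard for computation. (3) A user behavior data warehouse is built: according to business and other requirements, processed user behavior data from multiple sources is integrated to form user behavior data warehouses for different products, which serve as the core data of the user behavior data analysis system. (4) A user behavior data analysis platform is built. Based on the industry's assessment of the value of user behavior data analysis, it provides quick analysis templates and data visualization schemes that include algorithms such as event analysis, funnel analysis, retention analysis, and user path analysis; in addition, it retains a self-service analysis option so that users can analyze user behavior data on their own and uncover its value. The massive user behavior analysis system built according to this design has been deployed at the largest online ride-hailing Internet company in China. The system consists entirely of open-source and self-developed components and maximizes the reliability of data transmission between subsystems. It adopts mainstream design ideas and relatively advanced technology to ensure completeness and scalability, and it provides a simple, fast, and large-scale data analysis product that greatly simplifies the analysis workflow, improves efficiency, and serves the business directly. The system took a year of design and development to go from design to use, and it now provides stable, accurate, and timely data support to user behavior analysts and decision makers.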
【Abstract】 With the development of the Internet industry, people's lives have become inseparable from the convenience the Internet brings. The government's support for Internet+ has driven the integration of numerous traditional industries with the Internet, which has not only changed the development mode of those industries but also created a large amount of data. All of people's behavior on the Internet is recorded as formatted data, which is of significant value for analyzing user behavior, improving service value, and predicting willingness to buy and the trend of public opinion. In the Internet industry, a great amount of behavior log data is created every day, frequently measured in TB. Therefore, how to collect, process, store, and present users' massive behavior data better, faster, and more accurately has become a pressing demand. In the field of user behavior, the scientific application of data analysis methods and theoretical derivation can reveal the inherent laws of user behavior relatively completely. This helps enterprises realize multidimensional cross analysis and establish agile business intelligence decision-making that responds rapidly and adapts to change. Utilizing user behavior data can create better service, which is meaningful to any organization. From the moment it is created, massive user data needs to go through multiple data processing stages before it can provide decision reference value to analysts and decision makers through cloud services, or directly provide data-backed decision support to senior management. How to efficiently collect, process, and store massive, heterogeneous, real-time, and diversified user data, so as to realize rapid, correct, and timely responses to large-scale, complicated user data queries and pivots, has become a meaningful research direction. With the passage of time, the value of user behavior data is gradually being tapped, and its importance is being recognized increasingly widely. The study analyzes the schemes and technologies of user behavior data analysis systems in the industry from a systematic perspective. In addition, based on the advantages and characteristics of currently available emerging technologies, the study proposes a system scheme that can support the analysis of massive user behavior data. The scheme is a complete solution that covers data collection and reception, data processing, data warehousing, data analysis, and data visualization. The study mainly consists of the following parts: (1) A complete subsystem for data collection and reception is established, including the data collection strategy, solutions to a variety of abnormal conditions, and the scalable design of the system. (2) A data processing pipeline is constructed, in which behavior data collected from different sources is preprocessed and cleaned at the log-format level to produce data that can be used in standard calculations. (3) A user behavior data warehouse is constructed, in which processed data from multiple sources is integrated according to business and other needs to form user behavior data warehouses for different products, serving as the core data of the user behavior data analysis system. (4) A platform is established to analyze user behavior data. According to the industry's judgment of the value of user behavior data analysis, the study provides quick user behavior data analysis templates and data visualization schemes that include algorithms such as event analysis, funnel analysis, retention analysis, and user path analysis. In addition, a self-service analysis option is retained so that users can independently analyze user behavior data and explore its value. The massive user behavior analysis system built according to the design in this paper has been deployed and used at the largest online ride-hailing Internet company in China. The system is built entirely from open-source and self-developed components, which maximizes the reliability of data transmission between subsystems. Mainstream design ideas and relatively advanced technology are adopted in the system to guarantee its integrity and extensibility and to provide a simple, fast, and large-scale data analysis product. It can therefore greatly simplify the analysis process, improve efficiency, and serve the business directly. From design to use, the system went through a year of design and development. At present, it provides stable, accurate, and timely data support to user behavior analysts and decision makers.
【Key words】 Big Data; User Behavior; Analysis System; Data Processing; System Optimization;
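- 【Online publication contributor】 Shandong University 【Online publication year and issue】 2018, Issue 04
- 【Classification number】 TP311.52
- 【Citation count】 6
- 【Download count】 372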