Research on data service platform for large language model applications
【Abstract】 The effectiveness of large language model (LLM) applications depends heavily on high-quality data. When training datasets and retrieval-augmented knowledge are built from raw corpora, end-to-end data management and processing become critical. Current data services suffer from poor processing quality that degrades LLM application performance, inefficient data preparation, and high implementation complexity and cost. To address these problems, the article proposes a collaborative data service scheme for LLMs that coordinates the processing of raw corpora, datasets, and knowledge. Building on automated processing driven by visual operator orchestration and on a unified cross-platform computing scheduling framework, it designs and implements an end-to-end data service platform that can meet the differing data requirements of various LLM applications. The platform improves data quality, processing efficiency, and flexibility, reduces cost, and significantly enhances the effectiveness of large model applications, showing strong generality and broad application prospects.
【Key words】 large language model; collaborative services; visual operator orchestration; computing scheduling; data platform service
- 【Source】 无线互联科技 (Wireless Internet Science and Technology), 2025, Issue 02
- 【Classification No.】 TP311.13; TP18