Datasets for Large Language Models: A Comprehensive Survey

This post is part of an LLM series and is a translation of "Datasets for Large Language Models: A Comprehensive Survey".

Abstract

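This paper explores datasets for large language models (LLMs), which play a crucial role in the remarkable progress of LLMs. Datasets serve as foundational infrastructure, analogous to a root system that sustains and nurtures the development of LLMs. Consequently, examining these datasets has become a key research topic. To address the current lack of a comprehensive overview and thorough analysis of LLM datasets, and to gain insight into their current state and future trends, this survey consolidates and categorizes the fundamental aspects of LLM datasets from five perspectives: (1) pre-training corpora; (2) instruction fine-tuning datasets; (3) preference datasets; (4) evaluation datasets; and (5) traditional natural language processing (NLP) datasets. The survey sheds light on prevailing challenges and points out potential avenues for future research. In addition, it provides a comprehensive review of existing dataset resources, including statistics for 444 datasets spanning 8 language categories and 32 domains. The dataset statistics incorporate information along 20 dimensions. The surveyed pre-training corpora total more than 774.5 TB of data, and the other datasets together contain more than 700M instances. Our goal is to present the full landscape of LLM text datasets, provide a comprehensive reference for researchers in this field, and contribute to future studies. Related resources are available at https://github.com/lmmlzn/Awesome-LLMs-Datasets.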

1 Introduction

With the release of ChatGPT, within just a few months, large language models

Blog author: UnknownBody