How to use chariot in Python: deliver ready-to-train data to your NLP model and improve the training process

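chariot is a data preparation tool for NLP models. It downloads datasets through chazutsu, builds preprocessing pipelines in the style of scikit-learn, and supports multi-language tokenization through spaCy. Preprocessing runs in parallel with Joblib, batches can be formatted for training, and pre-trained word vectors are supported. Defining and saving the preprocessing pipeline simplifies the model training workflow.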

chariot

Deliver the ready-to-train data to your NLP model.

Prepare Dataset

You can prepare typical NLP datasets through chazutsu.

Build & Run Preprocess

You can build the preprocessing pipeline like a scikit-learn Pipeline.

The preprocessing for each dataset column is executed in parallel by Joblib.

Multi-language text tokenization is supported by spaCy.
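
The idea of stacking preprocessing steps per column and running the columns in parallel can be illustrated with a minimal sketch. It does not use chariot's own API: the helper names run_steps and preprocess_columns are hypothetical, the standard library's ThreadPoolExecutor stands in for Joblib, and a whitespace split stands in for spaCy tokenization.

```python
from concurrent.futures import ThreadPoolExecutor


def run_steps(values, steps):
    """Apply each step, in order, to every value of one column."""
    for step in steps:
        values = [step(v) for v in values]
    return values


def preprocess_columns(dataset, column_steps, max_workers=2):
    """Run each column's step list in its own worker and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_steps, dataset[name], steps)
            for name, steps in column_steps.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Toy dataset with two columns (hypothetical data, not from chazutsu).
dataset = {
    "review": ["Great movie!", "Not  good"],
    "label": ["pos", "neg"],
}

# One ordered list of steps per column, similar in spirit to a Pipeline.
column_steps = {
    "review": [str.lower, str.split],  # lowercase, then whitespace-tokenize
    "label": [lambda s: 1 if s == "pos" else 0],  # map label text to an integer
}

processed = preprocess_columns(dataset, column_steps)
print(processed)
```

Keeping a separate step list per column is what makes the work easy to parallelize, since no column depends on another column's output.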

Format Batch

Sample a batch from the preprocessed dataset and format it for training the model (padding, etc.).
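
As a rough illustration of what padding a batch means, the sketch below right-pads variable-length sequences of token ids to a common length. The function pad_batch and its parameters are hypothetical and are not part of chariot; the pad id of 0 is also an assumption.

```python
def pad_batch(sequences, length=None, pad_id=0):
    """Right-pad (or truncate) each sequence of token ids to the same length."""
    if length is None:
        # Default to the longest sequence in the batch.
        length = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        clipped = list(seq[:length])  # truncate sequences that are too long
        padded.append(clipped + [pad_id] * (length - len(clipped)))
    return padded


# Three tokenized sentences of different lengths (hypothetical token ids).
batch = [[5, 8, 2], [7], [3, 9]]

padded = pad_batch(batch)
fixed = pad_batch(batch, length=2)
print(padded)
print(fixed)
```

After padding, every row has the same length, so the batch can be converted to a rectangular array or tensor before it is fed to a model.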

You can use pre-trained word vectors through chakin.

chariot enables you to concentrate on training your model!

Install

pip install chariot

Prepare dataset

You can download various datasets by using chazutsu.

import chazutsu
