chariot
Deliver ready-to-train data to your NLP model.
Prepare Dataset
You can prepare typical NLP datasets through chazutsu.
Build & Run Preprocess
You can build a preprocessing pipeline much like a scikit-learn Pipeline.
Preprocessing for each dataset column is executed in parallel by Joblib.
Multi-language text tokenization is supported by spaCy.
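To make the idea concrete, here is a minimal sketch of per-column pipelines that run concurrently. It is not chariot's API: `run_pipeline`, `preprocess_columns`, and the two-column dataset are hypothetical names made up for illustration, and standard-library threads stand in for Joblib so the example has no dependencies.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(steps, values):
    """Apply each step, in order, to every value of one column."""
    for step in steps:
        values = [step(v) for v in values]
    return values


def preprocess_columns(dataset, pipelines):
    """Run one pipeline per column, with the columns processed concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(run_pipeline, steps, dataset[name])
            for name, steps in pipelines.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Hypothetical two-column dataset and per-column steps.
dataset = {
    "review": ["Great movie !", "Not my thing"],
    "label": ["pos", "neg"],
}
pipelines = {
    "review": [str.lower, str.split],             # normalize, then split on whitespace
    "label": [lambda s: 1 if s == "pos" else 0],  # map label text to an integer
}

processed = preprocess_columns(dataset, pipelines)
```

Each column gets its own ordered list of steps, so a text column and a label column can be transformed independently of each other.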
Format Batch
Sample a batch from the preprocessed dataset and format it for training the model (padding, etc.).
You can use pre-trained word vectors through chakin.
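As a rough illustration of what batch formatting involves, the sketch below samples a batch of token-id sequences and right-pads them to a common length. The function names (`sample_batch`, `pad_batch`), the `pad_id` of 0, and the sample data are assumptions for this example, not chariot's actual interface.

```python
import random


def sample_batch(data, batch_size, seed=None):
    """Draw batch_size distinct examples at random."""
    rng = random.Random(seed)
    return rng.sample(data, batch_size)


def pad_batch(sequences, pad_id=0, length=None):
    """Right-pad (or truncate) each sequence of token ids to a common length."""
    if length is None:
        # Default to the longest sequence in the batch.
        length = max(len(seq) for seq in sequences)
    return [
        list(seq[:length]) + [pad_id] * (length - len(seq))
        for seq in sequences
    ]


# Hypothetical already-indexed sentences of different lengths.
encoded = [[4, 9], [7, 2, 5, 1], [3]]

batch = pad_batch(sample_batch(encoded, batch_size=2, seed=0))
```

After padding, every row in `batch` has the same length, which is what most frameworks expect when a batch is converted to a tensor.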
chariot enables you to concentrate on training your model!
Install
pip install chariot
Prepare dataset
You can download various datasets using chazutsu.
import chazutsu