wget https://dl.fbaipublicfiles.com/fasttext/data/cooking.stackexchange.tar.gz
//Extract the dataset into a folder (the target directory must exist first)
mkdir -p ./data
tar xzvf cooking.stackexchange.tar.gz -C ./data
5. Basic fastText usage
//Start the python3 interactive shell
dreamdeMacBook-Pro:fastText user$ python3
Python 3.7.2 (v3.7.2:9a3ffc0492, Dec 24 2018, 02:44:43)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
//Import the fasttext module
>>> import fasttext
//Train a supervised model, using cooking.stackexchange.txt as the training set
>>> model = fasttext.train_supervised('./data/cooking.stackexchange.txt')
Read 0M words
Number of words: 16568
Number of labels: 736
Progress: 100.0% words/sec/thread: 86310 lr: 0.000000 avg.loss: 9.870326 ETA: 0h 0m 0s
6. Predicting the labels of a string
//By default, predict returns the single label with the highest probability
>>> print(model.predict("Cook a frozen cobbler in a microwave instead of oven"))
(('__label__baking',), array([0.0728373]))
//k=3 returns the three most probable labels
>>> print(model.predict("Cook a frozen cobbler in a microwave instead of oven", k=3))
(('__label__baking', '__label__equipment', '__label__substitutions'), array([0.0728373 , 0.03996802, 0.03626037]))
//Several strings can be predicted at once by passing a list
>>> print(model.predict(["Cook a frozen cobbler in a microwave instead of oven", "Michelin Three Star Restaurant"]))
([['__label__baking'], ['__label__food-safety']], [array([0.0728373], dtype=float32), array([0.02446871], dtype=float32)])
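//The outputs above are pairs of (labels, probabilities), and every label carries the `__label__` prefix. The sketch below shows one way to turn such a result into readable (label, probability) pairs. The helper names `pair_labels` and `pair_batch` are hypothetical and not part of fastText. The sample values are copied from the k=3 output above, so the snippet runs without a trained model.

```python
def pair_labels(labels, probs, prefix="__label__"):
    """Pair each label with its probability, stripping the label prefix.

    `labels` and `probs` are the two elements returned by model.predict
    for a single input string (hypothetical helper, not a fastText API).
    """
    pairs = []
    for label, prob in zip(labels, probs):
        # Remove the prefix only when it is actually present
        name = label[len(prefix):] if label.startswith(prefix) else label
        pairs.append((name, float(prob)))
    return pairs


def pair_batch(batch_labels, batch_probs, prefix="__label__"):
    """Apply pair_labels to each entry of a list-input prediction."""
    return [pair_labels(labels, probs, prefix)
            for labels, probs in zip(batch_labels, batch_probs)]


# Values copied from the k=3 prediction shown above
labels = ('__label__baking', '__label__equipment', '__label__substitutions')
probs = [0.0728373, 0.03996802, 0.03626037]

for name, prob in pair_labels(labels, probs):
    print(f"{name}: {prob:.4f}")
```

//With a trained model, the same helper could be called as pair_labels(*model.predict(text, k=3)), and pair_batch(*model.predict([text1, text2])) for a list of strings.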