Task 04
import pandas as pd
from sklearn.metrics import f1_score
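# Convert to the format FastText expects: the text followed by a "__label__<class>" token
train_df = pd.read_csv('train_set.csv', sep='\t', nrows=15000)
train_df['label_ft'] = '__label__' + train_df['label'].astype(str)
# Write the first 10,000 rows as the training file; the last 5,000 are held out for validation
train_df[['text','label_ft']].iloc[:-5000].to_csv('train.csv', index=None, header=None, sep='\t')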
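fastText's supervised mode reads a plain-text file with one example per line. Each line holds the example's tokens plus one or more labels marked with the `__label__` prefix. The helper below is a minimal sketch of the same conversion on a toy DataFrame. `to_fasttext_lines` and the sample rows are hypothetical, for illustration only. It puts the label first, which is the layout fastText's own examples use. The cell above puts the label after a tab at the end of the line instead, and fastText accepts that too, because it recognizes labels by their prefix rather than by their position.

```python
from typing import List

import pandas as pd


def to_fasttext_lines(df: pd.DataFrame, text_col: str = "text", label_col: str = "label") -> List[str]:
    """Return one fastText training line per row: '__label__<label> <tokens>'."""
    labels = "__label__" + df[label_col].astype(str)
    # Re-join tokens with single spaces so stray tabs/newlines cannot split an example
    texts = df[text_col].astype(str).str.split().str.join(" ")
    return (labels + " " + texts).tolist()


if __name__ == "__main__":
    # Hypothetical toy rows in the same shape as train_set.csv
    toy = pd.DataFrame({"label": [2, 11], "text": ["2967 6758 339", "4464 486\t6352"]})
    print(to_fasttext_lines(toy))  # ['__label__2 2967 6758 339', '__label__11 4464 486 6352']
```

To produce a training file, you would write these lines out joined by `"\n"`.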
train_df
| | label | text | label_ft |
|---|---|---|---|
| 0 | 2 | 2967 6758 339 2021 1854 3731 4109 3792 4149 15... | __label__2 |
| 1 | 11 | 4464 486 6352 5619 2465 4802 1452 3137 5778 54... | __label__11 |
| 2 | 3 | 7346 4068 5074 3747 5681 6093 1777 2226 7354 6... | __label__3 |
| 3 | 2 | 7159 948 4866 2109 5520 2490 211 3956 5520 549... | __label__2 |
| 4 | 3 | 3646 3055 3055 2490 4659 6065 3370 5814 2465 5... | __label__3 |
| ... | ... | ... | ... |
| 14995 | 5 | 1822 6040 5744 5310 4578 4407 6242 2313 3466 2... | __label__5 |
| 14996 | 9 | 88 7400 7539 4516 6122 290 6831 465 1647 6293 ... | __label__9 |
| 14997 | 0 | 2597 7160 2282 1407 4403 4516 2873 4597 7037 5... | __label__0 |
| 14998 | 0 | 2400 4411 4721 3289 5787 5096 4464 6250 1324 6... | __label__0 |
| 14999 | 8 | 4188 5778 5296 5640 2835 648 6122 2489 2923 39... | __label__8 |
15000 rows × 3 columns
import fasttext
# Train on the 10,000 rows written to train.csv
model = fasttext.train_supervised('train.csv', lr=1.0, wordNgrams=2,
                                  verbose=2, minCount=1, epoch=25, loss="hs")
# Predict the 5,000 held-out rows; strip the "__label__" prefix to recover the class id
val_pred = [model.predict(x)[0][0].split('__')[-1] for x in train_df.iloc[-5000:]['text']]
print(f1_score(train_df['label'].values[-5000:].astype(str), val_pred, average='macro'))
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
----> 1 import fasttext
ModuleNotFoundError: No module named 'fasttext'
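The last two lines of that cell post-process and score the predictions. For a single string, `model.predict(x)` returns a tuple of (labels, probabilities), so `[0][0]` is the top label string, such as `'__label__2'`. Splitting on `'__'` then keeps the class id. Macro F1 gives every class's F1 equal weight in the average, so rare classes count as much as frequent ones. The sketch below reproduces only the scoring step on hand-made labels, so it runs without fastText installed. `strip_label`, `macro_f1_by_hand`, and the toy lists are hypothetical. The prefix slice is equivalent to `split('__')[-1]` for these labels.

```python
from typing import List

from sklearn.metrics import f1_score


def strip_label(ft_label: str, prefix: str = "__label__") -> str:
    """Turn a fastText label such as '__label__2' back into the class id '2'."""
    return ft_label[len(prefix):] if ft_label.startswith(prefix) else ft_label


def macro_f1_by_hand(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) over all seen classes."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    raw_preds = ["__label__0", "__label__1", "__label__2", "__label__2"]  # toy model output
    y_pred = [strip_label(p) for p in raw_preds]
    y_true = ["0", "1", "1", "2"]
    print(round(f1_score(y_true, y_pred, average="macro"), 4))  # 0.7778
    print(round(macro_f1_by_hand(y_true, y_pred), 4))           # 0.7778
```

In this toy case, class 0 scores F1 = 1 and classes 1 and 2 each score 2/3, so the macro average is 7/9.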
The expected result is a macro F1 of about 0.82, but I never got fastText to install, so I'm stuck here. TAT
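The cell fails at `import fasttext`, so a sensible first check is whether the module can be imported by the interpreter the notebook kernel actually runs. One frequent cause of this error in notebooks is installing the package into a different Python environment. The sketch below uses only the standard library, and the helper name `module_available` is hypothetical. The official Python bindings are published on PyPI as `fasttext`. Running `%pip install fasttext` inside a notebook installs into the running kernel. Installation may need to compile a C++ extension, so a C++ compiler can be required on platforms without a prebuilt wheel.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be found by the current interpreter's import system."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # pip must install into this exact interpreter for the kernel to see the package
    print("Python executable:", sys.executable)
    if module_available("fasttext"):
        print("fasttext is importable")
    else:
        print(f"fasttext not found; try: {sys.executable} -m pip install fasttext")
```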