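The approach used in this post comes from the CMU course "Neural Networks for NLP".
1. Model: BOW (bag of words)
2. Language: Python
3. Framework: DyNet
P.S. The English comments in the code are not well written; please bear with me.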
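Illustrated walkthrough
- Part 1: reading the data
- Use collections.defaultdict to build the auto-incrementing dictionaries w2i (word to index) and t2i (tag to index)
- When reading the file, use yield to return one example per loop iteration instead of building the whole list first, which lowers memory usage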
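The two ideas above can be sketched in a few lines. This is only a minimal illustration, not the code from the course: the function name `read_examples`, the in-memory `sample` lines, and the `tag ||| sentence` line format are all assumptions made so the example runs without a data file.

```python
from collections import defaultdict

# A missing key gets the dictionary's current length as its value,
# so new words and tags receive the ids 0, 1, 2, ... in order of appearance.
w2i = defaultdict(lambda: len(w2i))
t2i = defaultdict(lambda: len(t2i))


def read_examples(lines):
    """Yield (word_ids, tag_id) one example at a time.

    Assumes each line looks like "tag ||| word word word" (hypothetical format).
    """
    for line in lines:
        tag, text = line.strip().split(" ||| ")
        yield [w2i[w] for w in text.split(" ")], t2i[tag]


# Hypothetical sample data standing in for a corpus file.
sample = ["1 ||| a good movie", "0 ||| a bad movie"]
examples = list(read_examples(sample))
```

Because `read_examples` is a generator, passing it an open file object instead of a list would process the corpus line by line; wrapping it in `list(...)` as done here is only needed when the examples must be reused, for instance to shuffle them between epochs.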
from collections import defaultdict
import time
import random
import dynet as dy
import numpy as np
# Functions to read in the corpus
# Initialize a dictionary whose default value for a new key is the dictionary's current length
# for example input w2i["a"], it will automatically give 0 to w2i["a"], t