Noun
- mimic: to imitate or copy
- mimicry: imitation; the skill of imitating
- genre: type, category, or style (of a text)
- diversion: a pastime; a distraction
- Text normalization means converting text to a more convenient, standard form.
- tokenization: splitting text into tokens (word segmentation)
- hashtag: a tag or label
- Lemmatization means determining that two words have the same root, despite their surface differences.
- morphologically: in terms of morphology (the form and structure of words)
- Stemming refers to a simpler version of lemmatization in which we mainly just strip suffixes from the end of the word.
- edit distance
- corpus: a collection of texts (plural: corpora)
- delimit: to mark the limits or boundaries of
- square braces [ ]
- dash -
- caret ^
- asterisk *
- Kleene * means zero or more occurrences of the immediately previous character or regular expression.
- Kleene + means one or more occurrences of the immediately previous character or regular expression.
- wildcard . (period) matches any single character.
- anchor
- ^ start of a line
- $ end of a line
- \b matches a
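
The regular-expression entries above (Kleene * and +, square braces with a dash for ranges, the caret, the wildcard, and the ^ and $ anchors) can be tried out in a few lines. This is only a minimal sketch using Python's standard re module; the sample strings and variable names are made up for illustration and are not from the notes.

```python
import re

# Kleene *: zero or more of the previous character, so "b" alone matches "ba*".
star = re.fullmatch(r"ba*", "b") is not None

# Kleene +: one or more of the previous character, so "b" alone does not match "ba+".
plus = re.fullmatch(r"ba+", "b") is not None

# Square braces with a dash give a character range: runs of digits.
digits = re.findall(r"[0-9]+", "room 42, floor 7")

# A caret as the first character inside square braces negates the set.
non_digits = re.findall(r"[^0-9 ,]+", "room 42, floor 7")

# The wildcard . matches any single character (except a newline by default).
wild = re.findall(r"b.g", "bag big bug beg")

# Anchors: ^ is the start of a line (with re.MULTILINE), $ is the end.
starts = re.findall(r"^The", "The cat\nThe dog", flags=re.MULTILINE)
ends = re.findall(r"dog$", "The cat\nThe dog")

print(star, plus, digits, non_digits, wild, starts, ends)
```

Note that outside square braces the caret is an anchor, while as the first character inside them it means negation; the two uses are easy to confuse.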