I want to do POS tagging for determiners and prepositions in Sorani Kurdish. I use the code below to put a tag after each preposition or determiner in the Kurdish text:

```python
import os

SOR = open("SOR-1.txt", "r+", encoding = 'utf-8')
old_text = SOR.read()  # after read(), the file position is at the end of the file

# Put a space before each punctuation mark so it splits off as its own token
punkt = [".", "!", ",", ":", ";"]
text = ""
for i in old_text:
    if i in punkt:
        text += " " + i
    else:
        text += i

d = {"DET":["ئێمە" , "ئێوە" , "ئەم" , "ئەو" , "ئەوان" , "ئەوەی", "چەند" ],
     "PREP":["بۆ","بێ","بێجگە","بە","بەبێ","بەدەم","بەردەم","بەرلە","بەرەوی","بەرەوە","بەلای","بەپێی","تۆ","تێ","جگە","دوای","دەگەڵ","سەر","لێ","لە","لەبابەت","لەباتی","لەبارەی","لەبرێتی","لەبن","لەبەینی","لەبەر","لەدەم","لەرێ","لەرێگا","لەرەوی","لەسەر","لەلایەن","لەناو","لەنێو","لەو","لەپێناوی","لەژێر","لەگەڵ","ناو","نێوان","وەک","وەک","پاش","پێش","" ],
     "punkt":[".", ",", "!"]}

text = text.split()
for w in text:
    for pos in d:
        if w in d[pos]:
            # Only matched words are written, at the current file position
            SOR.write(w + "/" + pos + " ")
SOR.close()
```
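What I want is to add the POS tag directly after each word from the dictionary, in place in the text. Instead, the result is a separate list of words and POS tags appended to the end of the file.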
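The behaviour described above follows from how the file is used: `read()` leaves the file position at the end of `SOR-1.txt`, so every later `write()` in `"r+"` mode lands after the original text, and only the matched words are written, never the untagged ones. A minimal sketch of one way to tag in place, assuming whitespace tokenization is good enough (as in the question): invert the dictionary into a word-to-tag lookup, rebuild the whole token sequence with `/TAG` appended where a tag exists, and then rewrite the file. The `TAGS` lists are an abbreviated subset of the question's `d` (extend them with the full lists), and `build_lookup`, `tag_text` and `tag_file` are hypothetical helper names.

```python
import re

# Abbreviated copy of the question's dictionary; extend with the full lists.
TAGS = {
    "DET": ["ئێمە", "ئێوە", "ئەم", "ئەو", "ئەوان", "ئەوەی", "چەند"],
    "PREP": ["بۆ", "بێ", "بە", "لە", "لەسەر", "لەگەڵ", "وەک", "پاش", "پێش"],
    "punkt": [".", ",", "!"],
}

PUNCT = ".!,:;"


def build_lookup(tags):
    """Invert {tag: [words]} into {word: tag}; empty strings are skipped.

    If a word appears under several tags, the first tag seen wins.
    """
    lookup = {}
    for tag, words in tags.items():
        for w in words:
            if w:
                lookup.setdefault(w, tag)
    return lookup


def tag_text(text, tags):
    """Return text with '/TAG' appended to every token found in `tags`.

    Untagged tokens are kept unchanged, so the original word order survives.
    """
    lookup = build_lookup(tags)
    # Put a space before each punctuation mark so it becomes its own token.
    spaced = re.sub(f"([{re.escape(PUNCT)}])", r" \1", text)
    out = []
    for token in spaced.split():
        tag = lookup.get(token)
        out.append(f"{token}/{tag}" if tag else token)
    return " ".join(out)


def tag_file(path, tags):
    """Overwrite `path` with its tagged contents (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    # Reopening in "w" mode truncates the file, so nothing is appended.
    with open(path, "w", encoding="utf-8") as f:
        f.write(tag_text(text, tags))
```

For example, `tag_text("ئەو بۆ ماڵ.", TAGS)` gives `"ئەو/DET بۆ/PREP ماڵ ./punkt"`. Writing to a separate output file instead of overwriting `SOR-1.txt` would keep the untagged original around, which may be safer while the word lists are still being refined. Note that this sketch collapses the original line breaks into single spaces; tagging line by line would preserve them.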