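Custom words that contain spaces cannot be added to the dictionary file. Each entry in that file has the format "word pos frequency" (word, part of speech, word frequency), for example "单身狗 n 1024". The fields are separated by spaces, so a custom word that itself contains a space breaks the format.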
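To make the failure concrete, here is a minimal sketch of a whitespace-splitting parser for the "word pos frequency" format. The function `parse_entry` is a hypothetical illustration, not HanLP's actual loader; it only shows why a space inside the word yields the wrong number of fields.

```python
def parse_entry(line):
    """Split a 'word pos frequency' line on whitespace (hypothetical parser)."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {fields}")
    word, pos, freq = fields
    return word, pos, int(freq)


# A word without spaces parses into exactly three fields.
print(parse_entry("单身狗 n 1024"))

# A word with a space produces four fields, so the entry is rejected.
try:
    parse_entry("White House ns 1024")
except ValueError as err:
    print("format error:", err)
```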
Solution: add the words dynamically in code
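from pyhanlp import *
text = "Joseph Robinette Biden and his family moved into the White House"
# Segmentation before the custom words are added
print(HanLP.segment(text))
# Add the space-containing words at runtime, each with its part-of-speech tag
CustomDictionary = JClass("com.hankcs.hanlp.dictionary.CustomDictionary")
CustomDictionary.add("Joseph Robinette Biden", "nr")
CustomDictionary.add("White House", "ns")
# CustomDictionary.remove("deleteword")  # remove a word (uncomment to try)
# Segment again to compare with the first result
print(HanLP.segment(text))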
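If there are many such phrases, one option is to keep them in a tab-separated list and register them in a loop. The sketch below assumes a "phrase<TAB>tag" layout, which is a convention invented here and not a HanLP file format; `load_phrases`, `register_phrases`, and `SAMPLE` are hypothetical names. The `add` argument is any callable taking a word and a tag, so `CustomDictionary.add` from the code above can be passed in.

```python
import csv
import io


def load_phrases(tsv_text):
    """Parse 'phrase<TAB>tag' rows; spaces inside a phrase are preserved."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]


def register_phrases(phrases, add):
    """Call add(word, tag) for each phrase and return how many were passed."""
    count = 0
    for word, tag in phrases:
        add(word, tag)
        count += 1
    return count


# Hypothetical sample data; in practice this text would be read from a file.
SAMPLE = "Joseph Robinette Biden\tnr\nWhite House\tns\n"
phrases = load_phrases(SAMPLE)

# With pyhanlp available, pass the real method instead of print:
#   register_phrases(phrases, CustomDictionary.add)
register_phrases(phrases, lambda word, tag: print("add:", word, tag))
```

Because the tab is the only delimiter, spaces inside a phrase never collide with the field separator.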