While working on a deep learning project recently, I needed to build a TF-IDF term vector space. I ran the following code under Python 3.6.5:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(
    stop_words=stpwrdlst, sublinear_tf=True, max_df=0.5)
It failed with: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte
Fix: tell the vectorizer to ignore decoding errors by changing the code to:
vectorizer = TfidfVectorizer(
    stop_words=stpwrdlst, sublinear_tf=True, max_df=0.5, decode_error='ignore')
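In other words, add decode_error='ignore'. Note that this skips any bytes that cannot be decoded as UTF-8 rather than converting them, so some text may be dropped from the input.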
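
To see what ignoring the error actually does, here is a small standard-library sketch. It assumes the corpus is GBK-encoded Chinese text, which the error message hints at (0xba is a common GBK lead byte) but which I have not confirmed. The sample document and the try_decode helper are made up for illustration.

```python
# Assumption: the corpus is GBK-encoded Chinese text (a guess, not confirmed).
raw = "好".encode("gbk")  # hypothetical sample document, as raw bytes


def try_decode(data, encoding, errors="strict"):
    """Return the decoded text, or None if decoding raises UnicodeDecodeError."""
    try:
        return data.decode(encoding, errors)
    except UnicodeDecodeError:
        return None


strict_utf8 = try_decode(raw, "utf-8")             # fails, like the original error
ignored_utf8 = try_decode(raw, "utf-8", "ignore")  # undecodable bytes are dropped
as_gbk = try_decode(raw, "gbk")                    # works when the encoding matches

print(repr(strict_utf8), repr(ignored_utf8), repr(as_gbk))
```

If the files really are GBK, it may be better to pass encoding='gbk' to TfidfVectorizer (it accepts an encoding parameter, 'utf-8' by default) instead of decode_error='ignore', so the text is kept rather than silently discarded. Check the actual encoding of your data before relying on this.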