How to avoid decoding to str: need a bytes-like object error in pandas?

Code:

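import pandas as pd

# on_bad_lines='skip' replaces error_bad_lines=False (deprecated in pandas 1.3, removed in 2.0)
data = pd.read_csv('asscsv2.csv', encoding="ISO-8859-1", on_bad_lines='skip')
data_text = data[['content']].copy()  # explicit copy avoids SettingWithCopyWarning on the next line
data_text['index'] = data_text.index
documents = data_text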

Output:

print(documents[:2])
                                              content  index
 0  Pretty extensive background in Egyptology and ...      0
 1  Have you guys checked the back end of the Sphi...      1

Preprocessing function:

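import gensim
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()  # create once instead of on every call (needs the NLTK 'wordnet' data)

def lemmatize_stemming(text):
    # Lemmatize the token as a verb, then stem the lemma
    return stemmer.stem(lemmatizer.lemmatize(text, pos='v'))

def preprocess(text):
    # Tokenize, drop gensim stopwords and tokens of 3 characters or fewer, then lemmatize and stem
    result = []
    for token in gensim.utils.simple_preprocess(text):
        if token not in gensim.parsing.preprocessing.STOPWORDS and len(token) > 3:
            result.append(lemmatize_stemming(token))
    return result

processed_docs = documents['content'].map(preprocess)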

Error:

TypeError: decoding to str: need a bytes-like object, float found
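Where the float comes from: when read_csv meets an empty field it stores NaN, which is a float, in the otherwise text column. In recent gensim releases, simple_preprocess first converts its argument with gensim.utils.to_unicode, which returns str input unchanged and otherwise calls str(value, encoding); Python only allows that call on bytes-like objects, hence the message. The sketch below reproduces the chain without gensim. to_text is a hypothetical helper that assumes gensim's conversion behaves as described, and the CSV text is invented.

```python
import io

import pandas as pd


def to_text(value, encoding="utf8"):
    """Hypothetical helper assumed to mirror the str/bytes conversion inside gensim."""
    if isinstance(value, str):
        return value  # text is returned unchanged
    # Any other type is treated as encoded bytes; a float raises TypeError here
    return str(value, encoding)


# Invented CSV whose second row has an empty 'content' field
csv_text = "id,content\n1,Pretty extensive background\n2,\n3,Have you guys checked\n"
data = pd.read_csv(io.StringIO(csv_text))

missing = data.loc[1, "content"]
print(pd.isna(missing))  # True: the empty field was read as NaN

try:
    to_text(missing)
except TypeError as exc:
    print(exc)  # decoding to str: need a bytes-like object, float found
```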

This line:

processed_docs = documents['content'].map(preprocess)

fails because some cells of the content column are NaN, and NaN cannot be preprocessed. Drop the rows that contain them:

documents.dropna(subset=["content"], inplace=True)  # drop rows whose 'content' cell is NaN

and then apply the preprocessing.
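A small, self-contained run of the drop-then-map approach. To avoid the gensim and NLTK dependencies it uses toy_preprocess, a hypothetical stand-in that only lowercases, splits and length-filters, together with to_text, a hypothetical helper assumed to mimic gensim's str-only conversion. The three-row DataFrame is invented for illustration.

```python
import numpy as np
import pandas as pd


def to_text(value, encoding="utf8"):
    # Hypothetical stand-in for gensim's conversion: str passes, other types are decoded as bytes
    if isinstance(value, str):
        return value
    return str(value, encoding)


def toy_preprocess(text):
    # Hypothetical stand-in for preprocess(): lowercase, split, keep tokens longer than 3 chars
    return [tok for tok in to_text(text).lower().split() if len(tok) > 3]


documents = pd.DataFrame({"content": [
    "Pretty extensive background in Egyptology",
    np.nan,  # a missing cell, as read_csv produces for an empty field
    "Have you guys checked the Sphinx",
]})

documents.dropna(subset=["content"], inplace=True)  # drop rows whose 'content' is NaN
processed_docs = documents["content"].map(toy_preprocess)
print(processed_docs.to_dict())
# {0: ['pretty', 'extensive', 'background', 'egyptology'], 2: ['have', 'guys', 'checked', 'sphinx']}
```

Note that the surviving rows keep their original labels (0 and 2), so the result can still be joined back to the source frame by index.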

Your content column contains NaN (not a number) values.
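Before choosing a fix, it can help to confirm how many cells are affected and where. A sketch on an invented four-row frame: isna() flags both NaN and None, and an isinstance check additionally catches any value that is not a str, whether or not it counts as missing.

```python
import numpy as np
import pandas as pd

# Invented frame: row 1 holds NaN, row 3 holds None
documents = pd.DataFrame({"content": ["first post", np.nan, "third post", None]})

is_missing = documents["content"].isna()  # True for both NaN and None
print(is_missing.sum(), documents.index[is_missing].tolist())  # 2 [1, 3]

# Broader check: flag every cell that is not a str, missing or not
not_str = documents["content"].map(lambda v: not isinstance(v, str))
print(documents.loc[not_str])
```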

You can either drop them first:

documents = documents.dropna(subset=['content'])

Or you can fill all NaNs with an empty string, convert the column to string type, and then map your string-based function:

processed_docs = documents['content'].fillna('').astype(str).map(preprocess)

This is needed because preprocess passes its argument to gensim.utils.simple_preprocess, which only accepts text (str or bytes); a float NaN triggers the TypeError when gensim tries to decode it as bytes.
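The fill-then-cast variant on an invented two-row frame, again with the hypothetical to_text and toy_preprocess stand-ins in place of gensim. The empty string produced by fillna('') maps to an empty token list, so every original row keeps a (possibly empty) result and the index is unchanged.

```python
import numpy as np
import pandas as pd


def to_text(value, encoding="utf8"):
    # Hypothetical stand-in for gensim's conversion: str passes, other types are decoded as bytes
    if isinstance(value, str):
        return value
    return str(value, encoding)


def toy_preprocess(text):
    # Hypothetical stand-in for preprocess(): lowercase, split, keep tokens longer than 3 chars
    return [tok for tok in to_text(text).lower().split() if len(tok) > 3]


documents = pd.DataFrame({"content": ["Pretty extensive background", np.nan]})

# Replace NaN with '' first, then cast, so every cell reaching toy_preprocess is a str
processed_docs = documents["content"].fillna("").astype(str).map(toy_preprocess)
print(processed_docs.tolist())  # [['pretty', 'extensive', 'background'], []]
```

Filling before casting keeps the missing rows genuinely empty: in pandas 1.x and 2.x, astype(str) on its own would turn NaN into the literal text 'nan'.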
