langchain UnstructuredCSVLoader error when reading a Chinese CSV

_温水青蛙_

Last modified 2023-12-28 14:18:36

Tags: langchain, database frontend

First published 2023-12-28 14:16:17

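Copyright notice: this is an original article by the author, licensed under CC 4.0 BY-SA. If you repost it, please include a link to the original source and this notice.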

Article link: https://blog.csdn.net/sinat_28704977/article/details/135268065

Note: langchain version 0.0.352

When I used langchain's UnstructuredCSVLoader to read a CSV file containing Chinese text:

    from langchain.document_loaders import UnstructuredCSVLoader

    file_path = "chinese.csv"
    loader = UnstructuredCSVLoader(file_path=str(file_path))
    docs = loader.load()

it failed with an encoding error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xxx in position x: illegal multibyte sequence
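To see why the codec name matters, here is a small standard-library check (not from the original post). It prints the encoding Python's open() uses when no encoding is given, and shows that the UTF-8 bytes of a Chinese character cannot be decoded as GBK. The helper decodes_as is hypothetical, written only for this illustration.

```python
import locale

# The encoding open() falls back to when no encoding= is passed.
# On a Chinese-locale Windows machine this is typically "cp936" (GBK).
print("default text encoding:", locale.getpreferredencoding(False))


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# "中" is three bytes in UTF-8 (E4 B8 AD). GBK decodes these bytes as
# two-byte sequences, so the odd trailing byte cannot form a character.
utf8_bytes = "中".encode("utf-8")
print(decodes_as(utf8_bytes, "utf-8"))  # True
print(decodes_as(utf8_bytes, "gbk"))    # False
```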

The fix is to modify the _get_elements method of the UnstructuredCSVLoader class as follows:

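    def _get_elements(self) -> List:
        from unstructured.partition.csv import partition_csv

        # ##### debug code #####
        # Fix for UnstructuredCSVLoader failing to load Chinese CSV files
        try:
            # Original behaviour: let unstructured open the file by path
            elements = partition_csv(filename=self.file_path, **self.unstructured_kwargs)
        except UnicodeDecodeError:
            # Fallback: open the file in binary mode and pass the file object,
            # presumably so the text is no longer decoded with the system
            # default encoding (GBK here)
            with open(self.file_path, "rb") as f:
                elements = partition_csv(file=f, **self.unstructured_kwargs)
        # ##### code end #####

        return elements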

That's all it takes.
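Outside of langchain, the same try-then-fall-back-to-bytes idea can be sketched with the standard library alone. This is an illustration, not the author's code; the function name read_csv_rows is hypothetical, and the sketch assumes the CSV is actually UTF-8 encoded.

```python
import csv
import io
from typing import List


def read_csv_rows(path: str) -> List[List[str]]:
    """Read a CSV file, falling back to explicit UTF-8 decoding on failure."""
    try:
        # No encoding= given: Python decodes with the locale's preferred
        # encoding, which is GBK (cp936) on a Chinese-locale Windows machine.
        with open(path, newline="") as f:
            return list(csv.reader(f))
    except UnicodeDecodeError:
        # Same idea as the patch: re-open in binary mode and take control of
        # decoding. "utf-8-sig" also strips a leading BOM if there is one.
        with open(path, "rb") as f:
            text = f.read().decode("utf-8-sig")
        return list(csv.reader(io.StringIO(text, newline="")))
```

One caveat, which presumably applies to the patch as well: the fallback only runs when decoding raises. If the default encoding happens to accept the bytes, you get garbled text with no error, so when you know the file is UTF-8 it is simpler to pass encoding="utf-8" explicitly.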

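The problem comes from an encoding issue in langchain's integration with the third-party library unstructured. The 'gbk' in the error suggests the CSV (most likely a UTF-8 file) is being decoded with the system's default encoding, which is GBK (cp936) on Chinese-locale Windows, instead of UTF-8.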
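If you'd rather not edit the installed langchain source, one alternative (not from the original post) is Python's UTF-8 Mode (PEP 540, Python 3.7+), enabled with `python -X utf8` or the environment variable PYTHONUTF8=1, which makes open() default to UTF-8. This only helps under the assumption that the GBK decode comes from a file opened without an explicit encoding; that is what the codec name suggests, but it has not been checked against unstructured's source. The sketch below verifies the mechanism with the standard library only; SAMPLE, CHILD and read_in_utf8_mode are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical sample data: a small CSV with Chinese headers and values.
SAMPLE = "名称,数量\n苹果,3\n"

# Child script: opens the file WITHOUT encoding=, like code that relies on
# the platform default, and prints the parsed rows as ASCII-only text.
CHILD = (
    "import csv, sys\n"
    "with open(sys.argv[1], newline='') as f:\n"
    "    print(ascii(list(csv.reader(f))))\n"
)


def read_in_utf8_mode(path: Path) -> str:
    """Parse `path` in a child interpreter started with -X utf8 (UTF-8 Mode)."""
    result = subprocess.run(
        [sys.executable, "-X", "utf8", "-c", CHILD, str(path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

In practice you would simply start your own script as `python -X utf8 your_script.py` or set PYTHONUTF8=1 beforehand; the subprocess here only makes the effect checkable.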
