Problem 1: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 0: invalid continuation byte
Hi everyone! I'm 老码农 (the "Old Coder").
Today I'm sharing a problem I ran into when reading a CSV file with Pandas.
Error log
- Key information
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 0: invalid continuation byte
- Full information
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[4], line 1
----> 1 df_csv = pd.read_csv("./input/2023-CSP-J.csv")

File ~\miniconda3\envs\coder\lib\site-packages\pandas\io\parsers\readers.py:948, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
    935 kwds_defaults = _refine_defaults_read(
    936     dialect,
    937     delimiter,
   (...)
    944     dtype_backend=dtype_backend,
    945 )
    946 kwds.update(kwds_defaults)
--> 948 return _read(filepath_or_buffer, kwds)

File ~\miniconda3\envs\coder\lib\site-packages\pandas\io\parsers\readers.py:611, in _read(filepath_or_buffer, kwds)
    608 _validate_names(kwds.get("names", None))
    610 # Create the parser.
--> 611 parser = TextFileReader(filepath_or_buffer, **kwds)
    613 if chunksize or iterator:
    614     return parser

File ~\miniconda3\envs\coder\lib\site-packages\pandas\io\parsers\readers.py:1448, in TextFileReader.__init__(self, f, engine, **kwds)
   1445     self.options["has_index_names"] = kwds["has_index_names"]
   1447 self.handles: IOHandles | None = None
-> 1448 self._engine = self._make_engine(f, self.engine)

File ~\miniconda3\envs\coder\lib\site-packages\pandas\io\parsers\readers.py:1723, in TextFileReader._make_engine(self, f, engine)
   1720     raise ValueError(msg)
   1722 try:
-> 1723     return mapping[engine](f, **self.options)
   1724 except Exception:
   1725     if self.handles is not None:

File ~\miniconda3\envs\coder\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py:93, in CParserWrapper.__init__(self, src, **kwds)
     90 if kwds["dtype_backend"] == "pyarrow":
     91     # Fail here loudly instead of in cython after reading
     92     import_optional_dependency("pyarrow")
---> 93 self._reader = parsers.TextReader(src, **kwds)
     95 self.unnamed_cols = self._reader.unnamed_cols
     97 # error: Cannot determine type of 'names'

File parsers.pyx:579, in pandas._libs.parsers.TextReader.__cinit__()
File parsers.pyx:668, in pandas._libs.parsers.TextReader._get_header()
File parsers.pyx:879, in pandas._libs.parsers.TextReader._tokenize_rows()
File parsers.pyx:890, in pandas._libs.parsers.TextReader._check_tokenize_status()
File parsers.pyx:2050, in pandas._libs.parsers.raise_parser_error()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 0: invalid continuation byte
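To see where a message like this can come from, here is a minimal sketch that reproduces the same error text without Pandas. The post does not show the actual first bytes of 2023-CSP-J.csv, so the character used here is only an assumption: it was picked because its GBK encoding happens to start with byte 0xc4.

```python
# Assumption: the CSV starts with a Chinese character stored in GBK.
# "你" is a stand-in; the real file's first character is not shown in the post.
raw = "你".encode("gbk")  # two bytes: 0xc4 0xe3

try:
    raw.decode("utf-8")  # what happens when the bytes are treated as UTF-8
    message = ""
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)            # the decoder's complaint about byte 0xc4
print(raw.decode("gbk"))  # decoding with the matching codec works
```

In UTF-8, 0xc4 announces a two-byte sequence, but the next byte (0xe3) is not a valid continuation byte, which is why the decoder gives up at position 0.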
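Solution
- Step 1: Read the error log. It points to a decoding problem: Pandas tries to read the file as UTF-8, but the file is not valid UTF-8. This usually happens when the file contains Chinese text that was saved in a different encoding.
- Step 2: On Windows, open the file with Notepad or Notepad++. Here we use Notepad.
- Step 3: Check the file's current encoding by choosing Save As and looking at the encoding shown in the dialog.
- Step 4: The current encoding turns out to be ANSI (on a Simplified Chinese Windows system this typically means GBK).
- Step 5: Change the encoding to UTF-8 and save the file.
- Step 6: Run the code again. This time it works.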
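If you would rather not open Notepad every time, the same "Save As UTF-8" step can be sketched in Python. This is only a rough sketch under assumptions: `convert_to_utf8` is a hypothetical helper name, the source encoding is assumed to be GBK (what ANSI usually means on Simplified Chinese Windows), and the demo file is a throwaway stand-in for the real CSV.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="gbk"):
    """Hypothetical helper: re-encode a text file as UTF-8."""
    raw = Path(src).read_bytes()
    try:
        text = raw.decode("utf-8")  # already valid UTF-8, keep as is
    except UnicodeDecodeError:
        # Assumption: the file uses the legacy encoding passed in.
        text = raw.decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return text


# Demo on a temporary file that stands in for the real CSV.
demo_dir = Path(tempfile.mkdtemp())
ansi_csv = demo_dir / "ansi.csv"
utf8_csv = demo_dir / "utf8.csv"
ansi_csv.write_bytes("你好,世界\n1,2\n".encode("gbk"))

convert_to_utf8(ansi_csv, utf8_csv)
converted = utf8_csv.read_text(encoding="utf-8")
print(converted)
```

Another option that leaves the file untouched is the `encoding` parameter of `read_csv`, which appears in the signature in the traceback above, for example `pd.read_csv("./input/2023-CSP-J.csv", encoding="gbk")`. Whether `"gbk"` is the right value depends on how the file was actually saved.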
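I am 老码农
Hi everyone! I'm 老码农. That's all for today.
Let's share the handiest tools and the best resources, so we lose less hair and work more efficiently. We are all the best-looking developers around.
Your likes, shares, and comments are what keep 老码农 sharing high-quality resources. Thank you all for your support.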