Fixing encoding problems with read_csv in pandas

When read_csv in pandas fails with an encoding error, pass the encoding parameter:

encoding : str, default None

Encoding to use for UTF when reading/writing (ex. ‘utf-8’)
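
As a rough illustration of the parameter described above, the sketch below tries a short list of candidate encodings in order and keeps the first one that decodes. The helper name read_csv_with_fallback, the sample data, and the candidate list are all assumptions made for this example; they are not part of pandas.

```python
import io

import pandas as pd

# Hypothetical sample data: a small CSV encoded as GBK rather than UTF-8.
raw = "姓名,城市\n张三,北京\n".encode("gbk")


def read_csv_with_fallback(data, encodings=("utf-8", "gb18030")):
    """Try each candidate encoding in order; return (DataFrame, encoding used)."""
    last_error = None
    for enc in encodings:
        try:
            # encoding= tells pandas how to decode the raw bytes.
            return pd.read_csv(io.BytesIO(data), encoding=enc), enc
        except UnicodeDecodeError as exc:
            # The bytes are not valid in this encoding; try the next candidate.
            last_error = exc
    raise last_error


df, used = read_csv_with_fallback(raw)
print(used)
print(df)
```

gb18030 is used as the fallback here because it is a superset of GBK and GB2312, so one candidate covers all three. For a file on disk, the same call would be pd.read_csv(path, encoding="gb18030").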

The Python documentation lists the standard encodings. GBK, GB2312, GB18030, and UTF-8 are the ones most often involved in these problems:

https://docs.python.org/3/library/codecs.html#standard-encodings

 

Codec | Aliases | Languages
ascii | 646, us-ascii | English
big5 | big5-tw, csbig5 | Traditional Chinese
big5hkscs | big5-hkscs, hkscs | Traditional Chinese
cp037 | IBM037, IBM039 | English
cp273 | 273, IBM273, csIBM273 | German (new in version 3.4)
cp424 | EBCDIC-CP-HE, IBM424 | Hebrew
cp437 | 437, IBM437 | English
cp500 | EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500 | Western Europe
cp720 |  | Arabic
cp737 |  | Greek
cp775 | IBM775 | Baltic languages
cp850 | 850, IBM850 | Western Europe
cp852 | 852, IBM852 | Central and Eastern Europe
cp855 | 855, IBM855 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp856 |  | Hebrew
cp857 | 857, IBM857 | Turkish
cp858 | 858, IBM858 | Western Europe
cp860 | 860, IBM860 | Portuguese
cp861 | 861, CP-IS, IBM861 | Icelandic
cp862 | 862, IBM862 | Hebrew
cp863 | 863, IBM863 | Canadian
cp864 | IBM864 | Arabic
cp865 | 865, IBM865 | Danish, Norwegian
cp866 | 866, IBM866 | Russian
cp869 | 869, CP-GR, IBM869 | Greek
cp874 |  | Thai
cp875 |  | Greek
cp932 | 932, ms932, mskanji, ms-kanji | Japanese
cp949 | 949, ms949, uhc | Korean
cp950 | 950, ms950 | Traditional Chinese
cp1006 |  | Urdu
cp1026 | ibm1026 | Turkish
cp1125 | 1125, ibm1125, cp866u, ruscii | Ukrainian (new in version 3.4)
cp1140 | ibm1140 | Western Europe
cp1250 | windows-1250 | Central and Eastern Europe
cp1251 | windows-1251 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp1252 | windows-1252 | Western Europe
cp1253 | windows-1253 | Greek
cp1254 | windows-1254 | Turkish
cp1255 | windows-1255 | Hebrew
cp1256 | windows-1256 | Arabic
cp1257 | windows-1257 | Baltic languages
cp1258 | windows-1258 | Vietnamese
cp65001 |  | Windows only: Windows UTF-8 (CP_UTF8) (new in version 3.3)
euc_jp | eucjp, ujis, u-jis | Japanese
euc_jis_2004 | jisx0213, eucjis2004 | Japanese
euc_jisx0213 | eucjisx0213 | Japanese
euc_kr | euckr, korean, ksc5601, ks_c-5601, ks_c-5601-1987, ksx1001, ks_x-1001 | Korean
gb2312 | chinese, csiso58gb231280, euc-cn, euccn, eucgb2312-cn, gb2312-1980, gb2312-80, iso-ir-58 | Simplified Chinese
gbk | 936, cp936, ms936 | Unified Chinese
gb18030 | gb18030-2000 | Unified Chinese
hz | hzgb, hz-gb, hz-gb-2312 | Simplified Chinese
iso2022_jp | csiso2022jp, iso2022jp, iso-2022-jp | Japanese
iso2022_jp_1 | iso2022jp-1, iso-2022-jp-1 | Japanese
iso2022_jp_2 | iso2022jp-2, iso-2022-jp-2 | Japanese, Korean, Simplified Chinese, Western Europe, Greek
iso2022_jp_2004 | iso2022jp-2004, iso-2022-jp-2004 | Japanese
iso2022_jp_3 | iso2022jp-3, iso-2022-jp-3 | Japanese
iso2022_jp_ext | iso2022jp-ext, iso-2022-jp-ext | Japanese
iso2022_kr | csiso2022kr, iso2022kr, iso-2022-kr | Korean
latin_1 | iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1 | West Europe
iso8859_2 | iso-8859-2, latin2, L2 | Central and Eastern Europe
iso8859_3 | iso-8859-3, latin3, L3 | Esperanto, Maltese
iso8859_4 | iso-8859-4, latin4, L4 | Baltic languages
iso8859_5 | iso-8859-5, cyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian
iso8859_6 | iso-8859-6, arabic | Arabic
iso8859_7 | iso-8859-7, greek, greek8 | Greek
iso8859_8 | iso-8859-8, hebrew | Hebrew
iso8859_9 | iso-8859-9, latin5, L5 | Turkish
iso8859_10 | iso-8859-10, latin6, L6 | Nordic languages
iso8859_11 | iso-8859-11, thai | Thai languages
iso8859_13 | iso-8859-13, latin7, L7 | Baltic languages
iso8859_14 | iso-8859-14, latin8, L8 | Celtic languages
iso8859_15 | iso-8859-15, latin9, L9 | Western Europe
iso8859_16 | iso-8859-16, latin10, L10 | South-Eastern Europe
johab | cp1361, ms1361 | Korean
koi8_r |  | Russian
koi8_t |  | Tajik (new in version 3.5)
koi8_u |  | Ukrainian
kz1048 | kz_1048, strk1048_2002, rk1048 | Kazakh (new in version 3.5)
mac_cyrillic | maccyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian
mac_greek | macgreek | Greek
mac_iceland | maciceland | Icelandic
mac_latin2 | maclatin2, maccentraleurope | Central and Eastern Europe
mac_roman | macroman, macintosh | Western Europe
mac_turkish | macturkish | Turkish
ptcp154 | csptcp154, pt154, cp154, cyrillic-asian | Kazakh
shift_jis | csshiftjis, shiftjis, sjis, s_jis | Japanese
shift_jis_2004 | shiftjis2004, sjis_2004, sjis2004 | Japanese
shift_jisx0213 | shiftjisx0213, sjisx0213, s_jisx0213 | Japanese
utf_32 | U32, utf32 | all languages
utf_32_be | UTF-32BE | all languages
utf_32_le | UTF-32LE | all languages
utf_16 | U16, utf16 | all languages
utf_16_be | UTF-16BE | all languages
utf_16_le | UTF-16LE | all languages
utf_7 | U7, unicode-1-1-utf-7 | all languages
utf_8 | U8, UTF, utf8 | all languages
utf_8_sig |  | all languages
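
Because many names in the alias column point to the same codec, it can help to check which codec a given name resolves to before passing it as encoding. A minimal sketch using the standard library's codecs.lookup follows; the helper name canonical_name is an assumption for this example.

```python
import codecs


def canonical_name(alias):
    """Return the canonical codec name Python resolves an alias to."""
    # codecs.lookup raises LookupError if the name is not a known encoding.
    return codecs.lookup(alias).name


for alias in ("cp936", "ms936", "utf8"):
    print(alias, "->", canonical_name(alias))
```

An unknown name raises LookupError, which is a quick way to catch a misspelled encoding before calling read_csv.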

Reposted from: https://www.cnblogs.com/stephen2016/p/6113204.html
