Changing the encoding of a Python POST request

Here's the situation: I'm sending POST requests and trying to fetch the response with Python.

The problem is that non-Latin letters come out distorted. This doesn't happen when I fetch the same page through a direct link (with no search results), but POST requests don't generate a link.

Here's what I do:

import urllib
import urllib2

url = 'http://donelaitis.vdu.lt/main_helper.php?id=4&nr=1_2_11'
data = 'q=bus&ieskoti=true&lang1=en&lang2=en+-%3E+lt+%28+71813+lygiagre%C4%8Di%C5%B3+sakini%C5%B3+%29&lentele=vertikalus&reg=false&rodyti=dalis&rusiuoti=freq'

req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()

with open("pagesource.txt", "w") as f:
    f.write(the_page)

Whenever I try

thepage = the_page.encode('utf-8')

I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1008: ordinal not in range(128)

Whenever I try to change the response header to Content-Type: text/html;charset=utf-8 with

response['Content-Type'] = 'text/html;charset=utf-8'

I get this error:

AttributeError: addinfourl instance has no attribute '__setitem__'

My question: is it possible to edit or remove response or request headers?

If not, is there another way to solve this problem other than copying the source into Notepad++ and fixing the encoding manually?

I'm new to Python and data mining, so I'd really appreciate it if you let me know whether I'm doing something wrong.

Thanks.

Solution

Why don't you try thepage = the_page.decode('utf-8') instead of encode? What you want is to go from UTF-8 encoded bytes to unicode, Python's encoding-agnostic internal string type.
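
For context, in Python 2 calling encode on a byte string first decodes it implicitly with the ascii codec, which is why the question sees a UnicodeDecodeError from a call named encode. The sketch below illustrates the decode direction without touching the network. The raw_bytes value is a hypothetical stand-in for response.read() (it holds the two Lithuanian words from the form data, as UTF-8), and it assumes the server really sends UTF-8. It uses io.open so the same lines should run on Python 2.7 and Python 3, and it writes to the temp directory rather than the working directory.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical stand-in for response.read(): raw UTF-8 bytes,
# here the words "lygiagrečių sakinių" taken from the POST data.
raw_bytes = b'lygiagre\xc4\x8di\xc5\xb3 sakini\xc5\xb3'

# decode() goes from bytes to text (unicode in Python 2, str in Python 3).
text = raw_bytes.decode('utf-8')

# Write the text back out with an explicit encoding, so the file
# contents do not depend on the platform default.
out_path = os.path.join(tempfile.gettempdir(), 'pagesource.txt')
with io.open(out_path, 'w', encoding='utf-8') as out:
    out.write(text)

# Read it back to confirm the non-Latin letters survived the round trip.
with io.open(out_path, encoding='utf-8') as saved:
    round_trip = saved.read()
```

In the question's script, the_page would take the place of raw_bytes. If the page is in a different encoding, the codec name passed to decode would need to match it.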
