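Scraping web page resources with Python (for one specific site)
I have only just started learning Python, so the code has flaws; please bear with me.
Code: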
import urllib.request


def url_open(url):
    # Open the URL and return the HTTP response object.
    response = urllib.request.urlopen(url)
    return response


def get_page(url):
    # Download the page and decode its bytes as UTF-8 text.
    response = url_open(url)
    html = response.read().decode('utf-8')
    return html


def get_liti(url):
    # Return the text between the '题目' marker and the ' 程序分析:' marker.
    html = get_page(url)
    b = html.find('题目')
    begin = b + 2  # skip the two characters of '题目'
    end = html.find(' 程序分析:')
    timu = html[begin:end]
    return timu


def download_liti(total):
    liti = []
    for temp in range(1, total + 1):
        if temp == 2:
            # Exercise 2 is skipped.
            continue
        url = "https://www.runoob.com/python/python-exercise-example" + str(temp) + ".html"
        a = "例题" + str(temp) + get_liti(url)
        liti.append(a)
    # Write all exercises at once; the with block closes the file automatically.
    with open("E://timu.txt", "w+", encoding="utf-8") as file:
        for each in liti:
            file.write(each + "\n")


download_liti(100)
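One weakness of the slicing in get_liti is that str.find returns -1 when a marker is missing, so html[begin:end] would then silently return the wrong part of the page. The following is a minimal sketch of a stricter version, not the code used in this post. The helper name extract_between and the sample string are made up for illustration.

```python
def extract_between(html, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = html.find(start_marker)
    if start == -1:
        return None
    begin = start + len(start_marker)
    # Look for the end marker only after the start marker.
    end = html.find(end_marker, begin)
    if end == -1:
        return None
    return html[begin:end]


# Hypothetical sample text, not copied from the real page.
sample = "<p>题目:输出 9*9 乘法口诀表。 程序分析:分行与列考虑。</p>"
print(extract_between(sample, "题目", " 程序分析:"))
print(extract_between("no markers here", "题目", " 程序分析:"))
```

Returning None lets the caller decide whether to skip a page or stop, instead of writing a broken entry to the file.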
Cause of the error: Windows saves text as GBK by default, and open() uses that default when no encoding is given:
with open("E://timu.txt", "w+") as file:
    for each in liti:
        file.write(each + "\n")
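The post does not show the error message. Assuming it was a UnicodeEncodeError, the sketch below shows how such a failure can arise: a scraped string contains a character that GBK cannot represent. The helper can_encode and the sample string are hypothetical, and '\xa0' (a non-breaking space, common in HTML) is only one possible culprit.

```python
import locale


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Encoding that open() typically falls back to when no encoding is passed
# (usually a GBK code page on Chinese-language Windows).
print(locale.getpreferredencoding(False))

# Hypothetical sample: '\xa0' has no GBK representation.
sample = "例题1\xa0题目"
print(can_encode(sample, "gbk"))
print(can_encode(sample, "utf-8"))
```

Running a check like this on the scraped strings is one way to confirm whether the encoding is really the cause before changing the open() call.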
Solution: pass encoding="utf-8" so the file is written as UTF-8:
with open("E://timu.txt", "w+", encoding="utf-8") as file:
    for each in liti:
        file.write(each + "\n")
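To check that the fix behaves as expected, the file can be written and read back with the same explicit encoding. This is only a small sketch: the temporary directory stands in for the E:// path, and the two sample lines are invented.

```python
import os
import tempfile

# Invented sample lines, including a character that GBK cannot encode.
lines = ["例题1:sample text\xa0with a non-breaking space", "例题3:another line"]

# A temporary directory stands in for the E:// path used in the post.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "timu.txt")
    with open(path, "w", encoding="utf-8") as file:
        for each in lines:
            file.write(each + "\n")
    # Read with the same encoding that was used for writing.
    with open(path, "r", encoding="utf-8") as file:
        read_back = file.read().splitlines()

print(read_back == lines)
```

Any program that later reads timu.txt needs to pass encoding="utf-8" as well, otherwise the Windows default may be used again when reading.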




