Homework, Day 2:
1. Scrape the Guba stock forum:
url: http://guba.eastmoney.com/
Requirements:
1. Scrape 10 pages and save them under the guba folder
Method 1
import requests
import os

base_url = 'http://guba.eastmoney.com/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36',
}
kw = 'guba02'
filename = './guba02/' + kw
dirname = os.path.dirname(filename)  # './guba02'
if not os.path.exists(dirname):
    os.mkdir(dirname)
if not os.path.exists(filename):
    os.mkdir(filename)
for i in range(10):
    # Tieba-style paging params: pn advances by 50 per page
    params = {
        'kw': kw,
        'ie': 'utf-8',
        'pn': str(i * 50),
    }
    response = requests.get(base_url, headers=headers, params=params)
    with open(filename + '/{}.html'.format(i + 1), 'w', encoding='utf-8') as fp:
        fp.write(response.text)
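The per-page query parameters above can be factored into a small helper. This is just a sketch; `build_params` is a name introduced here, not part of the original script:

```python
def build_params(kw, page, page_size=50):
    # Tieba-style paging: pn is the 0-indexed offset, in steps of page_size
    return {'kw': kw, 'ie': 'utf-8', 'pn': str(page * page_size)}

# build_params('guba02', 3) gives pn='150', i.e. the fourth page
```

Keeping the paging arithmetic in one place makes it easy to change the page size or switch to 1-indexed pages later.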
Method 2
## Adapted from the textbook's batch scraping of Baidu Tieba
# Batch-scrape the Shanghai Composite Index board
import requests
import os

def tieba4(kw, start, end):
    dir_name = './guba/' + kw + '/'
    if not os.path.exists(dir_name):
        os.makedirs(dir_name)
    payload = {'kw': kw, 'ie': 'utf-8'}
    for i in range(int(start), int(end) + 1):
        pn = (i - 1) * 50  # Tieba-style paging: 50 posts per page
        payload['pn'] = str(pn)
        response = requests.get(base_url, params=payload)
        html = response.content.decode('utf-8')
        with open(dir_name + str(i) + '.html', 'w', encoding='utf-8') as f:
            f.write(html)

if __name__ == '__main__':
    # base_url is assigned at module level before the call, so tieba4 sees it as a global
    base_url = 'http://guba.eastmoney.com/list,zssh000001.html'
    kw = '上证指数'  # Shanghai Composite Index
    start = 1
    end = 10
    tieba4(kw, start, end)
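Note that the Tieba-style kw/ie/pn query string may not actually page an eastmoney guba list URL; the guba site appears to page via the path itself, with later pages written as list,&lt;code&gt;_&lt;n&gt;.html. A hedged sketch of that assumed pattern:

```python
def guba_page_url(code, page):
    # Assumed eastmoney guba paging pattern: page 1 has no suffix,
    # later pages append _<n> before .html (verify against the live site)
    if page == 1:
        return 'http://guba.eastmoney.com/list,{}.html'.format(code)
    return 'http://guba.eastmoney.com/list,{}_{}.html'.format(code, page)
```

If the pattern holds, the loop body would call guba_page_url('zssh000001', i) instead of re-sending the same base_url with Tieba params.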
2. Kingsoft iciba dictionary: http://www.iciba.com/
Aim for a result similar to the Youdao exercise.
Version 1.0
import requests
import json

headers = {
    # 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36',
}
kw = 'kill'
data = {'w': kw}
base_url = 'http://fy.iciba.com/ajax.php?a=fy'
response = requests.post(base_url, headers=headers, data=data)
json_data = json.loads(response.text)
result = ''
for i in json_data['content']['word_mean']:
    result += i + '\n'
print(result)
E:\project\python.exe C:/Users/Administrator/Desktop/四阶xpat爬虫系列/Requests/Requests01/Requests01/作业系列/fanyi_jscb.py
vt.& vi. 杀死…;
vt. 使停止[结束,失败];破坏,减弱,抵消;使痛苦,使受折磨;使笑得前仰后合,使笑死了;
n. 杀死;猎;被捕杀的动物;猎物;
adj. 致命的;
Process finished with exit code 0
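Indexing json_data['content']['word_mean'] directly raises KeyError whenever the API returns an error payload. A defensive extraction helper is one way around that; this is a sketch, and `format_means` is a name introduced here:

```python
def format_means(json_data):
    # Walk content -> word_mean with .get(); a missing key yields an empty string
    means = json_data.get('content', {}).get('word_mean') or []
    return '\n'.join(means)
```

For example, format_means({'content': {'word_mean': ['vt. kill;']}}) returns 'vt. kill;', while format_means({}) returns '' instead of raising.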
Version 1.01: add an input prompt
import requests
import json

url = 'http://fy.iciba.com/ajax.php?a=fy'
headers = {}
wk = input('输入单词')
data = {'w': wk}
response = requests.post(url, headers=headers, data=data)
json_data = json.loads(response.text)
result = ''
try:
    for i in json_data['content']['word_mean']:
        result += i + '\n'
except Exception as a:
    print(a)
print(result)
E:\project\python.exe C:/Users/Administrator/Desktop/四阶xpat爬虫系列/Requests/Requests01/Requests01/作业系列/fanyi_jscb_tym.py
输入单词obj
abbr. object 物体;目标;(工程)项目;objection 反对;
Process finished with exit code 0
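The whole lookup could also be wrapped in one reusable function. In this sketch (`lookup` is a name introduced here) the HTTP transport is passed in as a parameter, so the logic can be exercised without hitting the network; in real use you would pass requests.post:

```python
def lookup(word, post, url='http://fy.iciba.com/ajax.php?a=fy'):
    # post is the HTTP transport, e.g. requests.post; injected for testability
    try:
        resp = post(url, data={'w': word}, timeout=5)
        resp.raise_for_status()
        json_data = resp.json()
    except Exception as e:
        return 'lookup failed: {}'.format(e)
    return '\n'.join(json_data.get('content', {}).get('word_mean') or [])
```

This keeps the request, error handling, and result formatting in one place, so both scripts above could share it.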