Web Scraping, Day 1

QQの喵

Category: Web scraping

Article link: https://blog.csdn.net/Q1231234123/article/details/85057002

Scraping steps:

1. Import requests (and re, which step 4 uses)

import re
import requests

2. Set the URL and the request headers

url = 'http://example.webscraping.com/places/default/user/login'  # address of the page to request
headers = {'User-Agent':'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0'}

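3. Fetch the page and keep a temporary copy

# Send the headers from step 2; decode() turns the response bytes into text (UTF-8 by default)
req = requests.get(url=url, headers=headers).content.decode()
# Optional: save the raw page so it can be inspected later
with open('example.html','w',encoding='utf-8') as f:
    f.write(req)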

4. Filter the data

# Pull the value of the hidden _formkey field out of the page (needs import re)
req_key = re.findall(r'name="_formkey" type="hidden" value="(.*?)"',req)[0]

5. Save the data

with open('date.txt','w',encoding='utf-8') as f:
    f.write(req_key)
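
Steps 3 to 5 can be tried without touching the network. The following is a minimal sketch, not the original code: sample_body is a made-up stand-in for what requests.get(...).content would return, and the helper names extract_formkey and save_text, as well as the output file name, are hypothetical. It uses re.search instead of findall(...)[0] so that a page without the field yields None rather than an IndexError.

```python
import re
import tempfile
from pathlib import Path

# Pattern from step 4: captures the value of the hidden _formkey input.
FORMKEY_PATTERN = re.compile(r'name="_formkey" type="hidden" value="(.*?)"')


def extract_formkey(html):
    """Return the _formkey value found in html, or None if it is absent."""
    match = FORMKEY_PATTERN.search(html)
    return match.group(1) if match else None


def save_text(path, text):
    """Write text to path as UTF-8, using the same with-open pattern as step 5."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)


# Hypothetical stand-in for the bytes that requests.get(...).content would return.
sample_body = b'<input name="_formkey" type="hidden" value="abc123" />'

html = sample_body.decode()  # bytes.decode() defaults to UTF-8
key = extract_formkey(html)

# Hypothetical output location in the system temp directory.
out_path = Path(tempfile.gettempdir()) / 'formkey_demo.txt'
if key is not None:
    save_text(out_path, key)
    print(out_path.read_text(encoding='utf-8'))
```

Returning None for a missing field lets the caller decide what to do when the page layout changes, instead of crashing in the middle of the scrape.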

Note:

An object can be used in a with statement if it has __enter__ and __exit__ methods.

class WithObject(object):
    def __enter__(self):
        pass
    def __exit__(self, exc_type, exc_val, exc_tb):
        pass
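
To see when the two methods actually run, here is a small sketch under the same idea. TracedResource is a hypothetical class written only for this illustration; it records each call so the order becomes visible, both for a normal block and for one that raises.

```python
class TracedResource(object):
    """Hypothetical class that records when its with-hooks run."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self  # this value is bound to the name after "as"

    def __exit__(self, exc_type, exc_val, exc_tb):
        # exc_type is None when the block finished without an exception.
        if exc_type is None:
            self.events.append('exit')
        else:
            self.events.append('exit after ' + exc_type.__name__)
        return False  # do not suppress the exception


with TracedResource() as res:
    res.events.append('body')

print(res.events)  # ['enter', 'body', 'exit']

failing = TracedResource()
try:
    with failing:
        raise ValueError('boom')
except ValueError:
    pass

print(failing.events)  # ['enter', 'exit after ValueError']
```

__exit__ runs even when the block raises, which is why file objects opened with with are closed reliably.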

A small translation program:

import requests
import sys
import json

# The text to translate comes from the command-line arguments
word = ' '.join(sys.argv[1:])

req_url = 'http://fanyi.youdao.com/translate'
# Form fields sent in the POST body
form_data = {}
form_data['i'] = word
form_data['doctype'] = 'json'

response = requests.post(req_url,data=form_data)
html = response.content.decode()
html = json.loads(html)  # parse the JSON text into a dict
print('❤ '*len(word))
print(word)
print('❤ '*len(word))
print(html['translateResult'][0][0]['tgt'])
print('❤ '*len(word))
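
The program above reads only html['translateResult'][0][0]['tgt'], which fails with a KeyError if the service answers with something else. The sketch below shows one way to read the result more defensively. The sample bodies are hypothetical: only the translateResult / tgt nesting comes from the code above, while the src and errorCode fields and the helper name extract_translation are assumptions made for illustration.

```python
import json

# Hypothetical response body, shaped like the structure the program indexes into.
sample_body = '{"translateResult": [[{"src": "hello", "tgt": "你好"}]]}'


def extract_translation(payload):
    """Join every 'tgt' segment in payload['translateResult'], or return None."""
    try:
        paragraphs = payload['translateResult']
        return '\n'.join(
            ''.join(segment['tgt'] for segment in paragraph)
            for paragraph in paragraphs
        )
    except (KeyError, TypeError):
        # Missing key or unexpected shape: report "no translation" instead of crashing.
        return None


print(extract_translation(json.loads(sample_body)))
print(extract_translation({'errorCode': 50}))  # hypothetical error payload, prints None
```

Joining all segments rather than taking [0][0] also covers input that the service might split into several pieces.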

Result:

[Screenshot: program output]
