How to Use the requests Library to Scrape Data from Lagou

Latest recommended article published on 2024-07-03 21:21:17

qq^^614136809

Tags: scraping

Article link: https://blog.csdn.net/D0126_/article/details/135400234

A common approach is to use the requests library to fetch the page, and Beautiful Soup or lxml to parse it and extract its content. Here is a simple example:

First, install the required libraries:

pip install requests
pip install beautifulsoup4

Then you can use the following sample code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.lagou.com/jobs/list_Python?labelWords=&fromSearch=true&suginput='
headers = {'User-Agent': 'Your User-Agent'}  # Replace with your browser's User-Agent string

response = requests.get(url, headers=headers, timeout=10)  # timeout so the call cannot hang indefinitely

if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    
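    # Use the methods BeautifulSoup provides here to extract the data you need.
    # For example, inspect the structure of the Lagou page to find the elements
    # that hold the job title, company name, and so on, then extract them.

    # Example: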
    job_titles = soup.select('.job-list li .position .p_top .position_link h3')
    for title in job_titles:
        print(title.text.strip())
else:
    print(f"Failed to fetch page. Status code: {response.status_code}")
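
The script above mixes fetching and parsing, which makes the selector hard to check without hitting the site. A minimal sketch of one way to separate the two is shown below. The names `extract_job_titles`, `fetch_html`, `TITLE_SELECTOR`, and `SAMPLE_HTML` are hypothetical and not from the original article, and the sample markup is invented only to match the selector used above. The live Lagou page may be structured differently or may load its listings dynamically, in which case the selector would return an empty list.

```python
import requests
from bs4 import BeautifulSoup

# Selector copied from the example above; it is an assumption about the page
# structure and may not match what the live site serves today.
TITLE_SELECTOR = '.job-list li .position .p_top .position_link h3'

# Hypothetical sample markup, used only to exercise the parser offline.
SAMPLE_HTML = """
<ul class="job-list">
  <li><div class="position"><div class="p_top">
    <a class="position_link"><h3> Python Developer </h3></a>
  </div></div></li>
  <li><div class="position"><div class="p_top">
    <a class="position_link"><h3>Data Engineer</h3></a>
  </div></div></li>
</ul>
"""


def extract_job_titles(html, selector=TITLE_SELECTOR):
    """Return the stripped text of every element matching the selector."""
    soup = BeautifulSoup(html, 'html.parser')
    return [node.get_text(strip=True) for node in soup.select(selector)]


def fetch_html(url, user_agent, timeout=10):
    """Fetch a page and return its text, raising on HTTP error statuses."""
    headers = {'User-Agent': user_agent}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text


# Parse the offline sample; no network request is made here.
titles = extract_job_titles(SAMPLE_HTML)
print(titles)
```

To run this against the real page, pass the result of `fetch_html(url, 'Your User-Agent')` to `extract_job_titles`. If the list comes back empty, compare the selector with the HTML the server actually returned before assuming the request itself failed.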