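I spent a few days writing this crawler, which scrapes the current day's job postings from 51job (前程无忧). It uses multiple threads to crawl several cities at the same time. I am keeping it here for reference; the complete code follows and can be copied and run as is.
The crawl is limited to postings published within the last 24 hours that offer two-day weekends. The script creates a folder in the current directory and writes its output files inside that folder. If data was already crawled yesterday, today's run deletes all of yesterday's files and downloads everything again.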
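Before the full listing, here is a minimal sketch of the two ideas described above: clearing the output folder from a previous run, and starting one thread per city. The helper names `reset_output_dir`, `run_per_city`, and `demo_worker`, the folder name `51job`, and the file naming scheme are assumptions made for illustration only; the actual script may organize this differently.

```python
import datetime
import os
import shutil
from threading import Thread


def reset_output_dir(base_dir, name="51job"):
    """Delete the folder left by a previous run (if any) and recreate it empty."""
    out_dir = os.path.join(base_dir, name)
    shutil.rmtree(out_dir, ignore_errors=True)  # no error if the folder is missing
    os.makedirs(out_dir)
    return out_dir


def run_per_city(cities, worker, out_dir):
    """Start one thread per city, wait for all of them, and return the file paths."""
    today = datetime.date.today().isoformat()
    paths = [os.path.join(out_dir, f"{city}_{today}.txt") for city in cities]
    threads = [
        Thread(target=worker, args=(city, path))
        for city, path in zip(cities, paths)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return paths


def demo_worker(city, path):
    # Hypothetical stand-in for the real download-and-parse step
    with open(path, "w", encoding="utf-8") as f:
        f.write(city + "\n")


if __name__ == "__main__":
    out = reset_output_dir(os.getcwd())
    print(run_per_city(["北京", "上海"], demo_worker, out))
```

Each thread writes to its own file, so the workers do not need a lock around file access.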
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup
import re
import os
import time
import datetime
from threading import Thread
def city_request(city, i, headers):
    '''Build the request for page i of the 51job search results for a city.'''
    # Area codes as they appear in the 51job search URL
    city_codes = {
        '徐州': '071100',
        '广州': '030200',
        '北京': '010000',
        '上海': '020000',
        '杭州': '080200',
    }
    # Fail with a clear message instead of a NameError for an unknown city
    if city not in city_codes:
        raise ValueError('unsupported city: ' + city)
    arguments = city_codes[city]
    url = "https://search.51job.com/list/" + arguments + ",000000,0000,00,0,99,%2B,2," + str(i) + '.html?welfare=04'
    request_head = urllib.request.Request(url=url, headers=headers)
    return request_head
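def txt(items, file_name):
    '''Append each item to the TXT file, one item per line.'''
    # Open the file once rather than reopening it for every item;
    # UTF-8 is set explicitly so Chinese text is written the same way on every platform
    with open(file_name, 'a', encoding='utf-8') as f:
        for i in items:
            f.write(str(i) + '\n')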
def analyze_data(data, now_time, city