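General approach: first work out the URL rules that 51job uses for its search pages (it may sound a bit silly 😅, but it really is the most convenient way), then send an HTTP request, receive the response, and parse it correctly.
Code:
First, prepare a class to receive the data (the entity class):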
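package testpoll;

import lombok.Data;

// Entity holding one job posting. Lombok's @Data generates the getters and setters.
// Named Job (not Jobs) so that it matches the type used by the parser and the caller.
@Data
public class Job {
    private Integer jobId;      // auto-increment id
    private String jobName;     // job title
    private String companyName; // company name
    private String workAddr;    // company address
    private String salary;      // salary
    private String pushDate;    // publish date
    private String url;         // link to the job posting
}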
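If Lombok is not available in the project, the same entity can be written by hand. The following is only a minimal sketch: the class name `PlainJob` is hypothetical, just two of the fields are shown, and the `equals`/`hashCode` methods that `@Data` would also generate are left out.

```java
// Hand-written stand-in for a Lombok @Data entity (PlainJob is a hypothetical name).
final class PlainJob {
    private String jobName; // job title
    private String url;     // link to the job posting

    public String getJobName() { return jobName; }
    public void setJobName(String jobName) { this.jobName = jobName; }

    public String getUrl() { return url; }
    public void setUrl(String url) { this.url = url; }

    @Override
    public String toString() {
        return "PlainJob(jobName=" + jobName + ", url=" + url + ")";
    }
}
```

The remaining fields would follow the same getter/setter pattern.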
Next we need a parser that pulls the values we want out of the returned page. The page returns a single JSON object, so it can be parsed directly:
// JSON, JSONArray and JSONObject are assumed to be fastjson's (com.alibaba.fastjson).
public static List<Job> jobParse(String entity) {
    List<Job> data = new ArrayList<>();
    JSONObject jsonObject = JSON.parseObject(entity);
    // The search results are stored under the "engine_search_result" key.
    JSONArray results = jsonObject == null ? null : jsonObject.getJSONArray("engine_search_result");
    if (results == null) {
        return data; // nothing to parse in this response
    }
    for (int i = 0; i < results.size(); i++) {
        JSONObject element = results.getJSONObject(i);
        String title = element.getString("job_name");
        String company = element.getString("company_name");
        String address = element.getString("workarea_text");
        String salary = element.getString("providesalary_text");
        String dates = element.getString("issuedate");
        String href = element.getString("job_href");
        // Skip any record that is missing one of the fields.
        if (StringUtils.isEmpty(title) || StringUtils.isEmpty(company) || StringUtils.isEmpty(address)
                || StringUtils.isEmpty(salary) || StringUtils.isEmpty(dates) || StringUtils.isEmpty(href)) {
            continue;
        }
        Job job = new Job();
        job.setJobName(title);
        job.setCompanyName(company);
        job.setWorkAddr(address);
        job.setSalary(salary);
        job.setPushDate(dates);
        job.setUrl(href);
        data.add(job);
    }
    return data;
}
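The parser drops any record in which one of the six fields is empty. That filtering rule can be tried out without a JSON library; the sketch below uses plain `Map` objects as stand-ins for the parsed elements. The class name `RecordFilter` and its methods are hypothetical and only illustrate the rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Stand-alone illustration of the "skip incomplete records" rule (hypothetical helper).
final class RecordFilter {
    // The six keys the parser reads from each search result.
    static final String[] REQUIRED = {
        "job_name", "company_name", "workarea_text",
        "providesalary_text", "issuedate", "job_href"
    };

    private RecordFilter() {}

    // True only if every required key is present with a non-empty value.
    static boolean isComplete(Map<String, String> record) {
        for (String key : REQUIRED) {
            String value = record.get(key);
            if (value == null || value.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    // Returns the complete records, preserving their original order.
    static List<Map<String, String>> keepComplete(List<Map<String, String>> records) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> record : records) {
            if (isComplete(record)) {
                kept.add(record);
            }
        }
        return kept;
    }
}
```

A posting without salary text, for example, would be dropped entirely rather than stored with a blank salary.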
Finally, the calling code. There are many ways to send an HTTP request; I use hutool's HttpUtil here, but RestTemplate or HttpClient would work just as well.
public static List<Job> getJobs() {
    // Example URLs:
    // https://search.51job.com/list/080200%252C020000%252C070200%252C070300%252C080300,000000,0000,00,9,99,java,2,1.html
    // https://search.51job.com/list/080200%252C020000%252C070200%252C070300%252C080300,000000,0000,00,9,99,java,2,1.html?workyear=01
    int pageCount = 20; // number of result pages to fetch
    List<Job> ans = new ArrayList<>();
    for (int i = 1; i <= pageCount; i++) {
        // The number right before ".html" is used as the page number.
        String url = "https://search.51job.com/list/080200%252C020000%252C070200%252C070300%252C080300,000000,0000,00,9,99,java,2," + i + ".html";
        String s = HttpUtil.get(url);
        ans.addAll(JobUtil.jobParse(s));
    }
    return ans;
}
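The hard-coded URL in the loop can be factored into a small helper. The sketch below rests on an assumption read off the example URLs: the slot holding `java` is the search keyword and the number before `.html` is the page. The class name `JobSearchUrls` is hypothetical, and the meaning of the other path segments is not covered here, so they are copied verbatim. It uses the `URLEncoder.encode(String, Charset)` overload, which needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds 51job search URLs from a keyword and a page number (hypothetical helper).
final class JobSearchUrls {
    // Prefix copied verbatim from the URL used in getJobs().
    private static final String PREFIX =
        "https://search.51job.com/list/080200%252C020000%252C070200%252C070300%252C080300,000000,0000,00,9,99,";

    private JobSearchUrls() {}

    // Assumption: the keyword replaces "java" and the page replaces the last number.
    static String build(String keyword, int page) {
        if (page < 1) {
            throw new IllegalArgumentException("page must be >= 1");
        }
        // Encoded once; whether the site expects a different encoding for
        // non-ASCII keywords is not something the example URLs tell us.
        String encoded = URLEncoder.encode(keyword, StandardCharsets.UTF_8);
        return PREFIX + encoded + ",2," + page + ".html";
    }
}
```

With this in place, the loop body could call `JobSearchUrls.build("java", i)` instead of concatenating the string inline.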
Result:
The resulting data can then be stored in a cache such as Redis, or in a database.