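Hadoop Big Data Case Study 2: Collecting Job-Site Data with HttpClient and Python

Hadoop Big Data Case Study: Job-Site Data Analysis

  1. Hadoop Big Data Case Study 1: Setting Up a Hadoop 2.7.3 Pseudo-Distributed Environment
  2. Hadoop Big Data Case Study 2: Collecting Job-Site Data with HttpClient and Python
  3. Hadoop Big Data Case Study 3: Data Preprocessing with MapReduce
  4. Hadoop Big Data Case Study 4: Data Analysis with Hive
  5. Hadoop Big Data Case Study 5: Basic SSM Setup for Visualization
  6. Hadoop Big Data Case Study 6: Data Visualization (SpringBoot + ECharts)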

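In the big data era, reportedly as much as 99.4% of information goes unused, largely because high-value information cannot be collected in the first place. Extracting useful information from large volumes of data has therefore become one of the key factors in the development of big data, and data collection can be regarded as the cornerstone of the big data industry.
Before writing the collection program, this section briefly introduces the background knowledge that web data collection relies on.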

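The HTTP Request Process

Enter a URL in the browser and the browser displays the page at that URL. Between entering the URL and viewing the page, the browser sends an HTTP request to the server that hosts the site; the request headers carry information about the request. The server receives the request, processes and parses it, and returns an HTTP response that contains the page source among other content. The browser then parses the response and renders the page in its window.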
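The round trip described above can be tried without touching any real site. The following is a minimal sketch using only the Python standard library: a throwaway local server plays the role of the website and `urllib` plays the role of the browser. `PageHandler`, `fetch_once`, the `demo-browser/1.0` user agent and the page content are all hypothetical names chosen for this illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical server side: answers every GET with a tiny HTML page."""

    seen_user_agent = None  # filled in when a request arrives

    def do_GET(self):
        # The server parses the request line and the request headers.
        PageHandler.seen_user_agent = self.headers.get("User-Agent")
        body = b"<html><body>hello</body></html>"
        # The response carries a status line, headers and the page source.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch_once():
    """Start the server on a free local port, send one GET, return the reply."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/index.html"
        request = Request(url, headers={"User-Agent": "demo-browser/1.0"})
        # An empty ProxyHandler makes sure the local request bypasses any proxy.
        opener = build_opener(ProxyHandler({}))
        with opener.open(request, timeout=5) as response:
            html = response.read().decode("utf-8")
            return response.status, response.headers.get_content_type(), html
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, html = fetch_once()
    print(status, content_type)
    print(html)
    print("server saw User-Agent:", PageHandler.seen_user_agent)
```

A real browser adds many more headers (cookie, referer and so on), which is why the crawlers later in this article copy them from the browser's developer tools.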

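Collection Example

Collection Steps

1. Analyze the web page --> decide which data to collect
2. Crawl the data
   1. Identify the request information (url, method, request headers, data)
   2. Import the dependencies (httpclient, hadoop-client)
   3. Write the crawler
      1. Create a POST request and give it the URL
      2. Set the request headers (content-type, user-agent, referer, cookie)
      3. Set the form data sent with the request (first, pn, kd) --> handle the character encoding
      4. Execute the request
      5. Read the result --> the encoding can be set here
3. Save the data to HDFS (I/O streams)
   1. Make sure the HDFS service is running
   2. Create a Hadoop configuration object, conf
   3. Obtain a FileSystem object from the configuration --> the HDFS file system
   4. Read and write files (fs.create / write / FSDataOutputStream; fs.open / read / FSDataInputStream)
   5. Write the crawled data to the target file
   6. Print a status message
4. Verify the result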
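Step 3 above uses the Java FileSystem API. As a rough alternative for quick experiments, the sketch below swaps that API for the `hdfs dfs` command-line client driven from Python. It assumes the `hdfs` command is on the PATH and the HDFS service is running; the function names and the `lagou/25.json` and `/lagou` paths are hypothetical.

```python
import shutil
import subprocess


def build_hdfs_commands(local_path, hdfs_dir):
    """Return the two hdfs CLI invocations used to upload one file."""
    return [
        # Create the target directory (and parents) if it does not exist yet.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # Copy the local file into it; -f overwrites an existing copy.
        ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir],
    ]


def upload_to_hdfs(local_path, hdfs_dir):
    """Run the commands; return False when no hdfs client is on PATH."""
    if shutil.which("hdfs") is None:
        print("hdfs command not found; is Hadoop installed and on PATH?")
        return False
    for command in build_hdfs_commands(local_path, hdfs_dir):
        subprocess.run(command, check=True)  # raises if a command fails
    print(f"uploaded {local_path} to {hdfs_dir}")
    return True


if __name__ == "__main__":
    # Hypothetical paths, chosen only for this illustration.
    upload_to_hdfs("lagou/25.json", "/lagou")
```

Afterwards, `hdfs dfs -ls /lagou` is one way to carry out the verification in step 4.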

Collecting with Python requests

import requests
import time

url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36",
    "referer": "https://www.lagou.com/jobs/list_%E5%A4%A7%E6%95%B0%E6%8D%AE?labelWords=&fromSearch=true&suginput=",
    "origin": "https://www.lagou.com",
    "cookie": "RECOMMEND_TIP=true; user_trace_token=20210118143031-ede97a5b-77bd-4487-a492-d1121f24ff62; LGUID=20210118143031-ff294a5f-41fb-4627-83c4-41343998f128; _ga=GA1.2.1737742795.1610951427; JSESSIONID=ABAAABAABAGABFAF65FCDBA86B0BDB8642A0A207813F54D; WEBTJ-ID=20210418%E4%B8%8A%E5%8D%8810:48:57104857-178e2e195248-0fc970dddb106c-d7e1938-1440000-178e2e1952514; PRE_UTM=; PRE_LAND=https%3A%2F%2Fwww.lagou.com%2F; privacyPolicyPopup=false; _gat=1; LGSID=20210418104907-d83a2861-91e1-45e1-9393-d54197221c2d; PRE_HOST=www.baidu.com; PRE_SITE=https%3A%2F%2Fwww.baidu.com%2Fs%3Ftn%3D02003390%5F42%5Fhao%5Fpg%26ie%3Dutf-8%26wd%3Dlagou; sensorsdata2015session=%7B%7D; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1617717436,1617719456,1617782807,1618714139; _gid=GA1.2.90038644.1618714139; index_location_city=%E5%85%A8%E5%9B%BD; TG-TRACK-CODE=index_search; __lg_stoken__=5daaeb170a87fb743bddb84557aa24c4295e42b2af50f9237b9015a10142c2eb6ff36389cbddaa671b715511101a6f5b07f6329469455ca6b1b231bc2ca6ccfb73489b0a3401; X_HTTP_TOKEN=ab94b9f45077f14e4614178161e2224e9205fea79c; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%2217714300602ae-088bb004b54025-303464-1440000-177143006033a8%22%2C%22first_id%22%3A%22%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E8%87%AA%E7%84%B6%E6%90%9C%E7%B4%A2%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22lagou%22%2C%22%24latest_referrer%22%3A%22https%3A%2F%2Fwww.baidu.com%2Fs%22%2C%22%24os%22%3A%22Windows%22%2C%22%24browser%22%3A%22Chrome%22%2C%22%24browser_version%22%3A%2290.0.4430.72%22%7D%2C%22%24device_id%22%3A%2217714300602ae-088bb004b54025-303464-1440000-177143006033a8%22%7D; LGRID=20210418104925-a283ca0d-154d-4dcd-b18f-3c1eced0c71e; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1618714157; SEARCH_ID=76d0bb86d8dd4ec08baba352d6b41e5d"
}
data = {
    'first': 'true',
    'pn': '1',
    'kd': '大数据'
}
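# Fetch pages 25 to 30, one request per page
for i in range(25, 31):
    data['pn'] = i
    resp = requests.post(url, headers=headers, data=data)
    result = resp.text
    # Save the raw JSON of each page; the lagou/ directory must already exist
    with open(f'lagou/{i}.json', mode='w', encoding='utf-8') as f:
        f.write(result)
    print(result)
    # Pause between requests so the site is not hit too frequently
    time.sleep(5)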
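Each saved file holds one page of raw JSON. The following is a minimal sketch of pulling a few fields out of such a file, assuming the nesting content -> positionResult -> result that appears in the sample response later in this article. `SAMPLE`, `FIELDS` and `extract_positions` are hypothetical names, and the sample is cut down to a single record.

```python
import json

# A cut-down sample with the same nesting as the real response:
# content -> positionResult -> result -> one object per job posting.
SAMPLE = """
{"success": true, "content": {"pageNo": 1, "positionResult": {"resultSize": 1, "result": [
  {"positionName": "大数据开发工程师", "companyShortName": "太平洋寿险",
   "city": "上海", "salary": "30k-50k", "workYear": "5-10年", "education": "本科"}
]}}}
"""

# Fields kept for later analysis; the choice is only an illustration.
FIELDS = ("positionName", "companyShortName", "city", "salary", "workYear", "education")


def extract_positions(raw_text):
    """Return one dict per posting, keeping only the fields in FIELDS."""
    payload = json.loads(raw_text)
    if not payload.get("success"):
        return []  # nothing to read unless the response reports success
    results = payload.get("content", {}).get("positionResult", {}).get("result", [])
    return [{name: item.get(name) for name in FIELDS} for item in results]


if __name__ == "__main__":
    for row in extract_positions(SAMPLE):
        print(row)
```

To apply it to a collected page, read the file with `open(path, encoding='utf-8').read()` and pass the text to `extract_positions`.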

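Collecting with HttpClient

About HttpClient

HttpClient started as a subproject of Apache Jakarta Commons and is now maintained under Apache HttpComponents, the groupId used in the Maven dependency below. It provides an efficient, up-to-date, feature-rich client-side toolkit for the HTTP protocol and supports the latest HTTP versions and recommendations.
Sending a request and receiving a response with HttpClient is simple and usually takes the following steps.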

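  1. Create an HttpClient object.
  2. Create an instance of the request method and specify the request URL: an HttpGet object for a GET request, an HttpPost object for a POST request.
  3. To send request parameters, call setParams(HttpParams params), which HttpGet and HttpPost share (deprecated since HttpClient 4.3); for an HttpPost object you can also call setEntity(HttpEntity entity) to set the request parameters, which is what the code in this article does.
  4. Call execute(HttpUriRequest request) on the HttpClient object to send the request; the method returns an HttpResponse.
  5. Call getAllHeaders(), getHeaders(String name) and similar methods on the HttpResponse to read the response headers; call getEntity() on the HttpResponse to obtain the HttpEntity object that wraps the response body, through which the program reads the content.
  6. Release the connection. The connection must be released whether or not the request succeeds.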
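The same six steps can be traced in Python with the standard library's `http.client`, shown here as a sketch rather than as a replacement for HttpClient. To keep it runnable offline, a hypothetical local `EchoHandler` server stands in for the job site and simply echoes the decoded `kd` field; `post_form` and the `/jobs` path are also illustration-only names.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: echoes the decoded 'kd' form field."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("ascii"))
        body = form["kd"][0].encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def post_form(page_no):
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # 1. Create the client object.
    connection = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    try:
        # 2-3. Prepare a POST request: headers plus a form body whose
        # Chinese keyword is percent-encoded as UTF-8.
        form_body = urlencode({"first": "true", "pn": page_no, "kd": "大数据"})
        headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
        # 4. Send the request.
        connection.request("POST", "/jobs", body=form_body, headers=headers)
        # 5. Read the status, a response header and the body.
        response = connection.getresponse()
        text = response.read().decode("utf-8")
        return form_body, response.status, response.getheader("Content-Type"), text
    finally:
        # 6. Release the connection whether or not the request succeeded.
        connection.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(post_form(1))
```

The `try`/`finally` block plays the role of step 6: the connection is closed even when the request raises an exception.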

Create a Maven project and add the required dependencies

<dependencies>
  <dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.13</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
  </dependency>
</dependencies>

Write the program that collects the data

package org.apache.ssm;

import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class HttpClientData {

    /**
     * @apiNote Collects one page of data and prints it
     * @param pageNo page number
     * @return the collected JSON data
     * @throws IOException if the request fails
     */
    private static String getDataInfo(int pageNo) throws IOException {
        // Request URL
        String url = "https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false";
        // Create a POST request
        HttpPost httpPost = new HttpPost(url);
        // Create a request configuration object (optional timeouts)
        //RequestConfig config = RequestConfig.custom().setConnectTimeout(6000).setSocketTimeout(6000).build();
        // Attach the configuration object to the POST request
        //httpPost.setConfig(config);
        // Set the request headers
        httpPost.setHeader("content-type","application/x-www-form-urlencoded; charset=UTF-8");
        httpPost.setHeader("cookie", "RECOMMEND_TIP=true; user_trace_token=20210118143031-ede97a5b-77bd-4487-a492-d1121f24ff62; LGUID=20210118143031-ff294a5f-41fb-4627-83c4-41343998f128; _ga=GA1.2.1737742795.1610951427; index_location_city=%E5%85%A8%E5%9B%BD; PRE_UTM=; JSESSIONID=ABAAAECABFAACEA074769E774E4533F67C2AEC31A2EB224; WEBTJ-ID=2021053%E4%B8%8A%E5%8D%888:52:05085205-1792fb5fabb0-045862def62319-d7e1739-1440000-1792fb5fabf8; privacyPolicyPopup=false; sensorsdata2015session=%7B%7D; _gat=1; _gid=GA1.2.15169016.1620003128; LGSID=20210503085217-c0e45ee9-0cde-4720-9e37-59aae4fd5a1a; PRE_HOST=www.baidu.com; PRE_SITE=https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3DfdDI0PhZVCLR6zYHlUwMJ%5Fkw6pQxMi35rxLqXhlbrjm%26wd%3D%26eqid%3Dc6804cdc0065b25600000006608f4934; PRE_LAND=https%3A%2F%2Fwww.lagou.com%2F; TG-TRACK-CODE=index_search; __lg_stoken__=e7a594ae90198c288332f2f6cec5a6f7e69a23abffe55b600caff2851b3b43709e395908cc34d9d6fbbdfe086dbca4e2087c5ce1d97ab48221f0ccb8798e1cd72f74c723e7ed; X_MIDDLE_TOKEN=d70348dcc10f405ab9eda4b1e5b3a90f; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1617782807,1618714139,1620003128,1620003254; gate_login_token=c16fac548f075c05be661daf0872e24b4f84c2f34af60b84c8d0fff9b2710b52; LG_LOGIN_USER_ID=da2823d7ae2edbf62c8c4830b7997b738efc9dcecfe97481953af19fe219abec; LG_HAS_LOGIN=1; _putrc=28C8D87FCBCD3FCF123F89F2B170EADC; login=true; unick=%E5%B4%94%E5%A4%A7%E6%B4%AA; showExpriedIndex=1; showExpriedCompanyHome=1; showExpriedMyPublish=1; hasDeliver=15; X_HTTP_TOKEN=ab94b9f45077f14e4133000261e2224e9205fea79c; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%2213988995%22%2C%22first_id%22%3A%2217714300602ae-088bb004b54025-303464-1440000-177143006033a8%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24os%22%3A%22Windows%22%2C%22%24browser%22%3A%22Chrome%22%2C%22%24browser_version%22%3A%2290.0.4430.93%22%2C%22lagou_company_id%22%3A%22%22%7D%2C%22%24device_id%22%3A%2217714300602ae-088bb004b54025-303464-1440000-177143006033a8%22%7D; LGRID=20210503085515-05970c43-fe37-444d-a4ba-8996908770e4; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1620003305; SEARCH_ID=f64ff93afa6a42a3b4f6a2d7d068a532");
        //httpPost.setHeader("origin", "https://www.lagou.com");
        httpPost.setHeader("referer", "https://www.lagou.com/jobs/list_%E5%A4%A7%E6%95%B0%E6%8D%AE?labelWords=&fromSearch=true&suginput=");
        httpPost.setHeader("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36");
        // Assemble the form parameters
        List<BasicNameValuePair> params = new ArrayList<>();
        params.add(new BasicNameValuePair("first", "true"));
        params.add(new BasicNameValuePair("pn",  String.valueOf(pageNo)));
        params.add(new BasicNameValuePair("kd", "大数据"));
        // Bind the parameters to the POST request and set the character encoding explicitly;
        // otherwise the Chinese keyword is not transmitted correctly
        httpPost.setEntity(new UrlEncodedFormEntity(params, StandardCharsets.UTF_8));
        // Create the client and send the request; try-with-resources releases
        // the connection whether or not the request succeeds
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpPost)) {
            // Read the response body as a UTF-8 string
            String result = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);

            System.out.println(result);
            return result;
        }
    }
}

Calling getDataInfo prints the JSON response returned by the server:
{"success":true,"msg":null,"code":0,"content":{"showId":"8954231c9c614fe197218ec738334a5c","hrInfoMap":{"7897241":{"userId":9605519,"portrait":"i/image/M00/6F/DA/Ciqc1F-3WRSAUZqGAAIlhYc8TQI322.png","realName":"张岩","positionName":"招聘经理","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8707302":{"userId":19901470,"portrait":"i/image2/M01/0E/AC/CgotOVyhgdKAWLwZAACHltqNgkc507.png","realName":"史煜","positionName":"猎头赋能导师","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8484354":{"userId":11656210,"portrait":"i/image6/M00/25/5E/Cgp9HWBZnFGAeKpvAADw-VqTakQ772.png","realName":"Dean","positionName":"HR","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8699938":{"userId":16715139,"portrait":"i/image2/M01/0E/AC/CgotOVyhgcmAaD2nAABq7l7a11A980.png","realName":"郭婷","positionName":"HRBP","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8583753":{"userId":19891428,"portrait":"i/image2/M01/0E/AC/CgotOVyhgdWACcZgAABtgMsGk64396.png","realName":"赵双","positionName":"人事主管","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8134768":{"userId":8480918,"portrait":"i/image2/M01/D4/64/CgoB5lxJFQ-AD2tNAAXBbnPw8H0024.jpg","realName":"Nina","positionName":"HRBP","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8702442":{"userId":16678326,"portrait":"i/image2/M01/0E/8C/CgoB5lyhgdiAN-4AAACeGEp-ay0931.png","realName":"廖迎辰","positionName":"人事专员","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8158995":{"userId":6117379,"portrait":"i/image2/M01/8A/67/CgoB5luWCeWAUrasAAFtqT-rOaE098.jpg","realName":"夏~","positionName":"招聘经理","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8032299":{"userId":9641903,"portrait":"i/image/M00/5C/A3/Ciqc1F-Cm4mAY1bnAAB1FSDKz5k681.jpg","realName":"首席人才官","positionName":"","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8520053":{"userId":20159022,"portrait":"i/image2/M01/04/CA/Cip5yF_2aOqABwjGACwniCptX1E061.png","r
ealName":"庄燕敏","positionName":"招聘专员","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"7496713":{"userId":13704818,"portrait":"i/image2/M01/0E/8C/CgoB5lyhgdiAN-4AAACeGEp-ay0931.png","realName":"王萍","positionName":"HR","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8637919":{"userId":21149721,"portrait":"i/image6/M00/2A/F8/Cgp9HWBjL9uAd0YMAAW9EuFm3Xg678.jpg","realName":"吴偲睿","positionName":"HR","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8551839":{"userId":21182638,"portrait":"i/image2/M01/0E/AC/CgotOVyhgcmAaD2nAABq7l7a11A980.png","realName":"魏欣","positionName":"人事","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8688667":{"userId":19669329,"portrait":"i/image6/M01/38/1F/Cgp9HWB47iuAX5e_AAB8iMDFduc432.png","realName":"小郑","positionName":"招聘经理","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":true},"8708507":{"userId":21284995,"portrait":"i/image6/M00/35/A7/CioPOWByl0qAFGDzABMH7ovO3zA168.png","realName":"化琪","positionName":"HR","phone":null,"receiveEmail":null,"userLevel":"G1","canTalk":false}},"pageNo":1,"positionResult":{"resultSize":15,"result":[{"positionId":8699938,"positionName":"大数据开发工程师","companyId":26584,"companyFullName":"中国太平洋人寿保险股份有限公司","companyShortName":"太平洋寿险","companyLogo":"i/image6/M01/03/CD/Cgp9HWAftJeAe4bnAAAcL34g0og084.png","companySize":"2000人以上","industryField":"金融","financeStage":"上市公司","companyLabelList":["年终分红","五险一金","通讯津贴","交通补助"],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"数据开发","skillLables":["数据仓库","数仓架构","数仓工程师"],"positionLables":["数据仓库","数仓架构","数仓工程师"],"industryLables":[],"createTime":"2021-04-27 13:20:15","formatCreateTime":"2021-04-27","city":"上海","district":"徐汇区","businessZones":null,"salary":"30k-50k","salaryMonth":"0","workYear":"5-10年","jobNature":"全职","education":"本科","positionAdvantage":"五险一金","imState":"today","lastLogin":"2021-05-03 
10:05:32","publisherId":16715139,"approve":1,"subwayline":"12号线","stationname":"虹漕路","linestaion":"9号线_桂林路;12号线_虹漕路;12号线_桂林公园;15号线_桂林公园","latitude":"31.172961","longitude":"121.413315","distance":null,"hitags":null,"resumeProcessRate":0,"resumeProcessDay":0,"score":119,"newScore":0.0,"matchScore":12.182264,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"e137c4d28fa8d1d2c433ecd7d5f2902743a8c9795de353df","famousCompany":true,"hunterJob":false,"detailRecall":false},{"positionId":8551839,"positionName":"大数据开发工程师","companyId":123018947,"companyFullName":"中电科数智科技有限公司","companyShortName":"数智科技","companyLogo":"i/image6/M01/3C/C5/Cgp9HWCN9F-AOCnqAABY-485Sy4598.png","companySize":"50-150人","industryField":"网络通信","financeStage":"不需要融资","companyLabelList":[],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"数据开发","skillLables":["数仓架构","Hadoop","Spark","Flink"],"positionLables":["网络通信","软件服务|咨询","数仓架构","Hadoop","Spark","Flink"],"industryLables":["网络通信","软件服务|咨询","数仓架构","Hadoop","Spark","Flink"],"createTime":"2021-05-03 10:52:09","formatCreateTime":"10:52发布","city":"上海","district":"徐汇区","businessZones":null,"salary":"15k-30k","salaryMonth":"0","workYear":"5-10年","jobNature":"全职","education":"本科","positionAdvantage":"央企,福利好,稳定性强","imState":"today","lastLogin":"2021-05-03 
10:46:31","publisherId":21182638,"approve":0,"subwayline":"12号线","stationname":"桂林公园","linestaion":"9号线_桂林路;12号线_桂林公园;15号线_桂林公园","latitude":"31.177892","longitude":"121.414728","distance":null,"hitags":null,"resumeProcessRate":0,"resumeProcessDay":0,"score":99,"newScore":0.0,"matchScore":18.298723,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"6b9416155b8fb488c45a4f74b084e2706b0e2e82a7a3a3264afb80c159e9d967","famousCompany":false,"hunterJob":false,"detailRecall":false},{"positionId":8484354,"positionName":"大数据开发工程师","companyId":41444,"companyFullName":"杭州米雅信息科技有限公司","companyShortName":"米雅","companyLogo":"i/image3/M01/71/29/Cgq2xl5l9I-AWhbVAAEW9ccrwx4035.png","companySize":"150-500人","industryField":"软件服务|咨询,数据服务|咨询","financeStage":"不需要融资","companyLabelList":["年底双薪","绩效奖金","带薪年假","弹性工作"],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"BI工程师","skillLables":["Hadoop","Spark","Flink","Hive"],"positionLables":["软件服务|咨询","数据服务|咨询","Hadoop","Spark","Flink","Hive"],"industryLables":["软件服务|咨询","数据服务|咨询","Hadoop","Spark","Flink","Hive"],"createTime":"2021-05-03 10:46:47","formatCreateTime":"10:46发布","city":"北京","district":"朝阳区","businessZones":null,"salary":"25k-40k","salaryMonth":"15","workYear":"3-5年","jobNature":"全职","education":"本科","positionAdvantage":"晋升快,大牛带","imState":"today","lastLogin":"2021-05-03 
10:46:43","publisherId":11656210,"approve":1,"subwayline":"15号线","stationname":"望京东","linestaion":"15号线_望京东","latitude":"40.003192","longitude":"116.48899","distance":null,"hitags":null,"resumeProcessRate":6,"resumeProcessDay":1,"score":99,"newScore":0.0,"matchScore":18.222744,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"25bde3c651e9fe9f2abff9f25d386ea19bd7e07014674143","famousCompany":false,"hunterJob":false,"detailRecall":false},{"positionId":8707302,"positionName":"大数据开发工程师","companyId":147,"companyFullName":"北京拉勾网络技术有限公司","companyShortName":"拉勾集团","companyLogo":"i/image2/M01/79/70/CgotOV1aS4qAWK6WAAAM4NTpXws809.png","companySize":"500-2000人","industryField":"工具类产品,在线教育","financeStage":"D轮及以上","companyLabelList":["五险一金","弹性工作","带薪年假","免费两餐"],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"BI工程师","skillLables":["ETL","Oracle","MySQL","MongoDB"],"positionLables":["IT技术服务|咨询","软件服务|咨询","ETL","Oracle","MySQL","MongoDB"],"industryLables":["IT技术服务|咨询","软件服务|咨询","ETL","Oracle","MySQL","MongoDB"],"createTime":"2021-04-30 15:38:16","formatCreateTime":"3天前发布","city":"长沙","district":"岳麓区","businessZones":null,"salary":"10k-14k","salaryMonth":"13","workYear":"1-3年","jobNature":"全职","education":"本科","positionAdvantage":"福利好","imState":"threeDays","lastLogin":"2021-04-30 
15:38:13","publisherId":19901470,"approve":1,"subwayline":null,"stationname":null,"linestaion":null,"latitude":"28.235193","longitude":"112.93142","distance":null,"hitags":["免费下午茶","ipo倒计时","bat背景","地铁周边","每天管两餐","定期团建","团队年轻有活力","6险1金"],"resumeProcessRate":33,"resumeProcessDay":2,"score":79,"newScore":0.0,"matchScore":12.182264,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"4e0fbaf16055246644f38012ff06c0ce","famousCompany":true,"hunterJob":false,"detailRecall":false},{"positionId":7496713,"positionName":"大数据开发工程师","companyId":147,"companyFullName":"北京拉勾网络技术有限公司","companyShortName":"拉勾集团","companyLogo":"i/image2/M01/79/70/CgotOV1aS4qAWK6WAAAM4NTpXws809.png","companySize":"500-2000人","industryField":"工具类产品,在线教育","financeStage":"D轮及以上","companyLabelList":["五险一金","弹性工作","带薪年假","免费两餐"],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"BI工程师","skillLables":["Hadoop"],"positionLables":["招聘","Hadoop"],"industryLables":["招聘","Hadoop"],"createTime":"2021-04-28 18:11:48","formatCreateTime":"2021-04-28","city":"北京","district":"海淀区","businessZones":null,"salary":"35k-40k","salaryMonth":"14","workYear":"5-10年","jobNature":"全职","education":"本科","positionAdvantage":"发展空间大,弹性工作制,领导Nice","imState":"threeDays","lastLogin":"2021-04-30 
13:59:53","publisherId":13704818,"approve":1,"subwayline":"10号线","stationname":"中关村","linestaion":"4号线大兴线_海淀黄庄;4号线大兴线_中关村;4号线大兴线_北京大学东门;10号线_苏州街;10号线_海淀黄庄","latitude":"39.982128","longitude":"116.307747","distance":null,"hitags":["免费下午茶","ipo倒计时","bat背景","地铁周边","每天管两餐","定期团建","团队年轻有活力","6险1金"],"resumeProcessRate":100,"resumeProcessDay":1,"score":78,"newScore":0.0,"matchScore":12.182264,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"472749467beba213aa87810a660a0fb5","famousCompany":true,"hunterJob":false,"detailRecall":false},{"positionId":8158995,"positionName":"大数据开发工程师","companyId":139755,"companyFullName":"北京京邦达贸易有限公司","companyShortName":"京东物流","companyLogo":"i/image/M00/43/3E/Ciqc1F87b1SAZVVsAAC2hBgkjDU488.png","companySize":"2000人以上","industryField":"物流|运输","financeStage":"A轮","companyLabelList":["免费班车","餐补","安居计划","福利产假"],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"BI工程师","skillLables":["Java","Flink","Spark","Hadoop"],"positionLables":["Java","Flink","Spark","Hadoop"],"industryLables":[],"createTime":"2021-04-30 16:42:46","formatCreateTime":"3天前发布","city":"北京","district":"大兴区","businessZones":null,"salary":"25k-50k","salaryMonth":"14","workYear":"3-5年","jobNature":"全职","education":"本科","positionAdvantage":"数据量级大,规模大,公司重点项目,发展潜力大","imState":"threeDays","lastLogin":"2021-04-30 
13:37:58","publisherId":6117379,"approve":1,"subwayline":"亦庄线","stationname":"经海路","linestaion":"亦庄线_经海路","latitude":"39.786448","longitude":"116.562817","distance":null,"hitags":null,"resumeProcessRate":7,"resumeProcessDay":3,"score":78,"newScore":0.0,"matchScore":12.182264,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"0956a5d27f16b86c408e056b9647f637389f73d4136a426e","famousCompany":true,"hunterJob":false,"detailRecall":false},{"positionId":8032299,"positionName":"大数据开发工程师","companyId":286431,"companyFullName":"爱客科技(深圳)有限公司","companyShortName":"AfterShip爱客科技","companyLogo":"i/image2/M01/A3/E0/CgotOV2_33OAJMbDAAAoozKUaSQ386.png","companySize":"150-500人","industryField":"软件服务|咨询,营销服务|咨询,数据服务|咨询","financeStage":"B轮","companyLabelList":["持续盈利","国际龙头企业","SaaS平台","极客氛围"],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"数据开发","skillLables":[],"positionLables":["电商","大数据"],"industryLables":["电商","大数据"],"createTime":"2021-05-03 11:05:27","formatCreateTime":"11:05发布","city":"深圳","district":"南山区","businessZones":null,"salary":"18k-35k","salaryMonth":"14","workYear":"5-10年","jobNature":"全职","education":"本科","positionAdvantage":"技术大牛 国际化团队 工具文化","imState":"threeDays","lastLogin":"2021-05-01 
15:05:48","publisherId":9641903,"approve":1,"subwayline":"2号线/蛇口线","stationname":"高新园","linestaion":"1号线/罗宝线_高新园;1号线/罗宝线_深大;2号线/蛇口线_科苑","latitude":"22.535004","longitude":"113.941975","distance":null,"hitags":null,"resumeProcessRate":100,"resumeProcessDay":1,"score":68,"newScore":0.0,"matchScore":17.766857,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"5b2e31524b7f5efa14e2853c8241dfaad41d04d2557453b4","famousCompany":false,"hunterJob":false,"detailRecall":false},{"positionId":8134768,"positionName":"大数据开发工程师","companyId":188444,"companyFullName":"上海源犀信息科技有限公司","companyShortName":"Linkflow","companyLogo":"i/image2/M01/1E/F4/CgoB5ly5pVuAPqQwAAArReCIEQQ684.png","companySize":"50-150人","industryField":"软件服务|咨询","financeStage":"A轮","companyLabelList":["通讯津贴","交通补助","双休","弹性工作"],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"数据开发","skillLables":["Hadoop","Spark","Flink","Scala"],"positionLables":["工具类产品","软件服务|咨询","Hadoop","Spark","Flink","Scala"],"industryLables":["工具类产品","软件服务|咨询","Hadoop","Spark","Flink","Scala"],"createTime":"2021-05-03 10:20:27","formatCreateTime":"10:20发布","city":"上海","district":"徐汇区","businessZones":["徐家汇","漕河泾","龙华"],"salary":"15k-30k","salaryMonth":"0","workYear":"3-5年","jobNature":"全职","education":"本科","positionAdvantage":"发展前景 团队实力","imState":"today","lastLogin":"2021-05-03 
10:17:39","publisherId":8480918,"approve":1,"subwayline":"3号线","stationname":"徐家汇","linestaion":"1号线_上海体育馆;1号线_徐家汇;3号线_宜山路;3号线_漕溪路;4号线_上海体育场;4号线_上海体育馆;9号线_徐家汇;9号线_宜山路;11号线_徐家汇;11号线_上海游泳馆;11号线_上海游泳馆;11号线_徐家汇","latitude":"31.185508","longitude":"121.433472","distance":null,"hitags":null,"resumeProcessRate":89,"resumeProcessDay":1,"score":67,"newScore":0.0,"matchScore":12.650813,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"b968f1944dc3af16c4bea708b2f1e1991e2a90a34c06863d","famousCompany":false,"hunterJob":false,"detailRecall":false},{"positionId":8688667,"positionName":"高级大数据开发工程师","companyId":122086001,"companyFullName":"上海汐果体育用品有限公司","companyShortName":"汐果体育用品","companyLogo":"i/image/M00/64/42/Ciqc1F-X9XGAJHGeAACHI9crV-Q862.jpg","companySize":"50-150人","industryField":"消费生活,电商","financeStage":"A轮","companyLabelList":[],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"数据开发","skillLables":["Hadoop","BI/AI","AWS","hadoopEcosyste"],"positionLables":["软件服务|咨询","酒店 旅游业","Hadoop","BI/AI","AWS","hadoopEcosyste"],"industryLables":["软件服务|咨询","酒店 旅游业","Hadoop","BI/AI","AWS","hadoopEcosyste"],"createTime":"2021-05-03 10:48:05","formatCreateTime":"10:48发布","city":"上海","district":"浦东新区","businessZones":null,"salary":"25k-35k","salaryMonth":"13","workYear":"3-5年","jobNature":"全职","education":"本科","positionAdvantage":"高年终奖绩效奖金股票期权餐补弹性工作制","imState":"today","lastLogin":"2021-05-03 
10:33:53","publisherId":19669329,"approve":1,"subwayline":"2号线","stationname":"上海科技馆","linestaion":"2号线_上海科技馆;2号线_世纪大道;4号线_世纪大道;4号线_浦电路(4号线);4号线_蓝村路;4号线_塘桥;6号线_蓝村路;6号线_浦电路(6号线);6号线_世纪大道;9号线_世纪大道","latitude":"31.216857","longitude":"121.531834","distance":null,"hitags":null,"resumeProcessRate":15,"resumeProcessDay":1,"score":45,"newScore":0.0,"matchScore":8.345834,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"7e6bbd84db5a304dafb00f5a867f83d04bcb69ae8c592c345f5d98e611b45588","famousCompany":false,"hunterJob":false,"detailRecall":false},{"positionId":7897241,"positionName":"大数据开发工程师","companyId":145279,"companyFullName":"浙江创泰科技有限公司","companyShortName":"创泰科技","companyLogo":"i/image/M00/45/AF/CgqCHl9DTkuAD191AABxotsbymQ120.png","companySize":"500-2000人","industryField":"其他,移动互联网","financeStage":"B轮","companyLabelList":[],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"BI工程师","skillLables":["Storm","Hadoop","数据挖掘","ETL"],"positionLables":["大数据","移动互联网","Storm","Hadoop","数据挖掘","ETL"],"industryLables":["大数据","移动互联网","Storm","Hadoop","数据挖掘","ETL"],"createTime":"2021-05-03 07:41:34","formatCreateTime":"07:41发布","city":"杭州","district":"西湖区","businessZones":["翠苑","文一路","高新文教区"],"salary":"20k-30k","salaryMonth":"0","workYear":"5-10年","jobNature":"全职","education":"本科","positionAdvantage":"行业发展好","imState":"today","lastLogin":"2021-05-03 
07:41:15","publisherId":9605519,"approve":1,"subwayline":"2号线","stationname":"丰潭路","linestaion":"2号线_学院路;2号线_古翠路;2号线_丰潭路","latitude":"30.29091","longitude":"120.11761","distance":null,"hitags":null,"resumeProcessRate":14,"resumeProcessDay":0,"score":44,"newScore":0.0,"matchScore":11.473111,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"8f82ebd8bef435efedca98e9ded53cae4beb2290c981e32b","famousCompany":false,"hunterJob":false,"detailRecall":false},{"positionId":8637919,"positionName":"大数据开发工程师","companyId":62,"companyFullName":"北京字节跳动网络技术有限公司","companyShortName":"字节跳动","companyLogo":"i/image2/M01/79/0A/CgoB5ltr2A-AM5SFAADbT9jQCn841.jpeg","companySize":"2000人以上","industryField":"内容资讯,短视频","financeStage":"D轮及以上","companyLabelList":["扁平管理","弹性工作","就近租房补贴","六险一金"],"firstType":"开发|测试|运维类","secondType":"后端开发","thirdType":"其他后端开发","skillLables":[],"positionLables":["后端开发"],"industryLables":["后端开发"],"createTime":"2021-05-03 06:04:56","formatCreateTime":"06:04发布","city":"北京","district":"海淀区","businessZones":null,"salary":"30k-60k","salaryMonth":"0","workYear":"不限","jobNature":"全职","education":"本科","positionAdvantage":"下午茶,健身瑜伽,六险一金,五险一金","imState":"threeDays","lastLogin":"2021-05-01 
12:03:46","publisherId":21149721,"approve":1,"subwayline":"10号线","stationname":"知春路","linestaion":"4号线大兴线_人民大学;4号线大兴线_海淀黄庄;10号线_海淀黄庄;10号线_知春里;10号线_知春路;13号线_知春路","latitude":"39.971819","longitude":"116.328708","distance":null,"hitags":null,"resumeProcessRate":6,"resumeProcessDay":1,"score":37,"newScore":0.0,"matchScore":10.763955,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"e3bd8c773f89b6c7fa6bc5b5f4fc5976","famousCompany":true,"hunterJob":false,"detailRecall":false},{"positionId":8708507,"positionName":"大数据开发工程师","companyId":62,"companyFullName":"北京字节跳动网络技术有限公司","companyShortName":"字节跳动","companyLogo":"i/image2/M01/79/0A/CgoB5ltr2A-AM5SFAADbT9jQCn841.jpeg","companySize":"2000人以上","industryField":"内容资讯,短视频","financeStage":"D轮及以上","companyLabelList":["扁平管理","弹性工作","就近租房补贴","六险一金"],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"其他数据开发","skillLables":["数据挖掘"],"positionLables":["数据挖掘"],"industryLables":[],"createTime":"2021-05-03 05:35:30","formatCreateTime":"05:35发布","city":"北京","district":"海淀区","businessZones":null,"salary":"25k-50k","salaryMonth":"0","workYear":"不限","jobNature":"全职","education":"本科","positionAdvantage":"下午茶,健身瑜伽,六险一金,五险一金","imState":"disabled","lastLogin":"2021-04-30 
18:28:01","publisherId":21284995,"approve":1,"subwayline":"10号线","stationname":"知春路","linestaion":"10号线_知春里;10号线_知春路;13号线_知春路","latitude":"39.978524","longitude":"116.336512","distance":null,"hitags":null,"resumeProcessRate":0,"resumeProcessDay":0,"score":34,"newScore":0.0,"matchScore":10.574003,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"303215088a8e0346e107c0485fc838d0","famousCompany":true,"hunterJob":false,"detailRecall":false},{"positionId":8702442,"positionName":"大数据开发工程师","companyId":159475,"companyFullName":"广东柯内特环境科技有限公司","companyShortName":"柯内特","companyLogo":"i/image/M00/22/8B/CgpEMlkRJ0-AbWdGAABfJ3HlfU0247.png","companySize":"50-150人","industryField":"数据服务","financeStage":"不需要融资","companyLabelList":["弹性工作","节日礼物","扁平管理","技能培训"],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"BI工程师","skillLables":["ETL","数据仓库","数仓架构"],"positionLables":["物联网","数据服务|咨询","ETL","数据仓库","数仓架构"],"industryLables":["物联网","数据服务|咨询","ETL","数据仓库","数仓架构"],"createTime":"2021-05-02 22:51:59","formatCreateTime":"1天前发布","city":"广州","district":"天河区","businessZones":null,"salary":"8k-15k","salaryMonth":"13","workYear":"1-3年","jobNature":"全职","education":"本科","positionAdvantage":"双休、五险一金","imState":"today","lastLogin":"2021-05-02 
22:48:16","publisherId":16678326,"approve":1,"subwayline":"21号线","stationname":"天河智慧城","linestaion":"21号线_天河智慧城;21号线_天河智慧城","latitude":"23.168126","longitude":"113.413857","distance":null,"hitags":null,"resumeProcessRate":18,"resumeProcessDay":1,"score":33,"newScore":0.0,"matchScore":8.383222,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"0e62e36212784bb661629768604d58d733cdbc818be5e5c3","famousCompany":false,"hunterJob":false,"detailRecall":false},{"positionId":8520053,"positionName":"大数据开发工程师","companyId":117217,"companyFullName":"深圳市博悦科创科技有限公司","companyShortName":"博悦科创","companyLogo":"i/image/M00/10/8F/CgqKkVbf8TKACeAMAAAop3DnaW4486.png","companySize":"500-2000人","industryField":"IT技术服务|咨询","financeStage":"不需要融资","companyLabelList":["年底双薪","定期体检","绩效奖金","技能培训"],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"BI工程师","skillLables":["Hadoop","Linux","python","SQL"],"positionLables":["Hadoop","Linux","python","SQL"],"industryLables":[],"createTime":"2021-05-02 22:30:20","formatCreateTime":"1天前发布","city":"深圳","district":"福田区","businessZones":null,"salary":"12k-20k","salaryMonth":"0","workYear":"1-3年","jobNature":"全职","education":"本科","positionAdvantage":"六险一金 年终奖 年度体检 双休","imState":"today","lastLogin":"2021-05-02 
23:00:03","publisherId":20159022,"approve":1,"subwayline":"9号线","stationname":"莲花北","linestaion":"3号线/龙岗线_少年宫;3号线/龙岗线_莲花村;4号线/龙华线_少年宫;4号线/龙华线_莲花北;9号线_孖岭","latitude":"22.556923","longitude":"114.070226","distance":null,"hitags":null,"resumeProcessRate":85,"resumeProcessDay":1,"score":32,"newScore":0.0,"matchScore":8.294578,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"9a6b9c9a4cac57221c95c5c1e1eaa7c26daf649e3f64c236","famousCompany":false,"hunterJob":false,"detailRecall":false},{"positionId":8583753,"positionName":"大数据开发工程师","companyId":752543,"companyFullName":"北京亚格斯科技发展有限公司","companyShortName":"北京亚格斯科技发展有...","companyLogo":"i/image2/M01/6E/2C/CgotOV1CoB6AU_RDAAAg0fEUlug289.png","companySize":"50-150人","industryField":"软件开发","financeStage":"B轮","companyLabelList":[],"firstType":"开发|测试|运维类","secondType":"数据开发","thirdType":"BI工程师","skillLables":["数据仓库"],"positionLables":["金融产品","数据仓库"],"industryLables":["金融产品","数据仓库"],"createTime":"2021-05-02 18:49:02","formatCreateTime":"1天前发布","city":"北京","district":"东城区","businessZones":null,"salary":"12k-20k","salaryMonth":"0","workYear":"3-5年","jobNature":"全职","education":"本科","positionAdvantage":"彰显你的优秀","imState":"today","lastLogin":"2021-05-02 
18:48:57","publisherId":19891428,"approve":1,"subwayline":"2号线","stationname":"东单","linestaion":"1号线_王府井;1号线_东单;1号线_建国门;2号线_建国门;2号线_北京站;2号线_崇文门;5号线_崇文门;5号线_东单;5号线_灯市口","latitude":"39.907493","longitude":"116.421787","distance":null,"hitags":null,"resumeProcessRate":3,"resumeProcessDay":0,"score":29,"newScore":0.0,"matchScore":7.3954706,"matchScoreExplain":null,"query":null,"explain":null,"isSchoolJob":0,"adWord":0,"plus":null,"pcShow":0,"appShow":0,"deliver":0,"gradeDescription":null,"promotionScoreExplain":null,"isHotHire":0,"count":0,"aggregatePositionIds":[],"reCallType":null,"userExpectId":-1,"userExpectText":"","promotionType":null,"is51Job":false,"expectJobId":null,"encryptId":"e4904e081d9eed16486280d3dd31266987224837839f7fac","famousCompany":false,"hunterJob":false,"detailRecall":false}],"locationInfo":{"city":null,"district":null,"businessZone":null,"isAllhotBusinessZone":false,"locationCode":null,"queryByGisCode":false},"queryAnalysisInfo":{"positionName":"大数据","positionNames":["大数据"],"companyName":null,"industryName":null,"usefulCompany":false,"jobNature":null},"strategyProperty":{"name":"dm-csearch-personalPositionLayeredStrategyNew","id":0},"hotLabels":null,"hiTags":null,"benefitTags":null,"industryField":null,"companySize":null,"positionName":null,"totalCount":4633,"triggerOrSearch":false,"categoryTypeAndName":{"3":"大数据"}},"pageSize":15},"resubmitToken":null,"requestId":null}

Writing a program to save the collected data to HDFS

package org.apache.ssm;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.UUID;

public class HttpClientData {

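    /**
     * @apiNote Saves one collected page of data to HDFS.
     * @param result the JSON response body returned by the collection request
     * @throws IOException if the HDFS connection or the write fails
     * @throws InterruptedException if the thread is interrupted while connecting to HDFS
     */
    private static void saveDataInfoToHDFS(String result) throws IOException, InterruptedException {
        // Create the Hadoop configuration object
        Configuration conf = new Configuration();
        // Get the HDFS file system object from the configuration, connecting as user "root"
        FileSystem fs = FileSystem.get(URI.create("hdfs://node:9000"), conf, "root");
        // Build the HDFS directory for today's data, e.g. /lagou/20210503.
        // The explicit mkdirs call is left commented out because fs.create()
        // creates missing parent directories itself.
        Path filePath = new Path("/lagou/" + LocalDate.now().format(DateTimeFormatter.ofPattern("yyyyMMdd")));
        //if (!fs.exists(filePath)) {
        //   fs.mkdirs(filePath);
        //}
        // Use a UUID as the file name so each page gets its own file
        String fileName = UUID.randomUUID().toString().concat(".json");
        // Open an HDFS output stream for the new file
        FSDataOutputStream fsDataOutputStream = fs.create(new Path(filePath, fileName));
        // Copy the UTF-8 encoded data into the HDFS file with the Hadoop IO utility;
        // the final argument (true) closes both streams when the copy finishes
        IOUtils.copyBytes(new ByteArrayInputStream(result.getBytes(StandardCharsets.UTF_8)), fsDataOutputStream, conf, true);
        fs.close();
        System.out.println("Saved successfully~");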
    }
}
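
The naming scheme used by saveDataInfoToHDFS can be tried out without a running cluster. The following is a minimal sketch using only the JDK; the class name LagouPathSketch and its two helper methods are hypothetical and exist only for illustration. It shows how the dated directory and the UUID-based file name are put together.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.UUID;

// Hypothetical helper class: reproduces only the path/file naming logic,
// it does not talk to HDFS.
public class LagouPathSketch {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // One directory per collection day, e.g. /lagou/20210503
    public static String dailyDir(LocalDate date) {
        return "/lagou/" + date.format(DAY);
    }

    // A random UUID keeps file names unique across pages and runs
    public static String newFileName() {
        return UUID.randomUUID().toString().concat(".json");
    }

    public static void main(String[] args) {
        // The directory part is deterministic; the file name differs on every call
        System.out.println(dailyDir(LocalDate.of(2021, 5, 3)) + "/" + newFileName());
    }
}
```

Because the file name is random, running the collector twice on the same day adds new files to the same directory rather than overwriting the earlier ones.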


Collecting and saving all the data

public static void main(String[] args) throws Exception {

        for (int i = 1; i <= 30; i++) {
            // Collect one page of data
            String result = getDataInfo(i);
            // Save the collected result to HDFS
            saveDataInfoToHDFS(result);
            // Sleep for 1000 milliseconds between requests
            Thread.sleep(1000);
        }
    }
}
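
In the loop above, an exception on any page ends the whole run, because main simply rethrows it. One possible variation, shown here only as a sketch, records the failed page numbers and carries on. The class PagedCollectSketch, the method collectPages, and the fetch and save parameters are all assumptions standing in for getDataInfo and saveDataInfoToHDFS; they are not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

// Hypothetical sketch: the same page loop, but a failed page is recorded
// instead of aborting the whole run.
public class PagedCollectSketch {

    public static List<Integer> collectPages(int lastPage, IntFunction<String> fetch,
                                             Consumer<String> save, long delayMillis) {
        List<Integer> failed = new ArrayList<>();
        for (int page = 1; page <= lastPage; page++) {
            try {
                // Fetch one page and hand the result to the save step
                save.accept(fetch.apply(page));
            } catch (RuntimeException e) {
                // Remember the page so it can be retried later
                failed.add(page);
            }
            try {
                // Pause between requests, as the original loop does
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop collecting
                Thread.currentThread().interrupt();
                break;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> saved = new ArrayList<>();
        // Stand-in fetcher that fails on page 2 to exercise the error path
        List<Integer> failed = collectPages(3, page -> {
            if (page == 2) {
                throw new IllegalStateException("simulated failure");
            }
            return "{\"page\":" + page + "}";
        }, saved::add, 10L);
        System.out.println("saved=" + saved.size() + " failed=" + failed); // saved=2 failed=[2]
    }
}
```

In a real run, the fetch and save arguments would wrap the two methods of HttpClientData and convert their checked exceptions into unchecked ones.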


[root@node ~]# hdfs dfs -count /lagou/20210503
           1           30             973903 /lagou/20210503
[root@node ~]# hdfs dfs -get /lagou/20210503
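
The hdfs dfs -count output lists, in order, the directory count, the file count, the content size in bytes, and the path. The 30 files therefore match the 30 pages collected by the loop. As a small sketch of how this check could be automated, the hypothetical class HdfsCountSketch below parses one such line; it works on the text only and does not call Hadoop.

```java
// Hypothetical helper: parses one line of `hdfs dfs -count` output
// (DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME) into its three numbers.
public class HdfsCountSketch {

    public static long[] parseCounts(String line) {
        // Columns are separated by runs of whitespace
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 4) {
            throw new IllegalArgumentException("unexpected -count line: " + line);
        }
        return new long[] {
            Long.parseLong(parts[0]), // number of directories
            Long.parseLong(parts[1]), // number of files
            Long.parseLong(parts[2])  // total size in bytes
        };
    }

    public static void main(String[] args) {
        long[] c = parseCounts("           1           30             973903 /lagou/20210503");
        // Expect one file per collected page
        System.out.println("files=" + c[1] + " bytes=" + c[2]); // files=30 bytes=973903
    }
}
```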

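The next chapter uses IDEA to develop a MapReduce program that preprocesses the collected data, extracting the required dimensions and saving them to HDFS for the later Hive analysis.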
