Baidu API calls, downloading a file from a URL, decompression, and parsing

chuiqueyu5654

Tags: java, json

Original link: https://my.oschina.net/u/3110937/blog/1588281

I. Calling the Baidu API (SEM operations)

1. Calling the Baidu API (getAllObjects, getFileStatus, getFilePath, etc.)

The reference documentation for calling the Baidu API is at:

https://cloud.baidu.com/doc/SEM/guanliAPI.html#.E6.9C.8D.E5.8A.A1.E8.AE.BF.E9.97.AE.E5.9C.B0.E5.9D.80

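Calling the Baidu API requires an account. There are two kinds: manager accounts (账户管家, which can manage sub-accounts) and ordinary accounts (used to call specific APIs). Calls are subject to a quota, counted per account (80,000 calls per week).

Check each API's requestParams and responseResult in the documentation, then make the call and parse the response.

Below is a utility class for calling the Baidu API. It needs the manager account's credentials (username, password, token), the target API account, and the request parameters.

The response is parsed as JSON. The class also sets the character encoding.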

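import java.io.IOException;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.http.converter.StringHttpMessageConverter;
import org.springframework.web.client.RestTemplate;

import com.alibaba.fastjson.JSONObject;

// Header, JacksonUtil, LoggerUtil and Result are project classes. The SEM_API_* names
// are the SemConstant fields shown further down and need a static import of SemConstant.
public class BaiDuApiUtils {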

    public static Result<String> postForEntity(String username, String password, String token, String targetUserName, String url, Object body) throws Exception {
        Result<String> result = new Result<>();
        try {
            Map<String, Object> requestMap = new HashMap<>();
            Header header = new Header(username, password, token,targetUserName);
            requestMap.put(SEM_API_REQ_RESP_HEADER, header);
            requestMap.put(SEM_API_REQ_RESP_BODY, body);
            String jsonStr = JacksonUtil.obj2Str(requestMap);

            HttpHeaders httpHeaders = new HttpHeaders();
            httpHeaders.setContentType(MediaType.APPLICATION_JSON);

            HttpEntity<String> httpEntity = new HttpEntity<>(jsonStr, httpHeaders);

            RestTemplate restTemplate = restTemplate();

            LoggerUtil.info("BaiDuApiUtils request params:{}", jsonStr);

            ResponseEntity<String> responseEntity = restTemplate.postForEntity(url, httpEntity, String.class);
            Thread.sleep(2000);
            HttpStatus httpStatus = responseEntity.getStatusCode();
            if (!httpStatus.equals(HttpStatus.OK)) {
                LoggerUtil.error("BaiDuApiUtils error.request params:{}",jsonStr);
                result.setMessage("BaiDuApiUtils api request error params:"+jsonStr);
                result.setCode(Result.Code.ERROR);

                return result;
            }

            String bodyStr = responseEntity.getBody();
            LoggerUtil.info("baidu sem api response:" + bodyStr);

            JSONObject jsonObject = JSONObject.parseObject(bodyStr);


            Object headerObj = jsonObject.get(SEM_API_REQ_RESP_HEADER);
            JSONObject headerValue = JSONObject.parseObject(headerObj.toString());
            // new Integer(String) is deprecated; Integer.valueOf parses the same value
            Integer status = Integer.valueOf(headerValue.get(SEM_API_RESPONSE_STATUS).toString());

            String desc = headerValue.get(SEM_API_RESPONSE_DESC).toString();
            // Treated as a failure only when desc is not "success" and status is non-zero
            if (!SEM_API_RESPONSE_SUCCESS.equals(desc) && 0 != status) {
                String failures = headerValue.get(SEM_API_RESPONSE_FAILURES).toString();
                result.setMessage(failures);
                result.setCode(Result.Code.ERROR);
                LoggerUtil.error("BaiDuApiUtils error.request params:{},account:{}", jsonStr,targetUserName);
                return result;
            }

            result.setData(jsonObject.get(SEM_API_REQ_RESP_BODY).toString());
            return result;

        } catch (IOException e) {
            LoggerUtil.error("BaiDuApiUtils error.params:username-{},token-{},url-{},body-{}", username, token, url, body);
            throw e;
        }
    }

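    /**
     * The default constructor of StringHttpMessageConverter uses "ISO-8859-1" as the
     * default charset, so a UTF-8 converter is registered explicitly here.
     *
     * @return a RestTemplate with a UTF-8 StringHttpMessageConverter registered
     */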
    private static RestTemplate restTemplate() {
        StringHttpMessageConverter stringHttpMessageConverter = new StringHttpMessageConverter(Charset.forName("utf8"));
        return new RestTemplateBuilder().additionalMessageConverters(stringHttpMessageConverter).build();
    }


}
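
As a rough illustration of what postForEntity sends, the request is an envelope with a "header" entry (credentials) and a "body" entry (the API-specific parameters). The sketch below builds that shape with plain maps. The field names inside the header are assumptions made for illustration, since the project's Header class is not shown, and SemEnvelopeSketch is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SemEnvelopeSketch {

    /** Wraps credentials and a request body into a header/body envelope. */
    static Map<String, Object> buildEnvelope(String username, String password,
                                             String token, String target, Object body) {
        // Assumed header field names; the real ones come from the Header class and the API docs.
        Map<String, Object> header = new LinkedHashMap<>();
        header.put("username", username);
        header.put("password", password);
        header.put("token", token);
        header.put("target", target);

        Map<String, Object> envelope = new LinkedHashMap<>();
        envelope.put("header", header); // same key as SEM_API_REQ_RESP_HEADER
        envelope.put("body", body);     // same key as SEM_API_REQ_RESP_BODY
        return envelope;
    }

    public static void main(String[] args) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("fileId", "example-file-id"); // placeholder value
        System.out.println(buildEnvelope("manager", "secret", "token-value", "sub-account", body));
    }
}
```

In the real utility class this map is serialized to JSON before being posted. The response comes back in the same header/body shape, which is why the code reads "status" and "desc" from the header before touching the body.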

The constants class it uses:

public class SemConstant {
    /************************************ SEM Baidu API constants: begin **************************************************/
    //Common
    public static final String SEM_API_REQ_RESP_BODY = "body";
    public static final String SEM_API_REQ_RESP_HEADER = "header";
    public static final String SEM_API_RESPONSE_STATUS = "status";
    public static final String SEM_API_RESPONSE_DESC = "desc";
    public static final String SEM_API_RESPONSE_SUCCESS = "success";
    public static final String SEM_API_RESPONSE_FAILURES = "failures";
    public static final String SEM_API_RESPONSE_DATA = "data";

    //ReportService
    public static final String REPORTSERVICE_NAME = "ReportService";
    public static final String REPORTSERVICE_GETREALTIMEDATA_NAME = "getRealTimeData";
    public static final String REPORTSERVICE_GETREALTIMEDATA_REALTIMEREQUESTTYPE = "realTimeRequestType";

    //BulkJobService
    public static final String BULKJOBSERVICE_NAME = "BulkJobService";
    public static final String BULKJOBSERVICE_GETALLOBJECTS_NAME = "getAllObjects";
    public static final String BULKJOBSERVICE_GETFILESTATUS_NAME = "getFileStatus";
    public static final String BULKJOBSERVICE_GETFILEPATH_NAME = "getFilePath";
    public static final String BULKJOBSERVICE_GETALLOBJECTS_FILEID = "fileId";
    public static final String BULKJOBSERVICE_GETFILESTATUS_ISGENERATED= "isGenerated";
    public static final String BULKJOBSERVICE_GETFILEPATH_KEYWORDFILEPATH = "keywordFilePath";
    public static final String[] BULKJOBSERVICE_KEYWORDFIELDS= {"campaignId", "adgroupId", "keywordId", "price", "pcDestinationUrl",
			"mobileDestinationUrl", "matchType", "pause", "status", "phraseType"};
    

    /************************************ SEM constants: begin **************************************************/
    public static final String SEM_REPORT_TYPE_MAIN = "main";
    public static final String SEM_REPORT_TYPE_DETAIL = "detail";
    public static final String REFFERAL_TYPE_SEM= "refferal_type_103";
    public static final String SEM_KEYWORD_MATCHTYPE_PREFIX= "sem_keyword_matchType_";
    public static final String SEM_KEYWORD_PHRASETYPE_PREFIX= "sem_keyword_phraseType";
    public static final String KEYWORD_STATUS_PREFIX= "keyword_status_";
    public static final String KEYWORD_PAUSE_FALSE= "false";
    public static final String KEYWORD_PAUSE_TRUE= "true";
    public static final String SEM_KEYWORD_FILE_GZIP_TYPE =".gz";
    public static final String SEM_CREATE_UPDATE_USER ="System";
}

The Result class it uses:

@SuppressWarnings("serial")
public class Result<T> implements Serializable {
	@SuppressWarnings({ "unchecked", "rawtypes" })
	public static final Result SUCCESS = new Result(Code.SUCCESS, null, null);

	@SuppressWarnings({ "rawtypes"})
	public static boolean isSuccessResult( Result r ){
		if( r == null ){
			return false;
		}
		return r.isSuccess();
	}
	
	public enum Code {
		SUCCESS, ERROR
	}

	private Code code = Code.SUCCESS;
	private String message;
	private T data;

	public Result() {
	}

	public Result(Code code, String message, T data) {
		this.code = code;
		this.message = message;
		this.data = data;
	}

	public Result(T data) {
		this.code = Code.SUCCESS;
		this.data = data;
	}

    public static <T> Result<T> newSuccess(T data) {
        return new Result<>(Code.SUCCESS, "", data);
    }

	@JsonIgnore
	@JSONField(serialize = false)
	public boolean isSuccess(){
		return Code.SUCCESS.equals(this.code);
	}
	
	public Code getCode() {
		return code;
	}

	public void setCode(Code code) {
		this.code = code;
	}

	public String getMessage() {
		return message;
	}

	public void setMessage(String message) {
		this.message = message;
	}

	public T getData() {
		return data;
	}

	public void setData(T data) {
		this.data = data;
	}

	public void doErrorHandle(String message) {
		this.code = Code.ERROR;
		this.message = message;
	}
}

For example, calling getAllObjects (which returns file IDs):

	private List<String> getFileId(String account) throws Exception {
		try {
			// File IDs parsed from the response
			List<String> fileIdList = new ArrayList<String>();
			// Look up the Baidu API URL
			String getAllObjectsUrl = semApiMapper.selectUrlByServiceAndInterface(SemConstant.BULKJOBSERVICE_NAME,
					SemConstant.BULKJOBSERVICE_GETALLOBJECTS_NAME);
			GetAllObjectsRequest getAllObjectsRequest = new GetAllObjectsRequest();
			getAllObjectsRequest.setKeywordFields(SemConstant.BULKJOBSERVICE_KEYWORDFIELDS);

			Result<String> result = BaiDuApiUtils.postForEntity(semApiMccConfig.getUsername(), semApiMccConfig.getPassword(),
					semApiMccConfig.getToken(), account, getAllObjectsUrl, getAllObjectsRequest);

			// The account has no data
			if (result.getCode().equals(Result.Code.ERROR)){
				return fileIdList;
			}
			
			JSONObject jsonContent = JSONObject.fromString(result.getData());
			String data = jsonContent.getString(SemConstant.SEM_API_RESPONSE_DATA);

			JSONArray jsonArray = new JSONArray(data);
			for (int i = 0; i < jsonArray.length(); i++) {
				JSONObject jsonObject = (JSONObject) jsonArray.get(i);
				String fileId = jsonObject.getString(SemConstant.BULKJOBSERVICE_GETALLOBJECTS_FILEID);
				fileIdList.add(fileId);
			}
			return fileIdList;
		} catch (Exception e) {
			LoggerUtil.error("BaiDuApiService getFileId Exception: {}", e);
			throw e;
		}
	}

Calling getFileStatus (which returns an int status code):

	private boolean getIsGenerated(List<String> fileIdList, String account) throws Exception {
		try {
			boolean hasGenerated = true;
			String getFileStatusUrl = semApiMapper.selectUrlByServiceAndInterface(SemConstant.BULKJOBSERVICE_NAME,
					SemConstant.BULKJOBSERVICE_GETFILESTATUS_NAME);
			for (String fileId : fileIdList) {
				Map<String, String> bodyMap = new HashMap<String, String>();
				bodyMap.put(SemConstant.BULKJOBSERVICE_GETALLOBJECTS_FILEID, fileId);

				Result<String> result = BaiDuApiUtils.postForEntity(semApiMccConfig.getUsername(),
						semApiMccConfig.getPassword(), semApiMccConfig.getToken(), account, getFileStatusUrl, bodyMap);

				// The account has no data
				if (result.getCode().equals(Result.Code.ERROR)){
					break;
				}
				
				JSONObject jsonContent = JSONObject.fromString(result.getData());
				String data = jsonContent.getString(SemConstant.SEM_API_RESPONSE_DATA);
				JSONArray jsonArray = new JSONArray(data);
				JSONObject jsonObject = (JSONObject) jsonArray.get(0);
				int isGenerated = jsonObject.getInt(SemConstant.BULKJOBSERVICE_GETFILESTATUS_ISGENERATED);
				// This code treats status 3 as "file generated"; any other value means not ready
				if (3 != isGenerated) {
					hasGenerated = false;
					break;
				}
			}
			return hasGenerated;
		} catch (Exception e) {
			LoggerUtil.error("BaiDuApiService getIsGenerated Exception: {}", e);
			throw e;
		}
	}
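
Because getIsGenerated only reports the state at one moment, a caller usually has to ask again until the file is ready. A minimal polling sketch, assuming the status check can be passed in as an IntSupplier, might look like this. FileStatusPollingSketch, waitUntilGenerated, and the attempt and sleep parameters are hypothetical names introduced here, not part of the original project.

```java
import java.util.function.IntSupplier;

class FileStatusPollingSketch {

    /** Status value the code above treats as "file generated". */
    static final int GENERATED = 3;

    /**
     * Polls statusSupplier until it reports GENERATED or maxAttempts is used up.
     * Returns true if the file was reported as generated.
     */
    static boolean waitUntilGenerated(IntSupplier statusSupplier, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (statusSupplier.getAsInt() == GENERATED) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis); // wait before asking again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop polling
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for a real getFileStatus call: reports 1, then 2, then 3.
        int[] statuses = {1, 2, 3};
        int[] index = {0};
        boolean ready = waitUntilGenerated(() -> statuses[index[0]++], 5, 10L);
        System.out.println("generated: " + ready);
    }
}
```

Each poll is an API call and counts against the account quota, so the attempt limit and the sleep interval are worth choosing conservatively.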

getFilePath(): returns the file path

	private String getFilePath(String fileId, String account) throws Exception {
		try {
			String getFilePathUrl = semApiMapper.selectUrlByServiceAndInterface(SemConstant.BULKJOBSERVICE_NAME,
					SemConstant.BULKJOBSERVICE_GETFILEPATH_NAME);
			Map<String, String> bodyMap = new HashMap<String, String>();
			bodyMap.put(SemConstant.BULKJOBSERVICE_GETALLOBJECTS_FILEID, fileId);
			Result<String> result = BaiDuApiUtils.postForEntity(semApiMccConfig.getUsername(), semApiMccConfig.getPassword(),
					semApiMccConfig.getToken(), account, getFilePathUrl, bodyMap);

			// The account has no data
			if (result.getCode().equals(Result.Code.ERROR)){
				return Constant.TOOL_TIP_SQJE;
			}
			
			JSONObject jsonContent = JSONObject.fromString(result.getData());
			String data = jsonContent.getString(SemConstant.SEM_API_RESPONSE_DATA);
			JSONArray jsonArray = new JSONArray(data);
			JSONObject jsonObject = (JSONObject) jsonArray.get(0);
			String keywordFilePath = jsonObject.getString(SemConstant.BULKJOBSERVICE_GETFILEPATH_KEYWORDFILEPATH);

			return keywordFilePath;
		} catch (Exception e) {
			LoggerUtil.error("BaiDuApiService getFilePath Exception: {}", e);
			throw e;
		}
	}

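II. Downloading the file to local disk from the path returned by the Baidu API (a gzip-compressed file)

File download (note that once the project is packaged as a jar, the folder path changes and the folder may not exist):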

    /**
     * @param fileUrl remote URL
     * @param fileLocal local path
     * @throws Exception if the HTTP request fails or the file cannot be written
     */
    public static void downloadFile(String fileUrl, String fileLocal) throws Exception {
        URL url = new URL(fileUrl);
        HttpURLConnection urlCon = (HttpURLConnection) url.openConnection();
        urlCon.setConnectTimeout(60000);
        urlCon.setReadTimeout(60000);
        int code = urlCon.getResponseCode();
        if (code != HttpURLConnection.HTTP_OK) {
            throw new Exception("Failed to read file, HTTP status " + code);
        }

        // Make sure the parent directory exists; FileOutputStream creates the file itself
        File file = new File(fileLocal);
        File parent = file.getParentFile();
        if (parent != null && !parent.exists()) {
            parent.mkdirs();
        }

        // try-with-resources closes both streams even if the copy fails
        try (InputStream in = urlCon.getInputStream();
             OutputStream out = new FileOutputStream(file)) {
            byte[] buffer = new byte[2048];
            int count;
            // read returns -1 at end of stream
            while ((count = in.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
        } finally {
            urlCon.disconnect();
        }
    }

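Calling the download from within the project (mind the download path: decide the target file name before downloading so that the file can be written):

	private boolean remoteFileHandle(String filePaths, String fileName) throws Exception {
		try {
			long currentTimeMillisStart = System.currentTimeMillis();
			// Download to a local file. Note: getResource("/static/").getPath() only yields a
			// writable directory when the classes run unpacked, not from inside a jar.
			String fileLocal = this.getClass().getResource("/static/").getPath() + fileName + SemConstant.SEM_KEYWORD_FILE_GZIP_TYPE;
			FileUtils.downloadFile(filePaths, fileLocal);
			long currentTimeMillisEnd = System.currentTimeMillis();
			// true means the download took longer than one hour (3,600,000 ms)
			return currentTimeMillisEnd - currentTimeMillisStart > 3600000;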
		} catch (Exception e) {
			LoggerUtil.error("BaiDuApiService remoteFileHandle Exception: {}", e);
			throw e;
		}
	}
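
One way to sidestep the jar path problem is to write downloads into a directory outside the classpath and create it on demand. The sketch below assumes the JVM temp directory is an acceptable location and uses java.nio's Files.copy for the transfer. DownloadToWorkDirSketch and its method names are hypothetical, and a local temp file stands in for the remote URL so the example runs offline.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class DownloadToWorkDirSketch {

    /** Resolves (and creates if needed) a writable directory outside the jar. */
    static Path workDir(String dirName) {
        try {
            // Assumption: the JVM temp directory is an acceptable place for the files.
            Path dir = Paths.get(System.getProperty("java.io.tmpdir"), dirName);
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Copies the content behind fileUrl into targetDir/fileName and returns the new path. */
    static Path download(URL fileUrl, Path targetDir, String fileName) {
        Path target = targetDir.resolve(fileName);
        try (InputStream in = fileUrl.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    /** Downloads a three-byte stand-in file and returns the size of the saved copy. */
    static long demo() {
        try {
            Path source = Files.createTempFile("sem-source", ".gz");
            Files.write(source, new byte[] {1, 2, 3});
            Path saved = download(source.toUri().toURL(), workDir("sem-download-sketch"), "keywords.gz");
            return Files.size(saved);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes saved: " + demo());
    }
}
```

Unlike the HttpURLConnection version, this sketch sets no timeouts and does not check the HTTP status, so it is only meant to show the directory handling and the copy.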

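III. Decompressing the downloaded gzip files (there may be several files, so multiple archives have to be handled)

Here is a utility class for gzip decompression: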

package com.qf.openchannel.util;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

public class GzipUtil {

	public static final int BUFFER = 1024;  
	public static final String EXT = ".gz";  
	/**
     * Decompress a gzip file.
     * @param file    the file to decompress
     * @param delete  whether to delete the original file
     * @param outPath output directory for the decompressed file
     * @throws Exception
     */
    public static void decompress(File file, boolean delete, String outPath)
            throws Exception {
        // Check for an empty string with isEmpty(); == only compares references
        String target;
        if (outPath == null || outPath.isEmpty()) {
            target = file.getPath().replace(EXT, "");
        } else {
            target = outPath + File.separator + file.getName().replace(EXT, "");
        }

        // try-with-resources closes both streams even if decompression fails
        try (FileInputStream fis = new FileInputStream(file);
             FileOutputStream fos = new FileOutputStream(target)) {
            decompress(fis, fos);
        }

        if (delete) {
            file.delete();
        }
    }
  
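    /**
     * Decompress gzip data from one stream into another.
     * @param is gzip-compressed input
     * @param os destination for the decompressed data
     * @throws Exception
     */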
    public static void decompress(InputStream is, OutputStream os)  
            throws Exception {  
        GZIPInputStream gis = new GZIPInputStream(is);  
        int count;  
        byte data[] = new byte[BUFFER];  
        while ((count = gis.read(data, 0, BUFFER)) != -1) {  
            os.write(data, 0, count);  
        }  
  
        gis.close();  
    }  
	
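    /**
     * Decompress a single file.
     * @param path path of the file to decompress (including the file name)
     * @param delete whether to delete the original file (true: delete; false: keep)
     * @param outPath output directory for the decompressed file; if null, the file is written next to the original
     * @throws Exception
     */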
    public static void decompress(String path, boolean delete, String outPath)  
            throws Exception {  
        File file = new File(path);  
        decompress(file, delete, outPath);
    }  
}
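
To see the stream-based decompression in isolation, the following sketch compresses a byte array in memory with GZIPOutputStream and restores it with GZIPInputStream, using the same read loop as the utility class. GzipRoundTripSketch is a hypothetical name, and the sample text is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipRoundTripSketch {

    /** Compresses raw bytes into gzip format. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer, so read bos only afterwards.
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Restores the original bytes from gzip data. */
    static byte[] decompress(byte[] gz) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buffer = new byte[1024];
            int count;
            while ((count = gis.read(buffer, 0, buffer.length)) != -1) {
                bos.write(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        String text = "campaignId\tadgroupId\tkeywordId";
        byte[] restored = decompress(compress(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(restored, StandardCharsets.UTF_8).equals(text));
    }
}
```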

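Then call the utility class to decompress the downloaded files. (The path passed in does not include a file name; during decompression a FilenameFilter restricts processing to specific files.)

	/**
	 * Decompress gzip files.
	 * 
	 * @param semPath
	 *            directory containing the files to decompress
	 * @throws Exception
	 */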
	private void gzipDecompress(String semPath) throws Exception {
		try {
			File file = new File(semPath);
			BaiDuFileFilter baiDuFileFilter = new BaiDuFileFilter(SemConstant.SEM_KEYWORD_FILE_GZIP_TYPE);
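			String[] files = file.list(baiDuFileFilter);
			// File.list returns null when semPath is not a readable directory
			if (files == null) {
				return;
			}
			Thread.sleep(2000);
			for (int x = 0; x < files.length; x++) {
				try {
					// Decompress and delete the original archive
					GzipUtil.decompress(semPath + "/" + files[x], true, semPath);
				} catch (Exception e) {
					// Record the file that failed and move on to the next one
					LoggerUtil.info("BaiDuApiService gzipDecompress abnormal file:" + files[x]);
					continue;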
				}
			}
		} catch (Exception e) {
			LoggerUtil.error("BaiDuApiService gzipDecompress Exception: {}", e);
			throw e;
		}
	}
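
The BaiDuFileFilter class is not shown in the article. A filter of this kind could plausibly be as small as the following sketch, which accepts file names ending with a given suffix. SuffixFilterSketch is a hypothetical stand-in, not the author's class.

```java
import java.io.File;
import java.io.FilenameFilter;

class SuffixFilterSketch implements FilenameFilter {

    private final String suffix;

    SuffixFilterSketch(String suffix) {
        this.suffix = suffix;
    }

    /** Accepts a file when its name ends with the configured suffix. */
    @Override
    public boolean accept(File dir, String name) {
        return name.endsWith(suffix);
    }

    public static void main(String[] args) {
        FilenameFilter filter = new SuffixFilterSketch(".gz");
        System.out.println(filter.accept(null, "keywords.gz"));  // suffix matches
        System.out.println(filter.accept(null, "keywords.csv")); // suffix does not match
    }
}
```

Passing such a filter to File.list(filter) returns only the matching names, which keeps already-decompressed files in the same directory out of the loop.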

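IV. Parsing the CSV file:

Depending on the actual requirements, decide whether certain fields need to be dropped and how specific URLs should be processed during parsing.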

	public static List<KeyWordReportVo> readCsv2Object(String filePaths) throws Exception {
		File csv = new File(filePaths); // path of the CSV file
		List<KeyWordReportVo> keyWordList = new ArrayList<KeyWordReportVo>();
		// Parameters to strip from the destination URLs
		String[] strArray = new String[]{"activityid"};
		// try-with-resources closes the reader even if parsing fails; a missing file
		// now propagates as an exception instead of causing a NullPointerException later
		try (BufferedReader br = new BufferedReader(new FileReader(csv))) {
			// The first line is the header
			String headerLine = br.readLine();
			LoggerUtil.info("FileUtils readCsv2Object headerInfo: {}", headerLine);
			String line;
			while ((line = br.readLine()) != null) {
				// Columns are separated by whitespace
				String[] split = line.split("\\s+");
				List<String> lineList = Arrays.asList(split);
				KeyWordReportVo keyWordReport = new KeyWordReportVo();
				keyWordReport.setCampaignId(lineList.get(0));
				keyWordReport.setAdgroupId(lineList.get(1));
				keyWordReport.setKeywordId(lineList.get(2));
				keyWordReport.setPrice(lineList.get(3));
				// The PC URL is quoted: strip the quotes, then remove the activityid parameter
				keyWordReport.setPcDestinationUrl(StringUtil.removeFieldFromUrl(lineList.get(4).replace("\"", ""), strArray));
				keyWordReport.setMobileDestinationUrl(StringUtil.removeFieldFromUrl(lineList.get(5), strArray));
				keyWordReport.setMatchType(lineList.get(6));
				keyWordReport.setPause(lineList.get(7));
				keyWordReport.setStatus(lineList.get(8));
				keyWordReport.setPhraseType(lineList.get(9));

				keyWordList.add(keyWordReport);
			}
		}

		// Delete the file once it has been parsed
		deleteFile(csv);
		return keyWordList;
	}
	
	// Delete a single file
	public static void deleteFile(File file) throws Exception{
		try {
			// Delete only if the path exists and is a regular file
			if(file.exists() && file.isFile()) {
				file.delete();
			}
		} catch (Exception e) {
			LoggerUtil.error("FileUtils deleteFile Exception:{}-filePath:{}",e,file.getPath());
		}
	}

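How you parse depends on the requirements: whether you know the fields of the original file, and which of them you need. (The code above assumes the fields are known. If they are not, the following method can be used.)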

	/**
     * Parse a CSV file into a list. Each cell is a String, each row is a list, and all rows
     * (the header row and the data rows alike) are collected into one outer list.
     */
    public static List<List<String>> readCSVFile(String filePaths) throws IOException {
        List<List<String>> listFile = new ArrayList<List<String>>();
        // Compile the cell pattern once instead of once per line
        Pattern pCells = Pattern.compile("(\"[^\"]*(\"{2})*[^\"]*\")*[^,]*,");
        // The file is read as GBK; try-with-resources closes the readers in all cases
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(filePaths), "GBK"))) {
            String rec; // one line
            String str; // one cell
            // Read one line at a time
            while ((rec = br.readLine()) != null) {
                int index = 0;
                Matcher mCells = pCells.matcher(rec);
                List<String> cells = new ArrayList<String>(); // one list per row
                // Read each cell
                while (mCells.find()) {
                    str = mCells.group();
                    str = str.replaceAll(
                            "(?sm)\"?([^\"]*(\"{2})*[^\"]*)\"?.*,", "$1");
                    str = str.replaceAll("(?sm)(\"(\"))", "$2");
                    cells.add(str);
                    index = mCells.end();
                }
                // Whatever follows the last comma is the final cell
                cells.add(rec.substring(index));
                listFile.add(cells);
            }
        }
        return listFile;
    }

The method above parses the file into a List<List<String>>, treating each cell of the CSV file as one element; process the result according to your own needs.
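
For comparison, the same cell-by-cell split can be written without regular expressions as a small state machine that tracks whether the current position is inside a quoted cell. This is a minimal sketch under the assumption that no quoted cell spans more than one line. CsvLineSketch and splitLine are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLineSketch {

    /** Splits one CSV line into cells, honouring double-quoted cells and "" escapes. */
    static List<String> splitLine(String line) {
        List<String> cells = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cell.append('"'); // "" inside quotes is a literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cell.append(c); // commas inside quotes belong to the cell
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                cells.add(cell.toString()); // an unquoted comma ends the cell
                cell.setLength(0);
            } else {
                cell.append(c);
            }
        }
        cells.add(cell.toString()); // last cell (empty if the line ends with a comma)
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(splitLine("1,\"a,b\",\"say \"\"hi\"\"\","));
    }
}
```

For the sample line in main, the result has four cells: 1, then a,b, then say "hi", and finally an empty cell for the trailing comma.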

V. Parsing URLs:

If you obtain a URL during development and want to remove certain parameters from it, the following utility method can be used:

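/**
	 * Remove the given query parameters from a URL.
	 * @param url the URL to clean
	 * @param strArray names of the parameters to remove
	 * @return the URL without those parameters
	 */
	public static String removeFieldFromUrl(String url,String[] strArray) {
		for (int i = 0; i < strArray.length; i++) {
			String name = Pattern.quote(strArray[i]);
			// Parameter followed by another one: drop "name=value&" (the name must follow ? or &)
			url = url.replaceAll("(?<=[?&])" + name + "=[^&#]*&", "");
			// Parameter at the end of the query string: drop it together with its leading ? or &
			url = url.replaceAll("[?&]" + name + "=[^&#]*", "");
		}
		return url;
	}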
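
An alternative that avoids regular expressions is to split the query string on & and rebuild it from the pairs that should stay. The sketch below assumes a conventional URL in which any # fragment comes after the query string. UrlParamSketch and removeParams are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class UrlParamSketch {

    /** Returns url without the query parameters whose names are listed in names. */
    static String removeParams(String url, String... names) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no query string, nothing to remove
        }
        Set<String> toRemove = new HashSet<>(Arrays.asList(names));
        String base = url.substring(0, q);
        String query = url.substring(q + 1);
        String fragment = "";
        int hash = query.indexOf('#');
        if (hash >= 0) {
            fragment = query.substring(hash); // keep "#..." untouched
            query = query.substring(0, hash);
        }
        List<String> kept = new ArrayList<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            if (!pair.isEmpty() && !toRemove.contains(name)) {
                kept.add(pair); // exact name comparison, so "myactivityid" is not affected
            }
        }
        String rebuilt = kept.isEmpty() ? base : base + "?" + String.join("&", kept);
        return rebuilt + fragment;
    }

    public static void main(String[] args) {
        System.out.println(removeParams("http://a.com/p?activityid=1&x=2", "activityid"));
        System.out.println(removeParams("http://a.com/p?x=2&activityid=1", "activityid"));
    }
}
```

Both calls in main yield http://a.com/p?x=2, whether the parameter comes first or last in the query string.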

When matching URLs, a lambda expression can be used, as follows:

Optional<KeyWordReportVo> optional = keyWordList.stream()
						.filter(o -> ((null != o.getPcDestinationUrl() && null != semPlan.getPcUrl()
								&& o.getPcDestinationUrl().equals(semPlan.getPcUrl())))
								|| (null != o.getMobileDestinationUrl() && null != semPlan.getMobileUrl()
										&& o.getMobileDestinationUrl().equals(semPlan.getMobileUrl())))
						.findFirst();


if (optional.isPresent()) {
// match found
// handle the business logic here
}

If the match should produce a list, change it to the following:

				List<KeyWordReportVo> collect = keyWordList.stream()
						.filter(o -> ((null != o.getPcDestinationUrl() && null != semPlan.getPcUrl()
								&& o.getPcDestinationUrl().equals(semPlan.getPcUrl())))
								|| (null != o.getMobileDestinationUrl() && null != semPlan.getMobileUrl()
										&& o.getMobileDestinationUrl().equals(semPlan.getMobileUrl())))
						.collect(Collectors.toList());

The resulting list can then be processed further.
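
The repeated null checks in the filter can be pulled into a small helper so that the predicate reads more easily. The following self-contained sketch uses a minimal stand-in for KeyWordReportVo with only the two URL fields. UrlMatchSketch, Keyword, sameUrl, and matching are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class UrlMatchSketch {

    /** Minimal stand-in for KeyWordReportVo with only the two URL fields. */
    static class Keyword {
        final String pcUrl;
        final String mobileUrl;

        Keyword(String pcUrl, String mobileUrl) {
            this.pcUrl = pcUrl;
            this.mobileUrl = mobileUrl;
        }
    }

    /** True only when a is non-null and equal to b, so two nulls never count as a match. */
    static boolean sameUrl(String a, String b) {
        return a != null && a.equals(b);
    }

    /** Keeps the keywords whose PC URL or mobile URL equals the given plan URL. */
    static List<Keyword> matching(List<Keyword> keywords, String planPcUrl, String planMobileUrl) {
        return keywords.stream()
                .filter(k -> sameUrl(k.pcUrl, planPcUrl) || sameUrl(k.mobileUrl, planMobileUrl))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Keyword> keywords = Arrays.asList(
                new Keyword("http://a.com/pc", null),
                new Keyword(null, "http://a.com/m"),
                new Keyword("http://b.com/pc", "http://b.com/m"));
        System.out.println(matching(keywords, "http://a.com/pc", "http://a.com/m").size());
    }
}
```

Objects.equals is deliberately not used here, because it treats two null URLs as equal, which the original filter does not.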

Reposted from: https://my.oschina.net/u/3110937/blog/1588281
