1. Symptoms
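A large file is downloaded by reading its stream through a URLConnection. The file has more than a million lines, and once the download reaches the million-line range, reading stalls.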
2. Problem code
HttpURLConnection urlConnection = null;
BufferedReader reader = null;
try {
    List<String> labelList = new ArrayList<>();
    int currentOffset = 0;
    String kfdsDownLoadUrl = "xxx";
    StringBuilder urlSb = new StringBuilder();
    urlSb.append(kfdsDownLoadUrl).append("?appCode=").append("ivr-scheduler")
            .append("&batchNo=").append("d2764914-ee65-496b-83a9-42aa303041d7")
            .append("&fileCode=").append("-1186007130");
    URL url = new URL(urlSb.toString());
    // Keep a reference so the finally block can disconnect; no timeouts are set here
    urlConnection = (HttpURLConnection) url.openConnection();
    reader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream(), "UTF-8"));
    final String[] title = {"dateTime", "fuid", "fmobile", "cityid", "cityname", "provinceid", "provincename", "labels"};
    String temp;
    while (null != (temp = reader.readLine())) {
        currentOffset++;
        if (currentOffset % 1000 == 0) {
            logger.info("update groupcode:{} read offset:{} labelList:{}", "G7nmWHGTzr", currentOffset, JSONObject.toJSONString(labelList));
        }
        // Split the line and map each column to its name in title[]
        final String[] data = temp.split(REGEX);
        final Map<String, String> map = new HashMap<>();
        for (int n = 0; n < title.length && n < data.length; n++) {
            map.put(title[n], data[n]);
        }
        String uid = map.get("fuid");
        String cityId = map.get("cityid");
        String labels = map.get("labels");
        // Map keys are case-sensitive and must match title[] exactly
        String provinceId = map.get("provinceid");
        String cityname = map.get("cityname");
        String provinceName = map.get("provincename");
        // Collect distinct labels, skipping the literal string "null"
        if (!StringUtils.isEmpty(labels)) {
            List<String> tmpLabelList = ListUtil.getListByString(labels, ",");
            for (String tmpLabel : tmpLabelList) {
                if (!labelList.contains(tmpLabel) && !"null".equalsIgnoreCase(tmpLabel)) {
                    labelList.add(tmpLabel);
                }
            }
        }
        // Simulate a 2-second processing step after every 1000 lines
        if (currentOffset % 1000 == 0) {
            logger.info("start processing a batch, takes 2s");
            Thread.sleep(2000);
        }
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    if (reader != null) {
        try {
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    if (urlConnection != null) {
        urlConnection.disconnect();
    }
}
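A back-of-the-envelope estimate of the simulated processing in the loop: the code sleeps 2 s after every 1000 lines while the connection stays open. Assuming a file of exactly N = 10^6 lines (a round figure chosen for illustration; the post only says "more than a million"), the sleeps alone add up to:

```latex
T_{\text{sleep}} = \left\lfloor \frac{N}{1000} \right\rfloor \times 2\,\text{s}
= \frac{10^{6}}{1000} \times 2\,\text{s} = 2000\,\text{s} \approx 33.3\,\text{min}
```

While the thread sleeps, nothing is read from the stream, so the connection has to stay open for at least this long on top of the actual transfer time.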
3. Solution (reference: the CSDN blog post "URLConnection" by 记忆中逝去的往事)
Obtain the connection as an HttpURLConnection and configure some of its parameters before connecting.
Relevant URLConnection fields and their defaults:
protected boolean connected
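The boolean field connected is true if the connection is open and false if it is closed. Because the connection is not yet open when a new URLConnection object is created, its initial value is false. There is no method to read or change connected directly, but any method that causes the URLConnection to connect sets it to true.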
protected boolean allowUserInteraction
Some URLConnections need to interact with a user. The allowUserInteraction field indicates whether user interaction is allowed. The default is false.
protected boolean doInput
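A URLConnection can be used to read from a server, write to a server, or both. The protected boolean field doInput is true if the URLConnection can be used for reading and false otherwise. The default is true.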
protected boolean doOutput
The protected boolean field doOutput is true if the URLConnection can be used for writing and false otherwise. The default is false.
protected boolean useCaches
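The useCaches field determines whether caches may be used. The default is true, meaning caches are used; false disables caching.
The corresponding declarations in the java.net.URLConnection source (JDK 8):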
protected URL url;
protected boolean doInput = true;
protected boolean doOutput = false;
protected boolean allowUserInteraction = defaultAllowUserInteraction;
protected boolean useCaches = defaultUseCaches;
protected long ifModifiedSince = 0;
protected boolean connected = false;
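To see these defaults on a live object, the sketch below uses a hypothetical minimal subclass, ProbeConnection, because the fields are protected and connected has no public getter. Its connect() only sets the flag and opens no socket; the URL is a placeholder that is never contacted. The getters (getDoInput, getDoOutput, getUseCaches, getAllowUserInteraction, getConnectTimeout, getReadTimeout) are standard URLConnection methods, and the values in the comments are what a fresh JVM should report, assuming no code has changed the defaults (for example via setDefaultUseCaches).

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

class UrlConnectionDefaults {

    // Hypothetical minimal subclass so the protected fields can be inspected;
    // connect() only flips the flag instead of opening a socket.
    static class ProbeConnection extends URLConnection {
        ProbeConnection(URL url) {
            super(url);
        }

        @Override
        public void connect() {
            connected = true;
        }

        boolean isConnected() {
            return connected;
        }
    }

    // Builds a probe for a placeholder URL; creating a URL does not touch the network
    static ProbeConnection newProbe(String spec) {
        try {
            return new ProbeConnection(URI.create(spec).toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        ProbeConnection c = newProbe("http://example.com/file.txt");
        System.out.println("connected            = " + c.isConnected());             // false
        System.out.println("doInput              = " + c.getDoInput());              // true
        System.out.println("doOutput             = " + c.getDoOutput());             // false
        System.out.println("allowUserInteraction = " + c.getAllowUserInteraction()); // false
        System.out.println("useCaches            = " + c.getUseCaches());            // true
        System.out.println("connectTimeout       = " + c.getConnectTimeout());       // 0
        System.out.println("readTimeout          = " + c.getReadTimeout());          // 0
        c.connect();
        System.out.println("connected after connect() = " + c.isConnected());       // true
    }
}
```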
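Four methods query and set the connection timeouts, that is, how long the underlying socket waits for the remote server before a SocketTimeoutException is thrown. Values are in milliseconds, and 0 (the default) means wait indefinitely: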
// How long the socket waits for the connection to be established
public void setConnectTimeout(int timeout)
public int getConnectTimeout()
// How long a read on the input stream waits for data to arrive
public void setReadTimeout(int timeout)
public int getReadTimeout()
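A minimal sketch of what the read timeout does, assuming a free loopback port is available: a ServerSocket that never calls accept() still lets the TCP handshake complete through its backlog, but never returns an HTTP response, so getInputStream() blocks until the read timeout expires and a SocketTimeoutException is thrown. The class and method names and the 1000 ms / 500 ms values are illustrative choices, not from the original post; Proxy.NO_PROXY keeps any system proxy out of the way for this local test.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

class ReadTimeoutDemo {

    // Connects to a local server that never answers and reports whether
    // the read gave up with a SocketTimeoutException.
    static boolean readTimesOut(int readTimeoutMillis) {
        // Nothing ever calls accept(), so no HTTP response is ever written
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"))) {
            URL url = URI.create("http://127.0.0.1:" + silent.getLocalPort() + "/file.txt").toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(1_000);          // limit for establishing the TCP connection
            conn.setReadTimeout(readTimeoutMillis); // limit for each wait on the input stream
            try {
                conn.getInputStream();              // blocks waiting for the status line
                return false;
            } catch (SocketTimeoutException e) {
                return true;                        // the read timeout expired
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        boolean timedOut = readTimesOut(500);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("timed out: " + timedOut + " after ~" + elapsedMs + " ms");
    }
}
```

With a read timeout of 0 the same call would block indefinitely, which is why a finite value is usually preferred for services that must not hang.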
The code that finally solved the problem:
URL url = new URL(urlSb.toString());
urlConnection = (HttpURLConnection) url.openConnection();
// Do not use caches for this download
urlConnection.setUseCaches(false);
urlConnection.setDoInput(true);
// doOutput only matters if a request body is written via getOutputStream()
urlConnection.setDoOutput(true);
// Set the timeouts in milliseconds; 99999999 ms is roughly 27.8 hours
urlConnection.setConnectTimeout(99999999);
urlConnection.setReadTimeout(99999999);
// Connect
urlConnection.connect();
reader = new BufferedReader(new InputStreamReader(urlConnection.getInputStream(), "UTF-8"));
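The same settings can be wrapped in a small helper that also closes the reader and disconnects automatically. This is a sketch under assumptions: forEachLine and runLocalDemo are hypothetical names, the demo serves a three-line placeholder file from the JDK's built-in com.sun.net.httpserver.HttpServer, and the 5000 ms / 30000 ms timeouts are illustrative (the fix above used 99999999 ms for both). The helper itself holds only one line at a time, handing each to the caller's Consumer.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class StreamingDownload {

    // Hypothetical helper: opens the URL with explicit timeouts, passes each
    // line to onLine, and returns the number of lines read.
    static long forEachLine(String url, int connectTimeoutMs, int readTimeoutMs,
                            Consumer<String> onLine) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setUseCaches(false);
        conn.setDoInput(true);                     // reading only; doOutput stays false
        conn.setConnectTimeout(connectTimeoutMs);  // milliseconds, 0 = wait forever
        conn.setReadTimeout(readTimeoutMs);        // milliseconds, 0 = wait forever
        try {
            conn.connect();
            long count = 0;
            // try-with-resources closes the reader even if onLine throws
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    onLine.accept(line);
                    count++;
                }
            }
            return count;
        } finally {
            conn.disconnect();
        }
    }

    // Local demo: serves a placeholder three-line file and reads it back
    static List<String> runLocalDemo() {
        byte[] body = "row-1\nrow-2\nrow-3\n".getBytes(StandardCharsets.UTF_8);
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/file.txt", exchange -> {
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            try {
                List<String> lines = new ArrayList<>();
                String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/file.txt";
                forEachLine(url, 5_000, 30_000, lines::add);
                return lines;
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read " + runLocalDemo().size() + " lines"); // prints "read 3 lines"
    }
}
```

In the original scenario, the per-line parsing and label collection from the problem code would go inside the Consumer passed to forEachLine.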