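When uploading files to Aliyun OSS with Apache HttpClient, I found that files with English names uploaded fine, while files with Chinese names failed with this error: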
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>AccessDenied</Code>
<Message>Invalid according to Policy: Policy Condition failed: ["eq", "$key", "tid53036/aä¸bæc_1606194337_1606194337412_bffbd4.png"]</Message>
<RequestId>5FBC9520990C63343792A8A0</RequestId>
<HostId>pro-cs.kefutoutiao.com</HostId>
</Error>
The error message clearly shows that the Chinese file name has been garbled.
The POST code sent the form fields with addTextBody:
HttpEntity requestEntity = MultipartEntityBuilder.create()
        .addTextBody("x-oss-object-acl", "private")
        .addTextBody("key", bean.getKey())
        .addTextBody("policy", bean.getPolicy())
        .addTextBody("OSSAccessKeyId", bean.getAccessid())
        .addTextBody("Signature", bean.getSignature())
        .addBinaryBody("file", fileBytes)
        .addTextBody("Filename", fileName2)
        .addTextBody("success_action_status", "200")
        .build();
httpPost.setEntity(requestEntity);
CloseableHttpResponse response2 = httpClient.execute(httpPost);
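A Wireshark capture showed that these fields are sent as text/plain;charset=ISO-8859-1 by default. That is the problem: ISO-8859-1 cannot represent Chinese characters.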
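To see why the default charset breaks the upload, here is a small JDK-only sketch. It is not the HttpClient code itself, and the class and method names are made up for illustration. It assumes the text field is turned into bytes by encoding the string with the part's charset, the way String.getBytes(Charset) does: characters the charset cannot represent are replaced with '?', so the key that reaches OSS no longer matches the key in the policy. The last check shows that UTF-8 bytes read as ISO-8859-1 start with "aä¸", which looks consistent with the garbled name in the error message above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative only: what happens to a Chinese file name under different charsets. */
final class CharsetLossDemo {

    /** Encodes with the given charset, then decodes with the same charset. */
    static String roundTrip(String value, Charset charset) {
        return new String(value.getBytes(charset), charset);
    }

    /** Decodes UTF-8 bytes as ISO-8859-1, which produces the typical garbled text. */
    static String utf8ReadAsLatin1(String value) {
        return new String(value.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u4E2D" and "\u6587" are the two Chinese characters of the sample name.
        String name = "a\u4E2Db\u6587c.png";

        // ISO-8859-1 cannot represent the Chinese characters, so they become '?'.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1)); // prints a?b?c.png

        // UTF-8 can represent them, so the name survives unchanged.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // prints true

        // UTF-8 bytes shown as ISO-8859-1 begin with 'a', U+00E4, U+00B8.
        System.out.println(utf8ReadAsLatin1(name).startsWith("a\u00E4\u00B8")); // prints true
    }
}
```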
After changing the code as follows, Chinese file names upload correctly:
// Handle Chinese file names: the default ISO-8859-1 cannot represent Chinese characters
ContentType contentTypeTextUtf8 = ContentType.create("text/plain", StandardCharsets.UTF_8);
String fileName2 = bean.getKey().substring(bean.getDir().length());
// Use an explicit charset on both sides; a bare getBytes() depends on the platform default.
// With the same charset for encoding and decoding, this leaves the text unchanged.
fileName2 = new String(fileName2.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
String key = bean.getKey();
key = new String(key.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
HttpEntity requestEntity = MultipartEntityBuilder.create()
        .addTextBody("x-oss-object-acl", "private")
        // Fields that may contain Chinese are sent as text/plain; charset=UTF-8
        .addTextBody("key", key, contentTypeTextUtf8)
        .addTextBody("policy", bean.getPolicy())
        .addTextBody("OSSAccessKeyId", bean.getAccessid())
        .addTextBody("Signature", bean.getSignature())
        .addBinaryBody("file", fileBytes)
        .addTextBody("Filename", fileName2, contentTypeTextUtf8)
        .addTextBody("success_action_status", "200")
        .setCharset(StandardCharsets.UTF_8)
        .build();
httpPost.setEntity(requestEntity);
CloseableHttpResponse response2 = httpClient.execute(httpPost);
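A note on the new String(...getBytes...) lines above, as a JDK-only sketch with made-up names. Java strings are already Unicode, so encoding and then decoding as UTF-8 gives back the same string. Decoding as UTF-8 after encoding with some other charset does not. The sketch below simulates a bare getBytes() by passing the "platform default" in as a parameter, which is an assumption made purely for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative only: simulates new String(s.getBytes(), UTF_8) under different defaults. */
final class ReencodeDemo {

    /** Encodes with the supposed platform default charset, then decodes the bytes as UTF-8. */
    static String reencode(String value, Charset platformDefault) {
        return new String(value.getBytes(platformDefault), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String key = "tid53036/a\u4E2Db\u6587c.png";

        // Default charset UTF-8: the round trip returns an equal string.
        System.out.println(reencode(key, StandardCharsets.UTF_8).equals(key)); // prints true

        // Default charset that cannot represent Chinese: the characters are lost.
        System.out.println(reencode(key, StandardCharsets.ISO_8859_1).equals(key)); // prints false
    }
}
```

In other words, what makes the upload work is the charset on the text body; the string conversion is harmless only when both sides use the same charset.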
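Two points are worth noting:
1) Fields that contain Chinese must go over the wire as UTF-8, so their TextBody needs the content type text/plain;charset=UTF-8. Java strings are already Unicode, so the new String(...getBytes...) step does not change the text as long as the same charset is used for encoding and decoding; with a bare getBytes() the result depends on the platform default charset.
2) GBK also turned out to work. The HTML5 client, by default, sends its fields as plain form-data with no charset information at all, so presumably they are handled as GBK by default.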
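To make the first point concrete, here is a rough sketch of what one text field looks like on the wire. It is written by hand against the JDK only and is not what MultipartEntityBuilder runs internally; the boundary value, the class name, and the exact set of headers are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative only: builds the bytes of a single multipart/form-data text part. */
final class TextPartSketch {

    static byte[] textPart(String boundary, String name, String value, Charset charset) {
        // Part headers are plain ASCII; the charset parameter tells the receiver
        // how to decode the body that follows.
        String headers = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + name + "\"\r\n"
                + "Content-Type: text/plain; charset=" + charset.name() + "\r\n"
                + "\r\n";
        byte[] head = headers.getBytes(StandardCharsets.US_ASCII);
        // The body is the field value encoded with the declared charset.
        byte[] body = value.getBytes(charset);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(head, 0, head.length);
        out.write(body, 0, body.length);
        out.write('\r');
        out.write('\n');
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String key = "a\u4E2Db\u6587c.png";
        byte[] utf8Part = textPart("demo-boundary", "key", key, StandardCharsets.UTF_8);
        byte[] latin1Part = textPart("demo-boundary", "key", key, StandardCharsets.ISO_8859_1);

        // Each Chinese character takes 3 bytes in UTF-8 but collapses to one '?' in ISO-8859-1.
        // The UTF-8 part is 4 bytes longer for the body, minus 5 for the shorter charset name.
        System.out.println(utf8Part.length - latin1Part.length); // prints -1
        System.out.println(new String(utf8Part, StandardCharsets.UTF_8).endsWith(key + "\r\n")); // prints true
    }
}
```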
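Here is a screenshot of the Wireshark capture. The part highlighted in yellow uses the default ISO-8859-1 encoding, while the part in green is UTF-8, and there the Chinese characters decode and display correctly: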