On partially garbled UTF-8 content when parsing emails

isoftstone57226

Article link: https://blog.csdn.net/jxjyzzc/article/details/45752017

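While working on a project that parses emails with JavaMail, I found that the examples online have various bugs, so here are a few notes on the problems I ran into: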

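In my case part.getContent() returned an InputStream (JavaMail usually returns a String for text/plain and falls back to an InputStream when it has no handler for the content), so the result had to be converted. I looked up the following conversion code online: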

	String contentType = part.getContentType();
	StringBuffer bodytext = new StringBuffer(); // 1. bodytext collects the text that is read
	// conname is a flag from the surrounding code
	if (part.isMimeType("text/plain") && !conname) {
		Object content = part.getContent();
		if (content instanceof InputStream) {
			InputStream in = (InputStream) content;
			byte[] b = new byte[4096];
			for (int n; (n = in.read(b)) != -1;) {
				// 2. each chunk is decoded on its own with the platform default charset;
				//    a multi-byte character split across two reads is corrupted here
				bodytext.append(new String(b, 0, n));
			}
		} else {
			bodytext.append(content);
		}
		String charset = "";
		if (contentType.contains("charset")) {
			charset = contentType.substring(contentType.indexOf("charset") + "charset=".length()).replace("\"", "");
			if ("".equals(charset) || charset.equalsIgnoreCase("gb2312"))
				charset = "gbk";
			// 3. re-encode bodytext with the platform default charset,
			//    then decode those bytes again with the body's charset
			bodytext = new StringBuffer()
					.append(new String(bodytext.toString().getBytes(), charset));
		}
	}
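Why can step 2 break UTF-8 text? A minimal standalone sketch (plain JDK, no JavaMail; the class name `ChunkedDecodeDemo` and the tiny buffer sizes are made up for illustration): when every chunk returned by `read(byte[])` is turned into a String separately, a multi-byte character that straddles a chunk boundary becomes replacement characters, even if UTF-8 is passed explicitly. The original code also relies on the platform default charset, which is a second possible source of damage.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedDecodeDemo {

    // Mirrors step 2 above: each chunk from read(buf) is decoded on its own
    // and appended, so a character split between two reads cannot be rebuilt.
    static String decodeInChunks(InputStream in, int bufSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[bufSize];
        for (int n; (n = in.read(buf)) != -1; ) {
            sb.append(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // four CJK characters, 3 bytes each in UTF-8 (12 bytes in total)
        String text = "\u4e2d\u6587\u90ae\u4ef6";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // a 4-byte buffer cuts the second character across two reads
        String chunked = decodeInChunks(new ByteArrayInputStream(utf8), 4);
        // decoding the complete byte array once keeps every character intact
        String whole = new String(utf8, StandardCharsets.UTF_8);

        System.out.println(text.equals(chunked)); // false
        System.out.println(text.equals(whole));   // true
    }
}
```

With the real 4096-byte buffer the same thing happens only at chunk boundaries, which fits the symptom of part of the body being garbled rather than all of it.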

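Mails with charset gbk were read correctly, but for mails with charset utf-8 part of the content came out garbled. My guess was that the problem sits at comment 2: while the bytes received in the byte array are turned into Strings with new String and appended to the StringBuffer, some byte sequences get decoded incorrectly. So I switched to collecting the raw content in a ByteArrayOutputStream first and decoding it only afterwards. The corrected code: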

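	String contentType = part.getContentType();
	StringBuffer bodytext = new StringBuffer();
	ByteArrayOutputStream baos = null;
	// conname is a flag from the surrounding code
	if (part.isMimeType("text/plain") && !conname) {
		Object content = part.getContent();
		if (content instanceof InputStream) {
			// collect all raw bytes first; decode only once the body is complete
			InputStream is = (InputStream) content;
			baos = new ByteArrayOutputStream();
			byte[] buf = new byte[4096];
			int n;
			while ((n = is.read(buf)) != -1) {
				baos.write(buf, 0, n);
			}
		} else {
			bodytext.append(content);
		}
		String charset = "";
		if (contentType.contains("charset")) {
			charset = contentType.substring(contentType.indexOf("charset") + "charset=".length());
			// stop at the next parameter, e.g. "utf-8; format=flowed"
			if (charset.indexOf(';') >= 0)
				charset = charset.substring(0, charset.indexOf(';'));
			charset = charset.replace("\"", "").trim();
		}
		// no (or empty) charset, or gb2312: fall back to gbk
		if ("".equals(charset) || charset.equalsIgnoreCase("gb2312"))
			charset = "gbk";
		if (baos != null) {
			// decode the complete byte array once with the body's charset
			bodytext = new StringBuffer()
					.append(new String(baos.toByteArray(), charset));
		}
	}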
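The same "read everything, then decode once" idea can be packaged as a small helper that is testable without a mail server. This is a sketch under assumptions: `BodyDecoder`, `extractCharset` and `decodeBody` are hypothetical names, the charset parameter is parsed by hand from the raw Content-Type value, and gbk is the fallback only because the code above uses it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

public class BodyDecoder {

    // Hypothetical helper: returns the charset parameter of a raw Content-Type
    // value such as: text/plain; charset="UTF-8"; format=flowed
    static String extractCharset(String contentType, String fallback) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                // case-insensitive match on the parameter name
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String value = p.substring(8).replace("\"", "").trim();
                    if (!value.isEmpty()) {
                        return value;
                    }
                }
            }
        }
        return fallback;
    }

    // Reads the whole stream into memory, then decodes the complete byte
    // array once, so no character can be split between two reads.
    static String decodeBody(InputStream in, String contentType) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) != -1; ) {
            baos.write(buf, 0, n);
        }
        String charset = extractCharset(contentType, "gbk");
        if (charset.equalsIgnoreCase("gb2312")) {
            charset = "gbk"; // same mapping as the code above
        }
        // Charset.forName throws an unchecked exception for unknown names
        return new String(baos.toByteArray(), Charset.forName(charset));
    }
}
```

In real JavaMail code it is probably cleaner to let the library parse the header, e.g. `new ContentType(part.getContentType()).getParameter("charset")` from `javax.mail.internet`, and to map MIME charset names to Java names with `MimeUtility.javaCharset`. Wrapping the stream in an `InputStreamReader` with the right charset is another way to avoid split characters, since the reader's decoder keeps incomplete byte sequences between reads.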

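After this change, my tests show the basic text being read completely. Apart from some semicolons and special characters that still don't come through, this doesn't block further development, so I'll leave it here for now...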