BOM problems and encoding problems

pretendcool

Category: Development Notes. Tags: bom, utf-8

Article link: https://blog.csdn.net/pretendcool/article/details/46520631

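This is an old project at my new company, which is a little awkward to talk about.
The project in SVN is Eclipse-based. I am used to IDEA, so I imported it into IDEA and configured the run parameters.
On startup it reported a pile of baffling errors.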

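An error at line 1, column 1: something is clearly wrong.
On top of that, many files were almost entirely errors, with a similar error at every other character.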

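A quick search left no doubt that this was an encoding problem. Eclipse recognises the BOM; more precisely, Eclipse's compiler does.
A BOM (byte order mark) is a short byte sequence at the very start of a file that identifies its encoding: FE FF for big-endian UTF-16 (four hex digits), FF FE for little-endian UTF-16, and the three bytes EF BB BF for UTF-8.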
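To make the byte sequences concrete, here is a minimal sketch that looks at the first bytes of a file and reports which BOM, if any, it starts with. The class name `BomSniffer` and its methods are made up for illustration and are not part of the original tool; UTF-32 BOMs are deliberately ignored.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
class BomSniffer {
    /** Returns the encoding implied by a leading BOM, or null if there is none. */
    static String detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned values (Java bytes are signed).
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] head = new byte[4];
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.read(head);
        }
        String bom = detect(Arrays.copyOf(head, Math.max(n, 0)));
        System.out.println(bom == null ? "no BOM found" : "BOM for " + bom);
    }
}
```

Run it with a file path as the only argument. A file without a BOM can still be in any encoding, so "no BOM found" says nothing definite about the content.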

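Opening the files in a tool such as EditPlus makes this even more obvious.

Then there is the file that is almost entirely errors, again as it looks when opened in EditPlus.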

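First, to be clear: these are two separate problems.
The files that are almost entirely errors are encoded as UTF-16 (big-endian), so the first step is to convert them to UTF-8.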
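The "error at every other character" symptom follows directly from how UTF-16 stores ASCII text: each character takes two bytes, one of which is zero. The sketch below encodes a string as UTF-16BE and decodes it as a single-byte encoding. ISO-8859-1 is used here only as a stand-in for whatever the IDE assumed, and `Utf16MisreadDemo` is a made-up name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class Utf16MisreadDemo {
    /** Encodes text as UTF-16BE, then decodes those bytes as if each byte were one character. */
    static String misread(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);
        return new String(utf16, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Print each decoded character as a code point so the NULs are visible.
        for (char c : misread("class").toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

The output is `U+0000 U+0063 U+0000 U+006C U+0000 U+0061 U+0000 U+0073 U+0000 U+0073`: a NUL character before every letter, which a compiler reading the file in the wrong encoding flags as an illegal character each time.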
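If only one or two files are affected, you can simply re-save them in the target encoding with EditPlus, UltraEdit or a similar tool.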

If there are too many, they have to be processed in bulk by a program. I wrote a utility class for this, in Java:

package cn.xiaozhi.tools;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

public class ConvertEncode {
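	public static void main(String[] args) {
		// The root directory to scan is the only argument.
		if (args.length < 1) {
			System.err.println("Usage: ConvertEncode <root-directory>");
			return;
		}
		new ConvertEncode().start(args[0]);
	}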
	
	public void start(String root){
		File dir = new File(root);
		process(dir);
	}
	
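	// Recursively visit every file under the given directory
	public void process(File file){
		if(file.isFile()){
			doRemove(file);
		}else{
			File[] files = file.listFiles();
			// listFiles() returns null if the path is not a readable directory
			if(files == null){
				return;
			}
			for(File child:files){
				process(child);
			}
		}
	}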

	// The method that actually processes a file; a throwaway, all the logic lives here
	private void doRemove(File file){
		String fileName = file.getName();
		System.out.println("----Start process:"+fileName+"---->");
		// Check the file extension; only .java files are touched
		if(fileName.toLowerCase().endsWith(".java")){
			// Detect the file's encoding
			String code="";
			try {
				code = codeString(file);
				System.out.println(fileName+":encoding:"+code);
			} catch (Exception e1) {
				System.out.println("#can't read this code");
			}
			
			// Only UTF-16BE files are converted
			if("UTF-16BE".equalsIgnoreCase(code)){
				String fileContent = "";
				try {
					fileContent = getFileContentFromCharset(file, code);
				} catch (Exception e) {
					System.out.println("can't read file,do next--->");
					return;
				}
				try {
					saveFile2Charset(file, "UTF-8", fileContent);
				} catch (Exception e) {
					e.printStackTrace();
					return;
				}
			}
			System.out.println("----end process"+fileName+"----");
		}else{
			System.out.println("----Not process:"+fileName+"----");
		}
	}
	
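    /**
     * Reads a file using the given encoding and returns its content.
     *
     * @param file
     *            the file to convert
     * @param fromCharsetName
     *            the encoding of the source file
     * @return the decoded text, without a leading BOM character
     * @throws Exception
     */
    public static String getFileContentFromCharset(File file,
            String fromCharsetName) throws Exception {
        if (!Charset.isSupported(fromCharsetName)) {
            throw new UnsupportedCharsetException(fromCharsetName);
        }
        StringBuilder content = new StringBuilder();
        try (InputStream inputStream = new FileInputStream(file);
                InputStreamReader reader = new InputStreamReader(inputStream,
                        fromCharsetName)) {
            // A single read() may return fewer characters than requested, so loop until EOF
            char[] chs = new char[8192];
            int n;
            while ((n = reader.read(chs)) != -1) {
                content.append(chs, 0, n);
            }
        }
        // Decoders such as UTF-16BE keep the BOM as U+FEFF; drop it so it is
        // not written back out as a UTF-8 BOM
        if (content.length() > 0 && content.charAt(0) == '\uFEFF') {
            content.deleteCharAt(0);
        }
        return content.toString();
    }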
 
    /**
     * Writes a text file in the given encoding, overwriting any existing file.
     * 
     * @param file
     *            the file to write
     * @param toCharsetName
     *            the target encoding
     * @param content
     *            the file content
     * @throws Exception
     */
    public static void saveFile2Charset(File file, String toCharsetName,
            String content) throws Exception {
        if (!Charset.isSupported(toCharsetName)) {
            throw new UnsupportedCharsetException(toCharsetName);
        }
        // FileOutputStream truncates an existing file, so no delete() is needed;
        // try-with-resources closes the stream even if the write fails
        try (OutputStream outputStream = new FileOutputStream(file);
                OutputStreamWriter outWrite = new OutputStreamWriter(outputStream,
                        toCharsetName)) {
            outWrite.write(content);
        }
    }
    
	/**
	 * Guesses a file's encoding from its first two bytes.
	 * @param fileName the file to inspect
	 * @return the guessed encoding name
	 * @throws Exception
	 */
	public static String codeString(File fileName) throws Exception{
		int p;
		// try-with-resources closes the stream even if a read fails
		try (BufferedInputStream bin = new BufferedInputStream(
				new FileInputStream(fileName))) {
			p = (bin.read() << 8) + bin.read();
		}
		
		switch (p) {
			case 0xefbb: // first two bytes of the UTF-8 BOM (EF BB BF)
				return "UTF-8";
			case 0xfffe: // little-endian UTF-16 BOM
				return "UTF-16LE";
			case 0xfeff: // big-endian UTF-16 BOM
				return "UTF-16BE";
			default:
				// No recognised BOM: GBK is only a guess
				return "GBK";
		}
	}
}

That takes care of the files that were almost entirely errors.
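As an aside, the core of the conversion can also be written with Java's generic `UTF-16` charset, which reads the BOM to pick the byte order (defaulting to big-endian when there is none) and does not pass the BOM through to the decoded text. This is an alternative sketch rather than the tool above; `Utf16ToUtf8` and its two-argument command line are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical alternative, for illustration only.
class Utf16ToUtf8 {
    /** Decodes UTF-16 bytes (byte order taken from the BOM) and re-encodes them as UTF-8 without a BOM. */
    static byte[] convert(byte[] utf16Bytes) {
        String text = new String(utf16Bytes, StandardCharsets.UTF_16);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: UTF-16 source file, args[1]: UTF-8 destination file
        byte[] source = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), convert(source));
    }
}
```

Writing to a separate destination file keeps the original intact until the result has been checked.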

Next, deal with the BOM problem.

Here is a method for it:

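    /**
     * Reads the first bytes of the stream to check for a UTF-8 BOM;
     * if one is present, it is consumed and discarded.
     * Needs java.io.IOException, java.io.InputStream and java.io.PushbackInputStream.
     *
     * @param in
     * @return a stream positioned after the BOM, or at the start if there is none
     * @throws java.io.IOException
     */
    public static InputStream getInputStream(InputStream in) throws IOException {

        // Room to push back three bytes; the default buffer holds only one
        PushbackInputStream testin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int r = testin.read(head, n, 3 - n);
            if (r == -1) {
                break; // stream is shorter than a BOM
            }
            n += r;
        }
        boolean hasBom = n == 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM (EF BB xx can be ordinary UTF-8 text): put the bytes back
            testin.unread(head, 0, n);
        }
        return testin;

    }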

And that is all it takes.
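If the goal is to remove the BOM from the source files themselves rather than skip it while reading, the same check can be applied to whole files. The following is a minimal sketch under that assumption; `BomStripper`, its methods, and the `.java`-only filter are illustrative choices, not something from the original project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
class BomStripper {
    /** Returns the bytes without a leading EF BB BF, or the same array if there is no BOM. */
    static byte[] stripUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    /** Rewrites the file only when a BOM was actually removed. */
    static boolean stripInPlace(Path file) throws IOException {
        byte[] original = Files.readAllBytes(file);
        byte[] stripped = stripUtf8Bom(original);
        if (stripped == original) {
            return false; // no BOM, leave the file untouched
        }
        Files.write(file, stripped);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: root directory to scan
        try (Stream<Path> paths = Files.walk(Paths.get(args[0]))) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p) && p.toString().endsWith(".java")
                        && stripInPlace(p)) {
                    System.out.println("Removed BOM: " + p);
                }
            }
        }
    }
}
```

Because this overwrites files in place, it is safest to run it on a clean working copy so the changes can be reviewed or reverted.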
