Garbled text (mojibake) issues


Source: https://www.ibm.com/developerworks/cn/java/analysis-and-summary-of-common-random-code-problems/index.html?cm_mmc=dwchina--homepage--social--weibo#N101F9

Causes

  1. Encoding

  2. Decoding

  3. Missing fonts

Analysis of the symptoms

  1. Caused by encoding

    On English Windows (Ewin), create a text file, type “你好” and save it. When you reopen the file, you see “??”.

    • Reason:
      Windows saves with the ANSI encoding by default. The locale of Ewin is English, which the source maps to code page 437 with ISO-8859-1 as the encoding (strictly, the ANSI code page of English Windows is Windows-1252, a close relative of ISO-8859-1, and 437 is the OEM code page). That encoding has no Chinese characters, so each one is encoded as the byte 3F, giving “3F3F”, and 3F is “?”.

    • Solution:
      No decoding can bring the right characters back, because the original bytes were never written. Choose a suitable encoding when saving a document with double-byte characters: GB2312 or UTF-8 for Simplified Chinese, BIG5 or UTF-8 for Traditional Chinese. For Chinese users, changing the locale to Chinese is also a good idea.

  2. Caused by decoding

    On Chinese Windows (Cwin), create a text file containing “你好” and copy it to Ewin. Open it there and the text is garbled.

    • Reason:
      Cwin saves the file with its ANSI encoding, GB2312. After the file is copied to Ewin, Notepad decodes it as ISO-8859-1.

    • Solution:
      Select the right decoding (here GB2312) when opening the file.

  3. Caused by the application

    Open uedit32.exe (Chinese version) and the text is garbled.

    • Reason: Windows uses Unicode if the application supports Unicode; otherwise it uses ANSI, meaning the standard encoding defined for the configured country or region.

    • Solution: In Regional and Language Options, set the standards and formats and the language for non-Unicode programs to Simplified Chinese. The system then decodes with the Chinese ANSI encoding.

  4. Caused by a missing font

    Open a file and see square symbols.

    • Reason: Rendering goes from the binary byte sequence to code points, then to characters looked up in the font, which are drawn on screen as a dot matrix. If a character is not found in the font, a square is shown in its place.

    • Solution: Install a font that covers the characters.
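
The difference between cases 1 and 2 above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, ISO-8859-1 stands in for the English ANSI encoding as in the text, and the four bytes C4 E3 BA C3 are the GB2312 encoding of “你好”. The string literals use Unicode escapes so the example does not depend on how the source file itself is saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    /** Case 1: encode with a charset that cannot represent the text. */
    public static byte[] encodeAsLatin1(String text) {
        // Unmappable characters are replaced by the byte 0x3F ('?')
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Case 2: decode intact bytes with the wrong charset. */
    public static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Formats bytes as upper-case hex, two digits per byte. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Case 1: the information is lost when the text is saved
        System.out.println(toHex(encodeAsLatin1("\u4F60\u597D"))); // 你好 -> 3F3F

        // Case 2: the GB2312 bytes are intact, only the decoding is wrong
        byte[] gb2312Bytes = {(byte) 0xC4, (byte) 0xE3, (byte) 0xBA, (byte) 0xC3};
        String garbled = decodeAsLatin1(gb2312Bytes);
        System.out.println(garbled);

        // Re-encoding the garbled string as ISO-8859-1 returns the original bytes,
        // so decoding them again with the right charset recovers the text
        byte[] recovered = garbled.getBytes(StandardCharsets.ISO_8859_1);
        if (Charset.isSupported("GB2312")) { // GB2312 is not guaranteed on every JRE
            System.out.println(new String(recovered, Charset.forName("GB2312")));
        }
    }
}
```

This is why case 1 has no fix after the fact while case 2 only needs the right decoder: in case 1 the bytes on disk are already 3F 3F, in case 2 they are still the original GB2312 bytes.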

Thinking in code

1. I/O operations: reading is decoding (byte -> character), while writing is encoding (character -> byte).

  1. The Java I/O interfaces:

    [Figure: Java I/O interfaces]

    When we use Writer and FileOutputStream:

    [Figure: file I/O streams]

  2. String.getBytes()

    String.getBytes() encodes the String into a sequence of bytes using the platform's default charset (Charset.defaultCharset(), which is determined by the system property file.encoding) and stores the result in a new byte array.

    Note: if you do not set the JVM's file.encoding, it depends on the environment that starts the JVM. When started from cmd it follows the regional language settings, while Eclipse lets you set this property.

Listing 1. String.getBytes() producing garbled text

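import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// A field cannot be declared inside main(), so the constant lives at class level
private static final String fileName = "c:\\log.txt";

public static void main(String[] args) {
    String str = "你好,中国";
    writeError(str);
}

private static void writeError(String a_error) {
    File logFile = new File(fileName);
    // Create the byte stream in append mode; try-with-resources closes it
    try (FileOutputStream outPutStream = new FileOutputStream(logFile, true)) {
        // Encode the string into bytes using the platform's default charset
        // (this is what produces garbled text on a non-Chinese platform)
        byte[] bytes = a_error.getBytes();
        // Write bytes.length bytes; a_error.length() counts chars and would truncate multi-byte text
        outPutStream.write(bytes, 0, bytes.length);
        outPutStream.flush();
    } catch (IOException e) {
        e.printStackTrace();
    }
}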
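
To check which charset the no-argument getBytes() would pick on a given machine, a small probe like the following can help. It is a sketch with hypothetical class and method names; it only prints the JVM's current settings and compares byte counts for explicit charsets. One caveat worth keeping in mind: on JDK 18 and later the default charset is UTF-8 unless configured otherwise, so the platform-dependent behavior described above mainly concerns older JVMs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    /** Number of bytes the text occupies when encoded with the given charset. */
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u4F60\u597D"; // 你好, two chars

        // What the no-argument getBytes() would use on this JVM
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // The same string has a different byte length in each charset,
        // which is why String.length() must not be used as a byte count
        System.out.println("chars:          " + text.length());
        System.out.println("UTF-8 bytes:    " + encodedLength(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE bytes: " + encodedLength(text, StandardCharsets.UTF_16BE));
    }
}
```

Running it with a different -Dfile.encoding value on an older JVM is a quick way to reproduce the environment dependence mentioned in the note.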

Listing 2. Using OutputStreamWriter to set the charset

private static void writeErrorWithCharSet(String a_error) {
    // fileName is the constant declared in Listing 1
    File logFile = new File(fileName);
    String charsetName = "utf-8";
    // Convert characters to bytes with the Unicode character set, encoded as UTF-8;
    // try-with-resources closes the writer even if write() fails
    try (Writer m_write = new BufferedWriter(
            new OutputStreamWriter(new java.io.FileOutputStream(logFile, true),
            charsetName))) {
        m_write.write(a_error);
    } catch (IOException e) {
        e.printStackTrace();
    }
}

To avoid garbled text, prefer the overloads of the I/O API that take an explicit charset argument.
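
The same rule applies on the reading side. The sketch below (hypothetical class and method names, writing to a temporary scratch file) writes text with an explicit charset and reads it back twice: once with the matching charset and once with a mismatched one.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetIo {

    /** Writing is encoding: characters -> bytes, with the charset stated explicitly. */
    public static void writeText(File file, String text, Charset charset) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), charset))) {
            writer.write(text);
        }
    }

    /** Reading is decoding: bytes -> characters, with the charset stated explicitly. */
    public static String readText(File file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("charset-demo", ".txt"); // scratch file for the demo
        file.deleteOnExit();
        String text = "\u4F60\u597D,\u4E2D\u56FD"; // 你好,中国

        writeText(file, text, StandardCharsets.UTF_8);

        // Matching charsets round-trip the text; a mismatched charset does not
        System.out.println(text.equals(readText(file, StandardCharsets.UTF_8)));
        System.out.println(text.equals(readText(file, StandardCharsets.ISO_8859_1)));
    }
}
```

Passing the charset on both sides removes the dependence on file.encoding, so the file reads the same no matter how the JVM was started.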

Web Application

Garbled text on the web

Reasons:

  1. The browser does not follow the URI encoding standard, the server is not configured for encoding and decoding, or the developer makes a mistake.

  2. GET method: non-ASCII characters are URL-encoded.

    host:port/contextPath/servletPath/pathInfo?queryString
    How pathInfo and queryString are decoded depends on the server. Tomcat configures both in server.xml: the charset for decoding pathInfo is defined on the Connector's URIEncoding attribute, and queryString follows useBodyEncodingForURI. If nothing is set, Tomcat 8.0 and later use UTF-8.

    To avoid unwanted encodings, use only ASCII in the URL, or URL-encode the values first.

  3. POST method: the browser checks the page's contentType (“text/html;charset=utf-8”) and encodes the form with that charset.

    <%@ page language="java" contentType="text/html; charset=GB18030" pageEncoding="UTF-8"%>
    Here pageEncoding is the encoding in which the JSP file itself is saved.
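
For the GET case, the effect of URL-encoding and of a charset mismatch between client and server can be sketched with the JDK's URLEncoder and URLDecoder alone. The class and method names are hypothetical, and decoding with ISO-8859-1 only imitates what a server configured with a different URI charset would do; it is not Tomcat's actual code path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class QueryEncodingDemo {

    /** Percent-encodes a query value using the named charset. */
    public static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    /** Decodes a percent-encoded query value using the named charset. */
    public static String decode(String value, String charsetName) {
        try {
            return URLDecoder.decode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Client side: the value becomes pure ASCII before it goes into the URL
        String encoded = encode("\u4F60\u597D", "UTF-8"); // 你好
        System.out.println(encoded); // %E4%BD%A0%E5%A5%BD

        // Server side, same charset: the original text comes back
        System.out.println(decode(encoded, "UTF-8"));

        // Server side, different charset: six Latin-1 characters instead of two Chinese ones
        System.out.println(decode(encoded, "ISO-8859-1"));
    }
}
```

The encoded URL itself is unambiguous ASCII; the garbling only appears when the two sides disagree on the charset behind the percent escapes.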

Listing 3. Setting the content type for a POST request

protected void doPost(HttpServletRequest request, HttpServletResponse
response) throws ServletException, IOException {
    if(!ServletFileUpload.isMultipartContent(request)){
        throw new ServletException("Content type is not multipart/form-data");
    }
    response.setCharacterEncoding("UTF-8"); // set the response encoding
    response.setContentType("text/html;charset=UTF-8");
    PrintWriter out = response.getWriter();
    out.write("<html><head></head><body>");
    try { 
        List<FileItem> items = (List<FileItem>)
        uploader.parseRequest(request);
        …
}

A JSP page that submits the request with the POST method:

<%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>index</title>
<meta http-equiv="pragma" content="no-cache">
<meta http-equiv="cache-control" content="no-cache">
<meta http-equiv="expires" content="0">
</head>
<body>
<form action="FileUploadServlet" method="post" enctype="multipart/form-data">
选择上传文件:<input type="file" name="fileName">
<br>
<input type="submit" value="上传">
</form>
</body>
</html>

- Browser display: Chrome uses the contentType and charset from the JSP, while Firefox uses its text encoding setting.

- For JSP (HTML): the JSP file is saved using pageEncoding. If that is not specified, the charset is used; if there is no charset either, the default is ISO-8859-1. The charset is responsible for telling the browser how to decode the page.

- For dynamic content: the server calls HttpServletResponse.setContentType to set the Content-Type HTTP header.

Garbled file names when downloading

Reason: HTTP headers only support the ASCII character set, so other characters are encoded as 3F (?).

Solution: call URLEncoder.encode(filename, charset) first, then put the result in the header.

Listing 4. URL-encoding the file name in the response header

protected void doGet(HttpServletRequest request, HttpServletResponse
response) throws ServletException, IOException {
    String fileName = getDecodeParameter(request,"fileName");
    String userName = getDecodeParameter(request, "username");
    response.setHeader("Content-Disposition", "attachment; filename=\"" +
    URLEncoder.encode(fileName,"utf-8") + "\";userName=\"" +
    URLEncoder.encode(userName,"utf-8") + "\"");
}
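
The header value can also be built and inspected without a servlet container. The helper below is a sketch with hypothetical names, following the same URLEncoder approach as the listing. It adds one assumption of its own: URLEncoder turns a space into "+", which is replaced by "%20" here so that file names containing spaces do not come out with plus signs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DownloadHeaderDemo {

    /** Builds an ASCII-only Content-Disposition value for the given file name. */
    public static String attachmentHeader(String fileName) {
        try {
            // A literal '+' in the name is encoded as %2B, so every remaining '+' is a space
            String encoded = URLEncoder.encode(fileName, "UTF-8").replace("+", "%20");
            return "attachment; filename=\"" + encoded + "\"";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // 你好 1.txt: two Chinese characters, a space, then an ASCII suffix
        System.out.println(attachmentHeader("\u4F60\u597D 1.txt"));
    }
}
```

In a servlet the result would be passed to response.setHeader("Content-Disposition", ...), as in the listing above.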

Database operations

Garbled text in the database

Bridge: Unicode

Places where the encoding matters: the server database, the client system, and the client environment variables.

Creating the database with UTF-8 and using SQL NCHAR can solve multi-language issues.

