Custom Parser (Part 1)

Latest recommended article published on 2024-07-20 17:51:29

preqel


Categories: Other, Android Mobile Development. Tags: data parsing, java

Article link: https://blog.csdn.net/laibowon/article/details/76522296


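For markup languages such as XML and HTML, we usually rely on the parsing tools that ship with Java. Could we write something like a pull parser or a DOM parser ourselves? A full implementation is a lot of work, but a much simpler parser is within reach.
Suppose the text we want to parse is: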

<xxml>
<span>
<button id="btn1" width="40" height="40"/>
<label id="label1" width="30" height="40"/> 
</xxml>

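This is only a simple example. The requirement is to parse out the three tags span, button, and label, collect the attributes of each tag, and print them to the console.
We will not use Java's built-in XML parsers here; instead we write a simple parser from scratch.
The approach is as follows:
Keep reading characters from the character stream in a loop until the end of the input.
A '<' marks the start of a tag.
A '/' marks the end of a tag.
Process and print the attributes inside the tag.
Attributes found inside a tag should be added to the current tag.
The key code is as follows: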

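import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class Parser {

    // Index of the next character to read from the input
    int pos = 0;
    // Reader.read() returns -1 at end of input; cast to char this becomes '\uFFFF'
    public static final char EOF = (char) -1;
    private Reader reader;

    public Parser(String text) {
        this.reader = new StringReader(text);
    }

    public void parse() throws Exception {
        while (true) {
            char ch = this.getChar();
            if (ch == EOF) {
                break;
            } else if (ch == '<') {
                // Start of a tag: nothing to do yet, the next character decides
                // whether this is an end tag ('/') or tag content
            } else if (ch == '/') {
                // '/' right after '<': step back over both characters and parse the end tag
                this.ungetChar(2);
                this.parseEndTag(pos);
                System.out.println(ch + " ---- ");
                // This simple version stops at the first end tag
                break;
            } else {
                // Tag content: step back one character and read up to the next '<'
                this.ungetChar(1);
                this.parseText(pos);
            }
        }
    }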
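
The parse loop relies on getChar and ungetChar, which this article does not show. As a rough idea of how such helpers could work, here is a minimal sketch. The class name CharSource, the position() accessor, and the reset-and-skip strategy are assumptions made for illustration, not the code from the original project. It relies on the fact that StringReader.reset() returns to the start of the string when no mark has been set.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the original article's code.
final class CharSource {
    // Reader.read() returns -1 at end of input; as a char this is '\uFFFF'
    static final char EOF = (char) -1;

    private final Reader reader;
    private int pos = 0; // index of the next character to read

    CharSource(String text) {
        this.reader = new StringReader(text);
    }

    /** Reads one character and advances pos; returns EOF once the input is exhausted. */
    char getChar() {
        try {
            int c = reader.read();
            pos++; // also advanced on EOF, so a following ungetChar(1) lands on the input length
            return (char) c;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Steps back n characters by rewinding to the start and skipping forward again. */
    void ungetChar(int n) {
        try {
            pos = Math.max(0, pos - n);
            reader.reset();   // no mark was set, so this returns to the start of the string
            reader.skip(pos); // skips at most the remaining characters
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int position() {
        return pos;
    }

    public static void main(String[] args) {
        CharSource source = new CharSource("<a>");
        source.getChar();     // reads '<'
        source.getChar();     // reads 'a'
        source.ungetChar(2);  // back to the beginning
        System.out.println(source.getChar()); // reads '<' again
    }
}
```

Rewinding and skipping costs time proportional to the position on every step back. That is acceptable for short inputs like the example above; a java.io.PushbackReader would be the usual choice for larger inputs.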

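As shown here, the StringReader class is used to read characters from the stream one at a time until the input is exhausted. A '/' marks the end of a tag, and at that point parseEndTag is called to finish the complete tag.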

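    public void parseEndTag(int start) throws Exception {
        // Collects the name of the end tag, e.g. "xxml" for "</xxml>"
        StringBuilder temp = new StringBuilder();
        while (true) {
            char ch = this.getChar();
            if (EOF == ch) {
                // Input ended before '>': stop instead of looping forever
                break;
            } else if ('<' == ch) {
                ch = this.getChar();
                if ('/' != ch) {
                    // throw new ParserException("illegal tag:" + pos);
                }
            } else if (' ' == ch || '/' == ch) {
                // Skip spaces and stray slashes
            } else if ('>' == ch) {
                break;
            } else {
                temp.append(ch);
            }
        }
    }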

When what we encounter is not an end tag, we collect the content of the current tag and print it.

    public void parseText(int start) throws Exception {
        while (true) {
            char ch = this.getChar();
            if (EOF == ch) {
                // End of input: hand over everything read since start
                this.ungetChar();
                this.makeString(start, pos);
                break;
            } else if ('<' == ch) {
                // Read on until the next '<', then hand over the characters
                // between start and that '<'; makeString prints them
                this.ungetChar();
                this.makeString(start, pos);
                break;
            }
        }
    }
    public char[] makeString(int start, int end) throws Exception {
        int length = end - start;
        char data[] = new char[length];
        // Rewind to the beginning of the input, then read the range [start, end)
        this.reader.reset();
        this.reader.skip(start);
        this.reader.read(data);
        String text = new String(data);

        if (text.trim().isEmpty()) {
            // Whitespace only: nothing to print (passing null to new String would throw)
            return null;
        }
        System.out.println("[" + start + "," + end + "]Element: " + text);
        printNode(text); // parse and print the attributes inside the tag
        return data;
    }

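    /*
     * Prints the tag name and the attributes inside one tag.
     * Can be adapted to the specific scenario.
     */
    private void printNode(String text) throws IOException {
        Reader reader = new StringReader(text);
        StringBuilder temp = new StringBuilder();
        boolean first = true;
        int c;
        while ((c = reader.read()) != -1) {
            if (c == ' ') {
                // A space ends the current token: the first token is the tag name,
                // every later one is an attribute
                if (temp.length() > 0) {
                    System.out.println(first ? temp.toString() : "Attribute: " + temp);
                    first = false;
                    temp.setLength(0);
                }
            } else {
                temp.append((char) c);
            }
        }
        // Print the last token, which is not followed by a space
        // (it still carries the closing ">" or "/>")
        String last = temp.toString().trim();
        if (!last.isEmpty()) {
            System.out.println(first ? last : "Attribute: " + last);
        }
    }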
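
The approach listed earlier says that attributes should be added to the current tag, while printNode only prints them. One way to actually keep them is a small sketch like the one below. The class name TagText, its fields, and the parse method are hypothetical and not part of the original code. It assumes attribute values are wrapped in double quotes and contain no whitespace, which holds for the example input.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original article's code.
final class TagText {
    final String name;
    // LinkedHashMap keeps the attributes in the order they appear in the tag
    final Map<String, String> attributes = new LinkedHashMap<>();

    private TagText(String name) {
        this.name = name;
    }

    /** Parses tag text such as: button id="btn1" width="40" height="40"/> */
    static TagText parse(String text) {
        String body = text.trim();
        // Drop the closing '>' and the '/' of a self-closing tag, if present
        if (body.endsWith(">")) {
            body = body.substring(0, body.length() - 1);
        }
        if (body.endsWith("/")) {
            body = body.substring(0, body.length() - 1);
        }
        String[] tokens = body.trim().split("\\s+");
        TagText tag = new TagText(tokens[0]); // the first token is the tag name
        for (int i = 1; i < tokens.length; i++) {
            int eq = tokens[i].indexOf('=');
            if (eq <= 0) {
                continue; // not a key=value pair; ignored in this sketch
            }
            String key = tokens[i].substring(0, eq);
            String value = tokens[i].substring(eq + 1);
            // Strip the surrounding double quotes
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            tag.attributes.put(key, value);
        }
        return tag;
    }

    public static void main(String[] args) {
        TagText tag = TagText.parse("button id=\"btn1\" width=\"40\" height=\"40\"/>");
        System.out.println(tag.name + " " + tag.attributes);
    }
}
```

Splitting on whitespace keeps the sketch short, but it would break on a value such as title="two words". Handling that case needs a character-by-character scan that tracks whether it is inside quotes.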

Now let's run the program; the printed output is as follows:
[Screenshot of the console output]

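In the output, Element marks a tag name, and the tag names xxml, span, and button are all printed.
For tags that have attributes, such as button, the attributes are collected and printed as well.
The complete code is available at:
https://github.com/preqel/Parser
This completes a simple XML parser, but some problems remain unsolved:
1. Parsing nested tags and the parent-child relationships between elements.
2. Parsing comment (annotation) markers.
These topics are left for a later post when time allows.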
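
For the first open problem, a common technique is to keep a stack of the tags that are currently open: an opening tag is pushed, an end tag pops, and the stack contents give the parents of the current tag. The sketch below illustrates this under simplifying assumptions and is not the author's planned solution. The class NestingSketch and the method paths are hypothetical, every tag is assumed to be properly closed, and attribute values are assumed not to contain '>'.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of stack-based nesting, not part of the original code.
final class NestingSketch {

    /** Returns the path of every opening or self-closing tag, e.g. "xxml/span/button". */
    static List<String> paths(String xml) {
        List<String> result = new ArrayList<>();
        Deque<String> open = new ArrayDeque<>(); // currently open tags, outermost first
        int i = 0;
        while ((i = xml.indexOf('<', i)) >= 0) {
            int end = xml.indexOf('>', i);
            if (end < 0) {
                break; // unterminated tag: stop
            }
            String inner = xml.substring(i + 1, end).trim();
            i = end + 1;
            if (inner.startsWith("/")) {
                // End tag: close the innermost open tag (the name is not checked here)
                open.pollLast();
                continue;
            }
            boolean selfClosing = inner.endsWith("/");
            if (selfClosing) {
                inner = inner.substring(0, inner.length() - 1).trim();
            }
            String name = inner.split("\\s+")[0];
            // The parents are exactly the tags that are still open
            String path = open.isEmpty() ? name : String.join("/", open) + "/" + name;
            result.add(path);
            if (!selfClosing) {
                open.addLast(name); // stays open until its end tag is seen
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<xxml><span><button id=\"btn1\"/></span><label id=\"label1\"/></xxml>";
        for (String path : paths(xml)) {
            System.out.println(path);
        }
    }
}
```

The example input at the top of this article leaves span unclosed, so with this sketch button and label would be reported as children of span. A stricter version would compare each end tag's name against the top of the stack and report mismatches.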