First, take a look at a simple HTML document:
<html>
<head>
<title>test</title>
</head>
<body>
<div style="height: 100px; border: 1px solid #ff0000; font-size: 24px; font-weight: bold;">Hello World!</div>
</body>
</html>
1. First, define a class to describe a node
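import java.util.List;
import java.util.Map;

public class Node {
    private String nodeName;                // tag name, e.g. "div"
    private int nodeType;                   // kind of node (element, text, ...)
    private Map<String, String> attributes; // attribute name -> value
    private List<Node> childNodes;          // children in document order
    private Node parent;                    // null for the root node
    // getter & setter
    // ...
}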
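To make the target concrete, here is a sketch that builds by hand the tree a parser should produce for the sample document. It uses a hypothetical stand-in for the Node class above (package-private fields and an `append` helper instead of getters/setters). Two assumptions are worth flagging: the type codes 1 (element) and 3 (text) are borrowed from the W3C DOM convention, and because the class has no field for text content, the sketch stores a text node's content in `nodeName`. Whitespace-only text between tags is ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal stand-in for the Node class above.
class Node {
    // Assumed type codes, borrowed from the W3C DOM convention.
    static final int ELEMENT_NODE = 1;
    static final int TEXT_NODE = 3;

    String nodeName; // tag name, or the text itself for a text node (assumption)
    int nodeType;
    Map<String, String> attributes = new LinkedHashMap<>();
    List<Node> childNodes = new ArrayList<>();
    Node parent;

    Node(String nodeName, int nodeType) {
        this.nodeName = nodeName;
        this.nodeType = nodeType;
    }

    // Links the child under this node in both directions and returns the child.
    Node append(Node child) {
        child.parent = this;
        childNodes.add(child);
        return child;
    }
}

public class NodeTreeDemo {
    // Builds the tree for the sample document by hand.
    static Node buildSample() {
        Node html = new Node("html", Node.ELEMENT_NODE);
        Node head = html.append(new Node("head", Node.ELEMENT_NODE));
        Node title = head.append(new Node("title", Node.ELEMENT_NODE));
        title.append(new Node("test", Node.TEXT_NODE));
        Node body = html.append(new Node("body", Node.ELEMENT_NODE));
        Node div = body.append(new Node("div", Node.ELEMENT_NODE));
        div.attributes.put("style",
            "height: 100px; border: 1px solid #ff0000; font-size: 24px; font-weight: bold;");
        div.append(new Node("Hello World!", Node.TEXT_NODE));
        return html;
    }

    // Prints one node per line, indented two spaces per level of depth.
    static void dump(Node node, int depth) {
        System.out.println("  ".repeat(depth) + node.nodeName);
        for (Node child : node.childNodes) {
            dump(child, depth + 1);
        }
    }

    public static void main(String[] args) {
        dump(buildSample(), 0);
    }
}
```

Running `main` prints `html`, then `head`, `title`, `test`, `body`, `div`, `Hello World!`, each indented by its depth. Keeping the `parent` link alongside `childNodes` lets a parser climb back up the tree when it meets a closing tag.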
Next we parse the input. Parsing HTML really comes down to processing a string, so to make that easier we first wrap the source string in an HtmlStream object.
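The article does not show HtmlStream itself, so the following is only a sketch of what such a wrapper might look like: a read cursor over the source string. Every method name here (`peek`, `read`, `skipWhitespace`, `readUntil`, `isEOF`, `getPosition`) is a hypothetical choice for illustration, not the author's API. `read` returns an `int` so that -1 can signal end of input, in the style of `java.io.Reader.read()`; the loop that follows in the article declares `char c`, so its real class may differ.

```java
// Hypothetical sketch of an HtmlStream: a read cursor over the source string.
public class HtmlStream {
    private final String source;
    private int pos = 0; // index of the next unread character

    public HtmlStream(String source) {
        this.source = source;
    }

    // True once every character has been consumed.
    public boolean isEOF() {
        return pos >= source.length();
    }

    // Returns the next character without consuming it, or -1 at end of input.
    public int peek() {
        return isEOF() ? -1 : source.charAt(pos);
    }

    // Consumes and returns the next character, or -1 at end of input.
    public int read() {
        return isEOF() ? -1 : source.charAt(pos++);
    }

    // Consumes characters for as long as they are whitespace.
    public void skipWhitespace() {
        while (!isEOF() && Character.isWhitespace(source.charAt(pos))) {
            pos++;
        }
    }

    // Consumes and returns everything up to (not including) the stop character.
    public String readUntil(char stop) {
        int start = pos;
        while (!isEOF() && source.charAt(pos) != stop) {
            pos++;
        }
        return source.substring(start, pos);
    }

    public int getPosition() {
        return pos;
    }

    public static void main(String[] args) {
        HtmlStream stream = new HtmlStream("  <title>test</title>");
        stream.skipWhitespace();                   // now positioned at '<'
        stream.read();                             // consume '<'
        System.out.println(stream.readUntil('>')); // prints "title"
    }
}
```

Keeping the position inside one object means every parsing step (skipping whitespace, reading a tag name, reading an attribute value) advances the same cursor, instead of passing an index around by hand.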
String source = IO.read(new File("test.html"), "UTF-8");
HtmlStream stream = new HtmlStream(source);
char c;
int i = 0;
// Skip whitespace at the start of the document
while((