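There is not much material online about Jsoup, so I am writing this article as a brief record of what I have learned from using it, and to build up my own knowledge.
First, be clear about what Jsoup is: a Java library for parsing and manipulating HTML. Its entry-point class is org.jsoup.Jsoup; Document and Element live in the org.jsoup.nodes package, and Elements lives in org.jsoup.select. With that settled, on to usage.
1. Parse the HTML template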
Document document = Jsoup.parse(templateInfo);
2. Get the first tr element of the table (selected by tag name)
Element element = document.select("table").select("tr").get(0);
3. Get all td elements of that tr (selected by tag name)
Elements tds = element.select("td");
4. Get the text content of a td, where td is one element of tds (this works even if the td contains a nested span tag)
String text = td.text();
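5. Get elements by a specific attribute value (here the style attribute; the match is against the whole attribute string as written in the HTML, not against computed CSS)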
document.getElementsByAttributeValue("style","-ms-word-break: break-all;");
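The five steps above can be put together in a small sketch. The class name JsoupStepsSketch, the helper methods firstRowTexts and countByStyle, and the miniature HTML string are all made up for illustration; only the Jsoup calls themselves come from the steps above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class JsoupStepsSketch {

    // Hypothetical miniature template, far smaller than a real one.
    static final String HTML =
            "<table><tbody>"
          + "<tr>"
          + "<td style=\"-ms-word-break: break-all;\">City</td>"
          + "<td><span style=\"font-size: 12px;\">Target</span></td>"
          + "</tr>"
          + "<tr><td>CITY_GROUP</td><td>TARGET_NUM</td></tr>"
          + "</tbody></table>";

    // Steps 1-4: parse, take the first row, collect the text of each cell.
    static List<String> firstRowTexts(String html) {
        Document document = Jsoup.parse(html);
        Element firstRow = document.select("table").select("tr").get(0);
        Elements tds = firstRow.select("td");
        List<String> texts = new ArrayList<>();
        for (Element td : tds) {
            // text() also returns the text of nested tags such as <span>.
            texts.add(td.text());
        }
        return texts;
    }

    // Step 5: count elements whose style attribute equals the given string.
    static int countByStyle(String html, String style) {
        return Jsoup.parse(html).getElementsByAttributeValue("style", style).size();
    }

    public static void main(String[] args) {
        System.out.println(firstRowTexts(HTML));
        System.out.println(countByStyle(HTML, "-ms-word-break: break-all;"));
    }
}
```

In this sketch the second cell's text comes from inside a span, and only the first cell carries the exact style string, so the span with a different style value is not counted.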
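Sample code (place the method inside a class and import org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element and org.jsoup.select.Elements):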
public static void main(String[] args) {
String templateInfo = "<html>\n" +
" <head></head>\n" +
" <body>\n" +
" <table width=\"1414\" class=\"handsontable\" version=\"2\">\n" +
" <tbody>\n" +
" <tr>\n" +
" <td width=\"156\" valign=\"top\" style=\"-ms-word-break: break-all;\">地市</td>\n" +
" <td width=\"156\" valign=\"top\" style=\"-ms-word-break: break-all;\">区县</td>\n" +
" <td width=\"156\" valign=\"top\" style=\"-ms-word-break: break-all;\">渠道经理</td>\n" +
" <td width=\"156\" valign=\"top\" rowspan=\"1\" colspan=\"1\">目标周期</td>\n" +
" <td valign=\"top\" rowspan=\"1\" colspan=\"1\" width=\"156\">月度目标值</td>\n" +
" <td valign=\"top\" rowspan=\"1\" colspan=\"1\" width=\"156\"><span style=\"font-family: Arial, Helvetica, sans-serif; font-size: 12px; line-height: 21px; background-color: rgb(255, 255, 255);\">目标客户数</span></td>\n" +
" <td width=\"156\" valign=\"top\" rowspan=\"1\" colspan=\"1\">月度综合完成率</td>\n" +
" <td width=\"156\" valign=\"top\" style=\"-ms-word-break: break-all;\">季度综合完成率</td>\n" +
" <td width=\"156\" valign=\"top\" style=\"-ms-word-break: break-all;\">综合排名</td>\n" +
" </tr>\n" +
" <tr>\n" +
" <td width=\"156\" valign=\"top\" style=\"-ms-word-break: break-all;\" smart_fill=\"DB_COLUMN\">CITY_GROUP</td>\n" +
" <td width=\"156\" valign=\"top\" smart_fill=\"DB_COLUMN\">COUNTY_GROUP</td>\n" +
" <td width=\"156\" valign=\"top\" smart_fill=\"DB_COLUMN\">MANAGE_NAME</td>\n" +
" <td width=\"156\" valign=\"top\" rowspan=\"1\" colspan=\"1\" smart_fill=\"DB_COLUMN\">PERIOD_TYPE</td>\n" +
" <td valign=\"top\" rowspan=\"1\" colspan=\"1\" width=\"156\" smart_fill=\"DB_COLUMN\">MONTH_TARGET_NUM</td>\n" +
" <td valign=\"top\" rowspan=\"1\" colspan=\"1\" width=\"156\" smart_fill=\"DB_COLUMN\">TARGET_NUM</td>\n" +
" <td width=\"156\" valign=\"top\" rowspan=\"1\" colspan=\"1\" smart_fill=\"DB_COLUMN\">MONTH_FINISH_RATE</td>\n" +
" <td width=\"156\" valign=\"top\" smart_fill=\"DB_COLUMN\">FINISH_RATE</td>\n" +
" <td width=\"156\" valign=\"top\" smart_fill=\"DB_COLUMN\">LAST_RANK</td>\n" +
" </tr>\n" +
" </tbody>\n" +
" </table>\n" +
" <p><br /></p>\n" +
" </body>\n" +
"</html>";
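    // Parse the template string into a DOM tree.
    Document document = Jsoup.parse(templateInfo);
    // Take the first row of the table (the header row).
    Element element = document.select("table").select("tr").get(0);
    // Collect every cell of that row.
    Elements tds = element.select("td");
    for (Element td : tds) {
        // Skip cells that have no child nodes at all (completely empty cells).
        if (td.childNodeSize() > 0) {
            // text() also covers text wrapped in nested tags such as <span>.
            System.out.println(td.text());
        }
    }
}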
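As a possible follow-up to the sample above, the header labels in the first row can be paired with the column names in the second row. This is only a sketch: the class HeaderColumnSketch, the method headerToColumn and the shortened HTML are hypothetical, and pairing cells by position assumes that both rows have the same number of cells and that no cell spans several columns.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderColumnSketch {

    // Hypothetical shortened template with the same two-row layout.
    static final String HTML =
            "<table><tbody>"
          + "<tr><td>City</td><td><span>Target</span></td></tr>"
          + "<tr>"
          + "<td smart_fill=\"DB_COLUMN\">CITY_GROUP</td>"
          + "<td smart_fill=\"DB_COLUMN\">TARGET_NUM</td>"
          + "</tr>"
          + "</tbody></table>";

    // Pair the i-th header label with the i-th DB column name.
    static Map<String, String> headerToColumn(String html) {
        Document document = Jsoup.parse(html);
        Elements rows = document.select("table tr");
        Elements headers = rows.get(0).select("td");
        // Attribute selector: only cells marked with smart_fill="DB_COLUMN".
        Elements columns = rows.get(1).select("td[smart_fill=DB_COLUMN]");

        // LinkedHashMap keeps the left-to-right column order.
        Map<String, String> result = new LinkedHashMap<>();
        int n = Math.min(headers.size(), columns.size());
        for (int i = 0; i < n; i++) {
            result.put(headers.get(i).text(), columns.get(i).text());
        }
        return result;
    }

    public static void main(String[] args) {
        headerToColumn(HTML).forEach((label, column) ->
                System.out.println(label + " -> " + column));
    }
}
```

If a real template uses rowspan or colspan values greater than 1, positional pairing like this would need to be adjusted.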