Java Crawler (3): Sending a Request from the Backend to Fetch a Page and Parse Its Data

I. Sending a request to fetch the page content

1. We request the ASCII table site and process the response to extract the contents of its tables.
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;


@RestController
public class SendRequest {
    
    @GetMapping("/todo")
    public String  getsource() {
        try {
            // Open a connection to the ASCII table site; we fetch its content and parse it later
            URL url = new URL("http://ascii.911cha.com/");
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setDoInput(true);
            connection.setRequestMethod("GET");
            // Send a browser-like User-Agent header with the request
            connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");

            InputStreamReader read = new InputStreamReader(connection.getInputStream(), "utf-8");
            // Wrap the character input stream in a buffer
            BufferedReader br = new BufferedReader(read);
            // Read the response; readLine() is called once per iteration so that no line is skipped
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
            String data = sb.toString();
            System.out.println(data);
            // Release resources
            br.close();
            read.close();
            connection.disconnect();
            return data;
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
}

The data we get back looks like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><head><meta http-equiv="content-language" content="zh-CN" /><link rel="dns-prefetch" href="//i.911cha.com" /><meta http-equiv="x-dns-prefetch-control" content="on" /><link rel="icon" href="http://www.911cha.com/favicon.ico" type="image/x-icon" /><meta http-equiv="X-UA-Compatible" content="IE=7" /></head><body><div id="top"><div id="menu"><ul><li><a href="https://dream.911cha.com/" target="_blank">周公解梦</a></li><li><a href="https://nongli.911cha.com/" target="_blank">老黄历</a></li><li><a href="https://tianqi.911cha.com/" target="_blank">天气预报</a></li><li><a href="https://youbian.911cha.com/" target="_blank">邮编</a></li><li><a href="https://huoche.911cha.com/" target="_blank">列车时刻表</a></li><li><a href="https://fangjia.911cha.com/" target="_blank">放假安排</a></li></ul></div><div id="ulink"><a href="http://www.baidu.com/s?wd=911%B2%E9%D1%AF" target="_blank" rel="nofollow">911查询</a> <a href="http://www.911cha.com/" onclick="this.style.behavior='url(#default#homepage)';this.setHomePage('http://www.911cha.com/');">回首页</a> <a href="http://www.911cha.com/shortcut.php?t=ascii" rel="nofollow">保存到桌面</a></div></div><div id="mainbox">		<div class="panel">			<div class="mcon center"><p><form name="animalForm" class="f14">输入待查字符 <input name="year" id="year" type="text" size="18" delay="0" value="" class="inp3" onmouseover="this.className='inp3_2';" onblur="this.className='inp3'" onkeyup="document.getElementById('k').innerHTML = 'ASCII码值 '+event.keyCode" /> <span id="k" style="background-color:yellow"> </span></form></p></div><style></style><p>  ASCII第一次以规范标准的型态发表是在1967年,最后一次更新则是在1986年,至今为止共定义了128个字符,其中33个字符无法显示(这是以现今操作系统为依归,但在DOS模式下可显示出一些诸如笑脸、扑克牌花式等8-bit符号),且这33个字符多数都已是陈废的控制字符,控制字符的用途主要是用来操控已经处理过的文字,在33个字符之外的是95个可显示的字符,包含用键盘敲下空白键所产生的空白字符也算1个可显示字符(显示为空白)。</p><h3 class="f14 p8 center 
pink">ASCII控制字符</h3><thead><th>二进制</th><th>十六进制</th><th>可以显示的表示法</th></tr><tbody><td>0000&#160;0000</td><td align="center">00</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0001</td><td align="center">01</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0010</td><td align="center">02</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0011</td><td align="center">03</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0100</td><td align="center">04</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0101</td><td align="center">05</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0110</td><td align="center">06</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0111</td><td align="center">07</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1000</td><td align="center">08</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1001</td><td align="center">09</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1010</td><td align="center">0A</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1011</td><td align="center">0B</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1100</td><td align="center">0C</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1101</td><td align="center">0D</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1110</td><td align="center">0E</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1111</td><td align="center">0F</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0000</td><td align="center">10</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0001</td><td align="center">11</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0010</td><td align="center">12</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0011</td><td align="center">13</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0100</td><td 
align="center">14</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0101</td><td align="center">15</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0110</td><td align="center">16</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0111</td><td align="center">17</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1000</td><td align="center">18</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1001</td><td align="center">19</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1010</td><td align="center">1A</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1011</td><td align="center">1B</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1100</td><td align="center">1C</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1101</td><td align="center">1D</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1110</td><td align="center">1E</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1111</td><td align="center">1F</td><td align="center" class="xianshi"></td></tr><td>0111&#160;1111</td><td align="center">7F</td><td align="center" class="xianshi"></td></tr></table><table width="100%" border="0" cellspacing="0" cellpadding="0"><td><thead><th>二进制</th><th>十六进制</th></tr><tbody><td>0010&#160;0000</td><td align="center">20</td></tr><td>0010&#160;0001</td><td align="center">21</td></tr><td>0010&#160;0010</td><td align="center">22</td></tr><td>0010&#160;0011</td><td align="center">23</td></tr><td>0010&#160;0100</td><td align="center">24</td></tr><td>0010&#160;0101</td><td align="center">25</td></tr><td>0010&#160;0110</td><td align="center">26</td></tr><td>0010&#160;0111</td><td align="center">27</td></tr><td>0010&#160;1000</td><td align="center">28</td></tr><td>0010&#160;1001</td><td align="center">29</td></tr><td>0010&#160;1010</td><td align="center">2A</td></tr><td>0010&#160;1011</td><td align="center">2B</td></tr><td>0010&#160;1100</td><td 
align="center">2C</td></tr><td>0010&#160;1101</td><td align="center">2D</td></tr><td>0010&#160;1110</td><td align="center">2E</td></tr><td>0010&#160;1111</td><td align="center">2F</td></tr><td>0011&#160;0000</td><td align="center">30</td></tr><td>0011&#160;0001</td><td align="center">31</td></tr><td>0011&#160;0010</td><td align="center">32</td></tr><td>0011&#160;0011</td><td align="center">33</td></tr><td>0011&#160;0100</td><td align="center">34</td></tr><td>0011&#160;0101</td><td align="center">35</td></tr><td>0011&#160;0110</td><td align="center">36</td></tr><td>0011&#160;0111</td><td align="center">37</td></tr><td>0011&#160;1000</td><td align="center">38</td></tr><td>0011&#160;1001</td><td align="center">39</td></tr><td>0011&#160;1010</td><td align="center">3A</td></tr><td>0011&#160;1011</td><td align="center">3B</td></tr><td>0011&#160;1100</td><td align="center">3C</td></tr><td>0011&#160;1101</td><td align="center">3D</td></tr><td>0011&#160;1110</td><td align="center">3E</td></tr><td>0011&#160;1111</td><td align="center">3F</td></tr></td><td><thead><th>二进制</th><th>十六进制</th></tr><tbody><td>0100&#160;0000</td><td align="center">40</td></tr><td>0100&#160;0001</td><td align="center">41</td></tr><td>0100&#160;0010</td><td align="center">42</td></tr><td>0100&#160;0011</td><td align="center">43</td></tr><td>0100&#160;0100</td><td align="center">44</td></tr><td>0100&#160;0101</td><td align="center">45</td></tr><td>0100&#160;0110</td><td align="center">46</td></tr><td>0100&#160;0111</td><td align="center">47</td></tr><td>0100&#160;1000</td><td align="center">48</td></tr><td>0100&#160;1001</td><td align="center">49</td></tr><td>0100&#160;1010</td><td align="center">4A</td></tr><td>0100&#160;1011</td><td align="center">4B</td></tr><td>0100&#160;1100</td><td align="center">4C</td></tr><td>0100&#160;1101</td><td align="center">4D</td></tr><td>0100&#160;1110</td><td align="center">4E</td></tr><td>0100&#160;1111</td><td align="center">4F</td></tr><td>0101&#160;0000</td><td 
align="center">50</td></tr><td>0101&#160;0001</td><td align="center">51</td></tr><td>0101&#160;0010</td><td align="center">52</td></tr><td>0101&#160;0011</td><td align="center">53</td></tr><td>0101&#160;0100</td><td align="center">54</td></tr><td>0101&#160;0101</td><td align="center">55</td></tr><td>0101&#160;0110</td><td align="center">56</td></tr><td>0101&#160;0111</td><td align="center">57</td></tr><td>0101&#160;1000</td><td align="center">58</td></tr><td>0101&#160;1001</td><td align="center">59</td></tr><td>0101&#160;1010</td><td align="center">5A</td></tr><td>0101&#160;1011</td><td align="center">5B</td></tr><td>0101&#160;1100</td><td align="center">5C</td></tr><td>0101&#160;1101</td><td align="center">5D</td></tr><td>0101&#160;1110</td><td align="center">5E</td></tr><td>0101&#160;1111</td><td align="center">5F</td></tr></table><td>&#160;</td><table width="100%" border="0" cellpadding="0" cellspacing="0" class="bx"><tr valign="bottom"><th>十进制</th><th>图形</th></thead><tr><td align="center">96</td><td align="center">`</td><tr><td align="center">97</td><td align="center">a</td><tr><td align="center">98</td><td align="center">b</td><tr><td align="center">99</td><td align="center">c</td><tr><td align="center">100</td><td align="center">d</td><tr><td align="center">101</td><td align="center">e</td><tr><td align="center">102</td><td align="center">f</td><tr><td align="center">103</td><td align="center">g</td><tr><td align="center">104</td><td align="center">h</td><tr><td align="center">105</td><td align="center">i</td><tr><td align="center">106</td><td align="center">j</td><tr><td align="center">107</td><td align="center">k</td><tr><td align="center">108</td><td align="center">l</td><tr><td align="center">109</td><td align="center">m</td><tr><td align="center">110</td><td align="center">n</td><tr><td align="center">111</td><td align="center">o</td><tr><td align="center">112</td><td align="center">p</td><tr><td align="center">113</td><td align="center">q</td><tr><td 
align="center">114</td><td align="center">r</td><tr><td align="center">115</td><td align="center">s</td><tr><td align="center">116</td><td align="center">t</td><tr><td align="center">117</td><td align="center">u</td><tr><td align="center">118</td><td align="center">v</td><tr><td align="center">119</td><td align="center">w</td><tr><td align="center">120</td><td align="center">x</td><tr><td align="center">121</td><td align="center">y</td><tr><td align="center">122</td><td align="center">z</td><tr><td align="center">123</td><td align="center">{</td><tr><td align="center">124</td><td align="center">|</td><tr><td align="center">125</td><td align="center">}</td><tr><td align="center">126</td><td align="center">~</td></tbody></td></table>			</div>	</div>			<div class="mtitle b">相关链接</div>		</div>	<div class="mtitle"><a href="https://www.911cha.com/" class="black noline">常备实用查询</a></div></div><div class="adbox"><script type="text/javascript">gogid="5394567419";gogw="250";gogh="250";</script><script type="text/javascript" src="http://i.911cha.com/gog.js"></script></div><div class="adbox"><script type="text/javascript">gogid="5394567419";gogw="250";gogh="250";</script><script type="text/javascript" src="http://i.911cha.com/gog.js"></script></div></div></div><script language="javascript">for(i=0;i<=8;i++){document.getElementById('tlink'+i).className = "pink";}else{document.getElementById('flink'+i).style.display = "none";}</script>	<div class="panel">		<div class="mcon l200"><span class="green">热门查询:</span> <a href="https://nongli.911cha.com/" target="_blank">老黄历</a> <a href="https://dream.911cha.com/" target="_blank">周公解梦</a> <a href="https://dream.911cha.com/" target="_blank">周公解梦大全查询</a> <a href="http://fangjia.911cha.com/" target="_blank">2019放假安排</a> <a href="https://shouji.911cha.com/" target="_blank">手机号码测吉凶</a> <a href="http://xing.911cha.com/" target="_blank">百家姓</a> | <a href="https://nongli.911cha.com/" target="_blank">黄道吉日</a> <a 
href="https://nongli.911cha.com/2019-5-27.html" target="_blank">2019年5月27日黄历</a> <a href="https://nongli.911cha.com/2019-5-28.html" target="_blank">2019年5月28日黄历</a> <a href="https://nongli.911cha.com/2019-5-29.html" target="_blank">2019年5月29日黄历</a> <a href="https://nongli.911cha.com/2019-5-30.html" target="_blank">2019年5月30日黄历</a> <a href="https://nongli.911cha.com/" target="_blank">2019年6月黄道吉日</a> | <a href="http://caipu.911cha.com/leibie_129.html" target="_blank">家常菜</a> <a href="http://caipu.911cha.com/leibie_101.html" target="_blank">鲁菜</a> <a href="http://caipu.911cha.com/leibie_102.html" target="_blank">川菜</a> <a href="http://caipu.911cha.com/leibie_106.html" target="_blank">苏菜</a> <a href="http://caipu.911cha.com/leibie_100.html" target="_blank">粤菜</a> <a href="http://caipu.911cha.com/leibie_104.html" target="_blank">闽菜</a> <a href="http://caipu.911cha.com/leibie_105.html" target="_blank">浙菜</a> <a href="http://caipu.911cha.com/leibie_103.html" target="_blank">湘菜</a> <a href="http://caipu.911cha.com/leibie_107.html" target="_blank">徽菜</a> <a href="http://caipu.911cha.com/leibie_119.html" target="_blank">沪菜</a> <a href="http://caipu.911cha.com/leibie_108.html" target="_blank">京菜</a> <a href="http://caipu.911cha.com/leibie_109.html" target="_blank">渝菜</a> | <a href="https://tianqi.911cha.com/" target="_blank" target="_blank">天气预报</a> <a href="https://tianqi.911cha.com/beijing/" target="_blank">北京天气</a> <a href="https://tianqi.911cha.com/shanghai/" target="_blank">上海天气</a> <a href="https://tianqi.911cha.com/xianggang/" target="_blank">香港天气</a> <a href="https://tianqi.911cha.com/guangzhou/" target="_blank">广州天气</a> <a href="https://tianqi.911cha.com/shenzhen/" target="_blank">深圳天气</a> <a href="https://tianqi.911cha.com/taibei/" target="_blank">台北天气</a> <a href="https://tianqi.911cha.com/aomen/" target="_blank">澳门天气</a> <a href="https://tianqi.911cha.com/tianjin/" target="_blank">天津天气</a> <a href="https://tianqi.911cha.com/shenyang/" target="_blank">沈阳天气</a> <a 
href="https://tianqi.911cha.com/dalian/" target="_blank">大连天气</a> <a href="https://tianqi.911cha.com/nanjing/" target="_blank">南京天气</a> <a href="https://tianqi.911cha.com/suzhou/" target="_blank">苏州天气</a> <a href="https://tianqi.911cha.com/hangzhou/" target="_blank">杭州天气</a> <a href="https://tianqi.911cha.com/wuhan/" target="_blank">武汉天气</a> <a href="https://tianqi.911cha.com/chongqing/" target="_blank">重庆天气</a> <a href="https://tianqi.911cha.com/chengdu/" target="_blank">成都天气</a> <a href="https://tianqi.911cha.com/wuxi/" target="_blank">无锡天气</a> <a href="https://tianqi.911cha.com/ningbo/" target="_blank">宁波天气</a> <a href="https://tianqi.911cha.com/hefei/" target="_blank">合肥天气</a> <a href="https://tianqi.911cha.com/xiamen/" target="_blank">厦门天气</a> | <a href="https://dream.911cha.com/" target="_blank">周公解梦大全</a> <a href="https://nongli.911cha.com/" target="_blank">老黄历</a> <a href="https://tianqi.911cha.com/" target="_blank">天气预报查询</a> <a href="https://huoche.911cha.com/" target="_blank">火车时刻表</a> <a href="https://shouji.911cha.com/" target="_blank">手机号码归属地</a> <a href="http://caipu.911cha.com/" target="_blank">家常菜谱大全</a> <a href="https://huilv.911cha.com/" target="_blank">货币汇率查询</a> <a href="https://youbian.911cha.com/" target="_blank">邮政编码查询</a> <a href="" target="_blank"></a> <a href="http://wannianli.911cha.com/" target="_blank">万年历</a> <a href="https://fangjia.911cha.com/" target="_blank">2019年放假安排</a> <a href="https://nannv.911cha.com/" target="_blank">生男生女预测表</a> <a href="https://jx.911cha.com/" target="_blank">QQ号码吉凶</a> <a href="https://anquanqi.911cha.com/" target="_blank">安全期计算器</a> <a href="https://guanyin.911cha.com/" target="_blank">观音灵签</a> <span class="green">日常生活:</span> <a href="https://shouji.911cha.com/" target="_blank">手机号码归属地</a> <a href="https://youbian.911cha.com/" target="_blank">邮政编码查询</a> <a href="https://jigou.911cha.com/" target="_blank">机构邮政编码查询</a> <a href="https://huilv.911cha.com/" target="_blank">货币汇率查询</a> <a 
href="https://tianqi.911cha.com/" target="_blank">天气预报查询</a> <a href="http://caipu.911cha.com/" target="_blank">家常菜谱大全</a> <a href="https://pm25.911cha.com/" target="_blank">PM2.5查询</a> <a href="https://tel.911cha.com/" target="_blank">常用电话号码</a> <a href="https://kuaidi.911cha.com/" target="_blank">快递查询</a> <a href="https://quhao.911cha.com/" target="_blank">区号查询</a> <a href="https://daxie.911cha.com/" target="_blank">数字大写转换</a> <a href="https://fangjia.911cha.com/" target="_blank">2019年放假安排</a> <a href="https://taiwanpc.911cha.com/" target="_blank">台湾邮编查询</a> <a href="https://chebiao.911cha.com/" target="_blank">汽车车标大全</a> <a href="https://daxue.911cha.com/" target="_blank">大学查询</a> <a href="https://lilv.911cha.com/" target="_blank">人民币存款利率表</a> <a href="https://flag.911cha.com/" target="_blank">升降旗时间</a> <a href="http://country.911cha.com/" target="_blank">国家地区查询</a> <a href="http://npo.911cha.com/" target="_blank">全国社会性组织</a> <span class="gray">(共19个)</span> <span class="green">站长工具:</span> <a href="https://ip.911cha.com/" target="_blank">IP地址查询</a> <a href="http://xiazaidizhi.911cha.com/" target="_blank">下载地址加解密工具</a> <a href="http://erweima.911cha.com/" target="_blank">二维码生成器</a> <a href="http://process.911cha.com/" target="_blank">进程查询</a> <a href="http://mima.911cha.com/" target="_blank">密码强度检测</a> <a href="http://ascii.911cha.com/" target="_blank">ASCII码对照表</a> <a href="https://shijianchuo.911cha.com/" target="_blank">UNIX时间戳</a> <span class="gray">(共7个)</span> <span class="green">交通出行:</span> <a href="https://huoche.911cha.com/" target="_blank">火车时刻表</a> <a href="https://xianxing.911cha.com/" target="_blank">北京车牌尾号限行查询</a> <a href="http://lukuang.911cha.com/" target="_blank">实时路况查询</a> <a href="http://ditie.911cha.com/" target="_blank">地铁线路图</a> <a href="https://airportcode.911cha.com/" target="_blank">机场三字码查询</a> <a href="http://weizhang.911cha.com/" target="_blank">交通违章查询</a> <a href="https://chepai.911cha.com/" target="_blank">车牌号查询</a> <a 
href="https://ditu.911cha.com/" target="_blank">中国电子地图</a> <a href="https://shicha.911cha.com/" target="_blank">世界时差查询</a> <span class="gray">(共9个)</span> <span class="green">休闲娱乐:</span> <a href="https://caitu.911cha.com/" target="_blank">疯狂猜图答案</a> <a href="https://miyu.911cha.com/" target="_blank">中华谜语大全</a> <a href="http://naojin.911cha.com/" target="_blank">脑筋急转弯</a> <a href="https://raokouling.911cha.com/" target="_blank">绕口令大全</a> <a href="https://jx.911cha.com/" target="_blank">QQ号码吉凶</a> <a href="http://nianling.911cha.com/" target="_blank">外星年龄</a> <a href="http://tizhong.911cha.com/" target="_blank">外星体重</a> <a href="https://guwen.911cha.com/" target="_blank">竖排古文</a> <span class="gray">(共8个)</span> <span class="green">民俗文化:</span> <a href="https://dream.911cha.com/" target="_blank">周公解梦大全</a> <a href="https://nongli.911cha.com/" target="_blank">老黄历</a> <a href="http://xing.911cha.com/" target="_blank">百家姓大全</a> <a href="http://today.911cha.com/" target="_blank">历史上的今天</a> <a href="https://xiehouyu.911cha.com/" target="_blank">歇后语大全</a> <a href="https://shengxiao.911cha.com/" target="_blank">十二生肖</a> <a href="http://wannianli.911cha.com/" target="_blank">万年历</a> <a href="https://jieqi.911cha.com/" target="_blank">二十四节气表</a> <a href="http://dimu.911cha.com/" target="_blank">地母经</a> <a href="http://mingyan.911cha.com/" target="_blank">名人名言名句大全</a> <a href="http://yanyu.911cha.com/" target="_blank">民间谚语</a> <a href="https://birth.911cha.com/" target="_blank">解密生日</a> <a href="http://foxue.911cha.com/" target="_blank">佛学大辞典</a> <span class="gray">(共13个)</span> <span class="green">学习应用:</span> <a href="https://zidian.911cha.com/" target="_blank">新华字典</a> <a href="https://cidian.911cha.com/" target="_blank">汉语词典</a> <a href="https://chengyu.911cha.com/" target="_blank">成语大全</a> <a href="https://shici.911cha.com/" target="_blank">诗词大全</a> <a href="https://fanyi.911cha.com/" target="_blank">在线翻译</a> <a href="https://danci.911cha.com/" target="_blank">英语单词大全</a> 
<a href="https://yingwenming.911cha.com/" target="_blank">英文名</a> <a href="https://zhuanye.911cha.com/" target="_blank">专业英汉汉英词典</a> <a href="https://baike.911cha.com/" target="_blank">百科全书</a> <a href="https://suoxie.911cha.com/" target="_blank">英文缩写大全</a> <a href="http://wubi.911cha.com/" target="_blank">五笔字根表</a> <a href="https://bihua.911cha.com/" target="_blank">笔画数查询</a> <a href="https://bushou.911cha.com/" target="_blank">汉字部首查询</a> <a href="https://pinyin.911cha.com/" target="_blank">汉字拼音查询</a> <a href="http://quwei.911cha.com/" target="_blank">区位码查询</a> <a href="http://jianfan.911cha.com/" target="_blank">汉字简体繁体转换</a> <a href="https://zhengma.911cha.com/" target="_blank">郑码编码查询</a> <a href="https://cangjie.911cha.com/" target="_blank">仓颉编码查询</a> <a href="https://sijiao.911cha.com/" target="_blank">四角号码在线查询</a> <a href="https://dianma.911cha.com/" target="_blank">中文电码查询</a> <a href="http://bianma.911cha.com/" target="_blank">在线编码解码</a> <a href="https://pi.911cha.com/" target="_blank">百万圆周率</a> <a href="http://morsecode.911cha.com/" target="_blank">摩尔斯电码</a> <a href="https://jisuanqi.911cha.com/" target="_blank">科学计算器</a> <a href="http://shurufa.911cha.com/" target="_blank">在线输入法</a> <span class="gray">(共25个)</span> <span class="green">身体健康:</span> <a href="https://anquanqi.911cha.com/" target="_blank">安全期计算器</a> <a href="http://yaopin.911cha.com/" target="_blank">药品查询</a> <a href="http://greenfood.911cha.com/" target="_blank">绿色食品</a> <a href="https://pianfang.911cha.com/" target="_blank">民间偏方大全</a> <a href="https://mingfang.911cha.com/" target="_blank">中草药名方大全</a> <a href="https://yanfang.911cha.com/" target="_blank">中草药民间验方</a> <a href="https://jiufang.911cha.com/" target="_blank">酒方大全</a> <a href="https://yingyang.911cha.com/" target="_blank">食物营养成分查询</a> <a href="https://zhongcaoyao.911cha.com/" target="_blank">中草药大全</a> <a href="https://bencao.911cha.com/" target="_blank">中华本草</a> <a href="https://zhongyi.911cha.com/" target="_blank">中医名词辞典</a> <a 
href="https://zhoupu.911cha.com/" target="_blank">粥谱大全</a> <span class="gray">(共12个)</span> <span class="green">占卜求签:</span> <a href="https://xingxiu.911cha.com/" target="_blank">二十八星宿算命</a> <a href="https://jinqiangua.911cha.com/" target="_blank">六十四卦金钱课</a> <a href="https://guanyin.911cha.com/" target="_blank">观音灵签</a> <a href="https://huangdaxian.911cha.com/" target="_blank">黄大仙灵签</a> <a href="https://zhuge.911cha.com/" target="_blank">诸葛神算</a> <a href="https://mazu.911cha.com/" target="_blank">妈祖天后灵签</a> <a href="https://guandi.911cha.com/" target="_blank">关帝灵签</a> <a href="https://lvzu.911cha.com/" target="_blank">吕祖灵签</a> <a href="https://chegong.911cha.com/" target="_blank">车公灵签</a> <a href="https://wanggong.911cha.com/" target="_blank">王公祖仔灵签</a> <a href="https://wenwang.911cha.com/" target="_blank">文王神卦</a> <a href="https://lingqijing.911cha.com/" target="_blank">灵棋经</a> <a href="https://chenggu.911cha.com/" target="_blank">称骨算命</a> <a href="https://yuce.911cha.com/" target="_blank">预测吉凶</a> <a href="https://zhiwen.911cha.com/" target="_blank">指纹运势查询</a> <a href="https://nannv.911cha.com/" target="_blank">生男生女预测表</a> <a href="https://yuanfen.911cha.com/" target="_blank">姓名缘分测试</a> <span class="gray">(共17个)</span><div class="cboth"></div></div></div>var _hmt = _hmt || [];  var hm = document.createElement("script");  var s = document.getElementsByTagName("script")[0]; })();</div><div class="qrcode">911查询官方微信<br /><a href="#top"><img src="http://ii.911cha.com/blue/weixin.gif" width="99" height="99" /></a><br />关注 ww911cha</div></html>

[Screenshot: the web page content]
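The reading step above can be isolated into a small helper. The following is only a minimal sketch, not part of the original controller: the class name `ReadAllDemo`, the method `readAll`, and the sample HTML string are assumptions made for illustration, and a `StringReader` stands in for the network stream so the snippet runs without a connection. It shows the one-`readLine()`-per-iteration loop together with try-with-resources, which closes the reader even when an exception is thrown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadAllDemo {

    // Hypothetical helper: concatenates every line of the reader into one string.
    // Line breaks are dropped, just as in the controller above.
    static String readAll(Reader source) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the BufferedReader (and the wrapped reader) automatically
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // readLine() is called exactly once per iteration, so no line is lost
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
        } catch (IOException e) {
            // Rethrow as an unchecked exception to keep the sketch short
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample input invented for this example; it stands in for connection.getInputStream()
        String html = "<table>\n<tbody>\n<tr><td>41</td></tr>\n</tbody>\n</table>";
        System.out.println(readAll(new StringReader(html)));
    }
}
```

In the controller, the same helper could presumably be fed `new InputStreamReader(connection.getInputStream(), "utf-8")` instead of the `StringReader`.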

II. Parsing the data

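Using the element picker in the top-left corner of the browser's F12 developer tools, we roughly located the element that holds the table, so we can now parse the table's contents with regular expressions.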

The steps are as follows:
1. Extract this part of the markup
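import java.util.regex.Matcher;
import java.util.regex.Pattern;

// data is the HTML string fetched in part I; the named group "dates"
// captures everything between <tbody> and the first </tbody> that follows
Pattern pattern = Pattern.compile("<tbody>(?<dates>.*?)</tbody>");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
    String dates = matcher.group("dates");
    System.out.println(dates);
}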

After parsing with this code we get the following core data:

<td>0000&#160;0000</td><td align="center">00</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0001</td><td align="center">01</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0010</td><td align="center">02</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0011</td><td align="center">03</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0100</td><td align="center">04</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0101</td><td align="center">05</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0110</td><td align="center">06</td><td align="center" class="xianshi"></td></tr><td>0000&#160;0111</td><td align="center">07</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1000</td><td align="center">08</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1001</td><td align="center">09</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1010</td><td align="center">0A</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1011</td><td align="center">0B</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1100</td><td align="center">0C</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1101</td><td align="center">0D</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1110</td><td align="center">0E</td><td align="center" class="xianshi"></td></tr><td>0000&#160;1111</td><td align="center">0F</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0000</td><td align="center">10</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0001</td><td align="center">11</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0010</td><td align="center">12</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0011</td><td align="center">13</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0100</td><td align="center">14</td><td align="center" 
class="xianshi"></td></tr><td>0001&#160;0101</td><td align="center">15</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0110</td><td align="center">16</td><td align="center" class="xianshi"></td></tr><td>0001&#160;0111</td><td align="center">17</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1000</td><td align="center">18</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1001</td><td align="center">19</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1010</td><td align="center">1A</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1011</td><td align="center">1B</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1100</td><td align="center">1C</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1101</td><td align="center">1D</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1110</td><td align="center">1E</td><td align="center" class="xianshi"></td></tr><td>0001&#160;1111</td><td align="center">1F</td><td align="center" class="xianshi"></td></tr><td>0111&#160;1111</td><td align="center">7F</td><td align="center" class="xianshi"></td></tr></table><table width="100%" border="0" cellspacing="0" cellpadding="0"><td><thead><th>二进制</th><th>十六进制</th></tr><tbody><td>0010&#160;0000</td><td align="center">20</td></tr><td>0010&#160;0001</td><td align="center">21</td></tr><td>0010&#160;0010</td><td align="center">22</td></tr><td>0010&#160;0011</td><td align="center">23</td></tr><td>0010&#160;0100</td><td align="center">24</td></tr><td>0010&#160;0101</td><td align="center">25</td></tr><td>0010&#160;0110</td><td align="center">26</td></tr><td>0010&#160;0111</td><td align="center">27</td></tr><td>0010&#160;1000</td><td align="center">28</td></tr><td>0010&#160;1001</td><td align="center">29</td></tr><td>0010&#160;1010</td><td align="center">2A</td></tr><td>0010&#160;1011</td><td align="center">2B</td></tr><td>0010&#160;1100</td><td 
align="center">2C</td></tr><td>0010&#160;1101</td><td align="center">2D</td></tr><td>0010&#160;1110</td><td align="center">2E</td></tr><td>0010&#160;1111</td><td align="center">2F</td></tr><td>0011&#160;0000</td><td align="center">30</td></tr><td>0011&#160;0001</td><td align="center">31</td></tr><td>0011&#160;0010</td><td align="center">32</td></tr><td>0011&#160;0011</td><td align="center">33</td></tr><td>0011&#160;0100</td><td align="center">34</td></tr><td>0011&#160;0101</td><td align="center">35</td></tr><td>0011&#160;0110</td><td align="center">36</td></tr><td>0011&#160;0111</td><td align="center">37</td></tr><td>0011&#160;1000</td><td align="center">38</td></tr><td>0011&#160;1001</td><td align="center">39</td></tr><td>0011&#160;1010</td><td align="center">3A</td></tr><td>0011&#160;1011</td><td align="center">3B</td></tr><td>0011&#160;1100</td><td align="center">3C</td></tr><td>0011&#160;1101</td><td align="center">3D</td></tr><td>0011&#160;1110</td><td align="center">3E</td></tr><td>0011&#160;1111</td><td align="center">3F</td></tr></td><td><thead><th>二进制</th><th>十六进制</th></tr><tbody><td>0100&#160;0000</td><td align="center">40</td></tr><td>0100&#160;0001</td><td align="center">41</td></tr><td>0100&#160;0010</td><td align="center">42</td></tr><td>0100&#160;0011</td><td align="center">43</td></tr><td>0100&#160;0100</td><td align="center">44</td></tr><td>0100&#160;0101</td><td align="center">45</td></tr><td>0100&#160;0110</td><td align="center">46</td></tr><td>0100&#160;0111</td><td align="center">47</td></tr><td>0100&#160;1000</td><td align="center">48</td></tr><td>0100&#160;1001</td><td align="center">49</td></tr><td>0100&#160;1010</td><td align="center">4A</td></tr><td>0100&#160;1011</td><td align="center">4B</td></tr><td>0100&#160;1100</td><td align="center">4C</td></tr><td>0100&#160;1101</td><td align="center">4D</td></tr><td>0100&#160;1110</td><td align="center">4E</td></tr><td>0100&#160;1111</td><td align="center">4F</td></tr><td>0101&#160;0000</td><td 
align="center">50</td></tr><td>0101&#160;0001</td><td align="center">51</td></tr><td>0101&#160;0010</td><td align="center">52</td></tr><td>0101&#160;0011</td><td align="center">53</td></tr><td>0101&#160;0100</td><td align="center">54</td></tr><td>0101&#160;0101</td><td align="center">55</td></tr><td>0101&#160;0110</td><td align="center">56</td></tr><td>0101&#160;0111</td><td align="center">57</td></tr><td>0101&#160;1000</td><td align="center">58</td></tr><td>0101&#160;1001</td><td align="center">59</td></tr><td>0101&#160;1010</td><td align="center">5A</td></tr><td>0101&#160;1011</td><td align="center">5B</td></tr><td>0101&#160;1100</td><td align="center">5C</td></tr><td>0101&#160;1101</td><td align="center">5D</td></tr><td>0101&#160;1110</td><td align="center">5E</td></tr><td>0101&#160;1111</td><td align="center">5F</td></tr></table><td>&#160;</td><table width="100%" border="0" cellpadding="0" cellspacing="0" class="bx"><tr valign="bottom"><th>十进制</th><th>图形</th></thead><tr><td align="center">96</td><td align="center">`</td><tr><td align="center">97</td><td align="center">a</td><tr><td align="center">98</td><td align="center">b</td><tr><td align="center">99</td><td align="center">c</td><tr><td align="center">100</td><td align="center">d</td><tr><td align="center">101</td><td align="center">e</td><tr><td align="center">102</td><td align="center">f</td><tr><td align="center">103</td><td align="center">g</td><tr><td align="center">104</td><td align="center">h</td><tr><td align="center">105</td><td align="center">i</td><tr><td align="center">106</td><td align="center">j</td><tr><td align="center">107</td><td align="center">k</td><tr><td align="center">108</td><td align="center">l</td><tr><td align="center">109</td><td align="center">m</td><tr><td align="center">110</td><td align="center">n</td><tr><td align="center">111</td><td align="center">o</td><tr><td align="center">112</td><td align="center">p</td><tr><td align="center">113</td><td align="center">q</td><tr><td 
align="center">114</td><td align="center">r</td><tr><td align="center">115</td><td align="center">s</td><tr><td align="center">116</td><td align="center">t</td><tr><td align="center">117</td><td align="center">u</td><tr><td align="center">118</td><td align="center">v</td><tr><td align="center">119</td><td align="center">w</td><tr><td align="center">120</td><td align="center">x</td><tr><td align="center">121</td><td align="center">y</td><tr><td align="center">122</td><td align="center">z</td><tr><td align="center">123</td><td align="center">{</td><tr><td align="center">124</td><td align="center">|</td><tr><td align="center">125</td><td align="center">}</td><tr><td align="center">126</td><td align="center">~</td>
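2. Next, let's clean up the tags by removing the attributes inside them
// Match the whole opening tag up to its closing '>' and replace it with the bare tag.
// \b keeps the pattern from matching other tag names that merely start with "td" or "tr".
dates = dates.replaceAll("<td\\b[^>]*>", "<td>");
dates = dates.replaceAll("<tr\\b[^>]*>", "<tr>");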

This gives the following data:

<td>0000&#160;0000</td><td>00</td><td></td></tr><td>0000&#160;0001</td><td>01</td><td></td></tr><td>0000&#160;0010</td><td>02</td><td></td></tr><td>0000&#160;0011</td><td>03</td><td></td></tr><td>0000&#160;0100</td><td>04</td><td></td></tr><td>0000&#160;0101</td><td>05</td><td></td></tr><td>0000&#160;0110</td><td>06</td><td></td></tr><td>0000&#160;0111</td><td>07</td><td></td></tr><td>0000&#160;1000</td><td>08</td><td></td></tr><td>0000&#160;1001</td><td>09</td><td></td></tr><td>0000&#160;1010</td><td>0A</td><td></td></tr><td>0000&#160;1011</td><td>0B</td><td></td></tr><td>0000&#160;1100</td><td>0C</td><td></td></tr><td>0000&#160;1101</td><td>0D</td><td></td></tr><td>0000&#160;1110</td><td>0E</td><td></td></tr><td>0000&#160;1111</td><td>0F</td><td></td></tr><td>0001&#160;0000</td><td>10</td><td></td></tr><td>0001&#160;0001</td><td>11</td><td></td></tr><td>0001&#160;0010</td><td>12</td><td></td></tr><td>0001&#160;0011</td><td>13</td><td></td></tr><td>0001&#160;0100</td><td>14</td><td></td></tr><td>0001&#160;0101</td><td>15</td><td></td></tr><td>0001&#160;0110</td><td>16</td><td></td></tr><td>0001&#160;0111</td><td>17</td><td></td></tr><td>0001&#160;1000</td><td>18</td><td></td></tr><td>0001&#160;1001</td><td>19</td><td></td></tr><td>0001&#160;1010</td><td>1A</td><td></td></tr><td>0001&#160;1011</td><td>1B</td><td></td></tr><td>0001&#160;1100</td><td>1C</td><td></td></tr><td>0001&#160;1101</td><td>1D</td><td></td></tr><td>0001&#160;1110</td><td>1E</td><td></td></tr><td>0001&#160;1111</td><td>1F</td><td></td></tr><td>0111&#160;1111</td><td>7F</td><td></td></tr></table><table width="100%" border="0" cellspacing="0" 
cellpadding="0"><td><thead><th>二进制</th><th>十六进制</th></tr><tbody><td>0010&#160;0000</td><td>20</td></tr><td>0010&#160;0001</td><td>21</td></tr><td>0010&#160;0010</td><td>22</td></tr><td>0010&#160;0011</td><td>23</td></tr><td>0010&#160;0100</td><td>24</td></tr><td>0010&#160;0101</td><td>25</td></tr><td>0010&#160;0110</td><td>26</td></tr><td>0010&#160;0111</td><td>27</td></tr><td>0010&#160;1000</td><td>28</td></tr><td>0010&#160;1001</td><td>29</td></tr><td>0010&#160;1010</td><td>2A</td></tr><td>0010&#160;1011</td><td>2B</td></tr><td>0010&#160;1100</td><td>2C</td></tr><td>0010&#160;1101</td><td>2D</td></tr><td>0010&#160;1110</td><td>2E</td></tr><td>0010&#160;1111</td><td>2F</td></tr><td>0011&#160;0000</td><td>30</td></tr><td>0011&#160;0001</td><td>31</td></tr><td>0011&#160;0010</td><td>32</td></tr><td>0011&#160;0011</td><td>33</td></tr><td>0011&#160;0100</td><td>34</td></tr><td>0011&#160;0101</td><td>35</td></tr><td>0011&#160;0110</td><td>36</td></tr><td>0011&#160;0111</td><td>37</td></tr><td>0011&#160;1000</td><td>38</td></tr><td>0011&#160;1001</td><td>39</td></tr><td>0011&#160;1010</td><td>3A</td></tr><td>0011&#160;1011</td><td>3B</td></tr><td>0011&#160;1100</td><td>3C</td></tr><td>0011&#160;1101</td><td>3D</td></tr><td>0011&#160;1110</td><td>3E</td></tr><td>0011&#160;1111</td><td>3F</td></tr></td><td><thead><th>二进制</th><th>十六进制</th></tr><tbody><td>0100&#160;0000</td><td>40</td></tr><td>0100&#160;0001</td><td>41</td></tr><td>0100&#160;0010</td><td>42</td></tr><td>0100&#160;0011</td><td>43</td></tr><td>0100&#160;0100</td><td>44</td></tr><td>0100&#160;0101</td><td>45</td></tr><td>0100&#160;0110</td><td>46</td></tr><td>0100&#160;0111</td><td>47</td></tr><td>0100&#160;1000</td><td>48</td></tr><td>0100&#160;1001</td><td>49</td></tr><td>0100&#160;1010</td><td>4A</td></tr><td>0100&#160;1011</td><td>4B</td></tr><td>0100&#160;1100</td><td>4C</td></tr><td>0100&#160;1101</td><td>4D</td></tr><td>0100&#160;1110</td><td>4E</td></tr><td>0100&#160;1111</td><td>4F</td></tr><td>0101&#1
60;0000</td><td>50</td></tr><td>0101&#160;0001</td><td>51</td></tr><td>0101&#160;0010</td><td>52</td></tr><td>0101&#160;0011</td><td>53</td></tr><td>0101&#160;0100</td><td>54</td></tr><td>0101&#160;0101</td><td>55</td></tr><td>0101&#160;0110</td><td>56</td></tr><td>0101&#160;0111</td><td>57</td></tr><td>0101&#160;1000</td><td>58</td></tr><td>0101&#160;1001</td><td>59</td></tr><td>0101&#160;1010</td><td>5A</td></tr><td>0101&#160;1011</td><td>5B</td></tr><td>0101&#160;1100</td><td>5C</td></tr><td>0101&#160;1101</td><td>5D</td></tr><td>0101&#160;1110</td><td>5E</td></tr><td>0101&#160;1111</td><td>5F</td></tr></table><td>&#160;</td><table width="100%" border="0" cellpadding="0" cellspacing="0" class="bx"><tr><th>十进制</th><th>图形</th></thead><tr><td>96</td><td>`</td><tr><td>97</td><td>a</td><tr><td>98</td><td>b</td><tr><td>99</td><td>c</td><tr><td>100</td><td>d</td><tr><td>101</td><td>e</td><tr><td>102</td><td>f</td><tr><td>103</td><td>g</td><tr><td>104</td><td>h</td><tr><td>105</td><td>i</td><tr><td>106</td><td>j</td><tr><td>107</td><td>k</td><tr><td>108</td><td>l</td><tr><td>109</td><td>m</td><tr><td>110</td><td>n</td><tr><td>111</td><td>o</td><tr><td>112</td><td>p</td><tr><td>113</td><td>q</td><tr><td>114</td><td>r</td><tr><td>115</td><td>s</td><tr><td>116</td><td>t</td><tr><td>117</td><td>u</td><tr><td>118</td><td>v</td><tr><td>119</td><td>w</td><tr><td>120</td><td>x</td><tr><td>121</td><td>y</td><tr><td>122</td><td>z</td><tr><td>123</td><td>{</td><tr><td>124</td><td>|</td><tr><td>125</td><td>}</td><tr><td>126</td><td>~</td>

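The table cells and rows are now plain <td> and <tr> tags with no attributes. From here you can process the data further according to your own needs.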
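
As one possible next step, the sketch below pulls the text out of each cell once the attributes are gone. It is only an illustration under some assumptions: the class name `CellExtractor` and its two helper methods are hypothetical and not part of the original code, and it assumes the cells hold plain text with no nested tags, as in the output above. For messier pages a real HTML parser would likely be more robust than regular expressions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, introduced here only for illustration.
class CellExtractor {

    // Matches the text between <td> and </td>; assumes cells contain no nested tags.
    private static final Pattern CELL = Pattern.compile("<td>([^<]*)</td>");

    /** Removes all attributes from <td> and <tr> opening tags. */
    static String stripAttributes(String html) {
        return html.replaceAll("<td\\b[^>]*>", "<td>")
                   .replaceAll("<tr\\b[^>]*>", "<tr>");
    }

    /** Returns the text of every <td> cell, in document order. */
    static List<String> extractCells(String html) {
        List<String> cells = new ArrayList<>();
        Matcher m = CELL.matcher(stripAttributes(html));
        while (m.find()) {
            // &#160; is a non-breaking space entity; turn it into a normal space.
            cells.add(m.group(1).replace("&#160;", " "));
        }
        return cells;
    }

    public static void main(String[] args) {
        String sample = "<tr valign=\"bottom\"><td>0100&#160;0001</td>"
                + "<td align=\"center\">41</td></tr>";
        System.out.println(extractCells(sample)); // prints [0100 0001, 41]
    }
}
```

In the controller from step 1, the same idea would amount to calling `CellExtractor.extractCells(dates)` on the page content and then grouping the returned values by column count, which depends on which of the tables you are reading.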
