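import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class jsoup {
    // Fetches Baidu News search results for "java爬虫" and prints each link's text and href.
    public void spider() {
        Connection conn = Jsoup.connect("http://news.baidu.com/ns");
        try {
            Document dom = conn.userAgent("jsoup")
                    .timeout(30000)              // request timeout in milliseconds (30 s)
                    .data("word", "java爬虫")    // sent as the query parameter word=...
                    .get();
            // The result list is expected inside the element with id "content_left"
            Element res = dom.getElementById("content_left");
            if (res == null) {
                // getElementById returns null when no such element exists (e.g. the layout changed)
                System.err.println("No element with id content_left found");
                return;
            }
            Elements elements = res.getElementsByTag("a");
            for (Element e : elements) {
                System.out.println(e.text());       // link text
                System.out.println(e.attr("href")); // raw href attribute, possibly relative
                System.out.println("~~~~~~~~~~~~~~~~~~~~~~~~~~~");
            }
        } catch (IOException e) {
            // Network failure, timeout, or an HTTP error status
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        new jsoup().spider();
    }
}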
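`e.attr("href")` returns the attribute exactly as written in the page. That is why one link in the output is printed without a host (`/ns?word=...`). jsoup can resolve such links against the page's base URI with `e.absUrl("href")` or `e.attr("abs:href")`. The following is a minimal sketch of the same resolution using only `java.net.URI`. It assumes the request URL `http://news.baidu.com/ns` as the base. The class `ResolveHref` and its `resolve` helper are hypothetical names chosen for illustration.

```java
import java.net.URI;

public class ResolveHref {
    // Hypothetical helper: resolves an href against the page URL (RFC 3986 reference resolution).
    static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://news.baidu.com/ns"; // assumed base: the URL passed to Jsoup.connect
        // Relative link taken from the program's output
        System.out.println(resolve(base,
                "/ns?word=java%E7%88%AC%E8%99%AB+cont:1426151410&same=2&cl=1&tn=news&rn=30&fm=sd"));
        // Absolute links come back unchanged
        System.out.println(resolve(base, "http://www.oschina.net/news/73499/seimicrawler-v-0-3-2"));
    }
}
```

Because the relative link starts with `/`, only the scheme and host are taken from the base. The percent-escapes in the query string are kept as they are.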
Output (for each <a> element: link text, href, separator line):
x
javascript:void(0)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
SeimiCrawler v0.3.2 发布,Java爬虫框架
http://www.oschina.net/news/73499/seimicrawler-v-0-3-2
~~~~~~~~~~~~~~~~~~~~~~~~~~~
百度快照
http://cache.baidu.com/c?m=9d78d513d99b12eb0bfa950e1a67d071685697133dc0a7116b93d3169c3e1d070571e2c83a3f554196d27c105cee1806b1ac656537747ce0ddd5d41d98fa8f2d2e8e2c3f6d5dd11a4d8848ef98037b9660875a9feb0ee7ccf22593d8d3c4df2253&p=9b74d216d9c10bff57ed977a470d80&newp=8973de038c934eaf5be9c32d02148f231610db2151d4d4126b82c825d7331b001c3bbfb423221b02d7c4766501a44d5ee0fa3075360021a3dda5c91d9fb4c57479df&user=baidu&fm=sc&query=java%C5%C0%B3%E6&qid=aea57cc8000018a6&p1=1
~~~~~~~~~~~~~~~~~~~~~~~~~~~
webmagic 0.2.1 发布,Java爬虫框架
http://www.linuxidc.com/Linux/2013-08/89178.htm
~~~~~~~~~~~~~~~~~~~~~~~~~~~
百度快照
http://cache.baidu.com/c?m=9f65cb4a8c8507ed4fece763105392230e54f73d678b975f2482c25f93130a1c187b9de07b655a19d3c77f6616af3f5ee0ed3c7934062aa09bbfd20c82afd7756fde286c2358d55613a30edecc5154c337e05bfed81d&p=882a9546d3d910f50abe9b7c4e0a9d&newp=882a9546d39f0bc304be9b7c164fc4231610db2151d4d4116b82c825d7331b001c3bbfb423221b02d7c4766501a44d5ee0fa3075360021a3dda5c91d9fb4c57479d46d582c&user=baidu&fm=sc&query=java%C5%C0%B3%E6&qid=aea57cc8000018a6&p1=2
~~~~~~~~~~~~~~~~~~~~~~~~~~~
一看就明白的爬虫入门讲解:基础理论篇(上篇)
http://www.chinaz.com/web/2015/1123/473874.shtml?qq-pf-to=pcqq.group
~~~~~~~~~~~~~~~~~~~~~~~~~~~
2条相同新闻
/ns?word=java%E7%88%AC%E8%99%AB+cont:1426151410&same=2&cl=1&tn=news&rn=30&fm=sd
~~~~~~~~~~~~~~~~~~~~~~~~~~~
百度快照
http://cache.baidu.com/c?m=9f65cb4a8c8507ed4fece763105392230e54f732668c8c4637c3933fc239045c0231b3a627201303cec67f6700b24f59ebfa3374200357f6c18ed714cabae66b6c9f274232489141649544b8ca30679063d34de9d84ca7e7b77087eb8f93895b0b9917566d81809c2b0703bb6de76430f4d19d&p=882a9546d4d910f50abe9b7c4e0a9d&newp=882a9546d49f0bc304be9b7c164fc4231610db2151d6d201298ffe0cc4241a1a1a3aecbf27291705d7c27e6d06a44d59eef53d743d0834f1f689df08d2ecce7e73c24b68&user=baidu&fm=sc&query=java%C5%C0%B3%E6&qid=aea57cc8000018a6&p1=3
~~~~~~~~~~~~~~~~~~~~~~~~~~~
基于java社会化海量数据采集爬虫框架搭建(附代码)(2)
http://www.zgjzx.com.cn/caijinggupiao/1187473_2.html
~~~~~~~~~~~~~~~~~~~~~~~~~~~
百度快照
http://cache.baidu.com/c?m=9f65cb4a8c8507ed4fece763105392230e54f72b698f985f68d4e419ce3b4c413037bfa676714b5c8899293246ed120fb7ed35713d0626b29adf8f3eddac925f75ce786a6459db0144dc41fc8f1532c050872beeb868e5ad803384d9d6&p=882a9546d6d910f50abe9b7c4e0a9d&newp=882a9546d69f0bc304be9b7c164fc4231610db2151d4db106b82c825d7331b001c3bbfb423221b02d7c4766501a44d5ee0fa3075360021a3dda5c91d9fb4c57479d46d582c&user=baidu&fm=sc&query=java%C5%C0%B3%E6&qid=aea57cc8000018a6&p1=4
~~~~~~~~~~~~~~~~~~~~~~~~~~~
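`.data("word", "java爬虫")` places the search term in the GET query string, so its non-ASCII characters must be percent-encoded. The output contains two different encodings of the same term:

- `word=java%E7%88%AC%E8%99%AB` in the `/ns?` link is the UTF-8 form.
- `query=java%C5%C0%B3%E6` in the cache.baidu.com (百度快照) links is the GBK form.

The following minimal sketch reproduces both encodings with `java.net.URLEncoder`. It assumes Java 10+ for `URLEncoder.encode(String, Charset)` and a JDK that ships the GBK charset, which is checked at runtime. The class `QueryEncoding` is a hypothetical name.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryEncoding {
    // Percent-encodes a query value in the given charset (a space would become '+').
    static String encode(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        // "\u722C\u866B" is 爬虫; Unicode escapes avoid depending on the source-file encoding
        String word = "java\u722C\u866B";
        // UTF-8 form, as in the word= parameter of the /ns? link
        System.out.println(encode(word, StandardCharsets.UTF_8)); // java%E7%88%AC%E8%99%AB
        // GBK form, as in the query= parameter of the cache.baidu.com links
        if (Charset.isSupported("GBK")) {
            System.out.println(encode(word, Charset.forName("GBK")));
        }
    }
}
```

The Unicode escapes keep the example independent of the `javac -encoding` setting. The same concern applies to the `"java爬虫"` literal in `spider()`, which only round-trips correctly if the source file is compiled with the encoding it was saved in.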