A simple web crawler written in Java (jsoup)

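import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupSpider {

	// Fetch the Baidu News results for a keyword and print every link found.
	public void spider() {
		Connection conn = Jsoup.connect("http://news.baidu.com/ns");
		try {
			Document dom = conn.userAgent("jsoup")
					.timeout(30000)            // timeout in milliseconds
					.data("word", "java爬虫")   // search keyword ("java crawler")
					.get();
			// getElementById returns null when the results container is missing.
			Element res = dom.getElementById("content_left");
			if (res == null) {
				System.err.println("No element with id \"content_left\" found.");
				return;
			}
			Elements elements = res.getElementsByTag("a");
			for (Element e : elements) {
				System.out.println(e.text());          // link text
				System.out.println(e.attr("href"));    // raw href, may be relative
				System.out.println("~~~~~~~~~~~~~~~~~~~~~~~~~~~");
			}
		} catch (IOException e) {
			// Network or HTTP failure: report it and give up.
			e.printStackTrace();
		}
	}

	public static void main(String[] args) {
		new JsoupSpider().spider();
	}
}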

Sample output (the text and href of each link, shown verbatim):

x
javascript:void(0)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
SeimiCrawler v0.3.2 发布,Java爬虫框架
http://www.oschina.net/news/73499/seimicrawler-v-0-3-2
~~~~~~~~~~~~~~~~~~~~~~~~~~~
百度快照
http://cache.baidu.com/c?m=9d78d513d99b12eb0bfa950e1a67d071685697133dc0a7116b93d3169c3e1d070571e2c83a3f554196d27c105cee1806b1ac656537747ce0ddd5d41d98fa8f2d2e8e2c3f6d5dd11a4d8848ef98037b9660875a9feb0ee7ccf22593d8d3c4df2253&p=9b74d216d9c10bff57ed977a470d80&newp=8973de038c934eaf5be9c32d02148f231610db2151d4d4126b82c825d7331b001c3bbfb423221b02d7c4766501a44d5ee0fa3075360021a3dda5c91d9fb4c57479df&user=baidu&fm=sc&query=java%C5%C0%B3%E6&qid=aea57cc8000018a6&p1=1
~~~~~~~~~~~~~~~~~~~~~~~~~~~
webmagic 0.2.1 发布,Java爬虫框架
http://www.linuxidc.com/Linux/2013-08/89178.htm
~~~~~~~~~~~~~~~~~~~~~~~~~~~
百度快照
http://cache.baidu.com/c?m=9f65cb4a8c8507ed4fece763105392230e54f73d678b975f2482c25f93130a1c187b9de07b655a19d3c77f6616af3f5ee0ed3c7934062aa09bbfd20c82afd7756fde286c2358d55613a30edecc5154c337e05bfed81d&p=882a9546d3d910f50abe9b7c4e0a9d&newp=882a9546d39f0bc304be9b7c164fc4231610db2151d4d4116b82c825d7331b001c3bbfb423221b02d7c4766501a44d5ee0fa3075360021a3dda5c91d9fb4c57479d46d582c&user=baidu&fm=sc&query=java%C5%C0%B3%E6&qid=aea57cc8000018a6&p1=2
~~~~~~~~~~~~~~~~~~~~~~~~~~~
一看就明白的爬虫入门讲解:基础理论篇(上篇)
http://www.chinaz.com/web/2015/1123/473874.shtml?qq-pf-to=pcqq.group
~~~~~~~~~~~~~~~~~~~~~~~~~~~
2条相同新闻
/ns?word=java%E7%88%AC%E8%99%AB+cont:1426151410&same=2&cl=1&tn=news&rn=30&fm=sd
~~~~~~~~~~~~~~~~~~~~~~~~~~~
百度快照
http://cache.baidu.com/c?m=9f65cb4a8c8507ed4fece763105392230e54f732668c8c4637c3933fc239045c0231b3a627201303cec67f6700b24f59ebfa3374200357f6c18ed714cabae66b6c9f274232489141649544b8ca30679063d34de9d84ca7e7b77087eb8f93895b0b9917566d81809c2b0703bb6de76430f4d19d&p=882a9546d4d910f50abe9b7c4e0a9d&newp=882a9546d49f0bc304be9b7c164fc4231610db2151d6d201298ffe0cc4241a1a1a3aecbf27291705d7c27e6d06a44d59eef53d743d0834f1f689df08d2ecce7e73c24b68&user=baidu&fm=sc&query=java%C5%C0%B3%E6&qid=aea57cc8000018a6&p1=3
~~~~~~~~~~~~~~~~~~~~~~~~~~~
基于java社会化海量数据采集爬虫框架搭建(附代码)(2)
http://www.zgjzx.com.cn/caijinggupiao/1187473_2.html
~~~~~~~~~~~~~~~~~~~~~~~~~~~
百度快照
http://cache.baidu.com/c?m=9f65cb4a8c8507ed4fece763105392230e54f72b698f985f68d4e419ce3b4c413037bfa676714b5c8899293246ed120fb7ed35713d0626b29adf8f3eddac925f75ce786a6459db0144dc41fc8f1532c050872beeb868e5ad803384d9d6&p=882a9546d6d910f50abe9b7c4e0a9d&newp=882a9546d69f0bc304be9b7c164fc4231610db2151d4db106b82c825d7331b001c3bbfb423221b02d7c4766501a44d5ee0fa3075360021a3dda5c91d9fb4c57479d46d582c&user=baidu&fm=sc&query=java%C5%C0%B3%E6&qid=aea57cc8000018a6&p1=4
~~~~~~~~~~~~~~~~~~~~~~~~~~~
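
The output above contains two kinds of href that cannot be fetched as printed: a `javascript:void(0)` pseudo-link and a relative link (`/ns?word=...`). The following is a minimal sketch of one way to clean them up, using only `java.net.URI` from the standard library. The class name `LinkCleaner` and the method `resolve` are hypothetical names chosen for illustration, and this is not part of the original program. jsoup also offers `Element.absUrl("href")` for resolving links against the page's base URI, which may be the simpler choice inside the crawler itself.

```java
import java.net.URI;

// Hypothetical helper (not from the original post): turns raw href values
// into absolute URLs and drops links that cannot be followed.
class LinkCleaner {

    // Returns the absolute form of href resolved against base,
    // or null if the href is empty, a javascript: pseudo-link, or malformed.
    static String resolve(String base, String href) {
        if (href == null || href.isEmpty()) {
            return null;
        }
        if (href.toLowerCase().startsWith("javascript:")) {
            return null;
        }
        try {
            // Absolute hrefs are returned unchanged; relative ones are
            // resolved against the base URL.
            return URI.create(base).resolve(href).toString();
        } catch (IllegalArgumentException ex) {
            // URI.create throws this for strings that are not valid URIs.
            return null;
        }
    }

    public static void main(String[] args) {
        String base = "http://news.baidu.com/ns";
        String[] hrefs = {
            "javascript:void(0)",
            "http://www.linuxidc.com/Linux/2013-08/89178.htm",
            "/ns?word=java&same=2"
        };
        for (String href : hrefs) {
            String abs = resolve(base, href);
            if (abs != null) {
                System.out.println(abs);
            }
        }
    }
}
```

In the crawler loop, the same check could be applied to `e.attr("href")` before printing, so that only links that can actually be followed appear in the output.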
