crawl.py

>crawl.py http://www.hao123.com/index.htm

The output is as follows:

parsedurl =  ParseResult(scheme='http', netloc='www.hao123.com', path='/index.htm', params='', query='', fragment='')
path = 
www.hao123.com/index.htm
ext =  ('www.hao123.com/index', '.htm')
path = 
www.hao123.com/index.htm
ldir =  www.hao123.com
ldir =  www.hao123.com
path =  www.hao123.com/index.htm
self.url =  http://www.hao123.com/index.htm
self.file =  www.hao123.com/index.htm
retval =  ('www.hao123.com/index.htm', <httplib.HTTPMessage instance at 0x010F9968>)
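
The debug lines above show the URL being split by a URL parser and then mapped to a local file path. A minimal sketch of that mapping follows, written in Python 3 rather than the Python 2 used for the run above. The function name `local_filename` and the default page name `index.htm` are assumptions made for illustration; they are not taken from crawl.py.

```python
import os
from urllib.parse import urlparse


def local_filename(url, default="index.htm"):
    """Map a URL to a relative local path (hypothetical helper)."""
    parsed = urlparse(url)
    # Check the extension on the URL path only, so that the ".com" in the
    # host name is not mistaken for a file extension.
    ext = os.path.splitext(parsed.path)[1]
    path = parsed.netloc + parsed.path
    if ext == "":
        # No file name in the URL: assume a default page (an assumption).
        path = path.rstrip("/") + "/" + default
    return path


path = local_filename("http://www.hao123.com/index.htm")
print("path =", path)                    # local file to write
print("ldir =", os.path.dirname(path))   # directory to create first
```

For the URL used in this run, the sketch yields the same `path` and `ldir` values as the log: `www.hao123.com/index.htm` and `www.hao123.com`.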

( 1 )
URL: http://www.hao123.com/index.htm
FILE: www.hao123.com/index.htm
http://www.hao123.com                                         ... new, added to Q
http://www.hao123.com/redian/tongzhi.htm                      ... new, added to Q
http://utility.hao123.com/quality_form.php                    ... discarded, not in domain
*  javascript:void(0)                                            ... discarded, javascript
http://www.hao123.com/redian/scookie.htm                      ... new, added to Q
*  javascript:void(0)                                            ... discarded, javascript
*  javascript:void(0)                                            ... discarded, javascript
*  javascript:void(0)                                            ... discarded, javascript
http://www.hao123.com                                         ... discarded, already in Q
http://wenku.baidu.com                                        ... discarded, not in domain
http://baike.baidu.com                                        ... discarded, not in domain
http://jingyan.baidu.com                                      ... discarded, not in domain
http://hi.baidu.com                                           ... discarded, not in domain
http://top.baidu.com                                          ... discarded, not in domain
http://dict.baidu.com                                         ... discarded, not in domain
http://s.baidu.com                                            ... discarded, not in domain
http://www.baidu.com                                          ... discarded, not in domain
http://www.hao123.com/daquan/shfwsite.htm                     ... new, added to Q
http://www.hao123.com/netbuy.htm                              ... new, added to Q
http://www.hao123.com/caipiao.htm                             ... new, added to Q
http://www.hao123.com/haoserver/index.htm                     ... new, added to Q
http://www.hao123.com/tianqi.htm                              ... new, added to Q
http://www.hao123.com/stock.htm                               ... new, added to Q
http://www.hao123.com/stock3.htm                              ... new, added to Q
http://www.hao123.com/bankjt.htm                              ... new, added to Q
http://www.hao123.com/lvyou.htm                               ... new, added to Q

..........
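
The listing above puts each link into one of four categories: new, not in domain, javascript, or already in Q. The decision can be sketched roughly as below, in Python 3. The function `classify` and its parameters are hypothetical and only reproduce the status strings seen in the log; the actual logic in crawl.py may differ, for example in how it compares domains.

```python
from urllib.parse import urlparse


def classify(link, domain, queue):
    """Return a status string for one link; append it to queue if new.

    Hypothetical helper: the checks are inferred from the log output.
    """
    if link.lower().startswith("javascript:"):
        return "discarded, javascript"
    # Assumption: a link is "in domain" only if its host matches exactly.
    if urlparse(link).netloc != domain:
        return "discarded, not in domain"
    if link in queue:
        return "discarded, already in Q"
    queue.append(link)
    return "new, added to Q"


queue = []
links = [
    "http://www.hao123.com",
    "http://utility.hao123.com/quality_form.php",
    "javascript:void(0)",
    "http://www.hao123.com",
]
for link in links:
    print(link.ljust(60), "...", classify(link, "www.hao123.com", queue))
```

With the exact host comparison assumed here, `utility.hao123.com` is rejected as not in domain, which agrees with the log for `www.hao123.com`.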
