Investigating a Website's Background with Python Libraries

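It helps to learn some background information about a website, for example:

  • Its sitemap
  • Its size
  • The technology it is built with
  • Its owner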

Sitemap

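The Sitemap file that a website provides helps crawlers locate the site's latest content without having to crawl every page. For more detail, the sitemap standard is defined at http://www.sitemaps.org/protocol.html.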
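
To illustrate how a crawler might read such a file, here is a minimal sketch that pulls the `<loc>` entries out of sitemap XML using only the standard library. The function name `extract_sitemap_urls` and the `example.com` sample are assumptions made for illustration and do not come from the original post. A real crawler would first download the XML from the site.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol (sitemaps.org, version 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_sitemap_urls(xml_text):
    """Return every <loc> URL found in a sitemap XML document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


# Hypothetical sample sitemap, inlined so the sketch runs without network access.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>"""

urls = extract_sitemap_urls(SAMPLE_SITEMAP)
for url in urls:
    print(url)
```

Sitemaps are often incomplete or out of date, so the list is probably best treated as a starting point rather than a full inventory of the site.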

Estimating the size of a website

Identifying the technology a website is built with

PC:~/Project/python$ sudo pip install builtwith
PC:~/Project/python$ python2.7
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import builtwith
>>> builtwith.parse('http://www.baidu.com')
{u'javascript-frameworks': [u'jQuery']}
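
The value returned by `builtwith.parse` in the session above is a dictionary that maps a technology category to a list of names. As a small sketch of how that result could be consumed, the helper below flattens it into sorted pairs. The name `list_technologies` is hypothetical, and the sample dictionary is copied from the output above (under Python 3 the `u` prefixes disappear), so the snippet runs without the library or network access.

```python
def list_technologies(result):
    """Flatten a builtwith-style dict into sorted (category, technology) pairs."""
    return sorted(
        (category, tech)
        for category, techs in result.items()
        for tech in techs
    )


# Sample copied from the interpreter session above. With the library installed,
# it could instead come from builtwith.parse('http://www.baidu.com').
sample_result = {"javascript-frameworks": ["jQuery"]}

for category, tech in list_technologies(sample_result):
    print(f"{category}: {tech}")
```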

Finding the owner of a website

PC:~/Project/python$ sudo pip install python-whois
Collecting python-whois
  Downloading python-whois-0.6.5.tar.gz
Collecting future (from python-whois)
  Downloading future-0.16.0.tar.gz (824kB)
    100% |████████████████████████████████| 829kB 94kB/s 
Installing collected packages: future, python-whois
  Running setup.py install for future ... done
  Running setup.py install for python-whois ... done
Successfully installed future-0.16.0 python-whois-0.6.5
PC:~/Project/python$ python2.7
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import whois
>>> print whois.whois('zhaozhoutea.com')
{
  "updated_date": [
    "2017-04-04 00:00:00", 
    "2017-04-04 10:29:46"
  ], 
  "status": "clientTransferProhibited https://icann.org/epp#clientTransferProhibited", 
  "name": "Talos Gabor", 
  "dnssec": "unsigned", 
  "city": "Budapest", 
  "expiration_date": [
    "2018-04-04 00:00:00", 
    "2018-04-04 04:00:00"
  ], 
  "zipcode": "1022", 
  "domain_name": [
    "ZHAOZHOUTEA.COM", 
    "zhaozhoutea.com"
  ], 
  "country": "HU", 
  "whois_server": "whois.onlinenic.com", 
  "state": "Pest megye", 
  "registrar": "Onlinenic Inc", 
  "referral_url": "http://www.onlinenic.com", 
  "address": "Herman Otto u. 25/A.", 
  "name_servers": [
    "NS1.E-TIGER.NET", 
    "NS2.NS0.HU", 
    "ns1.e-tiger.net", 
    "ns2.ns0.hu"
  ], 
  "org": "Talos Gabor", 
  "creation_date": [
    "2014-04-04 00:00:00", 
    "2014-04-04 04:00:00"
  ], 
  "emails": [
    "onlinenic-enduser@onlinenic.com", 
    "mediacenter@mediacenter.hu"
  ]
}
>>> 
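
Several fields in the WHOIS output above, such as `domain_name` and `creation_date`, come back as lists holding more than one value. The following sketch shows one possible way to reduce such a record to a few fields of interest. The helper names `first_value` and `summarize_whois` are hypothetical, and the code assumes the record behaves like a plain dictionary. The sample is an excerpt of the output printed above, so no lookup is performed.

```python
def first_value(value):
    """Return the first element of a list or tuple, or the value itself."""
    if isinstance(value, (list, tuple)):
        return value[0] if value else None
    return value


def summarize_whois(record):
    """Pick a few ownership-related fields out of a WHOIS-style mapping."""
    domain = first_value(record.get("domain_name"))
    return {
        "domain": domain.lower() if domain else None,
        "owner": record.get("org") or record.get("name"),
        "registrar": record.get("registrar"),
        "created": first_value(record.get("creation_date")),
        "expires": first_value(record.get("expiration_date")),
    }


# Excerpt of the record printed in the session above.
sample_record = {
    "domain_name": ["ZHAOZHOUTEA.COM", "zhaozhoutea.com"],
    "name": "Talos Gabor",
    "org": "Talos Gabor",
    "registrar": "Onlinenic Inc",
    "creation_date": ["2014-04-04 00:00:00", "2014-04-04 04:00:00"],
    "expiration_date": ["2018-04-04 00:00:00", "2018-04-04 04:00:00"],
}

summary = summarize_whois(sample_record)
for field, value in summary.items():
    print(f"{field}: {value}")
```

Taking the first list entry is an arbitrary choice made here for brevity. Depending on the registrar, another entry may be the more accurate one.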