Writing a Subdomain Collection Tool

Multithreaded Brute-Forcing

1. Read the subdomain dictionary from a file and, in a loop, push each candidate name of the form subdomain.domain into a queue that will later be handed to the worker-thread class. (The imports this section assumes are sketched right after the snippet.)

q = queue.Queue()
# Load the subdomain dictionary; join path components so it works on any OS
path = os.path.join(common.get_root_path(), 'dir', 'dic.txt')
with open(path, 'r') as f:
    r = f.readlines()
for i in r:
    i = i.strip('\n')
    url = '{subdomain}.{domain}'.format(subdomain=i, domain=self.domain)
    q.put(url)
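The snippets in this section are excerpted from a larger class (they reference self.domain), so the module-level imports are not shown. A minimal set, assuming the dnspython package for DNS lookups and the author's own common helper module, would look roughly like this:

import os
import sys
import time
import queue
import threading

from dns import resolver   # dnspython; provides resolver.resolve()

import common              # the author's helper module (paths, saving results)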

2. Worker-thread class: each thread takes a URL from the queue and calls the dns library's resolve method to perform an A-record lookup. If an A record is returned, i.e. the name resolves to an IP address, the subdomain is considered to exist and is appended to the result list.

class Brute(threading.Thread):
    def __init__(self, q, result, total):
        threading.Thread.__init__(self)
        self.q = q
        self.result = result
        self.total = total

    def run(self) -> None:
        while not self.q.empty():
            try:
                # Non-blocking get: another thread may have drained the queue
                # between the empty() check and this call
                url = self.q.get(block=False)
            except queue.Empty:
                break
            self.msg()
            try:
                # A-record DNS query for the candidate name (dnspython)
                A = resolver.resolve(url, 'A')
                if A.response.answer:
                    self.result.append(url)
            except Exception:
                # Names that fail to resolve are simply skipped
                pass

    def msg(self):
        # Print progress: how many names are left and the percentage done
        left = self.q.qsize()
        total = self.total
        per = ((total - left) / total) * 100
        sys.stdout.write('\r{} left {} total|{:.2f}% scan'.format(left, total, per))

3. Start the threads, wait for them to finish, and save the results.

st = time.time()
print('Start brute-forcing subdomains {}'.format(time.strftime('%X')))
threads = []
result = []
thread_count = 2000
total = q.qsize()
for i in range(thread_count):
    threads.append(self.Brute(q, result, total))

for t in threads:
    t.start()

for t in threads:
    t.join()

path = common.get_result(domain=self.domain, type='brute.json')
common.save_result(path, result)
print('\nBrute-force finished {}  elapsed {:.2f}s'.format(time.strftime('%X'), time.time() - st))
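get_result and save_result come from the author's common module and their implementations are not shown in the excerpt; judging from how they are called, they build a per-domain result path and dump the list as JSON. A minimal sketch consistent with those calls (the results/ directory layout is an assumption) could be:

import json
import os

def get_result(domain, type):
    # Assumed layout: results/<domain>/<type>, e.g. results/example.com/brute.json
    result_dir = os.path.join(get_root_path(), 'results', domain)
    os.makedirs(result_dir, exist_ok=True)
    return os.path.join(result_dir, type)

def save_result(path, result):
    # Dump the collected subdomains as a JSON array
    with open(path, 'w') as f:
        json.dump(result, f, ensure_ascii=False, indent=2)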

Multi-Process and Coroutine Crawling

This module uses multiple processes and coroutines to crawl subdomain records from third-party sites, using chinaz as the example.
1. First get the total number of result pages chinaz returns for the queried domain, and store the URL of every result page in a list.

def get_urls(domain):
    urls = []
    url = 'https://tool.chinaz.com/subdomain?domain=' + domain
    # Get the total number of result pages
    try:
        r = requests.get(url=url)
        soup = BeautifulSoup(r.text, features='lxml')
        pages = soup.find(name='span', attrs={'class': 'col-gray02'})
        s = pages.string
        index_p = s.index('页')
        page = int(s[1:index_p])
        print('chinaz total result pages: ' + str(page))
        # Build the URL of every result page
        for i in range(1, page + 1):
            urls.append('https://tool.chinaz.com/subdomain?domain=' + domain + '&page=' + str(i))
        return urls
    except AttributeError:
        print('chinaz collection failed: no matching subdomains found')
        return None

2. Coroutine function: fetch the given chinaz result page, extract every subdomain it lists, and append them to the subdomains list. (sem, used below, is a concurrency-limiting semaphore; a sketch of its definition follows the snippet.)

async def spider2(url, subdomains):
    # print('requesting', url)
    async with sem:
        async with aiohttp.ClientSession() as session:
            async with session.request('GET', url) as resp:
                # print(resp.status)
                # Awaiting the body lets the event loop switch to other coroutines
                content = await resp.text()
                # Parse the subdomain entries out of the response HTML
                soup = BeautifulSoup(content, features='lxml')
                subs = soup.find_all(name='div', attrs={'class': 'w23-0 subdomain'})
                for sub in subs:
                    # print(sub.a.string)
                    subdomains.append(sub.a.string)
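sem is used above but not defined in the excerpt; it is presumably a module-level asyncio.Semaphore that caps how many pages are fetched at once. A minimal sketch, with the limit of 50 chosen arbitrarily, together with the imports this module needs:

import asyncio
import time

import aiohttp
import requests
from bs4 import BeautifulSoup

import common  # the author's helper module

# Cap the number of concurrent requests to chinaz (the exact limit is an assumption)
sem = asyncio.Semaphore(50)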

3. Package the coroutines into tasks, add them to an event loop, and run it.

class chinaz(object):
    def __init__(self, domain):
        self.domain = domain

    def run(self):
        st = time.time()
        print('begin to collect from chinaz:{}'.format(time.strftime('%X')))
        subdomains = []
        urls = get_urls(self.domain)
        if not urls:
            print('chinaz returned no subdomains')
            return
        # Package the coroutines and run them on an event loop
        # (gather accepts bare coroutines; asyncio.wait stopped accepting them in Python 3.11)
        coroutines = []
        for url in urls:
            coroutines.append(spider2(url, subdomains))
        loop = asyncio.get_event_loop()
        loop.run_until_complete(asyncio.gather(*coroutines))
        # Save the collected subdomains
        path = common.get_result(domain=self.domain, type='chinaz.json')
        common.save_result(path, subdomains)
        print('end to collect from chinaz:{}   elapsed {:.2f}s'.format(time.strftime('%X'), time.time() - st))
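The section title mentions multiple processes, but the excerpt only shows the coroutine side. Assuming every third-party source is wrapped in a class with a run() method like chinaz above, one way to combine the two (the collect_all function and the idea of additional source classes are hypothetical) is to give each source its own process:

from multiprocessing import Process

def collect_all(domain):
    # One collector per third-party source; only chinaz appears in the excerpt,
    # further source classes could be appended to this list
    collectors = [chinaz(domain)]
    processes = [Process(target=c.run) for c in collectors]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

if __name__ == '__main__':
    collect_all('example.com')

Each child process then drives its own event loop, so slow third-party sites do not block one another.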