Building a Simple Website Directory Scanner (Desktop Window Edition): WebScannerTkl

垃圾管理员

Category columns: Python手记, 信息安全

Article link: https://blog.csdn.net/qq_41500251/article/details/107645886

Demo

[Animated screenshot of the scanner window]
The project layout is the same as for the command-line scanner:

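Preface

This article works on the same principle as the previous one: generate candidate links and verify them by trial and error. We will therefore skip the theory and concentrate on class inheritance and on building the window.

For the GUI I chose tkinter from the standard library, for no special reason other than that it needs no separate installation.

One thing to be clear about first: the toolkit behind tkinter is not part of the Python language itself. Tk is an open-source GUI toolkit that was originally written for the Tcl language, and tkinter is the binding to Tcl/Tk that ships with Python's standard library. In other words, Python borrows Tk rather than implementing it.

Each component in it goes by one common name: widget.

You can pick whichever third-party library you are used to, as long as it gets the job done.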

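Design and implementation

The GUI

The core logic was already analyzed and designed in the command-line version, so we start with the GUI.

[Screenshot of the window, divided into five numbered parts]
For the explanation, I split it into five parts. (In a real design you would draw up the layout first and then implement it according to its requirements. Here we are doing it backwards: looking at the finished result and writing the code for it.)

Widget for the root URL (1)

Finding a suitable widget here is easy, since all it has to do is accept input. An Entry will do: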

import tkinter as tk

window = tk.Tk()
window.title('Web Scanner    by 刑者')
window.geometry('600x400')
window.config(background='#CCE8CF')

entry = tk.Entry(window)
entry.place(x=100, y=10)
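
The scanner later joins this root URL with each wordlist entry, so it can help to tidy what the user typed before using it. A minimal sketch, assuming a hypothetical helper named normalize_host (not part of the original code) that is fed the result of entry.get(); defaulting to http:// is also an assumption:

```python
def normalize_host(raw):
    """Return the root URL without surrounding spaces or trailing slashes.

    normalize_host is a hypothetical helper; pass it entry.get().
    """
    host = raw.strip()
    if not host:
        raise ValueError("the root URL must not be empty")
    # Assumption: default to http:// when the user leaves out the scheme.
    if "://" not in host:
        host = "http://" + host
    return host.rstrip("/")


print(normalize_host(" http://example.com/ "))
```

Rejecting an empty string up front also avoids the IndexError that host[-1] would raise on empty input.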

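Widgets for the thread count and timeout (2)

Both inputs in part two use the same widget, Combobox, which belongs to ttk inside tkinter.

A Combobox can be thought of as an Entry combined with a drop-down menu. The screenshot at the top of the article does not show a value being selected; try it yourself.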

from tkinter import ttk

thread_num = tk.StringVar()
comboxlist = ttk.Combobox(window, textvariable=thread_num)
comboxlist["values"] = ('1', '3', '5')
comboxlist.current(0)
comboxlist.place(x=100, y=50)


timeout = tk.StringVar()
comboxlist1 = ttk.Combobox(window, textvariable=timeout)
comboxlist1["values"] = ('1', '0.5', '0.1')
comboxlist1.current(0)
comboxlist1.place(x=100, y=90)

Each Combobox defines its selectable options through values, and the current method sets the default. Both call current(0), so at run time each one displays the item at index 0 of its values.
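
Because a Combobox is also an editable entry, the string it returns is whatever the user left in it, not necessarily one of the presets. A small sketch of defensive parsing, assuming a hypothetical helper parse_setting that receives thread_num.get() or timeout.get(); the fallback defaults are illustrative:

```python
def parse_setting(text, cast, default):
    """Convert Combobox text with cast (int or float); fall back to default.

    parse_setting is a hypothetical helper, not part of the original code.
    """
    try:
        value = cast(text.strip())
    except ValueError:
        return default
    # Zero or negative thread counts and timeouts make no sense here.
    return value if value > 0 else default


thread_count = parse_setting("3", int, 1)             # a preset picked from the list
timeout_seconds = parse_setting("fast", float, 1.0)   # free text typed by the user
print(thread_count, timeout_seconds)
```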

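As you may have noticed by now, every widget here is positioned with the place method. I have to admit that this is not a good way to lay things out. In most cases pack should be your first choice: it arranges widgets by relative position, and of the three layout managers I find it the most flexible and the easiest to extend. (Another one is grid, generally used for table-like layouts, such as the checkboxes in part three.)

These three methods arrange and position widgets inside a container or window, and they are collectively called widget layout managers (Tk's own documentation calls them geometry managers).

The only reason I pin the widgets down with place and absolute coordinates is laziness. (pack takes a few more parameters, and I could not be bothered to tune them.)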
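
For comparison, here is roughly how the label-plus-input rows could be arranged with grid instead of fixed coordinates. This is a sketch, not the layout used in this article; FORM_ROWS and build_form are made-up names, and the captions are English stand-ins.

```python
# Each row of the form: (caption, kind of input widget).
FORM_ROWS = [
    ("HTTP URL:", "entry"),
    ("Threads:", "combobox"),
    ("Timeout:", "combobox"),
]


def build_form():
    """Create the window; call this from a desktop session."""
    # Imported here so the module can be loaded on machines without a display.
    import tkinter as tk
    from tkinter import ttk

    window = tk.Tk()
    for row, (caption, kind) in enumerate(FORM_ROWS):
        tk.Label(window, text=caption).grid(row=row, column=0, sticky="w", padx=10, pady=5)
        widget = tk.Entry(window) if kind == "entry" else ttk.Combobox(window)
        widget.grid(row=row, column=1, sticky="ew", padx=10, pady=5)
    # Let the input column absorb extra width when the window is resized.
    window.columnconfigure(1, weight=1)
    return window


# To try it: build_form().mainloop()
```

No pixel coordinates appear anywhere, so adding a fourth row only means adding one entry to FORM_ROWS.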

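Choosing the directory wordlists (3)

Part three of the screenshot offers a group of checkboxes so that the user can pick the wordlists that match the technology the target site was built with.

Checkboxes are created with Checkbutton.

How would you write these six Checkbuttons? One way is: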

status_asp = tk.BooleanVar()
status_aspx = tk.BooleanVar()
status_dir = tk.BooleanVar()
status_jsp = tk.BooleanVar()
status_mdb = tk.BooleanVar()
status_php = tk.BooleanVar()

# place() returns None, so its return value is not assigned to a name
tk.Checkbutton(window, text='ASP', variable=status_asp).place(x=400, y=10)
tk.Checkbutton(window, text='ASPX', variable=status_aspx).place(x=400, y=50)
tk.Checkbutton(window, text="DIR", variable=status_dir).place(x=400, y=90)
tk.Checkbutton(window, text="JSP", variable=status_jsp).place(x=500, y=10)
tk.Checkbutton(window, text="MDB", variable=status_mdb).place(x=500, y=50)
tk.Checkbutton(window, text="PHP", variable=status_php).place(x=500, y=90)

Reading the wordlists then looks like this:

def read_dict():
    ''' load the dict '''
    combin = []
    global status_asp, status_aspx, status_dir, status_jsp, status_mdb, status_php
    if status_asp.get():
        with open('ASP.txt') as f:
            combin.extend(f.read().split())
    if status_aspx.get():
        with open('ASPX.txt') as f:
            combin.extend(f.read().split())
    if status_dir.get():
        with open('DIR.txt') as f:
            combin.extend(f.read().split())
    if status_jsp.get():
        with open('JSP.txt') as f:
            combin.extend(f.read().split())
    if status_mdb.get():
        with open('MDB.txt') as f:
            combin.extend(f.read().split())
    if status_php.get():
        with open('PHP.txt') as f:
            combin.extend(f.read().split())
    return combin

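Of course, this is not elegant. Had the layout been done with pack, changing it would probably be simpler, because place is far less flexible. Still, we can try to tidy it up, for example: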

dicts = ["ASP", "ASPX", "DIR", "JSP", "MDB", "PHP"]
checkboxes = []
for i in range(3):
    checkboxes.append(tk.BooleanVar())
    tk.Checkbutton(window, text=dicts[i], variable=checkboxes[i]).place(x=400, y=10+40*i)
for i in range(3,6):
    checkboxes.append(tk.BooleanVar())
    tk.Checkbutton(window, text=dicts[i], variable=checkboxes[i]).place(x=500, y=10+40*(i-3))

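Much less code, and much easier on the eyes. The catch is the same as before: with place I ended up with two loops here. You could try laying the boxes out with pack instead and get it done in a single loop.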
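
The two loops differ only in the column, so even with place a single loop is possible once the index is split into a column and a row. A sketch using divmod; checkbox_position is a hypothetical helper, and the numbers are the coordinates used above:

```python
def checkbox_position(index, rows_per_column=3):
    """Map a checkbox index to the (x, y) pixel position used with place()."""
    column, row = divmod(index, rows_per_column)
    return 400 + 100 * column, 10 + 40 * row


dicts = ["ASP", "ASPX", "DIR", "JSP", "MDB", "PHP"]
positions = [checkbox_position(i) for i in range(len(dicts))]
print(positions)
```

Inside the GUI loop this becomes x, y = checkbox_position(i) followed by .place(x=x, y=y).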

Reading the wordlists becomes simpler as well:

def read_dict():
    ''' load the dict '''
    global dicts, checkboxes
    combin = []
    for i in range(len(checkboxes)):
        if checkboxes[i].get():
            with open(dicts[i]+".txt", "r") as f:
                combin.extend(f.read().split())
    return combin

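I will say no more; judge the elegance for yourself. (The wordlist-loading code in the command-line version is just as redundant as the first attempt here, so try rewriting it to be cleaner and more concise.)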
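
The same loop can also be written without index arithmetic, and without the global statement, which is only needed when assigning to a module-level name. A sketch under the assumption that the wordlists sit in one directory as NAME.txt; load_wordlists and its parameters are illustrative names:

```python
import os


def load_wordlists(names, selected, directory="."):
    """Merge the wordlists whose checkbox state is True.

    names    -- wordlist base names, e.g. ["ASP", "PHP"]
    selected -- one boolean per name, e.g. [var.get() for var in checkboxes]
    """
    paths = []
    for name, is_selected in zip(names, selected):
        if not is_selected:
            continue
        filename = os.path.join(directory, name + ".txt")
        # Assumption: a missing wordlist is skipped rather than treated as fatal.
        if not os.path.isfile(filename):
            continue
        with open(filename, "r") as f:
            paths.extend(f.read().split())
    return paths
```

Taking plain booleans instead of BooleanVar objects keeps the function independent of tkinter, which makes it easy to try out on its own.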

Run button (4)

Part four is a single button, so without further thought: Button.

Clicking the button starts the scanner.

def run():
    WST = WebScannerTkl()
    WST.run()
    
def scan():
    threading.Thread(target=run).start()
    
button = tk.Button(window, text="Let's go", command=scan)
button.place(x=280, y=150)

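There is not much to say here. The run function creates the scanner object and calls the object's run method to start scanning (do not confuse the two runs). Most of this class is the same as the command-line version from the previous article, so we only need to inherit from it and override a few methods. (More on that later.)

So why does clicking the button not execute run directly, instead of having scan create a thread to execute it?

Remember the GIL discussion at the end of the first article (the port scanner)? A window is displayed by an event loop that keeps redrawing it. If the main thread ran the scan itself, nothing would be servicing that loop for the duration of the scan, and the window would appear to freeze. If you have forgotten the details, go back and reread that article.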
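
Tk widgets are generally meant to be touched from the thread that runs mainloop, so a common pattern is to have the worker put results on a queue.Queue and let the GUI thread poll it with after(). The sketch below is an alternative to calling the widgets from the worker directly, not what this article's code does; worker, drain, and poll are hypothetical names, and text stands for a tk.Text widget.

```python
import queue
import threading

results = queue.Queue()


def worker(items):
    """Stand-in for the scan: runs off the GUI thread, only touches the queue."""
    for item in items:
        results.put("checked " + item)


def drain(q):
    """Return every message currently waiting in the queue without blocking."""
    messages = []
    while True:
        try:
            messages.append(q.get_nowait())
        except queue.Empty:
            return messages


def poll(window, text):
    """Runs on the GUI thread: copy pending messages into the Text widget."""
    for message in drain(results):
        text.insert("end", message + "\n")
    # Ask Tk to call poll again in 100 ms, which keeps the event loop free.
    window.after(100, poll, window, text)


thread = threading.Thread(target=worker, args=(["/admin", "/login"],))
thread.start()
thread.join()  # only for this demo; a GUI would call poll(window, t) instead
messages = drain(results)
print(messages)
```

In the real window you would call poll(window, t) once before window.mainloop(); after that it keeps rescheduling itself.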

Output box (5)

Part five shows the scan results as they arrive. It needs no other features, so a Text widget is enough.

t = tk.Text(window, height=10)
t.place(x=10, y=220)

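(Another apology: this code was written long ago and the variable names are maddening. Remember to choose better ones when you write your own.)

That completes the five main parts. There is one more piece: a progress label in the lower-right corner of the window, which you can see in the animation at the top of the article.

It is a Label whose bound variable is updated as the scan runs. Besides this one, the "HTTP URL", thread-count, and timeout captions in front of the input boxes are Labels too.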

var_progress = tk.StringVar()

tk.Label(window, text='HTTP URL:').place(x=20, y=10)
tk.Label(window, text='Threads:').place(x=20, y=50)
tk.Label(window, text='Timeout:').place(x=20, y=90)
# place() returns None, so the Label is not assigned to a name
tk.Label(window, textvariable=var_progress).place(x=500, y=370)

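That completes the GUI.

Class design

As mentioned earlier, the core scanning code is nearly identical to the command-line version, so here is that version first: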

class WebScanner(object):

    _running_time = 0
    _number_of_threads_completed = 0
    
    def __init__(self, host, thread_num, paths):
        """initialization object"""
        
        self.host = host if host[-1] != '/' else host[:-1]
        self.thread_num = thread_num
        self.paths = paths

    def animation(self):
        """opening animation"""
        
        return r'''
 __      __      ___.     _________                                         
/  \    /  \ ____\_ |__  /   _____/ ____ _____    ____   ____   ___________ 
\   \/\/   // __ \| __ \ \_____  \_/ ___\\__  \  /    \ /    \_/ __ \_  __ \
 \        /\  ___/| \_\ \/        \  \___ / __ \|   |  \   |  \  ___/|  | \/
  \__/\  /  \___  >___  /_______  /\___  >____  /___|  /___|  /\___  >__|   
       \/       \/    \/        \/     \/     \/     \/     \/     \/       
        '''

    def run(self):
        """start scanner"""

        WebScanner._running_time = time.time()
        for i in range(self.thread_num):
            threading.Thread(target=self._subthread).start()

    def _subthread(self):
        "get url from dictionary and try to connect"

        while self.paths:
            sub_url = self.paths.pop()
            headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                        'Chrome/51.0.2704.63 Safari/537.36'}
            url = self.host + sub_url
            req = urllib.request.Request(url=url, headers=headers)
            try:
                # Discard requests that take longer than one second
                request = urllib.request.urlopen(req, timeout=1)
                result = request.geturl(), request.getcode(), len(request.read())
                print(result)
            except Exception:
                # Timeouts, HTTP errors and refused connections all count as a miss
                pass
        WebScanner._number_of_threads_completed += 1
        if WebScanner._number_of_threads_completed == self.thread_num:
            print("Cost time {} seconds.".format(time.time() - WebScanner._running_time))

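Let us work out which parts need to change.

We do not need animation, so ignore it. The run method needs no changes. The thread function, however, does.

The command-line version prints each result directly, whereas the GUI version has to write it into the Text widget; that is the first change. Is there anything else? Yes, the progress display. Every time a path has been tried, the count in the lower-right Label must go up by one so that it reflects the current state. The Text widget and the variable bound to the Label are both passed in when the object is initialized and stored as attributes. Add the timeout, and that makes three new constructor parameters in total. Analysis done.

In summary, only _subthread and __init__ need to be modified: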

class WebScannerTkl(WebScanner):

    def __init__(self, host, thread_num, paths, timeout, print_, progress):
        super().__init__(host, thread_num, paths)
        self.print = print_        # Text widget that receives the results
        self.progress = progress   # StringVar bound to the progress Label
        self.timeout = timeout     # per-request timeout in seconds
        # Progress is shared by all worker threads, so it is counted in one place
        self.total = len(paths)
        self.done = 0
        self.lock = threading.Lock()

    def _subthread(self):
        "get url from dictionary and try to connect"
        headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                    'Chrome/51.0.2704.63 Safari/537.36'}
        while self.paths:
            try:
                sub_url = self.paths.pop()
            except IndexError:
                # Another thread took the last path between the check and the pop
                break
            url = self.host + sub_url
            req = urllib.request.Request(url=url, headers=headers)
            try:
                # Discard requests that take longer than self.timeout seconds
                request = urllib.request.urlopen(req, timeout=self.timeout)
                result = request.geturl(), request.getcode(), len(request.read())
                self.print.insert("insert", str(result)+"\n")
            except Exception:
                # Timeouts, HTTP errors and refused connections all count as a miss
                pass
            finally:
                with self.lock:
                    self.done += 1
                    done = self.done
                self.progress.set("Progress: " + str(done) + '/' + str(self.total))
        WebScanner._number_of_threads_completed += 1
        if WebScanner._number_of_threads_completed == self.thread_num:
            self.print.insert("insert","Cost time {} seconds.".format(time.time() - WebScanner._running_time))

Can you feel the power of inheritance now? Think of how much code we did not have to rewrite.
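
If you are free to edit the command-line class as well, the override can shrink further: let the base class send every result through one small method and override only that. This is a sketch of the idea with made-up names (BaseScanner, report, probe, CollectingScanner), not the code from the previous article; probe stands in for the urllib request so that the example runs offline.

```python
class BaseScanner:
    """Walks the path list and hands each hit to report()."""

    def __init__(self, host, paths, probe):
        self.host = host.rstrip("/")
        self.paths = list(paths)
        self.probe = probe  # callable: url -> status code, or None on failure

    def report(self, result):
        """Command-line behaviour: print the hit."""
        print(result)

    def scan(self):
        for path in self.paths:
            url = self.host + path
            status = self.probe(url)
            if status is not None:
                self.report((url, status))


class CollectingScanner(BaseScanner):
    """GUI-style subclass: only the output step changes."""

    def __init__(self, host, paths, probe, sink):
        super().__init__(host, paths, probe)
        self.sink = sink  # a list here; a wrapper around a Text widget in a GUI

    def report(self, result):
        self.sink.append(result)


def fake_probe(url):
    """Pretend that only /admin exists on the server."""
    return 200 if url.endswith("/admin") else None


hits = []
CollectingScanner("http://example.com/", ["/admin", "/nope"], fake_probe, hits).scan()
print(hits)
```

With this split the request loop lives in exactly one class, and a GUI subclass never has to copy it.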

With that, the function run by the button can be finalized:

def run():
    WST = WebScannerTkl(entry.get(), int(thread_num.get()),
                        read_dict(), float(timeout.get()),
                        t, var_progress)
    WST.run()

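The arguments passed in for initialization are, in order: the root URL, the thread count, the list of wordlist entries, the timeout, the output box (Text), and the variable bound to the progress label. (Same caveat as before: the variable names are poor, but the idea should be clear.)

Program listing

Complete code: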

import tkinter as tk
from tkinter import ttk
from web import WebScanner
import threading
import urllib.request
import time

class WebScannerTkl(WebScanner):

    def __init__(self, host, thread_num, paths, timeout, print_, progress):
        super().__init__(host, thread_num, paths)
        self.print = print_        # Text widget that receives the results
        self.progress = progress   # StringVar bound to the progress Label
        self.timeout = timeout     # per-request timeout in seconds
        # Progress is shared by all worker threads, so it is counted in one place
        self.total = len(paths)
        self.done = 0
        self.lock = threading.Lock()

    def _subthread(self):
        "get url from dictionary and try to connect"
        headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                    'Chrome/51.0.2704.63 Safari/537.36'}
        while self.paths:
            try:
                sub_url = self.paths.pop()
            except IndexError:
                # Another thread took the last path between the check and the pop
                break
            url = self.host + sub_url
            req = urllib.request.Request(url=url, headers=headers)
            try:
                # Discard requests that take longer than self.timeout seconds
                request = urllib.request.urlopen(req, timeout=self.timeout)
                result = request.geturl(), request.getcode(), len(request.read())
                self.print.insert("insert", str(result)+"\n")
            except Exception:
                # Timeouts, HTTP errors and refused connections all count as a miss
                pass
            finally:
                with self.lock:
                    self.done += 1
                    done = self.done
                self.progress.set("Progress: " + str(done) + '/' + str(self.total))
        WebScanner._number_of_threads_completed += 1
        if WebScanner._number_of_threads_completed == self.thread_num:
            self.print.insert("insert","Cost time {} seconds.".format(time.time() - WebScanner._running_time))


def read_dict():
    ''' load the dict '''
    global dicts, checkboxes
    combin = []
    for i in range(len(checkboxes)):
        if checkboxes[i].get():
            with open(dicts[i]+".txt", "r") as f:
                combin.extend(f.read().split())
    return combin

def run():
    WST = WebScannerTkl(entry.get(), int(thread_num.get()),
                        read_dict(), float(timeout.get()),
                        t, var_progress)
    WST.run()


window = tk.Tk()
window.title('Web Scanner    by 刑者')
window.geometry('600x400')
window.config(background='#CCE8CF')

entry = tk.Entry(window)
entry.place(x=100, y=10)

thread_num = tk.StringVar()
comboxlist = ttk.Combobox(window, textvariable=thread_num)
comboxlist["values"] = ('1', '3', '5')
comboxlist.current(0)
comboxlist.place(x=100, y=50)


timeout = tk.StringVar()
comboxlist1 = ttk.Combobox(window, textvariable=timeout)
comboxlist1["values"] = ('1', '0.5', '0.1')
comboxlist1.current(0)
comboxlist1.place(x=100, y=90)

dicts = ["ASP", "ASPX", "DIR", "JSP", "MDB", "PHP"]
checkboxes = []
for i in range(3):
    checkboxes.append(tk.BooleanVar())
    tk.Checkbutton(window, text=dicts[i], variable=checkboxes[i]).place(x=400, y=10+40*i)
for i in range(3,6):
    checkboxes.append(tk.BooleanVar())
    tk.Checkbutton(window, text=dicts[i], variable=checkboxes[i]).place(x=500, y=10+40*(i-3))

t = tk.Text(window, height=10)
t.place(x=10, y=220)

def scan():
    threading.Thread(target=run).start()
    
button = tk.Button(window, text="Let's go", command=scan)
button.place(x=280, y=150)

var_progress = tk.StringVar()

tk.Label(window, text='HTTP URL:').place(x=20, y=10)
tk.Label(window, text='Threads:').place(x=20, y=50)
tk.Label(window, text='Timeout:').place(x=20, y=90)
# place() returns None, so the Label is not assigned to a name
tk.Label(window, textvariable=var_progress).place(x=500, y=370)

window.mainloop()

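Some places have no comments, but we went through everything step by step, so it should not be hard to follow.

I realize the ending is a bit abrupt, but there really is nothing left worth writing about. Many of the issues were already discussed in the previous two articles and need no repeating. If you still have questions, see you in the comments.

(The next article is a PHP one-line webshell connection tool (GUI version). See you next month.)

The end.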
