I always felt my old background URL scanner was a bit slow, so today I studied multithreaded programming online; here are my notes.
——————————————————————————————
Python multithreading is usually done with the threading module (there is also a lower-level thread module, which I have not used).
A thread is created with this statement:
import threading
t = threading.Thread(target=func, args=(arg1, arg2, ...))
Here t is a thread object, where:
the target parameter is the function the thread calls back once start() is invoked;
the args parameter is an n-tuple of arguments passed to that callback function.
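For instance, here is a minimal sketch with a concrete callback (worker and its two arguments are placeholder names I made up, not part of the scanner below):

import threading

def worker(name, count):
    # this function runs inside the child thread
    for i in range(count):
        print "[thread %s] round %d" % (name, i)

t = threading.Thread(target=worker, args=("demo", 3))
t.start()    # worker("demo", 3) starts running in the new thread
t.join()     # wait here until worker returns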
Multiple threads are usually created with a for loop:
threads = []
for i in range(3):
    t = threading.Thread(target=func, args=(arg1, arg2, ...))
    threads.append(t)
    t.setDaemon(True)
    t.start()
The code above creates and starts 3 threads almost simultaneously (usually within the same second). The call to setDaemon(True) marks t as a daemon thread running in the background; daemon threads are killed automatically as soon as the main program exits, so the main thread has to wait for them explicitly.
When using multithreading, be aware that the parent thread may finish before its child threads do. To prevent this, the join() method is used:
for t in threads:
    t.join()
This snippet goes right after the previous one: the parent thread keeps blocking until the func of every child thread appended to the threads list has finished.
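Putting the pieces together, here is a minimal runnable sketch of the whole pattern (worker is a placeholder that just simulates a slow task, such as a network request):

import threading
import time

def worker(n):
    time.sleep(1)              # pretend to do something slow
    print "[+] thread %d done" % n

threads = []
for i in range(3):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.setDaemon(True)          # daemon threads are killed when the main thread exits
    t.start()

for t in threads:
    t.join()                   # block the parent until every worker has finished
print "[*] all threads finished"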
Multithreading is most useful when sending URL requests: sending them one at a time wastes a lot of time waiting for each response, while sending many requests in parallel keeps that waiting to a minimum. Below is my background URL scanner after the multithreading optimization (I also added handling of page redirects; the pre-modification code is here: http://blog.csdn.net/qq_29947311/article/details/52259632)
import urllib
import urllib2
import re
import threading

# rewrite the redirect handler: returning None for 301/302 means the redirect
# is not followed and urllib2 raises an HTTPError instead
class RedirctHandler(urllib2.HTTPRedirectHandler):
    """docstring for RedirctHandler"""
    def http_error_301(self, req, fp, code, msg, headers):
        pass
    def http_error_302(self, req, fp, code, msg, headers):
        pass

# install the handler, otherwise the class above is never actually used
opener = urllib2.build_opener(RedirctHandler)
urllib2.install_opener(opener)

agent = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0"
headers = {"User-Agent": agent}
fpout = open("urlok.txt", "w+")
fpres = open("result.txt", "w")
threads1 = []
threads2 = []

def urlSend(url):
    try:
        req = urllib2.Request(url = url, headers = headers)
        res = urllib2.urlopen(req)
        print "[+] Success"
        fpout.write(url)
        return 1
    except urllib2.HTTPError, e:
        print "[-] " + str(e.code)
        return 0
    except urllib2.URLError, e:
        print "[-] " + str(e.reason)
        return 0

# to use multithreading, the check that used to sit in main() is moved into a
# new function (otherwise the return value is awkward to handle)
def newUrlSend(newurl):
    pattern = re.compile("<title>(.*?)</title>")
    if urlSend(newurl):
        print "[+] " + newurl.strip("\n") + " OPEN"
        req = urllib2.Request(url = newurl, headers = headers)
        res = urllib2.urlopen(req)
        f = pattern.search(res.read())
        if f:
            fpres.write(newurl.strip("\n") + " title:" + f.group(1) + "\n")
            print "[+] title is:" + f.group(1) + "\n"
        else:
            print "[-] No title \n"
            fpres.write(newurl + "\n")
    else:
        print "[-] " + newurl.strip("\n") + " CLOSE\n"

def main():
    fpurl = open("URLlist.txt", "r")
    fpdic = open("dictionary.txt", "r")
    # ---------------------------- first optimization ----------------------------
    for line in fpurl.readlines():
        print "[*] Linking " + line.strip("\n") + " now..."
        t = threading.Thread(target = urlSend, args = (line,))
        threads1.append(t)
        t.setDaemon(True)
        t.start()
    for t in threads1:
        t.join()
    # -----------------------------------------------------------------------------
    print "--------------------------Scan Start--------------------------------------"
    fpurl.close()
    fpout.seek(0, 0)
    for line in fpout.readlines():
        for item in fpdic.readlines():
            # ---------------------- second optimization ----------------------
            newurl = re.sub(r"/\w+\.php", "/" + item, line)
            t = threading.Thread(target = newUrlSend, args = (newurl,))
            threads2.append(t)
            t.start()
        fpdic.seek(0, 0)
    for t in threads2:
        t.join()
    # ------------------------------------------------------------------------------
    fpdic.close()
    fpout.close()
    fpres.close()

if __name__ == "__main__":
    main()
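A side note on the second optimization: re.sub swaps the /xxx.php part of each open URL for an entry from the dictionary. A quick illustration (the URL and dictionary entry below are made-up examples):

import re

line = "http://www.example.com/index.php\n"   # a line from urlok.txt (hypothetical)
item = "admin.php\n"                          # a line from dictionary.txt (hypothetical)
newurl = re.sub(r"/\w+\.php", "/" + item, line)
print newurl.strip()    # -> http://www.example.com/admin.php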
Much faster now!