I always felt my old background URL scanner was a bit slow, so today I studied multithreaded programming online; here are my notes.
——————————————————————————————
Python multithreading is usually done with the threading module (there is also a lower-level thread module, which I have not used).
A thread is created with this statement:
import threading
t = threading.Thread(target=func, args=(arg1, arg2, ...))
Here t is a thread object, where:
the target parameter is the function the thread calls back once start() is invoked;
the args parameter is an n-tuple of arguments passed to that callback function.
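For instance, here is a minimal sketch with a concrete callback (worker and its two arguments are placeholder names I made up, not part of the scanner below):

import threading

def worker(name, count):
    # this function runs inside the child thread
    for i in range(count):
        print "[thread %s] round %d" % (name, i)

t = threading.Thread(target=worker, args=("demo", 3))
t.start()    # worker("demo", 3) starts running in the new thread
t.join()     # wait here until worker returns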
Multiple threads are usually created with a for loop:
threads = []
for i in range(3):
    t = threading.Thread(target=func, args=(arg1, arg2, ...))
    threads.append(t)
    t.setDaemon(True)
    t.start()
The code above creates and starts 3 threads almost simultaneously (usually within the same second). The call to setDaemon(True) marks t as a daemon thread running in the background; daemon threads are killed automatically as soon as the main program exits, so the main thread has to wait for them explicitly.
When using multithreading, be aware that the parent thread may finish before its child threads do. To prevent this, the join() method is used:
for t in threads:
    t.join()
This snippet goes right after the previous one: the parent thread keeps blocking until the func of every child thread appended to the threads list has finished.
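Putting the pieces together, here is a minimal runnable sketch of the whole pattern (worker is a placeholder that just simulates a slow task, such as a network request):

import threading
import time

def worker(n):
    time.sleep(1)              # pretend to do something slow
    print "[+] thread %d done" % n

threads = []
for i in range(3):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.setDaemon(True)          # daemon threads are killed when the main thread exits
    t.start()

for t in threads:
    t.join()                   # block the parent until every worker has finished
print "[*] all threads finished"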
Multithreading is most useful when sending URL requests: sending them one at a time wastes a lot of time waiting for each response, while sending many requests in parallel keeps that waiting to a minimum. Below is my background URL scanner after the multithreading optimization (I also added handling of page redirects; the pre-modification code is here: http://blog.csdn.net/qq_29947311/article/details/52259632)
import urllib
import urllib2
import re
import threading

# rewrite the redirect handler: returning None for 301/302 means the redirect
# is not followed and urllib2 raises an HTTPError instead
class RedirctHandler(urllib2.HTTPRedirectHandler):
    """docstring for RedirctHandler"""
    def http_error_301(self, req, fp, code, msg, headers):
        pass
    def http_error_302(self, req, fp, code, msg, headers):
        pass

# install the handler, otherwise the class above is never actually used
opener = urllib2.build_opener(RedirctHandler)
urllib2.install_opener(opener)

agent = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0"
headers = {"User-Agent": agent}
fpout = open("urlok.txt", "w+")
fpres = open("result.txt", "w")
threads1 = []
threads2 = []

def urlSend(url):
    try:
        req = urllib2.Request(url = url, headers = headers)
        res = urllib2.urlopen(req)
        print "[+] Success"
        fpout.write(url)
        return 1
    except urllib2.HTTPError, e:
        print "[-] " + str(e.code)
        return 0
    except urllib2.URLError, e:
        print "[-] " + str(e.reason)
        return 0

# to use multithreading, the check that used to sit in main() is moved into a
# new function (otherwise the return value is awkward to handle)
def newUrlSend(newurl):
    pattern = re.compile("<title>(.*?)</title>")
    if urlSend(newurl):
        print "[+] " + newurl.strip("\n") + " OPEN"
        req = urllib2.Request(url = newurl, headers = headers)
        res = urllib2.urlopen(req)
        f = pattern.search(res.read())
        if f:
            fpres.write(newurl.strip("\n") + " title:" + f.group(1) + "\n")
            print "[+] title is:" + f.group(1) + "\n"
        else:
            print "[-] No title \n"
            fpres.write(newurl + "\n")
    else:
        print "[-] " + newurl.strip("\n") + " CLOSE\n"

def main():
    fpurl = open("URLlist.txt", "r")
    fpdic = open("dictionary.txt", "r")
    # ---------------------------- first optimization ----------------------------
    for line in fpurl.readlines():
        print "[*] Linking " + line.strip("\n") + " now..."
        t = threading.Thread(target = urlSend, args = (line,))
        threads1.append(t)
        t.setDaemon(True)
        t.start()
    for t in threads1:
        t.join()
    # -----------------------------------------------------------------------------
    print "--------------------------Scan Start--------------------------------------"
    fpurl.close()
    fpout.seek(0, 0)
    for line in fpout.readlines():
        for item in fpdic.readlines():
            # ---------------------- second optimization ----------------------
            newurl = re.sub(r"/\w+\.php", "/" + item, line)
            t = threading.Thread(target = newUrlSend, args = (newurl,))
            threads2.append(t)
            t.start()
        fpdic.seek(0, 0)
    for t in threads2:
        t.join()
    # ------------------------------------------------------------------------------
    fpdic.close()
    fpout.close()
    fpres.close()

if __name__ == "__main__":
    main()
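A side note on the second optimization: re.sub swaps the /xxx.php part of each open URL for an entry from the dictionary. A quick illustration (the URL and dictionary entry below are made-up examples):

import re

line = "http://www.example.com/index.php\n"   # a line from urlok.txt (hypothetical)
item = "admin.php\n"                          # a line from dictionary.txt (hypothetical)
newurl = re.sub(r"/\w+\.php", "/" + item, line)
print newurl.strip()    # -> http://www.example.com/admin.php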
Much faster now!