Python3: "Study Notes and Practice" Multithreading (1): Reading URLs and Processing Data

孤云独去闲.com

Article link: https://blog.csdn.net/weixin_41858342/article/details/88048957

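I. Multithreading and process concepts

1. Multithreading and queues

In Python, the threads of a process share the same data. When several threads exchange data without any synchronization, the safety and consistency of that data cannot be guaranteed. This is what queues are for: queue.Queue does its own locking internally, so threads can hand data to each other through it without corrupting it.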
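As a minimal sketch of the idea above, one thread can hand values to another through a queue.Queue. The names producer, consumer and run_demo, and the use of None as an end marker, are illustrative assumptions and not part of the original program.

```python
import queue
import threading


def producer(q, n):
    # Put n integers on the queue, then a sentinel (None) to signal the end.
    for i in range(n):
        q.put(i)
    q.put(None)


def consumer(q, results):
    # Take items off the queue until the sentinel arrives.
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * item)


def run_demo(n=5):
    q = queue.Queue()
    results = []
    t_prod = threading.Thread(target=producer, args=(q, n))
    t_cons = threading.Thread(target=consumer, args=(q, results))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

Neither thread touches the other's data directly; the queue is the only shared object, and it handles the locking.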

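2. Queues, stacks, deques, and heaps

Python 3 modules for queues:

import queue

from collections import deque

FIFO queue: first in, first out; provided by queue.Queue.

LIFO stack: (last in, first out) a stack is a linear data structure whose data can only be stored and retrieved through one end; queue.LifoQueue is the thread-safe version.

Deque: (short for double-ended queue) a data structure that has the properties of both a queue and a stack; provided by collections.deque.

Heap: a binary heap is a special complete binary tree in which

every parent node's value is no greater than (or no less than) the values of its children;

the root's value is therefore the smallest (or largest) of all nodes.

A heap whose parents are no greater than their children, with the smallest value at the root, is called a min-heap; the opposite is a max-heap.

The heap is a more advanced data structure; Python provides it in the heapq module, which implements a min-heap.

Multithreaded programs generally use a FIFO queue.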
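The four structures differ mainly in the order in which items come back out. The sketch below shows that order for each one; the helper names (fifo_order, lifo_order, deque_both_ends, heap_order) are made up for this illustration.

```python
import heapq
import queue
from collections import deque


def fifo_order(items):
    # queue.Queue returns items in the order they were put.
    q = queue.Queue()
    for x in items:
        q.put(x)
    return [q.get() for _ in items]


def lifo_order(items):
    # queue.LifoQueue returns the most recently put item first.
    s = queue.LifoQueue()
    for x in items:
        s.put(x)
    return [s.get() for _ in items]


def deque_both_ends(items):
    # A deque can push and pop at either end.
    d = deque(items)
    d.appendleft("front")
    d.append("back")
    return d.popleft(), d.pop(), list(d)


def heap_order(items):
    # heapq maintains a min-heap: heappop always returns the smallest item.
    h = list(items)
    heapq.heapify(h)
    return [heapq.heappop(h) for _ in range(len(h))]


if __name__ == "__main__":
    data = [3, 1, 2]
    print(fifo_order(data))
    print(lifo_order(data))
    print(deque_both_ends(data))
    print(heap_order(data))
```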

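3. Common methods of the queue module (all are methods of a queue.Queue instance)

Queue.qsize() returns the approximate size of the queue

Queue.empty() returns True if the queue is empty, otherwise False

Queue.full() returns True if the queue is full, otherwise False

Queue.full() depends on maxsize: a queue created with maxsize <= 0 has no size limit and is never full

Queue.get([block[, timeout]]) removes and returns one item; by default it blocks until an item is available; timeout is the maximum number of seconds to wait, after which queue.Empty is raised

Queue.put(item[, block[, timeout]]) puts one item into the queue; by default it blocks until a free slot is available; timeout is the maximum number of seconds to wait, after which queue.Full is raised

Queue.get_nowait() is equivalent to Queue.get(False)

Queue.put_nowait(item) is equivalent to Queue.put(item, False)

Queue.join() blocks the calling thread until every item that was put into the queue has been retrieved and marked with task_done(); an empty queue alone is not enough, the workers must also have reported their items as done

Queue.task_done() is called by a consumer after it finishes working on an item it got from the queue; it tells the queue that this task is complete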
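The methods listed above can be tried out in a small sketch. It assumes a pool of workers that drain a queue and report each item with task_done(), plus a bounded queue to show full() and put_nowait(). The names worker, process_all and bounded_demo, and the 0.1 second timeout, are choices made for this example only.

```python
import queue
import threading


def worker(q, done, lock):
    # Keep taking items until the queue stays empty for 0.1 seconds.
    while True:
        try:
            item = q.get(timeout=0.1)
        except queue.Empty:
            return
        with lock:
            done.append(item)
        # Tell the queue this item is finished so q.join() can return.
        q.task_done()


def process_all(items, num_workers=3):
    q = queue.Queue()
    for item in items:
        q.put(item)
    done = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, done, lock))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    q.join()  # returns once task_done() was called for every item
    for t in threads:
        t.join()
    return sorted(done)


def bounded_demo():
    # With maxsize=2 the third put_nowait() raises queue.Full.
    q = queue.Queue(maxsize=2)
    q.put_nowait("a")
    q.put_nowait("b")
    is_full = q.full()
    try:
        q.put_nowait("c")
        overflowed = False
    except queue.Full:
        overflowed = True
    return is_full, overflowed, q.qsize()


if __name__ == "__main__":
    print(process_all(range(10)))
    print(bounded_demo())
```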

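4. The threading module in Python 3

import threading

Create a thread: t = threading.Thread(target=get_info)

target - the thread function.
args - positional arguments passed to the thread function; must be a tuple.
kwargs - optional dict of keyword arguments passed to the thread function.
A complete thread workflow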

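import threading
import time

# get_info is the worker function defined in the full program below
startTime = time.time()
threads = []
# The number of threads can be tuned to control how fast the API is called
threadNum = 30

# Create the threads and add them to the thread list
for i in range(threadNum):
    t = threading.Thread(target=get_info)
    threads.append(t)

# Start the threads
for t in threads:
    t.start()

# Wait for all threads to finish.
# Calling join() on each thread in turn makes sure the main thread exits last;
# the worker threads do not block one another while the main thread waits.
for t in threads:
    t.join()
endTime = time.time()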
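The workflow above starts a function that takes no arguments. As a hedged sketch of how args and kwargs are passed, the example below uses a made-up function greet and a shared list out; neither appears in the original program.

```python
import threading


def greet(name, punctuation="!", out=None):
    # list.append is safe to call from several threads in CPython.
    out.append(f"hello {name}{punctuation}")


def run_threads():
    out = []
    threads = [
        # args must be a tuple, so a single argument needs a trailing comma.
        threading.Thread(target=greet, args=("a",), kwargs={"out": out}),
        threading.Thread(target=greet, args=("b",),
                         kwargs={"punctuation": "?", "out": out}),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Thread scheduling order is not guaranteed, so sort before returning.
    return sorted(out)


if __name__ == "__main__":
    print(run_threads())
```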

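5. A few basic multithreading concepts

Each thread has its own entry point, its own sequential flow of execution, and its own exit point.
A thread cannot run on its own, however; it must live inside an application, which controls the execution of its threads.
Each thread has its own set of CPU registers, called the thread's context, which reflects the state of the CPU registers when the thread last ran.
The instruction pointer and the stack pointer are the two most important registers in a thread's context. A thread always runs in the context of a process, and these addresses refer to memory in the address space of the process that owns the thread.
The "thread" module can no longer be used in Python 3. For compatibility, Python 3 renamed it to "_thread"; new code should use threading.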

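6. Common threading functions and methods

threading.current_thread(): returns the current Thread object (currentThread() is the older, deprecated spelling).
threading.enumerate():
returns a list of the threads that are currently running. Running means started and not yet finished; threads that have not been started or that have already terminated are not included.

threading.active_count():
returns the number of running threads, the same value as len(threading.enumerate()) (activeCount() is the older, deprecated spelling).

Besides these functions, the threading module also provides the Thread class, which has the following methods:

run(): the method that represents the thread's activity.
start(): starts the thread's activity.
join([timeout]): waits until the thread terminates. It blocks the calling thread until the thread whose join() was called ends, either normally or through an unhandled exception, or until the optional timeout elapses.
is_alive(): returns whether the thread is alive (the older isAlive() was removed in Python 3.9).
getName(): returns the thread name (deprecated; read the name attribute instead).
setName(): sets the thread name (deprecated; assign to the name attribute instead).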
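A short sketch of these calls in action, using the current spellings (current_thread, active_count, is_alive, name). The function inspect_threads and the thread name "worker-1" are illustrative assumptions; the worker simply waits on a threading.Event so that it stays alive while it is inspected.

```python
import threading


def inspect_threads():
    stop = threading.Event()
    # The worker blocks in stop.wait() until the event is set.
    t = threading.Thread(target=stop.wait, name="worker-1")
    before = t.is_alive()          # not started yet
    t.start()
    during = t.is_alive()          # started, still waiting on the event
    listed = t in threading.enumerate()
    count_matches = threading.active_count() == len(threading.enumerate())
    stop.set()                     # let the worker finish
    t.join(timeout=5)
    after = t.is_alive()           # finished
    return t.name, before, during, listed, after, count_matches


if __name__ == "__main__":
    print(threading.current_thread().name)
    print(inspect_threads())
```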

import json
import queue
import threading
import time
import traceback

import numpy as np
import pymysql
import requests

# A stack is last in, first out and has no iterator
# A deque is a double-ended queue and supports iteration
# A queue is first in, first out and does not support iteration

# Keyword arguments are used because recent PyMySQL versions reject positional ones
conn = pymysql.connect(host="192.168.1.1", user="user", password="yuanye",
                       database="data", charset='utf8')

# Education score
        
def get_info():
    # Worker: take "url$id" items off the shared queue until it is empty.
    while True:
        try:
            # Non-blocking read; queue.Empty means all work has been claimed
            info = urlQueue.get_nowait().split('$')
        except queue.Empty:
            break
        url = info[0]
        id = int(info[1])

        try:
            res = requests.get(url)
            get_education(res.text, id)
        except Exception:
            print("\nFailed to fetch the JSON data\n")
            traceback.print_exc()
            # As in the original, this worker stops after a failed request
            return 0
        
def get_education(info, id):
    try:
        pri_school = 0.0
        mid_school = 0.0
        university_score = 0.0
        result_score = 0.0
        info = json.loads(info)

        for i in info["ext"]:
            # Only entries with parent_id 7 (education) are scored
            if i["parent_id"] != 7:
                continue

            score = round(education_score(i["class_id"], i["level"], i["distance"]), 1)
            if i["class_id"] == 22:      # university
                university_score += score
            elif i["class_id"] == 21:    # middle school
                mid_school += score
            else:                        # primary school (and any other class_id)
                pri_school += score

            # The steps below run after every scored entry, as in the original.
            # Normalize the university score. Based on a spot check around
            # Sichuan University (IDs 1:20000), 10 points was chosen as the
            # maximum for min-max normalization and 0 as the minimum, i.e.
            # (X - MIN) / (MAX - MIN); scores of 5 or less are not normalized.
            # Note: the code divides by 8, not by the 10 mentioned above.
            if university_score > 5.0:
                university_score = (university_score / 8) * 5

            # Normalize the middle-school score, with 8 points as the maximum
            if mid_school > 4:
                mid_school = (mid_school / 8) * 5

            # Normalize result_score and cap it at 5.0
            result_score = ((university_score + mid_school + pri_school) / 7) * 5
            if result_score > 5.0:
                result_score = 5.0
            # Best parameters found as of 2018-8-11: [5, 4, 3.5, 0.7, 0.3];
            # result_score, super_score, lower_score [8, 8, 18]

        update_info(id, round(result_score, 1))

    except Exception:
        print("\nError while looping over the JSON data\n")
        traceback.print_exc()
        
def education_score(class_id, level, distance):
    y = 0.0
    class_id = int(class_id)
    level = int(level)
    dict_zx = {1: 2.0, 2: 1.8, 3: 1.4, 4: 1.1}        # middle-school base score by level
    dict_dx = {1: 5, 2: 4.5, 3: 4.2, 4: 3.6, 5: 3.2}  # university base score by level

    # Distance decay: a logistic curve that equals 0.5 at distance 1200
    # and falls towards 0 as the distance grows
    distance = 0.0025 * (int(distance) - 1200)
    sigmoid = 1 / (1 + np.exp(distance))

    try:
        if class_id == 19:
            y = 0.2  # kindergarten score
        elif class_id == 20:
            y = 0.5  # primary-school score
        elif class_id == 21:
            # middle-school score; unknown levels score 0
            y = dict_zx.get(level, 0.0)
        elif class_id == 22:
            # university score; unknown levels score 0
            y = dict_dx.get(level, 0.0)

        return y * sigmoid
    except Exception:
        print("Failed to compute the education score")
        traceback.print_exc()
        return 0


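def update_info(id, score):
    # Each call opens its own connection, so worker threads never share one
    conn_up = pymysql.connect(host="192.168.1.1", user="user", password="yuanye",
                              database="data", charset='utf8')
    # Parameterized query: the driver takes care of quoting the values
    sql = "UPDATE `store_count_street_point` SET education_score = %s WHERE id = %s"
    try:
        with conn_up.cursor() as cursor:
            # Cast to plain Python types in case score is a NumPy float
            cursor.execute(sql, (float(score), int(id)))
        conn_up.commit()
    except Exception:
        print("\nFailed to update the record\n")
        traceback.print_exc()
        return 0
    finally:
        # Close the connection whether or not the update succeeded
        conn_up.close()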
        
if __name__ == '__main__':
    cursor = conn.cursor()
    sql = "SELECT * FROM store_count_street_point where is_street=1"
    cursor.execute(sql)
    results = cursor.fetchall()
    conn.commit()
    cursor.close()

    url_first = 'http://192.168.1.1:80/data/Agent?method=get_inner_info&password=yyyyyy&geo='
    token = '&dis=1000&parent_id=7'
    # Queue of URLs shared by the worker threads
    urlQueue = queue.Queue()

    try:
        for data in results:
            value = data[2]
            url_last = url_first + value + token
            # Enqueue "url$id" so the worker knows which row to update
            urlQueue.put(url_last + '$' + str(data[0]))

        startTime = time.time()
        threads = []
        # The number of threads can be tuned to control how fast the API is called
        threadNum = 30

        # Create the threads and add them to the thread list
        for i in range(threadNum):
            t = threading.Thread(target=get_info)
            threads.append(t)

        # Start the threads
        for t in threads:
            t.start()

        # Wait for all threads to finish.
        # Calling join() on each thread in turn makes sure the main thread exits last;
        # the worker threads do not block one another while the main thread waits.
        for t in threads:
            t.join()
        endTime = time.time()
        print('Done, Time cost: %s ' % (endTime - startTime))
        print("\nEducation score calculation finished\n")
    except Exception:
        print("\nEducation score calculation did not finish\n")
        traceback.print_exc()
        
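#———————————————————— Multithreaded program ————————————————————————
# Multithreading concepts
# 1. A mutex (lock) restricts threads' access to a shared resource (synchronous blocking)
# 2. Daemon threads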
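The two concepts noted at the end of the program, mutexes and daemon threads, are not used in the code above. The following is a minimal sketch of both, not the author's implementation: locked_count and daemon_flag are hypothetical helpers written for this illustration.

```python
import threading
import time


def locked_count(num_threads=4, per_thread=10000):
    # Several threads increment one shared counter; the lock makes each
    # read-modify-write step atomic so no increment is lost.
    counter = 0
    lock = threading.Lock()

    def add():
        nonlocal counter
        for _ in range(per_thread):
            with lock:
                counter += 1

    threads = [threading.Thread(target=add) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def daemon_flag():
    # A daemon thread does not keep the interpreter alive: when only daemon
    # threads remain, the program exits without waiting for them.
    t = threading.Thread(target=time.sleep, args=(0.01,), daemon=True)
    t.start()
    is_daemon = t.daemon
    t.join()
    return is_daemon


if __name__ == "__main__":
    print(locked_count())
    print(daemon_flag())
```

A thread that must finish its work, such as one writing a score back to the database, is usually better left as a normal (non-daemon) thread and joined explicitly.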
