Python: Database Operations, Exception Handling, Multithreading, and Multiprocessing

JoyceLiu_Ronghua

Tags: database

Article link: https://blog.csdn.net/JoyceLiu_Ronghua/article/details/140500040

Database operations

Relational databases: table, row, column

SQLite：

File-based: the entire database is stored in a single file

import sqlite3

Can connect to an on-disk database file or to an in-memory database
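A minimal sketch of both connection modes using only the standard-library sqlite3 module. The table name `users`, its columns, and the helper `demo` are illustrative assumptions, not part of the original notes.

```python
import sqlite3

def demo(db_path=":memory:"):
    """Create a table, insert two rows, and read them back.

    db_path=":memory:" opens a temporary in-memory database;
    a file path such as "demo.db" creates or opens an on-disk database file.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception is raised inside the block
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
            )
            # ? placeholders pass values safely instead of formatting them into SQL
            conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
        return conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    print(demo())  # [(1, 'alice'), (2, 'bob')]
```

Each in-memory connection starts empty, so repeated calls give the same result; with a file path, the rows accumulate across runs.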

MySQL：

Client/server (C/S) architecture

Key-value and document (NoSQL) databases: collection, record/document, field

dbm：

Lightweight; keys and values must be strings or bytes
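A short sketch of the standard-library dbm interface, which behaves like a persistent dictionary. The file name `cache` and the helper `dbm_demo` are hypothetical; which backend `dbm.open` picks depends on the platform.

```python
import dbm
import os
import tempfile

def dbm_demo(path):
    # Flag "c": open for reading and writing, creating the database if needed
    with dbm.open(path, "c") as db:
        db["language"] = "Python"  # str keys and values are encoded and stored as bytes
        db[b"version"] = b"3"      # bytes can be used directly
        # Values are always returned as bytes
        return db["language"], db[b"version"]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(dbm_demo(os.path.join(tmp, "cache")))  # (b'Python', b'3')
```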

redis：

Client/server architecture, in-memory

MongoDB

Supports distributed deployment and a wide range of data types; well suited to clusters

import pymongo

Exceptions

Syntax: try...except...[else]...[finally]...
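A small sketch showing which clauses run in each case; `parse_int` is a hypothetical helper that records the clauses it passes through.

```python
def parse_int(text):
    """Convert text to int and record which clauses of the try statement ran."""
    trace = []
    try:
        value = int(text)      # may raise ValueError
    except ValueError:         # runs only if the try body raised ValueError
        trace.append("except")
        value = None
    else:                      # runs only if the try body raised no exception
        trace.append("else")
    finally:                   # always runs, whether or not an exception occurred
        trace.append("finally")
    return value, trace

print(parse_int("42"))   # (42, ['else', 'finally'])
print(parse_int("abc"))  # (None, ['except', 'finally'])
```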

Python built-in exceptions: IndexError, ValueError, ...

Custom exceptions

class ShortInputException(Exception):
    pass
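A fuller sketch of the custom exception that carries extra data. The attributes `length` and `atleast` and the validation helper `check_input` are illustrative assumptions.

```python
class ShortInputException(Exception):
    """User-defined exception raised when the input is too short."""

    def __init__(self, length, atleast):
        super().__init__(f"input has length {length}, expected at least {atleast}")
        self.length = length
        self.atleast = atleast


def check_input(text, atleast=3):
    # Hypothetical helper: raise the custom exception for input that is too short
    if len(text) < atleast:
        raise ShortInputException(len(text), atleast)
    return text


try:
    check_input("ab")
except ShortInputException as e:
    print("ShortInputException:", e)
    print(e.length, e.atleast)  # 2 3
```

Subclassing `Exception` (not `BaseException`) lets ordinary `except Exception` handlers catch it, while callers that care can catch the specific class and read its attributes.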

GUI programming

PyQt or PySide

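Libraries built on the Qt framework, offering a rich set of widgets and advanced features such as the signals-and-slots mechanism

Suitable for developing complex desktop applications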

Multithreading

Creating a thread

import threading

def sub_task(*args):
    start, end = args[0], args[1]
    print(f"Thread {threading.current_thread().name} started, args:", start, end)
    return sum(range(start, end))

def main():
    subThread = threading.Thread(target=sub_task, args=(1, 10))
    subThread.start()
    # join() waits for the thread to finish but always returns None,
    # so the value returned by sub_task is lost here
    result = subThread.join()
    print("Thread", subThread.name, "ended, result:", result)
    print("Main thread ended.")

main()

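The return value of a thread started with threading.Thread cannot be retrieved this way (join() returns None); one way to obtain it is to run the task in a thread pool.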

from concurrent.futures import ThreadPoolExecutor

def thread_function():
    return 42

# Run the task in a thread pool
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(thread_function)
    result = future.result()  # blocks until the task finishes, then returns its value
    print(f"Thread return value: {result}")
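Applied to the `sub_task` example from the start of this section, a thread pool returns one Future per submitted task. This sketch rewrites `sub_task` with explicit parameters instead of `*args` and adds a hypothetical helper `run_tasks`; both are assumptions for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def sub_task(start, end):
    print(f"Thread {threading.current_thread().name} started, args:", start, end)
    return sum(range(start, end))

def run_tasks(ranges):
    # submit() schedules each call and returns a Future immediately;
    # result() blocks until that particular task has finished
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(sub_task, start, end) for start, end in ranges]
        return [f.result() for f in futures]

print(run_tasks([(1, 10), (10, 20)]))  # [45, 145]
```

Collecting results in submission order keeps the output aligned with the inputs; `executor.map` would give the same ordering with less code.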

Thread locks

Create a lock with threading.Lock(); call lock.acquire() before touching shared data and lock.release() afterwards.
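A minimal sketch in which four threads increment a shared counter; the names `counter` and `add` are illustrative. Without the lock, the read-modify-write in `counter += 1` can interleave between threads and lose updates.

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        # "with lock" calls lock.acquire() on entry and lock.release() on exit,
        # even if the body raises an exception
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000
```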

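Deadlock

Python also provides a reentrant (recursive) lock, threading.RLock. The thread that holds an RLock can acquire it again without blocking, so a single RLock can guard several related variables or resources accessed through nested function calls, avoiding the self-deadlock a plain Lock would cause in that situation.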
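A sketch of the nested-call case: `transfer` holds the lock and then calls helpers that take the same lock. The account dictionary and function names are hypothetical.

```python
import threading

rlock = threading.RLock()
balance = {"a": 100, "b": 0}

def withdraw(account, amount):
    with rlock:  # re-acquired by the same thread when called from transfer()
        balance[account] -= amount

def deposit(account, amount):
    with rlock:
        balance[account] += amount

def transfer(src, dst, amount):
    # Holds the lock while calling helpers that acquire it again.
    # With threading.Lock() this thread would block on itself forever;
    # an RLock lets its owner acquire it repeatedly (each acquire needs a release).
    with rlock:
        withdraw(src, amount)
        deposit(dst, amount)

t = threading.Thread(target=transfer, args=("a", "b", 30))
t.start()
t.join()
print(balance)  # {'a': 70, 'b': 30}
```

An RLock does not prevent deadlocks between two different locks acquired in opposite orders by different threads; a common remedy there is to always acquire locks in one fixed order.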

Inter-thread communication

The queue module

import threading
import queue


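def thread_function(q):
    for i in range(10):
        q.put(i*i)
    q.put('EOF')  # sentinel value: tells the consumer there is no more data

# Create the queue (queue.Queue is thread-safe)
q = queue.Queue()
# Create and start the thread
thread = threading.Thread(target=thread_function, args=(q,))
thread.start()

# Read the values the thread put into the queue; get() blocks until an item is available
while True:
    item = q.get()
    if item == 'EOF':
        break
    print(f"Value from thread: {item}")

# Wait for the thread to finish
thread.join()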

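Multiprocessing

Multithreaded programs are harder to debug, and in standard CPython builds the GIL prevents threads from executing Python bytecode in parallel, so in practice multiprocessing is used more often for CPU-bound work.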

import multiprocessing
import random

def computer(n):
    return sum(random.randint(1,10) for _ in range(n))

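if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=2)
    # map() blocks until every task has finished and returns results in input order
    print("Results:", pool.map(computer, range(20)))
    # map_async() returns immediately with an AsyncResult object
    result_async = pool.map_async(computer, range(20))

    # Fetch the results, waiting at most 10 seconds
    results = result_async.get(timeout=10)
    for result in results:
        print(result)  # result of each task
    pool.close()  # stop accepting new tasks
    pool.join()   # wait for the worker processes to exit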

Multiprocessing vs. multithreading

You can write one version of each and time them.

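On Linux, time a run with GNU time: /usr/bin/time -v python pythonFile.py (the -v flag belongs to GNU time; the shell's built-in time command does not accept it)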

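For CPU-bound tasks (heavy loops, counting, and so on), multiprocessing is faster and more effective

For I/O-bound tasks (file processing, web crawling, and so on), multithreading effectively improves throughput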
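A sketch of such a comparison using concurrent.futures, which offers the same interface for threads and processes. `time.sleep` stands in for waiting on a file or network (an assumption; real I/O behaves less uniformly), and the task sizes are arbitrary. Timings depend on the machine; on a multi-core machine with a standard CPython build, the process pool would be expected to win on the CPU-bound task, while both finish the sleep-based task in roughly the same time (processes pay some startup cost).

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_task(n):
    # Pure-Python loop: CPU-bound, holds the GIL while it runs
    total = 0
    for i in range(n):
        total += i * i
    return total

def io_task(seconds):
    # time.sleep releases the GIL, like blocking on a file or socket
    time.sleep(seconds)
    return seconds

def timed(executor_cls, func, args, workers=4):
    # Run func over args in the given pool type and measure wall-clock time
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(func, args))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    cpu_args = [2_000_000] * 4
    io_args = [0.5] * 4
    for name, cls in [("threads", ThreadPoolExecutor), ("processes", ProcessPoolExecutor)]:
        _, cpu_time = timed(cls, cpu_task, cpu_args)
        _, io_time = timed(cls, io_task, io_args)
        print(f"{name:9s}  CPU-bound: {cpu_time:.2f}s  I/O-bound: {io_time:.2f}s")
```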

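Web crawlers

Crawler scheduler

URL manager

Prevents crawling in cycles and crawling the same URL twice

Implementation options

In-memory Python set;

MySQL table urls(url, is_crawled);

Redis (best performance): one set of URLs to crawl and one set of crawled URLs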
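A minimal sketch of the in-memory option, using the same two-set layout described for Redis. The class `UrlManager` and its method names are hypothetical.

```python
class UrlManager:
    """In-memory URL manager: one set of URLs to crawl, one set already crawled."""

    def __init__(self):
        self.new_urls = set()  # waiting to be crawled
        self.old_urls = set()  # already crawled

    def add_url(self, url):
        # Ignore URLs seen before; this prevents duplicate and cyclic crawling
        if url and url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_urls(self, urls):
        for url in urls:
            self.add_url(url)

    def has_new_url(self):
        return bool(self.new_urls)

    def get_url(self):
        # Move one URL from "to crawl" to "crawled" and hand it to the downloader
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url


manager = UrlManager()
manager.add_urls(["http://example.com/a", "http://example.com/b", "http://example.com/a"])
while manager.has_new_url():
    url = manager.get_url()
    print("crawling", url)
    manager.add_url(url)  # a page linking back to itself is not queued again
```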

Web page downloader

from urllib import request

import requests
from bs4 import BeautifulSoup

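# URL of the target page
url = 'http://example.com'

# Fetch the page with requests (the timeout keeps the call from hanging indefinitely)
response = requests.get(url, timeout=10)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract the page title (find() returns None if the page has no <title>)
    title_tag = soup.find('title')
    print('Page title:', title_tag.text if title_tag else None)

    # Extract other content as needed, e.g. every link on the page
    links = soup.find_all('a')
    for link in links:
        print(link.get('href'))
else:
    print('Request failed, status code:', response.status_code)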
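The section also imports urllib.request; a standard-library downloader without requests could look like this sketch. The helper `download` and the User-Agent value are assumptions; some sites reject Python's default User-Agent, which is why one is set here.

```python
from urllib import request

def download(url, timeout=10):
    """Return the page body as text, or None if the download fails."""
    req = request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            # HTTP error statuses (4xx/5xx) raise HTTPError instead of returning
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses
        print("Download failed:", exc)
        return None

if __name__ == "__main__":
    html = download("http://example.com")
    print(html[:100] if html else html)
```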

Logging in to a website with a username and password

import http.cookiejar
import getpass
import urllib
import urllib.parse
import urllib.request

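url = 'https://passport.xxxx.com/'  # login endpoint of the target site
username = input('Enter username: ')
password = getpass.getpass('Enter password: ')  # reads the password without echoing it

# The form field names depend on the target site's login form
values = {'username': username, 'password': password}
postData = urllib.parse.urlencode(values).encode()

# The CookieJar keeps the session cookies the server sets after login,
# so later requests made through `opener` stay logged in
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# Passing data makes this a POST request
req = urllib.request.Request(url, postData)
response = opener.open(req)
print(response.read())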

Web page parser

Regular expressions: fuzzy (pattern-based) matching
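A sketch of regex-based extraction on a made-up HTML snippet. Regular expressions are brittle on real-world HTML (attribute order, comments, nested tags), so a parser such as BeautifulSoup is usually the sturdier choice; the patterns below are illustrative assumptions, not a general solution.

```python
import re

html = """
<html><head><title>Example Domain</title></head>
<body><a href="https://www.iana.org/domains/example">More information...</a>
<a href='/about'>About</a></body></html>
"""

# Non-greedy .*? stops at the first </title>; re.S lets . match newlines too
title = re.search(r"<title>(.*?)</title>", html, re.S | re.I)
# Capture href values quoted with either double or single quotes
links = re.findall(r"""<a\s[^>]*?href=["']([^"']+)["']""", html, re.I)

print(title.group(1) if title else None)  # Example Domain
print(links)  # ['https://www.iana.org/domains/example', '/about']
```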

Selenium drives a real browser and requires the matching browser driver (for example ChromeDriver for Chrome)
