Efficient Python Programming: Core Data Structures You Must Master - CSDN Blog

Article link: https://blog.csdn.net/Xianxiancq/article/details/147756096

Why Does Choosing the Right Data Structure Matter So Much?

In Python, choosing the right data structure helps you write code that is easier to maintain, and it can even change the way you approach a problem.

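Thanks to its flexibility and readability, Python has become one of the most popular programming languages among developers in every field. One key to writing efficient Python code, however, is understanding the available data structures and using the one that fits your scenario.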

Python Data Structures Every Programmer Should Know

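This article walks through the important data structures every Python developer should know, covering both built-in types and structures from the standard library. Let's get started!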

🔗 Code link

What Is a Data Structure?

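Before diving into specific implementations, let's clarify what a data structure is. Simply put, a data structure is a specialized format for organizing, processing, retrieving, and storing data. You can think of data structures as different kinds of "containers", each with its own characteristics that make it suited to particular tasks.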

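Choosing the right data structure improves a program's efficiency and readability. Choosing the wrong one can make a program slow, memory-hungry, and hard to maintain.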
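Here is a minimal sketch that makes the cost of a poor choice concrete. It times the same membership test against a list and a set that hold identical data. The sizes and the `user_ids_*` names are arbitrary. Absolute timings depend on your machine, but the set lookup is typically far faster. That is because `in` scans a list element by element (O(n)), while a set uses hashing (O(1) on average).

```python
import timeit

# Hypothetical data: 100,000 user IDs stored two ways
user_ids_list = list(range(100_000))
user_ids_set = set(user_ids_list)

target = 99_999  # worst case for the list: the very last element

# A list is scanned element by element: O(n) per lookup
list_time = timeit.timeit(lambda: target in user_ids_list, number=100)
# A set uses hashing: O(1) per lookup on average
set_time = timeit.timeit(lambda: target in user_ids_set, number=100)

print(f"list: {list_time:.4f}s, set: {set_time:.6f}s")
```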

Python's Built-in Data Structures

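Python provides several built-in data structures for storing, managing, and manipulating data efficiently. Knowing when to use each one is the foundation of writing clean, efficient code.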

We will cover the following basic structures in turn:

Lists (ordered, mutable)
Tuples (ordered, immutable)
Dictionaries (key-value mappings)
Sets (unordered, unique elements)

Lists: Ordered, Mutable Collections

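Lists are one of Python's simplest and most useful data structures. They can hold objects of any type. They suit scenarios where the contents or their order change, for example when you add, remove, or sort elements.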

tasks = ["write report", "send email", "attend meeting"]
tasks.append("review pull request")        # add a task at the end
tasks.insert(1, "check calendar")          # insert a task at index 1
completed_task = tasks.pop(2)              # remove and return the task at index 2

print("Tasks left:", tasks)
print("Completed:", completed_task)

Output:

Tasks left: ['write report', 'check calendar', 'attend meeting', 'review pull request']
Completed: send email

Here we manage the task list dynamically by appending, inserting, and removing items.

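When to use: ordered data that changes frequently, such as task lists, shopping carts, and logs. For queues that remove items from the front, collections.deque (covered below) is faster.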
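The description above mentions sorting, which the task example does not demonstrate. This minimal sketch uses hypothetical log entries. `list.sort()` reorders a list in place, `sorted()` returns a new list, and slicing copies a sub-range.

```python
# Hypothetical log entries: (timestamp, message)
log = [
    (1700000300, "disk warning"),
    (1700000100, "service started"),
    (1700000200, "user login"),
]

# sort() reorders the list in place; key selects the timestamp
log.sort(key=lambda entry: entry[0])

# sorted() returns a new list and leaves `log` unchanged
newest_first = sorted(log, key=lambda entry: entry[0], reverse=True)

# Slicing copies a sub-range: here, the two most recent entries
latest_two = newest_first[:2]

print([msg for _, msg in log])         # ['service started', 'user login', 'disk warning']
print([msg for _, msg in latest_two])  # ['disk warning', 'user login']
```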

Tuples: Ordered, Immutable Collections

Tuples are similar to lists, but their contents cannot be changed once they are created. They suit fixed collections of items.

coordinates = (37.7749, -122.4194)
print(f"Latitude: {coordinates[0]}, Longitude: {coordinates[1]}")

Output:

Latitude: 37.7749, Longitude: -122.4194

An example that returns the minimum and maximum as a tuple:

def min_max(numbers):
    return (min(numbers), max(numbers))

print(min_max([3, 7, 1, 9]))

Output:

(1, 9)

When to use: when you need to guarantee that data will not change, or when a function returns multiple values.
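This short sketch illustrates both points, reusing the coordinates and min_max examples. A returned tuple can be unpacked straight into variables, and any attempt to modify a tuple raises TypeError. Tuples are also hashable when their items are, so they can serve as dictionary keys. The `city_by_coords` name is only for illustration.

```python
def min_max(numbers):
    return (min(numbers), max(numbers))

# Unpack the returned tuple directly into two names
low, high = min_max([3, 7, 1, 9])
print(low, high)  # 1 9

coordinates = (37.7749, -122.4194)
try:
    coordinates[0] = 0.0  # tuples do not support item assignment
except TypeError as exc:
    print("TypeError:", exc)

# Tuples are immutable and, when their items are hashable, hashable too,
# so they can serve as dictionary keys
city_by_coords = {coordinates: "San Francisco"}
print(city_by_coords[(37.7749, -122.4194)])  # San Francisco
```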

Dictionaries: Key-Value Mappings

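Dictionaries associate keys with values and provide fast lookups (O(1) on average). Keys must be unique and hashable. In practice, that usually means immutable types such as strings, numbers, or tuples.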

user = {
    "name": "Alice",
    "email": "alice@example.com",
    "is_active": True
}
user["is_active"] = False  # update a value
print(f"User {user['name']} is active: {user['is_active']}")

Output:

User Alice is active: False

Word-count example:

def word_count(text):
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

print(word_count("Python is powerful and Python is fast"))

Output:

{'python': 2, 'is': 2, 'powerful': 1, 'and': 1, 'fast': 1}

When to use: counters, lookup tables, caches, and storing object-like records with named fields.
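This minimal memoization sketch illustrates the "cache" use case. The `grid_paths` function is a hypothetical example. The dictionary stores each computed result under a tuple key, so repeated subproblems are looked up instead of recomputed. In real code, the standard library's `functools.lru_cache` decorator does the same job.

```python
# A dict used as a cache (memoization): results are stored under their arguments
cache = {}

def grid_paths(rows, cols):
    """Count paths through a rows x cols grid, moving only right or down."""
    key = (rows, cols)       # tuple key: immutable and hashable
    if key in cache:         # average O(1) lookup
        return cache[key]
    if rows == 1 or cols == 1:
        result = 1
    else:
        result = grid_paths(rows - 1, cols) + grid_paths(rows, cols - 1)
    cache[key] = result      # remember the answer for next time
    return result

print(grid_paths(10, 10))  # 48620
```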

Sets: Unordered, Unique Elements

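A set is a collection of unique elements. Sets support fast membership tests and set operations such as union, intersection, and difference.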

python_devs = {"Alice", "Bob", "Charlie"}
javascript_devs = {"Alice", "Eve", "Dave"}

both = python_devs & javascript_devs           # intersection
either = python_devs | javascript_devs         # union
only_python = python_devs - javascript_devs    # difference

print("Knows both:", both)
print("Knows either:", either)
print("Knows only Python:", only_python)

Output (sets have no defined order, so the element order may differ between runs):

Knows both: {'Alice'}
Knows either: {'Bob', 'Charlie', 'Eve', 'Dave', 'Alice'}
Knows only Python: {'Bob', 'Charlie'}

Deduplicating email addresses:

emails = ["a@example.com", "b@example.com", "a@example.com"]
unique_emails = set(emails)
print(unique_emails)

Output (order may vary):

{'b@example.com', 'a@example.com'}

When to use: deduplication, membership tests, and set algebra (filtering, comparing groups, and so on).
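Converting to a set discards the original order. When order matters, a common idiom is `dict.fromkeys`, which relies on dictionaries preserving insertion order (guaranteed since Python 3.7). The sketch below also uses a set for membership checks. The blocklist is hypothetical.

```python
emails = ["a@example.com", "b@example.com", "a@example.com", "c@example.com"]

# dict.fromkeys keeps the first occurrence of each item, in order
unique_in_order = list(dict.fromkeys(emails))
print(unique_in_order)  # ['a@example.com', 'b@example.com', 'c@example.com']

# Membership checks against a set are O(1) on average
blocked = {"b@example.com"}  # hypothetical blocklist
allowed = [e for e in unique_in_order if e not in blocked]
print(allowed)  # ['a@example.com', 'c@example.com']
```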

Data Structures in the Python Standard Library

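The Python standard library also includes many specialized data structures that extend what the built-in types can do. They are designed for common programming needs and help make code faster, more concise, and more efficient.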

Below are several commonly used data structures from the standard library's collections and heapq modules.

collections.deque: Double-Ended Queue

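A deque (pronounced "deck") is a double-ended queue, designed for adding and removing elements quickly at both ends. Inserting or removing at the front of a list is O(n), because every remaining element has to shift. Appending or popping at either end of a deque is O(1).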

When to use:

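Building task queues (such as print jobs)
Implementing sliding-window algorithms
Breadth-first search (BFS)
Rolling buffers (tracking the last N transactions)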

When not to use:

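When you need random access by index (such as jumping straight to the 100th element). Indexing a deque is O(1) at the ends but O(n) toward the middle.
When you need the smallest possible memory footprint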

Example:

from collections import deque

# Initialize the queue
tasks = deque(["email client", "compile report", "team meeting"])

# Add an urgent task to the left end
tasks.appendleft("fix production issue")

# Add a low-priority task to the right end
tasks.append("update documentation")

# Process tasks from both ends
next_task = tasks.popleft()  # takes "fix production issue"
later_task = tasks.pop()     # takes "update documentation"

print(tasks)

Output:

deque(['email client', 'compile report', 'team meeting'])
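The rolling-buffer and sliding-window use cases listed above map directly onto the `maxlen` argument of `deque`. Once the deque is full, each append discards an item from the opposite end. The transaction IDs and the `moving_average` helper in this minimal sketch are hypothetical.

```python
from collections import deque

# Keep only the 3 most recent transaction IDs
recent = deque(maxlen=3)
for txn_id in [101, 102, 103, 104, 105]:  # hypothetical IDs
    recent.append(txn_id)                 # oldest item is dropped when full

print(recent)  # deque([103, 104, 105], maxlen=3)

def moving_average(values, window):
    """Sliding-window average using a bounded deque and a running total."""
    buf = deque(maxlen=window)
    total = 0
    averages = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # the oldest value is about to be evicted
        buf.append(v)
        total += v
        if len(buf) == window:
            averages.append(total / window)
    return averages

print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```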

collections.defaultdict: Dictionaries with Default Values

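A defaultdict behaves like a regular dict, but it automatically supplies a default value for a missing key by calling the factory you pass in (such as list or int). You do not have to check for the key manually.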

When to use:

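Automatic grouping (such as grouping files by extension)
Counting (such as tallying API calls per user)
Building graph structures (such as adjacency lists)
Accumulating data (automatically creating lists, sets, or counters)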

When not to use:

When you want a missing key to raise an exception, so that bugs surface early
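A defaultdict can hide bugs because reading a missing key does not raise. It also inserts that key with the default value. This minimal sketch uses a hypothetical config mapping that contains a typo ("prot" instead of "port").

```python
from collections import defaultdict

config = defaultdict(str)
config["host"] = "localhost"

# The typo does not raise: it silently returns "" AND inserts the key
port = config["prot"]
print(repr(port))    # ''
print(dict(config))  # {'host': 'localhost', 'prot': ''}

# A plain dict surfaces the mistake immediately
plain = {"host": "localhost"}
try:
    plain["prot"]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'prot'
```

To check for a key without inserting it, use `key in d` or `d.get(key)`. Neither one triggers the default factory.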

Example:

from collections import defaultdict
# Group employees by department
employees = [
    ("HR", "Alice"),
    ("Engineering", "Bob"),
    ("HR", "Carol"),
    ("Engineering", "Dave"),
    ("Sales", "Eve")
]
departments = defaultdict(list)
for dept, name in employees:
    departments[dept].append(name)
print(departments)

Output:

defaultdict(<class 'list'>, {'HR': ['Alice', 'Carol'], 'Engineering': ['Bob', 'Dave'], 'Sales': ['Eve']})
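This sketch illustrates the "graph structures" use case. It builds an adjacency list with `defaultdict(list)` and then runs a breadth-first search over it with a `deque`, one of the deque use cases listed earlier. The service names and edges are made up for the example.

```python
from collections import defaultdict, deque

# Hypothetical undirected edges between services
edges = [("api", "auth"), ("api", "db"), ("auth", "cache"),
         ("db", "cache"), ("cache", "metrics")]

# Adjacency list: a node's list is created on first use
graph = defaultdict(list)
for a, b in edges:
    graph[a].append(b)
    graph[b].append(a)

def bfs_distances(graph, start):
    """Return the number of hops from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()  # O(1) removal from the front
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

print(bfs_distances(graph, "api"))
# {'api': 0, 'auth': 1, 'db': 1, 'cache': 2, 'metrics': 3}
```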

collections.Counter: A Fast Counting Tool

The Counter class counts hashable objects and automatically tracks how often each element occurs.

When to use:

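Log analysis (counting how often specific events occur)
Finding the most common error codes an application returns
Tracking resource usage frequency (such as the most-visited URLs)
Multiset operations (adding and subtracting element counts)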

When not to use:

When you only need to count a handful of items, a plain dictionary is enough.

Example:

from collections import Counter
# Analyze page visits
page_visits = [
    "/home", "/products", "/about", "/products", "/home", "/contact"
]

visit_counter = Counter(page_visits)
# The two most-visited pages
print(visit_counter.most_common(2))
# Add more visits
visit_counter.update(["/home", "/blog"])
print(visit_counter)

Output:

[('/home', 2), ('/products', 2)]
Counter({'/home': 3, '/products': 2, '/about': 1, '/contact': 1, '/blog': 1})
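The "multiset operations" use case refers to arithmetic between Counter objects. This minimal sketch uses hypothetical HTTP status codes. The `+` operator adds counts, and `-` subtracts them and drops any result that is zero or negative. Looking up a missing element returns 0 instead of raising KeyError.

```python
from collections import Counter

# Hypothetical HTTP status codes from two days of logs
monday = Counter([200, 200, 404, 500, 200, 404])
tuesday = Counter([200, 500, 500, 503])

combined = monday + tuesday     # add counts key by key
print(combined.most_common(1))  # [(200, 4)]

# Subtraction keeps only positive counts (500 drops out: 1 - 2 < 0)
print(monday - tuesday)         # Counter({200: 2, 404: 2})

# A missing element counts as zero
print(monday[418])              # 0
```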

heapq: Efficient Priority Queues

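The heapq module provides heap operations on ordinary Python lists. A heap is a special tree structure in which the smallest (or, in a max-heap, the largest) element is always at the top. heapq's core functions (heapify, heappush, heappop) maintain a min-heap, so heap[0] is always the smallest item. Pushing and popping take O(log n) time and preserve the heap property.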
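This small sketch shows what "the smallest element is always at the top" means for heapq. After `heapify`, index 0 holds the minimum, but the rest of the list is only partially ordered. Because these functions build a min-heap, a common workaround for max-heap behavior is to push negated values, which works for numbers.

```python
import heapq

nums = [5, 1, 8, 3, 9, 2]
heapq.heapify(nums)  # rearranges the list in place in O(n)
print(nums[0])       # 1: the smallest item is always at index 0

# The list is not fully sorted; only the heap property holds:
# nums[k] <= nums[2*k + 1] and nums[k] <= nums[2*k + 2]
assert all(nums[k] <= nums[c]
           for k in range(len(nums))
           for c in (2 * k + 1, 2 * k + 2) if c < len(nums))

# Max-heap behavior via negation: store -x, negate again when popping
max_heap = [-x for x in [5, 1, 8, 3, 9, 2]]
heapq.heapify(max_heap)
print(-heapq.heappop(max_heap))  # 9
```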

When to use:

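Building priority queues (such as scheduling tasks by urgency)
Finding the K smallest or largest elements in a large dataset
Implementing algorithms such as Dijkstra's shortest path
Merging sorted data streams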

When not to use:

When you need to quickly find or remove arbitrary elements. A heap only optimizes access to the smallest (or largest) element.

Example:

import heapq
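# Manage tasks by priority (a lower number means a higher priority)
tasks = [(3, "write report"), (1, "fix critical bug"), (4, "team meeting")]

# Turn the list into a heap in place
heapq.heapify(tasks)

# Add a new task
heapq.heappush(tasks, (2, "code review"))

# Process tasks in priority order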
while tasks:
    priority, task = heapq.heappop(tasks)
    print(f"Processing {task} with priority {priority}")

Output:

Processing fix critical bug with priority 1
Processing code review with priority 2
Processing write report with priority 3
Processing team meeting with priority 4
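Several of the other use cases listed for heapq have direct helpers. This minimal sketch uses made-up data. `nlargest` and `nsmallest` handle top-K queries, and `merge` combines inputs that are already sorted. A counter breaks ties between tasks with equal priority. Without it, two entries with the same priority would fall back to comparing their payloads, which raises TypeError for dicts. The counter pattern follows the priority-queue notes in the heapq documentation.

```python
import heapq
import itertools

# Top-K without sorting everything (hypothetical response times in ms)
latencies = [120, 45, 300, 87, 230, 15, 410]
print(heapq.nlargest(3, latencies))   # [410, 300, 230]
print(heapq.nsmallest(2, latencies))  # [15, 45]

# Lazily merge already-sorted streams into one sorted iterator
stream_a = [1, 4, 9]
stream_b = [2, 3, 10]
print(list(heapq.merge(stream_a, stream_b)))  # [1, 2, 3, 4, 9, 10]

# Tie-breaker: (priority, insertion count, payload) never compares payloads
counter = itertools.count()
queue = []
for priority, task in [(2, {"name": "backup"}), (1, {"name": "alert"}),
                       (2, {"name": "report"})]:
    heapq.heappush(queue, (priority, next(counter), task))

order = [heapq.heappop(queue)[2]["name"] for _ in range(len(queue))]
print(order)  # ['alert', 'backup', 'report']
```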