The Road to Django Query Performance Optimization

野生码农一灯

Published 2024-01-28 20:56:45

Category: django. Tags: django, python

Article link: https://blog.csdn.net/2401_82782706/article/details/135900379

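Performance optimization is an unavoidable part of software development. In theory, programmers should write optimal code from the start. In practice, limited experience and project deadlines mean that some problems are only fixed after they surface.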

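This retrospective covers queries only:

Reduce unnecessary IO (load only the fields you need, and only when you need them)
Eliminate N+1 queries
Reduce computation in application code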

1. Reducing unnecessary `IO`: deferred loading with `defer`

The idea behind defer is to load a field only when it is used. Here is a simple blog list page:

models.py

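from django.db import models

class Blog(models.Model):
    title = models.CharField(max_length=200)  # max_length is required; 200 is an arbitrary choice
    content = models.TextField()
    is_special = models.BooleanField(default=False)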

views.py

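from django.shortcuts import render
from django.views import View

from .models import Blog

# Simplified for demonstration; a real list view would paginate
class BlogListView(View):
    def get(self, request):
        blogs = Blog.objects.defer('content')
        return render(request, 'list.html', {'blogs': blogs})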

list.html

{% for blog in blogs %}
	<!-- content is read only when needed -->
	{% if blog.is_special %}
	{{ blog.content }}
	{% endif %}
	{{ blog.title }}
{% endfor %}

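list.html is a blog list page that mostly shows titles. blogs = Blog.objects.defer('content') tells the queryset not to read the content field up front: the initial query loads only id, title and is_special, and the content of a given record is read only when it is accessed.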

By comparison, what happens if the view uses .all() instead?

blogs = Blog.objects.all()

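When a request arrives, the content of every post is read into memory, even though the content field is either unused or used only conditionally. This adds load to the server and makes the user wait longer. Even with pagination, it should not be written this way.

With defer, the named fields are left out of the initial query, and accessing content on a record loads it for that record only, with one extra query. When only a few records need the field, this cuts database IO and server memory use and speeds up the response; if most records end up needing it, the per-record queries cost more than loading it up front.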
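
To make the trade-off concrete, here is a minimal sketch of the query pattern that defer('content') leads to, written against the standard-library sqlite3 module rather than the Django ORM. The table name blog, the sample rows and the run helper are assumptions made for illustration; the SQL Django actually emits differs in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT, content TEXT, is_special INTEGER)"
)
conn.executemany(
    "INSERT INTO blog (title, content, is_special) VALUES (?, ?, ?)",
    [(f"post {i}", "x" * 10_000, int(i % 5 == 0)) for i in range(20)],
)

queries = []  # every SELECT issued, so the round trips can be counted


def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# The up-front query: every column except the deferred one.
rows = run("SELECT id, title, is_special FROM blog")

rendered = []
for blog_id, title, is_special in rows:
    if is_special:
        # First access to the deferred field: one extra query for this row only.
        (content,) = run("SELECT content FROM blog WHERE id = ?", (blog_id,))[0]
        rendered.append((title, len(content)))
    else:
        rendered.append((title, 0))

special_count = sum(1 for row in rows if row[2])
print(f"{len(queries)} queries for {len(rows)} rows ({special_count} special)")
```

The large content column is read for the special rows only, at the price of one extra round trip each. That is a good deal when special rows are rare and a poor one when they are the majority.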

2. Reducing unnecessary `IO`: restricting fields with `only`

defer and only are complementary. Using the same example and requirement as above:

views.py

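# Simplified for demonstration; a real list view would paginate
class BlogListView(View):
    def get(self, request):
        # The template also reads is_special, so load it together with title
        blogs = Blog.objects.only('title', 'is_special')
        return render(request, 'list.html', {'blogs': blogs})

Only the listed fields (plus the primary key) are read immediately. Every other field is deferred, that is, loaded on first access. A field that the template reads on every row, such as is_special here, belongs in only(); otherwise each row triggers an extra query.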

Both defer and only always return a QuerySet, so they can be chained after filter, for example:

blogs = Blog.objects.filter(is_special=True).only('title')
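
As a rough illustration of what restricting the columns saves, the sketch below compares the amount of text fetched by a full-row query with the narrow query that filter(...).only('title') corresponds to. It uses the standard-library sqlite3 module; the blog table, the sample data and the payload_size helper are assumptions for the sake of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT, content TEXT, is_special INTEGER)"
)
conn.executemany(
    "INSERT INTO blog (title, content, is_special) VALUES (?, ?, ?)",
    [(f"post {i}", "lorem ipsum " * 500, i % 2) for i in range(50)],
)


def payload_size(rows):
    """Rough size of a result set: total characters across all text values."""
    return sum(len(value) for row in rows for value in row if isinstance(value, str))


# Roughly Blog.objects.filter(is_special=True): every column comes back.
full = conn.execute("SELECT * FROM blog WHERE is_special = 1").fetchall()

# Roughly Blog.objects.filter(is_special=True).only('title'): primary key plus title.
narrow = conn.execute("SELECT id, title FROM blog WHERE is_special = 1").fetchall()

print(len(full), payload_size(full))
print(len(narrow), payload_size(narrow))
```

Both queries return the same rows; only the width of each row changes, which is where the saving comes from.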

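3. Reducing computation in application code: `values`/`values_list`

Same example again. Now the front end requests the blog list from JavaScript and renders it on the page. The usual approach is: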

views.py

class BlogListView(View):
    def get(self, request):
        blogs = Blog.objects.only('title')
        data = []
        for blog in blogs:
            data.append({'title': blog.title})
        return JsonResponse({'data': data})

This works, but there is a better way that skips the Python-level overhead and improves performance:

class BlogListView(View):
    def get(self, request):
        blogs = Blog.objects.values('title')
        return JsonResponse({'data': list(blogs)})

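values reads only the specified fields and returns a QuerySet of dictionaries, for example:

<QuerySet [{'title': 'The Story of Spring'}, {'title': 'Python from Beginner to Mastery'}, {'title': 'Enterprise Java Projects in Practice'}]>

A queryset is iterable, so list() turns it into a list of dictionaries that can be returned to the front end.

values_list works similarly, except that each element is a tuple holding only the values of the specified fields.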
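
The difference between the two result shapes can be sketched without Django. The example below uses the standard-library sqlite3 module to produce tuple rows (the shape values_list gives) and dictionary rows (the shape values gives) from the same query; the table and sample titles are assumptions for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO blog (title, content) VALUES (?, ?)",
    [("The Story of Spring", "body 1"), ("Python from Beginner to Mastery", "body 2")],
)

# Tuples per row: the shape values_list('id', 'title') returns.
as_tuples = conn.execute("SELECT id, title FROM blog ORDER BY id").fetchall()

# Dictionaries per row: the shape values('id', 'title') returns.
conn.row_factory = sqlite3.Row
as_dicts = [dict(row) for row in conn.execute("SELECT id, title FROM blog ORDER BY id")]

# Either shape is already plain data, so it serializes without building model objects.
payload = json.dumps({"data": as_dicts})
print(as_tuples)
print(payload)
```

Tuples are the lighter choice when the consumer knows the column order; dictionaries are convenient when the result goes straight into a JSON response.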

4. Eliminating `N+1` queries: following relations with `select_related`

Let us extend the Blog model above by adding an author model and a foreign key:

models.py

class Author(models.Model):
    author = models.CharField(max_length=100)

class Blog(models.Model):
    title = models.CharField(max_length=200)
    content = models.TextField()
    is_special = models.BooleanField(default=False)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)

On the blog list page we now need to show the author as well as the title.

The wrong way:

views.py

class BlogListView(View):
    def get(self, request):
        blogs = Blog.objects.all()
        data = []
        for blog in blogs:
            # Each blog.author access in the loop queries the Author table, even when that author was already read
            data.append({'title': blog.title, 'author': blog.author.author})
        return JsonResponse({'data': data})

When I was new to Django I fell into this trap a lot. Once I realized the same rows were being read over and over, I worked around it. As I recall, it went roughly like this:

class BlogListView(View):
    def get(self, request):
        blogs = Blog.objects.all()
        author_names = {}  # author_id -> author name
        data = []
        for blog in blogs:
            # author_id lives on the Blog row, so reading it costs no query
            if blog.author_id not in author_names:
                # Only the first blog of each author queries the Author table
                author_names[blog.author_id] = blog.author.author
            data.append({'title': blog.title, 'author': author_names[blog.author_id]})
        return JsonResponse({'data': data})

This brings the query count down to one per distinct author, but the extra bookkeeping and branching add work in application code. Django has a convenient query interface for this, typically:

class BlogListView(View):
    def get(self, request):
        # Fetch the related author rows in the same query
        blogs = Blog.objects.select_related('author').all()
        data = []
        for blog in blogs:
            # No further database reads inside the loop
            data.append({'title': blog.title, 'author': blog.author.author})
        return JsonResponse({'data': data})

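select_related('author') makes Django issue a single SQL query with a JOIN that returns each blog together with its Author row, so the loop no longer goes back to the database and incurs extra IO. This is what avoiding the N+1 query problem means: the 1 is the query we actually want, and the N are the extra, unnecessary queries it would otherwise trigger.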
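
The difference in round trips can be reproduced at the SQL level. The sketch below uses the standard-library sqlite3 module to run the N+1 pattern and the JOIN pattern side by side and count the queries; the table names, sample rows and run helper are assumptions for illustration, not the exact SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, author TEXT);
    CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author (id, author) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO blog (title, author_id) VALUES
        ('a', 1), ('b', 1), ('c', 2), ('d', 1), ('e', 2);
    """
)

queries = []  # every SELECT issued


def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# N+1: one query for the blogs, then one more per blog for its author.
naive = []
for title, author_id in run("SELECT title, author_id FROM blog ORDER BY id"):
    (name,) = run("SELECT author FROM author WHERE id = ?", (author_id,))[0]
    naive.append({"title": title, "author": name})
naive_queries = len(queries)

# JOIN: the single-query shape that select_related('author') asks the database for.
queries.clear()
joined = [
    {"title": title, "author": name}
    for title, name in run(
        "SELECT blog.title, author.author FROM blog "
        "INNER JOIN author ON author.id = blog.author_id ORDER BY blog.id"
    )
]
joined_queries = len(queries)

print(naive_queries, joined_queries)
```

With five blogs the naive loop needs six queries and the JOIN needs one, and both produce the same data. The gap grows linearly with the number of rows.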

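Notes:

1. select_related can be combined with defer/only, for example Blog.objects.select_related('author').only('title', 'author__author'). What is not allowed is deferring the foreign key that select_related has to traverse: only('title') alone would defer author and raise an error.

2. select_related suits simple foreign key and one-to-one lookups. It can also be called with no arguments, in which case it follows every non-null foreign key, although that usually fetches more than needed. In essence, select_related builds one complete SQL statement and lets the database do the join, which is slightly different from prefetch_related, discussed next.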

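5. Eliminating `N+1` across tables with `prefetch_related`

This method is typically used for many-to-many relations (it works for foreign keys too). For the demonstration, the models are extended once more; the requirement is the same as above: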

class Author(models.Model):
    author = models.CharField(max_length=100)

class Tags(models.Model):
    tag = models.CharField(max_length=50)

class Blog(models.Model):
    title = models.CharField(max_length=200)
    content = models.TextField()
    is_special = models.BooleanField(default=False)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
    tags = models.ManyToManyField(Tags)

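A typical N+1 problem looks like this:

blogs = Blog.objects.all()

for blog in blogs:
    # One extra query per blog
    tags = list(blog.tags.all())
    print(f'Tags of {blog}: {tags}')

During the loop, every blog.tags.all() call runs its own query. Many blogs may well share the same tags, yet each one still queries again, which wastes resources.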

Recommended usage:

blogs = Blog.objects.prefetch_related('tags')

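prefetch_related still returns a QuerySet, and it also accepts double-underscore lookups that span several relations. To show this, add one more model and relate it to Tags:

class Color(models.Model):
    color = models.CharField(max_length=50)

class Tags(models.Model):
    tag = models.CharField(max_length=50)
    colors = models.ManyToManyField(Color)

blogs = Blog.objects.prefetch_related('tags__colors')

This prefetch_related call loads the related Tags records and the related Color records as well. Each relation costs one additional query, so the total is three queries no matter how many blogs there are, and the results are stitched together in Python.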
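
Unlike select_related, prefetch_related does not build one big JOIN: it runs a separate query per relation and matches the rows up in Python. The sketch below imitates that strategy for the blog-to-tags relation with the standard-library sqlite3 module. The table names (including the join table blog_tags), the sample rows and the run helper are assumptions for illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE blog_tags (blog_id INTEGER, tags_id INTEGER);
    INSERT INTO blog (id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tags (id, tag) VALUES (1, 'django'), (2, 'python');
    INSERT INTO blog_tags (blog_id, tags_id) VALUES (1, 1), (1, 2), (2, 2);
    """
)

queries = []  # every SELECT issued


def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# Query 1: the blogs themselves.
blogs = run("SELECT id, title FROM blog ORDER BY id")

# Query 2: the tags of all fetched blogs at once, via the join table.
blog_ids = [blog_id for blog_id, _ in blogs]
placeholders = ", ".join("?" for _ in blog_ids)
tag_rows = run(
    "SELECT blog_tags.blog_id, tags.tag FROM tags "
    "INNER JOIN blog_tags ON blog_tags.tags_id = tags.id "
    f"WHERE blog_tags.blog_id IN ({placeholders}) ORDER BY tags.id",
    blog_ids,
)

# The matching happens in Python: group the tags by the blog they belong to.
tags_by_blog = defaultdict(list)
for blog_id, tag in tag_rows:
    tags_by_blog[blog_id].append(tag)

result = {title: tags_by_blog[blog_id] for blog_id, title in blogs}
print(len(queries), result)
```

Two queries serve any number of blogs. Following a further relation, such as the colors of each tag, would add one more query of the same kind.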

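Getting the same effect without these Django interfaces would multiply the amount of Python code, and Python code is generally slower than doing the work in the database.

Notes:

1. prefetch_related can be combined with defer/only and with select_related for better performance;

2. prefetch_related also works for foreign key lookups.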

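6. Eliminating `N+1` across tables with `Prefetch`

Prefetch is a class. Combined with prefetch_related it gives finer control over the prefetch query and so better query performance.

Example 1:

Requirement: fetch all Authors together with each Author's Blogs, and store the result as an attribute on each Author instance: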

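from django.db.models import Prefetch

authors = Author.objects.prefetch_related(
    # Keep the author foreign key so blogs can be matched to authors without extra queries
    Prefetch('blog_set', queryset=Blog.objects.only('id', 'title', 'author'), to_attr='blogs')
)
for author in authors:
    # Already loaded
    print(author.author)
    # Prefetched into the extra attribute
    print(author.blogs)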

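A Prefetch instance gives prefetch_related finer control over the query. Its arguments:

blog_set: the relation to follow; while querying Author, also query the related Blog table
queryset: the queryset used to load Blog, which can carry extra conditions such as Blog.objects.filter(is_special=True)
to_attr: store the Blog records found for each Author in a list named blogs and attach it to that author instance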
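
The roles of the three arguments can be sketched at the SQL level. The example below uses the standard-library sqlite3 module and SimpleNamespace objects standing in for model instances; the table names, the sample rows and the is_special filter are assumptions chosen for illustration.

```python
import sqlite3
from collections import defaultdict
from types import SimpleNamespace

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, author TEXT);
    CREATE TABLE blog (
        id INTEGER PRIMARY KEY, title TEXT, content TEXT,
        is_special INTEGER, author_id INTEGER
    );
    INSERT INTO author (id, author) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO blog (title, content, is_special, author_id) VALUES
        ('a', 'long text', 1, 1), ('b', 'long text', 0, 1), ('c', 'long text', 1, 2);
    """
)

# Query 1: the authors.
authors = [
    SimpleNamespace(id=author_id, author=name)
    for author_id, name in conn.execute("SELECT id, author FROM author ORDER BY id")
]

# Query 2, the role of queryset=...: narrow columns plus an extra filter.
# author_id has to be selected, otherwise blogs cannot be matched to authors.
author_ids = [a.id for a in authors]
placeholders = ", ".join("?" for _ in author_ids)
blog_rows = conn.execute(
    "SELECT id, title, author_id FROM blog "
    f"WHERE is_special = 1 AND author_id IN ({placeholders}) ORDER BY id",
    author_ids,
).fetchall()

grouped = defaultdict(list)
for blog_id, title, author_id in blog_rows:
    grouped[author_id].append(SimpleNamespace(id=blog_id, title=title))

# The role of to_attr='blogs': store the result as a plain list on each author.
for a in authors:
    a.blogs = grouped[a.id]

summary = {a.author: [b.title for b in a.blogs] for a in authors}
print(summary)
```

The content column never leaves the database here, and the filter runs once for all authors instead of once per author.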

Example 2 (with new models):

class Author(models.Model):
    name = models.CharField(max_length=100)

class Publisher(models.Model):
    name = models.CharField(max_length=300)

class Book(models.Model):
    name = models.CharField(max_length=300)
    authors = models.ManyToManyField(Author)
    publisher = models.ForeignKey(Publisher, on_delete=models.CASCADE)

Requirement: fetch all books, preloading the publisher and all related authors of each book:

books = Book.objects.select_related('publisher').prefetch_related(
    Prefetch('authors', queryset=Author.objects.only('name', 'id'))
)
for book in books:
    # Preloaded, no N+1 problem
    print(book.publisher)
    # Preloaded, no N+1 problem
    print(book.authors.all())

Of course, the following also works:

books = Book.objects.select_related('publisher').prefetch_related('authors')

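The drawback is clear, though: with a plain string, prefetch_related cannot defer fields on the Author queryset, so it loads fields that are not needed. The query at the top can therefore be made more precise:

# publisher is now loaded through prefetch_related as well
books = (
    Book.objects
    .prefetch_related(Prefetch('publisher', queryset=Publisher.objects.only('id', 'name')))
    .prefetch_related(Prefetch('authors', queryset=Author.objects.only('name', 'id')))
)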

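7. Reducing computation in application code with `annotate` and aggregate functions

In the example from section 6:

authors = Author.objects.prefetch_related(
    Prefetch('blog_set', queryset=Blog.objects.only('id', 'title', 'author'), to_attr='blogs')
)

to_attr attaches a new attribute to each instance. annotate can also attach a new attribute to every instance in a queryset; its value is computed by the database, which performs better than computing it in application code.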

Example requirement: fetch all authors and count each author's books:

from django.db.models import Count

authors = Author.objects.annotate(book_count=Count('book'))
for author in authors:
    # Number of books by this author
    print(author.book_count)

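Example requirement: fetch all authors and count the publishers involved in each author's books:

# distinct=True counts each publisher once, even when several books share it
authors = Author.objects.annotate(publisher_count=Count('book__publisher', distinct=True))
for author in authors:
    # Number of publishers across this author's books
    print(author.publisher_count)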
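
Why the publisher count needs to be a distinct count is easiest to see in SQL. The sketch below runs the kind of grouped query an annotate with Count boils down to, using the standard-library sqlite3 module. The table names (including the join table book_authors) and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT, publisher_id INTEGER);
    CREATE TABLE book_authors (book_id INTEGER, author_id INTEGER);
    INSERT INTO author (id, name) VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO publisher (id, name) VALUES (1, 'P1'), (2, 'P2');
    INSERT INTO book (id, name, publisher_id) VALUES (1, 'x', 1), (2, 'y', 1), (3, 'z', 2);
    INSERT INTO book_authors (book_id, author_id) VALUES (1, 1), (2, 1), (3, 1), (3, 2);
    """
)

sql = """
    SELECT author.name,
           COUNT(book.id) AS book_count,
           COUNT(book.publisher_id) AS publisher_rows,
           COUNT(DISTINCT book.publisher_id) AS publisher_count
    FROM author
    LEFT JOIN book_authors ON book_authors.author_id = author.id
    LEFT JOIN book ON book.id = book_authors.book_id
    GROUP BY author.id
    ORDER BY author.id
"""
counts = conn.execute(sql).fetchall()
for name, book_count, publisher_rows, publisher_count in counts:
    print(name, book_count, publisher_rows, publisher_count)
```

Ann has three books from two publishers. A plain count over the joined rows reports three, one per book, while the distinct count reports two.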

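Note:

There is much more to aggregate functions than this. Whatever can be done with an aggregate function is usually better not hand-written, because aggregates run in the database and generally outperform the equivalent logic in application code.

Queries in real business settings are far more complex than these examples, especially those behind statistics. Only by using them often, understanding them and optimizing them do they become second nature.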

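Summary

1. Prefer the query interfaces Django provides when implementing complex queries;

2. Make eliminating N+1 queries an everyday habit rather than an after-the-fact optimization;

3. Use aggregate functions in place of application-side computation where possible, to cut overhead and improve efficiency.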
