Django ORM tips for better performance

C丶卡萝尔

Tags: django

Original article: https://blog.csdn.net/u012790802/article/details/105199156

Notes on QuerySets

Runtime environment
Slicing
Iteration
- Caching QuerySet results (on by default)
- Iterating without caching (for reference)
select_related and prefetch_related for related-object queries
- select_related
- prefetch_related
Notes

Runtime environment

python 3.7.5
django 2.2.7

Slicing

```
data = User.objects.all()

# Approach 1
result1 = list(data)[:3]
# Approach 2
result2 = list(data[:3])
```

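Conclusion: use approach 2 instead of approach 1.

Reason: approach 1 loads every row from the database and then slices the resulting list in Python, while approach 2 slices the QuerySet before it is evaluated, so Django adds a LIMIT to the SQL and only the rows in the slice are fetched.

The corresponding SQL:
Approach 1: select xxxx from user;
Approach 2: select xxxx from user limit 3;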
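To see the difference without a Django project, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the database. The `user` table, its 10 rows and the `ORDER BY id` (added only to make the result deterministic) are assumptions for illustration; what matters is how many rows each approach pulls out of the database.

```python
import sqlite3

# Hypothetical stand-in for the user table, with 10 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO user (username) VALUES (?)",
    [(f"user{i}",) for i in range(10)],
)

# Approach 1, like list(data)[:3]: fetch every row, then slice in Python.
all_rows = conn.execute("SELECT id, username FROM user ORDER BY id").fetchall()
result1 = all_rows[:3]

# Approach 2, like list(data[:3]): the database applies LIMIT 3.
result2 = conn.execute(
    "SELECT id, username FROM user ORDER BY id LIMIT 3"
).fetchall()

print(len(all_rows), len(result2))  # 10 3
print(result1 == result2)           # True
```

Both approaches end up with the same three rows; only the amount of data transferred differs.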

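How to view the SQL

```
# .query only exists on a QuerySet; result1 and result2 above are lists
print(data.query)       # SQL for approach 1 (no LIMIT)
print(data[:3].query)   # SQL for approach 2 (with LIMIT 3)
```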

Iteration

Caching QuerySet results (on by default)

```
data = User.objects.all()

for d in data:
    print(d.username)

for d in data:
    print(d.username)

username = data[0].username
```

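The code above loops over data twice and then reads the username of the first item, yet the database is queried only once.

If the result set will be used again later, the pattern in the first line (keeping the QuerySet in a variable and reusing it) works well, for the following reason:

QuerySet implements __iter__, so iterating over a QuerySet calls its __iter__ method. Here is what that method is for: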

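For a class to be usable in a for … in loop the way list or tuple is, it must implement an __iter__() method that returns an iterator object; Python's for loop then keeps calling that iterator's __next__() method to get the next value, and exits the loop when a StopIteration exception is raised.

Quoted from: https://www.cnblogs.com/shiyublog/p/10919984.html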
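A minimal sketch of that protocol in plain Python, independent of Django; `Countdown` and `CountdownIterator` are hypothetical names used only for illustration:

```python
class CountdownIterator:
    """Iterator object: __next__ returns the next value or raises StopIteration."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # ends the for loop
        self.current -= 1
        return self.current + 1


class Countdown:
    """Iterable: __iter__ returns a fresh iterator object each time."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


for n in Countdown(3):
    print(n)  # prints 3, then 2, then 1
```

Because `Countdown.__iter__` builds a new iterator on every call, the same `Countdown` object can be looped over more than once.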

```
def __iter__(self):
    """
    The queryset iterator protocol uses three nested iterators in the
    default case:
        1. sql.compiler.execute_sql()
           - Returns 100 rows at time (constants.GET_ITERATOR_CHUNK_SIZE)
             using cursor.fetchmany(). This part is responsible for
             doing some column masking, and returning the rows in chunks.
        2. sql.compiler.results_iter()
           - Returns one row at time. At this point the rows are still just
             tuples. In some cases the return values are converted to
             Python values at this location.
        3. self.iterator()
           - Responsible for turning the rows into model objects.
    """
    self._fetch_all()
    return iter(self._result_cache)
```

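__iter__ in turn calls _fetch_all. On the first call self._result_cache is None, so a database query is executed.

The query result is then stored in self._result_cache. The second loop iterates over the same data; this time self._result_cache is not None, so no query is executed and the values are read straight from the cache.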

```
def _fetch_all(self):
    if self._result_cache is None:
        self._result_cache = list(self._iterable_class(self))
    if self._prefetch_related_lookups and not self._prefetch_done:
        self._prefetch_related_objects()
```
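The caching behaviour can be modelled with a toy class. This is a simplified sketch, not Django's code: `FakeDB` and `LazyQuerySet` are hypothetical names, and the "database" is just a list that counts how often it is queried. `LazyQuerySet.__iter__` and `_fetch_all` mirror the two methods shown above (prefetching is left out).

```python
class FakeDB:
    """Hypothetical stand-in for a database; counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    def execute(self):
        self.query_count += 1
        return list(self.rows)


class LazyQuerySet:
    """Toy model of QuerySet's result cache (not Django's actual class)."""

    def __init__(self, db):
        self.db = db
        self._result_cache = None

    def _fetch_all(self):
        # Query only while the cache is empty; later calls reuse it.
        if self._result_cache is None:
            self._result_cache = self.db.execute()

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)


db = FakeDB(["alice", "bob", "carol"])
data = LazyQuerySet(db)

for name in data:
    print(name)
for name in data:
    print(name)

print(db.query_count)  # 1
```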

And what happens in this line?

```
username = data[0].username
```

data is a QuerySet, and the QuerySet class implements __getitem__. Here is what that method does:

If a class defines a __getitem__ method, its instances (say p) support p[key] access: evaluating p[key] calls the class's __getitem__ method.

Quoted from: https://zhuanlan.zhihu.com/p/27661382
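A small plain-Python sketch of `__getitem__`; the `Names` class is hypothetical. It also shows that `p[1]` passes an `int` while `p[:2]` passes a `slice` object, which is the distinction QuerySet's `__getitem__` checks for:

```python
class Names:
    """Toy container: p[key] is routed to __getitem__."""

    def __init__(self, values):
        self.values = list(values)
        self.last_key = None

    def __getitem__(self, key):
        self.last_key = key       # an int for p[1], a slice object for p[:2]
        return self.values[key]   # list indexing handles both cases


p = Names(["alice", "bob", "carol"])
print(p[1])        # bob
print(p[:2])       # ['alice', 'bob']
print(p.last_key)  # slice(None, 2, None)
```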

Here is how QuerySet implements __getitem__. Look at the second if check: once _result_cache has been filled, indexing simply returns an element from the cache without touching the database.

```
def __getitem__(self, k):
    """Retrieve an item or slice from the set of results."""
    if not isinstance(k, (int, slice)):
        raise TypeError
    assert ((not isinstance(k, slice) and (k >= 0)) or
            (isinstance(k, slice) and (k.start is None or k.start >= 0) and
             (k.stop is None or k.stop >= 0))), \
        "Negative indexing is not supported."

    if self._result_cache is not None:
        return self._result_cache[k]

    # remaining code omitted
    ...
```

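Iterating without caching (for reference)

The data a QuerySet caches is kept in memory. The benefit is fast iteration with no further database round trips; the drawback is that caching a very large result set takes a lot of memory, and with many concurrent requests the server may run out. So what should you do if you only iterate over the data once and never use it again?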

```
for u in User.objects.all().iterator():
    print(u.username)
```

QuerySet.iterator() fetches the data through a generator and does not cache it. The method is defined in django/db/models/query.py if you want to read it.
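As a rough sketch of the difference (not Django's implementation; `FakeDB` and `ToyQuerySet` are hypothetical), normal iteration stores every row in `_result_cache`, while an `iterator()`-style generator hands rows out one at a time and keeps none of them, so a second pass queries again:

```python
class FakeDB:
    """Hypothetical database stand-in that streams rows and counts queries."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    def stream(self):
        # Generator: the "query" starts when the first row is requested.
        self.query_count += 1
        for row in self.rows:
            yield row


class ToyQuerySet:
    """Toy model contrasting normal iteration with iterator() (not Django's class)."""

    def __init__(self, db):
        self.db = db
        self._result_cache = None

    def __iter__(self):
        # Normal iteration: materialise every row and keep them in memory.
        if self._result_cache is None:
            self._result_cache = list(self.db.stream())
        return iter(self._result_cache)

    def iterator(self):
        # iterator()-style: hand rows out one at a time and store nothing.
        yield from self.db.stream()


db = FakeDB(["alice", "bob"])
qs = ToyQuerySet(db)

for name in qs.iterator():
    print(name)
for name in qs.iterator():
    print(name)

print(qs._result_cache)  # None: nothing was cached
print(db.query_count)    # 2: each iterator() pass queried again
```

The trade-off is the one described above: the cached form costs memory but can be reused for free, the generator form saves memory but repeats the query on every pass.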

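You might wonder whether only part of the data can be cached: for example, if I access data[0], data[2] and data[10], could just those three items be cached and nothing else?

Unfortunately, no. Let's see why.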

```
data = User.objects.all()
username = data[0].username
```

This time data has not been iterated, and we index its first element directly. As mentioned above, data[0] calls __getitem__. Earlier we left out part of __getitem__; here is the full code.

```
def __getitem__(self, k):
    """Retrieve an item or slice from the set of results."""
    if not isinstance(k, (int, slice)):
        raise TypeError
    assert ((not isinstance(k, slice) and (k >= 0)) or
            (isinstance(k, slice) and (k.start is None or k.start >= 0) and
             (k.stop is None or k.stop >= 0))), \
        "Negative indexing is not supported."

    if self._result_cache is not None:
        return self._result_cache[k]

    if isinstance(k, slice):
        qs = self._chain()
        if k.start is not None:
            start = int(k.start)
        else:
            start = None
        if k.stop is not None:
            stop = int(k.stop)
        else:
            stop = None
        qs.query.set_limits(start, stop)
        return list(qs)[::k.step] if k.step else qs

    qs = self._chain()
    qs.query.set_limits(k, k + 1)
    qs._fetch_all()
    return qs._result_cache[0]
```

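On the first call, self._result_cache has no value.
We used data[0], so k is 0, an int. (When we use data[:3], k is slice(None, 3, None), a slice object.)

So none of the first three if checks is taken, and execution goes straight to the last few lines: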

```
qs = self._chain()
qs.query.set_limits(k, k + 1)
qs._fetch_all()
return qs._result_cache[0]
```

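What does self._chain() do? Its source shows that it returns a copy of the current QuerySet. It is called in many places, and every call produces a copy of the current QuerySet; whatever gets cached on that copy later has no effect on the original QuerySet (this is the key point).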

```
def _chain(self, **kwargs):
    """
    Return a copy of the current QuerySet that's ready for another
    operation.
    """
    obj = self._clone()
    if obj._sticky_filter:
        obj.query.filter_is_sticky = True
        obj._sticky_filter = False
    obj.__dict__.update(kwargs)
    return obj
```

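Now look at self._clone(). It builds a new QuerySet object (its docstring calls it a lightweight alternative to deepcopy()) whose result cache starts out empty. In other words, the QuerySet that actually fetches the row has nothing to do with the current QuerySet (the variable data), so no data is cached on data. Naturally, the copied QuerySet has to query the database to get its data, and because every data[k] makes a fresh copy, each such access on an unevaluated QuerySet runs its own query (key point! key point! key point!).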

```
def _clone(self):
    """
    Return a copy of the current QuerySet. A lightweight alternative
    to deepcopy().
    """
    c = self.__class__(model=self.model, query=self.query.chain(), using=self._db, hints=self._hints)
    c._sticky_filter = self._sticky_filter
    c._for_write = self._for_write
    c._prefetch_related_lookups = self._prefetch_related_lookups[:]
    c._known_related_objects = self._known_related_objects
    c._iterable_class = self._iterable_class
    c._fields = self._fields
    return c
```
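Putting `__getitem__`, `_chain()` and the cache together, the toy model below reproduces the behaviour just described. It is a sketch with hypothetical names (`FakeDB`, `ToyQuerySet`), not Django's real classes, and only non-negative integer indexes on an unsliced queryset are modelled: indexing an unevaluated queryset runs a new query every time and leaves the original's cache empty, while indexing after full evaluation reads from the cache.

```python
class FakeDB:
    """Hypothetical database stand-in that counts queries."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    def execute(self, start=None, stop=None):
        self.query_count += 1
        return self.rows[start:stop]  # like LIMIT/OFFSET


class ToyQuerySet:
    """Toy model of QuerySet caching and _chain() (not Django's real class)."""

    def __init__(self, db, start=None, stop=None):
        self.db = db
        self.start, self.stop = start, stop
        self._result_cache = None  # every instance starts with an empty cache

    def _chain(self):
        # Like _clone(): same query, brand-new (empty) result cache.
        return ToyQuerySet(self.db, self.start, self.stop)

    def _fetch_all(self):
        if self._result_cache is None:
            self._result_cache = self.db.execute(self.start, self.stop)

    def __iter__(self):
        self._fetch_all()
        return iter(self._result_cache)

    def __getitem__(self, k):
        if self._result_cache is not None:
            return self._result_cache[k]  # evaluated already: no query
        qs = self._chain()                # results get cached on the copy only
        qs.start, qs.stop = k, k + 1      # like qs.query.set_limits(k, k + 1)
        qs._fetch_all()
        return qs._result_cache[0]


db = FakeDB(["alice", "bob", "carol"])
data = ToyQuerySet(db)

first = data[0]
first_again = data[0]
print(db.query_count)      # 2: each index queried the database
print(data._result_cache)  # None: the original was never cached

all_names = list(data)     # full evaluation fills data._result_cache
first = data[0]
third = data[2]
print(db.query_count)      # 3: the last two lookups used the cache
```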

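select_related and prefetch_related for related-object queries

For these examples I created a department model called Department; the User model is Django's built-in User.

Department has three fields: the department name (name), the people in the department (user, many-to-many), and one person in the department (one_user, a foreign key). This model exists only for testing, so don't worry about why it has both user and one_user; just focus on the field types.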

```
from django.db import models
from django.contrib.auth.models import User


class Department(models.Model):
    name = models.CharField(null=True, max_length=20)
    user = models.ManyToManyField(User)
    # The on_delete value here is arbitrary. Since Django 2.0, on_delete is required
    # when relating tables (ForeignKey, OneToOneField); omitting it raises an exception.
    one_user = models.ForeignKey(User, on_delete=models.SET_NULL, null=True, related_name='one_user')
```

select_related

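Mainly optimizes one-to-one and ForeignKey fields on a model.

Example: print the username of each Department's one_user, and compare how many database queries each approach makes:

Approach 1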
```
depts1 = Department.objects.all()
for d in depts1:
    if d.one_user:
        print(d.one_user.username)
```
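First, select xxx from department fetches all departments from the department table.

Then, for every department whose one_user field is set, accessing d.one_user runs another query against the user table: select xxx from user where user.id=xxx

So the total number of queries is 1+N, where N is the number of departments whose one_user is set.
Approach 2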
```
depts2 = Department.objects.select_related('one_user').all()
for d in depts2:
    if d.one_user:
        print(d.one_user.username)
```
A single LEFT OUTER JOIN query fetches the fields of both the main table and the related table:
select xxx from department left outer join user on (department.one_user_id = user.id)
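The query counts can be reproduced with plain SQL. This sketch uses the standard-library sqlite3 module and a hand-written schema that only approximates what Django would create for the Department model (table names, column names and data are assumptions), and `run` is a hypothetical query counter. Approach 1 issues one extra query per department with a `one_user`; approach 2 issues the single LEFT OUTER JOIN.

```python
import sqlite3

# Hand-written approximation of the user/department tables (names assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE department (
        id INTEGER PRIMARY KEY,
        name TEXT,
        one_user_id INTEGER REFERENCES user(id)
    );
    INSERT INTO user (id, username) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO department (id, name, one_user_id)
        VALUES (1, 'dev', 1), (2, 'ops', 2), (3, 'hr', NULL);
""")

queries = []


def run(sql, params=()):
    """Execute a query and record it, like a simple query counter."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


# Approach 1: 1 query for the departments + 1 per department with a one_user.
names1 = []
for _, one_user_id in run("SELECT id, one_user_id FROM department ORDER BY id"):
    if one_user_id is not None:
        (username,) = run("SELECT username FROM user WHERE id = ?", (one_user_id,))[0]
        names1.append(username)
n_queries_1 = len(queries)

# Approach 2: one LEFT OUTER JOIN, the shape of query select_related issues.
queries.clear()
rows = run(
    "SELECT department.id, user.username FROM department "
    "LEFT OUTER JOIN user ON (department.one_user_id = user.id) "
    "ORDER BY department.id"
)
names2 = [username for _, username in rows if username is not None]
n_queries_2 = len(queries)

print(n_queries_1, n_queries_2)  # 3 1
print(names1 == names2)          # True
```

The LEFT (rather than INNER) join matters here: the 'hr' department has no one_user, and it still appears in the result with a NULL username.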

prefetch_related

Mainly optimizes many-to-many fields on a model.
Example:

Approach 1
```
depts1 = Department.objects.all()
for d in depts1:
    for u in d.user.all():
        print(u.username)
```
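First, select xxx from department fetches all departments from the department table.

Then, for each department d, d.user.all() joins the user table with the intermediate table Django creates for the ManyToManyField (called department_user below) to find the users linked to d.

The query looks like: select xxx from user inner join department_user on (user.id = department_user.user_id) where department_user.department_id=xxx

So the total number of queries is 1+N, where N is the number of departments.
Approach 2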
```
depts2 = Department.objects.prefetch_related('user').all()
for d in depts2:
    for u in d.user.all():
        print(u.username)
```
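Two queries are used.
The first fetches the departments: select xxx from department

The second joins the user table with the intermediate table and keeps only the rows whose department id is in (xxx, xxxxx, xxx, xxx), i.e. the ids of all departments returned by the first query. Django then attaches the users to their departments in Python.

select xxx from user inner join department_user on (user.id = department_user.user_id) where department_user.department_id in (xx, xxx, xxx, xx)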
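The same kind of sqlite3 sketch for the many-to-many case. The schema is a hand-written approximation (the intermediate table name department_user, all column names and the data are assumptions), `run` is a hypothetical query counter, and the final grouping loop stands in for the matching Django does in Python after the second query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE department_user (
        id INTEGER PRIMARY KEY,
        department_id INTEGER REFERENCES department(id),
        user_id INTEGER REFERENCES user(id)
    );
    INSERT INTO user (id, username) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO department (id, name) VALUES (1, 'dev'), (2, 'ops'), (3, 'hr');
    INSERT INTO department_user (department_id, user_id) VALUES (1, 1), (1, 2), (2, 3);
""")

queries = []


def run(sql, params=()):
    """Execute a query and record it, like a simple query counter."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


M2M_SQL = (
    "SELECT user.username FROM user "
    "INNER JOIN department_user ON (user.id = department_user.user_id) "
    "WHERE department_user.department_id = ? ORDER BY user.id"
)

# Approach 1: 1 query for the departments + 1 per department (1 + N).
members1 = {}
for (dept_id,) in run("SELECT id FROM department ORDER BY id"):
    members1[dept_id] = [name for (name,) in run(M2M_SQL, (dept_id,))]
n_queries_1 = len(queries)

# Approach 2: two queries, then group the users by department in Python.
queries.clear()
dept_ids = [dept_id for (dept_id,) in run("SELECT id FROM department ORDER BY id")]
placeholders = ", ".join("?" for _ in dept_ids)
rows = run(
    "SELECT department_user.department_id, user.username FROM user "
    "INNER JOIN department_user ON (user.id = department_user.user_id) "
    f"WHERE department_user.department_id IN ({placeholders}) ORDER BY user.id",
    dept_ids,
)
members2 = {dept_id: [] for dept_id in dept_ids}
for dept_id, name in rows:
    members2[dept_id].append(name)
n_queries_2 = len(queries)

print(n_queries_1, n_queries_2)  # 4 2
print(members1 == members2)      # True
```

With three departments, approach 1 needs four queries; approach 2 stays at two no matter how many departments there are.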

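Notes

Beyond these, there are the more routine optimizations, such as selecting only the fields you need rather than every column (for example with only() or values()).
A newly created QuerySet that has not cached any data does not actually contain data: it is an object built from a SQL query plus a few other attributes. So why can we see its contents, and use them, in a debugger?

Because QuerySet implements __repr__ to describe itself, and the list(...) call inside it already queries the database once (fetching at most REPR_OUTPUT_SIZE + 1 rows). Try it yourself if you don't believe it.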
```
# The maximum number of items to display in a QuerySet.__repr__
REPR_OUTPUT_SIZE = 20

def __repr__(self):
	data = list(self[:REPR_OUTPUT_SIZE + 1])
	if len(data) > REPR_OUTPUT_SIZE:
		data[-1] = "...(remaining elements truncated)..."
	return '<%s %r>' % (self.__class__.__name__, data)
```
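To see why displaying a QuerySet triggers a query but leaves the original unevaluated, here is a toy model that reuses the `__repr__` body above. `FakeDB` and `ToyQuerySet` are hypothetical, and only the `self[:n]` slicing path is modelled: because `self[:REPR_OUTPUT_SIZE + 1]` goes through `__getitem__`, the query runs on a limited copy rather than on the object being displayed.

```python
REPR_OUTPUT_SIZE = 20


class FakeDB:
    """Hypothetical database stand-in that records the LIMIT of each query."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0
        self.last_limit = None

    def execute(self, stop=None):
        self.query_count += 1
        self.last_limit = stop
        return self.rows[:stop]


class ToyQuerySet:
    """Toy QuerySet (not Django's class) reusing the __repr__ body shown above."""

    def __init__(self, db, stop=None):
        self.db = db
        self.stop = stop
        self._result_cache = None

    def __iter__(self):
        if self._result_cache is None:
            self._result_cache = self.db.execute(self.stop)
        return iter(self._result_cache)

    def __getitem__(self, k):
        # Only the self[:n] path used by __repr__ is modelled: it returns a
        # new, lazy queryset with a LIMIT instead of evaluating self.
        if self._result_cache is not None:
            return self._result_cache[k]
        return ToyQuerySet(self.db, stop=k.stop)

    def __repr__(self):
        data = list(self[:REPR_OUTPUT_SIZE + 1])
        if len(data) > REPR_OUTPUT_SIZE:
            data[-1] = "...(remaining elements truncated)..."
        return '<%s %r>' % (self.__class__.__name__, data)


db = FakeDB([f"user{i}" for i in range(50)])
qs = ToyQuerySet(db)

text = repr(qs)          # roughly what a debugger does to display qs
print(db.query_count)    # 1
print(db.last_limit)     # 21, i.e. REPR_OUTPUT_SIZE + 1
print(qs._result_cache)  # None: qs itself is still unevaluated
```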