django model filter_django中探索如何提高查询数据效率

平时用到django时候挺多, 最近有在搞数据库表分区的实践, 所以顺便尝试在django如何提高查询效率, 也就尝试在django中加入线程和协成的代码, 看一下效果

相关工具

django + pycharm + mysql  + postman

创建django项目, 具体不多说了

 1401  django-admin.py startproject db
 1405  cd db
 1413  python manage.py startapp cs

我获得文件夹树

7992364b81722fc7b7b6b4e37095e142.png

接着我们在cs/models.py编辑一个People模型

from django.db import models
class People(models.Model):
    name = models.CharField(max_length=10, help_text='姓名')
    sex = models.SmallIntegerField(default=0, help_text="性别")
    age = models.SmallIntegerField(default=0, help_text='年龄')
    address = models.CharField(default='', max_length=128, help_text='地址')

然后我们生成对应表

 1416  python manage.py makemigrations
 1417  python manage.py migrate cs

接下来我们创建了600w的假数据,

在/cs/management/commands/create_people.py 创建一个django的脚本

from django.core.management.base import BaseCommand
from cs.models import People
import random


class Command(BaseCommand):
    """
    """
    def handle(self, *args, **options):

        spe = 500
        objs = list()
        for i in range(6000000):
            age = random.randint(1, 70)
            sex = random.randint(0, 1)
            objs.append(People(name=str(i), age=age, sex=sex))

            if i % 500 != 0:
                continue
            People.objects.bulk_create(objs)
            objs = list()
            print(i)

然后运行, 生成数据

python manage.py create_people

这样我们600w的测试数据就创建完成了

from django.shortcuts import render
from django.views import View
from django.http.response import JsonResponse
from cs.models import People
import gevent
from gevent.pool import Pool
import asyncio
from concurrent.futures import ThreadPoolExecutor, as_completed
from django.db.transaction import atomic

# Create your views here.
import time

class DbViews(View):

    def get_count(self, age):
        return age, People.objects.filter(age=age).count()

    async def gets(self, age):
        count = People.objects.filter(age=age).count
        return count()

    async def get_async_count(self, age):
        print("***age***", age, time.time())
        age_count = await self.gets(age)
        print("***age***", age, time.time())

        return age, age_count

    def gevent_pool(self, ages):
        ages_detail = dict()
        pool = Pool()
        ret = list()
        for age in ages:
            t = pool.spawn(self.get_count, age)
            ret.append(t)
        gevent.joinall(ret)
        for t in ret:
            age, age_count = t.value
            ages_detail.update({age: age_count})
        return ages_detail

    def cur_get(self, ages):
        ages_detail = dict()

        for age in ages:
            age_count = People.objects.filter(age=age).count()
            ages_detail.update({age: age_count})
        return ages_detail

    def thead_pool(self, ages):

        ages_detail = dict()
        thead_poll = ThreadPoolExecutor(max_workers=3)

        ret = list()
        for age in ages:

            t = thead_poll.submit(self.get_count, age)
            ret.append(t)
        for t in as_completed(ret):
            age, age_count = t.result()
            ages_detail.update({age: age_count})
        return ages_detail

    def async_get(self, ages):
        ages_detail = dict()

        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        ret = list()

        for age in ages:
            t = asyncio.ensure_future(self.get_async_count(age))
            ret.append(t)
            
        done, pending = loop.run_until_complete(asyncio.wait(ret))
        for t in done:
            age, age_count = t.result()
            ages_detail.update({age: age_count})
        return ages_detail


    def hclock(self):
        people = People.objects.select_for_update().filter(id__in=[3,4])
        with atomic():
            for i in people:
                time.sleep(10)
                print("*******",i.id)
                i.age += 1
                i.save()
                time.sleep(10)


    def get(self, request):
        ages = [10, 67, 29]
        ages_detail = self.hclock()
        ages_detail = dict()

        return JsonResponse(ages_detail)

这里是全部view代码, 下面我们会一个说明

然后我们在/db/urls.py配置一个接口路径people

from django.conf.urls import url
from django.contrib import admin
from cs.views import DbViews

urlpatterns = [
    url('people/', DbViews.as_view(), name='cs')
]

启动web服务

我们假定我们的需求, 我们需要获取10, 67, 29岁三种年龄的人数, 我们先使用查询三次数据库的正常方式

    def get_count(self, age):
        return age, People.objects.filter(age=age).count()
    def cur_get(self, ages):
        ages_detail = dict()

        for age in ages:
            age_count = People.objects.filter(age=age).count()
            ages_detail.update({age: age_count})
        return ages_detail
    def get(self, request):
        ages = [10, 67, 29]
        ages_detail = self.cur_get(ages)

我们得到了

0e6f788009d4464491621966daf4cbde.png

消耗了2419ms

接下来我们使用gevent 方式

    def gevent_pool(self, ages):
        ages_detail = dict()
        pool = Pool()
        ret = list()
        for age in ages:
            t = pool.spawn(self.get_count, age)
            ret.append(t)
        gevent.joinall(ret)
        for t in ret:
            age, age_count = t.value
            ages_detail.update({age: age_count})
        return ages_detail


    def get(self, request):
        ages = [10, 67, 29]
        ages_detail = self.gevent_pool(ages)

        return JsonResponse(ages_detail)

我们得到了

20f97b599196bf3cd869494b7a90b046.png

很明显的请求的时间没有多大差别, 这可能models的count不支持的协成操作, 在搞过gevent下载, 可以使用monkey更改一下底层的一些东西

于是加入了

from gevent import monkey
monkey.patch_all()

但是出现了错误

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/utils/autoreload.py", line 227, in wrapper
    fn(*args, **kwargs)
  File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/core/management/commands/runserver.py", line 125, in inner_run
    self.check(display_num_errors=True)
  File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/core/management/base.py", line 359, in check
    include_deployment_checks=include_deployment_checks,
  File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/core/management/base.py", line 346, in _run_checks
    return checks.run_checks(**kwargs)
  File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/core/checks/registry.py", line 81, in run_checks
    new_errors = check(app_configs=app_configs)
  File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/core/checks/urls.py", line 16, in check_url_config
    return check_resolver(resolver)
  File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/core/checks/urls.py", line 26, in check_resolver
    return check_method()
  File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/urls/resolvers.py", line 254, in check
    for pattern in self.url_patterns:
  File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/utils/functional.py", line 35, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
  File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/urls/resolvers.py", line 405, in url_patterns
    patterns = getattr(self.urlconf_module, "urlpatterns", self.urlconf_module)
  File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/utils/functional.py", line 35, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
  File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/urls/resolvers.py", line 398, in urlconf_module
    return import_module(self.urlconf_name)
  File "/Users/lixiang/projects/py3/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 152, in __exit__
  File "<frozen importlib._bootstrap>", line 107, in release
RuntimeError: cannot release un-acquired lock

看来django不让这样更改, 我们有尝试了async这种方式

     async def get_async_count(self, age):
        print("***age***", age, time.time())
        age_count = await self.gets(age)
        print("***age***", age, time.time())
     def async_get(self, ages):
        ages_detail = dict()

        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        ret = list()

        for age in ages:
            t = asyncio.ensure_future(self.get_async_count(age))
            ret.append(t)

        done, pending = loop.run_until_complete(asyncio.wait(ret))
        for t in done:
            age, age_count = t.result()
            ages_detail.update({age: age_count})
        return ages_detail
     def get(self, request):
        ages = [10, 67, 29]
        ages_detail = self.async_get(ages)
        return JsonResponse(ages_detail)

得到结果

71665d0f6d07293e2b35acb7231431a2.png

还是没有什么差异, 看来协成没有生效, 于是我们在调用协程时候打印了一下

***age*** 10 1560943042.4748821
***age*** 10 1560943043.3534641
***age*** 67 1560943043.3534899
***age*** 67 1560943044.1488252
***age*** 29 1560943044.148851
INFO 2019-06-19 11:17:24,987 basehttp 3413 123145324937216 "GET /people/ HTTP/1.1" 200 39 ('GET /people/ HTTP/1.1', '200', '39')
***age*** 29 1560943044.9875998

很明显执行的时候是阻塞了, 看来orm的方法执行时无法使用协成

现在我们使用一下线程

 def thead_pool(self, ages):

        ages_detail = dict()
        thead_poll = ThreadPoolExecutor(max_workers=3)

        ret = list()
        for age in ages:

            t = thead_poll.submit(self.get_count, age)
            ret.append(t)
        for t in as_completed(ret):
            age, age_count = t.result()
            ages_detail.update({age: age_count})
        return ages_detail
def get(self, request):
        ages = [10, 67, 29]
        ages_detail = self.thead_pool(ages)
        return JsonResponse(ages_detail)

得到结果

ee09be49f765c91b86f05dd6949c9db7.png

这里响应时间是857ms, 比以上的大约2500ms方法快了大约1600ms,看来在django中对于多个查询来说, 使用线程池是比较有效的提高访问效率的方式

我们对于600w的数据也可以使用分区提高效率, 我们先把原来数据表cs_people删除, 按age分区在重新导入数据

drop table cs_people
create table cs_people(`id` INT AUTO_INCREMENT,
                        `name` varchar(10),
                        `sex` smallint,
                        `age` smallint,
                        `address` varchar(128),
                         PRIMARY KEY (`id`, `age`))
    PARTITION by range COLUMNS(age)(
    partition p0 values less than(10),
    partition p1 values less than(20),
    partition p2 values less than(30),
    partition p3 values less than(40),
    partition p4 values less than(50),
    partition p6 values less than(60),
    partition p7 values less than MAXVALUE);

我们可以查看分区情况:

select PARTITION_NAME, TABLE_ROWS from INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME='cs_people';

31bf821d13b0127a1f01f656e0b511e6.png

我们还可以查看索引情况

show index from cs_people

821205c5d9d3b350ee0f9c94e7d20d01.png

我们在生成数据的脚本中添加了批量创建的脚本, 使用了django自动bulk_create所以创建很快

其实在创建分区的时候我们已经将age设置为索引了, 一会我们在说只索引的情况

我们看一下各种方式的结果, 因为协程方式没有生效,我们就不讨论了

正常方式, 串行三次请求, 时间524ms(这个地方忘记保存截图了)

线程池请求, 时间215ms

77bccbf0996e5aa9307a141670d554e6.png

很明显这两种方式有很大的提升, 这是因为mysql将按照不同age存在到了不到分区表里, 我们在查询不同age时, mysql可以智能的到不同的小分区中查找,一下子将查询集缩小了数倍, 这里有索引的一些功劳,我们接下来在看看只有索引的情况下,效果是怎么样的

我们删除分区表穿件索引表,这里和分区创建表的时候就不在使用migrate的了

6070c04a0931f3288bb8b11d84c959b4.png

创建索引

6e7fceeaa336cc5215a0b78d4261ca79.png

查看索引

afecfb957ddac97186b41669c82c43e9.png

正常方式 81ms

f1ead82a81c9395ff3c28eaf85939ef4.png

线程池,84ms

f1c0385601c9059727f4623ae13bdb3b.png

结果发现对于600w的数据,分区会有些额外开销, 造成了访问时间多余只构建索引的情况, 而且正常和线程池这两种方式时间大致相同

但是后来我增加了只构建索引这种情况的数据量,到达了1200w

线程池: 73ms

4a676ec7184f4dbfcc8140a1eea61f1b.png

正常, 111ms

98f98a0df76ad62a24534c0f23c32e7b.png

这样看来相对数据量大情况还是线程池效率比较高

  1. 对于小数据量来说, 只创建索引就可以解决
  2. 如果数据量稍微有点大的, 可以使用多线程加索引情况
  3. 而理论上使用分区情况应该是数据量比较大的情况, 具体应该会与实际数据有关系
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值