Django, lxml, WSGI, and Python sub-interpreter magic

Django, lxml, WSGI, and Python sub-interpreter magic

One of the applications we’ve been spending a fair chunk of time on here in the library is a user-friendly front-end to our fedora repository. It’s built on internally-developedPython libraries for repository access, XML data mapping, and Django tie-ins. We’re aiming to opensource that library soon, but this post isn’t about that library. In fact, it’s only sort of about the application. This post is about an interesting problem we ran into this week when trying to deploy that application into our staging environment for testing.

See, we’ve made some great strides with development, and we’re ready to put them up so that our users—mostly our own librarians for now—can test them. Development has progressed smoothly under Django’s manage.py runserver. The other day, though, when we ran our application under apache, it surprised us by locking up hard.

Now, I can’t think of the last time I saw an http daemon freeze up like that, but it was clear that’s what was happening. The web request wasn’t returning anything (not even a 500 Internal Server Error). Browsers just sat there spinning. curl sat waiting for a response. And eventually apache would give up and drop the connection. It was dead at the starting bell, and with no prior warning of any problems in development. We were confounded.

Debugging was an interesting experience, and I hope to post sometime about how that progressed. In the end, though, we figured out it was a design decision that made it happen. Here are the players in this drama:

lxml is a fine XML processing library for Python. We use it to process XML as we communicate with fedora. We particularly picked it because it supports XPath expressions,XSLT, and XML Schema, and because it’s pretty darn portable with minimal fuss.

Cython is a tool for gluing together C and Python. I started using a variant called Pyrex several years ago, and I happen to think the approach is a great one. lxml happens to use Cython internally. Most users will never need to know that fact, but it becomes relevant in a bit.

Django is our web development framework of choice these days at Emory Libraries. It’s written in Python, which has given us a huge dose of flexibility, stability, and power in our development.

mod_wsgi is how we deploy our Django code to production. There are other options, but we’ve found WSGI gives us the best mix of flexibility and stability so far.

Unfortunately, it was a combination of design decisions in those tools—particularly Cython, Python, and WSGI—that locked up our app.

The problem, it turns out, is subtle, but it stems from the use of Cython (via lxml) and mod_wsgi together. These can be made to work together, but it requires careful configuration to work around some incompatibilities. This is complicated by some further design decisions in Django, which I’ll say more about in a bit. First, lxml, Cython, and the simplified GIL interface.

Cython, as mentioned above, is a tool for gluing together C and Python. The idea is you write code that looks a lot like Python, but with a few C-like warts, and Cython compiles your code down to raw C. This is perfect for exposing C libraries in Pythonic idioms, and lxml uses it to great effect to provide its XML access. Now, Cython happens to use Python’s simplified GIL interface internally for locking. Unfortunately this means that it’s incompatible with an obscure Python feature called sub-interpreters. Most applications don’t need to use this feature. Most applications—notably including Django’s manage.py runserver—never notice or care.

mod_wsgi is a perfect example of good use of sub-interpreters. It uses them to allow apache admins to run lots of little WSGI-based web apps all in a single process, but still give each one its own Python environment. Without this, things like Django’s model registration patterns—along with similar global systems in many other Python libraries—would leave separate applications all interfering with each other.

Unfortunately, given that Cython-based libraries are incompatible with sub-interpreters, and given that mod_wsgi uses sub-interpreters, it follows logically that Cython-based libraries like lxml are incompatible with simple mod_wsgi configurations. In our case, this manifested as a single-thread self-deadlock in the Python Global Interpreter Lock whenever we tried to use our application at all. We were lucky: As the Python C-API docs say, “Simple things may work, but confusing behavior will always be near.”

Now, once that incompatibility is recognized and accepted, hope is not lost. If you’re only running a single WSGI application, your workaround might even be easy. You can force a mod_wsgi application to avoid the problem by forcing it into the global application group:

WSGIApplicationGroup %{GLOBAL}

If you want to run multiple WSGI applications, though, they might not play so well all together like that. Remember, as I described above, WSGI uses sub-interpreters to prevent applications from accidentally stepping on each other. Django applications, in particular, must run in separate sub-interpreters. If you want to run a couple of them, and they’re all incompatible with sub-interpreters, you need to keep them separate.

We’re just starting to deal with this problem, but it looks like mod_wsgi daemon processes are just what the doctor ordered. What we’re looking at right now is using a separate WSGIDaemonProcess for each lxml-enabled Django site. According to the docs, this should eliminate sub-interpreter conflicts while still giving each application its own distinct interpreter space. Which will probably eat some system resources, but it’s better than locking up on every request.

I’ll update this post if the strategy turns out not to work. So far, though, I’m hopeful.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
Django是一个用于快速开发Web应用程序的Python Web框架。而python-docx-template是一个Python库,它可以使用Word文档作为模板,然后根据传入的数据批量生成Word文档。在Django中,我们可以利用python-docx-template库来实现批量生成Word文档的功能。 首先,我们需要在Django项目中安装python-docx-template库。可以使用pip命令来安装该库: ```bash pip install python-docx-template ``` 接下来,我们可以在Django项目中创建一个视图函数,用于接收数据并根据模板生成Word文档。在视图函数中,我们可以使用python-docx-template库提供的方法将数据填充到Word模板中,生成最终的Word文档。 例如,假设我们有一个Word文档模板`template.docx`,里面包含了一些需要填充数据的位置,我们可以在Django中这样写视图函数: ```python from docxtpl import DocxTemplate from django.http import HttpResponse def generate_word_document(request): # 从请求中获取数据 data = request.GET.get('data', '') # 读取Word模板 doc = DocxTemplate("template.docx") # 根据数据填充模板 context = {'data': data} doc.render(context) # 写入生成的Word文档 doc.save("generated_document.docx") # 返回生成的Word文档给用户 with open("generated_document.docx", 'rb') as f: response = HttpResponse(f.read(), content_type='application/vnd.openxmlformats-officedocument.wordprocessingml.document') response['Content-Disposition'] = 'attachment; filename=generated_document.docx' return response ``` 通过上述视图函数,我们可以在Django项目中实现批量生成Word文档的功能,用户可以通过传入数据来生成他们所需的Word文档。这样我们就可以方便地利用PythonDjango来批量生成Word文档,提高生产效率。

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值