Dealing with the Python Import Blackbox

cumei1658

Original article: https://www.pybloggers.com/2011/09/dealing-with-the-python-import-blackbox/

For a long time Python’s import system was (although customizable) at the very core a black box. You could hook into some parts of it but others were hidden from you. On top of that the only signalling that the import system has is “here is your module, be happy” or “oh look, an import error”. Unfortunately Python’s exceptions are an example of a stringly typed API, and one of the worst.

But one step at a time. What is the actual problem with that black box? It works, right?

The Use Case

The problem arises when you start doing things and want to respond to errors. A good example is an import where you try to import something and, if that fails, you want to do something else. For instance you have a module name as a string and you want to try to import that. If that module does not exist (not if it fails to import!) you want to do something else. Django's middlewares for instance are defined as strings in the configuration module, and if there is a typo you want to tell the users where the problem is.

If you import module A and if that does not exist you want to fall back to module B, you don’t want to swallow the import error of module A since that one might have been a dependency that failed loading.

Consider you have a module called foo that depends on a module named bar. If foo does not exist you want to retry with simplefoo. This is what nearly everybody is doing:

try:
    import foo
except ImportError:
    import simplefoo as foo

However if now foo is failing to import because bar is missing you get the import error “No module named simplefoo” even though the correct error would have been “No module named bar”.
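
The effect can be reproduced with a small, self-contained sketch. The module names demo_foo, demo_bar and demo_simplefoo are hypothetical stand-ins for foo, bar and simplefoo, and the temporary directory exists only for the demonstration. On Python 3 the original error is still attached to the new one as its context, but the message of the exception you actually catch talks about the fallback module:

```python
import importlib
import os
import sys
import tempfile


def naive_import():
    """The common fallback pattern: any ImportError triggers the fallback."""
    try:
        return importlib.import_module("demo_foo")
    except ImportError:
        return importlib.import_module("demo_simplefoo")


def demo_misleading_error():
    """Return the message of the error that the naive pattern surfaces."""
    # Hypothetical setup: demo_foo exists but depends on the missing
    # module demo_bar; demo_simplefoo does not exist at all.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "demo_foo.py"), "w") as f:
            f.write("import demo_bar\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            naive_import()
        except ImportError as exc:
            # The message names the fallback, not the real culprit.
            return str(exc)
        finally:
            sys.path.remove(tmp)
    return None
```

Calling demo_misleading_error() returns a message about demo_simplefoo, even though the module that is really missing is demo_bar.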

The Problem

The problem is that Python does not tell you whether the module was not found or whether it was found and then failed to import. In theory you could build yourself something with the imp module that splits up finding and loading, but there are a handful of problems with that:

The Python import process is notoriously underspecified and exploited in various ways. Just because an importer says it found a module does not mean it can properly import it. For instance there are many finders that will tell you that find_module succeeded just to fail later with an error on load_module.
The Python import machinery is complex and, even with the new importlib module, anything but easy to use. To replicate the logic that Python applies to locate modules you need around 80 lines of code, even with importlib available.
The import process is highly dynamic and there are various ways in which people can customize the importing, going beyond what is possible with regular import hooks by overriding __import__.

The second possibility, which is actually in use sometimes, is parsing the error message of the import error. This however is a lost cause because the error message is implementation defined and differs quite often. On top of that, the import machinery in Python is a recursive process and gives very awkward results:

As you can see, the error message does not even include the whole import path at all times. Sometimes the error message is something completely unrelated, sometimes the whole error message is just the module name. Sometimes it's "No module named %s", sometimes the module name is in quotes. This is because various parts of the system can abort an import process and since this is customizable …
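
A small sketch makes the variety visible. It collects the messages of three failing imports; the hypothetical_* names are made up for the demonstration, and the exact wording of each message depends on the interpreter version, which is exactly why parsing it is fragile:

```python
def import_error_message(statement):
    """Run an import statement and return the ImportError message, if any."""
    try:
        exec(statement, {})
    except ImportError as exc:
        return str(exc)
    return None


# Three different ways for an import to fail, each with its own wording.
messages = [
    import_error_message("import hypothetical_missing_pkg"),
    import_error_message("import os.hypothetical_missing_sub"),
    import_error_message("from os import hypothetical_missing_name"),
]

for message in messages:
    print(message)
```

All three are import errors, yet no two messages share the same shape, so a regular expression written against one of them will silently miss the others.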

Import Process Details

The way imports work is that at a very early point an entry in sys.modules is created for the new module. When the module code is executed it will be executed in a frame where the globals of the frame are the dictionary of the module in sys.modules. As such this is valid in Python:

import sys
a_value = [1, 2, 3]
this = sys.modules[__name__]
assert a_value is this.a_value

Now in theory one could think that if an import fails, a partial entry is left behind in sys.modules which we could introspect to find out whether the import failed at a later point. This however is usually not the case: on import errors the importer is required to remove the entry from sys.modules again, so we don't have much luck there.

Consider this fail_module.py:

If we however attempt to access fail_module later it will be gone:

>>> import sys
>>> import fail_module
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "fail_module.py", line 7, in <module>
    import missing_module
ImportError: No module named missing_module
>>> 'fail_module' in sys.modules
False

Since we also can’t replace sys.modules with a custom data structure where we get callbacks when things are inserted we have no chance there.
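
The behaviour can be checked with a self-contained sketch. The module name hypothetical_fail_module and its contents are assumptions made for the demonstration and are not the fail_module.py referred to above. The module asserts that it can see itself in sys.modules and then imports something that does not exist:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module body: verify the sys.modules entry exists while
# the module runs, then fail on a missing dependency.
MODULE_SOURCE = """\
import sys
assert __name__ in sys.modules
import hypothetical_missing_module
"""


def entry_survives_failed_import(name="hypothetical_fail_module"):
    """Return True if sys.modules still holds the module after it failed."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, name + ".py"), "w") as f:
            f.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(name)
        except ImportError:
            # The assert inside the module passed (otherwise we would
            # have seen an AssertionError), so the entry existed while
            # the module body ran. Check whether it is still there.
            return name in sys.modules
        finally:
            sys.path.remove(tmp)
    raise AssertionError("expected the import to fail")
```

On CPython 3 this returns False: the entry is there while the module executes and is gone by the time the ImportError reaches the caller.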

Sidechannels

I had to solve this problem again yesterday when I worked on a way to get rid of namespace packages in Flask without pissing existing users off. I think I found something that works reliably enough that I don't want to shoot myself for writing the code.

The idea is that if you get an import error you don’t only get an import error but also a traceback object if you want. And that traceback object has all the frames of the traceback linked to it. If you walk the traceback you can find out if at any point the module you attempted to import was involved. If that was the case, the module succeeded in loading and something that it did resulted in an import error.

Now obviously there are downsides to this approach, so let's go over them:

It assumes that the module we import does not override __name__. Since that is a horrible idea anyways that’s something we can ignore.
It assumes that there will be at least one traceback frame originating from that module. This will not be the case if that module was a C module that dynamically imported another module. This however is negligible since this is on the one hand a very uncommon thing to do and secondly this comes with its own set of problems.
It walks a traceback so your JIT will not be happy with that. On the other hand you should only import modules in non critical code paths anyways.

So how does the code look?

You can use it like this:

json = import_module('simplejson')
if json is None:
    json = import_module('json')
    if json is None:
        raise RuntimeError('Unable to find a json implementation')

Generally the implementation is straightforward. Try to import with __import__, if that fails get the current traceback and see if any of the frames originated in the module we tried to import. If that is the case, we reraise the exception with the original traceback, otherwise just return None to mark a missing module.
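
The description above can be turned into a minimal Python 3 sketch. This is one reading of the steps just listed, not necessarily the author's exact code, and it shares the limitations discussed earlier (for example, it only compares against the exact dotted name passed in):

```python
import sys


def import_module(module_name):
    """Return the module, or None if the module itself does not exist.

    If the module was found but raised ImportError while loading, the
    original exception is re-raised with its traceback intact.
    """
    try:
        __import__(module_name)
    except ImportError:
        tb = sys.exc_info()[2]
        while tb is not None:
            # A frame whose globals carry the module's name means the
            # module's own code had started executing.
            if tb.tb_frame.f_globals.get("__name__") == module_name:
                raise
            tb = tb.tb_next
        # No frame belonged to the module: it was never found.
        return None
    return sys.modules[module_name]
```

The bare raise inside the except block re-raises the active exception unchanged, which is the Python 3 way of keeping the original traceback.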

Since None has a special meaning in sys.modules, where it marks an import error, we know that an imported module is never None, and we can use None as a return value to indicate a module that does not exist. If we instead raised an exception we would have the very same problem again, since exceptions bubble up and we don't know if someone would handle it. So raising something like ModuleNotFound instead of returning None would cause trouble if the module we import itself imports something with import_module and does not handle the exception.
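
The special role of None can be observed directly. A minimal sketch, assuming current CPython 3 behaviour and using a made-up module name:

```python
import sys


def import_blocked_by_none(name="hypothetical_blocked_module"):
    """Show that a None entry in sys.modules makes the import fail."""
    sys.modules[name] = None
    try:
        __import__(name)
    except ImportError:
        # The import machinery refuses to import a name mapped to None.
        return True
    finally:
        # Clean up so the demonstration leaves sys.modules untouched.
        del sys.modules[name]
    return False
```

Because the import system treats a None entry as a failure marker, a successfully imported module can never be None, which is what makes None safe as a "not found" return value.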

Why does it work?

Now you would think it only makes sense that this works, but it actually surprised me that it does. The reason it surprises me is that Python normally shuts down modules in a very weird way, by setting all the values in the global dictionary to None. Since the actual module is long gone when you get the import error, you would think that the globals you still hold a reference to are full of Nones and that __name__ would never be the module name.

To quote the documentation:

Starting with version 1.5, Python guarantees that globals whose name begins with a single underscore are deleted from their module before other globals are deleted; if no other references to such globals exist, this may help in assuring that imported modules are still available at the time when the __del__ method is called.

This however is only true when the module is shut down because the interpreter is shutting down, not when the module is garbage collected. And with that, the above hack works. If Python did what the documentation says in the module destructor instead of in the interpreter shutdown code, our hack would not work.

Also this requires that a traceback object indeed still owns a reference to f_globals. Now if you look at the traceback output itself you will never see information that needs to be derived from the module global dictionary, so it appears to be implementation specific functionality that is not guaranteed. However, here is the catch: the import hook protocol also specifies that a module can inject __loader__ into the frame so that the source can be loaded from the __loader__ if the source is not based on the filesystem. And for this to work the globals have to be there. On top of that this also gives us confirmation that garbage collected modules must not clear out their globals with Nones, or we would not be able to extract the source code for certain import hooks when an import error occurs, since the loader would be gone.
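
Whether the globals really survive can be checked from the traceback itself. A minimal sketch, assuming CPython 3; the module name, the MARKER variable and the missing dependency are all made up for the demonstration:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module body: set a global, then fail on a missing import.
MODULE_SOURCE = """\
MARKER = 'set before the failing import'
import hypothetical_missing_dependency
"""


def globals_seen_in_traceback(name="hypothetical_marker_module"):
    """Return the failed module's MARKER as seen through the traceback."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, name + ".py"), "w") as f:
            f.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module(name)
        except ImportError as exc:
            tb = exc.__traceback__
            while tb is not None:
                frame_globals = tb.tb_frame.f_globals
                # Find the frame that executed the failed module's body.
                if frame_globals.get("__name__") == name:
                    return frame_globals.get("MARKER")
                tb = tb.tb_next
        finally:
            sys.path.remove(tmp)
    return None
```

If the globals had been overwritten with None when the module was discarded, neither the __name__ comparison nor the MARKER lookup would yield the original values.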

And with that, the above hack suddenly looks quite reasonable and supported again.
