Python col function: cannot find the col function in PySpark

In PySpark 1.6.2, I can import the col function with:

from pyspark.sql.functions import col

But when I look it up in the GitHub source code, I find no col function in the functions.py file. How can Python import a function that doesn't exist?

Solution

It exists. It just isn't explicitly defined. Functions exported from pyspark.sql.functions are thin wrappers around JVM code and, with a few exceptions which require special treatment, are generated automatically using helper methods.

If you carefully check the source you'll find col listed among other _functions. This dictionary is further iterated and _create_function is used to generate wrappers. Each generated function is directly assigned to a corresponding name in the globals.

Finally __all__, which defines a list of items exported from the module, just exports all globals excluding ones contained in the blacklist.

If this mechanism is still not clear, you can create a toy example:

Create a Python module called foo.py with the following content:

# Creates a function assigned to the name foo

globals()["foo"] = lambda x: "foo {0}".format(x)

# Exports all entries from globals which start with foo

__all__ = [x for x in globals() if x.startswith("foo")]

Place it somewhere on the Python path (for example in the working directory).

Import foo:

from foo import foo

foo(1)  # returns 'foo 1'

An undesired side effect of this metaprogramming approach is that the generated functions might not be recognized by tools that rely purely on static code analysis. This is not a critical issue and can be safely ignored during development.

Depending on the IDE, installing type annotations might resolve the problem (see, for example, zero323/pyspark-stubs#172).
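To see why static tools miss such names, here is a rough sketch that compares a purely syntactic view of the toy module with its runtime namespace. The scan is a deliberately naive assumption of how a static tool might collect definitions (real linters and IDEs are more sophisticated), and the names `source`, `static_names`, and `namespace` are introduced only for this example.

```python
import ast

# Source of the toy module, kept as a string so the example is self-contained.
source = '''
globals()["foo"] = lambda x: "foo {0}".format(x)
__all__ = [x for x in globals() if x.startswith("foo")]
'''

# Static view: collect names bound by def, class, or plain name assignment.
static_names = set()
for node in ast.parse(source).body:
    if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
        static_names.add(node.name)
    elif isinstance(node, ast.Assign):
        for target in node.targets:
            if isinstance(target, ast.Name):
                static_names.add(target.id)

# Runtime view: execute the module body and inspect the resulting namespace.
namespace = {}
exec(source, namespace)

print("foo" in static_names)  # False
print("foo" in namespace)     # True
```

The assignment through `globals()["foo"]` is a subscript store, not a name binding, so a syntax-only scan has nothing to record, while the executed module does contain `foo`.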
