My top 5 ‘new’ Python modules of 2015

As I’ve been blogging a lot more about Python over the last year, I thought I’d list a few of my favourite ‘new’ Python modules from 2015. These aren’t necessarily modules that were newly released in 2015, but modules that were ‘new to me’ this year – and may be new to you too!


tqdm

This module is so simple but so useful – it makes it stupidly easy to display progress bars for loops in your code. So, if you have some code like this:


for item in items:
    process(item)

Just wrap the iterable in the tqdm function:


from tqdm import tqdm

for item in tqdm(items):
    process(item)

and you’ll get a beautiful progress bar display like this:

18%|█████████                               | 9/50 [00:09<00:41,  1.00it/s]


I introduced my wife to this recently and it blew her mind – it’s so beautifully simple, and so useful.

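tqdm also takes a few optional arguments that are worth knowing about. As a quick sketch (desc and total are standard tqdm parameters; total is handy for generators, whose length tqdm can’t infer with len()):

```python
from tqdm import tqdm

# a generator has no len(), so tell tqdm the total explicitly;
# desc puts a short label in front of the bar
items = (n * n for n in range(100))
results = [item for item in tqdm(items, desc="Squaring", total=100)]
```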

joblib

I had come across joblib previously, but I never really ‘got’ it – it seemed a bit of a mishmash of various functions. I still feel like it’s a bit of a mishmash, but it’s very useful. I was re-introduced to it by one of my Flowminder colleagues, and we used it extensively in our data analysis code. So, what does it do? Well, three main things: 1) caching, 2) parallelisation, and 3) persistence (saving/loading data). I must admit that I haven’t used the parallel programming functionality yet, but I have used the other functions extensively.


The caching functionality allows you to easily ‘memoize’ functions with a simple decorator. This caches the results, and loads them from the cache when calling the function again using the same parameters – saving a lot of time. One tip for this is to choose the arguments of the function that you memoize carefully: although joblib uses a fairly fast hashing function to compare the arguments, it can still take a while if it is processing an absolutely enormous array (many Gigabytes!). In this case, it is often better to memoize a function that takes arguments of filenames, dates, model parameters or whatever else is used to create the large array – cutting out the loading of the large array and the hashing of that array on each call.

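As a sketch of how the caching side looks (Memory and its cache decorator are joblib’s real interface; the cache directory name and the toy function are invented for illustration):

```python
from joblib import Memory

memory = Memory("./joblib_cache", verbose=0)

@memory.cache
def expensive_square(n):
    # imagine this loading and processing an enormous array –
    # memoizing on the cheap argument n avoids hashing a huge
    # input on every call
    return n * n

first = expensive_square(12)   # computed, then cached to disk
second = expensive_square(12)  # loaded straight from the cache
```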

The persistence functionality is strongly linked to the memoization functions – as it is what is used to save the cached results to file. It basically performs the same function as the built-in pickle module (or the dill module – see below), but works really efficiently for objects that contain numpy arrays. The interface is exactly the same as the pickle interface (simple load and dump functions), so it’s really easy to switch over. One thing I didn’t realise before is that if you set compress=True then a) your output files will be smaller (obviously!) and b) the output will all be in one file (as opposed to the default, which produces a .pkl file along with many .npy files).

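A minimal sketch of the persistence interface (dump, load and compress are joblib’s real names; the filename and the array are invented):

```python
import numpy as np
from joblib import dump, load

data = {"weights": np.arange(1000, dtype=np.float64)}
# compress=True gives a smaller output and keeps everything in one file
dump(data, "model.joblib", compress=True)
restored = load("model.joblib")
```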

folium

I’ve barely scratched the surface of this library, but it’s been really helpful for doing quick visualisations of geographic data from within Python – and it even plays well with the Jupyter Notebook!


One of the pieces of example code from the documentation shows how easily it can be used:


import folium

map_1 = folium.Map(location=[45.372, -121.6972])
map_1.simple_marker([45.3288, -121.6625], popup='Mt. Hood Meadows')
map_1.simple_marker([45.3311, -121.7113], popup='Timberline Lodge')
map_1.create_map(path='output.html')

You can easily configure almost every aspect of the map above, including the background map used (any Leaflet tileset will work), point icons, colours, sizes and pretty much anything else. You can visualise GeoJSON data and do choropleth maps too (even linking to Pandas data frames!).


Again, I used this in my work with Flowminder, but have since used it in all sorts of other contexts too. Just taking the code above and putting the call to simple_marker in a loop makes it really easy to visualise a load of points.


The example above shows how to save a map to a specified HTML file – but to use it within the Jupyter Notebook just make sure that the map object (map_1 in the example above) is by itself on the final line in a cell, and the notebook will work its magic and display it inline…perfect!


tinydb

The first version of my ‘new’ module recipy (as presented at the Collaborations Workshop 2015) used MongoDB as the backend data store. However, this added significant complexity to the installation/set-up process, as you needed to install a MongoDB server first, get it running etc. I went looking for a pure-Python NoSQL database and came across TinyDB…which had a simple interface, and has handled everything I’ve thrown at it so far!


In the longer-term we are thinking of making the backend for recipy configurable – so that people can use MongoDB if they want some of the advantages that brings (being able to share the database easily across a network, better performance), but we’ll still keep TinyDB as the default as it just makes life so much easier!


dill

dill is a better pickle (geddit?). You’ve probably used the built-in pickle module to store various Python objects to disk but every so often you may have received an error like this:


---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-5-aa42e6ee18b1> in <module>()
----> 1 pickle.dumps(x)

PicklingError: Can't pickle <function <lambda> at 0x103671e18>: attribute lookup <lambda> on __main__ failed

That’s because the pickle module is relatively limited in what it can pickle: it can’t cope with nested functions, lambdas, slices, and more. You may not often want to pickle those objects directly – but it is fairly common to come across these inside other objects you want to pickle, thus causing the pickling to fail.


dill solves this by simply being able to pickle a lot more stuff – almost everything, in fact, apart from frames, generators and tracebacks. As with joblib above, the interface is exactly the same as pickle (load and dump), so it’s really easy to switch.

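A quick sketch of that switch (dumps and loads mirror the pickle calls exactly):

```python
import dill

# a lambda – exactly the sort of thing the built-in pickle chokes on
double = lambda x: x * 2

payload = dill.dumps(double)
restored = dill.loads(payload)
```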

Also, for those of you who like R’s ability to save the entire session in a .RData file, dill has dump_session and load_session functions too – which do exactly what you’d expect!


Bonus: Anaconda

This is a ‘bonus’ as it isn’t actually a Python module, but it is something that I started using for the first time in 2015 (yes, I was late to the party!) and that I couldn’t manage without now!


Anaconda can be a little confusing because it consists of a number of separate things – several of which are called ‘Anaconda’. So, these are:


  • A cross-platform Scientific Python distribution – along the same lines as the Enthought Python Distribution, WinPython and so on. Once you’ve downloaded and installed it you get a full scientific Python stack including all of the standard libraries (numpy, scipy, pandas, matplotlib, sklearn, skimage…and many more). It is available in four flavours overall: Python 2 or Python 3, each of which has the option of the full Anaconda distribution, or the reduced-size Miniconda distribution. This leads nicely on to…
  • The conda package management system. This is designed to work with Python packages, but can be used for installing any binaries. You may think this sounds very much like pip, but it’s far better because a) it installs binaries (no more compiling numpy from source!), b) it deals with dependencies better, and c) you can easily create multiple environments which can have different packages installed and run different Python versions.
  • The anaconda.org repository (previously called binstar) where you can create an account and upload binary conda packages to easily share with others. For example, I’ve got a couple of conda packages hosted there, which makes them really easy to install for anyone running conda.
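
As a sketch of the environment side of conda (create, install and the activate scripts are standard conda commands; the environment name is arbitrary):

```shell
# create an isolated environment with its own Python and packages
conda create --name analysis python=3.5 numpy pandas

# switch into it, add more packages, then leave
source activate analysis
conda install matplotlib
source deactivate
```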

So, there we go – my top five Python modules that were new to me in 2015. Hope you found it useful – and Merry Christmas and Happy New Year to you all!


Translated from: https://www.pybloggers.com/2015/12/my-top-5-new-python-modules-of-2015/
