Commonly Used Jupyter Notebook Magic Functions: %time %timeit %who %pip %cd


Alex Tech Bolg


Article link: https://blog.csdn.net/qq_41103204/article/details/116517827


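This article shows how to use IPython's magic commands in Jupyter Notebook, including %who and %whos for inspecting variables, %time and %timeit for timing code, and %prun for profiling where execution time goes. Examples illustrate the two kinds of magic commands and how to diagnose performance bottlenecks.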


The magic command system

IPython lets us run command-line style operations inside a Jupyter notebook, for example:

In [1]: %cd
/home/fperez

In [2]: cd mydir
/home/fperez/mydir

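Magic commands are prefixed with %. (With automagic enabled they also work without the prefix, but a variable with the same name will shadow the command.)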

In [1]: cd ipython     # %cd is called by automagic
/home/fperez/ipython

In [2]: cd=1           # now cd is just a variable

In [3]: cd ..          # and doesn't work as a function anymore
File "<ipython-input-3-9fedb3aff56c>", line 1
  cd ..
      ^
SyntaxError: invalid syntax


In [4]: %cd ..         # but %cd always works
/home/fperez

In [5]: del cd     # if you remove the cd variable, automagic works again

In [6]: cd ipython

/home/fperez/ipython

You can also install Python packages from inside the notebook:

%pip install [pkgs]
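As a rough illustration of what %pip does: IPython runs pip with the same interpreter as the notebook kernel, so the packages land in the kernel's environment rather than in some other Python installation. The sketch below only builds that command line and does not execute it. The helper name pip_install_command and the package name are hypothetical and chosen for illustration.

```python
import sys


def pip_install_command(*packages):
    """Build a pip command that targets the currently running interpreter.

    Hypothetical helper for illustration; it only returns the argument list.
    """
    if not packages:
        raise ValueError("at least one package name is required")
    # sys.executable is the interpreter running this code (the kernel, in a notebook)
    return [sys.executable, "-m", "pip", "install", *packages]


cmd = pip_install_command("requests")
print(cmd)
# To actually install, one could pass this list to subprocess.run(cmd, check=True).
```

Pinning the command to sys.executable is the point of the design: a bare "pip install" in a shell may belong to a different environment than the one the notebook is using.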

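Inspecting variables: who/whos

In Jupyter, magic functions can list the variables you have defined and show detailed information about them, as in the following example.

First, define the variables alpha and beta, a class, and an instance of that class.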

class testClass:
    """
    This class is for testing ipython magic function
    """
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def method1(self):
        """
        This is method1
        """
        print("This is method1!")

my_test = testClass(1, 100)

alpha = 123
beta = 'test'

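%who lists all variables, functions, and other names you have defined.
%who [Type] lists only the names of the given type.
%whos shows detailed information about each variable or function.

In the %whos output, the Type of my_test is testClass, the class we defined above.

%who_ls returns the names as a list.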
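To make the filtering behaviour concrete, here is a minimal plain-Python sketch of the idea behind %who_ls and %who [Type]. It is not IPython's implementation. The function who_ls and the dictionary demo_namespace are hypothetical names, and the sketch assumes that names starting with an underscore are hidden and that filtering compares the type's name.

```python
def who_ls(namespace, *type_names):
    """Return the sorted visible names in a namespace, optionally filtered by type name."""
    names = [
        name
        for name, value in namespace.items()
        if not name.startswith("_")  # hide private/internal names
        and (not type_names or type(value).__name__ in type_names)
    ]
    return sorted(names)


class testClass:
    """Same shape as the class defined in the article."""

    def __init__(self, a, b):
        self.a = a
        self.b = b


# A stand-in for the notebook's user namespace
demo_namespace = {
    "alpha": 123,
    "beta": "test",
    "my_test": testClass(1, 100),
    "_hidden": 0,
}

all_names = who_ls(demo_namespace)                # like %who_ls
int_names = who_ls(demo_namespace, "int")         # like %who int
class_names = who_ls(demo_namespace, "testClass") # like %who testClass
print(all_names, int_names, class_names)
```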

Two kinds of magic commands

Jupyter has two kinds of magic commands:

Line Magics (% prefix)
Cell Magics (%% prefix)

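Line magics apply only to the line they appear on. Cell magics apply to the whole cell they appear in. Note that a cell magic must be on the first line of the cell, even before any comments.

Keeping this distinction in mind makes the timing section below easier to follow.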

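Timing commands: time/timeit

%time reports how long the code on the current line takes to run once.
%%time reports how long all the code in the current cell takes to run.
%timeit runs a statement repeatedly and reports the mean and standard deviation of the time per loop. By default the number of loops per run is chosen automatically and the measurement is repeated 7 times. (Older IPython versions reported the best of 3 runs instead.)

The full syntax is: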

%timeit [-n<N> -r<R> [-t|-c] -q -p<P> -o] statement

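- -n<N>: execute the given statement <N> times in a loop. If N is not given, it is chosen automatically so that the measurement is sufficiently accurate.
- -r<R>: repeat the measurement <R> times, each repeat consisting of <N> loops. Default: 7.
- -t: use time.time as the timer, which measures wall time.
- -c: use a CPU-time timer. On Unix this measures CPU user time (the IPython documentation notes that on Windows this option measures wall time).
- -p<P>: number of digits of precision in the printed result. Default: 3.
- -q: quiet, do not print the result.
- -o: return a TimeitResult object that can be stored in a variable and inspected later.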


%%timeit applies the same measurement to an entire cell.
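The -n and -r options map closely onto the number and repeat arguments of the standard library's timeit.repeat. The following is a minimal sketch of that loop-and-repeat scheme in plain Python, not IPython's own code. The helper name time_statement and the fixed loop count of 1000 are assumptions for illustration (%timeit would pick the loop count automatically).

```python
import statistics
import timeit


def time_statement(stmt, number=1000, repeat=7, setup="pass"):
    """Time stmt `repeat` times, each run executing it `number` times in a loop."""
    # timeit.repeat returns one total duration (in seconds) per run
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    per_loop = [total / number for total in totals]
    return {
        "mean": statistics.mean(per_loop),
        "stdev": statistics.pstdev(per_loop),
        "best": min(per_loop),
        "runs": repeat,
        "loops": number,
    }


result = time_statement("sum(range(100))", number=1000, repeat=7)
print(
    f"{result['mean'] * 1e6:.3g} us +/- {result['stdev'] * 1e6:.3g} us per loop "
    f"(mean +/- std. dev. of {result['runs']} runs, {result['loops']} loops each)"
)
```

Dividing each run's total by the loop count gives a per-loop time, which is why a larger -n smooths out timer resolution while a larger -r gives a better picture of run-to-run variation.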

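Supplement: wall clock time vs. CPU time

Basic concepts

Wall clock time measures how much real time has elapsed, literally the time shown by "the clock on the wall". It is the time from when a process starts until it finishes, including time slices used by other processes and time this process spends blocked (for example, waiting for an I/O operation to complete).

CPU time is the total time the CPU spent working for the process, and consists of user time and sys time.

user time is the CPU time the process spends executing user-mode code (outside the kernel). It counts only CPU time actually used by this process, not time slices used by other processes or time this process spends blocked.
sys time is the CPU time spent in kernel mode on behalf of the process, that is, the CPU time the kernel uses to execute its system calls.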

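Relationship

With a single CPU, CPU time can never exceed real time. With multiple processors, however, the total CPU time can exceed real time, because CPU time is accumulated across all of them.

Single-threaded process:

CPU / real ≈ 1: most of the time is spent using the CPU, so a faster CPU would make the program run faster. Parallelizing the work across multiple cores may also help.
CPU / real < 1: the smaller this ratio, the more of the time is spent waiting (on the network, the disk, other processes releasing the CPU, sleep, and so on). For example, if CPU / real is 0.75, then 25% of the time was spent waiting. Multi-core parallelism offers little benefit in this case.

Multi-threaded process (if the program is multi-threaded, the machine has N CPUs, and there are at least N threads, CPU / real can be as high as N):

CPU / real < 1: the process spends most of its time waiting.
CPU / real ≈ N: the process is saturating all the CPUs.
Other values: the process is doing some mix of waiting and computing, and it is hard to tell where the bottleneck is from these two times alone.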
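The CPU / real ratio described above can be observed directly with the standard library: time.perf_counter measures wall clock time, and time.process_time measures the CPU time (user plus sys) of the current process. The sketch below is a minimal single-threaded illustration. The function names measure, wait_bound, and cpu_bound are hypothetical, and the exact ratios will vary with the machine and its load.

```python
import time


def measure(func):
    """Run func once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # wall clock time
    cpu_start = time.process_time()   # user + sys CPU time of this process
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def wait_bound():
    # Sleeping consumes wall time but almost no CPU time
    time.sleep(0.2)


def cpu_bound():
    # A pure computation keeps the CPU busy the whole time
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


sleep_wall, sleep_cpu = measure(wait_bound)
busy_wall, busy_cpu = measure(cpu_bound)

sleep_ratio = sleep_cpu / sleep_wall
busy_ratio = busy_cpu / busy_wall
print(f"waiting:   CPU / real = {sleep_ratio:.2f}")
print(f"computing: CPU / real = {busy_ratio:.2f}")
```

On an otherwise idle machine the waiting case should give a ratio close to 0 and the computing case a ratio close to 1, matching the single-threaded cases listed above.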

Profiling each part of the run time: prun

Consider the following example:

import pandas as pd
import numpy as np
 
df = pd.DataFrame({'a': np.random.randn(1000),
                   'b': np.random.randn(1000),
                   'N': np.random.randint(100, 1000, (1000)),
                   'x': 'x'})
 
def f(x):
    return x * (x-1)
 
def integrate_f(a, b, N):
    s = 0     
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

%prun -l 4 df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)
 
Out:
         661861 function calls (656852 primitive calls) in 0.267 seconds
    
    Ordered by: internal time
    List reduced from 141 to 4 due to restriction <4>
 
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1000    0.136    0.000    0.196    0.000 <ipython-input-10-d33f40f5bef5>:1(integrate_f)
  543296    0.061    0.000    0.061    0.000 <ipython-input-9-30a8062c568e>:1(f)
    3000    0.008    0.000    0.046    0.000 base.py:2454(get_value)
    3000    0.005    0.000    0.053    0.000 series.py:598(__getitem__)

As the output shows, most of the run time is spent in the integrate_f and f functions.
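Outside a notebook, a similar report can be produced with the standard library's cProfile and pstats modules. The sketch below reuses f and integrate_f from the example but replaces the pandas DataFrame with a plain loop so that it is self-contained. The driver function run_all and its inputs are assumptions made for illustration.

```python
import cProfile
import io
import pstats


def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def run_all():
    # Hypothetical workload standing in for df.apply(...) in the article
    return [integrate_f(0.0, 1.0 + k, 500) for k in range(100)]


profiler = cProfile.Profile()
profiler.enable()
run_all()
profiler.disable()

# Sort by internal time and keep the top 4 entries, similar to `%prun -l 4`
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(4)
report = buffer.getvalue()
print(report)
```

In the report, tottime is the time spent inside a function itself, while cumtime also includes the time spent in the functions it calls. That is why integrate_f shows a cumtime larger than its tottime: the difference is mostly the time spent in f.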

References:
https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit
https://pythonspeed.com/articles/blocking-cpu-or-io/
https://blog.csdn.net/filyouzicha/article/details/52447887
https://blog.csdn.net/fpy192/article/details/100001765
