Preface

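My earlier slides "python高级编程" (Advanced Python Programming) already covered some IPython usage. Today let's keep exploring IPython, moving from the basics to more advanced topics.
This article assumes that you already know IPython and have been using it for a while.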

%run

This is a magic command that runs the code in your script and loads the resulting variables into the IPython namespace:

$cat t.py
# coding=utf-8
l = range(5)
$ipython
In [1]: %run t.py # the `%` is optional
In [2]: l # l was a variable in t.py, but it is available here directly
Out[2]: [0, 1, 2, 3, 4]
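To make the idea concrete, here is a rough stdlib analog of what %run does with a script's variables. It is a Python 3 sketch, not IPython's implementation: `run_script` is a hypothetical helper built on `runpy.run_path`, which executes a file and returns its global namespace.

```python
import os
import runpy
import tempfile


def run_script(path, namespace):
    """Run a script and copy the names it defines into `namespace`.

    A rough stdlib analog of %run, not IPython's actual implementation.
    """
    # Execute as __main__, which is also what %run does by default
    script_globals = runpy.run_path(path, run_name="__main__")
    # Skip dunder entries such as __name__ and __file__ added by runpy
    namespace.update({k: v for k, v in script_globals.items()
                      if not k.startswith("__")})
    return namespace


# Recreate the t.py example in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "t.py")
    with open(script, "w") as f:
        f.write("l = list(range(5))\n")  # Python 2's range() already returns a list
    ns = run_script(script, {})

print(ns["l"])  # [0, 1, 2, 3, 4]
```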
alias
      
      
In [ 3]: %alias largest ls -1sSh | grep %s
In [ 4]: largest to
total 42M
20K tokenize.py
16K tokenize.pyc
8.0K story.html
4.0K autopep8
4.0K autopep8.bak
4.0K story_layout.html
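The `%s` in the alias definition is a placeholder that gets filled with the argument you pass (`to` above). A minimal sketch of that substitution, assuming exactly one positional argument per `%s`; `expand_alias` is a hypothetical helper, not IPython's code:

```python
def expand_alias(template, args):
    """Fill each %s in an alias template with the next argument, in order."""
    n_slots = template.count("%s")
    if len(args) != n_slots:
        raise ValueError("alias expects %d argument(s), got %d" % (n_slots, len(args)))
    return template % tuple(args)


cmd = expand_alias("ls -1sSh | grep %s", ["to"])
print(cmd)  # ls -1sSh | grep to
```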

PS: aliases have to be stored, otherwise they are gone once you restart IPython:

In [5]: %store largest
Alias stored: largest (ls -1sSh | grep %s)

The next time you start IPython, restore it with %store -r
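%store persists the alias to disk so that a later session can load it again. IPython uses its own on-disk database for this; the sketch below only shows the same idea with the stdlib `shelve` module, and `store_alias` / `restore_aliases` are hypothetical names:

```python
import os
import shelve
import tempfile


def store_alias(db_path, name, command):
    # shelve persists a dict-like object to disk, so it survives restarts
    with shelve.open(db_path) as db:
        db["alias/" + name] = command


def restore_aliases(db_path):
    # Read back every entry stored under the "alias/" prefix
    with shelve.open(db_path) as db:
        return {k[len("alias/"):]: v for k, v in db.items() if k.startswith("alias/")}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store")
    store_alias(path, "largest", "ls -1sSh | grep %s")
    # ...a "new session" later reads the same file back
    aliases = restore_aliases(path)

print(aliases)  # {'largest': 'ls -1sSh | grep %s'}
```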

bookmark - aliases for directories
      
      
In [ 2]: %pwd
Out[ 2]: u'/home/vagrant'
In [ 3]: %bookmark dongxi ~/shire/dongxi
In [ 4]: %cd dongxi
/home/vagrant/shire/dongxi_code
In [ 5]: %pwd
Out[ 5]: u'/home/vagrant/shire/dongxi_code'
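Conceptually, a bookmark is just a name-to-path table that %cd consults. Below is a minimal sketch with hypothetical `bookmark` / `cd` helpers. Following my understanding of the %cd docstring, a real directory with the same name takes priority, and the bookmark is used only when no such directory exists:

```python
import os
import tempfile

bookmarks = {}


def bookmark(name, path):
    """Remember an absolute, user-expanded path under a short name."""
    bookmarks[name] = os.path.abspath(os.path.expanduser(path))


def cd(target):
    """Change directory; fall back to a bookmark only if `target` is not a directory."""
    if not os.path.isdir(target) and target in bookmarks:
        target = bookmarks[target]
    os.chdir(target)
    return os.getcwd()


original = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "dongxi")
    os.mkdir(project)
    bookmark("dongxi", project)
    here = cd("dongxi")                   # resolved through the bookmark
    expected = os.path.realpath(project)  # getcwd() reports the symlink-resolved path
    os.chdir(original)                    # step out before the directory is deleted
```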
ipcluster - parallel computing

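IPython actually comes with convenient parallel-computing features. Let's first see what parallel computing with IPython looks like, using a word count over a book as the example.

1. Download the test text (a book from Project Gutenberg):
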
$wget http://www.gutenberg.org/files/27287/27287-0.txt

The first version is the straightforward one we are all used to:

In [1]: import re
In [2]: import io
In [3]: from collections import defaultdict
In [4]: non_word = re.compile(r'[\W\d]+', re.UNICODE)
In [5]: common_words = {
...:     'the', 'of', 'and', 'in', 'to', 'a', 'is', 'it', 'that', 'which', 'as', 'on', 'by',
...:     'be', 'this', 'with', 'are', 'from', 'will', 'at', 'you', 'not', 'for', 'no', 'have',
...:     'i', 'or', 'if', 'his', 'its', 'they', 'but', 'their', 'one', 'all', 'he', 'when',
...:     'than', 'so', 'these', 'them', 'may', 'see', 'other', 'was', 'has', 'an', 'there',
...:     'more', 'we', 'footnote', 'who', 'had', 'been', 'she', 'do', 'what',
...:     'her', 'him', 'my', 'me', 'would', 'could', 'said', 'am', 'were', 'very',
...:     'your', 'did',
...: }
In [6]: def yield_words(filename):
...:     import io  # imported inside the function so it also works on the engines later
...:     with io.open(filename, encoding='latin-1') as f:
...:         for line in f:
...:             for word in line.split():
...:                 word = non_word.sub('', word.lower())
...:                 if word and word not in common_words:
...:                     yield word
...:
In [7]: def word_count(filename):
...:     from collections import defaultdict  # same reason: the engines need it too
...:     counts = defaultdict(int)
...:     for word in yield_words(filename):
...:         counts[word] += 1
...:     return counts
...:
In [8]: filename = '27287-0.txt'  # the file downloaded above
In [9]: %time counts = word_count(filename)
CPU times: user 88.5 ms, sys: 2.48 ms, total: 91 ms
Wall time: 89.3 ms
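For comparison, the same serial count can be written more compactly with `collections.Counter`. This is a Python 3 sketch that folds `yield_words` and `word_count` into one function; the `stop_words` parameter stands in for `common_words`, and the sample text is made up:

```python
import io
import os
import re
import tempfile
from collections import Counter

non_word = re.compile(r"[\W\d]+", re.UNICODE)


def word_count(filename, stop_words=frozenset()):
    """Count lowercased words, stripped of digits and punctuation, minus stop words."""
    counts = Counter()
    with io.open(filename, encoding="latin-1") as f:
        for line in f:
            for word in line.split():
                word = non_word.sub("", word.lower())
                if word and word not in stop_words:
                    counts[word] += 1
    return counts


# Tiny demo file instead of the downloaded book
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with io.open(path, "w", encoding="latin-1") as f:
        f.write(u"The cat, the dog. 42 cats!\n")
    counts = word_count(path, stop_words={"the"})

print(counts)
```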

Now let's run it with IPython. First start the engines:

ipcluster start -n 2 # well, my Mac is dual-core

First, a quick tour of how IPython's parallel computing is used:

In [1]: from IPython.parallel import Client # the %px* magics are only available after this import
In [2]: rc = Client()
In [3]: rc.ids # because I started 2 engines
Out[3]: [0, 1]
In [4]: %autopx # without autopx, every statement needs `%px xxx`
%autopx enabled
In [5]: import os # without autopx this would have to be `%px import os`
In [6]: print os.getpid() # the PIDs of the 2 engines
[stdout: 0] 62638
[stdout: 1] 62636
In [7]: %pxconfig --targets 1 # this magic is not available under autopx
[stderr: 0] ERROR: Line magic function `%pxconfig` not found.
[stderr: 1] ERROR: Line magic function `%pxconfig` not found.
In [8]: %autopx # running it again turns autopx off
%autopx disabled
In [10]: %pxconfig --targets 1 # set the target, so the code below runs only on the second engine (id 1)
In [11]: %%px --noblock # run a block of code without blocking
....: import time
....: time.sleep(1)
....: os.getpid()
....:
Out[11]: <AsyncResult: execute>
In [12]: %pxresult # see: only the second engine's PID comes back
Out[1:21]: 62636
In [13]: v = rc[:] # a view on all engines; IPython gives fine-grained control over which engine runs what
In [14]: with v.sync_imports(): # import the time module on every engine
....:     import time
....:
importing time on engine(s)
In [15]: def f(x):
....:     time.sleep(1)
....:     return x * x
....:
In [16]: v.map_sync(f, range(10)) # synchronous execution
Out[16]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
In [17]: r = v.map(f, range(10)) # asynchronous execution
In [18]: r.ready(), r.elapsed # much like Celery's AsyncResult
Out[18]: (True, 5.87735)
In [19]: r.get() # fetch the results
Out[19]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
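The `map_sync` versus `map` + `get()` pattern is not specific to IPython. As a point of comparison only, here is the same synchronous versus asynchronous pattern with the stdlib `concurrent.futures` (threads instead of engines, and a shorter sleep); it is not the IPython.parallel API:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def f(x):
    time.sleep(0.05)  # stand-in for real work
    return x * x


with ThreadPoolExecutor(max_workers=2) as pool:
    # Synchronous, like v.map_sync: blocks until every result is back
    squares = list(pool.map(f, range(10)))

    # Asynchronous, like v.map: submit() returns Future handles immediately
    futures = [pool.submit(f, x) for x in range(10)]
    finished = sum(fut.done() for fut in futures)  # like r.ready(); may still be < 10 here
    results = [fut.result() for fut in futures]    # like r.get(); waits for each result

print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```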

Now for the main part:

In [20]: def split_text(filename):
....:     import os  # so this works even if os was only imported on the engines
....:     with open(filename) as f:
....:         lines = f.read().splitlines()
....:     nlines = len(lines)
....:     n = 10
....:     block = nlines // n
....:     for i in range(n):
....:         # the last chunk also takes the leftover lines, so nothing is dropped
....:         end = (i + 1) * block if i < n - 1 else nlines
....:         with open('count_file%i.txt' % i, 'w') as f:
....:             f.write('\n'.join(lines[i * block:end]))
....:     cwd = os.path.abspath(os.getcwd())
....:     fnames = [os.path.join(cwd, 'count_file%i.txt' % i) for i in range(n)]  # build the exact list rather than using glob
....:     return fnames
....:
In [21]: from IPython import parallel
In [22]: rc = parallel.Client()
In [23]: view = rc.load_balanced_view()
In [24]: v = rc[:]
In [25]: v.push(dict(
....:     non_word=non_word,
....:     yield_words=yield_words,
....:     common_words=common_words,
....:     defaultdict=defaultdict,  # word_count uses defaultdict on the engines as well
....: ))
Out[25]: <AsyncResult: _push>
In [26]: fnames = split_text(filename)
In [27]: def count_parallel():
.....:     pcounts = view.map(word_count, fnames)
.....:     counts = defaultdict(int)
.....:     for pcount in pcounts.get():
.....:         for k, c in pcount.iteritems():
.....:             counts[k] += c
.....:     return counts, pcounts
.....:
In [28]: %time counts, pcounts = count_parallel() # this includes the time of my aggregation step
CPU times: user 47.6 ms, sys: 6.67 ms, total: 54.3 ms # lower client CPU time, because the counting ran on the engines
Wall time: 106 ms # but the wall time is higher than the serial 89.3 ms: for a file this small, the communication overhead probably outweighs the gain
In [29]: pcounts.elapsed, pcounts.serial_time, pcounts.wall_time
Out[29]: (0.104384, 0.13980499999999998, 0.104384) # serial_time (about 0.14 s) is the engines' summed compute time
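Stripped of the IPython plumbing, `split_text` + `count_parallel` is a split / map / merge pattern. The sketch below reproduces that structure with the stdlib; `split_lines`, `count_chunk` and the thread pool are my own stand-ins. Threads are used only so the example runs anywhere: for CPU-bound counting in CPython, the GIL means a real speedup needs separate processes, which is what the IPython engines are.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_lines(lines, n):
    """Split `lines` into n contiguous chunks; the last chunk keeps the remainder."""
    block = len(lines) // n
    chunks = []
    for i in range(n):
        end = (i + 1) * block if i < n - 1 else len(lines)
        chunks.append(lines[i * block:end])
    return chunks


def count_chunk(chunk):
    """Map step: count the words of one chunk independently."""
    return Counter(word for line in chunk for word in line.lower().split())


def count_parallel(lines, n=4):
    """Split, count each chunk in a worker, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        for partial in pool.map(count_chunk, split_lines(lines, n)):
            total.update(partial)  # merge step, like the defaultdict loop above
    return total


text = ["the quick brown fox", "jumps over", "the lazy dog", "the end"]
print(count_parallel(text, n=3).most_common(1))  # [('the', 3)]
```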

To see all magics, use %lsmagic, which lists every available magic:

%lsmagic

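Magics come in 2 kinds:

  • line magic: written as %name and applied to the rest of the line; mostly utility commands
  • cell magic: written as %%name and applied to a whole cell; mainly used to render content in IPython notebook pages and to run code in other languages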
idb - python db.py shell extension

idb is a magic I wrote recently. It mainly exposes db.py to IPython. Let's go straight to the code (I've only excerpted a representative part):

import os.path
from functools import wraps
from operator import attrgetter
from urlparse import urlparse
from db import DB # the interface provided by db.py
from IPython.core.magic import Magics, magics_class, line_magic # the three building blocks of a magic plugin


def get_or_none(attr):
    return attr if attr else None


def check_db(func):
    @wraps(func)
    def deco(*args):
        if args[0]._db is None: # every magic needs the db instantiated first, so check it in a decorator
            print '[ERROR]Please make connection: `con = %db_connect xx` or `%use_credentials xx` first!' # noqa
            return
        return func(*args)
    return deco


@magics_class # every magics class needs the magics_class decorator
class SQLDB(Magics): # must inherit from Magics
    _db = None # one instance per IPython session

    @line_magic('db_connect') # line_magic marks this as a line magic (the other 2 kinds come later); the magic is named db_connect
    # Note that the function name doesn't matter: you type %db_connect, not %conn
    def conn(self, parameter_s): # each such method receives one argument: whatever you typed after the magic name
        """Connect to database in ipython shell.
        Examples::
            %db_connect
            %db_connect postgresql://user:pass@localhost:port/database
        """
        uri = urlparse(parameter_s) # the rest is just logic for parsing parameter_s
        if not uri.scheme:
            params = {
                'dbtype': 'sqlite',
                'filename': os.path.join(os.path.expanduser('~'), 'db.sqlite')
            }
        elif uri.scheme == 'sqlite':
            params = {
                'dbtype': 'sqlite',
                'filename': uri.path
            }
        else:
            params = {
                'username': get_or_none(uri.username),
                'password': get_or_none(uri.password),
                'hostname': get_or_none(uri.hostname),
                'port': get_or_none(uri.port),
                'dbname': get_or_none(uri.path[1:])
            }
        self._db = DB(**params) # assign _db here
        return self._db # the return value is picked up and displayed by IPython

    @line_magic('db') # a new magic called %db; watch out for name clashes
    def db(self, parameter_s):
        return self._db

    @line_magic('table')
    @check_db
    def table(self, parameter_s):
        p = parameter_s.split() # several arguments may be passed, but IPython hands us one string, so split it on whitespace
        l = len(p)
        if l == 1:
            if not p[0]:
                return self._db.tables
            else:
                return attrgetter(p[0])(self._db.tables)
        else:
            data = self._db.tables
            for c in p:
                if c in ['head', 'sample', 'unique', 'count', 'all', 'query']:
                    data = attrgetter(c)(data)()
                else:
                    data = attrgetter(c)(data)
            return data


def load_ipython_extension(ipython): # registration hook; not needed if you register the class directly inside IPython
    ipython.register_magics(SQLDB)
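To check what each branch of `conn` hands to `DB(**params)` without a database, the parsing can be pulled out into a plain function. This is a Python 3 sketch: `urllib.parse` replaces Python 2's `urlparse` module, `x or None` plays the role of `get_or_none`, and `parse_db_uri` is a hypothetical name.

```python
import os.path
from urllib.parse import urlparse


def parse_db_uri(uri_string):
    """Turn a db URI into keyword arguments, mirroring the branches of conn()."""
    uri = urlparse(uri_string)
    if not uri.scheme:
        # No URI given: fall back to ~/db.sqlite
        return {"dbtype": "sqlite",
                "filename": os.path.join(os.path.expanduser("~"), "db.sqlite")}
    if uri.scheme == "sqlite":
        return {"dbtype": "sqlite", "filename": uri.path}
    return {
        "username": uri.username or None,
        "password": uri.password or None,
        "hostname": uri.hostname or None,
        "port": uri.port or None,          # already converted to int by urlparse
        "dbname": uri.path[1:] or None,    # drop the leading "/"
    }


print(parse_db_uri("postgresql://user:pass@localhost:5432/mydb"))
```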

PS:

  1. While debugging, you can reload the magic with %reload_ext idb
  2. After %install_ext, the extension goes into the extensions folder of your IPython directory by default; mine is ~/.ipython/extensions

So IPython magics aren't that hard after all, are they?

Now let's see what IPython provides:
  1. Types of magic decorators:
  • line_magic # we've just seen it: %xx, where xx is the magic's name
  • cell_magic # used as %%xx
  • line_cell_magic # can be used either as %xx or as %%xx
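To illustrate how the three decorator types differ, here is a toy registry and dispatcher. It is only a model: IPython's real decorators live in `IPython.core.magic`, work on methods of a `Magics` subclass, and its input parsing is more involved. All names below are hypothetical.

```python
LINE_MAGICS = {}
CELL_MAGICS = {}


def line_magic(name):
    def register(func):
        LINE_MAGICS[name] = func
        return func
    return register


def cell_magic(name):
    def register(func):
        CELL_MAGICS[name] = func
        return func
    return register


def line_cell_magic(name):
    # Registered in both tables, so it answers to %name and %%name
    def register(func):
        LINE_MAGICS[name] = func
        CELL_MAGICS[name] = func
        return func
    return register


def run(source):
    """Dispatch '%name rest' to a line magic, '%%name args\\nbody' to a cell magic."""
    if source.startswith("%%"):
        first, _, body = source[2:].partition("\n")
        name, _, line = first.partition(" ")
        return CELL_MAGICS[name](line, body)
    if source.startswith("%"):
        name, _, line = source[1:].partition(" ")
        return LINE_MAGICS[name](line)
    raise ValueError("not a magic: %r" % source)


@line_magic("upper")
def upper(line):
    return line.upper()


@line_cell_magic("count")
def count(line, cell=None):
    # Line mode counts words on the line; cell mode counts lines in the body
    return len(line.split()) if cell is None else len(cell.splitlines())


print(run("%upper hi"), run("%count a b c"), run("%%count\nx\ny"))  # HI 3 2
```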

Let's start with cell_magic. Say I want to run some Ruby; normally I would write:

In [1]: !ruby -e 'p "hello"'
"hello"
In [2]: %%ruby # you can also do this
...: p "hello"
...:
"hello"

And a notebook example:

In [3]: %%javascript
...: require.config({
...: paths: {
...: chartjs: '//code.highcharts.com/highcharts'
...: }
...: });
...:
<IPython.core.display.Javascript object>
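Conceptually, a cell magic like %%ruby hands the cell body to an external interpreter. A minimal sketch of that idea, feeding the cell to the interpreter's stdin with `subprocess`; `run_cell_in` is a hypothetical helper, and the demo uses the current Python because Ruby may not be installed:

```python
import subprocess
import sys


def run_cell_in(interpreter, cell):
    """Pipe a cell's source to an external interpreter's stdin and return its stdout."""
    proc = subprocess.run([interpreter], input=cell, capture_output=True,
                          text=True, check=True)
    return proc.stdout


# With Ruby installed, run_cell_in("ruby", 'p "hello"') would be the %%ruby case
print(run_cell_in(sys.executable, 'print("hello")'), end="")  # hello
```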

Next, line_cell_magic:

In [4]: %time 2**128
CPU times: user 2 µs, sys: 1 µs, total: 3 µs
Wall time: 5.01 µs
Out[4]: 340282366920938463463374607431768211456L
In [5]: %%time
...: 2**128
...:
CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 9.06 µs
Out[5]: 340282366920938463463374607431768211456L

PS: a line_cell_magic method takes 2 parameters:

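@line_cell_magic
def xx(self, line='', cell=None):
    # line: the text after %xx (or after %%xx on the first line of a cell)
    # cell: the cell body when called as %%xx; None when called as %xx
    ...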
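Following that `(line='', cell=None)` signature, one function can serve both %time-style and %%time-style calls by checking whether `cell` is None. A simplified sketch; `time_magic` and its `namespace` parameter are hypothetical, and unlike IPython it does not display the cell's last value:

```python
import time


def time_magic(line="", cell=None, namespace=None):
    """Evaluate `line` (line mode) or execute `cell` (cell mode) and report the time."""
    ns = {} if namespace is None else namespace
    start = time.perf_counter()
    if cell is None:
        result = eval(line, ns)   # %time style: a single expression
    else:
        exec(cell, ns)            # %%time style: the whole cell body
        result = None             # this sketch does not echo the last value
    elapsed = time.perf_counter() - start
    print("Wall time: %.3f ms" % (elapsed * 1000))
    return result


value = time_magic("2**128")
scope = {}
time_magic(cell="x = 2**128\ny = x + 1", namespace=scope)
```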
Magics with arguments (I'll use magics that ship with the IPython source to illustrate):

There are 2 styles:

  • getopt style: self.parse_options
  • argparse style: magic_arguments
self.parse_options
            
            
@line_cell_magic
def prun(self, parameter_s='', cell=None):
    opts, arg_str = self.parse_options(parameter_s, 'D:l:rs:T:q',
                                       list_all=True, posix=False)
    ...

For how getopt works, see http://pymotw.com/2/getopt/index.html#module-getopt

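Briefly, 'D:l:rs:T:q' means you can use the options -D, -l, -r, -s, -T and -q. A colon after a letter means that option takes an argument; split up, the string reads:
D:, l:, r, s:, T:, q. So -r and -q are plain flags, while all the others expect a value, e.g. %prun -D <file> statement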
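The option string behaves the same way with the stdlib `getopt` module, which makes it easy to try out. `parse_options` itself works on the raw magic string and returns its own structure, so this only demonstrates the option-string semantics:

```python
import getopt

# Same option string as %prun: -D, -l, -s, -T take a value; -r and -q are flags
opts, args = getopt.getopt(["-D", "out.prof", "-r", "-q", "main()"], "D:l:rs:T:q")
print(opts)  # [('-D', 'out.prof'), ('-r', ''), ('-q', '')]
print(args)  # ['main()']

try:
    getopt.getopt(["-D"], "D:l:rs:T:q")  # -D without its value
except getopt.GetoptError as err:
    print(err)  # option -D requires argument
```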

magic_arguments
            
            
@magic_arguments.magic_arguments() # goes at the very top
@magic_arguments.argument('--breakpoint', '-b', metavar='FILE:LINE',
    help="""
    Set break point at LINE in FILE.
    """
) # you can stack several of these argument decorators
@magic_arguments.argument('statement', nargs='*',
    help="""
    Code to run in debugger.
    You can omit this in cell magic mode.
    """
)
@line_cell_magic
def debug(self, line='', cell=None):
    args = magic_arguments.parse_argstring(self.debug, line) # the first argument must be this very method, here self.debug
    ...
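Under the hood, `magic_arguments` builds an `argparse` parser from the decorators. Here is a plain `argparse` equivalent of the %debug arguments above; using `shlex.split` to tokenize the line is my assumption about how the magic string gets split, and this parser is not what IPython constructs internally:

```python
import argparse
import shlex

parser = argparse.ArgumentParser(prog="debug", add_help=False)
parser.add_argument("--breakpoint", "-b", metavar="FILE:LINE",
                    help="Set break point at LINE in FILE.")
parser.add_argument("statement", nargs="*",
                    help="Code to run in debugger. You can omit this in cell magic mode.")


def parse_argstring(line):
    # Split like a shell so quoted arguments stay together, then parse
    return parser.parse_args(shlex.split(line))


args = parse_argstring("-b script.py:12 run()")
print(args.breakpoint)  # script.py:12
print(args.statement)   # ['run()']
```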

There is also a whole set of magics for parallel computing: IPython/parallel/client/magics.py