Target URL: http://www.espncricinfo.com/champions-league-twenty20-2012/engine/match/574265.html
liz@nb-liz:~$ script pyquery.log2
Script started, file is pyquery.log2
liz@nb-liz:~$ ipython
Python 2.7.3 (default, Jan 2 2013, 16:53:07)
Type "copyright", "credits" or "license" for more information.
IPython 1.1.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: from pyquery import PyQuery as pq # import PyQuery under the alias pq
In [2]: d=pq(url="http://www.espncricinfo.com/champions-league-twenty20-2012/engine/match/574265.html") # fetch and parse the page
In [6]: d('#inningsBat1')
Out[6]: [<table#inningsBat1.inningsTable>]
In [8]: d('#inningsBat1').html()
... (output omitted)
In [11]: d('#inningsBat1').find('.playerName').html()
Out[11]: u'<a href="/champions-league-twenty20-2012/content/player/237095.html" target="" title="view the player profile for Murali Vijay" class="playerName">M Vijay</a> '
In [14]: d('#inningsBat1').eq(1).find('.playerName').html()
In [18]: t=d('#inningsBat1')
In [19]: t
Out[19]: [<table#inningsBat1.inningsTable>]
In [20]: t.children()
Out[20]: [<tr>, <tr.inningsHead>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr.inningsComms>, <tr.inningsRow>, <tr>, <tr.inningsRow>]
In [22]: t('tr.inningsRow')
Out[22]: [<tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>]
In [23]: trs=t('tr.inningsRow')
In [24]: trs
Out[24]: [<tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>]
In [25]: trs.eq(0).html()
Out[25]: u'<td class="inningsIcon" onclick="ToggleRowVisibility(\'inningsBat1\',3); "><img src="http://i.imgci.com/espncricinfo/col_ps.gif" width="7" height="7" name="inningsBat1.1" class="inningsIcon" alt="View dismissal" title="View dismissal" id="inningsBat1.1" /></td>\n <td width="192" class="playerName"><a href="/champions-league-twenty20-2012/content/player/237095.html" target="" title="view the player profile for Murali Vijay" class="playerName">M Vijay</a> </td>\n <td width="259" class="battingDismissal"> b Ojha </td>\n <td class="battingRuns">39</td>\n <td class="battingDetails">36</td>\n <td class="battingDetails">25</td>\n <td class="battingDetails">5</td>\n <td class="battingDetails">2</td>\n <td class="battingDetails">156.00</td>\n '
In [27]: trs.eq(0).find('.playerName').html()
Out[27]: u'<a href="/champions-league-twenty20-2012/content/player/237095.html" target="" title="view the player profile for Murali Vijay" class="playerName">M Vijay</a> '
In [28]: n1=trs.eq(0).find('.playerName')
In [29]: n1.find('a').html()
Out[29]: 'M Vijay'
In [34]: trs[0]
Out[34]: <Element tr at 0x968e44c>
In [40]: trs
Out[40]: [<tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>, <tr.inningsRow>]
In [42]: for i in trs:
   ....:     print d(i).find('.playerName').find('a').text()
   ....:
M Vijay
F du Plessis
SK Raina
MS Dhoni
S Badrinath
RA Jadeja
JA Morkel
WP Saha
R Ashwin
BW Hilfenhaus
None
None
In [44]: for i in trs:
   ....:     print d(i).find('.playerName').find('a').text()
   ....:     print d(i).find('.battingRuns').text()
   ....:     print d(i).find('.battingDismissal').text() # inside the for...in loop, wrap each element with the PyQuery instance (d) before calling find
   ....:     print '\n'
   ....:
M Vijay
39
b Ojha
F du Plessis
52
c Sharma b Malinga
SK Raina
8
c Johnson b Malinga
MS Dhoni
35
c Smith b Malinga
S Badrinath
2
c †Karthik b Smith
RA Jadeja
12
run out (†Karthik/Johnson)
JA Morkel
0
c Tendulkar b Malinga
WP Saha
5
c †Karthik b Malinga
R Ashwin
13
not out
BW Hilfenhaus
0
not out
None
7
(b 1, w 6)
None
173
(8 wickets; 20 overs; 100 mins)
In [45]: for i in trs:
   ....:     print 'Player Name:',d(i).find('.playerName').find('a').text()
   ....:     print 'Batting Runs:',d(i).find('.battingRuns').text()
   ....:     print 'Batting Dismissal:',d(i).find('.battingDismissal').text() # inside the for...in loop, wrap each element with the PyQuery instance (d) before calling find
   ....:     print '\n'
   ....:
Player Name: M Vijay
Batting Runs: 39
Batting Dismissal: b Ojha
Player Name: F du Plessis
Batting Runs: 52
Batting Dismissal: c Sharma b Malinga
Player Name: SK Raina
Batting Runs: 8
Batting Dismissal: c Johnson b Malinga
Player Name: MS Dhoni
Batting Runs: 35
Batting Dismissal: c Smith b Malinga
Player Name: S Badrinath
Batting Runs: 2
Batting Dismissal: c †Karthik b Smith
Player Name: RA Jadeja
Batting Runs: 12
Batting Dismissal: run out (†Karthik/Johnson)
Player Name: JA Morkel
Batting Runs: 0
Batting Dismissal: c Tendulkar b Malinga
Player Name: WP Saha
Batting Runs: 5
Batting Dismissal: c †Karthik b Malinga
Player Name: R Ashwin
Batting Runs: 13
Batting Dismissal: not out
Player Name: BW Hilfenhaus
Batting Runs: 0
Batting Dismissal: not out
Player Name: None
Batting Runs: 7
Batting Dismissal: (b 1, w 6)
Player Name: None
Batting Runs: 173
Batting Dismissal: (8 wickets; 20 overs; 100 mins)
In [46]:
Do you really want to exit ([y]/n)? y
liz@nb-liz:~$ exit
exit
Script done, file is pyquery.log2
liz@nb-liz:~$