neo4j :rel_Neo4j:BBC冠军联赛图表

neo4j :rel

几个周末前,我开始抓取拜仁慕尼黑/巴塞罗那比赛的BBC直播文本提要,最初只是从犯规开始,然后建立犯规图表。

从那以后,我花了更多时间,并设法对其他一些事件进行建模,包括尝试,进球,发球和任意球。

我开始只是为了拜仁慕尼黑/巴塞罗那的比赛而做,但是意识到扩大这一范围并为2014/2015冠军联赛的每场比赛绘制图表并不是特别困难。

为此,我们首先需要下载每个匹配项的页面。 我下载了此页面,并编写了一个简单的Python脚本来获取适当的URI:

from bs4 import BeautifulSoup
from soupselect import select
import bs4
 
soup = BeautifulSoup(open("data/results", "r"))
 
matches = select(soup, "a.report")
 
for match in matches:
    print "http://www.bbc.co.uk/%s" %(match.get("href"))

然后,我将运行此脚本的输出通过管道传递到wget中:

find_all_matches.py | xargs wget -O data/raw

更新抓取和导入代码以处理多个匹配相对简单。 从头到尾的整个过程如下所示:

2015-05-29_23-27-56

大部分代码处于“抓取魔术”阶段,在该阶段中,我经历了所有事件并提取了可以在图中链接在一起的适当元素。

例如,任意球和犯规事件通常相邻,因此我们希望淘汰两名参与者,纸牌类型,事件时间和事件发生的比赛。

我使用Python的Beautiful Soup库完成此任务,但是没有理由您不能使用另一套工具。

README页面显示了如何创建您自己的图版本,但是这里是使用Rik的 元图查询查询图外观的概述:

图21

到目前为止,这是我最喜欢的一些查询:

哪个射门次数超过10的球员转换率最高?

match (a:Attempt)<-[:HAD_ATTEMPT]-(app)<-[:MADE_APPEARANCE]-(player),
      (app)-[:FOR_TEAM]-(team)
WITH player, COUNT(*) as times, COLLECT(a) AS attempts, team
WITH player, times, LENGTH([a in attempts WHERE a:Goal]) AS goals, team
WHERE times > 10
RETURN player.name, team.name, goals, times, (goals * 1.0 / times) AS conversionRate
ORDER BY conversionRate DESC
LIMIT 10
 
==> +------------------------------------------------------------------------------------+
==> | player.name           | team.name            | goals | times | conversionRate      |
==> +------------------------------------------------------------------------------------+
==> | "Luiz Adriano"        | "Shakhtar Donetsk"   | 9     | 14    | 0.6428571428571429  |
==> | "Yacine Brahimi"      | "FC Porto"           | 5     | 13    | 0.38461538461538464 |
==> | "Mario Mandzukic"     | "Atlético de Madrid" | 5     | 14    | 0.35714285714285715 |
==> | "Sergio Agüero"       | "Manchester City"    | 6     | 18    | 0.3333333333333333  |
==> | "Karim Benzema"       | "Real Madrid"        | 6     | 19    | 0.3157894736842105  |
==> | "Klaas-Jan Huntelaar" | "FC Schalke 04"      | 5     | 16    | 0.3125              |
==> | "Neymar"              | "Barcelona"          | 9     | 29    | 0.3103448275862069  |
==> | "Thomas Müller"       | "FC Bayern München"  | 7     | 24    | 0.2916666666666667  |
==> | "Jackson Martínez"    | "FC Porto"           | 7     | 24    | 0.2916666666666667  |
==> | "Callum McGregor"     | "Celtic"             | 3     | 11    | 0.2727272727272727  |
==> +------------------------------------------------------------------------------------+

哪些球员因犯规立即报仇?

match (firstFoul:Foul)-[:COMMITTED_AGAINST]->(app1)<-[:MADE_APPEARANCE]-(revengeFouler),
      (app1)-[:IN_MATCH]->(match), (firstFoulerApp)-[:COMMITTED_FOUL]->(firstFoul),
      (app1)-[:COMMITTED_FOUL]->(revengeFoul)-[:COMMITTED_AGAINST]->(firstFoulerApp),
       (firstFouler)-[:MADE_APPEARANCE]->(firstFoulerApp)
WHERE (firstFoul)-[:NEXT]->(revengeFoul)
RETURN firstFouler.name AS firstFouler, revengeFouler.name AS revengeFouler, firstFoul.time, revengeFoul.time, match.home + " vs " + match.away
 
==> +---------------------------------------------------------------------------------------------------------------------------------+
==> | firstFouler         | revengeFouler               | firstFoul.time | revengeFoul.time | match.home + " vs " + match.away        |
==> +---------------------------------------------------------------------------------------------------------------------------------+
==> | "Derk Boerrigter"   | "Jean Philippe Mendy"       | "88:48"        | "89:42"          | "Celtic vs NK Maribor"                  |
==> | "Mario Suárez"      | "Pajtim Kasami"             | "27:17"        | "32:38"          | "Olympiakos vs Atlético de Madrid"      |
==> | "Aleksandr Volodko" | "Casemiro"                  | "39:27"        | "44:32"          | "FC Porto vs BATE Borisov"              |
==> | "Thomas Müller"     | "Mario Fernandes"           | "87:22"        | "88:31"          | "CSKA Moscow vs FC Bayern München"      |
==> | "Vinicius"          | "Marco Verratti"            | "56:36"        | "58:00"          | "APOEL Nicosia vs Paris Saint Germain"  |
==> | "Lasse Schöne"      | "Dani Alves"                | "84:08"        | "86:18"          | "Barcelona vs Ajax"                     |
==> | "Nick Viergever"    | "Dani Alves"                | "57:22"        | "60:37"          | "Barcelona vs Ajax"                     |
==> | "Nani"              | "Atsuto Uchida"             | "6:10"         | "8:40"           | "FC Schalke 04 vs Sporting Lisbon"      |
==> | "Andreas Samaris"   | "Yannick Ferreira-Carrasco" | "89:21"        | "90:00 +4:21"    | "Monaco vs Benfica"                     |
==> | "Simon Kroon"       | "Guillherme Siqueira"       | "84:05"        | "90:00 +0:29"    | "Atlético de Madrid vs Malmö FF"        |
==> | "Mario Suárez"      | "Isaac Thelin"              | "32:02"        | "38:47"          | "Atlético de Madrid vs Malmö FF"        |
==> | "Hakan Balta"       | "Henrikh Mkhitaryan"        | "62:09"        | "64:14"          | "Borussia Dortmund vs Galatasaray"      |
==> | "Marco Reus"        | "Selcuk Inan"               | "36:17"        | "44:03"          | "Borussia Dortmund vs Galatasaray"      |
==> | "Hakan Balta"       | "Sven Bender"               | "10:57"        | "12:51"          | "Borussia Dortmund vs Galatasaray"      |
==> | "Vinicius"          | "Edinson Cavani"            | "87:56"        | "90:00 +1:25"    | "Paris Saint Germain vs APOEL Nicosia"  |
==> | "Jackson Martínez"  | "Carlos Gurpegi"            | "64:55"        | "66:17"          | "Athletic Club vs FC Porto"             |
==> | "Nani"              | "Chinedu Obasi"             | "1:30"         | "4:47"           | "Sporting Lisbon vs FC Schalke 04"      |
==> | "Vitali Rodionov"   | "Bruno Martins Indi"        | "52:16"        | "60:08"          | "BATE Borisov vs FC Porto"              |
==> | "Raheem Sterling"   | "Behrang Safari"            | "29:00"        | "33:27"          | "Liverpool vs FC Basel"                 |
==> | "Derlis González"   | "Fábio Coentrão"            | "52:55"        | "57:59"          | "FC Basel vs Real Madrid"               |
==> | "Josip Drmic"       | "Lisandro López"            | "15:04"        | "17:35"          | "Benfica vs Bayer 04 Leverkusen"        |
==> | "Fred"              | "Bastian Schweinsteiger"    | "6:04"         | "9:28"           | "Shakhtar Donetsk vs FC Bayern München" |
==> | "Alex Sandro"       | "Derlis González"           | "4:07"         | "7:28"           | "FC Basel vs FC Porto"                  |
==> | "Luca Zuffi"        | "Ruben Neves"               | "73:49"        | "84:44"          | "FC Porto vs FC Basel"                  |
==> | "Marco Verratti"    | "Oscar"                     | "28:49"        | "34:04"          | "Chelsea vs Paris Saint Germain"        |
==> | "Cristiano Ronaldo" | "Jesús Gámez"               | "20:59"        | "25:37"          | "Real Madrid vs Atlético de Madrid"     |
==> | "Bernardo Silva"    | "Álvaro Morata"             | "49:20"        | "62:31"          | "Monaco vs Juventus"                    |
==> | "Arturo Vidal"      | "Fabinho"                   | "38:19"        | "45:00"          | "Monaco vs Juventus"                    |
==> +---------------------------------------------------------------------------------------------------------------------------------+

哪些球员花费最长的时间为犯规报仇?

match (foul1:Foul)-[:COMMITTED_AGAINST]->(app1)-[:COMMITTED_FOUL]->(foul2)-[:COMMITTED_AGAINST]->(app2)-[:COMMITTED_FOUL]->(foul1),
      (player1)-[:MADE_APPEARANCE]->(app1), (player2)-[:MADE_APPEARANCE]->(app2),
      (foul1)-[:COMMITTED_IN_MATCH]->(match:Match)<-[:COMMITTED_IN_MATCH]-(foul2)
WHERE (foul1)-[:NEXT*]->(foul2)
WITH match, foul1, player1, player2, foul2 ORDER BY foul1.sortableTime, foul2.sortableTime
WITH match, foul1, player1, player2, COLLECT(foul2) AS revenge
WITH match, foul1,  player1,player2,  revenge[0] AS revengeFoul
RETURN player1.name, player2.name, foul1.time, revengeFoul.time, revengeFoul.sortableTime - foul1.sortableTime AS secondsWaited, match.home + " vs " + match.away AS match
ORDER BY secondsWaited DESC
LIMIT 5
 
==> +---------------------------------------------------------------------------------------------------------------------------+
==> | player1.name      | player2.name        | foul1.time | revengeFoul.time | secondsWaited | match                           |
==> +---------------------------------------------------------------------------------------------------------------------------+
==> | "Stefan Johansen" | "Ondrej Duda"       | "1:30"     | "82:11"          | 4841          | "Legia Warsaw vs Celtic"        |
==> | "Neymar"          | "Vinicius"          | "2:35"     | "80:08"          | 4653          | "Barcelona vs APOEL Nicosia"    |
==> | "Jérémy Toulalan" | "Stefan Kießling"   | "9:19"     | "86:37"          | 4638          | "Monaco vs Bayer 04 Leverkusen" |
==> | "Nabil Dirar"     | "Domenico Criscito" | "6:32"     | "82:39"          | 4567          | "Zenit St Petersburg vs Monaco" |
==> | "Nabil Dirar"     | "Eliseu"            | "7:20"     | "81:30"          | 4450          | "Monaco vs Benfica"             |
==> +---------------------------------------------------------------------------------------------------------------------------+

谁的镜头最多?

match (team)<-[:FOR_TEAM]-(app)<-[appRel:MADE_APPEARANCE]-(player:Player)
optional match (a:Attempt)<-[att:HAD_ATTEMPT]-(app)
WITH player, COUNT( DISTINCT appRel) AS apps, COUNT(att) as times, COLLECT(a) AS attempts, team
WITH player,apps, times, LENGTH([a in attempts WHERE a:Goal]) AS goals, team
WHERE times > 10
RETURN player.name, team.name, apps, goals, times, (goals * 1.0 / times) AS conversionRate
ORDER BY times DESC
LIMIT 10
 
==> +-------------------------------------------------------------------------------------------+
==> | player.name          | team.name             | apps | goals | times | conversionRate      |
==> +-------------------------------------------------------------------------------------------+
==> | "Cristiano Ronaldo"  | "Real Madrid"         | 12   | 10    | 69    | 0.14492753623188406 |
==> | "Lionel Messi"       | "Barcelona"           | 12   | 10    | 51    | 0.19607843137254902 |
==> | "Robert Lewandowski" | "FC Bayern München"   | 12   | 6     | 43    | 0.13953488372093023 |
==> | "Carlos Tévez"       | "Juventus"            | 12   | 7     | 34    | 0.20588235294117646 |
==> | "Gareth Bale"        | "Real Madrid"         | 10   | 2     | 32    | 0.0625              |
==> | "Luis Suárez"        | "Barcelona"           | 9    | 6     | 30    | 0.2                 |
==> | "Neymar"             | "Barcelona"           | 11   | 9     | 29    | 0.3103448275862069  |
==> | "Hakan Calhanoglu"   | "Bayer 04 Leverkusen" | 8    | 2     | 29    | 0.06896551724137931 |
==> | "Edinson Cavani"     | "Paris Saint Germain" | 8    | 6     | 27    | 0.2222222222222222  |
==> | "Alexis Sánchez"     | "Arsenal"             | 9    | 4     | 25    | 0.16                |
==> +-------------------------------------------------------------------------------------------+

也许您能想到一些更酷的产品? 我很想看到他们。 从github获取代码并尝试一下。

翻译自: https://www.javacodegeeks.com/2015/06/neo4j-the-bbc-champions-league-graph.html

neo4j :rel

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值