Creating new analytics in sports

Hello. Yay, finally we can start. Before we introduce this session, I want to start with a little pump-up. So, hold on, let me see if this is gonna work.

To keep up with the game's best, you've got to anticipate what will happen next. The NHL can now access a decade of data in a millisecond. By using AWS machine learning to analyze hundreds of thousands of faceoffs, they know where this puck is likely to go before it even hits the ice. And if AWS can do that for the NHL, imagine what it can do for your business.

Alright, cool.

Hello, welcome. My name is Julie Souza. I head sports for global professional services here at AWS, and this panel is all about analytics in sports, and more specifically the evolution of analytics in sports. We think it's gonna be a great conversation. We have a tremendous panel here, so I'm gonna let them all introduce themselves, starting with Andrew.

Hi, everyone. I'm Andrew Reich, a senior sports consultant in professional services for AWS.

Brant Berglund. I am the senior director of coaching and GM technology with the NHL. I was a Bruins video coach for eight years, and I oversaw hockey product development for a company that made the first iPad app on the benches in the NHL. And I've worked with the league since March 9th of 2020. It's a good time to start a new job.

Well, my name is Cassie Campbell-Pascall. I'm a broadcaster for Rogers Sportsnet in Canada and ESPN in the US, and former captain of Canada's national women's team a long, long, long time ago, because it's now my 18th year in broadcasting. And I've been fortunate enough to win three Olympic gold medals and eight world championships.

That's really hard to follow. My name is Leon Lee, principal architect and data scientist at AWS.

Awesome, great. Alright. So we're gonna take this journey on analytics, and it all starts with first having data. So where does the data come from in sports that we can then analyze to create these analytics and insights? Leagues around the world collect it in different ways: many of them track optically, some through RFID or sensors. Brant, can you talk a little bit about how the NHL collects its data? And I know you have props.

Sure. Just to quickly summarize, it's an infrared camera system we have installed in every venue. There are player tags that go right above the player nameplate. So the next time you watch an NHL game on TV, if you see the nameplate of a player, the tag will be somewhere on this side of their body. I believe it might vary from team to team, but it's off to the side just a little bit. And the puck has lights in it as well. You can maybe see, probably not, but there are little dots going around it, and there are lights on both sides.

To give a little background on the technical aspects of this, and then a story: the player tag gets picked up 12 times a second for every player, and the puck gets picked up 60 times a second. We have other feeds available that up the sample rate, but the actual detection that we get is at the hertz I just mentioned. So you would probably ask why the tag isn't located smack dab in the middle of someone's core. Well, I was not here at this point, but we did a test, and this will turn into more of a psychological experiment for some of the people in the room when we tell the story. I don't know if Cassie has heard this one, but I hope she appreciates it.

The player tags were sporadically going out and it was not consistent. It wasn't a latency thing, it wasn't a throughput thing. They were disappearing randomly and then they were back and then they were gone again and then they were back and then they were gone again. And people in this room were probably there and were freaking out because they couldn't figure out why.

Well, we couldn't do this with NHL players because this was not officially sanctioned by the NHLPA at that point, so we had to go out and find another team to test with. We tested with a high school team in a game. They were girls, and ponytails were covering the tags sporadically as they skated and looked around and moved their hair from side to side. So we had to move the tags a little bit off to the side. There are plenty of players in the NHL who have long hair, and sometimes that's what you have to deal with. But that's why they are offset. If anybody ever sees them in a broadcast and wonders why they're not dead center in the player's core, that's why: ponytails.

Alright. So we are collecting this data in quite rapid and copious volume. We have this data, and so now the question becomes: having collected all of this data, what do we ask of it? How do we start figuring out what sort of questions we want to ask, what insights we want to glean from it? Where do those ideas come from? I know that all four of these people on the panel have a stake in that. Can you guys start to describe this? I don't know who wants to kick it off.

Well, I'll tee it up and go to Cassie, because from the time we walked from downstairs in our little prep area up to here, I think we came up with three new ideas, right? Because she's our voice. We're here representing a lot of other people, and we can put all this time and effort in as a group to come up with the shot and save analytics, faceoff probability, and opportunity analysis that you see up on the screen. But if she can't talk about it, and it doesn't work for her from a hockey perspective or a timing perspective, in terms of the windows she has to speak as a broadcaster, it doesn't work. If she can't do it, you can't hear it. And if she can't understand it, I can tell you right now, there's not a chance any of you guys are going to understand it, right? So it's really, really important for us to work with people like Cassie and get feedback on what she wants to see more of from us, to help her tell our story as a league better.

Yeah, I think for me, if you're in a panel role, so you're in the intermission, you have 30 to 45 seconds to get your thought out, and then it goes to your next colleague. Sometimes there's an argument, or you debate things back and forth, but you really have 35 to maybe 45 seconds. But if you're a color analyst and you're in the booth during a game, you have max 15 seconds to get your point out, because the puck is dropping and they're moving on to the next play. And the general rule of thumb from a broadcaster's perspective is that you let your play-by-play person take it from the faceoff to a stoppage in play, unless it's a long period of time. Then, of course, you add a little bit of color and insight in between.

So it depends on what role you're in. When faceoff probability comes up on the screen, you guys see it at home and I see it, but I don't wanna talk over it, because you guys also want to watch the game. But I might go back to it: well, this is why he lost, this is why he won. I can add a little flavor to it, and you guys have already seen the percentage. So that's how it can fit in. It just depends on what role you're in and how much time you have to actually get it into the broadcast and explain it fully and properly so that the viewer can understand.

It's funny, too: one time I asked John Bags what he liked about faceoff probability. He's in a different role than you, right? He's doing the play-by-play. He said he liked it because it was a bite-size nugget of information that pops up on the screen. He can talk about it or he doesn't have to, but it's something the fans can easily digest and see, and it supports what's going on in the game even though the conversation might be about something else.

We're in a world, I mean, we're in the land of it here, right? You turn on TV shows during the day on sports channels. It used to be darts in Canada; it used to be poker down here. Now, if you turn on TV shows during the day, sometimes you feel like you're watching CNBC, because there are L-bars everywhere. You're watching a gambling show, and there are odds and all this information popping up on the screen while someone's talking. We're constantly multitasking right now. So having something to talk about is great, but there's also what can pop up on the screen that doesn't need to be covered, I think, too.

From the AWS side, it's also understanding the data, and understanding what is available to us to create, whether it's an API or an algorithm or a model. That's where aligning with the NHL and Cassie comes in: what do we want to create, and what effort is that going to require? Take something like shot and save analytics: let's dive deeper on shots and saves. Well, how does that get created? How do we output those analytics? We'll get into that in a little while.

Or take something like faceoff probability. We're taking the tracking data and we're taking historical data, but we need more of a model type of solution to create that probability. It's similar with opportunity analysis. So for the AWS side, it's really important to understand what's available, and to align with the NHL and the broadcasters on what the vision is and what the requirements are for those analytics.

Let's just go forward. Or do you want to say something? Please.

Well, no, I think it's interesting, too, that I look at that slide, and that's the chronological order from left to right on either screen, whichever one you're looking at. That's the order in which we've released these projects, right? Shot and save analytics is really just bubbling up locations of shots from different areas and letting broadcasters slice into them from a lot of different angles: individual players, multiple players, different times in the game, different score differentials. But the whole point of it was, let's pipeline stuff into the broadcast booth fast so that they can do something with it. They have an idea for something, they can look it up and get a quick return.

Faceoff probability really used the tracking data. The power of faceoff probability is that it's fairly simple, right? You're literally just looking at the probability of two players taking a faceoff against each other. But the data behind it is really the compelling piece, and how fast it works. If there are hockey fans in the room, and I sure hope there are, you watch someone get thrown out of a faceoff on a broadcast and you immediately see the two players change. Not only are you getting that based on the location of the players on the ice from these tags, you're also seeing a calculation change live. It's sub-second latency, right? I'm not sure if this fits under this topic, but the next thing that we got to is opportunity analysis, which has been rolling out over the last season or so. That's our deepest dive yet, and we'll get more into that later on.

But it's funny that, I think, over the last three seasons, as we have gotten more into the tracking data from the NHL side, the progression with AWS has gotten deeper and deeper and gone into different areas of usage.

So that's a good segue. Let's dive into opportunity analysis as an example of this.

So we've decided we want to be able to evaluate the opportunity of the shot. Is that a high-quality, high-opportunity shot, or did someone take a low-opportunity shot? When we want to solve for that, what are the factors that we have to consider? Because you're talking about a lot of different data, a lot of different points.

I'm gonna go to this next slide here because I'm gonna show the factors that go into this, but I don't think this is where we started. So, I don't know, Brant or Leon, do you want to talk about the factors? Once we decided we wanted to do opportunity analysis, how did we think of those factors?

I want Leon to go more into that, but I'll just say that not only are we coming out with a projected goal rate, which is what the model says the likelihood of a goal being scored is, we're actually coming out with the why, too. And that's an important output of this model.

Yeah, absolutely. So from a data science perspective, there are three important criteria that we look for when we look for the factors that matter on ice for any given metric.

The first one is that it has to be crisp and fan-understandable. Everything you see on this screen actually fits that criterion: distance to the goal line, or the distance the goalie traveled in the last two seconds before the shot. Those are very understandable for the fans. They're very crisp. When it gets sent to you, Cassie, you can talk about it, and it makes a lot of sense for the fans at home.

I think early on we actually had a factor called the difference in standard deviation in puck movement between five seconds and two seconds. Something like that might work out mathematically, and it might even tell you something about what's happening on the ice, but it just makes everybody lose track of what's actually going on.

We tried. We tried to word it as much as we could. There was a whiteboard, there was a lot of different color ink. It didn't work.

The second thing is that the factor has to actually predict shot results. Now, that sounds really simple, and we'll get into this a little bit later, but it's more complex than it seems. Sometimes you have a factor that sounds like it would work really well when people tell the story, but when you actually build the model, the geometry of the rink and how players move around negate some of the value of that factor. So that doesn't work as well either.

And the third thing is that the factors should, as much as possible, be independent of each other. They shouldn't correlate much. The ideal here is to say, here are the opportunity analysis factors that matter on ice, and they're all completely separate from each other. Unfortunately, that's not really possible from a modeling perspective, but as much as possible we try to make these factors actually mean something and not correlate so much with each other.
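One simple way to screen candidate factors against that third criterion is to look at their pairwise correlation before they go into the model. The sketch below is illustrative only: the factor names and sample values are made up, the 0.8 threshold is an arbitrary assumption, and Pearson correlation is just one convenient measure, not necessarily what the team used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(factors, threshold=0.8):
    """Return (factor_a, factor_b, r) for every pair with |r| >= threshold."""
    names = sorted(factors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(factors[a], factors[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical per-shot values for three candidate factors (five shots).
candidate_factors = {
    "distance_to_goal_line_ft": [10, 20, 30, 40, 50],
    "distance_to_net_ft": [12, 21, 33, 41, 52],  # nearly redundant with the above
    "goalie_travel_ft": [3, 1, 4, 1, 5],
}

flagged = correlated_pairs(candidate_factors)
for a, b, r in flagged:
    print(f"{a} and {b} are highly correlated (r = {r})")
```

A pair that gets flagged is a candidate for dropping or merging, since two factors that say nearly the same thing also make the per-factor story harder to tell.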

OK. So now we know the factors, right? We know what we're trying to solve for, and we know what factors we're going to put into our model. Leon, walk us through building the algorithm or the model to consider these factors. Can you deconstruct that process?

Yeah. So opportunity analysis is a supervised machine learning model. Its inputs are these opportunity analysis factors, and its output is the projected goal rate. That's the probability that, if you see these factors on ice for any given shot, the shot is going to lead to a goal after it is released.

So the intuition here is that these factors represent the setup of the shot before the shot is released, right? You have the puck movement and the player movement in the several seconds immediately before the puck is released as part of the shot, and we use those to calculate these opportunity analysis factors. What the machine learning model is doing during training is this: we take historical data, in this case 70,000 historical shots from the NHL, and tell the machine learning model to learn which combinations or situations, based on these factors, lead to a high likelihood of scoring on the shot. And conversely, what are the sets of factors and the situations on ice that don't lead to a very good opportunity for scoring.
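To make the factor calculation concrete, here is a minimal sketch of two of the factors named earlier: distance to the goal line, and the distance the goalie traveled in the two seconds before the shot. The coordinate system (rink-centered, in feet, with the attacking goal line at x = 89), the sample format, and the function names are all assumptions for illustration, not the NHL's actual data schema.

```python
import math

# Assumption: rink-centered coordinates in feet, attacking goal line at x = +89.
GOAL_LINE_X_FT = 89.0

def distance_to_goal_line(puck_x_ft, goal_line_x_ft=GOAL_LINE_X_FT):
    """Distance from the puck to the goal line along the length of the ice."""
    return abs(goal_line_x_ft - puck_x_ft)

def goalie_travel(goalie_samples, release_t, window_s=2.0):
    """Path length the goalie covered in the window_s seconds before release.

    goalie_samples: time-ordered (t_seconds, x_ft, y_ft) tuples from the goalie's tag.
    """
    points = [(x, y) for t, x, y in goalie_samples
              if release_t - window_s <= t <= release_t]
    # Sum the straight-line distance between consecutive samples in the window.
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

# Hypothetical goalie samples leading up to a shot released at t = 3.0 s.
goalie_samples = [
    (0.0, 85.0, 0.0),
    (1.0, 85.0, 0.0),
    (2.0, 85.0, 3.0),
    (3.0, 85.0, -1.0),
]

factors = {
    "distance_to_goal_line_ft": distance_to_goal_line(puck_x_ft=60.0),
    "goalie_travel_last_2s_ft": goalie_travel(goalie_samples, release_t=3.0),
}
print(factors)
```

With real 12 Hz player data the window would hold roughly 24 samples rather than three, but the calculation is the same.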

So, as a supervised model, the machine learning model learns from the 70,000 historical shots under these circumstances. Once trained, it gets deployed into our system, which then allows live games to use this model to rate shot opportunities when it encounters new shots.
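The train-then-score loop described above can be sketched in a few lines. This is not the production model: plain logistic regression stands in for whatever algorithm the team actually used, the 600 training shots are synthetic, and the two scaled factors and their effect sizes are invented for illustration. A linear model is used here because its per-factor terms are easy to read off, which mirrors the idea of reporting the why alongside the projected goal rate.

```python
import math
import random

FACTOR_NAMES = ("distance_to_goal_line", "goalie_travel_last_2s")  # both scaled to 0..1

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, epochs=300, lr=0.5):
    """Fit logistic regression with batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

def projected_goal_rate(weights, bias, x):
    """Model's estimated probability that a shot with factors x becomes a goal."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def factor_contributions(weights, x):
    """Per-factor contribution on the log-odds scale (weight times value)."""
    return {name: w * v for name, w, v in zip(FACTOR_NAMES, weights, x)}

# Synthetic "historical shots": closer shots and a moving goalie score more often.
rng = random.Random(0)
rows, labels = [], []
for _ in range(600):
    dist, travel = rng.random(), rng.random()
    p_goal = sigmoid(-1.0 - 2.0 * dist + 3.0 * travel)
    rows.append((dist, travel))
    labels.append(1 if rng.random() < p_goal else 0)

weights, bias = train(rows, labels)

close_moving = projected_goal_rate(weights, bias, (0.1, 0.9))
far_set = projected_goal_rate(weights, bias, (0.9, 0.1))
print(f"close shot, goalie moving: {close_moving:.2f}")
print(f"far shot, goalie set:      {far_set:.2f}")
print(factor_contributions(weights, (0.1, 0.9)))
```

Once trained, only `weights` and `bias` need to be deployed; scoring a new shot is a single call to `projected_goal_rate`.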

I wanna say here, for a bit, that the data scientists who built this, including myself, used Amazon SageMaker as a tool to build and deploy this model. I'm from AWS, so I have to say good things about SageMaker, but I promise that I actually really do like SageMaker.

SageMaker is AWS's purpose-built machine learning capability, right? It has a lot of data preparation capabilities, training, tuning, validation. And SageMaker also lets you essentially call up a very large amount of compute and memory as needed during training. So it's really an enabling technology for us, because having SageMaker means the data scientists on the team can focus on the data science and not on provisioning infrastructure or dealing with libraries. And it really allowed the opportunity analysis model to be developed very thoroughly, because we were able to do that so quickly.

You know, it's interesting, if I can just add something from outside of my broadcasting world. As an athlete, if you look at that list on the left there, I think you see the word meridian at least three times.

So we're taught as players, when we're trying to score and we're in the offensive zone, to cross the royal road, which is the meridian line from the middle of the blue line to the goalie. If you can get across that line before you shoot, you're making the goalie move, and you're giving yourself a better chance to score a goal.

So I can tell you on the broadcast that he crossed, or she crossed, the royal road and gave herself or himself a better opportunity. But there are three places where it's talking about the meridian. So we're learning that as athletes as well: how can we make our opportunity score even higher each time we get that opportunity? And then you go to the broadcasting side and you can see all these different analytics, these opportunity analysis factors. It's pretty incredible that we as athletes are taught exactly that, to cross that royal road. It just gives you a little sense of, OK, I was taught well. It's the right idea.

So it's pretty neat that way, too.

And I think that's part of how we come up with a list of factors to examine, right? We take the conventional logic that we've heard for years and we go back, and you either want to find something that completely supports it, and it's amazing how many times that's the case, right? I mean, you have all these learned eyes watching the game. They're not going to be wrong consistently about something. Or you hope it completely dispels it, and then you have this really amazing new study where you can go back and say, actually, no, that's wrong, it's better to do this. And this spells it out even further: how much did the meridian crossing matter on this individual shot? Was it the fact that it crossed, or the fact that it recently crossed? Or was the fact that it crossed almost five seconds before the release something that lowered the projected goal rate? Because honestly, at that point the meridian crossing doesn't really matter, because the goalie had plenty of time to compensate.
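The recency idea above, whether the puck crossed the meridian and how long before release, can be expressed as a small calculation over the puck track. This sketch assumes rink-centered coordinates where the meridian (the royal road) is the line y = 0, and a hypothetical sample format of (time, y) pairs; the actual factor definitions are not given in the talk.

```python
def seconds_since_meridian_crossing(puck_samples, release_t):
    """Time between the puck's last meridian crossing and the shot release.

    puck_samples: time-ordered (t_seconds, y_ft) pairs, where y = 0 is the
    meridian (an assumed coordinate convention). Returns None if the puck
    never crossed before release. A crossing is counted only when the sign
    of y strictly flips between consecutive samples.
    """
    last_crossing_t = None
    for (t0, y0), (t1, y1) in zip(puck_samples, puck_samples[1:]):
        if t1 > release_t:
            break
        if y0 * y1 < 0:
            last_crossing_t = t1
    if last_crossing_t is None:
        return None
    return release_t - last_crossing_t

# Hypothetical puck track: crosses from y > 0 to y < 0 between t = 1 and t = 2.
puck_samples = [(0.0, 5.0), (1.0, 3.0), (2.0, -2.0), (3.0, -4.0)]

recent = seconds_since_meridian_crossing(puck_samples, release_t=3.0)
never = seconds_since_meridian_crossing([(0.0, 5.0), (1.0, 4.0)], release_t=1.0)
print(recent, never)
```

A crossing one second before release and a crossing five seconds before release would then be distinguishable inputs, which is the distinction being drawn here.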

It's a great marriage between the conventional logic of the game and the data science world.

And I also want to say, from the data science perspective, that this is obviously a clear-box or white-box model, right? You know everything about it: all these factors are the machine learning features.

These days, a lot of times when data scientists work, you use much more powerful models, whether it's deep learning or various ways of transforming the machine learning features so they're more orthogonal to each other.

But in doing so you lose something. You lose some of that storytelling ability. By having factors that are very rooted in hockey going into the machine learning model, whatever the model outputs naturally feeds into the storytelling, and it's very compelling from a "what does it mean and why is it useful" perspective.

And we talked about that. I remember having that discussion when we decided we wanted to do a projected goal rate of sorts, right? The NFL has their QBR, and there are all these sorts of things where you slap a number on it and you say, well, that's so, and that's great. When you start to believe in those things and trust those things, they become part of the norm and part of the vernacular. But this is all new ground for hockey in a lot of ways.

So I remember discussing whether or not we wanted to make the factors visible, and not only that, but the contribution of the factors visible, which is really the tricky part, especially for me to understand without a data science background and certainly without a machine learning background. And I think that's the fun part with this whole group: we're always learning from each other, right? I think it was a great discussion to have. And now you get your number, which you can talk about quickly, or you get a number and a much deeper dive where you can say, well, this player had the highest goalie angle differential.

Meaning, at the time of release, what was the angle between the line from the puck to the middle of the net and the line from the goalie's tag to the middle of the net? You can imagine that if a goalie is this far off, that's not a very easy save for him to make. But if he's dead on, it has to be a good shot to beat him. If that player is constantly getting the goalie's angle off, that's a great studio piece for you to talk about, because you could show some data and you could back it up. Sorry to put words in your mouth; I'm assuming.

It is. The more we can put the eye test and the analytics together and make it work, I think the better it is for the viewer at home.
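The goalie angle differential described above is the angle, measured at the middle of the net, between the direction to the puck and the direction to the goalie's tag. A minimal sketch follows, assuming 2D rink coordinates in feet and a net center at (89, 0); the function name and coordinate values are illustrative, not the league's definitions.

```python
import math

def goalie_angle_differential(puck, goalie, net_center):
    """Angle in degrees between the net-to-puck and net-to-goalie directions.

    0 means the goalie is exactly on the puck's line to the middle of the net;
    larger values mean the goalie is further off that line.
    """
    puck_angle = math.atan2(puck[1] - net_center[1], puck[0] - net_center[0])
    goalie_angle = math.atan2(goalie[1] - net_center[1], goalie[0] - net_center[0])
    diff = abs(puck_angle - goalie_angle)
    if diff > math.pi:  # take the smaller of the two angles around the circle
        diff = 2 * math.pi - diff
    return math.degrees(diff)

NET_CENTER = (89.0, 0.0)  # assumed rink-centered coordinates, in feet

dead_on = goalie_angle_differential(puck=(69.0, 0.0), goalie=(85.0, 0.0), net_center=NET_CENTER)
off_angle = goalie_angle_differential(puck=(69.0, 0.0), goalie=(85.0, 4.0), net_center=NET_CENTER)
print(f"dead on: {dead_on:.1f} degrees, off angle: {off_angle:.1f} degrees")
```

Averaging this value over a shooter's attempts would give the kind of "this player keeps getting the goalie off his angle" studio piece mentioned here.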

So you've seen the expert's eye test, and then it's confirmed with analytics. It's like, whoa, that's almost mind-blowing, right? Or you see the analytics first and then you find the examples with an eye test. It works both ways. So it just adds to the experience of the viewer, I think.

Yeah, and that's the point, too, right? Why are we doing this? To educate the fan, to entertain the fan. 100%.

So now we've got these factors, you've described what it takes to, to create the model, but then you've got to get it out the door, right. What does that architecture look like? Leon? I'm looking at you again and then I'll give you a break with the engineering component of this. How is the analytic calculated and made available? And how does AWS and the cloud sort of support that transmission of all of this?

So the raw puck and player trajectory data originates from each of the NHL's 32 arenas. That's this left box here, the NHL arena data sources. Those puck and player tracking data sources are also combined with other in-game event data: shot events, penalty events, game clock events, game start and stop events. All of these are combined and sent as a stream of data into the AWS cloud. So there are 32 arenas, and all of them stream into one centralized location.

When it gets into the AWS cloud, it first goes into Amazon Kinesis Data Streams, which is Amazon's managed, serverless big data streaming service. We use it here as, essentially, a low-latency buffer to make sure we don't lose that data as it's being processed.

From there, it goes into Amazon Managed Service for Apache Flink. If you're not familiar with this, it contains the custom logic that transforms the raw puck and player data stream into the opportunity analysis factors that we were showing earlier.

So it's essentially, in the parlance of engineering, a low-latency stateful compute capability that runs inside AWS as part of the Apache Flink service.

Flink then takes all of the analysis factors that were just computed and interfaces with the model that's deployed and running on SageMaker real-time inference endpoints.

So what it's doing is taking the latest factors that it has received from the ice, doing the machine learning inference, and getting back the projected goal rate as well as the list of how the different factors contribute to that goal rate. It gets those and then sends them downstream to the broadcasters, the media partners, the teams, and the downstream applications that can now use this to entertain the fans and do storytelling.
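The "stateful compute" step can be pictured as a processor that keeps a short rolling window of tracking samples per game and, when a shot event arrives, turns that window into factors and asks a model for a score. The sketch below is plain Python, not Flink or SageMaker code: the message fields, the `ShotFactorProcessor` class, and the stand-in scoring function are all hypothetical, and in the architecture described here the scoring call would go to a deployed inference endpoint instead.

```python
import math
from collections import defaultdict, deque

class ShotFactorProcessor:
    """Keeps per-game goalie positions and scores shots as they arrive."""

    def __init__(self, score_fn, window_s=2.0):
        self.score_fn = score_fn          # stand-in for a call to a deployed model
        self.window_s = window_s
        self.goalie_track = defaultdict(deque)  # keyed state: one window per game

    def on_message(self, msg):
        track = self.goalie_track[msg["game_id"]]
        if msg["type"] == "goalie_position":
            track.append((msg["t"], msg["x"], msg["y"]))
            # Evict samples that have fallen out of the rolling window.
            while track and track[0][0] < msg["t"] - self.window_s:
                track.popleft()
            return None
        if msg["type"] == "shot":
            points = [(x, y) for t, x, y in track if t >= msg["t"] - self.window_s]
            travel = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
            factors = {"goalie_travel_ft": travel}
            return {
                "game_id": msg["game_id"],
                "factors": factors,
                "projected_goal_rate": self.score_fn(factors),
            }
        return None  # ignore message types this sketch does not handle

def stand_in_score(factors):
    """Toy scoring rule used only so the sketch runs end to end."""
    return min(1.0, 0.05 + 0.01 * factors["goalie_travel_ft"])

messages = [
    {"game_id": "A", "type": "goalie_position", "t": 0.0, "x": 85.0, "y": 0.0},
    {"game_id": "A", "type": "goalie_position", "t": 1.0, "x": 85.0, "y": 3.0},
    {"game_id": "A", "type": "goalie_position", "t": 2.0, "x": 85.0, "y": -1.0},
    {"game_id": "B", "type": "goalie_position", "t": 2.2, "x": -85.0, "y": 2.0},
    {"game_id": "A", "type": "shot", "t": 2.5},
]

processor = ShotFactorProcessor(score_fn=stand_in_score)
results = [processor.on_message(m) for m in messages]
shot_result = results[-1]
print(shot_result)
```

Keying the state by game is what lets one processor handle several arenas at once without the windows bleeding into each other.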

Additionally, on some of these other components here: the NHL also hosts the databases and APIs that store both the raw data and the analytics for the long term, so they can be pulled by applications that don't need immediate access to that data but need it for other storytelling or analytics purposes downstream.

So, a couple of important things here. As I already mentioned, all 32 NHL arenas send data here, and this entire system scales with the NHL game schedule. What it actually does is look up the NHL game schedule and scale its resources up and down depending on that schedule, really depending on how many games are being played at once. So it saves costs and doesn't end up running 24/7.

Also, as Brant mentioned a little bit earlier, you have the 60 hertz signal from the puck and the 12 hertz signal from the players. Depending on how many games are happening at the same time in the different NHL arenas, you could actually end up getting over 1,000 messages per second coming from the arenas. All of that is streamed into this architecture and processed with very low latency. In some of our prediction pipelines, especially for faceoff probability, even under a condition of 1,000 data points per second, we're able to get that prediction out to the broadcasters, end to end, in about 500 milliseconds.
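Both points, scaling with the schedule and the message rate, come down to simple arithmetic. The sketch below is a back-of-the-envelope illustration under stated assumptions: the schedule entries are invented, the 30-minute warm-up and one capacity unit per game are arbitrary, and the message-rate estimate assumes one message per tracked sample and 12 tracked players per game, none of which is stated in the talk. Only the 60 Hz puck and 12 Hz player rates come from the panel.

```python
from datetime import datetime, timedelta

PUCK_HZ = 60    # puck detections per second (from the talk)
PLAYER_HZ = 12  # detections per second per player (from the talk)

def games_in_progress(schedule, at, warmup=timedelta(minutes=30)):
    """Count games that are live, or about to start, at a given time."""
    return sum(1 for start, end in schedule if start - warmup <= at < end)

def desired_capacity(schedule, at, units_per_game=1):
    """Capacity units to run right now; zero when no games are on."""
    return games_in_progress(schedule, at) * units_per_game

def tracking_messages_per_second(games, tracked_players_per_game=12):
    """Rough message rate, assuming one message per tracked sample."""
    return games * (PUCK_HZ + PLAYER_HZ * tracked_players_per_game)

# Hypothetical schedule: (start, end) for three games on one night.
schedule = [
    (datetime(2024, 1, 6, 19, 0), datetime(2024, 1, 6, 22, 0)),
    (datetime(2024, 1, 6, 19, 30), datetime(2024, 1, 6, 22, 30)),
    (datetime(2024, 1, 6, 22, 0), datetime(2024, 1, 7, 1, 0)),
]

midday = desired_capacity(schedule, datetime(2024, 1, 6, 12, 0))
evening = desired_capacity(schedule, datetime(2024, 1, 6, 20, 0))
overlap = desired_capacity(schedule, datetime(2024, 1, 6, 21, 45))
rate_five_games = tracking_messages_per_second(games=5)
print(midday, evening, overlap, rate_five_games)
```

Under these assumptions a single game produces about 200 messages per second, so five simultaneous games is already past the 1,000 per second mark mentioned above.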

So it's really a sub-second level of latency, even if you have very large amounts of real-time data coming in.

That literally never gets old. We were here in Vegas the first time we tested it live, sitting in the press box and watching a player get thrown out of a faceoff on the ice. I looked down at the app we were using at the time to see it, and the change had already happened. For me, that was where I became so aware of how fast and how powerful this really was.

Well, and it's important, I mean, in sport, you don't have the luxury of time, right? So latency is a huge, huge deal. And you think about the applications of this data, obviously, you're trying to tell a story on air whatnot. What if you're using this data to place a bet, you know, like it's got to be fast, right?

Let's move forward a little bit, because I wanna talk about the iterative process. You don't build a model and then just ship it out there and let it go, right? There's this iterative process that happens, and I wanna talk about that. How does that happen? How do we test these models? How do we get feedback to know whether something is working, or it's not working, or we're missing something?

So, I think, never any arguments. I think what's really unique, I mean, you can see the panel up here: I used to work for NHL Network, so a broadcast background. Brant used to work for a team. Cassie was on the ice at a very, very, very high level, as she emphasized, and she's a broadcaster. And then you have Leon, who's a data scientist. So we all look at this from four different perspectives.

So a lot of times for me, after the output with the data is ready, I like to put my old broadcast hat on and really try to match that with the video and the game action that I'm seeing, and look for trends. I try to say, hey, if I were going to prepare a video package for Cassie, how quickly can I produce a story? How quickly can I edit something together? In the beginning, there were times where there was all of this raw data, and I would go back to Leon and say, hey, can we maybe clean this up a little bit? We want it to be a little easier to understand or easier to retrieve.

So there are a lot of different components to consider, not only when you're creating the model but also when somebody is retrieving the results of the model.

You know, an interesting thing with save analytics... save analysis? I don't know why I'm having trouble saying analytics and analysis when that's my job. But anyway, it looked at one of the goaltenders, I forget who was up there. We could be sitting in our green room watching the games and preparing for the intermission live as it's going, and we could say, hey, look at this glove, there's something going on with this glove. There's just something off: he's got it pointed down, or it's up. And all of a sudden we'll say, well, how many goals does he have against on that side? And then the analytics come up and it's like, boom. You just saw this little tweak with his glove, and that's where most of the goals are scored, or whatever it may be.

So that's where it kind of gets really fun in that sense too when you're preparing things where it, it happens instantaneously where you've got that, the period to talk about it in the green room, so to speak. And then they got to put together this chart right away and it just, it's just like really fun when you can sort of predict stuff like that and have analytics sort of back you up. And that's, I mean, that's probably for us that's I think our favorite part too when we finally have the data to look at.

And we have to go back as Leon said, you know, this is a process where it's, it's a white box, not a black box process. So we actually have the features to go back and look at to make sure that they make sense. And that's part of our process that we go through where we go and look at the video examples and we look at why this opportunity was rated what it was, you know, and then we get to put some visualizations together to see how you can tell a story a little differently. And he loves this one story about a certain player. I'll let you take it because I think you're better at telling it than I am.

So Brant was preparing some examples of projected goal rates, and he put together a chart of projected goal rate and shot angle for the Los Angeles Kings. Shot angle was on the y-axis, projected goal rate was on the x-axis, and he had the players labeled by initials. The player with the highest projected goal rate had the initials A.K., and I said, oh, it's Adrian Kempe, you know, most goals on the Kings, arguably the best player. And he says, no, it's not Adrian Kempe. I said, what do you mean? He said, it's Kopitar. And I'm like, oh, I did not see that coming. And then we dove in and looked at examples of Kopitar and the high opportunities that he was getting. He just wasn't executing the goals. He had a lot of high opportunities, but he wasn't executing on as many as maybe he should have been. That night, he scored four goals.

So I'm immediately reaching out to Brant like, hey, did you see this? And then we're thinking, well, too bad we didn't have the Cassie show produced by Andrew and Brant, because we all would have looked like heroes and would have been bragging about it the next day, that Cassie did this breakdown on Anze Kopitar having the highest projected goal rate and, oh, look, that night he scores four goals.

So that's where, you know, it gets fun and exciting, that the data can really unlock the story and focus on someone who, you know, maybe they're due, maybe they haven't had the best puck luck, as some might say. And, you know, they're due for some success, and in that case, he was due and it converted. And, you know, Andrew is saying we talked about it the next day. We didn't, we were texting during the game and then after the game, and I had to explain to my wife who was texting me at, you know, 11:30 at night East Coast time. It's Andrew. Who's Andrew? Don't worry about it. But you know, that's a golden moment though, right? From a broadcasting perspective, if you do your pregame show and you talk about this, it's like, oh, you know, it's just this golden moment where you predict something.

And for example, if he was to see that as a player from a player perspective, you know, your coach is telling you, hey, you're getting tons of chances, don't worry, it's gonna come, it's gonna come. But when you actually see where your chances are coming from and what's happening from like a data perspective as a player, which I never had cause I'm too old for all this stuff. But he would be like, ok, I really am like now this is key information for me as a player to see and it makes me feel better about my game. And so that's how analytics can help as well.

Let me ask this question, because we were talking about the iterative process and how we're discovering things. Have there been opportunities where, as we're rolling stuff out, we realize, and I know you've had a good example of this, um, like, we missed something, right? Or where we caught things, where, wow, the model picked that up and we didn't think about that?

I do want to say too, though, Kopitar on those, I remember the examples. I mean, to use the terminology from, you know, traditional hockey, he was snake bit. Like, there were some phenomenal saves in the group of things we watched, right? And he was due. I mean, that was kind of cool to have that be a factor in it too. Anyway, sorry. No. And to Julie's question, you know, when we are going through this, of course we see things. That's why we test, that's why, you know, we really run through the data with the video with Brant. That's why we love it. So we can try and catch, you know, maybe some of the things that we weren't accounting for before the model really output on this data.

One is, and we want to let you jump in on this as well, when we first released this, it was during the regular season. The regular season at max ends in a shootout. In the Stanley Cup Playoffs, there's no shootout, so overtime can go on and keep going on. We've seen that several times in the past, and this season we saw one that went, what was it, three or four overtimes? And so that's where we needed to make sure that the model accommodated, you know, a longer length of time for a game.

Yeah. So that's when I get the call at two in the morning, because that's when you do the overtime, that says, hey, we're not getting any more data. What happened? Um, I also want to say that, you know, from a data science perspective, I think the reason that we were able to do this model so well is because of, you know, the marriage of rigorous data science with that continuous collaborative feedback from NHL hockey subject matter experts like Brant.

Um, so earlier I mentioned that we trained this model with 70,000 historical shots, and that might seem like a lot. Um, but really, if you consider the size of the rink and how many different ways you can approach each different shot, right, each shot is kind of unique. You can have two shots that start from the same spot, and even if the goalies happen to be in the same position to defend, right, there's still a lot of difference in how they got into that situation. And so with 27 factors here, there's just a lot of degrees of freedom that the model can pick up. And that's why it's really important to have chosen these factors correctly.

And so the data scientist is able to do a lot of correlation analysis or multivariate analysis from the mathematical side to get that right. But also critical is Brant's ability to come and say, no, this doesn't make sense, or that doesn't work out, right? I have a lot of situations where I see that, hey, the model works really well, and then Brant takes it and says, yeah, but it doesn't work in just this particular part of hockey, and it's really important.

And I mean, that's a two-way relationship too, because as I go to maybe put a presentation together or put something together to take to Cassie, or, Justin Bourne wrote some articles using opportunity analysis last year in the playoffs through Sportsnet, you know, I always go back to him and say, am I explaining this right? I need to explain this from a machine learning perspective, but so that everybody can understand it. And I always have to check sort of my nomenclature and the way I phrase it, because, I mean, I don't want to be completely wrong. I also don't want to get out over my skis, where I can't really explain myself any further if somebody comes back with a question.

But the interesting one about sample size was sort of the nuanced opportunities in hockey. We have a feature called distance, we call it distance from goal line. And that's really sorry, the goaltender's distance from the goal line. We noticed that on breakaways, the goaltender's distance from the goal line had something interesting going on.

Cassie, what are goaltenders typically taught to do on breakaways? They come out, they attack the player coming down, and then as the player hits the hash marks, you know, that's how they kind of judge when to move back and how quickly. And I've always been taught that too. They want to come down and cut the angle down, and that's why, you know, when I work with somebody like Cassie, I always say things like, this is how I always understood it. Is that how you understand it? Because, like, she's Cassie.

And so when we looked at that, though, the interesting thing was the projected goal rate was actually raised by the goaltender coming out further, and right away the alarm bells went off for me. And I said, that can't be right. I mean, it can't be. And if it is, that's the greatest thing we could ever find, because now we can go back and offer up the fact that this has always been wrong.

And the truth is if you think about how many breakaways happen a week, there might be two in a game, but typically there might be 10 breakaways a week in the NHL and I mean, full on breakaways. I'm not talking about somebody getting a turnover from the side of the ice inside the blue line and kind of attacking from the side with somebody back pressuring.

I'm talking about from the blue line in, nobody near them, all the time in the world, almost like a shootout. So we really had a sample size issue there in what was going on. And Leon explains it much better from a machine learning perspective. But, like, the only time a goaltender really comes out that aggressively and that far from the net is when it's on a breakaway.

So we were earmarking those opportunities and raising the projected goal rate using that feature, right? And yeah, that's really correlation versus causation, right? Right. So a lot of times when we build models that try to not just make a prediction, but also explain that underlying prediction, this is where the correlation versus causation argument often comes back.

Um, so the model gets trained via correlation, right? It doesn't understand hockey per se, it understands the training data that we feed it. And so what it sees here is that every time the goalie comes out, that has a really high chance of scoring. And so it says, ok, well, that means every time I see the goalie come out, I'm gonna use that, and I'm gonna now explain this as: goalie comes out, therefore the chance of scoring goes up.

But the reality is that that's not the case. The reason why the goalie comes out is actually to defend better and change the angles and so on. So having this explainable model is very helpful in that it tells you why it does the things that it does, so that then, you know, hockey experts as well as broadcasters can take that and say, this is what it means from a hockey perspective.

So would goalie height percentage be part of that? I don't want to put you on the spot, but I'm curious, because you've got a goalie in Nashville who is quite small compared to Jacob Markstrom in Calgary.

So, yes, that is another one that tends to fluctuate up and down depending on the situation. The other one is defensive pressure, which is how close the defender gets at the time of release. That's one of my favorites. If you think about, you know, a shot release, if a shooter is very far away and releasing a shot, they're unlikely to get a defender up in their face, just because, you know, that's unnecessary.

But the closer you get to the net when you release the puck, the more the defenders get, you know, concerned about your scoring, and the closer they get to you. So the greater the defensive pressure, the greater the correlation towards scoring.

And again, that's another one where the uh where essentially it's a confounded factor. If any of you come from statistical backgrounds, there are independent factors and there are confounded factors. And so confounding factors here get added to the explanation process and it creates this need to really interpret what is actually happening.
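The goalie-distance example can be made concrete with a tiny tabulation. This is only a sketch: every count below is invented for illustration and none of it is NHL data, and the names `COUNTS` and `goal_rate` are hypothetical. It shows how a pooled goal rate can point one way while the rate inside each situation points the other way, once breakaways are separated out.

```python
# All counts are INVENTED for illustration; this is not NHL data.
# Key: (situation, goalie position) -> (shots, goals)
COUNTS = {
    ("breakaway", "out"): (90, 27),    # goalie far from the goal line
    ("breakaway", "deep"): (10, 4),    # goalie close to the goal line
    ("other", "out"): (100, 5),
    ("other", "deep"): (900, 63),
}


def goal_rate(position, situation=None):
    """Goals per shot for a goalie position, optionally within one situation."""
    shots = goals = 0
    for (sit, pos), (s, g) in COUNTS.items():
        if pos == position and (situation is None or sit == situation):
            shots += s
            goals += g
    return goals / shots


if __name__ == "__main__":
    # Pooled over every shot, "out" looks more dangerous for the goalie...
    print("pooled   ", goal_rate("out"), goal_rate("deep"))
    # ...but inside each situation, coming out lowers the goal rate.
    for sit in ("breakaway", "other"):
        print(sit.ljust(9), goal_rate("out", sit), goal_rate("deep", sit))
```

In these made-up numbers, most "out" shots are breakaways, which are dangerous regardless of where the goalie stands, so the pooled comparison mostly measures the breakaway rather than the goalie's choice.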

Yeah, I think the other side of that that's interesting, though, is if someone is shooting from a high-danger area. So there are really four models. We look at two parameters: does the puck cross the blue line, and does the puck cross the middle of the ice? And we get four outcomes from that, right? Plus-plus, plus-minus, minus-plus, minus-minus, right?

So i just tried to solve that. Right. So, you know, you threw me. So that's what happens in broadcasting, things get thrown off and you're on live television and you say the wrong word at the wrong time, you meant to say shift and something else happens, you know, it's just things happen sometimes, right?

So sorry, I was just trying to get your back. By the way, I wish you guys had, like, delete-quickly buttons, analytics for broadcasters that say the wrong word. But anyway, if you're in a situation where, let's say, it's an offensive zone situation and the puck does cross the middle of the ice and the shooter doesn't have anybody around them, it's going to raise the projected goal rate. We see that pretty consistently, and that's what we're hoping for, right? Hockey always talks about time and space. If the shooter has time and space at the NHL level and the goaltender is anywhere compromised in terms of their positioning and their stature, that really, really benefits the shooter, does it not? Right.

And the model actually does tell us that pretty frequently. The way it's determined is we look at a 6 ft radius around this tag. Actually, no, I think we do it around the puck. So first off, we look where the puck is, and that's where the shooter, you know, actually released the puck from. 6 ft around that is 113 square feet. We see if any of the defenders' tags, with the 6 ft radius around those, is occupying any of that 113 square foot area. And that's what we call defensive pressure.
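As a rough geometric sketch of that description, not the league's actual implementation: treat the puck and one defender's tag as centers of two 6 ft circles and measure how much of the puck's circle the defender's circle covers. The function names, the single-defender simplification, and the flat (x, y) coordinates in feet are all assumptions made for illustration.

```python
import math

RADIUS_FT = 6.0  # radius quoted in the talk


def circle_area(r=RADIUS_FT):
    """Area of the circle around the puck; about 113 sq ft for r = 6 ft."""
    return math.pi * r * r


def overlap_area(puck, defender, r=RADIUS_FT):
    """Square feet of the puck's circle covered by one defender's circle.

    Uses the standard lens-area formula for two circles of equal radius.
    puck and defender are hypothetical (x, y) positions in feet.
    """
    d = math.dist(puck, defender)
    if d >= 2 * r:
        return 0.0  # circles do not intersect, so no pressure
    return 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)


if __name__ == "__main__":
    print(round(circle_area(), 1))                 # area of the puck's circle
    print(overlap_area((0.0, 0.0), (6.0, 0.0)))    # defender 6 ft from the puck
    print(overlap_area((0.0, 0.0), (15.0, 0.0)))   # defender well outside range
```

With several defenders nearby, the areas would need to be combined without double-counting where their circles overlap each other; this sketch leaves that out.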

And it was on the feature list there, but you can see it's measured in square feet. So it's an interesting way to do it. The other thing that's cool about this is we're creating all these new stats out of this too. You're actually getting these real-world data points that you can actually use, and someone like you afterwards can say, you know, this shot was actually, and this is something other broadcasters have mentioned, it's like, I always say that shot was a 60 ft shot. I have no idea if it was a 60 ft shot. It might have been 73, but I'm just eyeballing it. Now they know, right?

And that's something they can get from other data sources too. But there's even deeper features that we have that offer up real-world data now.

So what you guys are starting to get into is this, you know, this sort of marriage between the data and the science and the art, right? And we were talking about this earlier, um, obviously there's a tremendous amount of science and technology that goes into this, but it's somewhat useless if we can't tell a story with it, right? If we can't convey why it matters, why should someone care?

So there's the science, but then there's the art to the science and the storytelling. And so I want to get into that a little bit, because there are different ways of doing that, right? There's, you know, obviously working with somebody like Cassie and helping her elucidate the story. There are, you know, graphical representations like we saw with, um, faceoff predictor, and being able to put moving percentages up in real time.

Well, let's talk about some of those stories and maybe some anecdotes of those, and I know you've got one. So as we mentioned a little bit earlier when we were telling the Kopitar story, you know, we love the phrase, let the data unlock the story. I mean, Cassie said it before about the eye test and the analytics, the analytics and the eye test. I'm a huge believer of, why can't these two concepts coexist? And you know, when Brant and I were starting to put together materials to work with on-air talent, you know, we would prepare for opportunity analysis: here's the projected goal rate, here are the high factors, here's a story that we see. And in one instance, you know, we were talking about Jack Eichel and how he's deceptive with the puck and creates a high opportunity for Jonathan Marchessault in the goalie angle it created. And Brant and I kept saying, oh, look, Jack Eichel is great, you know, fakes the pass, he can shoot, he can do it all. And the on-air talent said, no, you're looking at this wrong. The reason why Jonathan Marchessault creates that huge goalie angle is because right before he gets the pass, he flicks his stick to say, here, I'm open. And once again, that's something that a former player, a broadcaster, is going to pick up. Now, I can watch it 25 times. I didn't see it. And that's where, you know, the data unlocks the story.

Another great example is the concept of pass off pads, and we'll show the clip of a great play in the Stanley Cup Playoffs where what looked to be an opportunity turned into an even higher opportunity. And that's where we let the talent take it away. You can see Cousins here. He's coming in, he's looking to pass first. The D-man does a great job here. He's got this great spot right here to cover the pass. He's kind of baiting him. He wants him to pass so he can cover it. So Cousins realizes, ok, the pass isn't there now. But now Ullmark is gonna realize he's got a good read off his D-man and the pass isn't there, so he's now gonna laser his focus and come straight on this shot. My job now is, no matter what, do not let this guy score. My D-man's got the pass. So he kind of buckles down on this one, which probably makes Cousins realize, I'm not gonna score on Ullmark at this point.

So my best play is to go off the pad here. Ullmark kind of gets a little off balance here, but yeah, you see these angles here, I mean, Ullmark's got enough time to turn his head and realize where the puck's going. You're shooting knowing that he will be here. And if I can hit him right here by this pad that's coming out, you know the rebound's coming right here. If I hit that spot, like Dubie said, worst case the puck's still sitting down in front. If I can't go get that rebound, someone else can. It creates offense. But that's where it's at, and we've tried it, correct?

I remember we did it in Pittsburgh with Marc-Andre Fleury. He knew we were coming down here to do it and you can't, unless he cheats and doesn't go down, it's kicking out, it's kicking out. I mean, for me, seeing someone take this and do that is all we need to see, right? Our job is to really unlock the hockey for them and use the data to do that.

We are always challenged with the translation. I just mentioned to Cassie, the graphic there, it's supposed to be distance to the goal line. It's a straight-line distance. And I like that we left it in, because it's really a challenge we have. We have to translate this to the right people, right? So we have to make sure that we clearly define these features and that everybody understands what they mean. And right now there's a lot of hand-holding with it. But I think, you know, the elevator pitch that we have to have isn't just one, because if we talk to her about it, the elevator pitch we have is hockey related, and for me, that's the easy one. When we talk to truck people or we talk to producers, that's another challenge, because they're more concerned about the logistics, and if she wants to see it three times a period, how practical is it for them to produce it? You know, so we have to have this sort of catalog of these quick pitches for all these different personality types we deal with.

But that whole thing started off with us begging the NHL Network to get Devan Dubnyk and Mike Rupp on the phone with us and have a call with us. And we mentioned the words pass off pads, we showed the pie charts and the data to go along with it and the example. And then for the next 20 minutes, Mike Rupp and Devan Dubnyk went back and forth on that on the Zoom call for us, and we were like, this is why we need to talk to them about this stuff as much as possible. And that turned from them doing a quick pop, yeah, here's a cool shot and here's these numbers, thanks, back to you, Jim, to them saying we need a studio, we need a net, we need goalie pads, Dubie, you want to put your pads on? Like, they were all in, start to finish, on it. And we said, ok, this is the validation.

The other cool story is, the first, we'll say, talent that we showed opportunity analysis to was Cassie, who was recording a promotional piece, just a test piece for us. And you know, we kind of came up with our script on it, we talked through it, and we said, ok, this is what we're going to present to her. When we went through it, we showed her the examples and she was quiet, and we're looking at her in the Zoom session and we're waiting, and we pause. Cassie, what do you think? She's like, oh, I think this is brilliant. And we were just like, that's the biggest thing we can take. She's our voice.

We have to empower her to find something compelling for you as fans to talk about, right? Something that educates you on the game, something that makes you see the game of inches that hockey actually is, right? Like, that little thing made all the difference in the world, and that set up the scoring opportunity. And how many kids watching that clip with Devan Dubnyk and Mike Rupp went to practice the next day and realized, ok, I can't score, I'm going to shoot it off that pad? You know, and that's a whole other grassroots audience that you impact, right?

See, that was my goal. I needed content to show the two teams that I coach. So I went through this whole thing so that I had footage to go bring to them before practice and show to them. No, I'm just joking. But that's true. Like, it's what you want kids to see. They want to see the game through a different lens. And kids, sorry, Andrew, kids watch, like, my daughter's got her iPad here, she's watching, we watch it on the normal TV, and then, you know, she's got her iPhone over here doing something else and her computer doing her homework. That's how they're watching, which is a little bit, maybe we shouldn't let her do that, I don't know. But that's the reality that they live in now, right? And so they want that instant data and that instant analysis, and also an example of, ok, I'm going to try that at practice tomorrow. You know, and the thing too that I love about that clip, and he said it on the call, was Mike Rupp telling the story about Marc-Andre Fleury. And we said, tell that story, because again, that's what fans love. They like hearing that behind the scenes: what can't I see in this, what's going on, you know, in the locker room, the clubhouse, on the bus, what are the discussions, what's going on in practice that we can't see. And I think that's what's really cool about it, is it started with, you know, a projected goal rate and all these factors and all these numbers, and then it turned into, yeah, we practiced this play with Marc-Andre Fleury and, you know, we did it to generate higher opportunity chances. And I think that's what's so cool about, you know, bringing the eye test and the raw hockey intuition and IQ and putting that together with the analytics. And you can tell so many great stories with that.

It's interesting because Cassie, you alluded to, you know, these analytics weren't around when you were playing necessarily, right? And I think even if you look, I'm gonna tell the story about a different sport, you look at the athletes that are coming up today, not only are they learning and going and practicing this, they're studying this right? Arch Manning. He's a data science major, imagine how he is going to approach the game of football versus, you know, hey, as an example, right? So it's just, it's really interesting to see this evolution and sort of this adoption of this. Um and, and what this gets to is that all of this is meant to drive fan affinity and, and the fan experience. And it's interesting because we're talking about basically edifying the audience with telling them these new stories and showing them demonstrating these analytics. But there are other really interesting use cases um that this, this data can unlock and I'm looking at you Keith, but I want you to talk a little bit about some of the cool stuff that you guys have done with um with augmentation.

Yeah. Yeah, I mean, I think the future really, you were talking about two iPads and, like, all the sort of alternate broadcast angles. Um, she mentioned Keith, and we did some work this year as a league with a broadcast that used the player tags and animated the game, and we used Big City Greens characters to actually play the game mixed in with NHL players. It was a really, really fun experience. I know parents were saying that they were watching it with their kids. I mean, it was a really amazing thing to see the different ways that we can use all this technology. It's not just for kids. Like, if you have a view of the game, you could probably say what you prefer to see. Do you want a lot of data on the screen? Do you not? Do you like the player tags? Maybe you want to bring back the glowing puck, which you see sometimes now in the broadcast again, right? Like, you can in that new world of data. I said no, I figured you would, in Canada you wouldn't go. No, no. That's another whole thing altogether. Hockey south of the border, hockey north of the border. Um, but it really is an exciting time. To the people that say, no, no, I don't want to see all that stuff on the screen, that's ok. There's a broadcast for you, right? And I think that's the exciting part of this, is you can probably go, in the near future, as deep as you want, right? I mean, and I think streaming is enabling a lot of that, right? With the adoption of streaming. You have the linear experience where it's one to many, right? We're producing the least offensive broadcast for as many people as possible. But when you have the choice of saying, I want to go down the rabbit hole, I want, you know, the betting odds in an L-bar, I want the analytics overlay, I want the Big City Greens version, right?
And I think what you could do with that is drive personalization of the game in a way that makes it resonant to different populations of fans and attracts new fans. Big City Greens crushed it with tween girls. I don't know if that's what you were looking for, but god bless. How great is that? The tween girls were watching hockey because you guys did something creative, right? I think, I think that's awesome.

All right, I got to say, one of the funny stories was, I think some of the characters didn't have helmets on, and I don't know who was posting it on Twitter, but someone was live-tweeting their kid's response to the game. And I found it amazing because the kid was like, that player is going to get hurt, they don't have a helmet on. And then the player got back on the ice and still didn't have a helmet on, and the kid again commented on it. I mean, it's amazing, right? The kid's watching the game that intently and making comments on it. You know, it's just amazing. It's awesome.

Well, that's the whole point, right? Engaging the fan. All right. Last question, we have a couple of minutes left. Um, what else do you want to know? Like, we were talking about, these are the first three questions we asked of the data, right? Shot and save probability, faceoff predictor, opportunity analysis. What would be your pick for your next analytical exploration? Anybody can answer. Nobody has to answer, but I'm going to make you answer.

Ok. You know, I think, and I will cheat a little bit because it's something we are working on, right? Like, if you watch the game, this thing is the major point of focus in so much of what we do. We talk about things like shot speed, we talk about things like, you know, tracking the number of passes, and then we do things like zone time. Zone time really relies on where this thing is. But if this thing is inside the blue line, is it always a negative against the team whose defensive zone that is? No. There's plenty of times where Cale Makar, sorry, he might be the wrong one to bring up, wheels back behind his own blue line, letting his teammates regroup, and he's about to send a hail mary up ice and send somebody in on a breakaway, or could be setting himself up to dance through everybody and, you know, skate down and get a scoring chance on his own. But we give him a demerit for five seconds of defensive zone time, right? So zone time is ok. But how do we start actually tracking the location of the play? And where does that lead us? And is it something that we can increment more to lead to a feeling of momentum? We talk about things like that all the time on broadcasts, right? Like, momentum's run down the hallway, right? And even in series, you're hearing that word. When we do things like that, we can actually back that up with data to go along with it.
And it's something that we can go back and say, well, there were this many goals scored this year with momentum, or this team is a really good counterpunch team. They're ok playing the Ali rope-a-dope style, back on the ropes. They don't give up a lot, they might get stuck in their own end, but then they score the next goal a lot. Getting to that point where we're not just saying the puck's in that zone, it's bad. Ok, probably, but not always. How do we understand where the play is better, and what does the play look like? Starting to use these things more, and the location of these things and the movement of these things as they relate to each other, you know, and the order of these things, in terms of, this team's net is over here, who are the first three players between the net and the puck and which team are they on? Right? I mean, that's really sort of, I think, the next stage of this. I'd love to hear from you.

Go ahead, Leon. Oh, I was going to say that there's a technology part of this as well, all right? So those sensors over there, as Brant just talked about, right, 60 hertz. So we get a stream of location data, we get a stream of speed data. But I think that there are additional data sources that we can potentially tap into. If we knew, for example, not only what the puck is doing, but also what the stick is doing, uh, you know, what the hands of the goalie are doing, for example. So a lot of computer vision could provide pose tracking that could be added to the sensor data, which will give us a lot more information for these models, and then be able to predict and show even more interesting statistics and analysis for the NHL.

And because she's our voice, she gets to close. I'll be quick, we have 30 seconds, but I'd like to see blue line turnovers, at the offensive blue line, the defensive blue line. You know, coaches talk about it all the time. It's a big thing as a player: just don't turn it over there, don't turn over possession in those areas. And then maybe even another one is how we can show if a defenseman's gap is too big. You know, what's the probability of a player scoring if the gap is too big? So just, uh, those types of things.

It's hard to follow that, especially in this great presentation. But, um, I would say, you know, what I love about opportunity analysis is a lot of times, you know, you watch with friends and you hear, shoot, shoot, shoot, and you say, well, should they have shot? What I would maybe like to see in the future is an ongoing tracker of a projected goal rate at all times if somebody has the puck, so that, you know, maybe they should have shot before, maybe they should have made that extra pass. So, you know, what I really like is we continue to work on these models and do more and, you know, build off of them and add more factors. So I, you know, hope we can continue to add on to the work that we've already done.

Yeah, they do that in basketball, right? They do shot probability in real time all around the court. So, I mean, all the leagues are definitely investing in this space. You're gonna see a lot more analytics, and it's going to continue to grow. Someone once asked me, are we tapped out here? Are we done with this analytics thing? And, uh, my answer is always like, oh baby, we're just getting started. So thank you all for joining us today. We'll be around a little bit afterwards. Um, but enjoy the rest of the show. Thank you. Thank you.
