NFL Next Gen Stats: Using AI/ML to transform fan engagement

Hello, everybody, and welcome to Las Vegas. I trust the conference is off to a great start for everybody. Maybe some of you were even at the Raiders versus Chiefs game this weekend. Today, we're excited to discuss with you how we leveraged AI/ML, not only to capture the sports complexities that are most interesting to NFL fans, but also what kind of engagement that drove and to what magnitude.

I'm Elena Ehrlich. I'm a Principal Science Manager. I've been with AWS for six-plus years, and in working with the NFL, our team developed some of the machine learning models behind the Next Gen Stats metrics that you're familiar with.

Before we get started, I'd like to invite my colleague, Art Sani, for a quick introduction.

Thank you, Elena. Hi, everyone. My name is Art Sani. I'm a Senior Practice Manager in AWS Professional Services, also known as ProServe. My team has been helping a number of media and entertainment customers and sports customers such as the NFL to transform their business through innovation on AWS, using technologies such as gen AI/ML, data analytics, and more. And it's now my honor to welcome Andrew from NFL Next Gen Stats.

Welcome, Andrew.

Thank you, Art, and thank you everyone for being here. Good afternoon. My name is Andrew. I'm the Director of Engineering for NFL's Next Gen Stats. It's been so exciting and rewarding to see the engagement of fans and broadcasters evolve in recent years since Next Gen Stats, or NGS as we call it, began capturing the player tracking data and delivering insights into today's game. In my eight years on the NGS team, our focus has always been: how do we maximize the value of the player tracking data?

Well, it started with speed and distance, you know, simple calculations based off of time series coordinate data. And this would give us things like, oh, he reached 22 miles an hour on this play, or he had to run 60 yards just to gain two. Now, since we also know who's on the field, we can capture things like offensive formation: shotgun, under center, pistol, I-form. We can also capture personnel packages, like 21, which stands for two running backs, one tight end. And then we also know what happens in the play: our running back rushes up the middle, or it's a catch outside the numbers. These were things that were typically done by an army of people watching every single play on game film. Not to mention, different teams would have different focuses based off of who their upcoming opponent was.

Us being able to label this in real time on every single play was a huge win, and it represented a massive savings in time and effort on behalf of the teams. And yet there was still this potential to make better sense of the data and potentially quantify the impact individual players might have. All tight ends can catch a ball, all tight ends can block. Why is it that Travis Kelce has an outsized impact and role on his team compared to the average tight end in the league?

To convert this potential into real actionable insights, the league partnered with AWS ProServe, whose consultancy of experts spans gen AI/ML, business, and technology. This is what we're going to be covering in today's session. I'm here to tell you about the NGS journey: our deadlines, our hurdles, the unquestionable thirst of our fans and our teams for more data. And while the NFL puts on some of the largest events in the world, every department at the NFL runs very lean. Does that sound familiar? How about this: my engineering team of four is responsible for everything from building infrastructure, so writing Terraform and CloudFormation scripts, to ingesting the streaming data, building the APIs to deliver the stats, game day support, and even onboarding and supporting partners.

This startup mentality, coupled with our partnership with AWS, has resulted in an insane number of stats amid an explosion in growth and visibility. So I'll give you a crash course on how we collect the data and why we chose what we chose, and then we'll talk through a sample of stats to give you a teaser on the types of stats that we build on every single play. Then Elena will come back and walk us through how they built one of the first machine learning models, the QB passing score: why that was such a difficult problem, why we thought there was an opportunity for real innovation here, and how they delivered one of the most illuminating stats that we have today. And then Art will come back and show us just how impactful this partnership has been from the perspective of our business, our business and media partners, our teams, and ultimately our fans worldwide. And then there will be a Q&A.

So a little bit of background. Every year for the past 30 years, the number one most watched show has been... thank you, yes, the Super Bowl. NFL games accounted for 82 of the top 100 broadcasts last year, and games average over 16 million viewers, more than the NBA Finals. For our fans, the sport's continued appeal might come from the chess-like combination of strategy, game preparation, and sheer athleticism that goes into every play.

You may have heard that this is a game of inches. Everything matters in this game, from building the perfect roster, to the field conditions at kickoff, to making the perfect play call at the perfect time. In Super Bowl 52, the Eagles called the Philly Special. That was the only time the entire season that Doug Pederson made that play call, and if you remember, it resulted in a touchdown and ultimately an Eagles victory.

Now, since the beginning, the NFL has been capturing a wide variety of stats, and these have been your typical box score style stats: Boolean expressions like, was it a sack? Was it a catch? Or measurements based off of the outcome of the play, like, that was a six-yard gain. Or aggregations over the game or the season, so seven receptions, 142 yards. We don't know how many yards after contact Walter Payton got, and we don't know how many low-probability catches Jerry Rice caught.

So the NFL realized that we needed a better way to capture player tracking data and try to make sense of the game. How do we quantify what football experts know intuitively? Such a system could reveal insights about game dynamics that would benefit fans and players, insights such as: how does this quarterback perform against man coverage versus zone? Or how did this one big play completely change the win probability in a team's favor?

Matt Swenson, Senior VP of Product and Technology at the NFL, says this: "You know, machine learning is unlocking potential for us to do more than we otherwise could in a timely manner with a high degree of confidence."

Said another way, we're capturing so much data now, and we wanted to find the best way to leverage it. With all the data that we're collecting, we can now effectively use machine learning to find which data points are relevant and which are not.

So how do we capture this data? Well, today, our tracking system uses RFID, or radio frequency identification. There are chips in all of the players' shoulder pads, they're in the ball, they're even on objects on the field like the pylons and the sticks. Every venue is outfitted with an array of sensors to collect the signal from those chips, and it sends us the location along with the speed and direction in which each chip moves.

Why did we choose RFID? In 2013, the league sent out an RFP to capture the player tracking data. We evaluated a lot of technologies, and you might be wondering why we didn't go with another one. Optical tracking, which is used to great success in soccer and baseball, is unfortunately unable to distinguish players when they're in a scrum or a pile-up, and that's a situation that happens quite frequently in football.

Another technology was GPS, but at least at the time, its accuracy was measured in feet, and again, this is a game of inches. So we decided on RFID, and we deployed the system in time for the 2015 season. We learned a lot in that first season from working with the game day operators, working with the players and the players' union, and working with our media partners to make sure that we didn't conflict in our frequency band.

So from all of those learnings, we typically consider 2016 to be the start of the NGS era. Now, looking at this bird's eye view of the field, you can think of it as a simple x-y coordinate plane. You have (0, 0) on the bottom left, then 120 on the x-axis, that's 100 yards for the field plus 10 yards for each end zone, and then 53.3 on the y-axis.

The tracking system is capturing the signal from all of the chips on the field, triangulating their locations, and sending us this data 10 times a second. This data includes the location (x, y), but also speed, acceleration, directional and lateral load, even orientation and direction.

Now, those you can think of like this: if I'm a linebacker and I'm running full speed toward a quarterback, my orientation and direction of travel are going to be the same. Whereas if I'm a corner backpedaling while covering a wide receiver, my orientation is facing one way while my direction of travel is facing the other.
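
To make that distinction concrete, here is a minimal Python sketch of what one tracking sample might look like and how you could flag a backpedaling player. The field names and the angle tolerance are assumptions for illustration, not the actual NGS schema.

```python
from dataclasses import dataclass

# Hypothetical shape of one 10 Hz tracking sample for one tagged object;
# field names are assumptions, not the real NGS payload schema.
@dataclass
class TrackingSample:
    x: float            # 0-120 yards along the field, end zones included
    y: float            # 0-53.3 yards across the field
    speed: float        # yards per second
    accel: float        # yards per second squared
    direction: float    # direction of travel, in degrees
    orientation: float  # way the player is facing, in degrees

def is_backpedaling(s: TrackingSample, tol_deg: float = 120.0) -> bool:
    """Crude check: facing one way while moving roughly the other way."""
    diff = abs(s.orientation - s.direction) % 360.0
    diff = min(diff, 360.0 - diff)  # smallest angle between the two headings
    return diff > tol_deg           # tolerance chosen arbitrarily for illustration
```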

These data points form the building blocks for everything that we do in Next Gen Stats, and they represent a tremendous resource for the league's teams and media partners.

So since 2017, AWS has been the official technology partner for every aspect of the development and deployment of Next Gen Stats. AWS stores the huge amount of data generated by the tracking system and all of our derived stats, over 300 million data points per season. And my team, well, we use over 40 AWS services: your usual suspects like EC2, VPC, Lambda, and S3, but also Amazon SageMaker to quickly build, train, and deploy new models, or Amazon QuickSight to analyze and visualize this data.

Now, every NFL broadcast that you see today will have real-time insights using Next Gen Stats powered by AWS. ProServe built the NGS machine learning pipeline to quickly and easily deploy machine learning models capable of interpreting gameplay, and what ProServe provided, my team can then maintain, support, and iterate on.

For example, when a season is over, we have a new season's worth of data that we can retrain the models on, or we can take an existing model, see how similar it is to a different situation, and come up with a completely new stat. Today, we have over two dozen machine learning models running inference during the game, either on every play or even on the real-time streaming data.

Now, let's look at a couple of those. In this animation, you see a deep pass to Deebo Samuel; this has a completion probability of 27%. Now, what does that mean? Given roughly similar situations, the average quarterback is going to complete this one in four times. How do we come up with the completion probability? Well, we take all of the tracking data from ball snap to when the ball reaches the receiver's hands, not when he catches it, but when it reaches his hands, because that's the end of the quarterback's job, and we reduce all of that data down to 11 key features.

Features like: how fast is the quarterback running when he releases the ball? How close is the receiver to the sideline? How far downfield did the ball have to travel? How close are the defenders to the receiver or the quarterback? Now, completion probability also gives us this: if a quarterback's completion rate is 75% but the average of his probabilities was 67%, he's actually 8% over expected. And this is one facet of the individual quarterback's impact on his team.
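
As a rough sketch of that over-expected arithmetic, assuming hypothetical inputs (0/1 completion outcomes plus the model's per-throw probabilities):

```python
# Sketch of completion percentage over expected, per the example above.
# `completions` holds 0/1 outcomes; `probs` holds the model's completion
# probability for each of those throws. Names are illustrative only.
def completion_pct_over_expected(completions: list[int], probs: list[float]) -> float:
    actual = sum(completions) / len(completions)  # e.g. 0.75
    expected = sum(probs) / len(probs)            # e.g. 0.67
    return actual - expected                      # e.g. +0.08, i.e. 8% over expected
```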

Now, the vast majority of machine learning being done in the cloud today is being done on AWS, which is why AWS is the perfect partner for the NFL to leverage the power of its data.

Matt Swenson, Senior VP, says this: "Partnering with AWS meant that we didn't have to arm ourselves with a team of data scientists. We were able to get up and running quickly on a maintainable, iterable, and extendable solution from ProServe."

Said another way, we didn't have to reinvent the wheel when we wanted to try something new. So the NFL is using the power of AWS ML, creating new stats and improving player health and safety while delivering a better experience for fans and players, all in real time. And real time is never a trivial constraint.

And so now, to show us exactly how ProServe built the QB passing score, Elena Ehrlich. Thank you.

Thank you, Andrew. And yes, these are exactly the kinds of considerations that we take into account when designing machine learning solutions: we don't only look at the immediate problem and its constraints, we also want to look ahead at reusability and generalizability.

So in this session, I'm gonna walk us through exactly how we approached the modeling for the quarterback passing score. Let's zoom out a little bit and look at the business criteria that the NFL really wanted to address.

Now, when fans are evaluating a player's performance, what they're really doing is measuring that player's execution in a specific play against their innate sense of his potential. Preceding metrics did try to capture this: the Madden NFL rating, ESPN's Total Quarterback Rating, Pro Football Focus grades. But these did draw criticism from fans and commentators alike.

Actually, those kinds of feedback proved very useful to us, because we used them to enumerate the kinds of intuitions that were being violated by those old metrics, and this became our business criteria. So let's walk through those.

First of all, we want a quarterback passing score that can remain calibrated to the relevant data. Previous metrics had been calibrated to data that was deemed relevant in the 1960s and 1970s but is obsolete in today's game of football. For example, the NFL has moved the kickoff line three times since that era, and the absence of that and other rule changes in those metrics' models has rendered them inaccurate.

Secondly, we want a quarterback passing score that is correlated to winning. A quarterback is measured on how well he performs under pressure, on his decision making, and on how well he executes those decisions with precision. So the better he is at that, the more points we expect, and therefore more winning. Now, I don't expect the quarterback passing score to be perfectly correlated with winning, because there are other factors and other players that are going to influence that outcome. But previous metrics were too uncorrelated to be accurate.

Additionally, we also want whatever quarterback passing score formula we create to be robust against player anomalies. For those of you who remember, Kyler Murray back in 2019 won the Offensive Rookie of the Year award, ironically the same year that he received a low Madden score of 77. So we were determined to eliminate discrepancies like that.

And lastly, we wanted to design our metric so that it can scale to different granularities, but with consistency. What do I mean by that? I mean that I want to be able to score the quarterback across his whole season or even down to one specific play. A root problem with preceding metrics is that they weren't able to scale to different granularities, whether you want to look at just the play, the whole game, his performance across the week or the season, or even other split-level contextualizations.

Now, trying to encode those kinds of intuitions into a machine learning model actually proved quite nontrivial. Our science team worked very closely with the Next Gen Stats team: we worked to get it from a proof of concept to going live for the 2021 season, and then three months later to going public in time for Super Bowl 56.

Now, the Next Gen Stats team were able to provide us with subject matter expertise on football, on their data, and on their downstream integration requirements. Conversely, what ProServe really needed to deliver was to use the breadth and granularity of AWS services so that we could tailor our solution to the customer's ideal of what's maintainable and extendable.

This means that they can maximize their value proposition, but also that they can have lasting self-sufficiency after ProServe exits the project. As Senior Director Michael Che said, the NFL has lean engineering teams and very tight deadlines, so what ProServe really needs to deliver is, yes, impactful machine learning models, but ones that remain intuitive for their teams to own, expand, and control.

Now, for the Next Gen Stats team and for fans, it's very intuitive to measure, to grade, how well a quarterback is performing. But our goal is to do that in a completely data-driven way that isolates the quarterback's contributions to the play separately from his team's.

So what is the goal? We want to measure how well the quarterback performs given the game clock and the pressure that he's under. Now, these are measurable quantities: game clock is a time measurement, and the pressure he's under can be numerically represented from the time series that we're getting from the 22 players' RFID shoulder pads.

So those two inputs are measurable, but the output that we're looking for, a mark of how well the quarterback plays, is not something that's immediately measurable. It's not a tangible, tractable metric in and of itself. So we needed to look at something else that can measure this.

What we focused in on is the yards gained. We want to be able to predict the yards a league-average quarterback would gain, and against that, we can compare the actual yards gained by the specific quarterback. If I can calibrate his actual yards gained against a league-average quarterback's predicted yards gained, I can therefore quantify his relative over- or underperformance.

Now, in sports, the most exciting plays are the ones that have extreme outcomes, such as breakaway speed or an extreme yardage gain. And predicting and calibrating extreme outcomes is very difficult, because extreme outcomes by definition are rare and infrequent, and we have very few examples from which to learn.

In our case, trying to predict a distribution for the yards gained based on the 22 players' non-stationary time series is going to require a lot of robustness, and we needed two types: we need to be robust in the prediction of yards gained that we output, but we also need to be robust in the probability we assign to the actual yards gained that we observed.

So, for example, yes, I want to be able to predict 25 yards gained. But conversely, if I observe 30 yards gained, I need to be able to quantify: was that 15% likely, so medium difficulty, or was that 3% likely, i.e., very difficult, an amazing accomplishment by that quarterback? Since I said that we want to take a completely data-driven approach, let's look at exactly what those data are.

We're receiving updates every 100 milliseconds from those 22 players' RFID shoulder pads. This means we're getting position, speed, acceleration, each player's direction, and their orientation. That's eight telemetries per player per time step.

Now, the input time series sequence that we extract from the RFID time series is actually variable length: it begins with the snap and ends when the quarterback releases the ball. So, for example, a quarterback who takes four seconds to release the ball after the snap results in a time series of 40 time steps, while a quarterback who takes 2.5 seconds results in an input time series of 25 time steps.

The table that we see here represents how a play's data looks as a matrix. Each row corresponds to a single time step of that play, with eight telemetries per player times 22 players, for 176 columns.

What you'll also notice, going from left to right, is the offensive team followed by the defensive team, where the players are ordered according to their ascending distance from the passer.
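
A minimal sketch of shaping one play into that matrix, with assumed array layouts; the talk doesn't say whether the distance ordering is fixed at the snap or recomputed each frame, so this sketch recomputes it per frame.

```python
import numpy as np

# frames: one (22, 8) array per 100 ms time step, columns 0-1 assumed (x, y).
# offense_idx / defense_idx: row indices of the 11 offensive / 11 defensive
# players; passer_idx: the quarterback's row. All names are illustrative.
def play_matrix(frames, passer_idx, offense_idx, defense_idx):
    rows = []
    for f in frames:
        passer_xy = f[passer_idx, :2]
        def by_distance(idx):
            d = np.linalg.norm(f[np.array(idx), :2] - passer_xy, axis=1)
            return [idx[i] for i in np.argsort(d)]  # ascending distance from passer
        ordered = by_distance(offense_idx) + by_distance(defense_idx)
        rows.append(f[ordered].reshape(-1))  # 22 players x 8 telemetries = 176
    return np.stack(rows)                    # shape (T, 176), T = time steps
```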

Now, to collect the data on which to train the model, we gathered data from the preceding three seasons. This gave us approximately 50,000 plays, which is a huge amount of data: approximately 34,000 completions, 15,000 incompletions, and about 1,200 interceptions.

We definitely want to use all 50,000 plays to train our model, because we want maximum coverage and representation of any possible play outcome that can occur. And like we said, extreme plays don't have very many samples, so we can't afford to throw any away. Training on 50,000 plays is a really high volume, so you can expect a model training time of around 8 to 10 hours. And actually, this is perfectly suitable, because we train the model ahead of the season, so this is not a problem.

A key constraint for us was that when we perform the inference call on the quarterback to get his score as soon as he's finished the play, we need that inference call to be returned in real time: under one one-thousandth of a second, or in other words, in time for the first replay.

OK. Now, in the previous slide, I showed you how one play is represented as one matrix, one table. Here, what I want to show you is how we fed 50,000 plays, a huge amount of data, into our model architecture. In the lower left, the 50,000 matrices are fed into a temporal convolution network, a TCN. This is the neural network best suited for learning the dynamic complexities across all the plays.

It can handle plays of different durations, and it can also learn the long-term dependencies between sequential inputs. Now, plays also have static features, such as game clock, the current down, and the number of games remaining in the season, because these all affect player strategies, decision making, and performance under pressure.

So what we do is concatenate the vector of static features with the dynamic game state that has come out of the temporal convolution network, and that concatenated state we pass through a multi-layer perceptron for our network's final output: in our case, a probability distribution of predicted yards gained.
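
A simplified PyTorch sketch of that architecture; layer sizes are guesses, and where the real network outputs parameters of the Spliced Binned-Pareto distribution discussed below, this head just emits a generic parameter vector.

```python
import torch
import torch.nn as nn

# Sketch: a TCN (stacked dilated 1-D convolutions) encodes the (T, 176)
# play sequence; its pooled summary is concatenated with static features
# (game clock, down, etc.) and passed through an MLP head. Sizes assumed.
class QBPassingNet(nn.Module):
    def __init__(self, n_telemetry=176, n_static=8, hidden=64, out_dim=100):
        super().__init__()
        layers, ch = [], n_telemetry
        for dilation in (1, 2, 4, 8):  # growing receptive field over time steps
            layers += [nn.Conv1d(ch, hidden, kernel_size=3,
                                 padding=dilation, dilation=dilation),
                       nn.ReLU()]
            ch = hidden
        self.tcn = nn.Sequential(*layers)
        self.head = nn.Sequential(
            nn.Linear(hidden + n_static, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim))  # distribution parameters, not a point estimate

    def forward(self, seq, static):
        # seq: (batch, T, 176), padded to a common T; static: (batch, n_static)
        h = self.tcn(seq.transpose(1, 2))  # -> (batch, hidden, T)
        h = h.mean(dim=2)                  # pool over time, handles variable lengths
        return self.head(torch.cat([h, static], dim=1))
```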

Now, the output of this network is really worth careful consideration. Naively, we could have just output a point estimate, like: we predict seven yards gained, and actually it was five. However, this would fail to achieve the desired outcome of measuring the play's outcome against its potential. And this brings me to my most critical point, which is that not all errors are created equal.

Two yards gained under very easy circumstances is not the same as two yards gained in difficult circumstances, and yet both would have a mean absolute error of two yards. So naturally, we want to go for a probability distribution, a distributional forecast.

So now we want to consider what kind of probability distribution best suits our problem. We definitely know from our business case that we need a probability distribution that is flexible to all the play types.

Let's have a look at our data. In the right graph, we see the empirical distribution of yards gained on interceptions, and we can see that whatever probability distribution we choose for our model would need to capture symmetry, since an interceptor can either be tackled at the 20-yard mark where he intercepted the ball, or run it back, maybe negative 20 yards.

Conversely, if we look at the blue distribution, the distribution of yards gained on completed passes, we see that our probability distribution would also need to capture asymmetry, since it's vastly more likely that the quarterback is going to throw in the positive yards gained direction, toward the end zone.

There are even plays for which we would want to capture bimodality. Imagine that the quarterback passes to a receiver who has one defender closing in on him. The range of yards likely to be gained is either the incremental 1 to 2 yards, if he gets immediately tackled by that defender, or, if he evades that one defender and makes a run, a high yardage gain. Either way, when we look at classical distributions like the Gamma and the Gaussian, shown in black here, attempting to fit those two play types, we see that they are not flexible enough to handle all play types.

Furthermore, classical distributions like those focus their accuracy on the center of the distribution, and like we've already discussed, in sports the most exciting outcomes are the ones that occur in the extreme tails.

So we really want a probability distribution that can accurately capture what's happening in the tails. For this, I'll redirect you to the right-hand side, where you'll see the Spliced Binned-Pareto model and its corresponding distribution. The Spliced Binned-Pareto model is something that AWS ProServe invented to solve specifically this problem, and thereby other heavy-tailed, extreme-outcome problems.
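
Here is a rough sketch of the spliced idea: a binned (histogram) body between two thresholds, with generalized Pareto tails grafted on beyond them. It follows the description in the talk, not AWS's actual implementation, and the parameter names are illustrative.

```python
import numpy as np
from scipy.stats import genpareto

# Body: histogram bins between thresholds lo and hi (here the 5th/95th
# percentiles), with bin_probs summing to 1 over the body. Tails: GPDs
# on the excess beyond each threshold; gpd_lo / gpd_hi are (shape, loc,
# scale) tuples. All of this is a sketch, not the production model.
def spliced_cdf(y, bin_edges, bin_probs, lo, hi, gpd_lo, gpd_hi,
                p_lo=0.05, p_hi=0.95):
    if y < lo:   # lower tail: reflected GPD on the excess lo - y
        return p_lo * genpareto.sf(lo - y, *gpd_lo)
    if y > hi:   # upper tail: GPD on the excess y - hi
        return p_hi + (1.0 - p_hi) * genpareto.cdf(y - hi, *gpd_hi)
    # Body: coarse cumulative bin mass up to the bin containing y
    i = np.searchsorted(bin_edges, y, side="right") - 1
    below = np.cumsum(bin_probs)[i - 1] if i > 0 else 0.0
    return p_lo + (p_hi - p_lo) * below
```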

OK. So if we string together the data, the model architecture, and the Spliced Binned-Pareto model, we arrive at the solution that allows us to score the quarterback on his performance. Since the Spliced Binned-Pareto model allows us to quantify how likely each incremental yard gained was, we can take the actual yards gained by that specific quarterback and evaluate it in the cumulative distribution function of the predicted yards gained for a league-average quarterback.

So I'm taking the actual yards gained by the specific quarterback, evaluating it in the CDF of the league-average quarterback's prediction, and getting a ranking between zero and one, and that ranking is the quarterback passing score at the play level. Now, if we have the quarterback passing score at the play level, you can see that it's straightforward to aggregate across plays: for instance, across all the plays in that game, all the plays in his week, all the plays in the season, or other split-level contextualizations. Like what we see in the upper right-hand corner: when we aggregate Geno Smith's score across all deep passes, he scores a 99, which makes him the highest-ranked quarterback for deep passing. Or in the lower right-hand corner, we see quarterbacks listed according to their score for passing EPA (expected points added).
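
A minimal sketch of that ranking and aggregation, reusing the `spliced_cdf` sketch above; scaling to 0-100 by a simple mean of play-level ranks is an assumption, since the talk doesn't specify the aggregation.

```python
# dist_params: a dict of the league-average prediction's parameters, with
# keys matching the spliced_cdf sketch above. Hypothetical wiring.
def play_score(actual_yards, dist_params):
    return spliced_cdf(actual_yards, **dist_params)  # rank in [0, 1]

def aggregate_score(plays):
    # plays: e.g. all deep passes in a season for one quarterback
    ranks = [play_score(p["yards"], p["dist"]) for p in plays]
    return 100 * sum(ranks) / len(ranks)  # e.g. Geno Smith's 99 on deep passes
```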

One of the business criteria I mentioned is that we want to make sure our quarterback passing score correlates well to winning. What you see on the left-hand side, looking down at the bottom: when the quarterback passing score is low, like in the red, the 60s, it corresponds to a very low win percentage, the red 10%. But as the quarterback passing score increases, say to the blue 90s, the win percentage increases correspondingly, like the green 66%, and that corresponds to an 80% chance of making the playoffs.

What we also want to see is: does our new quarterback passing metric correlate to winning better than the preceding metrics did? And what we see on the right-hand side is: yes, it does.
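
That comparison could be sketched as a simple correlation check across metrics, with hypothetical data wiring.

```python
from scipy.stats import pearsonr

# metrics: {"qb_passing_score": [...], "passer_rating": [...], ...}, one
# value per team-season; win_pcts: the matching win percentages. The goal
# described above is the new score showing the highest correlation.
def winning_correlations(metrics, win_pcts):
    return {name: pearsonr(vals, win_pcts)[0] for name, vals in metrics.items()}
```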

So the Next Gen Stats team created and deployed these models in Amazon SageMaker. What ProServe delivered is a containerized model that was seamless to deploy in the EKS cluster. This meant full control over resource allocation, response times, and reliability.

So ProServe does the heavy lifting: the model development, the analysis, and the productionization. And Next Gen Stats has full control over when to call the model API, where to store the outputs, and how best to serve their partners. This allows us to have real-time insights during the NFL game.

So the data streams in from the stadium, and once it's inside the AWS environment, it undergoes 100 processes, through that whole pipeline, in under a second. This results in the NFL being able to produce APIs, create on-screen graphics, and supply sports announcers with more unique data points to talk about. And this allowed them to expand their media team and tools.

If you want to see this firsthand, you can use this QR code to get to demo code that shows the Spliced Binned-Pareto model's full pipeline. Now I'd like to invite my colleague Art Sani back on stage to enumerate the kinds of business outcomes that the NFL was able to achieve from fan-engaging metrics such as this one. Thank you, Elena.

Let's first talk about some of the rising wins NFL NGS has experienced over recent years. Next Gen Stats now has over 242,000 followers on X, formerly known as Twitter. What started as just one broadcast partner in 2017 is now all five broadcast partners in 2023, for all the games, including non-A games. The number of NGS segments has increased from one or two in a single primetime game to three-plus in every NFL game.

Next, let's talk about the resulting Next Gen Stats and what they mean for the NFL. Among the number of Next Gen Stats that we helped the NFL develop, let's start with the passing score. As Elena just shared, this is a first-of-its-kind AI tool that combines seven machine learning models, including a new one to predict the value of a pass before the ball is thrown, to evaluate quarterbacks' passing performance.

Additionally, one of the key aspects of the ProServe engagement model is to enable the customer teams so that they can take on some of this work themselves and become independent. This also helps them become comfortable maintaining this work, in this case machine learning models, in the long run. Mike Band, senior manager of the NFL NGS team, stated that the intricacies of this battle over ball control and field position are hidden from advanced analysis, yet punts and kickoffs make up roughly a fifth of the game and can have a major impact on field position and game flow. To address this gap, the NGS team developed expected return yards, the first-ever set of advanced models that focus on punt and kickoff returns.

It's fourth down and inches. The clock is ticking, the pressure is mounting, and the decision has to be made in a split second. Does the coach play it safe and kick it, or risk it and go for it? Coaches relied on their experience and gut instincts in the moment to make those decisions. However, we have helped the NFL develop stats that can help coaches make more informed decisions, and fans can know what the optimal call is based on the numbers. The fourth down decision guide answers the question: what is a team's win probability if they go for it, attempt a field goal, or punt? This is an AI tool to evaluate fourth down and two-point conversion decisions, and being right in those situations can spell the difference between winning and losing. Our model shows teams are increasingly making the right call: when the number says go, teams are increasingly going for it, and when the number says punt or kick, teams have typically gotten it right there too, as you can see on the chart on the right. Coach Stefanski, the Cleveland Browns head coach, notably optimized those decisions by making the right call when the number said to go for it 38 out of 48 times, which is 79%.
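
A minimal sketch of the decision guide's comparison, with an invented `WinProbModel` placeholder standing in for the NGS win probability models, which are not public.

```python
# Compare estimated win probability under each fourth down option and
# recommend the max. Class, method, and state shape are all assumptions.
class WinProbModel:
    def win_prob(self, state: dict, choice: str) -> float:
        raise NotImplementedError  # the real model runs on game/tracking features

def fourth_down_call(state: dict, wp_model: WinProbModel):
    options = {c: wp_model.win_prob(state, c)
               for c in ("go", "field_goal", "punt")}
    best = max(options, key=options.get)
    return best, options  # e.g. ("go", {"go": 0.45, "field_goal": 0.41, "punt": 0.38})
```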

Not all passes are created equal, yet quarterbacks get the same credit for a completion whether the pass traveled 60 yards downfield to a receiver in double coverage or two yards behind the line of scrimmage to an open running back in the flat. In a complex environment like the game of football, traditional box score stats such as rushing yards and yards per carry do not provide the complete context to reliably quantify teams' and players' rushing performance. These stats only provide a partial picture: when the rush was attempted, the ball was carried a certain number of yards, without telling you how and why the ball carrier succeeded or failed.

Our completion probability model delivers that added context on each passing play. And have you ever wondered how many yards a ball carrier will gain after a handoff? The expected rushing yards model can help estimate that. This model produces the full probability distribution of outcomes in terms of yards gained, which also yields the likelihood of a ball carrier gaining a first down or scoring a touchdown. The neural network modeling architecture built for handoffs was replicated and trained on quarterback run plays using Amazon SageMaker, alongside a new quarterback dropback classification model. For a play with a handoff, the relative speed, direction, and location of all 22 players is taken at the moment of handoff; for plays without a handoff, the model uses a snapshot as soon as the quarterback makes clear his intention to run. As it becomes more common for quarterbacks to make plays with their legs, fans can now quantify the impact these quarterback runs have on the game. And lastly, expected rushing yards is the foundation model the NGS team used to build those two additional models, expected punt and kickoff returns.

The design of an offensive play is predicated on the immediate movement of the quarterback. Does the signal-caller take a straight drop? Does he roll out of the pocket? Is he forced to scramble? Is it a designed run? Next, the NFL designed this model to classify quarterback dropback types based on real data, using the player tracking data and diving deeper into the splits. Using this new quarterback logic, the NFL can now more accurately classify a play call's intended play type: a play where the quarterback drops back to pass, scrambles, and runs can now be credited as a called pass play. This distinction allows for more robust analysis of run-pass play-calling tendencies.

The top plays of the NFL game, week, or season are the plays worth watching again and again. By combining several different NGS models, Next Gen Stats' big play score grades every play on a scale of 0 to 100 based on three primary components: win probability effect, expected points added, and play improbability. Win probability effect is derived from the win probability model, expected points added comes from the expected points model, and the play improbability factor is a play-type-specific combination of completion probability, expected rushing yards, and expected receiving yards.
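
As a sketch of how such a composite might be assembled, assuming the three components are already normalized to [0, 1] and using made-up weights, since the real formula isn't public.

```python
# Hypothetical combination of the three components into a 0-100 score;
# the weights and the clipped weighted sum are assumed stand-ins.
def big_play_score(wp_effect, epa, improbability, w=(0.4, 0.3, 0.3)):
    raw = w[0] * wp_effect + w[1] * epa + w[2] * improbability
    return max(0.0, min(100.0, 100.0 * raw))  # clamp to the 0-100 scale
```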

Modern fandom can be driven through one's fantasy team as much as through allegiance to one's favorite football team. As such, the Next Gen Stats team has been focusing on fantasy football, to deliver actionable insights and metrics for fans to win their leagues, and for some leagues, avoid that dreaded last-place punishment. In fantasy football, opportunity is king. Historical snap counts, touches, and targets are considered proxies for a player's involvement in an offense: more volume equals more fantasy points, according to many fantasy football fans. But not all opportunities carry the same weight. Under many fantasy football scoring formats, a carry at midfield yields much less value than a rush attempt from an opponent's goal line. Expected fantasy points and derivative stats such as fantasy points over expected are the two metrics fans use to make in-season adjustments. Expected fantasy points strips away player talent and efficiency and focuses solely on the opportunity.
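
A minimal sketch of that opportunity-based idea under common scoring assumptions (1 point per 10 yards, 6 per touchdown); the per-opportunity expectations would come from the expected-yards models, and the field names are illustrative.

```python
# Weight each opportunity by its expected value under the scoring format
# instead of counting raw volume; hypothetical inputs and scoring rules.
def expected_fantasy_points(opportunities):
    total = 0.0
    for opp in opportunities:
        total += 0.1 * opp["exp_yards"]  # 1 point per 10 expected yards
        total += 6.0 * opp["td_prob"]    # 6 points per expected touchdown
    return total

def fantasy_points_over_expected(actual_points, opportunities):
    return actual_points - expected_fantasy_points(opportunities)
```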

Have you ever wondered how the play would have evolved if the quarterback had targeted a different receiver? Would the defensive backs have reached the receiver in time to stop the big gain? The defender ghosting model can help with that. This model predicts the trajectories of the defensive backs after the ball leaves the quarterback's hand; thus, it can help evaluate quarterbacks' decision making. To summarize: in this session, we shared how AWS ProServe and the NFL NGS team have been leveraging cutting-edge technologies such as machine learning to build these stats from the ground up or retrain existing ones. We also shared how these stats have helped transform the NFL's business by increasing fan engagement.

Mike Band of the NFL team stated that there are so many games within the game, so many insights to find, and so many stories to be told. ProServe continues to help the NFL transform their business by building more stats and telling more stories. If you would like help in defining a strategy to transform your business through innovative technologies such as gen AI/ML, data analytics, and others, or need help with execution, please reach out to ProServe through your account team or through this request form.

And lastly, your opinion matters greatly to me, Elena, and Andrew, so please take a moment to complete the short survey in your mobile app so we can continue to deliver the content you need to be successful. And with that, we will now open for Q&A, and I would like to have Andrew and Elena join me here. Thank you.

"But you know, when I mentioned that we, the plays do have static features. So the, the way you introduce the question, you know how you, you qualified it by saying what the, what the exact situation was. And ok, it doesn't matter in that case because it's five minutes to the end and they're already up, they're already winning. So it's not so important that he performs well. Right?

Well, we did include those static features. So those would let the model know, when we compare it to how other plays went, that these are the conditions, and that those conditions make it not so important a play. And there are going to be other plays like that where we see how other quarterbacks performed.

So, generally speaking, quarterbacks won't put so much effort in on those plays, whereas if it's a really important play, all the quarterbacks will try to execute very well. And then if the specific quarterback that we're looking at surpasses their average performance, we know that he's able to perform very well in important plays, under pressure. If that was your question.

Sorry, question here. I have a question about the RFID units. You mentioned they're in the shoulder pads. Are they at all noticeable or burdensome? How did you integrate them?

They're really small. They're about the size of a dime, maybe three dimes thick, and they're placed in the epaulettes of the shoulder pads, so they're actually underneath the flap.

So the players don't notice them. And because they're so small, they're also able to be installed in the ball as well. Sometimes they can fall out, and thank god we have two, so we're able to disregard one of the tags if it's just lying on the field somewhere and focus on the other one. But yeah, they're really small.

Two quick follow-ups.

What type of manpower do you have monitoring the RFID units? Like, if one falls out during the game, is that a big lift in terms of people monitoring?

Well, yeah, I mean, we're monitoring every single game. We've got crews at every single stadium, and they're responsible for activating the chips and making sure that the chips are assigned to the right player. Sometimes players will change their shoulder pads, and we'll have to scramble to try and get emergency tags on them.

And so we have a really good relationship with all of the teams' equipment managers. They're monitoring all of the data as it's coming in, and the way a dropped chip manifests itself is just very erratic tracking behavior, so we're able to discover that really quickly, find the exact point where the erratic tracking began, and then retract the data from that point. And so we'll have clean data throughout the entire game.

Yeah, I think you addressed it a little bit there, but I was just wondering how hard it is to coordinate everything with all the different people at the event. Was that a process, or was that pretty easy?

It's a huge process. I mean, everybody's really lean; there's one game day coordinator at the stadium, and he's responsible for coordinating all the frequencies that all the media partners have at the stadium. So it takes effort just getting his attention and working with him.

Another indicator is the crew that's there to operate the tracking system: they're there six hours before the game to make sure that this is all going smoothly. So it's a lot of work leading up to kickoff. And then we've got crews who are remote, in the NOC, and my team is also monitoring the games as they come in.

So do you ever see this data being used to help the referees, like calling a false start or calls like that?

No. Well, not false starts. But we are now using the data to track where the ball goes out of bounds on kick plays. So if it goes super far out of bounds and the ref is, like, five yards off, we know exactly where it went out.

So there's a signal that gets sent, and the football ops people will see that note and be like, oh, actually the ball has to be placed on the 16-yard line. They get that information within a couple of seconds of the ball going out of bounds, so we're able to get that in time for the next play.

So for the paper that you guys produced, I'm actually curious: with the streaming peaks-over-threshold methodology, when you look at how to learn the tail quantile, what are some of the new methodologies that you're using to make sure that you can reduce domain knowledge dependency? Like, are you looking at maybe generative approaches, or what are some of the new techniques that you might be looking into?

Sorry, can I say the question back to you just to make sure I capture what you're asking. For the distribution, we said that we want to make sure we focus not only, but equally, on the tails of the space. You're wondering how we did it and how our method would compare to other potential methods out there.

I'm actually just interested in some of the newer approaches that you're looking at to make sure that there's less dependency on the tau parameter being calculated, like in the paper. OK, so I'll just say it so that other people can understand what he's asking. It's actually a very good question, because it's still an open research problem.

So yeah, the way I presented it might imply that we know exactly where the tail of a distribution begins, but this is not the case. If we knew exactly where the tail begins, that would already be a huge leg up on the problem. Once you know where the tail begins, you can try to fit how heavy it is, how light it is, or whether it's even a finite tail. But that's only once you know what he's calling tau: the threshold where we say, before this tau, it's part of the middle distribution, and after this tau, it's considered the tail.

And for our paper, I would say this: no, we didn't look into a way that does it very robustly, but we did do it in such a way that it doesn't depend on the domain. The way most applications do this is that you know your industry very well. So, let's say you need to predict how high rivers are going to get, and that's important because once it floods, you can't take your cargo ships down and all your suppliers are stuck, right?

What we did instead is, even though we fixed it, saying that the tail is going to begin at the fifth percentile for the lower tail and the 95th percentile for the upper tail, this actually gives us a dynamic tau, because the quantile is fixed, but every time we update the distribution as the stream of data comes in, where the fifth quantile and the 95th quantile land actually moves. So there is some inherent capture that's not based on the domain but does provide a time-varying threshold. Thank you.
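
A minimal sketch of that time-varying threshold, with a sliding window standing in for whatever online update the production system actually uses.

```python
import numpy as np

# Fix the quantile levels (5th and 95th percentiles) and let the
# thresholds move as streaming data updates the empirical distribution.
class DynamicTau:
    def __init__(self, window=5000):
        self.buf, self.window = [], window

    def update(self, y):
        self.buf.append(y)
        self.buf = self.buf[-self.window:]  # keep only recent history
        lo, hi = np.quantile(self.buf, [0.05, 0.95])
        return lo, hi  # body lives in [lo, hi]; GPD tails are fit beyond them
```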

I guess it's a sort of similar question to the kickoff return out-of-bounds point you made earlier. Is the data used today, or do you possibly foresee it being used in the future, to help officials decide whether or not a field goal or an extra point actually went between the uprights? I think about the famous situations we've all seen where the ball is slightly over the height of the goalpost, right?

I mean, that doesn't happen very often, because those goalposts are pretty tall. But yeah, I've seen that, and unfortunately the accuracy isn't such that we can get it to within an inch. So, because we know the accuracy is only so good, we're not going to make decisions on the game outcome based off of that.

So we are almost at time; we have time for maybe one more question. OK.

I may just have two points. For the SBP, the Spliced Binned-Pareto model that you mentioned, that's the distribution of the yards passed, right? So basically your upper bound is 100 yards. Does that help you constrain the problem?

And second question: when you do this model training, is there anything counterintuitive, such as a single factor or combination of factors that you found counterintuitively contributes to the success of the team, quarterback, skill player, et cetera?

OK, so for the first part: when we're talking about this Spliced Binned-Pareto distribution, it can be for anything; it's not just for this problem. But in this problem, yes, you're saying that because we know the field is 100 yards, that's like an upper bound on how far it matters that the quarterback threw the ball; even if he overthrows it, you're going to cap it at 100 yards. That was the question.

And yeah, if he throws it from wherever the line of scrimmage is through to the end of the field, then that would greatly reduce the problem. But since that's not what's always happening, you need a model that can account for when it doesn't happen, and usually it doesn't.

So in that case, you don't know. It's like when the other gentleman asked the question about where the tail is going to begin. Given a certain play type, if the play is very hard, the quarterbacks might not throw so far. But then if one does manage to throw a little bit further, you want to be able to capture those tails, and you only have a few data points from the preceding three seasons where some quarterbacks were able to do that.

So actually, even though you have your binned distribution, and we did do 100 bins, one for every yard of the field, to be comprehensive, it comes back to what that gentleman asked: when we look at the 95th percentile for that specific play type or similar play types, you'll notice that the 95% threshold cuts off really early. So the rest of the bins don't matter, and you end up fitting the tail in whatever way is suitable for those types of plays. Does that make sense?

And I don't remember the second part of your question.

Oh, the ones that were counterintuitive.

You know, I'll just need a few more minutes, and we can discuss it afterwards. I'll just need to recall it, because in hindsight, when I think about the problem, I just remember how things worked out well. I'll need a minute to remember: maybe I thought something for sure was going to work, and then I was probably embarrassed in front of Andrew when it didn't and had to revise it.

So, no, I'll definitely jog my memory. Oh, no, I'm not sharing that. We're at time now; I think we'll end the session here, but we'll all wait outside in case you have any additional questions. Thank you so much for attending the session today. I hope this was helpful, and have a great rest of re:Invent. Thank you.
