Harnessing 110 years of insights: Using AI in content classification

Before we start, let's see some hands. Raise your hand if you remember the last time you watched a movie and had the BBFC age rating in mind.

For over 110 years, the British Board of Film Classification has helped families and people across the UK make safer viewing choices through the use of our trusted and widely used age ratings.

Americans in the audience may be familiar with the Motion Picture Association, the MPAA. The BBFC fulfills a similar role for UK cinema, but our ratings also extend to content released on DVD, Blu-ray, VOD and streaming.

So, back to our question about when we watch movies with the age rating in mind: our research has shown that 9 in 10 parents and 8 in 10 young people consider age ratings and content advice on VOD and streaming services as important as for films viewed in the cinema.

My name is Martha Edible. I'm part of the business team at the BBFC in the UK. I'm here with our AI project collaborators from AWS, Pauline Ting and Kai Man.

To meet the explosive growth in content following the shift to online viewing, we’re working with AWS to see how AI can bring scale to human compliance processes.

Today, we are excited to share our proof of concept AI solution that has been specifically designed to review video content and identify compliance issues and the levels of strength.

During this session, we will look into the BBFC, who we are, explore our purpose and discuss how our compliance process has evolved over the years.

We will also examine how film consumption is developing and how the needs of our audiences are changing.

Together we will identify a challenge and a solution. We will learn about the collaborative approach we have taken together with AWS. We will also envision the future and restate key takeaways.

If you've ever been to the cinema in the UK, this will look familiar. It's our iconic BBFC black card that runs in front of nearly all films shown in UK movie theaters. It is a legal requirement and it reminds audiences of the age limits applied legally to the films you're about to watch.

First, a brief background about the BBFC. Our mission is to protect children and vulnerable groups from harm through informed viewing decisions. We are a non-profit funded by fees charged from the content we classify. We're independent, an NGO, and we hold two government designations.

This means that since 1912, content released in UK cinemas must display a BBFC age rating, and since 1984, so must all films and episodic content released on physical media formats. Such legal designations are not extended to the same content released on VOD and streaming.

So, since 2008, we have worked with UK distributors to extend the use of our age rating system on a voluntary basis. This has been a self-regulation success story, with over 33 VOD brands now displaying our BBFC age ratings.

So our ratings are seen in cinemas, on all DVD and Blu-ray packaging, and on VOD and streaming services. Since our inception in 1912, the consistent presence of BBFC age ratings across film content in the UK means that our age ratings are expected, understood and trusted.

Over the last decade, viewing has shifted from physical media and broadcast television to online. And this change is reflected in the large volume of original online video content being made available to audiences globally.

This makes the ease of access to scalable, cost-efficient classification and guidance ever more important in a digital media landscape with increased access to content.

How can we better protect families online, helping them to make informed decisions about what to watch? Kai will tell us more about this.

To ensure our work continues to reflect the latest thinking of people and families in the UK, every 4 to 5 years, we talk to over 10,000 people in the UK. This year we spoke to 12,000 people and the findings will be released in March next year.

There are clear trends and articulated needs. Online viewing is now the main mode of film consumption, particularly amongst teenagers. There's a significant increase in exposure to age-inappropriate media.

Parents are flagging concerns about young people's mental health online. It is a topic of huge importance. There is an increasing number of people who mention either being personally affected, or knowing someone who has been affected, by suicide or self-harm.

92% of teachers are concerned about the material their students view online. Half of adults and 1/5 of teenagers say they mainly watch films via other streaming services including illegal ones. These are less likely to carry age ratings.

So how does the BBFC determine what age rating is correct for a piece of content?

When we issue an age rating, our human moderators, the compliance officers, watch the content in full, looking for issues such as dangerous behavior, discrimination and violence, to ensure the age rating issued fully reflects our classification guidelines.

Our annual student survey has shown year after year that sexual violence is the number one issue of concern for teenagers.

We are very proud of our Advisory Panel on Children's Viewing and our Youth Panel. We work closely with charities, experts and other regulators. Our job is to listen.

We also conduct research on, for example, domestic abuse language and discrimination, and we follow the latest trends in viewing patterns and needs of our audiences.

Our compliance officers also consider context, tone, impact, how it makes the audience feel, and even how and where content will be accessed. For example, content generally watched at home has a higher risk of underage viewing compared to content being watched in cinemas.

Over the last 110 years, our compliance process has evolved. In 1912, our officers wrote down details of media content in leather-bound folders. We moved to paper files in the late fifties.

Since 1984 we have maintained a video archive of all content classified for use in the home allowing us to review classification decisions if challenged.

Between 2008 and 2012, we digitized over 160,000 VHS tapes. And in 2012 and 2013, we digitized all the historic paper files from the late fifties.

In 2020 we transitioned to a cloud-based system. We now view and tag the content allowing our compliance officers to easily record metadata tags as they view content.

Although much has changed in the way we view content over the last 110 years, our mission remains broadly unchanged – to protect children and vulnerable groups from harm and to empower consumers to make informed viewing decisions.

So how does a century-old organization manage the rapid growth of video content while ensuring that all content with a BBFC rating has been viewed in full?

We've been working with AWS to explore how AI content scanning can help bring scale to human content moderation. Our proof of concept is trained on our rich archive of video data and time-stamped metadata.

It identifies a wide range of compliance issues and their strength levels. It generates compliance metadata allowing a more focused and efficient human moderation.

This solution does not produce age ratings directly. It supports human compliance viewing wherever it is needed. This will streamline content compliance processes, reduce cost and support the use of a more localized and consistent age labeling and content information online.

It will help us tackle the challenges that the new media landscape brings and continue to promote safe viewing experiences online in the UK.

Thank you, Marta. And yeah, great to be here. I'm really excited to talk to you all on this topic and thank you for turning out as well. I'm aware it's 5:30 on a Wednesday. We're probably the last session between you guys and going into networking drinks. So we really appreciate your attention this evening.

To recap, our children are growing up in a vastly different reality from what we knew growing up. The significant rise in content that is now available online presents an increasing risk of inappropriate material being surfaced and viewed by potentially vulnerable audiences.

I for one, with two young children of my own, am particularly concerned about what they will be viewing as they grow up amongst this.

The BBFC still remain dedicated to their one mission – empowering audiences to make informed media choices, providing helpful guidance, enabling people to choose what's right for them and what isn't.

So with this vast volume of content now being uploaded to streaming services and video sharing platforms, it's now more important than ever that we find sustainable and practical solutions for content moderation to meet this demand.

So what is the solution?

Hello, my name is Kai Man. I'm a senior account manager here at AWS. I'm here to talk to you today about the collaboration between the BBFC and AWS ProServe in which, using the power of AI, we've been able to harness 110 years of content classification experience from the BBFC to meet this very challenge.

I'm excited to talk to you about the multimodal AI solution we've developed as part of this collaboration, which is capable of analyzing what is said, what is heard and what is seen within video content to determine which classification issues are present and their strength levels.

Key objectives for this project were firstly to develop and train an AI that was able to recognize classification issues across these six classification categories: violence, threat and horror, injury detail, sex and nudity, discrimination and sexual violence.

We then wanted to ensure that our AI was capable of returning all of the issues as and when they occurred, which we call recall. And we wanted to make sure that the AI was capable of determining the correct strength level of the issue as well, which we call precision.

What was the high level approach that we took to tackle these objectives?

Well, the first step as with any ML challenge was to look at the data. The data in this case was movies from the BBFC that had compliance issues tagged within them.

We first organized the data into 30-second clips with the corresponding BBFC compliance tags. My colleague Pauline will talk about why we chose 30 seconds as the ideal clip length later in her session.

Once we've identified the data and organized it, we can then start to train the model with this data. And if you think about a movie clip, there are three different components that are useful when training an ML model.

First is the text, the speech: what is said in the movie clip. Second is the audio: what you hear in the movie clip, which can include things like gunshots, screams, punches and so on, which is useful for training.

Lastly, and most importantly, is what you see in the video clip, the visual component. So once we've organized the data and trained the model, we're then in a position to start presenting new clips to the model that it hasn't seen before. The model can predict whether these clips contain compliance issues and tell us the severity level, the strength level, of those compliance issues.
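To make the clip-and-tag organization concrete, here is a minimal Python sketch of how one labeled 30-second training example with its three modalities might be represented. All field names, paths and values are illustrative assumptions, not the BBFC's actual schema.

```python
from dataclasses import dataclass

# Hypothetical representation of one labeled training example.
# Field names are illustrative, not the BBFC's actual schema.
@dataclass
class LabeledClip:
    clip_id: str      # unique identifier for the 30-second clip
    video_path: str   # local or S3 path to the clip's video
    audio_path: str   # path to the extracted audio track
    transcript: str   # written transcription of the speech
    issue: str        # e.g. "violence", "threat and horror", ...
    strength: str     # e.g. "no issue", "very mild", ..., "very strong"

# Example with made-up values:
example = LabeledClip(
    clip_id="movie123_0042",
    video_path="s3://example-bucket/movie123/clip_0042.mp4",
    audio_path="s3://example-bucket/movie123/clip_0042.wav",
    transcript="I'm warning you, stay back!",
    issue="violence",
    strength="strong",
)
```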

So I'm now gonna show you a quick demo of the tool in action. This is an early version of the tool, and just as a caveat, for the purpose of this demo we focus solely on the violence category.

I wanna say thank you to Paramount for allowing us to use their film, their 2022 sci-fi horror film Significant Other in this demonstration. The film is the equivalent of a 15 due to the strong bloody images, threat, violence and language.

So here's the first clip. At the top of the clip, there you can see a white overlay. At the very top of the white overlay, you can see some red text. This red text presents the ground truth for this clip. This is what the BBFC compliance officers have rated this clip. And this example is “no issue” - so no violence in this clip.

Below that you can see a series of colored bars starting with green at the top, very mild, but going all the way down to red at the bottom, very strong. This is what our model is predicting for this clip. So you can see in this instance, our model is predicting the lowest severity level, the lowest strength level for violence, very mild. And that matches with what the compliance officers have rated as “no issue”.

So spoiler alert, nothing really happens in this clip. It's just to show that the model is working in the absence of violence.

In this second clip, the model makes an interesting mistake. So we can see the ground truth is “no issue” - so no violence. And the model predicts “very strong violence”. So let's see if we can spot why the model gets this wrong.

Yes - hopefully you spotted there that there was a stick getting poked into some goo, and the model has confused that with a stabbing. In this instance, there's also a scream from a bird, so the model could have confused that with a human scream as well. It's interesting, right? Because it's recognizing some components of violence but not getting it quite right. So training the model with further data will improve this.

In this next clip, I'm afraid I'm not able to show you the video itself due to the violence within it. But what happens is there's a proposal, a marriage proposal that then quite quickly turns to violence. So the woman starts to wrestle the man and throws the man off the cliff.

We can see that our model gets the prediction correct. So the ground truth is “strong violence” from the compliance officer. And our model predicts “strong violence” which is encouraging because this clip has a mixture of tones in it. It goes from romance to violence. So it's good that the model gets this one correct, is able to pick up the violence in this clip.

In this next and final scene, again apologies I'm not able to show the actual video itself due to the violence within it. But from the still image, you can see a man with a knife protruding from his finger, a bit of sci-fi horror coming in there with his hand on another man's shoulder.

In this scene, what happens is the man with the knife is threatening the other man and he ends up chopping off his hand and we see the hand drop to the floor. So strong violence in the classification. The compliance officer rates it “strong violence”.

And our model gets this one correct as well. So from these last two examples, we can see that the model is picking up when violence occurs in scenes, which is very encouraging.

So now I'm gonna hand over to my colleague Pauline to walk us through how AWS ProServe built it in more detail.

Awesome. Thanks Kai. Now let's talk about how we actually built the solution with the BBFC. My name is Pauline Ting and I'm a data scientist here at AWS Professional Services. Let's dive right into it.

So there are six main issues that the BBFC was looking for in their model: violence, injury detail, discrimination, threat and horror, sex and nudity, and sexual violence. However, not all of these six issues have the same age ratings. Sexual violence only has two, yes or no, while violence has a spectrum: U, PG, 12, 15 and 18. We also added an extra category for no issue, to differentiate between U and no issue.

Now, there were a couple of considerations we had to take into account when building out this model. The first consideration was language. Since we are taking into account the audio of the movie, what's being heard and what's being said, we initially focused only on English-language movies.

The second thing we had to take into account was animation. Animation is interesting because violence that happens to an animated character would not produce as high a rating as if it were to happen to a live actor. A good example of this is a cartoon that I grew up with called SpongeBob. It's about a sponge who lives under the sea, and it's an animated cartoon. If you look back at these cartoons, they're actually quite violent, but it's OK for children to watch them because we know it's not happening to a live actor.

The third thing we had to consider was a profanity filter. Profanity can instantly raise the age rating of a movie, and the BBFC wanted to make sure we were catching any profanity that was being said.

And the last thing we had to consider was audio and sound. Audio and sound can influence an age rating even without anything being said. One of my favorite movies is Steven Spielberg's classic Jaws, and the ominous soundtrack of an incoming great white shark is really famous. We all know it, and it sets the tone even without anything being said explicitly.

Now, if we look into our initial model, we see that we take our three components: video, audio and text. First, we extract the visual features from the video, and from the audio we extract audio features. And lastly we get the text, which is taken as a written transcription of what is being said in that clip.

From here, we use one model to output each of the six age ratings. We chose specifically in this case to use one model because the features are the same for each of the six issues. It also makes it easier for the BBFC to retrain, scale, maintain and host by having one model. In addition, by having one single model, the model is able to learn not only the features for each of the six issues but also how to differentiate those features from the other issues. So in this case, our model is not only learning the features for injury detail, but also how they differ from the features needed for discrimination.
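As a rough illustration of this single-model, multiple-output idea, the sketch below shows one shared trunk over fused multimodal features with six classification heads, one per issue. This is a minimal PyTorch sketch under assumed feature dimensions and category counts, not the actual architecture built for the BBFC.

```python
import torch
import torch.nn as nn

# Category counts follow the talk's description: five age levels plus "no issue"
# for most issues, yes/no for sexual violence. Dimensions are illustrative.
ISSUES = {
    "violence": 6,
    "threat_and_horror": 6,
    "injury_detail": 6,
    "sex_and_nudity": 6,
    "discrimination": 6,
    "sexual_violence": 2,
}

class MultiIssueClassifier(nn.Module):
    def __init__(self, fused_dim: int = 1024, hidden_dim: int = 256):
        super().__init__()
        # Shared layers learn features common to all six issues.
        self.trunk = nn.Sequential(nn.Linear(fused_dim, hidden_dim), nn.ReLU())
        # One small head per issue produces that issue's strength logits.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden_dim, n) for name, n in ISSUES.items()}
        )

    def forward(self, fused_features: torch.Tensor) -> dict[str, torch.Tensor]:
        shared = self.trunk(fused_features)
        return {name: head(shared) for name, head in self.heads.items()}

# Example: a batch of 4 fused clip-level feature vectors.
model = MultiIssueClassifier()
logits = model(torch.randn(4, 1024))
print({k: v.shape for k, v in logits.items()})
```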

If we take a little bit of a deeper dive, we can see that we first take our videos and split them into 30-second clips. We chose 30 seconds because, after talking with the experts at the BBFC, it's long enough to capture a scene and see if there are any issues in it, but it's not so long that it adds redundant information, extra processing time and extra resources to train our model. However, we parameterized it so that users can change this to be shorter or longer.
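A minimal sketch of parameterized clip splitting is shown below, assuming ffmpeg is available on the system path; paths, file names and the default clip length are illustrative rather than the project's actual code.

```python
import subprocess
from pathlib import Path

def split_into_clips(video_path: str, out_dir: str, clip_seconds: int = 30) -> list[Path]:
    """Split a movie into fixed-length clips using ffmpeg's segment muxer.

    clip_seconds is parameterized so it can be made shorter or longer.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pattern = str(out / "clip_%05d.mp4")
    subprocess.run(
        [
            "ffmpeg", "-i", video_path,
            "-c", "copy",                      # no re-encode; fast, keyframe-aligned cuts
            "-f", "segment",
            "-segment_time", str(clip_seconds),
            "-reset_timestamps", "1",
            pattern,
        ],
        check=True,
    )
    return sorted(out.glob("clip_*.mp4"))

# Example usage with a hypothetical file:
# clips = split_into_clips("movie123.mp4", "clips/movie123", clip_seconds=30)
```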

The second component is the audio. We included audio because we wanted to make sure we're capturing anything that can be heard in the clip. From the audio we get a spectrogram. A spectrogram is a visual representation of the audio in the clip, so here you can see the different frequencies over time in the spectrogram.
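As an illustration, a log-scaled mel spectrogram of a clip's audio can be computed with a library such as librosa; this is a sketch of the general technique, not necessarily the tooling used in the project.

```python
import librosa
import numpy as np

def audio_to_spectrogram(audio_path: str, sr: int = 16_000) -> np.ndarray:
    """Load a clip's audio track and return a log-scaled mel spectrogram.

    The result is a 2D array (mel bands x time frames) that can be treated
    like an image by the downstream model.
    """
    waveform, sample_rate = librosa.load(audio_path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_mels=128)
    return librosa.power_to_db(mel, ref=np.max)

# Example usage with a hypothetical clip:
# spec = audio_to_spectrogram("clips/movie123/clip_00001.wav")
# print(spec.shape)  # roughly (128, ~940) for a 30-second clip
```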

And lastly, we have the transcription. Because all the videos in our initial model are in English, our transcripts will be in English. However, Amazon Transcribe also supports many other languages.
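For reference, a transcription job for one clip can be started with Amazon Transcribe via boto3 roughly as follows; bucket names, keys and the job name are placeholders.

```python
import boto3

transcribe = boto3.client("transcribe")

# Start an asynchronous transcription job for one clip's audio.
# Bucket, key and job name below are placeholders.
transcribe.start_transcription_job(
    TranscriptionJobName="movie123-clip-00001",
    Media={"MediaFileUri": "s3://example-bucket/clips/movie123/clip_00001.wav"},
    MediaFormat="wav",
    LanguageCode="en-GB",
    OutputBucketName="example-transcripts-bucket",
)

# The job runs asynchronously; poll get_transcription_job (or react to an
# EventBridge event) and read the resulting JSON transcript from S3.
status = transcribe.get_transcription_job(TranscriptionJobName="movie123-clip-00001")
print(status["TranscriptionJob"]["TranscriptionJobStatus"])
```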

Now, there's one last thing we had to take into account. Once we split the videos into 30-second clips, we sampled them for frames, which is what you see in this top part. We used a pre-trained image model to extract visual feature vectors from these frames and see how they changed over time. We matched these against the audio spectrogram to see how it changed over time as well. Finally, we merged these features and appended a final layer on top to perform our six predictions.
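A minimal sketch of the frame-feature step is below, using a torchvision pre-trained backbone as a stand-in for whatever image model the team actually used; the frame count, model choice and preprocessing are assumptions.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from torchvision.io import read_video

# Pre-trained image backbone with its classification head removed, used purely
# as a feature extractor. ResNet-50 is an illustrative choice.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224),
    T.ConvertImageDtype(torch.float),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def frame_features(clip_path: str, frames_per_clip: int = 16) -> torch.Tensor:
    """Sample frames evenly across a clip and return one feature vector per frame."""
    video, _, _ = read_video(clip_path, pts_unit="sec", output_format="TCHW")
    idx = torch.linspace(0, video.shape[0] - 1, frames_per_clip).long()
    frames = torch.stack([preprocess(video[i]) for i in idx])
    return backbone(frames)  # shape: (frames_per_clip, 2048)

# These per-frame vectors, tracked over time, are what would be merged with
# the audio-spectrogram and text features before the final classification layer.
```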

So, all in all, this project took about 6 to 8 months, and we trained on over 1,600 movies. Some of these included just clips from movies rather than the entire movie itself. Currently, the model takes about 8 to 12 hours to train; however, this is heavily dependent on how many movies you want to train the model on.

We also wanted to build an MLOps pipeline because we wanted to give the BBFC the ability to expand to other categories. Let's say they wanted to train on non-English-language movies, or they wanted to take animation into account. By having an MLOps pipeline, the BBFC would be able to retrain the model to take into account new movies and new categories.

So we started off on our MLOps platform by connecting their GitHub account to the AWS Cloud with a CodeStar connection. This means that any changes they made to their code would automatically be reflected in the account.
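For reference, creating such a connection programmatically might look like the boto3 sketch below; the connection name is a placeholder, and the handshake still has to be authorized once in the AWS console before CodePipeline can use it as a source.

```python
import boto3

codestar = boto3.client("codestar-connections")

# Create a connection to the GitHub account. The connection is created in a
# pending state and must be completed (authorized) once in the console.
response = codestar.create_connection(
    ProviderType="GitHub",
    ConnectionName="bbfc-github-connection",  # placeholder name
)
print(response["ConnectionArn"])
```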

The second step was that CodePipeline would check for new videos in the S3 bucket or, if it was manually triggered, immediately start running a Lambda function. From here, this Lambda function runs a SageMaker pipeline. SageMaker Pipelines is a continuous integration and continuous delivery service for ML workloads on the cloud, and it's really great because it helps you scale once you get to large amounts of data, which is important when you have movies. From here, you can see that we have our model training component within our SageMaker pipeline.
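A minimal sketch of a Lambda handler of this kind is shown below; the pipeline name and the pipeline parameter are hypothetical placeholders, not the project's actual names.

```python
import boto3

sagemaker = boto3.client("sagemaker")

def handler(event, context):
    """Triggered by an S3 upload notification (or manually); starts a pipeline run."""
    # Pull the uploaded object's location out of the S3 event notification.
    record = event["Records"][0]["s3"]
    video_uri = f"s3://{record['bucket']['name']}/{record['object']['key']}"

    response = sagemaker.start_pipeline_execution(
        PipelineName="bbfc-training-pipeline",  # placeholder pipeline name
        PipelineParameters=[
            {"Name": "InputVideoUri", "Value": video_uri},  # hypothetical parameter
        ],
    )
    return {"executionArn": response["PipelineExecutionArn"]}
```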

We split the videos into chunks, we separate the audio, text and visual components, and we are also able to call other AWS services, such as Amazon Transcribe, from within SageMaker pipelines.

We start training our model and then we save it into the model registry. The model registry within SageMaker Studio enables versioning and tracking the performance of each of your models, so you can see which data you used to train a model and how it performed. This makes sure you can choose which model you want to deploy, and that it's the best model you have. Behind the scenes, the model registry will also save your model package in S3.
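For illustration, registering a trained model package and then approving it might look roughly like this with boto3; the group name, container image URI and model artifact path are placeholders.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Register the newly trained model as a new version in a model package group.
# The group name, image URI and model artifact path are placeholders.
package = sagemaker.create_model_package(
    ModelPackageGroupName="bbfc-compliance-models",
    ModelPackageDescription="Multimodal compliance classifier",
    ModelApprovalStatus="PendingManualApproval",
    InferenceSpecification={
        "Containers": [
            {
                "Image": "123456789012.dkr.ecr.eu-west-2.amazonaws.com/compliance-model:latest",
                "ModelDataUrl": "s3://example-bucket/models/model.tar.gz",
            }
        ],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    },
)

# After reviewing its metrics, the new version can be approved for deployment.
sagemaker.update_model_package(
    ModelPackageArn=package["ModelPackageArn"],
    ModelApprovalStatus="Approved",
)
```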

So now that we've had a high-level overview of our model development and platform, let's walk through what it looks like when we actually run the model for inference.

So here's our architecture for running inference and extracting the different features to predict the different issues. We start by uploading a movie through the front-end web application, which is then saved in S3. This triggers a Lambda function to start our SageMaker pipeline, which is very similar to our training pipeline. However, instead of training a model, it calls the latest approved model, or whichever model you choose to deploy.
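A sketch of how an inference pipeline might look up the latest approved model package in the registry is shown below; the model package group name is a placeholder.

```python
import boto3

sagemaker = boto3.client("sagemaker")

def latest_approved_model(group_name: str = "bbfc-compliance-models") -> str:
    """Return the ARN of the most recently approved model package in a group."""
    response = sagemaker.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    return response["ModelPackageSummaryList"][0]["ModelPackageArn"]

# The inference pipeline would then create a model (or a batch transform job)
# from this package ARN rather than from a hard-coded model.
```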

The model output is then saved in S3, which can then be overlaid in a web application. So here we see the SageMaker inference pipeline, which is a graph containing the necessary steps to perform the end-to-end inference. The first step is splitting our movies into different clips: this step consists of ingesting an entire movie and breaking it down, and these clips are fed into the next step of the pipeline.

Here, the processing step takes all of these chunks and starts extracting the necessary features to run inference. We first extract image features from each of the frames, and in parallel we call Amazon Transcribe to get the written transcription of everything being said in that clip.

Once we have the frames and the audio transcription, we perform one extra step to generate features for the audio channel. These features are extracted as what's called a spectrogram for the audio channel. So here you can see our spectrogram, which is an image, and we treat that spectrogram as yet another image feature to take into account when running inference on our model.

So once we have the image features, the audio features and the text, it's time to run inference on our model. Feeding all of these features into the pre-trained model that we have from our model registry, we then perform the six predictions for the six different compliance issues.

And once that's done, it's almost time to consolidate these predictions and tag the relevant outputs. But we have one last step, and that's making sure we catch any profanity that's being said. So, using a list provided by the BBFC, we tokenize and lemmatize our transcription to compare against the list of profanity words. Tokenization is converting the text into individual words, while lemmatization is converting words into their base or root form. So, for example, 'running', 'ran' and 'run' all count as the same word. This allows us to identify all forms and potential derivations of profanity words, to indicate what the minimum age to watch this specific clip should be.
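A minimal sketch of this tokenize-and-lemmatize check is shown below, using NLTK's WordNet lemmatizer as one possible implementation; the actual profanity list comes from the BBFC and is not reproduced here, so a harmless stand-in is used.

```python
import re
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # one-time download of the lemmatizer data
lemmatizer = WordNetLemmatizer()

def contains_profanity(transcript: str, profanity_list: set[str]) -> bool:
    """Tokenize and lemmatize a transcript, then compare against a word list.

    profanity_list is assumed to hold lower-case base forms (a stand-in for
    the BBFC-provided list).
    """
    tokens = re.findall(r"[a-z']+", transcript.lower())  # simple word tokenizer
    # Lemmatize as verbs so that e.g. "running" and "ran" both reduce to "run".
    lemmas = {lemmatizer.lemmatize(tok, pos="v") for tok in tokens}
    return not lemmas.isdisjoint(profanity_list)

# Example with a harmless stand-in list:
# contains_profanity("He ran off swearing loudly", {"swear"})  # True
```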

So now we've reached our last step, which is to consolidate and tag our output file. This file is tagged and placed next to the original movie so we can show the results and overlay them in a demo.

Now that we've seen how the inference pipeline works, let's talk about how we evaluated our model. There were a couple of considerations we had to take into account. First, not all mistakes are equal. If our model were to output that a clip is U, or has no violence, when it should actually be rated 18+, that would be a much worse error than rating it 18+ when the original was 15+. So it's not just about getting it wrong, but about making sure we're as close as possible to the original. One metric commonly used for measuring models is the F1 score, whose statistical definition is the harmonic mean of precision and recall. But what does that mean exactly?

Here, precision is about identifying only the relevant data points: making sure that the films we flag as 18+ really are 18+. If we had perfect precision, we would have no false positives. Recall, on the other hand, is about finding all the relevant cases: finding all of the 18+ videos in our data set. If we had perfect recall, we would have no false negatives.

So we made a custom metric to account for this, making sure that we weight our answers by how close they are to the original ground truth. And how did we perform? Well, we found that we were really good at identifying cases where there was no issue, and also at finding the extremes, such as extreme violence. But where we struggled is the nuance between levels, such as between moderate violence and strong violence, which is very difficult to detect and often needs context.
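The exact custom metric isn't given in the talk, but a distance-weighted score of the kind described might look roughly like the sketch below; the ordinal scale and the linear weighting are assumptions.

```python
import numpy as np

# Ordinal encoding of strength levels; the further apart the prediction and
# the ground truth are on this scale, the larger the penalty.
LEVELS = ["no issue", "very mild", "mild", "moderate", "strong", "very strong"]

def distance_weighted_score(y_true: list[str], y_pred: list[str]) -> float:
    """Score in [0, 1]: 1.0 means every prediction matches the ground truth,
    and errors are penalized in proportion to how far off they are."""
    idx = {level: i for i, level in enumerate(LEVELS)}
    max_dist = len(LEVELS) - 1
    distances = np.array([abs(idx[t] - idx[p]) for t, p in zip(y_true, y_pred)])
    return float(1.0 - distances.mean() / max_dist)

# Example: predicting "strong" when the truth is "moderate" costs far less
# than predicting "no issue" when the truth is "very strong".
print(distance_weighted_score(["moderate"], ["strong"]))       # 0.8
print(distance_weighted_score(["very strong"], ["no issue"]))  # 0.0
```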

A good example of this is if the protagonist of a film becomes a victim of violence in their own home. This would result in a higher age rating than if he or she were to become a victim of violence in someone else's home. But since our model is only looking at clips, it may not have the context that's required.

So, now that we've talked about how we trained our model and the MLOps platform, I'm going to hand this back to Kai to talk about our future vision.

Thank you, Pauline. So now that we've trained the model, built the MLOps platform and demonstrated that the model works for its intended purpose, what is next? Well, next we'll focus on improving the model further. This will involve tuning the model, so hyperparameter tuning, and looking at the architecture and making some tweaks and changes there. It will involve increasing the training data set, so training the model across a larger, more diverse range of movies, and potentially some data augmentation, artificially increasing the data set. It will of course include optimizing costs, so looking at the architecture and optimizing it for cost efficiency. We will adapt the model for different styles, animation for example; at the moment the model doesn't really support that, so we need to adapt it for that style. And lastly, we want to spend some time understanding the model, so that we can figure out why the model surfaces what it surfaces and understand the direction to take it in.

So, in summary, I'm really excited about this project, and we're encouraged by the results we've achieved so far. I'm looking forward to the next steps; it's going to be fun. So I'm now going to hand back over to Marta to wrap us up with key takeaways.

Thank you. Thank you so much, Kai, and thank you, Pauline, and thank you to our audience for a great, engaging session together. We hope you feel it's been inspiring, encouraging you to look into how you and your organization can seize the opportunities of technology and AI. In this session, we have looked into how the BBFC is combining over 110 years of our classification experience with AI technology, with human expertise at the center of the compliance decision-making process, in line with our core mission of helping UK audiences choose the content that's right for them. Please do complete the survey in the mobile app. Thank you.

We would like to open up the floor for questions. If you would like to talk to us directly, please feel free to reach out and connect; Kai, Pauline and I are very happy to continue this conversation after the presentation as well.
