AWS Clean Rooms ML and AWS Clean Rooms Differential Privacy

Good day, everyone. My name is Tia White and I am the GM of AWS ML Ad Tech and Data Tech. And I cannot tell you how excited I am to be with you to unveil, even though you probably have already heard it all, the great capabilities we've added to AWS Clean Rooms to increase what was already a strong privacy enhancing posture.

Today, I will be joined on stage by Ryan Malecky, Senior Solution Architect for AWS Clean Rooms, as well as Julianne Jennings, Head of Data Products, and Sam Taha, Head of Customer Data Product Architecture, both extraordinary leaders from The Weather Company.

Today's agenda is jam packed. We will dive deep into the investments we continued to make in AWS Clean Rooms throughout 2023. We will discuss the importance and the benefits of privacy-enhancing data collaboration. Then we'll go deeper into our recent launches, AWS Clean Rooms Machine Learning and AWS Clean Rooms Differential Privacy. You will likely hear me refer to AWS Clean Rooms Differential Privacy as AWS Clean Rooms DP, or to Differential Privacy as DP; I'll say that 1000 times.

Finally, we will hear directly from our customer, The Weather Company. Take a moment. I want you all to think about what your life was like pre and post COVID. COVID has dynamically changed our lives. Think about the way you shopped, even necessities such as groceries, think about the way you engage with friends and family. Think about the way you entertain, especially during COVID. There's so many more dimensions of your life that changed because of COVID. But there was one good thing that came from COVID - the explosion in the accessibility of digital data.

Let me put this into perspective. The total amount of data created, captured, copied, and consumed globally is forecasted to grow at a compound annual growth rate of about 30%. In 2020, there were 64.2 zettabytes of digital data; the industry forecasts that to reach more than 180 zettabytes by 2025. What does that mean? Data is power. You have more power in your hands. This growth is triggered by digital engagement such as streaming video, the way you interact with friends and family, the way you shop, just the way you live your lives and educate your kids. Data is power.

Companies need to understand how this data can drive business outcomes, and they want to collaborate with their partners to generate insights and unlock the value of the data. But data is fragmented. It's located in different repositories, applications, channels, and departments, and oftentimes, if you are partnering with someone outside of your company, they have their own data. That creates interoperability and scale challenges. Let's face it: drastically limited data acquisition slows ideation, it slows delivery, and most importantly, it slows decision making.

But the most important challenge that companies face is how to protect your user, your consumer, your customer data. Many organizations need a better way to manage how their sensitive data is collected, stored, and used while being compliant and ensuring a strong privacy posture.

Now, an interesting fact about me: I just celebrated my two-year anniversary here at AWS on Wednesday. But prior to that (thank you), I was a customer for six-plus years. And so I like to tell stories. We'll use this as an abstract, hypothetical story to preserve the protection of the companies I worked for. I've spent my entire career in financial services, a heavily regulated industry. The first few years of my career, I was in engineering: I was an engineer, led engineering orgs, led massive transformations. The last five years of my career working for financial institutions, I led applied AI groups, figuring out how you apply machine learning and artificial intelligence at scale. In the first part of my career, I was a data producer. At the end of my career, I was a data consumer, and let me just tell you, that data consumption part is hard, especially if you're collaborating with people outside of your org.

There were hypothetical times where I was collaborating on a co-brand. Let's say I worked for a credit card company, and let's say that company had a co-brand deal with someone like Neiman Marcus. In doing that, we often had to collaborate. But it was hard. How do you share data? How do you move fast? How do you have a profound impact on your business while preserving the underlying data? Oftentimes we went through lengthy contractual agreements that took months, quarters, sometimes even up to a year. That stifled business development and created an inherent tension between how companies collaborate to have an impact on the bottom line and how they preserve the privacy of their customers.

Sometimes these mechanisms have required companies to copy data, which can become costly, and, as I shared, go through lengthy processes to protect the customers. But customers also want to limit data movement as much as possible. They want to prevent misuse, they want to prevent leakage. As a result, they often choose not to collaborate, which means missed opportunities.

What if you didn't have to make a decision or a trade-off? What if you could have it all? What if I told you there was a solution that allowed you to collaborate while protecting your data and your intellectual property? You can still experiment, you can innovate, you can create productivity, you can empower your employees, and more than anything, you can have a profound impact on your business. With data clean rooms, you can do just that: you can collaborate with your partners to unlock valuable insights while allowing you to have a profound impact on your business and to understand customer priorities.

We conducted a survey where we asked about 50 existing AWS customers, primarily from the marketing and advertising industry, what the most important aspect was when deciding on a data collaboration product. First was consumer or customer data protection. That was the overwhelming response we got from the survey.

The second question was: what is the most important criterion in your decision-making process when selecting a data collaboration product? And guess what the answer was: enhancing privacy. This response is underscored by the numerous conversations I've had before re:Invent and here at re:Invent with customers like yourselves.

The challenge businesses face is the reason why, just last re:Invent, we launched AWS Clean Rooms. You all have an advantage, because I'm assuming your data already resides on AWS. And because we are a ginormous cloud provider, many of your partners' data probably already resides on AWS as well.

AWS Clean Rooms helps companies and their partners more easily, securely, and in a privacy-enhancing way analyze and collaborate on their collective data sets without sharing or copying the underlying data. With AWS Clean Rooms, customers can create secure data clean rooms in minutes, going back to that time to value and removing the lengthy toll on your processes, and collaborate with other companies to generate meaningful insights about advertising campaigns, investment decisions, research, and experimentation. And I'm excited to share that just yesterday, Swami announced in his keynote AWS Clean Rooms Machine Learning and AWS Clean Rooms Differential Privacy, where we further increase the privacy-enhancing capabilities we're offering. Today, we're going to dive deep into those two recently launched capabilities.

AWS Clean Rooms ML is available for public preview; you can go use it today. It helps you and your partners collaborate with each other and apply privacy-enhancing machine learning to generate predictive insights without having to share your underlying data. This capability removes the need to share data as you traverse the entire machine learning development life cycle, from build to train to deploy. The first model available is specialized to help companies create lookalike segments. With AWS Clean Rooms ML lookalike modeling, you can train your own custom model using your data, invite a partner to collaborate with you, bring a small amount of records (or seed data), and generate lookalike segments. This capability was built and tested across a wide variety of data sets, from ecommerce to streaming video, and we can help customers improve accuracy: in our testing, our benchmarks on average exceeded industry baselines by 25 to 36%.
Which means not only is it easy, not only do you have speed to market, you also get a highly accurate output, and this can translate into saving millions of dollars. Healthcare lookalike modeling will come in upcoming months, so be on the lookout for that.

AWS Clean Rooms Differential Privacy, or AWS Clean Rooms DP, is also available for public preview and helps you protect the privacy of your users with a mathematically backed, rigorous, intuitive capability. Differential privacy is a rigorous mathematical definition of data privacy protection. However, configuring this technique can be complex and hard. AWS Clean Rooms Differential Privacy is an intuitive, fully managed capability that helps you prevent the re-identification of your users. You don't need to have DP experience, and that's the reason why we've built this managed capability. There are open source versions available today, but for many of the customers I meet with on a continuous basis, it's hard because they don't have the DP expertise. AWS Clean Rooms Differential Privacy masks the contribution of any individual's data in a clean rooms collaboration by aggregating the output. It enables you to run a broad range of SQL queries to unlock insights about advertising, investment decisions, clinical research, and so much more. You can set up AWS Clean Rooms DP through a custom analysis rule in your existing collaborations or your new collaborations.

Now, let's dive deep into each of these capabilities. I don't know how many of you truly have experience with machine learning. Artificial intelligence and machine learning have taken the world by storm. But it's hard. First, you have to have the skill set, and that's hard to retain and keep happy and engaged in this current market, because everybody wants to do AI. In addition, it's a lengthy process, and it can become hard, especially in a privacy-preserving way. If you think about what we're launching, this fully managed, off-the-shelf, privacy-enhancing modeling capability, we're the first to do it in a clean room construct.

One challenge customers face is that they want to protect their user privacy and their intellectual property (because that model is IP) when building and running machine learning models with partners, and they don't want to sacrifice the data sets used to build them. AWS Clean Rooms ML removes the need to share data during the machine learning development life cycle.

Customers also want to find users that are similar to their best customers without revealing first-party data. Now they can apply AWS-built models that are custom trained with your data and never reused. AWS Clean Rooms ML allows you to retain full control and ownership of these trained models, including when to use them, when to generate the segments, as well as when to delete them. Your data is only used to train your models. We never reuse it, we never share it, we never benefit from it. It's only for you.

Customers desire to use ML lookalike modeling with their partners but, as I shared, don't have the resources, nor do they have the time. With AWS Clean Rooms ML, you can do this in days instead of months. I want you to understand what that means for your business.

Lastly, customers want flexibility and configurability in their machine learning models to generate predictive insights with their partners. AWS Clean Rooms ML provides intuitive machine learning controls that enable you and your partners to tune the results to better fit your specific use case.

Now, these are just a few examples of how you can use this capability. Please understand that there are endless possibilities to apply this, and my team and I would love to meet with you to dive deeper and figure out how we can collaborate with you.

First, say United Airlines wanted to leverage their data about loyal customers, partner with someone like Booking.com, and offer promotions to users with similar characteristics for a greater conversion rate. You can do that with this model.

Auto lenders and car insurers want to identify prospective customers that may have recently leased a car and also need insurance. You can do that.

Brands and publishers can model lookalike segments of in-market customers and deliver highly relevant advertising experiences, increasing return on ad spend. You can also do that.

And last but not least, research institutions and hospital networks can find candidates that are similar to existing clinical trial participants, which today in healthcare and life sciences is a very cumbersome, lengthy, and highly manual process. You can do that too.

AWS Clean Rooms ML is super easy to use, and we've intentionally built it that way.

First, a collaborator provides data to train a custom model. They invite another partner to come to the table with a small data set of resembling records, also known as a seed. Together, they create a lookalike segment, and that output is simply delivered to an S3 bucket for you to use as you need. It's that simple.
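The three-step flow above can be sketched in miniature. The toy Python below is purely illustrative: the function name, the Jaccard similarity scoring, and the sample data are all hypothetical stand-ins, and AWS Clean Rooms ML uses its own trained models without ever exposing the underlying rows.

```python
# A toy stand-in for the flow above: one party's interaction data defines
# the model, the other party brings a small seed, and the output is a
# ranked lookalike segment. All names and the scoring are hypothetical.

def lookalike_segment(universe, seed_ids, top_fraction=0.05):
    """Rank non-seed users by similarity to the seed and keep the top slice."""
    # Build a seed profile: every item the seed users interacted with.
    seed_items = set()
    for uid in seed_ids:
        seed_items |= universe.get(uid, set())

    def jaccard(items):
        union = items | seed_items
        return len(items & seed_items) / len(union) if union else 0.0

    # Score everyone outside the seed, then keep the most similar slice.
    scores = {uid: jaccard(items)
              for uid, items in universe.items() if uid not in seed_ids}
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[:k]

# The "streaming provider" universe and the partner's tiny seed.
universe = {
    "u1": {"drama", "news"}, "u2": {"tea-doc", "drama"},
    "u3": {"tea-doc", "cooking"}, "u4": {"sports"},
    "u5": {"cooking", "tea-doc", "drama"},
}
segment = lookalike_segment(universe, seed_ids=["u2"], top_fraction=0.5)
# u5 ranks first: it shares the most viewing history with the seed user.
```

The real service does this at the scale of millions of users with learned models rather than set overlap, but the shape of the exchange (model on one side, seed on the other, segment as the only output) is the same.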

Now we get into a topic where, I remember when I first took this job, I said: wait, what is it that you want me to do? What is DP? I had read about it prior to joining, because again, I worked for a financial institution and we were always focused on preserving privacy, but I didn't know a lot about it. So it took me some time to really get up to speed. I probably spent the first three or four months reading academic papers and talking to scientists to really understand how it could benefit various use cases and what it was all about. Before we get into what we've actually launched,

I want to take a moment to pause and explain what differential privacy truly is. In short, DP is a framework for protecting the data privacy of individuals. Its primary benefit is to help protect data at the individual level by adding a controlled amount of randomness to obscure the presence or absence of an individual user in a data set that is being analyzed. It helps with a wide variety of use cases, specifically for large data sets where adding or removing a few individuals has a small impact but we can still create a statistically sound output. You can use it for population analysis using count queries, histograms, benchmarking, A/B testing, and even machine learning. You can use DP with your partners to analyze data while decreasing your reliance on anonymized data or query auditing, opening up new avenues to generate insights that have an impact on your business.
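The "controlled amount of randomness" can be shown with the textbook Laplace mechanism, the classic building block of differential privacy. This is a minimal educational sketch for intuition only, not the mechanism AWS Clean Rooms DP actually implements:

```python
import math
import random

# Textbook Laplace mechanism: answer a count query with calibrated noise so
# that any one person's presence or absence barely changes the output.

def laplace_noise(scale):
    """Draw one sample from a Laplace(0, scale) distribution (inverse CDF)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(max(1.0 - 2.0 * abs(u), 1e-300))

def noisy_count(records, predicate, epsilon=1.0):
    """Epsilon-DP count: a count has sensitivity 1 (one person changes the
    true answer by at most 1), so Laplace noise of scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

users = [{"id": i, "subscribed": i % 3 == 0} for i in range(10_000)]
result = noisy_count(users, lambda u: u["subscribed"], epsilon=1.0)
# The true count is 3,334; the noisy answer is typically within a few units
# of it, yet it never reveals whether any single user is in the data set.
```

Smaller epsilon means more noise and stronger protection; larger epsilon means more accurate answers. Managed services tune this trade-off for you, which is exactly the heavy lifting described later in this talk.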

A real-world example of using DP is the US census. The US Census Bureau uses differential privacy in analyzing and reporting sensitive data related to the US population. When they published their most recent census report, they shared that they had applied differential privacy to housing unit data as well as earnings data, so that they could publish the report and provide statistical analysis without having to worry about compromising or exposing the underlying individuals' data.

A very recent (within the last month) US executive order on the safe, secure, and trustworthy development and use of artificial intelligence stated that differential privacy is a key privacy-enhancing technology, or PET, and guidelines on its usage will be issued by October 2024. That lets you know that AWS is ahead of the curve: we're already bringing to market capabilities that can help make sure you have the right posture, protect your users' privacy, and can respond to what this US executive order will further dictate.

Now, a practical application of DP: when you are collaborating with a business partner and analyzing your collective data sets, we want to ensure that no attributes about an individual are shared. Differential privacy adds a carefully calibrated amount of error to SQL query results at runtime, masking the contribution of individuals while still keeping the query useful for the use case. Based on the use case, you can set DP parameters, such as how much error is added to an analysis or a threshold on how many queries can be run, protecting the privacy of an individual. As the graphic shows, the more DP that's applied, the more you mask the presence or absence of a specific user in that data set. And the way we have designed AWS Clean Rooms DP makes it super intuitive for you: you can take what we provide as a recommendation, or you can toggle just a few inputs and figure out the best approach to DP for your use cases.

Now, we built AWS Clean Rooms DP to help customers do this at scale, accurately, and fast. Companies want to use differential privacy but, again, don't have the expertise. Now they can apply this sophistication to their data sets to strengthen their privacy posture. They want to easily apply DP for specific use cases and data collaborations. AWS Clean Rooms helps reduce the heavy lifting with this fully managed capability. It recommends parameters, as I shared, but gives you the autonomy to adjust as you see fit.

Companies also want to collaborate with their partners to generate insights about their specific use cases. With AWS Clean Rooms DP, customers can use highly flexible and configurable controls to do just that. And finally, companies want to run sophisticated SQL queries. We understand what that means and the benefit of it, and you can run these sophisticated SQL queries with AWS Clean Rooms DP.

Again, these are just a few of the use cases that you can accomplish using AWS Clean Rooms DP:

  • Plan your advertising spend by determining user overlap with marketing partners without revealing the underlying customers.

  • Measure the return on investment of marketing with a media publisher to optimize campaigns based on aggregate insights.

  • Complement auto insurance policies with market insights about a driver population without revealing the underlying user data.

  • Advance clinical trial insights by collaborating with medical institutions without revealing highly sensitive patient data.

Now, AWS Clean Rooms DP takes three easy steps, not just to get started but to return impactful results. When setting up a collaboration, you turn on DP while configuring a table in AWS Clean Rooms. From there, you can use intuitive controls to set the parameters, or you have the option of using the default settings.

Then your partners can run queries like they normally would (no change for them), and DP is applied to each query's results. AWS Clean Rooms DP does not require any additional configuration or setup parameters to query the data, making it easy for them to continue using this great product for your data collaboration.

Now for the fun stuff. I'll turn it over to Ryan.

Thanks, Tia. I'm Ryan Malecky, a Solution Architect on the AWS Clean Rooms team, and I have the pleasure of giving you a first look at both AWS Clean Rooms Differential Privacy and AWS Clean Rooms Machine Learning.

In today's demo, I'm going to be hopping back and forth between two different browser windows to show you both halves of using the service, both halves of the collaboration. To make it a little easier to follow along, and so I don't completely trip over my tongue, I have a bit of a scenario: I'm going to be showing the case of a coffee company that wants to run an advertising campaign on a streaming service.

They're going to first use AWS Clean Rooms Differential Privacy to figure out if they have common customers, so that the coffee company can reach the population they're looking for with the messages they want. In the second half of the demo, I'm going to show how they can reach net-new customers. The high-level flow of using AWS Clean Rooms Differential Privacy starts the same way any use of AWS Clean Rooms does.

The first customer, who is enabling differential privacy on their data, will associate data to the Clean Rooms service and configure a few settings. From there, the second customer (in this demo, the coffee company) is going to associate their data and start running queries. Every query they run will consume a small amount of the privacy budget, and once they've run enough queries, they'll consume the entire budget and no more queries can be run.

Now, I'm going to take you through using the service. If you're interested in learning more about AWS Clean Rooms, I'd encourage you to check out the resources we'll link to at the end of this talk. I'm starting off in AWS Clean Rooms, where I have a collaboration set up. You can think of a collaboration as a logical boundary that defines the set of accounts that can collaborate on an analysis: who can provide data, who can take various actions like running queries and receiving results. We even let you configure which party is responsible for the bill. You can also set things like whether logging is enabled and whether you can use some of our advanced features like cryptographic computing. All of that is defined at the collaboration level, the instance of a clean room, so that each data set owner has visibility into how their data will be used.

Next, I'm going to take you into the other account. This light-background account is the streaming service provider. I'm creating a configured table that maps the data the streaming service provider already has in their account, stored in S3 and cataloged in the Glue Data Catalog, into the Clean Rooms service. Here, you can see I have data like email, phone number, and an internal ID, and I'm going to create a configured table from that. A configured table is just a reference to data already in your account.

Next, I'm going to be configuring an analysis rule. The analysis rule controls how data can be used in AWS clean rooms. We have a few different types of analysis rules today. I'm going to use the custom analysis rule that has differential privacy enabled on it. The next step is to choose to turn on differential privacy. I'm also going to select the column that represents the users whose privacy I'm going to be protecting.

Next, I'm going to specify the query controls. These control what kinds of queries can be run in AWS Clean Rooms. With the custom analysis rule, I can either allow templates or I can allow any query written by a specific account, in this case the coffee company. I'm going to choose that option because, honestly, it makes for a much more interesting demo. And with that, I've configured the analysis rule. I can review all of the settings and apply it to the table. At this point, I'm going to take the table that I've set up and associate it to an instance of the clean room, the collaboration with the coffee company. Once I've done the association, the table will be available here.

I'm setting up the permissions that allow AWS Clean Rooms to access the data in the streaming service's account. By default, AWS Clean Rooms doesn't have access to data; you have to explicitly grant it to the service. Now, I'm going to do the second half of setting up Clean Rooms Differential Privacy: configuring two privacy settings. The two settings I have here are a privacy budget and the noise. For the budget, you can have it reset monthly or not; in this case, I'm not going to have it reset. For the noise and the privacy budget, you're able to choose between these ranges, and the choices will impact how many queries you can run.

Different types of queries and different types of aggregation functions consume different amounts of the privacy budget. Things like average consume a lot; things like count distinct consume less of that budget. Changing the amount of noise, the amount of error, changes the number of queries that can be run for a given budget. So with a low noise setting, I can run relatively few queries; with a higher noise setting, I'd be able to run a lot more queries. Depending on the type of use case and the accuracy you need, you can adjust these settings. The same goes for the budget: if I have a lot of trust in my collaborators, I can set a larger budget to allow more queries to be run and more insights to be generated. If not, I can set a lower budget. And with that, I've finished setting up differential privacy.
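The budget mechanics just described can be sketched as a simple accountant. The class name, the epsilon costs, and the budget size below are all hypothetical illustrations, not AWS Clean Rooms' actual accounting:

```python
# A toy privacy-budget accountant: each query spends some epsilon, and
# noisier queries (which cost less epsilon apiece) let you run more of
# them before the total budget runs out. All numbers are made up.

class PrivacyBudget:
    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def run_query(self, epsilon_cost):
        """Charge one query against the budget; block it once exhausted."""
        if self.spent + epsilon_cost > self.total:
            raise PermissionError("privacy budget exhausted: query blocked")
        self.spent += epsilon_cost
        return self.total - self.spent  # remaining budget

budget = PrivacyBudget(total_epsilon=4.0)

# Low-noise (accurate) queries cost more epsilon each, so fewer of them
# fit; high-noise queries cost less, so more fit in the same budget.
low_noise_cost, high_noise_cost = 2.0, 0.5
queries_low = int(budget.total // low_noise_cost)    # 2 queries fit
queries_high = int(budget.total // high_noise_cost)  # 8 queries fit

budget.run_query(2.0)
budget.run_query(2.0)  # budget now fully spent; a third call is blocked
```

This is the same trade-off shown in the console: lower noise per answer means the budget supports fewer answers in total.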

I am able to come in from the streaming service provider side and monitor what queries have been run and how much of the budget remains. Now I'm going to flip back to the coffee company side and start running queries. I'm able to see all of the data available in the collaboration. I can see the differential privacy settings and how many queries can be run with the remaining budget. I can also see the schema and the analysis rules for all the tables in the collaboration.

I can either write queries directly or use templates that have been pre-approved by all parties. This is useful if you know exactly the queries you plan to run in your collaboration. Here, I'm trying to run a query that selects specific IDs, and this is blocked: when differential privacy is enabled, only queries that generate aggregate insights are allowed, and the service will automatically block queries that aren't. So now I'm changing this to a count distinct, an aggregate query, and I see the query starts running. When you first run a query in AWS Clean Rooms, it takes a few minutes while we spin up a compute environment dedicated to your collaboration. So rather than wait around, I'm going to flip over to a second collaboration I set up ahead of time. In this second collaboration,

I've already run some queries and the environment is ready to go. So with this, I can see that I have already used most of the privacy budget - about 86% of the budget consumed. And I'm going to run a few queries.

Now, looking at the number of common users between the streaming provider and the coffee company: I run the first query, and I see that I have about half a million common users, 500,218 to be exact. And I can see that I consumed some of the budget, so I have at most three more queries that I can run.

I'm going to run this query again, and I get back the results. Before, I had 500,218; now I have 500,212. So a small difference between those.

If I run it a third time, I see 500,220. So almost exactly the same result every time.

Here I'm going to cheat, I'm going to try and run a query that looks for the presence of ryan@example.com in the data. If differential privacy wasn't enabled, I would be able to tell from the result if this user was in the data set. The result I get back here because of the added error is ambiguous. I can't tell if that user was present in the data set or not.

To emphasize this point, I'm going to take all of the data that I generated and put it on a graph with a little script that I wrote. Here are all of the results from running 20 queries on that data set, and I can see that the results are really similar. This is zoomed in, at the scale of half a million results, and the results are overlapping. But if I ran enough queries, I could average out that noise, which is where the budget comes in. If I run through all the queries and exhaust the budget, and then try to run this query again, the service will block it. That's the role of the budget: to make sure that you're not able to run so many queries that you can access the sensitive insights.
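The averaging attack the budget defends against is easy to demonstrate. The sketch below is illustrative only: the counts and noise scale are made up to echo the demo, and the repeated querying shown here is exactly what exhausting the budget prevents in practice:

```python
import math
import random

# Repeat the same noisy count enough times and the mean converges on the
# true value: this is why a privacy budget must cap the number of queries.

def laplace(scale):
    """Draw one sample from a Laplace(0, scale) distribution."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(max(1.0 - 2.0 * abs(u), 1e-300))

random.seed(7)  # deterministic for the sake of the example
true_count = 500_000
noise_scale = 50.0  # hypothetical per-query error

# One query: the answer is typically off by around the noise scale.
one_query = true_count + laplace(noise_scale)

# Ten thousand repeats: the average lands within a few units of the truth,
# which would unmask exactly the individual-level signal DP is hiding.
repeats = [true_count + laplace(noise_scale) for _ in range(10_000)]
averaged = sum(repeats) / len(repeats)
```

A single noisy answer is safe; an unlimited stream of them is not, so the service blocks further queries once the budget is spent.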

So that's AWS Clean Rooms Differential Privacy. Next, I'm going to look at AWS Clean Rooms Machine Learning.

So in this case, the coffee company and the streaming provider have started working together, and the coffee company now wants to reach new users, not the users they're already familiar with. They want to find users that are similar to customers in their loyalty program who, let's say, are fans of tea.

To accomplish that, first the streaming service provider is going to associate a dataset of users - subscribers that are watching shows - and train a model on that data.

Then the coffee company is going to provide seed data: a list of users, the tea drinkers, who are similar to the customers they want to reach more of.

Then they're going to use the trained model to generate a lookalike segment: users from the streaming service provider's universe that are similar to the users the coffee company provided.

Finally, those results are going to be returned to the streaming service provider. They can then take those and ingest them into their system to use for ad targeting.

Alright, now I'm going to jump back into the demo environment here. We're starting with a streaming service provider. We're going to go and start by training a machine learning model.

So the first thing is to specify the dataset like I showed before with configured tables. I start by mapping a dataset that's already in my account into the Clean Room service. So here I have a dataset that shows the subscriber viewing history - who watched what shows when. I'm able to look at details of the schema of that dataset.

And I'm going to next map the schema of my dataset to the fields that the model is going to use for training. The first thing I select is the column that identifies users. Then I'm going to select the column that identifies the shows - the items that the users are interacting with. And then I'm going to select a timestamp column.

In addition to these, I'm also able to bring in additional columns that can be used to improve the accuracy of the model. So here I'm adding ratings, but you can bring a range of numerical and categorical columns.

Next, I'm going to provide a role that's giving the Clean Room service permission to access the data. And I'll create that training dataset.

The next step is to train the model. So I take that dataset and I'm going to create a lookalike model. I'm going to specify the dataset I want and a date range of the data from that dataset. In most cases, we expect that the data providers will retrain these models periodically to pick up new users and new activity and that users who are no longer active roll out of the model. So here I'm selecting a one year time range.

The next setting is encryption. AWS Clean Rooms ML is always going to encrypt your data. You have a choice between providing a KMS key that the service will use, and if not, the service will create a key and use it on your behalf.

Training the models can take a while depending on the volume of the data, anywhere from an hour to a few days for very, very large datasets.

I'm going to jump into a model that I've already prepared. And the first thing we're gonna do when looking at this model is look at the metrics that were generated during the training process. I can use these metrics to assess whether the model is useful or whether I might need to go in and clean up my data or bring in additional dimensions.

If I'm happy with the model, I then go and create a configured model. This is going to include some settings relevant to the data collaboration process. So the configured model, I go and I enter a name, I select which of the models I want to configure, and I'm going to enter some values here like the size of the seed audience, what's the minimum number of common users for me to generate a lookalike, as well as whether I want to share relevance metrics. I'll show those metrics in a minute, they're really important.

And with that I've created the configured model and I'm ready to start collaborating with my partners. In this case, the coffee company. I'm going into the collaboration, the same collaboration I showed earlier in the demo, and I'm going to associate that configured model. You can associate many models with a collaboration, potentially from different partners.

Once I finish associating the model, the role of the streaming service provider is complete. I'm going to hop back into the coffee company account. Now I'll navigate to the collaboration and I'm going to go back to the ML tab. Here I can see that model that I just associated and I'm going to now create a segment from the model.

To create a segment, the key thing I'm providing here is the seed data that I already have in my account. This is just an object on S3 rather than having to be in the Glue Data Catalog. Again I'll give the service permission to access data and specify which of the models I want to use. And with that, I am able to create the segment, it'll take a few minutes while the segment creation process happens.

If I go and refresh, I can see the creation is in progress. And if I refresh again, I can see that it's complete. Now, I'm going to look and see whether the process was successful.

The key metric here is the relevance metric. This is showing how similar various parts of the universe of the streaming service provider's users are to the seed provided. I can see that at about 5% of the audience the relevance score is around 50%, and then it drops off really quickly from there. This gives me some indication of what size of audience I want to export. Depending on your goals, you might choose a larger or smaller export. But here I see that the top 5% had a good relevance metric, so I'm going to choose to export that top 5% as the segment that will be passed back to the streaming service provider.
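The decision above, picking the largest export size whose relevance score is still acceptable, can be sketched as a simple threshold scan. This is an illustrative helper, not the Clean Rooms API; the per-bin scores are hypothetical.

```python
# Illustrative only: choosing an export size from relevance metrics.
# The console reports a relevance score per audience-size bin; the
# example bins and scores below are hypothetical.
def pick_export_size(bin_scores, min_score):
    """Return the largest audience percentage whose relevance score
    still meets min_score, scanning bins from smallest to largest."""
    chosen = None
    for pct, score in sorted(bin_scores.items()):
        if score >= min_score:
            chosen = pct
        else:
            break  # relevance only degrades as the audience grows
    return chosen

# Hypothetical bins: top 1% scores 0.62, top 5% scores 0.50, etc.
scores = {1: 0.62, 5: 0.50, 10: 0.31, 20: 0.18}
print(pick_export_size(scores, min_score=0.45))  # -> 5
```

The early `break` encodes the assumption, consistent with the drop-off described above, that relevance is monotonically decreasing in audience size.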

One thing I'll highlight here is there isn't an S3 location shown for the coffee company. If I go and I look at this from the streaming service provider, I see there's an S3 location there because the data was delivered to their account and I can go and I can look at that object in S3.

And with that, we've reached the end of the demo of AWS Clean Rooms ML. Next, I'm going to turn it over to Julianne Jennings and Sam Taha. I've had the pleasure of working with them for about six months now and I'm really excited for you to hear all the great things they're doing with AWS Clean Rooms.

All right, great. So what you want as a product manager is speed to market. I'll give you maybe three examples of what Sam and I were looking for in a clean room solution.

So the first thing is speed to market: the less time I have to deal with lengthy contracts, procurement, and going back and forth with different data vendors, the quicker I can get to market to create a solution for my advertisers. So that's thing one.

Thing two is ease of use. I'm looking for a platform that is intuitive for not only a business user like myself, but for an engineer.

And then finally, I want to understand how quickly this solution can compute an insight. I'll give you an example. I've been working in ad tech for about the past 10 years, and in some of the legacy platforms I've worked in, it takes a lot of time to compute an insight.

An example of that is I might need to get on the roadmap of an engineer like Sam, and vie for his priority schedule. Then his team finally says, OK, great, we can take a dataset and put it in the proper schema that said vendor needs for ingestion.

Then I have to get the dataset ingested, wait and see if everything went OK, maybe wait 24 hours. I'm sure you can relate to this. And then finally, I can create a segment.

Then once I create a segment, I can finally provide insights to my customer: what's the overlap, what does the dataset index highly for, or what's the campaign effectiveness of the dataset?

This whole process, which was probably laborious even to hear about, a day in the life of a data products manager, can take two to three months. That is not speed to solution.

So the AWS Clean Rooms solution checked all the boxes for us. We turned to AWS to solve one of the more convoluted things required to create a data product, and that's selecting third party data providers.

We have great first party data and an immense amount of scale. However, sometimes, based on a brand's KPIs, I may need to augment with third party data sets. In the past, and I'm sure you can relate to this, I've had to rely on guesswork and vendor sales pitches: whoever has the best sales pitch.

So there's a level of bias; there's no math in my decision over which vendor I select. Also, if I want to test something out with a data provider, I may need to copy and replicate data sets.

With the AWS Clean Rooms solution, I'm able to get to an answer, with math, about which data provider works best for my product. No guessing.

So the clean room offering that we've created with our team is a great foundation. However, I don't want to stop there. I want to supercharge my products.

I believe these two new services being revealed today are going to help me do that. So let me give you another practical example of how we can use machine learning.

So we have an automotive advertiser. We have a lot of automotive advertisers on Weather, and they're looking to target a net new addressable audience. Maybe they want to target parents of teenagers in market for an SUV.

So they want to create a lookalike. We can do that by having the machine learn on our inventory: we put that inventory into the collaboration, the automotive company puts their seed audience into the collaboration, and we create a lookalike audience.

Now the advertiser has a net new addressable audience to target on Weather.com. But did the campaign perform? Did that new segment perform? I need to be able to tell the advertiser the effectiveness of that segment or campaign, and differential privacy will allow us to do that without compromising any PII.

And with that, I'm gonna turn this over to Sam to give us the reference architecture on how we would do that. Thank you.

Hello everyone. For the first reference architecture, I will describe a clean room collaboration scenario between a publisher and an advertiser. The two parties will come together in the clean room, plan an advertising campaign, and use Clean Rooms ML to build a lookalike audience that will be used in the campaign.

If we look at the reference architecture here, on the left side we have the publisher, which is being played by The Weather Company, and on the right side we have the advertiser, which is your favorite auto company.

They both reside in different AWS accounts, and the clean room sits in the middle. What's important to note here is that during this collaboration, no data is copied or shared between the different AWS accounts.

If we focus in on The Weather Company environment: on the far left, we have our weather applications. These are web and mobile applications that users access to get weather information and weather data.

We also serve advertisements to our customers. As users interact with our applications, our CDP collects the interaction data, the first party data, and the user signals.

The CDP streams our customer data in real time into our data lake, which is built on Lake Formation and Redshift. Just to give you an idea of scale, of the volume and what runs through the pipes of The Weather Company data platform:

we're talking about 400 million monthly active users on our web and mobile weather applications, we're processing about a billion events a day, and a terabyte or more of data a day lands in our data warehouse and data lake.

Now, going from our data lake: for this collaboration between the publisher and the advertiser, as the publisher we're going to provide the customer data that will be trained with Clean Rooms ML. We'll bring in our customer IDs and the user signals for each customer.

We'll curate that into a dataset and feed it to Clean Rooms ML, which will train the model. After the model is trained and evaluated and we're happy with it, we will associate the model with the clean room, and then we're done with the publisher side of the collaboration.

Moving over to the advertiser: what the advertiser brings is their customer IDs as seed data into the collaboration, and that feeds into the ML model. The output of the ML model is a lookalike audience segment, which we output from the clean room and upload into our ad server.

From there, we can use that audience for advertising and targeting. So in summary, we've used Clean Rooms ML to help us build a lookalike audience that's larger than the organic overlap in customer IDs between the advertiser and the publisher.
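Conceptually, lookalike expansion amounts to ranking the publisher's universe of users by similarity to the advertiser's seed. The toy sketch below, ranking by cosine similarity to the seed centroid over hypothetical interaction-feature vectors, is a deliberately simplified stand-in for what Clean Rooms ML does with a trained model; every name and number here is illustrative.

```python
import math

# Toy lookalike expansion: rank universe users by cosine similarity of
# their (hypothetical) interaction-feature vectors to the seed centroid.
# Clean Rooms ML trains a real model; this is a conceptual sketch only.
def lookalike(seed_vectors, universe, top_k):
    dims = len(next(iter(seed_vectors.values())))
    # Average the seed users' feature vectors into one centroid.
    centroid = [sum(v[i] for v in seed_vectors.values()) / len(seed_vectors)
                for i in range(dims)]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    # Most-similar users first; keep the top_k as the lookalike segment.
    scored = sorted(universe.items(),
                    key=lambda kv: cosine(kv[1], centroid), reverse=True)
    return [uid for uid, _ in scored[:top_k]]

seed = {"s1": [1.0, 0.0], "s2": [0.9, 0.1]}
universe = {"u1": [1.0, 0.05], "u2": [0.0, 1.0], "u3": [0.8, 0.2]}
print(lookalike(seed, universe, top_k=2))  # -> ['u1', 'u3']
```

This is why the resulting segment can be larger than the organic ID overlap: users are selected for behaving like the seed, not for appearing in it.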

Now I'll go to the next reference architecture, for differential privacy. This is also a clean room collaboration with the same two players, the publisher and the advertiser. But in this case, the advertiser wants to perform measurement and analysis on the results of an advertising campaign.

It's a slightly different twist from the planning use case we talked about earlier. In this case, we'll focus in on the TWC environment.

What we're doing is capturing our ad server logs into our data lake, taking our ad impressions and associating them with whatever campaigns the advertiser is interested in measuring, and bringing the ad impressions into the clean room as configured tables.

We will enable differential privacy on the configured tables. By doing that, we can set a threshold on the amount of noise we want injected into results over the configured table.

We'll also be able to set a differential privacy budget, in terms of how many queries and how much analysis the advertiser can perform on the data.
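The noise and budget controls described above are managed for you by AWS Clean Rooms Differential Privacy. As a conceptual sketch only (not the service's internal mechanism; class and method names are hypothetical), the classic approach is to add Laplace noise scaled to each query's privacy cost and deduct that cost from a fixed epsilon budget:

```python
import math
import random

# Illustrative differential privacy sketch: answer count queries with
# Laplace noise and deduct each query's epsilon from a fixed budget.
# AWS Clean Rooms DP manages this for you; this is conceptual only.
class DPBudget:
    def __init__(self, total_epsilon, rng=None):
        self.remaining = total_epsilon
        self.rng = rng or random.Random()

    def noisy_count(self, true_count, epsilon):
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        # Laplace noise with scale = sensitivity / epsilon; a count
        # query has sensitivity 1 (one user changes the count by 1).
        scale = 1.0 / epsilon
        u = self.rng.random() - 0.5  # uniform in [-0.5, 0.5)
        sign = 1.0 if u >= 0 else -1.0
        noise = -scale * sign * math.log(1.0 - 2.0 * abs(u))
        return true_count + noise
```

Once `remaining` hits zero, further queries are refused, which is exactly the behavior the budget setting buys you: a hard cap on how much the advertiser can learn about any individual.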

With the publisher side of the collaboration done, we'll move over to the advertiser. The advertiser in this case will bring, say, their purchase data or campaign data into the collaboration as a configured table.

Once that's done, the advertiser can run SQL queries on the combined dataset, just like they would in any typical SQL editor.

And because we've implemented differential privacy settings and controls, data about any individual user will be restricted and won't be exposed.

At the end of this collaboration, the results of the analysis and queries can be output to S3 and loaded into their favorite BI or visualization tool for reporting.

So in closing, we're pretty excited about using Clean Rooms and the new ML and differential privacy capabilities. We recommend you folks take a look at the service. I think you'll like it, and especially, I think your lawyers will like it.

There's a lot less paperwork and red tape involved. So, thank you.

Thank you Sam. So at the end of the day, these products and services are going to allow me as a product manager to build products faster, more scalably, and more efficiently. And I'm really looking forward to speeding up the time in which we can create predictive models. That's really exciting to us.

So I'll close here with some learnings from the clean room. I loved that I could use the AWS service with my existing AWS contract. In fact, when Sam and I heard about the service, we were able to do an internal POC that day and play around with the different data sets, and we were like, OK, great, this really fits our needs.

I love that I could eliminate guesswork, and I love that I was able to have speed to product. I know you hear me say that a lot, but at the end of the day, I have an extreme bias for action in order to solve problems for my customers, and the service has allowed me to do that.

I'm going to turn it back over to Tia and bring it home. Thank you.

So Julianne, I think you know all of our LPs. I'm gonna close this out, and I'm gonna do it in record time so that I get you out of the session on time.

So first, AWS Clean Rooms can have a profound impact on the way you do business: the way you collaborate internally (again, take a financial services institution, where even leveraging data internally can be cumbersome because there are rules about how you share data across different lines of business) as well as a profound impact on the way you collaborate on data externally.

It helps you minimize copying, collaborate on shared data quickly, and do it with a strong security and privacy posture. With our first AWS Clean Rooms ML model, you can create lookalike segments without having to share your IP in the machine learning related aspects, while preserving the underlying data.

And last but certainly not least, AWS Clean Rooms Differential Privacy helps you protect your users' privacy with mathematically rigorous controls in an easy to use way. If you want to learn more (I think Ryan alluded to this), here are the QR codes where you can download the AWS Clean Rooms eBook, read more about AWS Clean Rooms ML, and read more about AWS Clean Rooms Differential Privacy. And, as Re:Invent winds down,

Take a chance, go visit the Re:Invent booth for a capability demo. We encourage you to stay engaged. Let us know how we can help you. We would love to partner with you and I hope you have a great day.
