Understand your customers better with a modern data strategy

The session you are all here to see is "Understand your customers better with a modern data strategy."

So as we get started, what I want you all to think about is yes, you are possibly all customers of AWS in some way, shape or form, but you also have customers as well. So I want you to come out of that box just a little bit before we get right into the material.

And I want you to think about what was the last thing you purchased anywhere, right? Think about that for just a second. What was the last thing you purchased anywhere? It doesn't matter what it is. And yes, I know that for a lot of us, it is Cyber Monday. So you very well could have been purchasing something not too long ago as well.

And the reason why I want you to think about that is because: how do you want to be treated as a customer? How do you want your customers to react to the products that you're providing for them? Can you make their lives a little easier when you look at something you are working on, whether it's a website, a product, whatever it may happen to be? What else could we do to innovate? Perhaps one of these things may have come to mind, right?

We are trying to make our customers lives easier. We want them to feel like we're taking care of them and with so much data today, what do we do with it? All right. How do we get to the root of what makes them keep coming back? How do we not lose a customer? Right. How do we get new customers? How do we reach that wider audience with all the data?

So that's really what we're here to see today. For any of you that might be a little bit new to the data analytics world of AWS, this is gonna give you a nice little sampling of all of the services for a modern data strategy.

So whether it's a retail experience that you work with mainly, or a gaming experience that some of us play in our spare time, maybe just a little bit, right? We want to think about what our customers want us to understand about them. How can we take care of them maybe just a little bit better?

We want to listen to them, we want to stay relevant, and we want to make sure that no matter what we do, we're trying to be proactive in all situations, not only for the things that they are seeing and doing and buying, but also the reminders, oh, did you forget to maybe put this in your cart the next time that you go in? I kind of appreciate those things myself. And as we go through, I don't believe I actually introduced myself yet.

So welcome to this afternoon's presentation. My name is Michael Lynn Cozi, and I'm gonna take you through how to look at that customer data, along with a number of powerhouse AWS services that we're gonna look at today, right?

Data comes in super, super fast these days, whether we're wearing it, carrying it, talking on it, or typing on it. One way or another, we need to figure out how we are going to store that much data, perhaps exabytes of data, zettabytes of data. And I can't remember what comes after that one, but I know it's a lot.

So how do we store it then? How do we access it and how do we analyze it because that's what's going to allow us to go back and take care of the customer, right?

So all of our data needs to have a place to be stored, accessed and analyzed, and sometimes a multitude of those as well. So we're gonna take a look at what we call a modern data strategy. And in this modern data strategy, the nucleus, where we want to be able to take in all of this data from a multitude of sources, whether it's structured data, unstructured data, or semi-structured data, the thing in the center of it all, is called a data lake.

So I'm going to introduce you to our AWS services that are all going to correspond to the services that you see, not only around the perimeter but the nucleus, which is our data lake in here.

Now, if we want to add some of those AWS services to the mix, oh, here's the powerhouse. All right. So everything from starting at 12 o'clock, all the way around that perimeter, notice the arrows outside, in, inside out. But the great thing about it is we have the ability to use data in its originating place.

So I know I came from the world of data warehousing; we won't mention how long ago. But every time I wanted to analyze data, I had to move it from point A to point B, maybe back to A again, a little bit of C and so forth, just to be able to even see that data. It was very siloed, right? It made it very difficult. And then we had a lot of, you know, duplicate rows of data in there. So we want to figure out a way to get rid of all of that.

These services that we're going to take a look at are all going to give you various capabilities to do what you need to do with that data. So the first one that we're going to look at is the data lake.

Now, AWS has its very own data lake, and that data lake has a few services that work in conjunction with it. It is built upon our primary storage service, the Simple Storage Service, also known as S3, and we already know the capabilities of S3: we can grow storage to pretty much infinity if we really want to.

In addition to that, there's a few other services that we see here. So if we needed to query that data in our Lake Formation data lake service that sits on top of S3, we have the ability to do that. We can use a service called Amazon Athena. Is that a cool service?

So we can query all the data as necessary, right in place. I'll mention a little bit later on something about federated query that we can use with some other services out there. So it has many, many integration points.
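To make that concrete, here is a minimal sketch of running a query in place with Athena from Python; the database, table and result bucket names are hypothetical, not anything from this session.

```python
# A minimal sketch of querying data lake tables in place with Amazon Athena.
# The Glue database, table, and S3 output bucket below are hypothetical.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

query = (
    "SELECT customer_id, SUM(order_total) AS lifetime_value "
    "FROM orders GROUP BY customer_id ORDER BY lifetime_value DESC LIMIT 10"
)

# Start the query; result files land in the S3 location you specify.
execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "retail_data_lake"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```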

And then if we said, well, what if I needed to catalog that data, or put tags on it, or maybe do a little bit of ETL, extract, transform and load? What if I needed to do that? Can I do that, Michael Lynn? Yes. Yes, you absolutely can.

There's another service down there on the bottom called AWS Glue, and that allows you to crawl your data to see what you have in that data lake, right? We don't just want to keep dumping mounds and mounds of data in there without understanding what we have.
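A hedged sketch of that crawling step, assuming a hypothetical crawler name, IAM role, Glue database and S3 path:

```python
# A sketch of cataloging data lake objects with an AWS Glue crawler.
# The crawler name, IAM role ARN, database, and S3 path are all hypothetical.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_crawler(
    Name="orders-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder role
    DatabaseName="retail_data_lake",
    Targets={"S3Targets": [{"Path": "s3://my-data-lake-bucket/orders/"}]},
    TablePrefix="raw_",
)

# Run the crawler; it infers schemas and registers tables in the Glue Data Catalog,
# which Athena, Redshift Spectrum, and EMR can then query.
glue.start_crawler(Name="orders-crawler")
```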

So we also need to ensure that we have governance built in, right? Get rid of those data silos, bring our data all together: structured, unstructured, semi-structured. And with the use of that data lake, we have the ability to optimize our storage.

So remember I said it sits on S3. S3, if you haven't had the pleasure of working with that particular service, has the ability to tier data. So depending upon how frequently you are accessing it, there are different storage classes for it as well. We also have Glacier in there, and we'll talk a little bit more about that as we go. You can also apply granular permissions.

So we talk about security all the time, and don't get me wrong, it is of the utmost importance no matter what we do, whether that's at the database level, the table level, the record level, the column level, the row level or the field level. All right, we can do all of that using Lake Formation sitting on top of S3.
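To make that granular control concrete, here is a rough sketch of granting column-level access with Lake Formation from Python; the principal ARN, database, table and column names are hypothetical.

```python
# A rough sketch of column-level access control via Lake Formation's
# grant_permissions API. All identifiers here are placeholders.
import boto3

lakeformation = boto3.client("lakeformation", region_name="us-east-1")

lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/AnalystRole"},
    Resource={
        "TableWithColumns": {
            "DatabaseName": "retail_data_lake",
            "Name": "raw_orders",
            # Analysts can query only these columns; everything else stays hidden.
            "ColumnNames": ["order_id", "order_total", "order_date"],
        }
    },
    Permissions=["SELECT"],
)
```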

So we have very granular control with that. And then data ingestion: we can populate that data lake by importing data from databases, we can get it from external sources, we can stream it in; we can do just about everything with that data lake as far as ingestion is concerned. And then storage optimization.

We just talked about the tiered levels of storage that you have with S3. Don't forget the 11 nines of durability that AWS S3 provides for you and then also data sharing.
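That tiering can be automated. Here is a minimal sketch, assuming a hypothetical bucket and prefix, of an S3 lifecycle rule that moves colder data into cheaper storage classes over time:

```python
# A sketch of tiering data lake storage with an S3 lifecycle rule:
# move objects to Standard-IA after 30 days and to Glacier after 90.
# The bucket name and prefix are hypothetical.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-lake-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-orders",
                "Filter": {"Prefix": "orders/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```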

So security and governance, we know that we can apply. We have our standard policies, plus Lake Formation, our data lake service itself, also has layers of security built on top of that. So we can use Identity and Access Management with our S3 service, but in addition to that, we can further secure and apply governance to the data that we're putting in our data lake with Lake Formation. And then data sharing and access.

We can have this be a self-service type of data lake as well. So if you said, well, I don't want everyone to have access to all that data all the time. Well, no one does, right? We want to make sure that only those who need access to certain levels of data are going to get it.

So we have the ability to apply the security, the governance, and then also share pieces of that data down to the field level as we deem necessary. So all of that for your data lake, plus just a few more benefits, as if I haven't mentioned quite a few already.

So open formats; cost effective as far as scaling is concerned, since we can decouple the storage from the compute. If we need to use the data, we can go into Lake Formation and get it to work with other services. If we have data in other locations, such as the Relational Database Service or one of the other 15-plus purpose-built databases that we have, or other services in AWS, we could say, well, hey, I don't really need to use that data any longer, can I put it back? You absolutely can.

And depending upon the frequency of use, remember you pay for what you are using with the storage inside of S3, so if we're not using it on a frequent basis, we can put it in the different tiers that we have. And not only that, but machine learning as well, plus the analytics tool of your choice.

So, machine learning is also built into our data lake as well. We already talked about processing the data in place, and then just a little bit more about integration: for our management purposes we have permissions, S3 being the storage layer, and Glue for ETL purposes.

Athena for querying; RDS and DynamoDB for purpose-built databases, which we're going to learn a lot more about in here; and in addition to that, all of the other services like Redshift, Redshift Spectrum, and the OpenSearch Service.

So we're all accustomed to opening that browser platform of choice and typing something in and saying, hmm, how can I find what I'm looking for out there? Well, in addition to that, we can also use that data to help our customers.

So SageMaker for machine learning, which we're gonna also see a bit more on as well, in addition to our data lake with Lake Formation, Amazon S3, Athena and Glue, which are the nucleus of our modern data strategy.

We start to move now around the perimeter and take a look at a couple of database services. The first one we're going to introduce is our Relational Database Service. If we need something that's a bit more structured, so rows and columns, and we want to have, you know, ACID guarantees associated with the data that's being stored there, our Relational Database Service is going to be something that's really well suited for that.

We also have DynamoDB, and DynamoDB is our NoSQL database, so it has its own purpose in and of itself. Now, our purpose-built databases: I mentioned just briefly a little bit before that we have over 15 purpose-built databases, and you might think, oh, that's a lot. What do they all do, right?

So we know relational databases...

That was my backbone before I came to AWS. We also have key value databases, and both of those we're gonna talk about in a bit more detail coming up here. If you have a use for a document database, we have one of those as well. We have in-memory databases, graph databases for demonstrating relationships, time series databases, and ledger databases. So if we needed a complete, immutable and verifiable history of our application and the changes to that application data, we have a ledger database for that. And then if you are working with something like a Cassandra database, we also have a wide-column database for it.

So we're gonna start out by talking about the Relational Database Service, and we're gonna take a look at just a little bit of the features and functions of RDS. When we look at the Relational Database Service, we have a centralized data model. This is going to give us a place to store all of our relational data: rows, columns, primary keys, foreign keys, all of that good stuff. We can also join those tables together by our primary keys. Of course, no Cartesian joins, we don't want those. And in addition, it improves our data integrity: we don't want to store duplicate rows of data in our relational database.

So we have our ACID transactions. Some of the features, and boy, there are actually quite a few: if you don't like having to manage a relational database, let's say you would really rather focus on optimizing the application that you have running on your Relational Database Service. That's a good thing to think about, right? I can focus on this because this is what my customers want me to do, this is what the data is telling me I need to do. So remove the administrative tasks, let AWS take care of that for you, the patching, the fixing, the updating, and turn on features like Multi-AZ. And I hate to use acronyms without explaining what they are, so that's multiple Availability Zones, just in case anyone was unaware of what I meant by that.

But deployment and scaling: high availability, and the six pillars of the Well-Architected Framework. Every time we choose a service, we want to put those six pillars in place as well, so security, cost efficiency, high availability and so forth; we always want to build those into any infrastructure that we have. As far as deployment and scaling, I mentioned Multi-AZ. When you turn that feature on, what happens? It's gonna create a standby database in another Availability Zone, and by the way, it's replicating that data for you synchronously. If something should happen to that primary database, how long do you think it takes to fail over? I'll give you a moment just to think about that. It might shock some of you, but anywhere between 60 and 120 seconds, that's failover. That's pretty cool, right? 60 to 120 seconds. Oh, and by the way, I forgot to mention: once it fails over, it's gonna also create another standby database for you in another Availability Zone. So it's automatically giving you high availability and making sure that no one notices disruptions in your service, right?
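If you script your infrastructure, turning Multi-AZ on is a single flag. A hedged sketch with boto3, where every identifier and size is a placeholder rather than a recommendation:

```python
# A sketch of standing up a Multi-AZ RDS instance with boto3.
# Identifiers, sizes, and credentials are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="retail-orders-db",
    Engine="mysql",
    DBInstanceClass="db.m5.large",
    AllocatedStorage=100,                       # GiB
    MasterUsername="admin",
    MasterUserPassword="REPLACE_WITH_SECRET",   # use Secrets Manager in practice
    MultiAZ=True,                               # synchronous standby in another AZ
    BackupRetentionPeriod=7,                    # automated backups kept for a week
)
```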

So, deployment and scalability: auto scaling at its finest. And then I mentioned the high availability that we have with RDS. Oh, but wait, there's more. Access patterns: how are we using our relational database? What are we doing to access the strict table schemas that we have in here? Scalability: I mentioned before taking a look at the difference between an EC2 instance and RDS. With EC2, we have to do the patching, fixing and updating ourselves; with RDS, take that away, give it to AWS, let them worry about that, and let you optimize your application. And then of course, for performance, we have very fast and consistent inputs and outputs, and you can pick the instances that you wish to stand up your Relational Database Service with. As far as integration, the sky is the limit, right?

So it can integrate with our data lake in Lake Formation. It can also integrate with regular old standard S3. And of course, we can query our data using Athena. So sometimes when we look at the Relational Database Service, you might think, hey Michael Lynn, do you have another option for me? Because I might need something that's a little bit faster. Maybe I have a retail application, or maybe I'm getting ready to launch the next latest and greatest gaming application that's out there. Do you have something for that? I certainly do. It is called DynamoDB, and it is our key value database.

So think about this: the rows and columns of relational databases become items and attributes with DynamoDB, giving us very quick, millisecond ways to retrieve your data. Think about something that you may have purchased lately: put it in the basket, pay for it, take it out of the basket, do something else with it. I mentioned gaming applications as well. What if we needed to create multiple teams of individuals very quickly, you know, two teams of 40 at a moment's notice? But I don't want all the scrubs on one team and all the good people on the other; I want to intermix those two, and I need to be able to average that and use algorithms with it to get those teams built very, very quickly. Can I do that? Absolutely.

So those are a couple of the selling points of DynamoDB. I always like to think about what a service's superpower is, and here it's extreme concurrent users, right? Because DynamoDB is a serverless service, you get to pick how fast you want the service to work: read capacity units and write capacity units. And if you said, well, what if I needed 1 million reads and 1 million writes per second, could you do that? You certainly could. Now, we'll talk a little bit more about the price for that 1 million reads and writes per second in a little bit, but for the most part, can you do it? The answer is yes. Store and retrieve large volumes of data: it is optimized for that very purpose with the key value store. And then our different access patterns.
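As a small sketch of that key value model, here is what a provisioned table with read and write capacity units and a quick put/get might look like in Python; the table and attribute names are made up for illustration.

```python
# A sketch of DynamoDB's key value model: a provisioned table with read and write
# capacity units, then a put/get of a single item. Names are hypothetical.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

table = dynamodb.create_table(
    TableName="ShoppingCart",
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "item_sku", "KeyType": "RANGE"},     # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "item_sku", "AttributeType": "S"},
    ],
    # On-demand mode (BillingMode="PAY_PER_REQUEST") is the other option.
    ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 100},
)
table.wait_until_exists()

# Items are flexible: attributes don't have to match from item to item.
table.put_item(Item={"customer_id": "c-42", "item_sku": "sku-123", "qty": 2, "gift_wrap": True})
response = table.get_item(Key={"customer_id": "c-42", "item_sku": "sku-123"})
print(response["Item"])
```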

So this is gonna support our key value and document data models, so we don't always have to have matching items and attributes in there; it is a key value pattern that we're looking for. It's a type of non-relational database, so joining tables together like we normally would do with the Relational Database Service is not its superpower. But like I said, we have purpose-built databases: pick the right service for the right job. Scalability? Wow. Horizontal scaling and auto scaling when building the tables in our DynamoDB service. We also have global tables with DynamoDB. As if one table wasn't good enough, global tables are replicated tables stored in multiple Regions, so not just one Region, but now we have true disaster recovery across multiple Regions. And depending upon your business and your use case, whether it's software application development, media creation, transactions, a metadata store, session data, or gaming platforms, all of these things are truly DynamoDB's strong suits.

So, database cost optimization, as much as we possibly can: thinking about what we could do to maybe save a little bit of money with DynamoDB, we can select on-demand or provisioned throughput. We also have support for those mission-critical workloads: consistency, isolation of our workloads, durability, so again the ACID compliance that we need, and of course data security. We also have point-in-time recovery features and automatic backups with DynamoDB. Some of the integration points for it: very much like RDS, it's still a database, but with different purposes for each one of those, and it has integration points with the data lake, so Lake Formation. We can export data into other areas such as S3, and we can work with Redshift and Amazon EMR, Elastic MapReduce, which we'll talk a little bit more about coming up as well. And Amazon S3: I haven't found a service yet that doesn't integrate with Amazon S3. All right.

So in addition to that, we have our Relational Database Service, which AWS can manage for you, and DynamoDB, which is serverless. I didn't mention the Aurora database, but we also have Amazon Aurora, of course, our true cloud-native relational database service. In addition to that, when we think about relational databases and NoSQL databases, we might then say, well, is there something else I can use for the data that I want to put in my data warehouse? Because that's an entirely different set of data. It's not just coming from one application; I might want to take it from a multitude of areas. What if I have multiple lines of business and I want to see all of that data? What if I want a 360-degree view of my customer? What did they buy today? What did they look at? What will they be looking at in the future? What does that data tell me? What is that predictive model going to say to me with all of this data that I can gather together, so I can form a complete picture of what the customer wants or needs and how I'm going to give that to them in the future?

So we are going to take a look at Redshift. Think about this: it mentions concurrency scaling. Just remember that the Relational Database Service has a bit of a limitation on database size; you can pick instances that will get you into the terabyte range. But what if you needed virtually unlimited concurrent users, unlimited concurrent queries, millions of rows of data? This is what takes us to that next step of wanting to use a data warehouse, and this is where Redshift truly shines, with its superpower of concurrency scaling.

It automatically adds additional cluster capacity to process an increase in your workloads and queries. It also overcomes the storage and capacity limits of a relational database service by far; you get to pick the number of nodes that you want to set up for Redshift. And I'll mention one of the node types coming up, called RA3. I'm not exactly sure what the R stands for, but I've come up with my own terminology for it.

So: Redshift is awesome. You'll be able to see that note here in just a moment, and then performance at scale and minimizing your data movement. Don't move the data to Redshift if you don't have to. Can we query in place? We absolutely can.

So here are some of the features of Redshift. From our data analysis, we can see all of our data across all of the services that we want to use, whether it's RDS, DynamoDB, S3, whatever it may happen to be, with performance.

I did mention something called Amazon Redshift RA3. This is a really cool option for Redshift. It allows you to separate storage and compute, because oftentimes, I know what I would run into when I was working with a data warehouse: I would say, boy, we could use a bit more compute here, or we have a lot of data over here so I need a bit more storage, but those needs came at different times and I never had the opportunity to separate them out. I always had to purchase or scale up one with the other, while RA3 allows you to do that separately.

So if you said, hey, I need a little more compute here, we just up the compute. If I need a little bit more storage but I don't need any more compute, we can do that too. RA3 allows us to separate those, with the underlying storage being that of S3, our Simple Storage Service. It is a very, very fast performing database with RA3. And then cost effectiveness.

Think of it this way: you can always start small with Redshift. I'll go with the minimalistic scale here: 25 cents per hour to get started with Redshift. And yes, of course, it can grow from there, but always remember the cost effective approach with not only Redshift but our other services as well.

So we also have spot instances. Now, I'm not saying to use this for a production workload by any stretch of the imagination, but you can intermix some spot instances in there, which are gonna give you a much lower cost. You also have Reserved Instances and Savings Plans that you can take advantage of as well with Redshift.

Some of the other benefits in here are consistent performance, and the structured or semi-structured data that we want to bring into Redshift. We already talked about RA3 and decoupling the storage and the compute from one another, plus built-in security, so we have encryption that we can turn on for Redshift as well.

Integration with other AWS services: very many things like CloudWatch and CloudTrail for auditing purposes. This also is a fully managed service, so you are taking those administrative tasks and placing them on AWS. And then queries at the source.

This is where I said you don't have to move your data to Redshift. We have a feature of Redshift called Redshift Spectrum, and Redshift Spectrum allows us to perform what are called federated queries. It allows us to go into our data lake and get data; it allows us to reach into a Relational Database Service or DynamoDB, gather that data together and query it in place. So you don't have to move your data from place to place to place; we have the ability to do that right inside of Redshift, where the data sits or originally resides.
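To make the query-in-place idea concrete, here is a rough sketch using the Redshift Data API: map a Glue Data Catalog database in as an external schema, then join a local table with data that still lives in S3. The cluster, role, schema and table names are hypothetical.

```python
# A sketch of querying the data lake in place from Redshift via Spectrum,
# submitted through the Redshift Data API. All identifiers are placeholders.
import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")

# One-time setup: expose a Glue Data Catalog database as an external schema.
create_schema_sql = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_lake
FROM DATA CATALOG DATABASE 'retail_data_lake'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole';
"""

# Then join local warehouse tables with data that still lives in S3.
query_sql = """
SELECT c.segment, SUM(o.order_total) AS revenue
FROM customers c                      -- local Redshift table
JOIN spectrum_lake.raw_orders o       -- external table backed by S3
  ON o.customer_id = c.customer_id
GROUP BY c.segment;
"""

for sql in (create_schema_sql, query_sql):
    redshift_data.execute_statement(
        ClusterIdentifier="retail-dw",
        Database="analytics",
        DbUser="awsuser",
        Sql=sql,
    )
```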

Here are some of the integration points with Redshift. I did mention just now Spectrum, the data lake export that we have, and federated querying across a multitude of services. We also have data sharing, so if you only want to allow portions of that data to be available to certain parties, we have that as well. And then optimization: I want to mention one more item in here. When you run queries, we have what is called a query execution plan and a query optimizer with Redshift. In talking about performance and scalability, it's going to look at your query and say, this is how we're going to divide that data out, this is how we're going to handle the millions of rows now being requested inside of this query. So it has those built-in features.

Last but not least, Redshift has machine learning features as well, and integration points with DynamoDB. And then one other thing here: I had seen the data lake export there at the top and almost forgot to mention that we also have what is called the UNLOAD feature. So if you said, ah, I really need to keep this data but maybe don't want to store it in Redshift any longer, can I still run a query off of it without having it in Redshift? Of course you can; we're going to use that UNLOAD feature to put it back into S3, in a tier of storage where we can still access it when we need to by using Redshift and Redshift Spectrum.
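A hedged sketch of that UNLOAD pattern, again via the Redshift Data API, with hypothetical table, bucket and role names:

```python
# A sketch of Redshift's UNLOAD: push older rows back to S3 as Parquet,
# where Spectrum can still query them. Identifiers are placeholders.
import boto3

redshift_data = boto3.client("redshift-data", region_name="us-east-1")

unload_sql = """
UNLOAD ('SELECT * FROM orders WHERE order_date < ''2020-01-01''')
TO 's3://my-data-lake-bucket/archive/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'
FORMAT AS PARQUET;
"""

redshift_data.execute_statement(
    ClusterIdentifier="retail-dw",
    Database="analytics",
    DbUser="awsuser",
    Sql=unload_sql,
)
```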

Now think about maybe even larger amounts of data, and data sets that are going to be both structured and unstructured, because unstructured data from multiple sources is not Redshift's superpower. We could turn to a little service called Amazon EMR.

Now, you all know that that's not a little service, right? Anyone in here who has ever had to set up a Hadoop cluster knows. Takes what, 23 minutes, right? Not true, not true. It's a lot of time, it's a lot of effort, it's a lot to manage, and the administrative tasks are significant. So if you asked, well, does AWS have a managed service for that? We do: Amazon EMR, which you are seeing here for your big data purposes, building and operating all of those big data environments and taking care of the administration process for you behind the scenes.

So remember I said 23 minutes tops, right, to provision those clusters in your on-premises environments, and I saw some of your faces, you're like, are you kidding me? 23 minutes? Well, you can provision an Amazon EMR cluster in a few minutes, either through the console or through the command line interface, whichever one best suits your needs.

We also have the ultimate in scalability with our big data environments here, and application-related Amazon EMR features including provisioning, managed scaling, and reconfiguring of your clusters if you need to do that. You can also utilize EMR Studio for collaborative development as well. And then, as I mentioned, you can provision clusters in minutes.

Another feature we have with EMR is that you can run EMR on the Elastic Kubernetes Service. So if you said, ah, we're running some containers and we are using Kubernetes, we can use big data along with EKS.

In addition to that, EMR has transient clusters, and it also has long-running clusters. Depending upon the type of workload that you're running, if it's a batch job where the cost of running transient clusters for each job is less than the cost of running a long-running cluster, that may be an option for you. With long-running clusters, this is where we really want to use managed scaling to scale out when workloads increase, and then scale back in when workloads decrease. We only want to pay for what we're using, and that's also a very common theme with all AWS services: only pay for what you are using.

And then one more important one: look at that last bullet point there with our Amazon EMR service. This is a little bit of a newer offering, and some of you might say, hey, I really haven't used that or seen that before. It is a serverless deployment option for EMR. So if you have a more sporadic workload, serverless starts up when you need it to start up and shuts down when you're not using it any longer. Amazon EMR Serverless is just a little bit more about simplifying building and operating your big data environments.
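For a sporadic batch workload, the EMR Serverless flow can be as small as this sketch: create an application once, then start job runs on demand. The application name, role ARN and script path are placeholders, not anything from this session.

```python
# A sketch of EMR Serverless for sporadic Spark jobs: create an application,
# then start job runs that spin capacity up and down on demand.
import boto3

emr_serverless = boto3.client("emr-serverless", region_name="us-east-1")

app = emr_serverless.create_application(
    name="clickstream-batch",
    releaseLabel="emr-6.9.0",
    type="SPARK",
)

emr_serverless.start_job_run(
    applicationId=app["applicationId"],
    executionRoleArn="arn:aws:iam::123456789012:role/EmrServerlessJobRole",
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://my-data-lake-bucket/jobs/sessionize_clicks.py",
            "entryPointArguments": ["--input", "s3://my-data-lake-bucket/clickstream/"],
        }
    },
)
```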

So it's going to give you building and operating environments and application-related EMR features: provisioning, managed scaling, reconfiguring those clusters as you deem necessary, provisioning clusters in minutes as I mentioned before, scaling out your resources when the workload requires it and scaling back in when it doesn't, and of course using flexible AWS data stores. So, a managed cluster for big data.

Oh, you know what? Sorry about that, I went backwards a little bit. My apologies, this is where I want to go, with our EMR benefits: scalability, cost effectiveness with a pay-as-you-go model, and flexibility in supporting a wide range of big data processing frameworks.

Things like Hadoop, Spark, Hive, and custom code applications that you have; flexibility; integration with multiple other AWS services; and built-in security features. And then of course, this is a managed service, so remove those management tasks, place them onto the responsibility of AWS, and have your big data framework available to you.

Other integration points with Amazon EMR: various storage engines like S3, where we can get our data, Glacier, HDFS, Amazon DynamoDB, and Redshift, so all of the services that we've been talking about, plus Amazon SageMaker for machine learning, Lake Formation, and database integration for EMR.

With things like Hive, we can export data processed by Amazon EMR to databases: Redshift, DynamoDB and other database services. And then last but not least, we also have our OpenSearch Service, which we can look at next.

In our DevOps culture, a little bit on application monitoring, log analytics and searching: it can be used for a broad set of use cases, so real-time application monitoring, logging, analytics, and website searching. And I know most of us, in the grand scheme of things, ask, hey, what about fraud detection? Can this help me be proactive before something occurs? The answer is yes. This is your managed service that can help you run and scale search and analytics clusters without having to worry about managing, monitoring or maintaining your infrastructure.

So this is our OpenSearch Service, and with this managed service, you can run and scale OpenSearch clusters in minutes without having to worry about managing, monitoring and maintaining that infrastructure. It's gonna give you fast search capabilities for your applications, your websites, and your data lake catalogs.

So anything that you need to search: log files, security monitoring for our infrastructure, making sure we don't have malicious attempts at various components. Our OpenSearch Service also has dashboards incorporated with it, so you have the ability to observe if something is happening and be proactive in resolving it.

Some of the other search features, search setup and configuration can also be done through the management console or via an API.

We have event monitoring and alerting built into the environment itself, so you can monitor the data stored in your clusters as well as automatically send notifications based upon the preconfigured thresholds that you've set up.

Integration with open source tools: built-in OpenSearch Dashboards, and things like Kibana or Logstash are available. And then we also have security features built in with our service, so we can deploy our managed OpenSearch environment inside a Virtual Private Cloud, onto EC2 instances, and we can also use things like our standard Identity and Access Management, security groups, and network access control lists as our virtual firewalls.
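As an example of the log analytics use case, here is a minimal sketch of a query against a managed OpenSearch domain using the opensearch-py client; the domain endpoint, index and field names are hypothetical, and fine-grained access control with basic auth is assumed.

```python
# A sketch of a log-analytics query against a managed OpenSearch domain.
# The endpoint, credentials, index, and fields are all placeholders.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "search-app-logs-abc123.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("admin", "REPLACE_WITH_SECRET"),
    use_ssl=True,
)

# Find error-level log lines from the checkout service in the last 15 minutes.
response = client.search(
    index="app-logs-*",
    body={
        "query": {
            "bool": {
                "filter": [
                    {"term": {"level": "ERROR"}},
                    {"term": {"service": "checkout"}},
                    {"range": {"@timestamp": {"gte": "now-15m"}}},
                ]
            }
        },
        "size": 20,
    },
)

for hit in response["hits"]["hits"]:
    print(hit["_source"]["message"])
```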

Some of the additional service benefits:

  • Fully managed cluster
  • High security for our data. So audit and secure your data as you deem necessary.
  • You can also meet and maintain high security requirements; authentication, authorization, and encryption are all built in.
  • Monitoring system data and detecting threats is also a capability of your OpenSearch Service. So you can run queries and see visualizations of what is actually occurring in OpenSearch.

And then optimize your time and resources for your strategic work. Help your customers find what they need by integrating fast and scalable full-text search, and make it easier for them to have a good user experience with your applications.

Other integration and data movements. As we were talking about the modern data strategy, the nucleus in the center is the data lake, and then around the perimeter, OpenSearch is one of those services.

So, RDS to OpenSearch Service: we can integrate OpenSearch directly with RDS by configuring OpenSearch as a target. We can also utilize the Database Migration Service, which allows us to copy the data directly from RDS into OpenSearch.

We have DynamoDB to OpenSearch as well. That approach is gonna use our DynamoDB Streams as well as Lambda to integrate.
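A rough sketch of that Streams-plus-Lambda pattern, with a hypothetical products table and domain: a Lambda function reads each stream record and mirrors the change into an OpenSearch index.

```python
# A sketch of the DynamoDB-to-OpenSearch pattern: a Lambda function triggered by
# DynamoDB Streams that indexes each changed item. Endpoint, index, and field
# names are hypothetical.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "search-catalog-abc123.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("indexer", "REPLACE_WITH_SECRET"),
    use_ssl=True,
)

def handler(event, context):
    """Triggered by a DynamoDB Stream; mirrors inserts, updates, and deletes."""
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            item = record["dynamodb"]["NewImage"]    # stream images use typed attributes
            client.index(
                index="products",
                id=item["product_id"]["S"],
                body={
                    "name": item["name"]["S"],
                    "price": float(item["price"]["N"]),
                },
            )
        elif record["eventName"] == "REMOVE":
            key = record["dynamodb"]["Keys"]["product_id"]["S"]
            client.delete(index="products", id=key)
```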

And then last but not least we have SageMaker to our OpenSearch Service.

Last but certainly not least we have a little bit of our machine learning services. One in particular is going to be SageMaker. And so machine learning is a subset of artificial intelligence that involves training a machine on historical data so that we can try to predict future behaviors or trends.

So corporations want to use this to detect or anticipate trends with their customers - seeing whether they are continuing to buy products from us, or maybe customer churn, maybe there's something else we could be providing for them, or future product development within their own organization.

So with SageMaker, you have many different learning levels and someone may say, well, you know what, I don't have any knowledge base at all. Is there something that I can use for that? You absolutely can.

We have starting from the top to the bottom:

  • Application Services
  • Platform Services
  • Frameworks and Interfaces

So look at that very top one, Application Services. These are the first layer in the stack anyways - pre-trained models. So something like Amazon Rekognition can take a look at a picture and tell you exactly what it already sees.

So if we have a person standing on the street with cars all around and big high rises around it, it can identify those individual components for you using something like Amazon Rekognition. So these are what we call our pre-trained models from machine learning.

If you said, well, what about that middle layer with the Platform Services? Developers can use these to add intelligence to an application without needing deep machine learning expertise. Beyond the pre-trained models, we said, well, I have some knowledge, and I want to be able to add a little bit more complexity to what we're doing with our models in machine learning.

So SageMaker and DeepLens are for that. And then last but not least, at the very bottom layer, we have the frameworks and interfaces for expert practitioners. So somebody says, hey, I'm a data engineer or a data scientist, I want to go in and use my own framework; can I do that? Absolutely. There's a layer for every learning and knowledge level that you have with AWS SageMaker.

In addition, this will also facilitate your data ingestion. It is a fully managed machine learning service, and it facilitates the ingestion of data from a lot of different sources. So whether it's streaming data, objects or file storage, you name it, SageMaker can work with it.

We can very, very quickly train and debug machine learning models from inside of SageMaker. And then we can also have our choice of deployment methods for the trained models that we have.
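To give a feel for that train-and-deploy workflow, here is a minimal sketch using the SageMaker Python SDK with the built-in XGBoost container; the role, bucket and data paths are hypothetical placeholders, not anything from this session.

```python
# A sketch of training and deploying a model with the SageMaker Python SDK,
# using the built-in XGBoost container. Role, bucket, and paths are placeholders.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical role

# Built-in algorithm image for XGBoost in the current region.
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-ml-bucket/churn-model/",
    hyperparameters={"objective": "binary:logistic", "num_round": 100},
)

# Train on labeled churn data stored in S3 (CSV with the label in the first column).
estimator.fit({"train": TrainingInput("s3://my-ml-bucket/churn/train.csv", content_type="text/csv")})

# Deploy the trained model behind a real-time HTTPS endpoint.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
print(predictor.endpoint_name)
```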

Some of the features for SageMaker include:

  • Labeling the data for supervised prediction models. When we think about those, they use historical data. So for example, think about this: we look at a picture and ask, is this a picture of a cat, yes or no?

Many times some information is missing from the source files to be able to interpret things appropriately, so this includes a contract service where humans will actually pre-label and complete the data before the model gets trained. This allows you to get ahead of that process so that we're not missing items of data.

  • Models - Many times data can come from many different repositories and it may or may not be in the correct format to train the model. So SageMaker can connect to it, it can assemble the data from many different sources and transform the data as needed.

  • Feature Store - Other times we might say, hey, I have parts of data that are used for different models and different training scenarios. What else can I do with that? The Feature Store allows us to store portions of that data for reuse in other models that we have.

  • Visualizing our data through SageMaker

  • Choosing an algorithm or bringing your own (BYO)

  • Deploying models into production

  • Detecting bias and explaining predictions

So model bias is a systematic error from an erroneous assumption that someone has made in algorithm modeling. This can happen for a variety of reasons. But the goal is to minimize the bias that we have. So SageMaker can detect potential bias during data preparation and after model training to ensure that doesn't happen.

Some of the additional benefits of SageMaker:

  • Deploy and manage machine learning models at scale. It's a fully managed service.
  • It also offers more than 70 different instance types with varying levels of compute and memory.
  • Select the endpoints with your SageMaker Studio to review endpoint data including model quality, data quality, endpoint configuration settings like traffic routing or URLs for API calls.
  • For MLOps, our SageMaker model deployment features are natively integrated with MLOps capabilities including SageMaker Pipelines.

There are many, many different services. I'll just name a couple that I made some notes on:

  • SageMaker Projects for CI/CD with machine learning
  • The Feature Store, which I had mentioned
  • Model Registry for model artifact catalog to track lineage
  • Automatically converting notebook code into production ready jobs. Machine learning practitioners can now select a notebook and automate it to run as a job in production through just a few simple clicks.

So in our SageMaker Studio, we have a visual interface for utilizing notebooks.

Last but not least, receive automatic deployment recommendations. When you go to make deployments, it has something called Inference Recommender. It helps you automatically select the optimal compute instance as you are making your deployment, to ensure we're not over- or under-spending on what we need.

As far as integration, the standard usual suspects - connect with EMR, S3 and more. Interactively access, transform and analyze your data from a very wide range of sources. And then build, train and deploy your models into your preferred framework.

This brings us to the last little bit of our modern data strategy, the whole picture being seen here: the nucleus in the center with Lake Formation, our data lake, and our outside-in data movement.

So all of our services around the perimeter are able to integrate data in place with our data lake; our inside-out movement goes from our data lake out; and then all around the outside, whoever needs to share data with whomever.

Just a little bit more on our modern data architecture, and then I'm gonna get you on your way here this afternoon. These are some of our classroom links. So if you are interested in a much deeper dive into any of these services, these are some of the classes that we offer - everything from data lakes to batch analytics, streaming data analytics, and practical data science.

And I'm going to come back to that Q&A in just one moment. But I want to make sure that you know the Training and Certification Booth and the AWS Challenge Lounge are open for you. And in your downtime, take a look at the 100+ Builder Labs for free - build real skills with self-paced labs, training sessions, and boot camps.

There's actually an excellent boot camp coming on Wednesday - on data analytics, from 8 to 5. So if you are interested in coming to a boot camp for some hands-on work, we also have that for you. And then Skill Builder.

I know you're probably all familiar with Skill Builder, but just in case you're not, this gives you digital training. They have a 7 day free trial starting whenever you scan that QR code.

Cloud Quest - labs, hands on things that you can practice with including machine learning.

Last, but certainly not least I will just wrap this up by asking you if you have any questions for me. I'm happy to stick around and answer any of those questions. But if not, I hope you enjoy the rest of the conference.

Be brilliant, innovate, think outside of the box and enjoy the rest of the conference and have fun. Thank you.
