Unlock insights on Amazon RDS data with zero-ETL to Amazon Redshift

Good morning, everyone. Welcome to this session. Thank you for joining this early morning session. I'm Sudipta Das. I am a Senior Principal Engineer working on Amazon Redshift. And this talk is about "Unlocking Insights on Amazon RDS Data with Zero ETL to Amazon Redshift" - this is a brand new capability that we launched in Adam's keynote yesterday and you are going to learn more about all the details of how it works and how you can use it to get value out of your data.

Data is your differentiator. Statistics published by IDC say that about 90% of today's data was created in the last two years. A study from Forrester says that by using this data, you are 8.5 times more likely to achieve at least 20% revenue growth. Another statistic, from Accenture, says that only 32% of organizations are able to realize the value of this data. So that is what is at stake, and this is what we are trying to unlock - the value of data for your organization.

What does this mean for you? Data is a source of competitive advantage in the highly competitive setting we are in, and having access to near real time insights on the data that is critical to your business is paramount to gaining a lead. This comes up across various scenarios: if you're tracking your customers, you want to personalize that experience, optimize and deliver the best product experience, predict churn, and detect any anomalies in behavior. If you are in the supply chain, you may want to optimize your supply chain and drive efficiencies - there are applications across the spectrum.

So just by a show of hands, how many of you need to analyze data from one or more of your operational databases? I see quite a few hands. And how many of you build and operate or maintain these data pipelines in order to drive analytics from your operational data? Again, many of us here, right? And who likes the complexity of pipelines? I see one hand, but most of us don't seem to like the complexity.

So connecting data requires us to maintain complex ETL pipelines. And that is why AWS has been investing in a zero ETL future - a journey we embarked on last year and will continue to invest in over the coming years - to get you the value out of your data.

In this talk, we will cover one specific zero ETL integration that we launched: Amazon RDS for MySQL with Amazon Redshift. We'll talk about how the zero ETL strategy is going to help you and what it enables for your business in terms of analytics. We'll walk you through some of the details of how it works behind the scenes and what capabilities it has. And at the end, we'll tie it all together into what insights you can drive out of your data, along with a demo to see the system in action.

So, Amazon RDS for MySQL has been a service for more than a decade that lets you set up, operate, and manage your MySQL and MariaDB databases. It is a fully managed experience that allows you to deploy familiar open source databases with full version compatibility and access to the latest versions. It takes away the manageability overhead from you while being cost effective and performant, so you can get the best price performance for your operational use cases, driving your applications and your transactional scenarios.

It is backed by high availability guarantees, where you can configure how you want to drive the availability of your systems. And it is geared for easy migration from on premises or other self managed offerings that you may have. RDS MySQL is used by hundreds of thousands of customers to drive their businesses, across all of the different industry segments and verticals.

Here are just a few instances - there are customers from financial services operating payments and mobile banking, marketing and advertising customers targeting and linking social interactions, games, software and internet retail, as well as media and entertainment. And you can see across the spectrum the different scenarios and use cases that come in.

On the other side, you have the analytics and data warehouse offering, Amazon Redshift, which is a fully managed, AI powered cloud data warehouse. As you get all of your data into Amazon Redshift, it gives you the scaling and price performance benefits to drive insights from SQL all the way to machine learning. Its advanced capabilities allow you to get rich insights out of your data.

So Amazon Redshift is an industry leader in price performance. This chart is based on the industry standard TPC-DS benchmark, where lower is better price performance. And as you can see, compared to other cloud data warehouse vendors, Redshift can deliver up to 6x better out-of-the-box price performance.

The previous chart was for single query performance - queries run one at a time - but most of our workloads are concurrent, especially BI dashboards. And this is where Redshift's price performance shines even more: compared to our competitors, Redshift can deliver up to 7x better price performance for concurrent, lightweight BI dashboard workloads.

And Redshift is today used by tens of thousands of customers and again across the different industry segments, whether it's media and entertainment, financial services, healthcare and so on.

So, coming back to the topic - we have these two powerful services, but connecting them requires you to build pipelines and worry about their operational characteristics, which gets in the way of deriving analytics. We have high performance transactions in Amazon RDS and scalable, rich analytics in Redshift, and this is where complex pipelines are problematic - not only to build but also to operate, especially as your data changes, your schema changes, your system characteristics change, and so on.

And the zero ETL integrations are really meant to address this problem so that you can focus on your insights. They are easy to set up and manage, so you can get started in a matter of minutes instead of days, weeks, or months. You have access to time sensitive data - as your business and customer patterns evolve, you can adapt your application and drive actionable insights in near real time, within seconds to minutes of the data changing.

And you have easy access to global insights - it's very common to have multiple operational databases across different business segments or geographies. You can bring them together and consolidate them into one Amazon Redshift warehouse in order to derive global insights. And this also allows you to tap into the strategy that AWS has embarked on with the overall zero ETL roadmap.

So at the core of this is Amazon Redshift - Redshift is a data warehouse. You have compute, and its data is backed by durable storage called Redshift managed storage, which is decoupled from the compute. On the left hand side are the various ways data gets generated. We'll spend a lot of time today on the operational databases - not only do we have Amazon RDS MySQL, which is the focus of this talk, there are also multiple zero ETL offerings coming from Amazon Aurora for PostgreSQL and MySQL as well as Amazon DynamoDB.

If you have data that is generated as files and lands in S3, Redshift has a feature called auto-copy that automatically loads that data into your warehouse. Similarly, if your data is generated via streams - Kinesis or Kafka - you again have an easy integration to get the data into Amazon Redshift. So depending on what type of data is getting generated, these integrations help you get the data into Redshift and store it reliably and cost effectively in Redshift managed storage.
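To make the streaming path concrete, here is a minimal sketch of Redshift streaming ingestion from Kinesis - the stream name and IAM role below are hypothetical placeholders, not part of the talk:

```sql
-- Map a Kinesis stream into Redshift (stream and role names are examples)
CREATE EXTERNAL SCHEMA kds
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftStreamingRole';

-- A materialized view lands the stream records; refreshing it pulls new data
CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       JSON_PARSE(FROM_VARBYTE(kinesis_data, 'utf-8')) AS payload
FROM kds."my-click-stream";
```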

Once the data is in Redshift, you can use Redshift's rich capabilities - such as materialized views, the machine learning integration in Redshift, or data sharing - to scale and cater to the broad class of use cases you may have. You can have the data cataloged and discovered via AWS Lake Formation, which also integrates with Redshift. This is critical business data, and your organization will often want to enforce governance on it - and this is where services like Amazon DataZone provide those governance capabilities.

Having the data in Redshift, in addition to the Redshift surface area itself, also opens you up to a broad class of use cases via integrations with the various analytics services. You could have transformation or analytics workloads written in Apache Spark via services such as Amazon EMR, AWS Glue, or Amazon Athena - they query the data in Redshift in place via the Spark integration that we launched last year. So once the data is there, not only do you have access to SQL and the UDFs that Redshift supports, you can also write Spark applications that access the same data in near real time.

In addition, there are integrations with BI tools such as Amazon QuickSight, and with machine learning via Amazon SageMaker. And not only are we supporting traditional machine learning models - earlier this week, we also announced integrations via SageMaker to bring in your own LLMs so that you can use them for your machine learning scenarios.

Stepping even further back: in addition to all of these ways of getting data in, which we talked about on the previous slide, Redshift also supports federated queries, which allow you to run queries and derive insights on transactional data in situ, without moving the data. And the integration with AWS Data Exchange allows you to tap into data marketplaces, combining your own customer data with other data assets available through a marketplace to derive even richer insights.
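As a rough sketch of what a federated query setup looks like (every endpoint, role, and secret below is a placeholder - check the federated query docs for your engine):

```sql
-- Query an operational MySQL database in place, without moving the data
CREATE EXTERNAL SCHEMA appdb_fed
FROM MYSQL
DATABASE 'appdb'
URI 'my-instance.abc123xyz.us-east-1.rds.amazonaws.com'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftFederatedRole'
SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:mysql-creds';

-- Live operational rows can now be joined with warehouse tables
SELECT COUNT(*) FROM appdb_fed.orders;
```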

Redshift has a huge collection of partners, and it certifies these integrations with a broad class of BI tools. Redshift also has a Data API, so if you are designing serverless applications running on Fargate or Lambda, you get easy, RESTful access to the data and don't have to manage JDBC or ODBC connections. And there are many more integrations with other AWS AI, analytics, and machine learning services that I didn't even get to talk about.

So that is the richness you get access to. And you're not limited to AWS services - if you have your own open source Spark offerings, the Spark integration that we launched last year is also available in open source. So this is what it enables: if you embark on this strategy that AWS is on, it unlocks the value of the data for your organization.

So now that you saw what value you can get, I'll hand it over to my co-presenter, Jim, who will walk you through how to get this in action.

Alright, thanks Sudipta.

Hi everyone, I'm Jim Tran. I'm a Principal Product Manager with RDS. The thing I love most about zero ETL is that it's just so simple to use - it feels like magic. Sudipta covered the value of zero ETL and how it fits within the broader AWS strategy; what I want to cover is one level deeper. Let's walk through all the various steps from start to finish - from the setup to the initial data sync to dealing with ongoing data updates - and then I have a couple of demos that will showcase the analytics and monitoring experiences.

So let's start with setup. As you know, if you're trying to build a data pipeline that is secure and reliable, it's actually a major undertaking. You've got to bring together a highly technical team, research the various implementation options out there - AWS capabilities, open source technologies, commercial offerings - and then you probably want to do some prototyping. When it comes time to actually build out the solution, you're going to run into a lot of pitfalls and edge cases you'll have to deal with. And when it comes down to it, it will take weeks or months to actually build this out.

And so with zero ETL, what we're doing is eliminating all of that heavy lifting so that you can literally get this working within minutes. So let me show you how that works.

Ok. So if you go into the RDS console today, there is a new item in the left hand nav here called Zero ETL Integrations. If you click on that, then you get the introductory screen which explains that number one, you have to prepare the source database, you have to prepare the target data warehouse and then you finally click to create the zero ETL integration.

So let's just get started. I'll just create "reInventDemoIntegration", give it a name. And then from here, I need to select a source database. And if I click on "Browse RDS Databases", this will only show me the databases that are actually eligible to be a source for the zero ETL integration.

So the database has to be running MySQL version 8.0.28 or higher. If you're on an older version, this is probably a good reason to consider upgrading. And what you see here is that in addition to normal databases, I can also point to a read replica as the source of a zero ETL integration. And the reason you might want to do that is so that you don't impact your primary database.

So this database also needs to be configured in a particular way...

It has to have backups enabled as well as the bin log enabled. I'm going to show you what it looks like both when those prerequisites are met and when they're not.
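If you'd rather verify the bin log prerequisites yourself before relying on the console fix, a quick check on the source looks like this - a sketch; the expected values match the zero-ETL documentation, and changing them is done through a custom RDS parameter group rather than SQL:

```sql
-- On the source RDS MySQL instance: zero-ETL relies on row-based logging
SHOW VARIABLES LIKE 'binlog_format';     -- expected: ROW
SHOW VARIABLES LIKE 'binlog_row_image';  -- expected: full
-- Automated backups must also be on (backup retention > 0); that is an RDS
-- instance setting, not a MySQL variable.
```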

So here's an example when they're not met. If I select this database and click Choose, you'll see that it's checking the status of that database. And it says: hey, look, there are some parameter settings that aren't set up yet - do you want me to fix it? If I check this "fix it for me" box, RDS will then update the database parameters to enable the bin log and set those prerequisites. It does require a reboot, so I'm not going to do that as part of the demo.

But if you select a different database that already has those prerequisites met - in this case, database two - then I can just proceed. So if you preconfigure it ahead of time, this is what you get; if you don't have those prerequisites in place, the UI experience can automatically take care of it for you. Then I'll click on Next.

So now the target, which is the Redshift data warehouse. For the Redshift data warehouse, I can choose one that's in my current account, and we also support cross account access. This comes into play when, for example, you have a business division that owns an application under one AWS account, but your analytics team runs the Redshift data warehouse in a different AWS account.

We do support that - you can provide an Amazon Resource Name here if you want to specify a different account. If it's within the same account, you can simply browse Redshift data warehouses; again, it gives you the list of clusters you can select from. And for a Redshift cluster to be eligible as the target of a zero ETL integration, it has to have encryption enabled and case sensitivity turned on.

So I'll give an example of what it looks like when the prerequisites are met and when they're not. For example, Redshift cluster three does not have those prerequisites met. If I were to choose that, you see it's checking and it says: hey, do you want me to fix that configuration for Redshift? It can enable case sensitivity, and you can also have the workflow automatically create the IAM policy that authorizes the ability to create an integration into that Redshift data warehouse.

If I were to click on "fix it for me", it would reboot the Redshift cluster. If you already have those prerequisites in place - in this case, Redshift cluster two - then I simply click on Next. I can add tags, like I can with every other AWS resource, and I can also customize encryption if I want to use my own KMS key, for example. And then after that, you click on Next.

So literally, the steps to create a Zero ETL integration are: select the source, select the destination, and click Create. And if you haven't met the prerequisites yet, the workflow will actually help you get those resources set up. That is how easy it is to set this up, compared to building out a full, reliable data pipeline.

And if I click on Create Zero ETL integration, it kicks off the actual workflow. So I'm going to switch back to the slides and show you what's actually happening under the covers.

So we've got the RDS MySQL instance on the left and Redshift on the right. Once I click on Create, the Zero ETL integration is going to be in this creating status, and it'll stay there for about 30 to 40 minutes on average. What's happening behind the scenes is that we are provisioning all the replication components, provisioning the intermediate storage, rebooting the instances as needed, and doing all of the setup and configuration.

So it starts in the creating state, and 30 to 40 minutes later it will switch to the active state. But there's one more thing you need to do to truly activate it: you need to log into your Redshift data warehouse and issue a SQL statement - CREATE DATABASE some name FROM INTEGRATION - which links that Redshift database to your Zero ETL integration. Once you do that, the Zero ETL integration is officially active, and from there on out it just works. You can step away and the data will continue flowing from MySQL over to Redshift - it doesn't need a babysitter.
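That activation step is a single statement in the Redshift query editor; the database name is yours to choose, and the integration ID is copied from the console:

```sql
-- Link a new Redshift database to the zero-ETL integration
CREATE DATABASE zeroetl_db FROM INTEGRATION '<integration_id>';
```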

So let's dive one level deeper and see what's actually happening under the covers to make this possible. The first thing we do is an initial sync of the data. If you have an existing database, you probably already have dozens, hundreds, possibly thousands of tables containing gigabytes or terabytes of data. What we don't want to do is run queries against the database to fetch that data for copying into Redshift - trying to query and fetch gigabytes or terabytes of data from your production database could have a pretty negative performance impact. So instead we take a different approach, one that ends up being more efficient and performant and does not impact the source database at all.

What we do is take a recent backup of your database. Since that backup contains a point in time snapshot of your data, we instantiate a temporary database instance and load that point in time snapshot into it - and that's what we actually read from. Then we have a big data processing job that reads that data and, in parallel, writes those tables into Redshift.

And so the benefit of this is that there is no impact to your source database. It's highly performant, we're processing it in a highly parallelized way and it just works. So at this point, we now have effectively a replica of the data in the Redshift cluster.

Zero ETL handles all the nuances of the data type mapping. Some of the mappings are fairly straightforward: a varchar in RDS MySQL gets translated to a varchar in Redshift. And for the other data types, we perform the most appropriate mapping to a corresponding Redshift data type.

So for example, a datetime in RDS MySQL gets translated to a timestamp data type in Redshift. There are a lot of nuances to the data type mapping, and Zero ETL handles all of that for you.
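As a concrete illustration (the table itself is made up for this example), here is a small MySQL definition and the rough shape it takes on the Redshift side:

```sql
-- MySQL source definition
CREATE TABLE orders (
    order_id  INT          PRIMARY KEY,
    status    VARCHAR(16),
    placed_at DATETIME
);
-- Zero-ETL lands this in Redshift as roughly:
--   order_id INTEGER, status VARCHAR(16), placed_at TIMESTAMP
```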

So we've got the initial seed data into Redshift, but your database is not static - you're constantly updating it, adding more data, and evolving the database. So how do we keep it in sync? For that, we rely on the MySQL bin log and do change data capture.

With CDC, we process the change data records in your MySQL bin log. As you issue SQL statements to your database and make changes to the data or the schema, those deltas are recorded in your MySQL bin log. Our Zero ETL processes continuously read and tap into those changes as they occur; we buffer them and then replay those events into Redshift.

So with Zero ETL, we're transparently handling all of the 80 plus different statement types you can imagine in order to reflect these changes, and they typically appear in your Redshift data warehouse within 15 to 20 seconds - near real time.

So for example, if you do a create table with a primary key in RDS MySQL, a few seconds later you'll see that table pop up in Redshift. Similarly, if you update or delete rows, or add hundreds, thousands, or millions of records to your RDS MySQL tables, they show up on the Redshift side. If you do an alter table statement to change the data type of a column or to drop a column, that works as well. And certainly when you add a column to your table, that is reflected on the Redshift side.
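So the round trip looks like this in practice - ordinary statements on the MySQL side, then a query on the Redshift side a few seconds later (schema and table names follow the demo and are illustrative):

```sql
-- On RDS MySQL: normal DML and DDL, captured from the bin log
UPDATE orders SET status = 'shipped' WHERE order_id = 42;
ALTER TABLE orders ADD COLUMN shipped_at DATETIME;

-- ~15-20 seconds later, on Redshift, the change is visible:
SELECT order_id, status, shipped_at
FROM reinventdemo.orders
WHERE order_id = 42;
```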

These are the basic DDL operations, but we also handle all of the other, more nuanced ones as well. The net of this is that, from your perspective, as changes happen to your RDS MySQL databases, it almost feels like magic that they just appear on the Redshift side.

And it's not just changes to the data or the schema - your systems themselves are also always evolving. On your RDS MySQL instance, if you are scaling your compute or storage, experiencing a multi AZ failover, or making a configuration change and need to reboot the database for any reason, all of those changes are seamlessly handled by Zero ETL, meaning your Zero ETL integration will remain active.

Once the database is available again, it just picks up where it left off. Similarly, on the Redshift side, if you change your cluster configuration - adding nodes, removing nodes, changing the node type, or rebooting to pick up a configuration change - Zero ETL will just keep chugging along once Redshift is available. So that's pretty cool.

So whether the data is changing, the schema is changing, or the systems themselves are changing, Zero ETL will continue to function. There's one more edge case I want to talk about: there are certain cases for which we have to reseed the data.

For example, if you have a table with a primary key and then you change that primary key in MySQL, we have to do a full reseed of that table into Redshift. Reseeding is also useful for handling some of the extreme edge cases and failure modes.

For example, if you experience a network connectivity issue and the data is out of sync for a very long period of time, and we can't simply pick it up and replay it from the MySQL bin log, then at that point Zero ETL will just transparently do a full reseed of the data and start fresh again.

So in this section we've gone behind the scenes and shown you what's happening: we're doing the initial seeding of the data, handling all the CDC updates, and reseeding as necessary. But from your perspective, it's all transparent - as a customer, you simply see a Zero ETL integration that's active and working, and the records show up in Redshift.

When we were designing Zero ETL, we had various design priorities, and this is how we thought about them. Our highest priority is data security: we protect your data at all points along the journey by making sure it is encrypted at rest and encrypted in transit. In fact, we require that the Redshift cluster itself has encryption enabled.

It's also super important that we do this accurately - that you have data correctness. With a comprehensive data type mapping, dealing with all the nuances of that and with all the DDL statements, you get the assurance that the data appears in Redshift the way you expect. And in the face of changes or potential disruptions, we designed the Zero ETL integrations to be resilient and self repairing.

So we do checkpointing: we keep track of the transaction IDs - where we are in the bin log, which transaction ID we have processed so far - so that if we need to, we can self repair if we run into any issues.

And so there are dozens of these different failure modes that are generally handled by the Zero ETL implementation. We've already talked about minimizing the performance impact on the source database - that's super important, particularly for your production databases.

When we're doing that full initial sync, there's no impact to the source database. When we're doing the CDC, we're simply reading off the CDC records; at no point are we ever querying for the data against the database, which could potentially impact it. So keeping the performance impact as minimal as possible was a big design consideration for us.

And lastly, it was very important to us to make sure that this was performant. If you are trying to get near real time insights from your data, we want to make sure the data is replicated quickly, within seconds, and that we can handle the data throughput we expect the majority of databases to be pushing through.

Ok, so let's switch over to a demo where I can show how this really works. What I've done for this demo is set up the RDS MySQL instance and the Redshift cluster, and I also created a QuickSight dashboard.

Ok? So let's start with the goal here, which is the QuickSight dashboard. This is what it looks like:

And if I refresh it, you'll see that it's broken. It's broken because right now there is no schema and no data powering this dashboard yet. So we're going to fix that.

So let's look at the actual Redshift data warehouse. In the Redshift data warehouse, I've already created a Zero ETL integration to Redshift cluster two. If I expand that here, you can see the database I created is called xz - probably not the best name - but that's the database that's connected to our Zero ETL integration. And right now, there's only the public schema.

What's powering the dashboard itself is the orders table in the ReInventDemo schema, and here we'll just verify that it doesn't exist yet. If I try to select from that table, you can see that the schema does not exist.

So let's populate this Redshift database with all the data we need to power the dashboard. What I'm going to do first is log into an EC2 instance that has connectivity to my MySQL instance.

I SSH to the EC2 instance and then connect to MySQL. So now I'm connected to my MySQL instance, and I'm going to create the tables and populate them with data. I've already prepared a whole bunch of statements here.

Here's a set of sample tables - 20 tables with all of the foreign key relationships, indices, and so on. These are my DDL statements. I also have the sample records as well - records for all of those 20 tables. And as a bonus, I created another SQL file that adds 400 more orders, which we'll see at the very end.

So we have three files here: one to create the schema and all the tables, a second one to insert those records, and finally one to insert more records. So let's start.

So I'll run source here to create my tables. Oops - there's no database selected, so I actually need to create the database in the first place.

So I'll run CREATE DATABASE to create ReInventDemo. Every database that I create in MySQL ends up being translated to a schema on the Redshift side, so this will be the ReInventDemo schema when it shows up in Redshift. Then I'll use ReInventDemo.

So let me switch to that database and now actually populate it with the tables. I've created the 20 tables, and while we're here I might as well populate them with all of the sample data too. Ok, that's good.
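Condensed, the MySQL side of the demo boils down to the following (the three file names are placeholders for the prepared scripts):

```sql
-- In the mysql client on the EC2 jump host
CREATE DATABASE ReInventDemo;   -- becomes the reinventdemo schema in Redshift
USE ReInventDemo;
SOURCE create_tables.sql;       -- 20 tables with keys, FKs, and indexes
SOURCE insert_data.sql;         -- sample rows for all 20 tables
-- and later, for the finale: SOURCE add_400_orders.sql;
```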

Let's go back to Redshift and check it out. In Redshift here, I'll reload my resources - Redshift cluster, and my database here is xz. There we go. You can see that the schema has been replicated: the ReInventDemo schema here is the replication of the MySQL ReInventDemo database we created, and it has created all 20 tables.

There's an orders table in here, and all of the data type mappings have of course already been handled. So let's do a quick count of the orders table - I see that there are 48 orders. And when we check our dashboard and refresh it: there we go.

So what we did, basically, is create the tables and load them into MySQL, and a few seconds later the data showed up in the QuickSight dashboard that was reading from Redshift.

Just as a bonus, I also have the SQL statements to add 400 extra orders. So let's see how quickly this gets replicated over. I'll source the file to add the 400 orders, then go to my dashboard and refresh. Let's see - ok, it's still 48 orders.

So again, it usually takes about 10 to 15 seconds. Let's refresh again - and you can see it's updated, and we have all the orders we expect showing up in Redshift. That really is how easy it is to set it up, test it, and see the data flowing all the way from RDS MySQL into Redshift, and on to QuickSight or whatever other AWS services you're integrating with.

So I'm going to switch back to the slides. As Sudipta mentioned, Redshift has all this rich functionality, and the cool thing is that once your data is replicated into Redshift, you get to take advantage of all those features as well - things like being able to create a materialized view where you can further transform your data or do complex joins across multiple tables. And Redshift has integrations with the data lake, with S3, and with other AWS services.
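For instance, a materialized view over the replicated tables might look like the sketch below. The table and column names are assumed from the demo, and since the integration database itself is read-only, you may need to create the view in a database you own that references it - verify against your setup:

```sql
-- Pre-aggregate replicated order data for dashboards (illustrative schema)
CREATE MATERIALIZED VIEW daily_revenue_mv AS
SELECT o.order_date,
       SUM(i.quantity * i.unit_price) AS revenue
FROM reinventdemo.orders o
JOIN reinventdemo.order_items i ON i.order_id = o.order_id
GROUP BY o.order_date;

-- Refresh on your own schedule (or explore AUTO REFRESH where supported)
REFRESH MATERIALIZED VIEW daily_revenue_mv;
```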

Redshift has also introduced machine learning functions. So there's quite a lot you can do once you get the data out of RDS MySQL into a system like Redshift. And as far as monitoring goes, we support multiple ways of monitoring Zero ETL integrations.

So let me switch back to the demo and show you those three ways. The first way you can monitor your integrations is by querying system tables provided in Redshift. There is a view called SVV_INTEGRATION; if you query it, you can see the status of your Zero ETL integrations - this is the one that's active right now.

You can also see the status of the replication for each of the tables. The table state view gives you the sync status for each of the tables in your MySQL database - whether it has been synced over to Redshift successfully or not. And if it hasn't synced successfully for some reason, it gives you the status as to why.

And then, for deep inspection, we also provide a view where you can see all of the checkpoints that are created, how many bytes were replicated over, and so on. That one is for the deep dive.
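In SQL, those three levels look roughly like this. SVV_INTEGRATION is the view named in the talk; the other two view names are my reading of the per-table and activity-level views described here, so treat them as assumptions to verify against the Redshift documentation:

```sql
-- 1. Overall status of each zero-ETL integration
SELECT * FROM svv_integration;

-- 2. Per-table sync state, including why a table failed to sync (assumed name)
SELECT * FROM svv_integration_table_state;

-- 3. Checkpoint-level detail such as bytes replicated (assumed name)
SELECT * FROM sys_integration_activity;
```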

So these are the Redshift system tables. And for those of you who, like me, prefer something more visual, we also make this data available in the Redshift console. You go to the RDS console to create the integration, but you go to the Redshift console to actually manage and monitor it.

So in the Redshift console, I'll just connect to the Redshift cluster here, and you can view the status of your integration directly. I can see that it's active and the database state is active, and you get CloudWatch metrics like the replication lag.

You can see how many tables were replicated and how many of them failed, and you can also see the advanced table statistics for each of the individual tables. You can see these were just recently updated, and they're all in a successful sync status.

And what I actually like to do is pull some of these CloudWatch metrics and present them in a CloudWatch dashboard. If you have a lot of Zero ETL integrations, being able to monitor multiple of them at the same time is useful. And if you're using other monitoring solutions, you can of course integrate with those as well.

So that's the monitoring experience. We've been talking about one Zero ETL integration, but of course the fun really begins when you have multiple sources feeding into your Redshift data warehouse.

You can certainly set up additional Zero ETL integrations - with other RDS MySQL databases, with Aurora MySQL, Aurora PostgreSQL, and DynamoDB - all feeding into your Redshift data warehouse, where you can then run your advanced analytics queries and, as mentioned, use all the integrations with other AWS services.

And so, coming full circle back to Sudipta's diagram, this is what Zero ETL enables: by making it so easy to get your data from your source databases into Redshift, and from there to the rest of the AWS services, we make it super easy for you to do analytics on all your data and address your analytics, machine learning, and generative AI use cases.

And that's the goal. A few takeaways: we're excited to be investing in a Zero ETL future. We want to make it super simple for you to do analytics on your transactional data without having to build these complex data pipelines that take months to build and a lot of manpower to operate. And it's a fully managed offering.

So there's no code for you to manage and no infrastructure for you to manage, and this really frees up your team's time - instead of running pipelines, you can focus on your other business priorities and on getting the value out of your data.

By connecting the dots between the different AWS services, we want to accelerate your data strategy in the AWS cloud. And ultimately, all of this is in service of making you more productive and helping you get the most out of your data.

So, the last two slides. I hope you're enjoying your time at re:Invent and your time in Vegas. We do have other Zero ETL sessions during this conference, so you can search the session catalog for Zero ETL and check them out if you're using Aurora MySQL, Aurora PostgreSQL, or DynamoDB.

And my main request is: check out the public preview and take it for a spin. We're super excited to make it available to our customers. We know this is something that will save you a lot of time and really help you drive more value out of your data.

One of the key sayings within Amazon is that it is still day one, and we feel like this is still just the beginning of the Zero ETL journey. We really want to get your feedback and understand how we should be evolving our Zero ETL offerings.

So if you have any suggestions or questions, feel free to send them to Sudipta or myself, and we can also put you in touch with the other team members who made this project possible. And please complete your survey.
