How Yahoo cost optimizes their in-memory workloads with AWS

So excited about today's session. As more and more of our customers adopt the cloud, some of them more mature, some of them still in the early days, no matter where you are in the journey, if you're dealing with massive workloads, and we're talking about terabytes or even petabytes of data, the common question a lot of our customers ask is: how do we cut the cost? So cost optimization is a big theme in today's session. We're going to talk about Yahoo's journey of moving their ad platform onto AWS and how they cost optimized their workloads using ElastiCache.

Today with me, we have Maulik Shah, senior principal software engineer from Yahoo, and from the AWS side, I'm really honored and excited to share the stage with Itay Maoz, general manager of AWS in-memory database services. And I am Siva Karuturi, worldwide specialist SA for in-memory databases. As you know, we now have two databases in the in-memory category: Amazon ElastiCache and Amazon MemoryDB for Redis.

So for the agenda today, we're going to kick this off with Maulik Shah talking about Yahoo's use case, how they solved a typical big data problem using Amazon ElastiCache, and how they further cost optimized the workload using data tiering. And since today's session is all about cost optimization, we're going to deep dive into data tiering with ElastiCache. Towards the later part of the session, we're going to briefly cover our other in-memory database service, Amazon MemoryDB for Redis, and then close it off with all the latest and greatest new features that we launched with Amazon ElastiCache.

With that said, I want to turn this over to Maulik Shah for Yahoo's use case. Thank you.

Hello everyone. My name is Maulik Shah. I'm a senior principal software engineer at Yahoo. I work within our ads, data, and common services organization, and I help architect, design, and implement big data processing solutions for our ad platforms. In this session, I'm going to show you a data processing challenge we faced, how we solved it using ElastiCache, and discuss how we cost optimized the solution with the data tiering feature in ElastiCache.

I want to start with a brief introduction to who we are. At Yahoo, we connect with hundreds of millions of users around the world with sites and apps like Yahoo Mail, Yahoo Sports, Yahoo Finance, and many more. We also connect brands with publishers and their audiences on our ad platform as part of the Yahoo ad tech business. Since this talk is focused on a Yahoo ad platform use case, let me set up the context of the talk with an overview of the advertising ecosystem at Yahoo.

Our ad business family provides two advertising solutions: DSP, which stands for demand side platform, and SSP, which stands for supply side platform. DSP is an advertiser-facing solution, while SSP is a publisher-facing solution. Publishers can be Yahoo's owned and operated media properties or they can be our third party supply partners. Publishers have consumers, or end users, who consume their content, and these consumers are the source of most of our business and data signals. Consumers connect with us from many different devices, for example smartphone, tablet, connected TV, desktop, digital out-of-home, and a few others.

Let me show you a typical request flow when a consumer visits publisher content. A user like you and me visits a publisher web page, for example yahoo.com, which makes an ad request to our supply side platform. Our supply side platform ad server logs this request in the form of an ad request event. It then starts an auction process and sends out bid requests to multiple demand side platforms. One such request lands on Yahoo's own demand side platform. Our DSP server performs the ad selection process, finds the best matching ad for the user, and returns the response back to our SSP. Once our SSP receives responses from multiple DSPs, it closes the auction process by selecting a winning ad and a winning DSP. All the bidding and winning information is logged in the form of an ad auction event, the winning DSP is notified, and the winning ad is sent to the user who requested it.

Now the user can view this ad along with the content, which results in an ad impression event being logged by our server. And if the user clicks on the ad, it results in an ad click event being logged by our server. All the events are transported by a data transport system, processed by an ad event data processing system, and summarized by reporting and finance systems for our customers.

Let's take a look at the scale of the ad events and a critical operation that needs to happen inside the ad event data processing system. Our ad servers generate about 320 billion events per day. Ad request and ad auction events are high-volume events with a data size of 400 to 800 GB every five minutes. On the other hand, our ad impression and ad click events are lower-volume events with a data size of 10 to 100 GB every five minutes.

If you take a look at the ad event timeline for a given user session, the ad request and ad auction events are logged seconds to minutes apart, while ad auction and ad impression or click events are logged minutes up to four hours apart. For various reporting and analytics use cases, an auction event needs information from the request event, and similarly impression and click events need information from both the request as well as the auction event. And since these ad events are logged at different intervals, the challenge is to perform a join operation across these ad events at scale.

One might think that this is a typical big data join problem: events can be transported to a data lake and joined using join operators in a framework like Apache Spark. Let me take the example of an impression event joined with an ad auction event. We know the size of impression data for five minutes is about 100 GB. We also know the size of auction data for five minutes is about 600 GB. If we join these two five-minute data sets within Spark, it will perform well with a reasonable number of resources. But if you recall from the previous slide, impression and auction events may be logged up to four hours apart. As a result, five minutes of impression data needs to be joined with four hours of auction data, which is 30 terabytes.

We now need to scan a 48-times-larger data set and shuffle it across the network in order to perform the join. Not only that, we need to rescan most of the 30 terabytes of data during the next five minutes of impression data processing. This is very resource intensive and non-performant. We needed a better solution.

A better solution would be to use distributed key-value stores. Distributed key-value stores support random lookups, which can help us locate the joining event very quickly. They also help us avoid repeated scans of the same data sets. And most importantly, key-value stores are suitable to work with both batch as well as streaming data processing workloads.

Let's take a look at our use case and see how we can use key-value stores to perform our join operations. First, an ad request event would be loaded into the key-value store. A minute later, when the ad auction event arrives, we retrieve the ad request event using the request ID and perform the auction join. The joined auction event is then stored back into the key-value store. An hour later, when an impression event arrives, we can retrieve the joined auction event that we previously stored and perform the impression join. We can repeat the same process for the ad click event as well.
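To make this flow concrete, here is a minimal sketch in Python with the redis-py client. The endpoint, key prefixes, field names, and TTL are hypothetical illustrations (not Yahoo's actual schema or client, which was Java/Lettuce); it only shows the pattern of storing each event under its request ID and enriching later events by looking up what was stored before.

```python
import json
import redis

# Hypothetical cluster endpoint; TLS matches the encryption-in-transit setup described later.
r = redis.Redis(host="my-cache.example.amazonaws.com", port=6379, ssl=True)

FOUR_HOURS = 4 * 60 * 60  # events only need to live long enough to be joined

def on_ad_request(event: dict) -> None:
    # Store the ad request keyed by its request ID so later events can find it.
    r.set(f"req:{event['request_id']}", json.dumps(event), ex=FOUR_HOURS)

def on_auction(event: dict) -> dict:
    # Random lookup by request ID replaces scanning hours of request data.
    raw = r.get(f"req:{event['request_id']}")
    joined = {**json.loads(raw), **event} if raw else event
    # Store the joined auction so impressions and clicks can join against it later.
    r.set(f"auction:{event['request_id']}", json.dumps(joined), ex=FOUR_HOURS)
    return joined

def on_impression_or_click(event: dict) -> dict:
    # Retrieve the previously joined auction event and enrich the impression/click.
    raw = r.get(f"auction:{event['request_id']}")
    return {**json.loads(raw), **event} if raw else event
```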

Here are some of the key-value store solutions that we considered. We started off with DynamoDB. DynamoDB is a serverless and fully managed service. It provides durability and persistence guarantees, which were not fully required for our use case. Additionally, the cost model of DynamoDB is based on read and write requests as well as payload size. When we calculated the cost for our use case, it was quite expensive, so we decided to start considering other options.

The next option we considered was HBase. Amazon EMR provides HBase on top of HDFS as well as S3. We have good experience working with HBase from our on-prem streaming and batch data processing workloads. But on AWS, we did not want to maintain a very large HBase cluster footprint and perform regular maintenance tasks like OS updates and performance tuning. We really needed a fully managed service.

We then considered Amazon ElastiCache. ElastiCache is a fully managed service. It can serve as an in-memory key-value store. It supports the Redis engine, which we were familiar with from our on-prem streaming data processing workloads, and ElastiCache also provides very low latency read and write operations with sub-millisecond response times. So we thought it would best suit our needs, because our ad events would need to be stored in memory only for about four hours to perform our join operations.

This is the setup that we started prototyping with. We used the Redis engine with the node type r6g.2xlarge, which comes with about 52 GB of RAM. We enabled cluster mode and configured no eviction for the keys, plus encryption in transit as well as at rest. In order to communicate with the cluster, we used the Lettuce client. We then loaded about one hour of auction data into our ElastiCache cluster, which required about 115 nodes. We measured the performance of our five-minute impression data join operations and observed that our join operations finished within 5 to 10 minutes, based on the number of processing resources that we allocated.
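Yahoo used the Lettuce Java client; purely as an illustration of the same configuration (cluster mode enabled, encryption in transit), here is roughly what the connection looks like with the Python redis-py client. The configuration endpoint is a placeholder.

```python
from redis.cluster import RedisCluster

# With cluster mode enabled, the client connects to a configuration endpoint,
# discovers the shards, and routes each command to the right node by key slot.
rc = RedisCluster(
    host="clustercfg-endpoint.example.amazonaws.com",  # placeholder endpoint
    port=6379,
    ssl=True,               # encryption in transit
    decode_responses=True,
)

rc.set("auction:example-request-id", '{"bid": 1.25}')
print(rc.get("auction:example-request-id"))
```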

In terms of cluster sizing, we calculated that for our full production workload, we would need about 650 nodes to cache about 30 terabytes of data. Now, caching all the data sets in memory results in a very large cluster footprint, and thus very high cost for our use case. In order to optimize for cost, we started gathering join heuristics: we calculated the time difference between two joining events and plotted a histogram of event counts by minute bucket. As you can see in the graph, 95% of the events join within a one-hour window and the rest join within a four-hour window, which means that most of our join operations only need about 25% of the memory or cache, and a very small fraction of joins need three times more memory or cache.
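As an aside, the heuristic itself is straightforward to reproduce: bucket the time difference between each pair of joining events by minute and look at the cumulative distribution. A rough sketch, assuming you have already extracted (later_event_ts, earlier_event_ts) pairs from the logs:

```python
from collections import Counter

def join_delay_histogram(pairs):
    """pairs: iterable of (later_event_ts, earlier_event_ts) in epoch seconds."""
    buckets = Counter()
    for later_ts, earlier_ts in pairs:
        buckets[int((later_ts - earlier_ts) // 60)] += 1  # minute bucket

    total = sum(buckets.values())
    cumulative = 0
    for minute in sorted(buckets):
        cumulative += buckets[minute]
        print(f"joined within {minute + 1:4d} min : {100.0 * cumulative / total:5.1f}% of events")
```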

So the question was: can we offload these three hours of cold data to more cost-effective disk-based storage without sacrificing our application performance? We looked at a couple of solutions. One solution we thought of was side caching: one hour of data could be loaded into ElastiCache, and in parallel all four hours of data could be loaded into an HBase cluster configured with less memory and backed by HDFS and S3. But this would be a more complex setup, with more complex read and write logic, and would require more maintenance as well. We decided not to pursue this option.

Meanwhile, the Amazon ElastiCache team reached out to us to see if we were interested in being a beta customer for a new feature called data tiering. We were very excited to test this feature, because it was exactly what we were looking for. The data tiering feature was supported on node types with a large SSD. In our case, we were provided the r6gd.2xlarge node, which comes with about 52 GB of RAM and about 200 GB of SSD.

We onboarded onto the beta cluster and started performance testing our full production workload. In terms of cluster sizing, since we had four times more storage capacity thanks to the SSD, our node requirement went down from 650 nodes to about 150 nodes. This is a 4x reduction in the footprint of our cluster and effectively a 50% reduction in the cost of operating our cluster. Our join operations still finished in about 5 to 10 minutes, same as before, with about the same number of processing resources and without impacting our customer SLAs.

We can say that the data tiering feature enabled us to productionize our solution in a cost-effective way, with a single service and without additional complexity. Here's the final end-to-end architecture as it stands in production today. Our ad servers log the ad events, which are transported by AWS Kinesis and deposited into our S3 data lake. We have an EMR stack which ingests these data sets from S3, performs the join operations using ElastiCache, and publishes the joined data set back into our S3 data lake. The joined data set is also registered with AWS Glue for data discovery, and our customers can query this joined data set from Athena or directly ingest it from S3.

We all know that no system can function without effective metrics and monitoring. We actively monitor many standard ElastiCache metrics, and we categorize them into two groups: metrics for alerting and scale-out operations, and metrics that help us monitor issues during backlog processing or during peak traffic. During normal data processing, if engine CPU or bytes used for cache goes beyond 70 to 80%, we perform a scale-out operation. And during backlog processing, we monitor metrics like get and set latencies and current and new connections, along with engine CPU, to see if we need to perform a scale-out operation or throttle our data processing workload.
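As a hedged sketch of what that alerting can look like, here is a boto3 example pulling one of the same CloudWatch metrics (EngineCPUUtilization) for an ElastiCache node; the cluster ID and the 70% threshold are illustrative placeholders, not Yahoo's actual configuration.

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
CACHE_CLUSTER_ID = "my-redis-cluster-0001-001"  # placeholder node ID

def latest_average(metric_name: str) -> float:
    """Fetch the most recent 5-minute average of an AWS/ElastiCache metric."""
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/ElastiCache",
        MetricName=metric_name,
        Dimensions=[{"Name": "CacheClusterId", "Value": CACHE_CLUSTER_ID}],
        StartTime=now - timedelta(minutes=15),
        EndTime=now,
        Period=300,
        Statistics=["Average"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Average"] if points else 0.0

cpu = latest_average("EngineCPUUtilization")
if cpu > 70:  # illustrative threshold, per the 70-80% guidance above
    print(f"EngineCPUUtilization at {cpu:.1f}% -- consider scaling out or throttling jobs")
```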

Let me show you one example metric from our current production ElastiCache cluster. This graph shows storage metrics for our auction data. The total size of auction data in the cluster is about 50 terabytes. The cluster has two tiers, SSD and memory: memory has about 10 terabytes of data and SSD has 40 terabytes of data. Since most of our data set is stored in SSD, this demonstrates how we were able to expand our storage capacity and cost optimize our cluster.

Optimizing the cost was a great win, and every win comes with learnings. Let me share some of these learnings with you. After about a week of deployment in production, we had an issue with our cluster which led to a backlog in our data processing. Many of our jobs started retrying and established many new connections to our ElastiCache cluster. In this process, we managed to get our ElastiCache cluster very busy, which impacted our normal data processing. The fix was to reduce the number of connections to our ElastiCache cluster by sharing connections between threads. The key lesson we learned here was to keep persistent connections to the ElastiCache cluster, or to minimize them by sharing connections between threads or using connection pooling. Another lesson was to implement exponential backoff with retries, which is standard procedure for engineers. Like any solution, there is always an opportunity to optimize further in the future.
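A minimal sketch of both lessons in Python with redis-py (Yahoo's jobs use the Lettuce Java client, so this is only illustrative): one shared client per process, since redis-py clients wrap a connection pool and can be shared across threads, plus a retry helper with capped exponential backoff and jitter.

```python
import random
import time

import redis
from redis.exceptions import ConnectionError, TimeoutError

# One shared client per worker process; redis-py manages an internal connection
# pool, so threads reuse connections instead of opening new ones on every request.
shared_client = redis.Redis(
    host="my-cache.example.amazonaws.com",  # placeholder endpoint
    port=6379,
    ssl=True,
    max_connections=50,  # cap connections contributed by this process
)

def with_backoff(op, attempts=3, base_delay=0.5):
    """Run a Redis operation, retrying with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return op()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

value = with_backoff(lambda: shared_client.get("auction:example-request-id"))
```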

We want to further optimize our cost by reducing the size of the payload that we store in the ElastiCache cluster. We also want to implement the auto scaling feature for data tiering. Additionally, we want to migrate our Redis-based on-prem data streaming workloads onto AWS to work with ElastiCache.

Let me recap what we have covered so far. We started with an overview of the advertising ecosystem. We then introduced our join problem, which we solved with ElastiCache. And finally, you learned how we reduced the cost of our ElastiCache cluster by about 50% using the data tiering feature. With that, I want to thank you all for your interest in our use case. If you have any further comments or questions, please feel free to meet me outside the hall after the talk. Back to you.

Thank you, Maulik. Thanks so much for sharing Yahoo's use case. This is yet another example of how AWS constantly innovates on behalf of our customers: a staggering 70 to 80% of all the product features that we release every single year are based on direct customer feedback. Data tiering is one such example. We are super proud of our product team for reaching out to Yahoo at just about the right time to take a look at the use case and see how they could not only solve the big data challenge but also cost optimize the workloads by using data tiering.

So let's dive a little bit deeper into what data tiering is. Data tiering is the ability to expand the storage capacity of ElastiCache clusters by automatically and transparently moving data from memory to a locally attached SSD. So if your ElastiCache cluster nodes reach max memory, ElastiCache will start moving data from memory to the disk. The way we choose which data to move is based on an LRU, or least recently used, algorithm. ElastiCache keeps track of the last access times of all the items in memory, so if your memory fills up, ElastiCache will move the items that have gone the longest without being accessed to the disk. Later, if you need to access those items again, the data tiering engine will bring them back into memory.

With data tiering, we are now providing a new price-performance option for our customers. SSDs definitely have higher latencies, but they provide significant cost benefits compared to memory. If your application can tolerate slightly higher latencies, data tiering can definitely provide good cost options for you.

Now, you might ask: all right, so can I move the entire data set onto the SSDs and just keep a tiny little bit, maybe 1%, in memory? That's probably not a good idea, because you'll probably end up having all your requests served from SSD, which is not going to be performant. The sweet spot for data tiering is about 20%. So if you have 20% of the most frequently accessed data, or hot data, you can keep that in memory and move the remaining 80%, the not-so-frequently accessed data, to the disk.

If you were to stand up an ElastiCache cluster with data tiering nodes and completely fill up your memory as well as the SSDs, you have the potential to save up to 60% in cost compared to loading the entire data set in memory. That's a lot of cost savings.

Towards your right, you can see an example of our largest data tiering node. It's the r6gd.16xlarge. We have expanded the storage capacity from about 420 gigabytes to about two terabytes. One key point that I would like to make here is that data tiering is completely transparent to your application. It requires no application-level changes and it is designed to have minimal performance impact. So if you were to stand up an ElastiCache cluster with this particular node type, r6gd.16xlarge, and let's say you configure about 500 nodes, you can scale to one petabyte in one single cluster.

This is a quick comparison of our data tiering node with its closest non-data-tiering node. The biggest difference you're going to see here is the letter d, right? D stands for disk. But it's not just the letter d, it's the storage capacity. So if you look at the r6gd.xlarge, we have expanded the capacity from about 26 gigabytes to about 125 gigabytes. From a pricing standpoint, data tiering nodes are slightly more expensive because they come with the fully managed data tiering software as part of the node, but they also provide you with 4.8x more capacity. So the real metric that you want to look at is the one right at the bottom, which is the price per gigabyte per hour. With a data tiering node, it's about six tenths of a cent per gigabyte per hour. That's about 60% in cost savings.

Our data tiering nodes range from r6gd.xlarge all the way to r6gd.16xlarge. This is a quick rundown of all the specs of our data tiering nodes. This information is available on our public website, so I'm not going to go over everything line by line; you can access it yourselves. But the key metric I want you to focus on is the one on the right: no matter which data tiering node you use, xlarge all the way to 16xlarge, it is the same price per gigabyte per hour, six tenths of a cent.

Now, the pricing here is based on the US East region and on-demand pricing. You can further save on top of this if you use our reserved instances.

If you have any sizing-related questions, please reach out to us and we can assist you further.

We constantly work on expanding our support for data tiering nodes across all the regions in the globe. Since last year, we have added support for four new regions: Montreal, Sao Paulo, Paris, and Mumbai.

Now let's dive a little bit deeper into how data tiering works. We talked about how ElastiCache constantly tracks the last access times of all the items in memory. The LRU algorithm works at the item level, on an all-or-nothing basis.

So if you have worked with Redis, and how many of you are actually familiar with or have some level of expertise in ElastiCache or Redis? Pretty much everyone, right? So if you have worked with hashes or sorted sets or some other complex data structures: if you are accessing only certain attributes of a hash, as far as data tiering is concerned you're accessing the entire hash.

Let's say your hash has not been touched for some time. Data tiering will then push the entire object to the disk. At a later point in time, if you access only certain attributes of the hash, the data tiering engine will push the entire item back into memory for access.

So granted, there's going to be a little bit of a performance impact. But keep in mind that it's a one-time performance hit to bring the item from SSD back into memory.

A key point here is that all the keys will always be in memory, all the time. It's only the value portion of the key that gets paged from memory to the disk and back.

Also, there's asynchronous communication between ElastiCache Redis and the data tiering engine, which is designed to optimize for latency.

Also, I want to reemphasize that data tiering does not require you to make any application-level code changes.

Let's go one layer down and take a look at a typical get request flow for an ElastiCache data tiering cluster. A client issues a get request to Redis. Redis will first look up additional information about the key from an in-memory dictionary to identify whether the value portion of the key exists in memory or on the disk.

If the value portion exists in memory, the standard request flow follows: you return the object back to the client directly. No changes, it's easy.

But if the value portion of the key exists on the disk, that client will go into a blocked state, while all the other requests are still being processed. Redis will then spawn a new thread to bring the data from the disk back into memory. Once done, it will notify the main thread. The main thread will then unblock the client, execute the command, and send the results back to the client.

Now let's double-click on the flash cache. The flash cache is an append-only log structure that houses both the key and the value. But it also has a hash map in memory, which indexes back into the append-only log structure.

The hash structure that is in memory has a couple of entries: you have a hash of the key that points to a hash bucket. This hash bucket has a bunch of keys and values, and the offset then helps you traverse the hash bucket to an individual key and value.

We also keep track of the log structure using head and tail pointers, to track how it expands or shrinks.
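To make that structure concrete, here is a toy sketch in Python of the idea: an append-only log that stores full key/value records (standing in for the SSD), plus a small in-memory table mapping a hash of the key to offsets into that log. This is only a mental model of the design described above, not the actual ElastiCache implementation.

```python
class ToyFlashCache:
    def __init__(self, num_buckets=1024):
        self.log = bytearray()                            # append-only log ("disk")
        self.buckets = [[] for _ in range(num_buckets)]   # in-memory: key hash -> offsets
        self.head, self.tail = 0, 0                       # track log growth/shrink

    def _bucket(self, key: bytes):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key: bytes, value: bytes) -> None:
        offset = len(self.log)
        record = len(key).to_bytes(4, "big") + len(value).to_bytes(4, "big") + key + value
        self.log += record                  # both key and value live in the log
        self._bucket(key).append(offset)    # only the small index stays in memory
        self.tail = len(self.log)

    def get(self, key: bytes):
        for offset in reversed(self._bucket(key)):        # newest record first
            klen = int.from_bytes(self.log[offset:offset + 4], "big")
            vlen = int.from_bytes(self.log[offset + 4:offset + 8], "big")
            if bytes(self.log[offset + 8:offset + 8 + klen]) == key:
                start = offset + 8 + klen
                return bytes(self.log[start:start + vlen])
        return None

cache = ToyFlashCache()
cache.put(b"user:1", b"hello")
print(cache.get(b"user:1"))  # b'hello'
```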

We did a lot of benchmarking of our data tiering clusters. We used the standard redis-benchmark tool and ran the workload for about two weeks using an r6gd.2xlarge. We loaded about 400 million keys; the key size was about 16 bytes and the value was about 500 bytes.

We had about 200 client connections and the get-to-set ratio was about 4:1. We also wanted to make sure that we curated the workload such that at least 10% of the requests went to the disk.

From a performance standpoint, we did not notice any significant performance difference. In the latency profile on your left, you can see that the P99 was about 1.2 to 1.4 milliseconds, and the average latency was about 820 microseconds.

From a throughput standpoint, we were able to generate about 240,000 transactions per second. So if your application can tolerate such latency profiles and throughput, data tiering could be a good option for you.

Now, if you want to test your clusters and your workloads with data tiering, we have released four new metrics for data tiering. The four new metrics on your left are NumItemsWrittenToDisk, NumItemsReadFromDisk, BytesWrittenToDisk, and BytesReadFromDisk.

All these metrics help you gauge how much I/O your application is driving against your database. We also added two new dimensions to existing metrics, for instance CurrItems and BytesUsedForCache; they are now broken down by memory and SSD.

So you can break down your monitoring to see where your data sits between memory and SSD. A lot of our customers constantly monitor their data tiering clusters, and it's very important to look at the latencies.

The metrics towards your lower right, bytes written to and read from disk, are very important in this particular aspect. So from a monitoring standpoint, take a look at those metrics and compare them with the total number of operations against your cluster, like the total get and set operations against your cluster.

If the ratio is high, that means a lot of your requests, your hot data requests, are going to the disk, which means it's time for maybe scaling out your cluster and adding extra memory capacity, because the hot data is now sitting on the disk.

The key takeaway here is that you want to constantly monitor your items read from or written to SSD.
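As a hedged sketch of that check with boto3: pull NumItemsReadFromDisk and compare it against the total get-type commands over the same window. The cluster ID and the 10% threshold are illustrative placeholders.

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
CACHE_CLUSTER_ID = "my-tiered-cluster-0001-001"  # placeholder node ID

def metric_sum(metric_name: str, minutes: int = 60) -> float:
    """Sum an AWS/ElastiCache metric over the last `minutes` minutes."""
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/ElastiCache",
        MetricName=metric_name,
        Dimensions=[{"Name": "CacheClusterId", "Value": CACHE_CLUSTER_ID}],
        StartTime=now - timedelta(minutes=minutes),
        EndTime=now,
        Period=minutes * 60,
        Statistics=["Sum"],
    )
    return sum(p["Sum"] for p in resp["Datapoints"])

items_from_disk = metric_sum("NumItemsReadFromDisk")
total_gets = metric_sum("GetTypeCmds")

ratio = items_from_disk / total_gets if total_gets else 0.0
if ratio > 0.10:  # illustrative threshold: too many reads are hitting the SSD tier
    print(f"{ratio:.1%} of gets touched the SSD tier -- hot data may not fit in memory; consider scaling out")
```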

Now, I want to close here and recap what we discussed. The key takeaway is that with data tiering, we are now providing a new price-performance option for our customers. It's the ability to expand storage capacity well beyond just the memory, adding at least 4.8 times more capacity using the SSD. Another important point is that this requires no application-level changes.

It's completely transparent and it's designed to have minimal performance impact. It's now available in 13 AWS regions.

With that, I'm going to turn this over to Itay to talk about our other in-memory database service and also the latest and greatest features.

Thanks, Siva, and thanks, Maulik. I'm always excited to see our customers share their success stories. My name is Itay Maoz and I'm the general manager of in-memory databases at AWS.

It's so cool to see the journey that Yahoo went through and how they used data tiering to scale to a really massive scale with great performance at a low cost. Now I will briefly talk about a new service that we launched in 2021, MemoryDB, and then talk about the new features that we launched with ElastiCache this year.

As you can see on the slide, MemoryDB for Redis already supports data tiering as well, so you can get the same kind of cost benefits that we talked about with this service too.

All right, so what is Amazon MemoryDB for Redis? It is the fastest durable database that AWS offers today. It has microsecond read latencies and low single-digit millisecond write latencies. A single node can support up to 1.3 gigabytes per second of reads and about 100 megabytes per second of writes.

It is compatible with Redis, which means that you can take advantage of all of the data structures that Redis has, all of the rich APIs with sets and sorted sets and lists and so on, and you can benefit from the rich Redis ecosystem with client libraries in over 50 languages.

And you get to do that with durability. Like all of our services, it is fully managed; we do all of the heavy lifting so you don't have to worry about it and can focus on what matters most for you, which is your business.

So we do the installation, configuration, monitoring, patching, snapshots, and so on. It is highly scalable: on a single cluster, you can scale up to 500 nodes. And if you use the configuration that we recommend, which is the highly available configuration with one replica per shard, then you can scale to 128 terabytes of data, all in memory, with in-memory speed.

It provides durability. One of the key innovations that we built in MemoryDB is a multi-availability-zone transactional log. The way it works is that every write is acknowledged only after being written to three copies in two different availability zones. It is backed by the same technology that's used by S3 and by Amazon.com ordering data.

It is highly available. If you use one replica per shard, we take care of all of the monitoring and failover: when the primary node fails, we promote a replica to primary and then replace the replica.

And it has security and compliance. It has features like encryption in transit and encryption at rest, it has role-based access control for authorization and authentication, it is HIPAA eligible, and it has PCI DSS and SOC compliance.

You can try it for free. We have a two-month free trial, and you can try it starting today.

So who uses MemoryDB for Redis? We have a lot of different business segments that use MemoryDB for Redis. We have web and mobile customers using it as a user content data store, for session management, for chat and message queues, and for geospatial indexing. They can benefit from the rich Redis data structures.

For example, for session management you can use the hash data structure, which makes it really easy to implement a session store, or you can use the list data structure in order to implement lists, queues, and message queues. You can also use the geospatial commands to do geospatial indexing.
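As a small hedged illustration of those two patterns with the Python redis-py client (the endpoint, key names, and fields are made up for the example): a hash as a session store and a list as a simple queue.

```python
import redis

# Placeholder endpoint; MemoryDB clusters use TLS by default.
r = redis.Redis(host="my-memorydb.example.amazonaws.com", port=6379, ssl=True, decode_responses=True)

# Session store: one hash per session, with a TTL so stale sessions expire.
r.hset("session:abc123", mapping={"user_id": "42", "theme": "dark", "cart_items": "3"})
r.expire("session:abc123", 1800)
print(r.hgetall("session:abc123"))

# Simple message queue: producers LPUSH, consumers BRPOP (blocking pop).
r.lpush("queue:notifications", "order-42-shipped")
print(r.brpop("queue:notifications", timeout=5))  # (key, value) or None on timeout
```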

We have customers in the retail business doing customer profiles, inventory tracking, and fulfillment. We have a lot of customers in the gaming industry. They too use it for session management, and they also use the Redis sorted set data structure for leaderboards.

In fact, using the Redis sorted set data structure is one of the most common solutions today to implement leaderboards. We have customers in banking and finance using it for fraud detection and for user transactions.
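For the leaderboard case, a minimal sketch with redis-py, again with made-up key and member names: a sorted set keeps members ordered by score, so top-N and rank queries are single commands.

```python
import redis

r = redis.Redis(host="my-memorydb.example.amazonaws.com", port=6379, ssl=True, decode_responses=True)  # placeholder

# Add or update player scores; the sorted set stays ordered by score.
r.zadd("leaderboard:global", {"alice": 3120, "bob": 2890, "carol": 3555})
r.zincrby("leaderboard:global", 50, "bob")  # bob just scored 50 more points

# Top 3 players, highest score first.
print(r.zrevrange("leaderboard:global", 0, 2, withscores=True))

# A single player's rank (0 = first place).
print(r.zrevrank("leaderboard:global", "alice"))
```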

We have several customers in media and entertainment. In fact, Netflix was one of the first MemoryDB customers. And we have customers in IoT using it for streaming device data and operational insights.

If you want to learn more about MemoryDB, you can check out the session that we have tomorrow with Samsung SmartThings. It's an IoT application, and that session is DAT215. So that is happening tomorrow.

So to quickly summarize the in-memory databases through the lens of durability: on the far left side, you have Amazon ElastiCache for Memcached. It is super fast, with microsecond reads and microsecond writes, and it supports the popular Memcached API. But if a node goes down, you lose the data on that node.

So if you have a 10-node cluster and you lose a node, you lose 10% of your cache. It is useful primarily for caching and ephemeral data.

In the middle, we have ElastiCache for Redis, which has microsecond read and microsecond write latencies. It is semi-durable, because you have snapshot and restore, and also when you use the highly available configuration with one replica, we do asynchronous replication between the primary and the replica.

And if the primary fails, we do a failover to the replica. Because the replication is asynchronous, you will lose the tail end of the writes, so you will have some minimal data loss whenever there is a failover, and in a very unlikely and extreme case, you might lose more data.

And the reason is because at the end of the day, the data is stored all in memory and not on durable media. ElastiCache for Redis is also the most popular solution that we have, because of its rich data structures, all the features that we have, and its latencies.

If you go all the way to the right, we have the most durable solution, which is MemoryDB for Redis. As I said earlier, it was designed, like all of our databases, for zero data loss, and it has the multi-availability-zone transactional log to support that: as we said, three copies in at least two different availability zones for every write request.

MemoryDB has microsecond reads and low single-digit millisecond write latencies. So these are the different offerings that we have today for in-memory databases.

All right, so coming back to the main topic here, which is ElastiCache, let's talk about the new features that we launched this year with Amazon ElastiCache.

So before I start, I want to highlight that all of these features are available to you at no additional cost.

We launched a new and improved management console. It is available to you in the AWS console, and it simplifies the user experience when creating and managing ElastiCache clusters.

Starting with our Memcached engine, we support Memcached 1.6.12, which has operational improvements, performance improvements, and better thread management.

We launched encryption in transit in Memcached using TLS 1.2. Similar to all of the security features that we already offer for Redis, our Memcached customers also wanted this kind of security layer. So we built encryption in transit on the server side and also on the client side, in two clients that we provide: the Java client and the PHP client.

You can find those clients on our website, in the AWS console, and on GitHub. Launching support for Memcached encryption paved the way to get it certified for FedRAMP and HIPAA eligibility, and we will continue to work to get it certified with the same certifications we have for Redis, including PCI DSS and SOC.

We launched IPv6 support. Some of our largest customers have reached such a massive scale that they needed support for IPv6. We support it both for Redis and for Memcached, and we support it in two modes: IPv6-only mode, which means the cluster will accept only IPv6 connections, and dual-stack mode, where the cluster will accept both IPv6 and IPv4 connections.

Back in the beginning of the year, we launched Redis log delivery through Kinesis Data Firehose and CloudWatch Logs. This is for all of you who are power users and would like additional transparency and visibility into what's happening inside the engine.

So how can you use it? You can use it to troubleshoot problems. For example, let's say you have a slow command. So what is a slow command in Redis, when we just said it's super fast with microsecond reads and writes?

Well, you also have a lot of functionality in Redis. You have the KEYS command that fetches all of the keys in Redis, and when you have millions of keys, this could be pretty expensive. Or if you have a very large hash or a sorted set and you want to fetch the entire thing, that could be expensive.

So you can use this feature to do this kind of troubleshooting. And I would like to highlight that this feature is only for operational purposes; by design, we omit all of the customer data from these logs for your protection.

We launched native JSON for our Redis offering. So now you can store, fetch, and update JSON objects in your Redis cluster without needing to do complex serialization and deserialization work. You can fetch a portion of the JSON object and you can also store a portion of the JSON object, and putting this on something like ElastiCache for Redis, which has microsecond read and write latencies, is super powerful and super efficient.

We also support JSONPath in order to do searches within the JSON object. So this is a really cool tool: you get really fast JSON and really great latencies with a very popular format.
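As a hedged sketch with redis-py's JSON commands (the document shape and key are made up; this assumes a client and cluster version with the JSON data type enabled):

```python
import redis

r = redis.Redis(host="my-cache.example.amazonaws.com", port=6379, ssl=True)  # placeholder endpoint

# Store a whole JSON document at the root path "$".
r.json().set("user:1001", "$", {
    "name": "Maya",
    "address": {"city": "Sunnyvale", "zip": "94085"},
    "orders": 12,
})

# Fetch only a portion of the document with a JSONPath expression.
print(r.json().get("user:1001", "$.address.city"))

# Update a single field in place -- no read-modify-write of the whole object.
r.json().numincrby("user:1001", "$.orders", 1)
```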

We launched support for AWS PrivateLink. So what is PrivateLink? PrivateLink provides private connectivity between AWS services and your VPCs, so the traffic does not go through the public internet. It works with a VPC interface endpoint.

You can connect from your application inside your VPC to the ElastiCache APIs, and you can also connect through VPC peering to other VPCs, and from on premises using AWS VPN and AWS Direct Connect.

We launched Redis 7. Redis 7 is the latest and greatest Redis version out there. It has key innovations like access control lists v2 (ACL v2), Redis Functions, and sharded pub/sub, which scales very well with the cluster mode enabled configuration; ElastiCache supports up to 500 nodes in a single cluster.

And we have a lot of customers using pub/sub, a very popular feature, so we really needed something that scales. I'm also proud to say that my team worked together with the open source community and that we contributed sharded pub/sub and ACL v2 to Redis open source.

So you can find them in Redis 7 open source as well. And the last feature that we released this year is IAM authentication for Redis. IAM is Identity and Access Management.

Up until we launched this feature, you were able to do authentication using the native Redis mechanism. Now, you can use IAM authentication to associate IAM roles and IAM users with ElastiCache for Redis users.

This gives you a very powerful solution, because you can use role-based access control for authorization and IAM for authentication, and this aligns well with how you do identity management and authentication across all of your applications in AWS.

I'm always excited to talk about the features. We're going to have tons of new features coming up, so just stay tuned.

A quick word about global availability. As you may know, AWS is available in 30 regions across the world, with 96 availability zones. Just this year we launched four new regions: Spain, Dubai, Hyderabad, and Switzerland. ElastiCache is a foundational service; it exists in all AWS regions and it will be part of every new region that we launch.

So you can take that kind of dependency on ElastiCache when you plan for your global scale. So that's pretty much it. We really thank you for your patience.

We talked about the journey that Yahoo went through and how they used data tiering to achieve massive scale with great performance at a low cost. We also talked about MemoryDB and the new features that we launched with ElastiCache.

Before we go into questions, you can scan this QR code to get access to webinars, videos, blogs, white papers, and everything else you need to get started with ElastiCache.

So thank you so much. Please complete the survey; we really appreciate your kind words. And with that, we'll take questions.
