How Yahoo cost optimizes their in-memory workloads with AWS

Here is the speech transcript formatted for better readability:

Malik Shah (Senior Principal Software Engineer, Yahoo): Hello everyone. My name is Malik Shah. I'm a senior principal software engineer at Yahoo. I work within our Ads, Data and Common Services organization and I help to architect, design and implement big data processing solutions for our ad platforms.

In this session, I'm going to walk you through a data processing challenge we faced, how we solved it using ElastiCache, and discuss how we cost optimized the solution with the data tiering feature in ElastiCache.

I want to start with a brief introduction to who we are. At Yahoo, we connect with hundreds of millions of users around the world through sites and apps like Yahoo Mail, Yahoo Sports, Yahoo Finance and many more. We also connect brands with publishers and their audiences on our ad platform as part of Yahoo's ad tech business.

Since this talk is focused on a Yahoo Ads platform use case, let me set up the context of the talk with an overview of the advertising ecosystem at Yahoo.

Our ad business family provides two advertising solutions:

  • DSP, which stands for Demand Side Platform
  • SSP, which stands for Supply Side Platform

DSP is an advertiser-facing solution, while SSP is a publisher-facing solution. Publishers can be Yahoo's owned and operated media properties or they can be our third party supply partners.

Publishers have consumers or end users who consume their content and these consumers are the source of most of our business and data signals. Consumers connect with us from many different devices - for example, smartphone, tablet, connected TV, desktop, digital out of home and a few others.

Let me show you a typical request flow:

When a consumer - a user like you and me - visits publisher content, for example a webpage on yahoo.com, the page makes an ad request to our Supply Side Platform. Our Supply Side Platform ad server logs this request in the form of an ad request event.

It then starts an auction process and sends out bid requests to multiple Demand Side Platforms. One such request lands on Yahoo's own Demand Side Platform. Our DSP server performs the ad selection process, finds the best matching ad for the user and returns the response back to our SSP.

Once our SSP receives responses from multiple DSPs, it closes the auction process by selecting a winning ad and a winning DSP. All the bidding and winning information is logged in the form of an auction event.

The winning DSP is notified and the winning ad is sent to the user who requested it. Now the user can view this ad along with the content, which results in an ad impression event being logged by our server. And if the user clicks on the ad, it results in an ad click event being logged by our server.

All the events are transported by a data transport system, processed by an event data processing system and summarized by a reporting and finance system for our customers.

Let's take a look at the scale of the ad events and a critical operation that needs to happen inside the ad event data processing system:

  • Our ad server generates about 320 billion events per day
  • Ad request and ad auction events are high volume events with data size of 400 to 800 GB every five minutes
  • On the other hand, our impression and click events are lower volume events with data size of 10 to 100 GB every five minutes

If you take a look at the ad event timeline for a given user session, an ad request and an ad auction event are logged seconds to minutes apart, while an ad auction and an ad impression or click event can be logged minutes up to four hours apart for various reporting and analytics use cases.

An auction event needs information from the request event, and similarly impression and click events need information from both the request as well as the auction event. And since these ad events are logged at different intervals, the challenge is to perform join operations across these ad events at scale.

One might think that this is a typical big data join problem - events can be transported to a data lake and joined using join operators in a framework like Apache Spark.

Let me take the example of an impression event joined with an ad auction event. We know the size of impression data for five minutes is about 100 GB. We also know the size of auction data for five minutes is about 600 GB. If we join these two 5-minute data sets within Spark, it will perform well with a reasonable number of resources.

But if you recall from the previous slide, impression and auction events may be logged four hours apart. As a result, 5 minutes of impression data needs to be joined with 4 hours of auction data, which is about 30 terabytes.

We now need to scan a 48 times larger dataset and shuffle it across the network in order to perform the join. Not only that, we need to rescan most of the 30 terabytes of data while processing the next 5 minutes of impression data. This is very resource intensive and non-performant.

We need a better solution. A better solution would be to use distributed key value stores. Distributed key value stores support random lookups, which can help us locate the joining event very quickly. It also helps us avoid repeated scan of the same datasets again. And most importantly, key value stores are suitable to work with both batch as well as streaming data processing workloads.

Let's take a look at our use case and see how we can use key value stores to perform our join operations:

First, a request event would be loaded into the key value store. A minute later when an auction event arrives, we retrieve the request event using the request ID and perform an auction join. The joined auction event is then stored back into the key value store.

An hour later, when an impression event arrives, we can retrieve the joined auction event that we previously stored and perform an impression join. We can repeat the same process for the ad click event as well.
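To make the flow concrete, here is a minimal sketch of the three-step join using the Lettuce Redis client (which comes up again later in this talk). The endpoint, key prefixes and the simplified string payloads are illustrative assumptions, not the actual production code.

```java
import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;
import io.lettuce.core.cluster.api.sync.RedisAdvancedClusterCommands;

public class EventJoinSketch {

    // Events only need to live in the store for the join window (about 4 hours).
    private static final long JOIN_WINDOW_SECONDS = 4 * 60 * 60;

    public static void main(String[] args) {
        RedisClusterClient client = RedisClusterClient.create(
                RedisURI.Builder.redis("my-cluster.example.com", 6379).withSsl(true).build());
        try (StatefulRedisClusterConnection<String, String> connection = client.connect()) {
            RedisAdvancedClusterCommands<String, String> commands = connection.sync();

            // 1. A request event arrives and is loaded into the key value store, keyed by request ID.
            commands.setex("req:12345", JOIN_WINDOW_SECONDS,
                    "{\"requestId\":\"12345\",\"page\":\"yahoo.com\"}");

            // 2. Minutes later an auction event arrives; look up the request event and join.
            String requestEvent = commands.get("req:12345");
            String joinedAuction = join(requestEvent, "{\"requestId\":\"12345\",\"winningBid\":1.25}");
            commands.setex("auction:12345", JOIN_WINDOW_SECONDS, joinedAuction);

            // 3. Up to four hours later an impression (or click) event arrives;
            //    look up the previously joined auction event and join again.
            String joinedAuctionEvent = commands.get("auction:12345");
            System.out.println(join(joinedAuctionEvent, "{\"requestId\":\"12345\",\"impression\":true}"));
        } finally {
            client.shutdown();
        }
    }

    // Placeholder join: real payloads would be parsed and merged field by field.
    private static String join(String left, String right) {
        return left + "|" + right;
    }
}
```

The TTL matches the four hour join window, so entries expire on their own once they can no longer be joined.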

Here are some of the key value store solutions that we considered:

We started off with DynamoDB. DynamoDB is a serverless, fully managed service. It provides durability and persistence guarantees, which were not really required for our use case.

Additionally, the cost model of DynamoDB is based on read and write requests as well as payload size. When we calculated the cost for our use case, it was quite expensive. So we decided to start considering other options.

Next option we considered was EMR + HBase. Amazon EMR provides HBase on top of HDFS as well as S3. We have good experience working with HBase from our on-prem streaming and batch data processing workloads.

But on AWS, we did not want to maintain a very large HBase cluster footprint and perform regular maintenance tasks like OS updates and performance tuning. We really needed a fully managed service.

We then considered Amazon ElastiCache. ElastiCache is a fully managed service. It can serve as an in-memory key value store. It supported Redis engine which we were familiar with from our on-prem streaming data processing workload. And ElastiCache also provided very low latency read and write operations with sub-millisecond response time.

So we thought it would best suit our needs because our ad events would need to be stored in memory only for about 4 hours to perform our join operations.

This is the setup that we started prototyping with:

  • We used the Redis engine with the r6g.2xlarge node type, which came with about 52 GB of RAM
  • We enabled cluster mode and configured no eviction for the keys, along with encryption in transit as well as at rest
  • To communicate with the cluster, we used the Lettuce client (a connection sketch follows this list)
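As a rough illustration of that setup, the sketch below opens a TLS connection to a cluster mode enabled endpoint with the Lettuce client and enables periodic topology refresh. The endpoint name is a placeholder and the exact options used in production are not covered in the talk.

```java
import java.time.Duration;

import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

public class ClusterConnectionSketch {
    public static void main(String[] args) {
        // Configuration endpoint of a cluster mode enabled ElastiCache cluster (hypothetical host name).
        RedisURI uri = RedisURI.Builder
                .redis("my-tiered-cluster.xxxxxx.clustercfg.use1.cache.amazonaws.com", 6379)
                .withSsl(true)                        // encryption in transit
                .withTimeout(Duration.ofSeconds(2))
                .build();

        RedisClusterClient client = RedisClusterClient.create(uri);

        // Refresh the slot topology periodically so the client follows scale out operations.
        client.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(ClusterTopologyRefreshOptions.builder()
                        .enablePeriodicRefresh(Duration.ofMinutes(1))
                        .enableAllAdaptiveRefreshTriggers()
                        .build())
                .build());

        try (StatefulRedisClusterConnection<String, String> connection = client.connect()) {
            System.out.println("PING -> " + connection.sync().ping());
        } finally {
            client.shutdown();
        }
    }
}
```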

We then loaded about one hour of auction data into our ElastiCache cluster, which required about 1150 GB. We then measured the performance of joining 5 minutes of impression data and observed that our join operations finish within 5 to 10 minutes, depending on the number of processing resources that we allocated.

In terms of cluster sizing, we calculated that for our full production workload, we would need about 650 nodes to store about 30 terabytes of data.

Now caching all the datasets in memory results in a very large cluster footprint, and thus very high cost for our use case.

In order to optimize for the cost, we started gathering join heuristics. We calculated the time difference between two joining events and plotted a histogram of event counts by the minute bucket.

As you can see in the graph, 95% of the event joins occur within a one hour window and the rest of them join within a four hour window, which means that most of our join operations only need about 25% of the memory or cache, and a very small fraction of joins needs three times more memory or cache.
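The heuristic-gathering step boils down to bucketing the delay between joining events by minute. A small self-contained sketch of that computation, using made-up timestamps:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class JoinDelayHistogram {

    // Each pair holds the log time of an event and of the later event that joins with it.
    record EventPair(Instant first, Instant second) {}

    public static void main(String[] args) {
        // Hypothetical inputs; in practice these come from the logged event timestamps.
        List<EventPair> pairs = List.of(
                new EventPair(Instant.parse("2022-11-01T10:00:00Z"), Instant.parse("2022-11-01T10:03:00Z")),
                new EventPair(Instant.parse("2022-11-01T10:00:00Z"), Instant.parse("2022-11-01T12:40:00Z")));

        // Count joining events per minute-of-delay bucket.
        Map<Long, Long> histogram = new TreeMap<>();
        for (EventPair pair : pairs) {
            long minutes = Duration.between(pair.first(), pair.second()).toMinutes();
            histogram.merge(minutes, 1L, Long::sum);
        }
        histogram.forEach((minute, count) -> System.out.printf("%d min -> %d events%n", minute, count));
    }
}
```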

So the question was - can we offload these 3 hours of cold data to a more cost effective disk based storage without sacrificing our application performance?

One solution we thought of was side caching: load one hour of data into ElastiCache and, in parallel, load all four hours of data into an HBase cluster configured with less memory and backed by HDFS and S3. But this would be a more complex setup with more complex read and write logic, and it would require more maintenance as well. We decided not to pursue this option.

Meanwhile, the Amazon ElastiCache team reached out to us to see if we were interested in being a beta customer for a new feature called data tiering. We were very excited to test this feature because this was exactly what we were looking for.

The tiering feature was supported on a node type with a large SSD. In our case, we were provided r6gd.2xlarge, which came with about 52 GB of RAM and about 200 GB of SSD. We onboarded onto a beta cluster and started performance testing our full production workload.

In terms of cluster sizing, since each node now had roughly four times more storage capacity thanks to the SSD, the node requirement went down from 650 nodes to 250 nodes. This is a substantial reduction in the footprint of our cluster and effectively a 50% reduction in the cost of operating it.

Our join operations still finish in about 5 to 10 minutes, same as before, with about the same number of processing resources and without impacting our customer SLAs. We can say that the data tiering feature enabled us to architect our solution in a cost effective way with a single service and without additional complexities.

Here's the final end-to-end architecture as it stands in production today. Our ad server logs the ad events, which are transported by Kinesis Data Streams and deposited into our S3 data lake. We have an EMR stack which ingests these datasets from S3, performs the join operations using ElastiCache and publishes the joined dataset back into our S3 data lake. The joined dataset is also registered with AWS Glue for data discovery, and our customers can query this joined dataset from Athena or directly ingest it from S3.

We all know that no system can function without effective metrics and monitoring. We actively monitor many standard ElastiCache metrics and categorize them into two groups: metrics for alerting and scale out operations, and metrics that help us monitor issues during backlog processing or during peak traffic in normal data processing.

If engine CPU utilization or bytes used for cache goes beyond 70-80%, we perform a scale out operation. And during backlog processing, we monitor metrics like GET and SET latencies and current and new connections, along with engine CPU, to see if we need to perform a scale out operation or throttle our data processing workload.

Let me show you one example metric from our current production ElastiCache cluster. This graph shows storage metrics for our auction data. The total size of auction data in the cluster is about 50 terabytes. The cluster has two tiers - SSD and memory. Memory holds about 10 terabytes of data and SSD holds 40 terabytes of data. Since most of our dataset is stored on SSD, this demonstrates how we are able to expand our storage capacity and cost optimize our cluster.

Optimizing the cost was a great win, and every win comes with learnings. Let me share some of these learnings with you. After about a week of deployment in production, we had an issue with our cluster which led to a backlog in our data processing. Many of our jobs started retrying and established many new connections with our ElastiCache cluster. In this process, we managed to make our ElastiCache cluster very busy, which impacted our normal data processing.

The fix was to reduce the number of connections to our ElastiCache cluster by sharing connections between threads. The key lesson that we learned here was to keep persistent connections to the ElastiCache cluster and to minimize them by sharing connections between threads or using connection pooling; a minimal sketch follows.
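A minimal sketch of that pattern with the Lettuce client, whose connections are safe to share across threads; the endpoint and workload are placeholders rather than the actual production jobs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

public class SharedConnectionSketch {
    public static void main(String[] args) throws InterruptedException {
        RedisClusterClient client = RedisClusterClient.create("redis://my-cluster.example.com:6379");

        // One long-lived connection shared by all worker threads,
        // instead of each task opening and tearing down its own connection.
        StatefulRedisClusterConnection<String, String> shared = client.connect();

        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (int i = 0; i < 1_000; i++) {
            final String key = "event:" + i;
            pool.submit(() -> shared.sync().get(key)); // Lettuce connections are safe to share across threads
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);

        shared.close();
        client.shutdown();
    }
}
```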

The other lesson we learned was to implement retries with exponential backoff - a standard procedure for engineers - as sketched below.
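A small, generic sketch of that retry pattern (exponential backoff with jitter and a capped delay); the initial delay, cap and attempt count here are arbitrary illustrative values.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

public class RetryWithBackoff {

    // Retries an operation with exponential backoff plus jitter and a capped delay,
    // so that a burst of failing jobs does not hammer the cluster in lockstep.
    static <T> T callWithBackoff(Supplier<T> operation, int maxAttempts) throws InterruptedException {
        long delayMillis = 100;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                long jitter = ThreadLocalRandom.current().nextLong(delayMillis);
                Thread.sleep(delayMillis + jitter);
                delayMillis = Math.min(delayMillis * 2, 30_000); // cap the delay at 30 seconds
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String result = callWithBackoff(() -> "ok", 5); // wrap a Redis call here in real use
        System.out.println(result);
    }
}
```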

Like any solution, there is always an opportunity to optimize further in the future. We want to further optimize our cost by reducing the size of the payload that we store in the ElastiCache cluster, and we want to implement auto scaling for data tiering. Additionally, we want to migrate our Kafka based on-prem data streaming workload onto AWS to work with ElastiCache. Let me recap what we have covered so far. We started with an advertising ecosystem overview. We then introduced our join problem, which we solved with ElastiCache. And finally, you learned how we reduced the cost of our ElastiCache cluster by about 50% using the data tiering feature.

With that, I want to thank you all for your interest in our use case. If you have any further comments or questions, please feel free to meet me outside the hall after the talk. Back to you.

Siva (AWS): Thank you, Malik, and thanks so much for sharing Yahoo's use case. This is yet another example of how AWS constantly innovates on behalf of our customers - a staggering 70 to 80% of all the product features that we release every single year are based on direct customer feedback. Data tiering is one such example. We are super proud of our product team for reaching out to Yahoo at just about the right time to look at the use case and see how they could not only solve the big data challenge but also cost optimize the workload by using data tiering.

So let's dive a little bit deeper into what data tiering is. Data tiering is the ability to expand the storage capacity of ElastiCache clusters by transparently moving data from memory to a locally attached SSD. So if your ElastiCache cluster nodes reach max memory, ElastiCache will start moving data from memory to the disk.

The way we choose which data to move is based on an LRU, or Least Recently Used, algorithm. ElastiCache keeps track of the last access times of all the items in memory. So if your memory fills up, ElastiCache will move the items with the oldest last access times to the disk. Later, if you need to access those items again, the data tiering engine will bring the items back into memory for access.

With data tiering, we are now providing a new price performance option for our customers. SSDs definitely have higher latencies, but they provide significant cost benefits compared to memory. If your application can tolerate slightly higher latencies, tiering can definitely provide good cost options for you.

Now, if you were to ask me - can I move the entire dataset into SSDs, just keep a tiny little bit, maybe 1% in memory? That's probably not a good idea because you'll probably end up having all your requests being sent to SSDs, which is not going to perform well.

The sweet spot for data tiering is about 20%. So if you have 20% of the most frequently accessed data, or hot data, you can keep that in memory and move the remaining 80%, the not-so-frequently accessed data, to the disk.

If you were to stand up an ElastiCache cluster with any number of nodes and completely fill up the memory as well as the SSDs, you have the potential to save up to 60% in costs compared to loading the entire dataset in memory. That's a lot of cost savings, as you can see towards your right.

You can see an example of our largest data tiering node - the r6gd.16xlarge. We have expanded its storage capacity from about 420 GB to about 2 terabytes.

One key point that I would like to make here is that data tiering is completely transparent to your application. It requires no application level changes and it is designed to have minimal performance impact.

So if you were to stand up an ElastiCache cluster with this particular node type, r6gd.16xlarge, and configure about 500 nodes, you can scale to one petabyte in a single cluster.

This is a quick comparison of our data tiering node with its closest non-data tiering node. The biggest difference you're going to see here is the letter D - D stands for disk. But it's not just the letter D, it's the storage capacity.

So if you look at the r6gd.xlarge, we have expanded the capacity from about 26 GB to about 125 GB. From a pricing standpoint, the data tiering nodes are slightly more expensive because they come with the fully managed data tiering software as part of the node, but they also provide you with 4.8x more capacity.

So the real metric that you want to look at is the one right at the bottom, which is the price per GB per hour. With the data tiering node, it's about 6/10ths of a cent per GB per hour. That's about 60% in cost savings.
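The arithmetic behind that bottom-row number is simply the hourly node price divided by the total usable capacity. A tiny sketch, using a placeholder hourly price rather than actual AWS pricing:

```java
public class PricePerGbHour {
    public static void main(String[] args) {
        // Placeholder hourly price, not an actual AWS price; check the ElastiCache pricing page.
        double hourlyNodePrice = 0.75;  // hypothetical On-Demand $/hour for a data tiering node
        double memoryGb = 26.0;         // approximate memory of an r6gd.xlarge, per the talk
        double ssdGb = 99.0;            // approximate SSD of an r6gd.xlarge, per the talk

        // Effective price per GB per hour = hourly node price / total usable capacity (memory + SSD).
        double pricePerGbHour = hourlyNodePrice / (memoryGb + ssdGb);
        System.out.printf("$%.4f per GB per hour%n", pricePerGbHour);
    }
}
```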

Our data tiering nodes range from r6gd.xlarge all the way to r6gd.16xlarge. This is a quick rundown of all the specs of our data tiering nodes. This information is available on the public website, so I'm not going to go over everything line by line.

But the key metric I want you to focus on is the one on the right - no matter which data tiering node you use, from xlarge all the way to 16xlarge, it is the same price per GB per hour: 6/10ths of a cent.

Now, the pricing here is based on the us-east region and On-Demand pricing. You can save further on top of this if you use our Reserved Instances. If you have any sizing related questions, please reach out to us and we can assist you further.

We constantly work on expanding our support for data tiering nodes across all the regions around the globe. Since last year, we have added support for four new regions: Montreal, Sao Paulo, Paris and Mumbai.

Now let's dive a little bit deeper into how data tiering works.

We talked about how ElastiCache constantly tracks the last access times of all the items in memory. The LRU algorithm works atomically at the item level, on an all or nothing basis.

How many of you are familiar with, or have some level of expertise in, ElastiCache or Redis? Pretty much everyone, right?

So if you have worked with hashes, sorted sets or other complex data structures: if you access only certain attributes of a hash, from data tiering's point of view you are accessing the entire hash.

Let's say your hash has not been touched for some time. Data tiering will then push the entire object to the disk.

At a later point in time, if you access only certain attributes of the hash, the data tiering engine will bring the entire item back into memory.

So granted, there is going to be a little bit of a performance impact. But keep in mind that it's a one time performance hit to bring the item from SSD back into memory.

Now, a key point here is that all the keys will always be in memory, all the time. It's only the value portion of the key that gets paged between memory and the disk.

Also, the communication between ElastiCache Redis and the data tiering engine is asynchronous. This is designed as an optimization for latencies.

Also, I want to reemphasize that data tiering does not require you to make any application level code changes.
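As a small illustration of that transparency, the sketch below reads and writes a Redis hash with the Lettuce client; the commands are identical whether the value currently sits in memory or on SSD. The endpoint and key names are placeholders.

```java
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;
import io.lettuce.core.cluster.api.sync.RedisAdvancedClusterCommands;

public class TieredHashAccess {
    public static void main(String[] args) {
        RedisClusterClient client = RedisClusterClient.create("redis://my-tiered-cluster.example.com:6379");
        try (StatefulRedisClusterConnection<String, String> connection = client.connect()) {
            RedisAdvancedClusterCommands<String, String> commands = connection.sync();

            // Write a hash; the key always stays in memory, the value may later be moved to SSD.
            commands.hset("session:42", "userId", "42");
            commands.hset("session:42", "lastPage", "finance");

            // Reading a single field later uses exactly the same command whether the value is in
            // memory or on SSD; if it was tiered out, the whole item is brought back into memory first.
            String lastPage = commands.hget("session:42", "lastPage");
            System.out.println(lastPage);
        } finally {
            client.shutdown();
        }
    }
}
```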

Let's go one layer down and take a look at a typical GET request flow for an ElastiCache data tiering cluster.

A client issues a GET request to Redis. Redis first looks up additional information about the key in an in-memory dictionary to identify whether the value portion of the key exists in memory or on the disk.

If the value portion exists in memory, the standard request flow follows: the object is returned back to the client directly. No changes. It's easy.

But if the value portion of the key exists on the disk, that client goes into a blocked state while all the other requests are still being processed.

Redis will then spawn a new thread to bring the data from the disk back into memory. Once done, it will notify the main thread.

The main thread will then unblock the blocked clients, execute the commands and send the results back to the client.

Now let's double click on the flash cache. The flash cache is an append-only log structure that houses both the key and the value. It also has a hash map in memory which indexes back into the append-only log structure.

The hash structure that is in memory has a couple of entries: a hash of the key points to a hash bucket, and this hash bucket holds a bunch of keys and values. An offset then helps you traverse the hash bucket to an individual key and value.

We also keep track of the log structure using head and tail pointers to track how it expands or shrinks.
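To illustrate the shape of that structure (and only the shape - this is a toy model, not the actual ElastiCache implementation), here is a minimal append-only log with an in-memory index:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A toy append-only log with an in-memory index: records (key + value) are appended to a log,
// and a hash map in memory keeps the position of each key inside the log.
public class AppendOnlyLogSketch {

    private final List<byte[]> log = new ArrayList<>();         // stand-in for the on-SSD append-only log
    private final Map<String, Integer> index = new HashMap<>(); // in-memory map: key -> position in the log

    void append(String key, String value) {
        byte[] record = (key + "=" + value).getBytes(StandardCharsets.UTF_8);
        log.add(record);
        index.put(key, log.size() - 1); // newest record wins; older entries become garbage to compact later
    }

    String read(String key) {
        Integer position = index.get(key);
        if (position == null) {
            return null;
        }
        String record = new String(log.get(position), StandardCharsets.UTF_8);
        return record.substring(record.indexOf('=') + 1);
    }

    public static void main(String[] args) {
        AppendOnlyLogSketch cache = new AppendOnlyLogSketch();
        cache.append("auction:12345", "{\"winningBid\":1.25}");
        System.out.println(cache.read("auction:12345"));
    }
}
```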

We did a lot of benchmarking for our data tiering clusters. We used the standard redis-benchmark tool and ran the workload for about two weeks using an r6gd.2xlarge.

We loaded about 400 million keys. The key size was about 16 bytes and the value was about 500 bytes.

We had about 200 client connections and the get-to-set ratio was about four to one.

We also wanted to make sure that we curated the workload such that at least 10% of the requests went to the disk.

From a performance standpoint, we did not notice any significant performance difference. In the latency profile on your left, you can see that the p99 was about 1.2 to 1.4 milliseconds, and the average latency was about 820 microseconds.

From a throughput standpoint, we were able to generate about 240,000 transactions per second.

So if your application can tolerate such latency profiles and throughput, data tiering could be a good option for you.

Now, if you want to test your workloads with data tiering, we have released four new metrics for data tiering clusters.

The four new metrics, on your left, are: number of items written to disk, number of items read from disk, bytes written to disk and bytes read from disk.

All these metrics will help you gauge how many IOPS your application is driving against your cluster.

We also added two new dimensions to existing metrics, for instance current items and bytes used for cache. They are now broken down by memory and SSD.

So you can break down your monitoring to see how your data is split between memory and SSD.

A lot of our customers constantly monitor their data tiering clusters, and it's very important to look at the latencies.

The metric towards your lower right, bytes written to and read from disk, is very important in this particular aspect.

From a monitoring standpoint, take a look at that metric and compare it with the total number of operations against your cluster, like the total GET and SET operations.

If the ratio is high, that means most of your requests, even hot data requests, are going to the disk, which means it may be time to scale out your cluster and add extra memory capacity, because hot data is now sitting on the disk.

The key takeaway here is that you want to constantly monitor the items read from and written to SSD.
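One possible way to automate that check is to pull the two metrics from CloudWatch and compute the ratio. The sketch below uses the AWS SDK for Java v2; the metric and dimension names follow the ones described in this talk and should be verified against the ElastiCache CloudWatch documentation before relying on them, and the cluster ID is a placeholder.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.Dimension;
import software.amazon.awssdk.services.cloudwatch.model.GetMetricStatisticsRequest;
import software.amazon.awssdk.services.cloudwatch.model.GetMetricStatisticsResponse;
import software.amazon.awssdk.services.cloudwatch.model.Statistic;

public class DiskReadRatioCheck {

    public static void main(String[] args) {
        try (CloudWatchClient cloudWatch = CloudWatchClient.create()) {
            // Items served from SSD vs. total GET-type commands over the last hour.
            double diskReads = sumOverLastHour(cloudWatch, "NumItemsReadFromDisk"); // name per the talk; verify in docs
            double totalGets = sumOverLastHour(cloudWatch, "GetTypeCmds");
            double ratio = totalGets == 0 ? 0 : 100.0 * diskReads / totalGets;
            System.out.printf("disk read ratio: %.2f%%%n", ratio);
        }
    }

    private static double sumOverLastHour(CloudWatchClient cloudWatch, String metricName) {
        GetMetricStatisticsResponse response = cloudWatch.getMetricStatistics(GetMetricStatisticsRequest.builder()
                .namespace("AWS/ElastiCache")
                .metricName(metricName)
                .dimensions(Dimension.builder().name("CacheClusterId").value("my-tiered-cluster-0001-001").build())
                .startTime(Instant.now().minus(1, ChronoUnit.HOURS))
                .endTime(Instant.now())
                .period(3600)
                .statistics(Statistic.SUM)
                .build());
        return response.datapoints().stream().mapToDouble(d -> d.sum()).sum();
    }
}
```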

Now, I want to close here and recap what we discussed. The key takeaway is that with data tiering, we are now providing a new price performance option for our customers.

It's the ability to expand the storage capacity well beyond just the memory, adding 4.8 times more capacity using the SSD. Another important point is that this requires no application level changes.

It's completely transparent and it's designed to have minimal performance impact. It's now available in 13 AWS regions.

With that, I'm going to turn this over to Itai to talk about our other in-memory database service and also the latest and greatest features.

Itai (General Manager, In-Memory Databases, AWS): Thanks, Siva and Malik. I'm always excited to see our customers share their success stories.

My name is Itai and I'm the general manager of in-memory databases at AWS. It's so cool to see the journey that Yahoo went on and how they used data tiering to scale to a really massive scale with great performance at a low cost.

Now I will briefly talk about a new service that we launched in 2021, MemoryDB, and then talk about the new features that we launched with ElastiCache this year.

As you can see on the slide, MemoryDB already supports data tiering as well, so you can get the same kind of cost benefits that we talked about with this service too.

All right. So what is Amazon MemoryDB for Redis? It is the fastest durable database that AWS offers today. It has microsecond read latencies and low single digit millisecond write latencies.

A single node can support up to 1.3 gigabytes per second of reads and about 100 megabytes per second of writes. It is compatible with Redis, which means that you can take advantage of all of the data structures that Redis has and all of the rich APIs, with sets, sorted sets, lists and so on.

And you can benefit from the rich Redis ecosystem, with client libraries in over 50 languages. And you get to do that with durability. Like all of our services, it is fully managed; we do all of the heavy lifting.

So you don't have to worry about it and can focus on what matters most to you, which is your business. We do the installation, configuration, monitoring, patching, snapshots and so on.

It is highly scalable: in a single cluster, you can scale up to 500 nodes. And if you use the configuration that we recommend, which is a highly available configuration with one replica per shard, then you can scale to 128 terabytes of data, all in memory with in-memory speed.

It provides durability. One of the key innovations that we built into MemoryDB is a multi Availability Zone transactional log.

The way it works is that every write is acknowledged after being written to three copies in two different Availability Zones. It is backed by the same technology that is used by S3 and by Amazon.com's ordering data.

It is highly available. If you use one replica per shard, we take care of all of the monitoring and fail over when the primary node fails, promoting a replica to primary and then replacing replicas.

And it has security and compliance. It has features like encryption in transit and encryption at rest, it has role based access control for authorization and authentication, it is HIPAA eligible, and it has PCI DSS and SOC compliance. You can try it for free: we have a two month free trial, and you can start today.

So who uses MemoryDB for Redis?

We have a lot of different business segments that use MemoryDB for Redis. We have web and mobile customers using it as a user content data store, for session management, for chat and message queues, and for geospatial indexing. They can benefit from the rich Redis data structures. For example, for session management you can use the hash data structure, which makes it really easy to implement a session store; you can use the list data structure to implement lists, queues and message queues; and you can use the geospatial commands to do geospatial indexing.

We have customers in the retail business doing customer profiles, inventory tracking and fulfillment. We have a lot of customers in the gaming industry. They too use it for session management, and they also use the Redis sorted set data structure for leaderboards. In fact, using the Redis sorted set is one of the most common ways to implement leaderboards today.
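As a quick illustration of the leaderboard pattern, here is a minimal sketch using a sorted set through the Lettuce client against a hypothetical cluster endpoint; the key name and scores are made up.

```java
import java.util.List;

import io.lettuce.core.ScoredValue;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;
import io.lettuce.core.cluster.api.sync.RedisAdvancedClusterCommands;

public class LeaderboardSketch {
    public static void main(String[] args) {
        RedisClusterClient client = RedisClusterClient.create("redis://my-memorydb-endpoint.example.com:6379");
        try (StatefulRedisClusterConnection<String, String> connection = client.connect()) {
            RedisAdvancedClusterCommands<String, String> commands = connection.sync();

            // Record player scores; ZADD keeps the set ordered by score.
            commands.zadd("leaderboard:level-1", 1520, "alice");
            commands.zadd("leaderboard:level-1", 990, "bob");
            commands.zadd("leaderboard:level-1", 2210, "carol");

            // Top 10 players, highest score first.
            List<ScoredValue<String>> top = commands.zrevrangeWithScores("leaderboard:level-1", 0, 9);
            top.forEach(entry -> System.out.println(entry.getValue() + " -> " + entry.getScore()));
        } finally {
            client.shutdown();
        }
    }
}
```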

We have customers in banking and finance using it for fraud detection and for user transactions. We have several customers in media and entertainment; in fact, Netflix was one of the first MemoryDB customers. And we have customers in IoT using it for streaming device data and operational insights.

If you want to learn more about MemoryDB, you can check out the session that we have tomorrow with Samsung SmartThings, an IoT application. That is session 215, happening tomorrow.

So, to quickly summarize the in-memory databases through the lens of durability: on the far left side, you have Amazon ElastiCache for Memcached. It is super fast with microsecond reads and microsecond writes, and it supports the popular Memcached API. But if a node goes down, you lose the data on that node, so if you have a 10 node cluster and you lose a node, you lose 10% of your cache. It is useful primarily for caching and ephemeral data.

In the middle, we have ElastiCache for Redis, which has microsecond read and microsecond write latencies. It is semi-durable because you have snapshot and restore. Also, when you use the highly available configuration with one replica, we do asynchronous replication between the primary and the replica, and if the primary fails, we fail over to the replica. Because the replication is asynchronous, you will lose the tail end of the writes, so you will have some minimal data loss whenever there is a failover. And in a very unlikely and extreme case, you might lose more data, because at the end of the day it is all stored in memory and not on durable media. ElastiCache for Redis is also the most popular solution that we have because of its rich data structures, all the features that we have, and its latencies.

All the way to the right, we have the most durable solution, which is MemoryDB for Redis. As I said earlier, it was designed, like all of our databases, for zero data loss, and it has the multi Availability Zone transactional log to support that - as we said, three copies in at least two different Availability Zones for every write request.

MemoryDB has microsecond read and low single digit millisecond write latencies. So these are the different offerings that we have today for in-memory databases.

So coming back to the main topic here, which is ElastiCache, let's talk about the new features that we launched this year with Amazon ElastiCache.

Before I start, I want to highlight that all of these features are available to you at no additional cost.

We launched a new and improved management console. It is available to you in the AWS console and it simplifies the user experience when creating and managing ElastiCache clusters.

Starting with our Memcached engine, we support Memcached 1.6.12, which has operational improvements, performance improvements and better thread management.

We launched encryption in transit for Memcached using TLS 1.2, similar to the security features that we already offer for Redis. Our Memcached customers also wanted this kind of security layer, so we built encryption in transit on the server side and also on the client side, in two clients that we provide: the Java client and the PHP client. You can find those clients on our website, in the AWS console and on GitHub. Launching support for Memcached encryption paved the way to getting it FedRAMP certified and HIPAA eligible, and we will continue working to get it certified with the same certifications we have for Redis, including PCI DSS and SOC.

We launched IPv6 support. Some of our largest customers have achieved such massive scale that they needed support for IPv6. We support it both for Redis and for Memcached, and we support it in two modes: an IPv6-only mode, which means the cluster will accept only IPv6 connections, and a dual-stack mode where the cluster will accept both IPv6 and IPv4 connections.

Back in the beginning of the year, we launched Redis log delivery through Kinesis Data Firehose and CloudWatch Logs. This is for all of you power users who would like additional transparency and visibility into what's happening inside the engine.

So how can you use it? You can use it to troubleshoot problems, for example a slow command. But what is a slow command in Redis, when we just said it has microsecond reads and writes? Well, Redis also has a lot of functionality: you have the KEYS command that fetches all of the keys in Redis, and when you have millions of keys, this can be pretty expensive; or if you have a very large hash or sorted set and you want to fetch the entire thing, that can be expensive. You can use this feature to do that kind of troubleshooting. And I would like to highlight that this feature is only for operational purposes - by design, we omit all of the customer data from these logs for your protection.
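A common remedy once the logs point at something like KEYS is to switch to incremental iteration with SCAN. A minimal sketch with the Lettuce client; the endpoint and key pattern are placeholders.

```java
import io.lettuce.core.KeyScanCursor;
import io.lettuce.core.ScanArgs;
import io.lettuce.core.ScanCursor;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;
import io.lettuce.core.cluster.api.sync.RedisAdvancedClusterCommands;

public class ScanInsteadOfKeys {
    public static void main(String[] args) {
        RedisClusterClient client = RedisClusterClient.create("redis://my-cluster.example.com:6379");
        try (StatefulRedisClusterConnection<String, String> connection = client.connect()) {
            RedisAdvancedClusterCommands<String, String> commands = connection.sync();

            // KEYS user:* would block the engine while it walks every key.
            // SCAN visits the keyspace in small batches instead.
            ScanArgs scanArgs = ScanArgs.Builder.matches("user:*").limit(500);
            KeyScanCursor<String> cursor = commands.scan(ScanCursor.INITIAL, scanArgs);
            while (true) {
                cursor.getKeys().forEach(System.out::println);
                if (cursor.isFinished()) {
                    break;
                }
                cursor = commands.scan(cursor, scanArgs);
            }
        } finally {
            client.shutdown();
        }
    }
}
```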

We launched support for native JSON in our Redis offering. So now you can store, fetch and update JSON objects in your Redis cluster without needing to do complex serialization and deserialization work. You can fetch a portion of a JSON object and you can also update a portion of a JSON object, and putting this on something like ElastiCache for Redis, which has microsecond read and write latencies, is super powerful and super efficient.

We also support JSONPath in order to do searches within the JSON object. So this is a really cool tool: you get really fast access and really great latencies with a very popular format.

We launched support for AWS PrivateLink. So what is PrivateLink? PrivateLink provides private connectivity between AWS services and your VPCs, so the traffic does not go through the public internet. With a VPC interface endpoint, you can connect your application inside your VPCs to the ElastiCache APIs, and you can also connect through VPC peering to other VPCs, and from on premises using AWS VPN and AWS Direct Connect.

We launched Redis 7. Redis 7 is the latest and greatest Redis version out there. It has key innovations like Access Control Lists v2, Redis Functions, and sharded pub/sub, which scales very well with the cluster mode enabled configuration. ElastiCache supports up to 500 nodes in a single cluster, and we have a lot of customers using pub/sub, a very popular feature, so we really needed something that scales.

I'm also proud to say that my team has worked together with the open source community, and we contributed sharded pub/sub and ACL v2 to Redis open source, so you can find them in Redis 7 open source as well.

And the last feature that we released this year is IAM authentication for Redis. IAM is Identity and Access Management. Until we launched this feature, you were able to do authentication using the native Redis mechanism. Now, you can use IAM authentication to associate IAM roles and IAM users with ElastiCache for Redis users. This gives you a very powerful solution because you can use role based access control for authorization and IAM for authentication, which aligns well with how you do identity management and authentication across all of your applications in AWS.

I'm always excited to talk about the features. We're going to have tons of new features coming up, so just stay tuned.

A quick word about global availability. As you may know, AWS is available in 30 Regions across the world with 96 Availability Zones. Just this year we launched four new Regions, in Spain, Dubai, Hyderabad and Switzerland. ElastiCache is a foundational service: it exists in all AWS Regions and it will be part of every new Region that we launch, so you can take a dependency on ElastiCache when you plan for your global scale.

So that's pretty much it. We really thank you for your patience. We talked about the journey that Yahoo went on and how they used data tiering to achieve massive scale with great performance at a low cost. We also talked about MemoryDB and the new features that we launched with ElastiCache.

Before we go into questions: you can scan this code to get access to webinars, videos, blogs, white papers and everything else you need to get started with ElastiCache.

So thank you so much. Please complete the survey; we really appreciate your kind words. And with that, we'll take questions.
