Driving down the cost of observability (sponsored by Coralogix)

Welcome to the talk: driving down the cost of observability. A little bit about me first, and a little about Coralogix, the company. We're going to talk about some open source techniques, architectural patterns and anti-patterns, things to avoid so that observability cost doesn't become very, very high. And we're going to discuss some of the things you can do with your SaaS offerings as well: some of the questions you should ask, and some of the ways you should grill the sales reps of these SaaS companies with questions they may not have prepared for. They've prepared for most questions, but these ones tend to catch people off guard, and they give you real insight into what that company is all about.

So, I'm Chris. I spent the past 10 years of my life as an SRE and a Java engineer. Ironically, I started as a Java engineer making Java apps, and then I became an SRE supporting Java apps, so I learned about karma, I suppose.

Later on I became a principal engineer, where I was responsible for twenty-something teams building various different types of architecture: serverless, Kubernetes, and much, much more.

The original title of this talk was "Observability Is Too Expensive". I decided to change it for this crowd.

I made this talk well before I worked for Coralogix, an observability company. I made it when I received my first invoice from a well-known observability provider and said the exact words: "Wow, what is this?" It was very, very expensive. I also realized that nobody has ever said the phrase "wow, this invoice is so low", because it never is. It's high.

So there's a real sense in the industry right now, not just that it's expensive (expensive is often OK), but that it's not proportional: the value exchange isn't really even. Sometimes this is down to very high prices and some particularly unfair practices, but often it's just down to inefficiency.

And so what we're talking about today is some of the ways you can be more efficient. A bit about Coralogix: Coralogix is a SaaS observability platform. We process logs, metrics, traces and security data, and we have some really nice features.

And some things that make us completely unique in the market. But the thing that really makes us uniquely qualified to talk about this particular topic is that we're one of the only platforms with a true cost optimization toolkit. Everybody else has some documentation that says "why don't you delete a dashboard", which isn't the most helpful advice.

Yeah, I can sense it. You've been through this. OK. I feel your pain.

We offer comprehensive tooling for analyzing your data in a use-case-based approach, and this regularly drives very significant cost savings, especially for people coming from other SaaS providers.

But even compared with in-house solutions, we drive cost savings as well. So how are we going to break this down? Why is it expensive? What are the trends happening in the market, how are people using data, and what technology changes are driving this? What can we do about it as engineers, meaning the technical and architectural changes we can make with open source solutions? And what can we do about it as consumers, meaning people who are consuming SaaS products and wondering why the invoice is three times what the original agreement said?

We're going to start with some facts, some things that we know about the industry generally.

The first is that it's regularly touted as normal to spend between 7 and 10% of your cloud budget on observability. But some providers are talking about up to 30% being normal, which is wild.

It's not normal, it's crazy. 30% is a lot. Your entire cloud budget is a lot of money, and 30% of that just to monitor things is pretty out there. If you build a house for $100,000 and you want to put some cameras in, and someone says "no problem, that will be $30,000", you're going to be upset, and quite rightly so. This is essentially CCTV for your software. It shouldn't cost as much as it does, and there are ways to drive down that cost. So what's driving the increase in costs? What's at the root of many of these problems?

A few key practices. The first is microservices. Microservices introduce a series of complexities into tracking your architecture. Also, a lot of developers are very microservice-happy. So we have a CMS that takes about 100 requests a day; it needs 15 microservices, of course.

The problem with this is that it multiplies your failure modes. In a monolithic application, generally speaking (I was responsible for monolithic applications in the past), it's either on fire or it's not.

And that's pretty much it. That's all you have to worry about. Sure, you get some really strange memory leaks and things, but most of the time, if everything isn't red, everything's green. With microservices, you get many very strange, weird and wonderful failure modes: based on the network, based on the memory of individual nodes, based on a regional failure because you've balanced across multiple availability zones in AWS, for example. All of these things drive up the complexity.

The increase in complexity drives up the need for data to cover all those different cases; it increases the volume. Then there's the cloud, which has been a blessing: an expensive blessing, but a blessing all the same.

The cloud generally encourages more servers. Many of you will remember the times of having your own in-house data center: there was one grumpy guy, you had to go and beg him for a server, and it would take two weeks.

So you made the most of the resources you had. Now you just click a button and you have a new server. More servers, more data, more failure modes; it all drives things up. And finally, there are things like chaos engineering, or what I used to call chaos engineering, which was just breaking things.

The thing about chaos engineering and the practices that come with it is that you're now introducing new, intentional failure modes on top of all those additional unintended failure modes around network failures, microservices and so on. Each new failure mode interacts with every other new failure mode, and you get this exponential increase in the need for data to cover all those different things.

And if you're doing chaos engineering without monitoring properly, you're just destroying stuff. You have to track it properly.

There's a lot of pain in this room. Yeah. This is like a support group now.

So essentially, the beauty here is that we're doing a lot of these things to ourselves, but we're doing them for good reason. This isn't just self-inflicted pain. We're doing these things because we think they're the best way to move us forward.

So what people often say at this point is: OK, we've got all these things, they're driving up the volume of data, and that's driving up the cost and complexity, so we need less data. Maybe. Sometimes that's the case. Often, though, we don't need less data; we have to look at how the data is actually being used.

If you go to any data scientist and say "I've got this massive silo of information, I want to query it, analyze it, understand it", the first thing they're going to ask you is: how do you want to use this data? How often do you want to query it? How complex are those queries going to be? And so on.

We don't really do that on the observability side of things just yet.

And we're missing out because of that. The first fact that I think is really interesting is that 99% of indexed observability data is never searched. Indexing is the most expensive thing you're going to do with any of your observability data, because you're essentially storing very deep, very rich metadata about the information you've got. The more indices you add, the more complex the job of managing those indexes becomes.

So indexing your data and then never searching it is essentially just a waste of money. The overwhelming majority of indexed data is never searched, and that begs the question: why do we index so much?

The next fact: 95% of call-outs are from the same five errors. If you've done any SRE role, you'll know that "that thing is broken again" is a pretty common phrase. The same things break repeatedly.

Even if it's an error that covers many different services, the same failure modes occur in different servers and different services over and over again. Yet we have strategies that involve declaring hundreds, often thousands, of alerts. That's fine on some platforms, because they don't charge you per alert, but on many platforms they do. And if your strategy involves creating thousands of alerts and only five of them are giving you most of the value, that is a waste of money.

The next is that 99.9% of queries don't go back more than seven days.

Um this is uh a fact that we've found quite fascinating the average retention is typically between two and four weeks for to, to hold things in high performance indexed data. This is primarily focused on logs this particular one. Although it is true for tracers as well,

um slightly shorter for tracers actually. So most of your queries aren't passing seven days, but you're holding on to them for at least twice as long as that on average, often,

uh four times longer than you need to. So, retention periods are longer because we're over cautious. And the reason we're over cautious is because we associate the retention period in high index performance with the, the, that's the time window. We have to do anything useful with the data. After that, it's either deleted or archived and compressed and it becomes painful and slightly expensive to re index and re ingest that data.

So we have this association: indexed means useful, and not indexed means it's basically compressed and gone; it's in the archive, as it were. And my favorite fact is this one.

We've had a few nuanced facts so far, you know, 99% of indexed data and so on. Generally speaking, and I know this is a risky statement coming from a SaaS observability company: if 30% of your data disappeared tomorrow, you would not know. You wouldn't know.

I've seen this in open source platforms and I've seen it in other SaaS platforms: 30% of the data. If you imagine one of your applications is just a little bit debug-log happy, that alone could massively increase the costs. But you never need to know the heap size of a Java application every millisecond; you need to know that at a much lower granularity.

I bash Java a lot for some reason. I don't know why. I think I'm just scarred. Anyway.

So that 30% of data would never add any value whatsoever. And when you think about that earlier statistic, 30% of your cloud budget: if 30% of that is useless, then roughly 10% of your cloud budget, in some cases, is literally just wasted money. All the rest could at least be used in different ways, but this part is literally just wasted money.

All of those facts lead us to a really interesting place. They force us to ask: well, what can we do about it? What are the patterns we're following, how are we managing the data today, and how can we do it better tomorrow? And most importantly, how much impact is that actually going to have?

Is this just an expensive game? Because if you have a lot of data, terabytes of data, it's going to be very difficult to do this for a few dollars. OK, it's going to cost you some money. The game here is to make elegant, efficient use of the data, to build pipelines and architectures, to be data scientists about this problem. That way you can drive pretty significant cost savings.

And indeed, these engineering principles are the foundation of the company I work for, and we're doing quite well. So it's a good indication, a good litmus test, that these things work in practice.

Step one in anything you do: it's easy for me to stand up here and say 30% of your data is useless, the vast majority of indexed data is never accessed, and so on. But this data is crucial, so you can't act on that blindly. If you left the room right now and went and deleted half your data, I guarantee you'd delete the half that you need the most. Guaranteed.

So step one is: create usage statistics. Use Prometheus query logs. Prometheus has a query log mode that will tell you the queries being issued. You can log those, put them into OpenSearch if you want to, and group by query. Now you know the queries that are used the most, and the metrics that appear the most in actual usage.

This method is nice as well because any automated service that polls Prometheus, Alertmanager for example, will show up in that query log too. Something like the sketch below will surface your top queries.
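To make that concrete, here's a minimal sketch of the grouping step, assuming the query log has been enabled via the query_log_file setting in prometheus.yml and that each line is a JSON object with a params.query field; the file path is a placeholder.

```python
# Tally Prometheus query-log entries to see which queries actually run.
import json
from collections import Counter

counts = Counter()
with open("/var/log/prometheus/query.log") as f:  # placeholder path
    for line in f:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or garbled lines
        query = entry.get("params", {}).get("query")
        if query:
            counts[query] += 1

# The top of this list is what people (and Alertmanager) really use;
# metrics that never appear here are candidates for cheaper treatment.
for query, n in counts.most_common(20):
    print(f"{n:6d}  {query}")
```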

OpenSearch has slow query logs, if you're feeling brave. This one is only for the people who like a bit of risk and an adrenaline rush: you can change the definition of a slow query to zero seconds, and then it will log everything, because everything is considered a slow query. If you do this on a busy node, you will quickly start a fire in your server room, so perhaps do it cautiously; but some people have had success doing this for five-minute samples on nodes that aren't crazy busy.

And it does work; it gives you a really clear idea of what's happening. The rough shape of the trick is sketched below.
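A minimal sketch of that five-minute sampling window, assuming a local, unauthenticated cluster; the host, index pattern, and the threshold you restore afterwards are all placeholders.

```python
# Temporarily make every query a "slow" query so it all gets logged,
# sample for five minutes, then put the threshold back. Quiet nodes only!
import time
import requests

BASE = "http://localhost:9200"  # placeholder: your OpenSearch endpoint
INDEX = "app-logs-*"            # placeholder: the indices to sample

def set_warn_threshold(value: str) -> None:
    resp = requests.put(
        f"{BASE}/{INDEX}/_settings",
        json={"index.search.slowlog.threshold.query.warn": value},
        timeout=10,
    )
    resp.raise_for_status()

set_warn_threshold("0s")   # zero seconds: everything is now logged
time.sleep(300)            # the five-minute sample window
set_warn_threshold("10s")  # placeholder: your normal threshold
```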

Um so the idea here is just, just begin by what data is being used and how is it being used? How often does it appear? Does it at certain times of the day? Is it certain query volumes and so on then?

And i this is a step that people always skip.

Um and it it, it results in chaos slightly later on. So once you know what data is being used, define some use cases, the use cases are crucial to solve this problem. The reason why they're important is because if you don't have clear use cases, 3 to 5 is is a recommended number. If you don't have those use cases. What you have instead is everybody's own, very bespoke, very pet definition of how they use that data. It becomes very, very difficult to affect change at scale.

What happens otherwise is that people will say, "well, we only query this on a Monday during a full moon, so it's brilliant." Thank you, that's amazing, we'll keep it there, it's clearly worth the money. So instead, you break the data down into use cases. Some examples: logs that only appear in dashboards and are never queried directly; metrics that you use constantly, the really important ones that matter, where the running of the company depends on regular access to them; or traces that you ingest and never look at again, which is actually pretty common for traces.

Now, you can see that these are use cases based on how people consume the data. They're not super specific, and you will have a bit of a battle on your hands convincing people to categorize your data in these ways, but it's a worthwhile battle. When you get to the end of this process and have a really clear categorization, it becomes super obvious where most of the money is going and which use cases are causing the most cost. Doing that use case analysis makes for a much simpler, more scalable conversation. If you don't do it, it's tricky and challenging. So I'd strongly recommend this step. It's not an engineering step, but it's certainly an important one for keeping people organized.

The next thing. I've been booed at one conference for saying this, and I loved every second of it; it was fantastic. I'm going to do a quick show of hands: who has seen this pattern before? You have data in, say, an OpenSearch cluster; it sits there for about a week on really expensive solid state storage, and then it moves to magnetic storage, or it gets compressed and archived. Hands up, has anyone seen that? Yeah. Keep your hand up if you've implemented it. Yeah. Shame. No, no. I've done this at about 10 different companies, and I now apologize to those companies.

I advocate that we drop this pattern. The reason is that it's not at all use-case based. Essentially you're saying the value of your data is determined by the age of the data; that's the presupposition of this architectural pattern, and it's not true. Some data is never queried in its entire lifetime. Some data is queried constantly for weeks. So this pattern cuts down the value of the constantly-used data and massively inflates the value of the never-used data; you pay for nothing, essentially, when everything moves in time-based windows. And by the way, if you have a small amount of data, this is fine, who cares, the cost savings will be negligible. If you have a large amount of data, this pattern alone has a massive impact on your cost.

Instead, what you can do is route by use case. App one goes into the routing logic and is sent to EBS, for example, to some magnetic-based storage. App two is really, really important; the whole company depends on it. These are info-level or error-level logs, for example (error-level logs are really common for this, by the way); they have to go to SSDs, we have to be able to query them, we can't wait 10 or 20 seconds for a query, it has to be immediate. And app three goes straight to an Amazon S3 bucket, because we might need it for regulatory reasons if we get audited, but otherwise we back away slowly; we don't need it very much anymore, so let's not index it. Remember what I said before: indexing is the most expensive thing you can do, so index cautiously. I'll talk more about that later.
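A minimal sketch of that routing idea, with the apps, log levels, and destination names as illustrative assumptions:

```python
# Route each record by use case (who needs it), not by age.
def route(record: dict) -> str:
    app, level = record.get("app"), record.get("level")
    if app == "app-two" and level in ("INFO", "ERROR"):
        return "ssd-index"   # queried constantly: fast, indexed storage
    if app == "app-three":
        return "s3-archive"  # compliance only: cheap object storage, no index
    return "magnetic"        # everything else: retrievable, but not indexed

for rec in [
    {"app": "app-two", "level": "ERROR", "msg": "payment failed"},
    {"app": "app-three", "level": "INFO", "msg": "audit trail"},
    {"app": "app-one", "level": "DEBUG", "msg": "cache miss"},
]:
    print(rec["app"], "->", route(rec))
```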

Thank you. That was a bad joke, sorry. The idea here is really simple as well. I said before that people associate indexing with usefulness: if the data has been indexed in high-performance storage, then I can access it; if not, I can't do anything with it. Again, a data scientist would say no, absolutely not. What you can do is use something like an Amazon S3 bucket and treat it like a data problem. You can query that archive directly using Athena. Athena will generate a metadata table for you, and it's very cost effective. If you're querying this data very rarely, for example just running large-scale reports once a year, just do this: you'll save a whole ton of money and you'll still get the reports you need. The only thing I would say is that it's a good idea to know exactly what you want to query, because Athena costs can mount quite sharply if you don't know exactly what you're looking for. So it's really important to have a period where you iterate on the query, and then really use it. If you know what you're looking for, it's very, very efficient.
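A sketch of that direct-archive query with boto3, assuming an Athena table already maps the archived objects (for example via a Glue crawler); the region, database, table, SQL, and result bucket are all placeholders.

```python
# Query the S3 archive in place with Athena instead of re-indexing it.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")  # placeholder region

qid = athena.start_query_execution(
    QueryString="""
        SELECT status, COUNT(*) AS hits
        FROM logs_archive.app_logs        -- placeholder database.table
        WHERE year = '2024'               -- partition filter keeps scans cheap
        GROUP BY status
    """,
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

# Poll until done. Athena bills by bytes scanned, so iterate on the query
# against a small partition first, then run it for real.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
for row in rows[1:]:  # first row is the header
    print([col.get("VarCharValue") for col in row["Data"]])
```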

The next thing is to optimize your demand. We've talked a lot about storage, but some data you just don't need, and this is where that use-case-driven analysis comes in. If you haven't done it, you can't do this confidently; you can't block the debug logs of an application with confidence, because you don't know if people are using them. Once you've done your use case analysis, you know nobody is using app one's debug logs; they're just sitting around taking up space, slowing down dashboards, and annoying me. So instead you can just drop them. It's a bit of a brave move, because dropping them usually means you don't have access to the data anymore, but there are cases where it's very useful and very important.
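As a tiny sketch, assuming the usage analysis has already told you which app and level combinations have no consumers (the names here are made up):

```python
# Drop log records that the usage analysis says nobody consumes.
DROP_RULES = {("app-one", "DEBUG"), ("app-one", "TRACE")}  # illustrative rules

def should_ship(record: dict) -> bool:
    return (record.get("app"), record.get("level")) not in DROP_RULES

logs = [
    {"app": "app-one", "level": "DEBUG", "msg": "cache miss"},
    {"app": "app-two", "level": "ERROR", "msg": "payment failed"},
]
print([r["msg"] for r in logs if should_ship(r)])  # ['payment failed']
```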

Most importantly: just because your application produces the data as logs, you don't have to keep it as logs. If you produce a log because it has a number in there that's really interesting, turn it into a metric, stick it in Prometheus, and move on with your life. Have a happy time. Prometheus is much more scalable than any kind of log retention, and you can hold Prometheus metrics for a very long time with very high performance; it will work really well for you. So if you're holding data you could transform into metrics, you have options. For example, if you log into CloudWatch or something, have a Lambda running: consume the log, generate a Prometheus metric, push it through a push gateway into Prometheus, and then delete the original log, or archive it, or do whatever you want with it. But keeping a whole log document just for the sake of a single number is a waste of time and money. Transforming things into metrics is a really nice approach.
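A minimal sketch of that Lambda, assuming JSON-structured logs arriving via a CloudWatch Logs subscription and a reachable Pushgateway; the field name, metric name, and gateway address are all illustrative.

```python
# Lambda handler: pull one interesting number out of each log line and
# push it to a Prometheus Pushgateway, so the log itself can be dropped.
import base64
import gzip
import json

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

GATEWAY = "pushgateway.internal:9091"  # placeholder address

def handler(event, context):
    # CloudWatch Logs subscriptions deliver gzipped, base64-encoded payloads.
    payload = json.loads(
        gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    )
    registry = CollectorRegistry()
    heap = Gauge("app_heap_bytes", "Heap size parsed from logs",
                 registry=registry)
    for log_event in payload["logEvents"]:
        try:
            msg = json.loads(log_event["message"])  # assumes JSON logs
        except json.JSONDecodeError:
            continue
        if "heap_bytes" in msg:  # placeholder field name
            heap.set(msg["heap_bytes"])
    push_to_gateway(GATEWAY, job="log-to-metric", registry=registry)
```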

The good thing about metrics as well is that you can hold on to them for a year and it doesn't really cost you very much at all. The queries will almost always run very fast, and you only keep the thing you really need. That's always nice. I said this before and I'll say it again: index extremely cautiously. Be the annoying person. If someone so much as says the word "index", they should be scared you're going to appear out of a corner and leap at them. Index cautiously. It is by far the most expensive thing you're going to do. It's also the most impactful thing you're going to do, and what I mean by that is: just because something is expensive doesn't mean it impacts you day to day; you'll annoy your CFO, but your life is otherwise unaffected. Indexing is different. Generally speaking, the more data you index, the more complex the challenge of keeping your queries performant, and the more complex the challenge of maintaining your entire storage solution.

For example, if you're using OpenSearch and you index everything, it becomes a different problem to solve at larger volume: you have multiple nodes with different responsibilities, you get split-brain issues and all sorts of complexity. Just being cautious about what you index means you get not only lower cost but a much simpler operational challenge, and that translates into less time spent on whatever cluster of stuff you've got, and more time focused on the product, on what you actually want to do with your job and your life. And I'm just going to say it one more time for a laugh: index cautiously. Really, really cautiously. Be the annoying person. Don't worry about people not liking you. You will save a ton of money, your boss will love you. So index cautiously.

High cardinality in metrics, this is the next thing. I said metrics are very efficient; high-cardinality metrics are like the annoying cousin you have to invite to the family dinner but don't want to be there. They kill Prometheus clusters: they kill memory usage, they kill CPU. There are a few things you can do. You can split up your time series so there aren't so many different values for each individual label. You can aggregate label values so you don't have so many labels on a particular time series. And you can just delete labels; that's an option too. The reason is that Prometheus, like most time series solutions, works really well with lots of simple time series. It doesn't work so well with massive time series, or with lots of them (a few isn't so bad). If you can keep the time series as simple as possible, you're going to have an easier time. So definitely lean hard on avoiding high cardinality.
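One quick way to find the worst offenders is to ask Prometheus itself which metric names carry the most series. A sketch against the standard HTTP query API follows; the server URL is a placeholder, and note that this audit query is itself expensive on a big server.

```python
# Ask Prometheus which metric names carry the most time series.
import requests

PROM = "http://localhost:9090"  # placeholder: your Prometheus server
resp = requests.get(
    f"{PROM}/api/v1/query",
    params={"query": 'topk(15, count by (__name__)({__name__=~".+"}))'},
    timeout=60,
)
resp.raise_for_status()
for r in resp.json()["data"]["result"]:
    _, series_count = r["value"]  # value is [timestamp, count-as-string]
    print(f"{series_count:>10}  {r['metric'].get('__name__', '?')}")
```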

The next one. After my last conference, a guy came up to me and said "I completely disagree with this slide." I was like, cool. So: you don't need all your traces. The vast majority of your tracing data, hopefully, is just saying that everything's OK. (If it's not, I'm sorry, I don't know what to say to you; I'll pay for some therapy for you, maybe.) Most of your data, the vast majority, is a 200: everything's OK. Software engineers aren't so needy that they have to be constantly reassured that everything's fine. So instead, we can use something called tail sampling. Most tracing exporters support this; if you use OpenTelemetry, it supports it. Tail sampling is essentially just being selective about what data you send on, and the power of it is that you don't lose many insights: if there are any errors, they all make it through; if there are any large changes in latency, they make it through. All it's doing is being slightly more selective about how much data you actually send to your provider. This is really useful if you're indexing a lot of data, or if you're using something like Jaeger internally just for performance work, because you make your own life easier. If you're using a SaaS platform, it translates directly into lower costs. This, plus the compression rate on some solutions, will massively lower your costs as well, especially if you get charged by data transfer rates; egress and ingress costs in the cloud can be a sneaky cost you have to avoid.
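In practice you'd lean on the OpenTelemetry Collector's tail sampling processor rather than writing your own, but the decision logic amounts to something like this sketch; the thresholds and the trace shape are illustrative.

```python
# Conceptual tail sampler: buffer a whole trace, then decide once.
import random

SLOW_MS = 500      # illustrative latency threshold
BASELINE = 0.05    # keep 5% of healthy traces as a baseline

def keep_trace(spans: list[dict]) -> bool:
    if any(s["status"] == "ERROR" for s in spans):
        return True                    # every error makes it through
    if max(s["duration_ms"] for s in spans) > SLOW_MS:
        return True                    # latency outliers make it through
    return random.random() < BASELINE  # sample the boring, healthy rest

trace = [
    {"status": "OK", "duration_ms": 12},
    {"status": "OK", "duration_ms": 740},  # slow span, so the trace is kept
]
print(keep_trace(trace))  # True
```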

So, I've talked a lot about open source solutions there, and I want to recap some of the points. What's really important is the set of things we can do as engineers, the things we can architecturally avoid. That pattern of high-performance storage, then magnetic, then low performance: we need to break it, along with the conception that indexed equals useful and not indexed equals basically useless.

We need to break that completely. By doing these things, by going through these processes, you can massively reduce not just the amount of data you've got; you can also be very efficient and very effective in how you use that data.

Treat it like a data scientist would, and you will save money and save time. At the very least, you'll spend less time debugging a broken cluster with 1,000 nodes in it. You'll have a much easier time of it.

So at the very least it will translate into operational simplicity, which is a very significant cost, because as we know, engineers are very well paid these days. If your expensive engineers can focus on the product and not on keeping a creaking cluster alive, that's always a good thing.

So even if it doesn't turn into material cost savings, doing these things will make you a more efficient shop when it comes to managing your observability data.

However, there's the other side as well. Many of you will be customers of SaaS observability providers; indeed, many of you will be both. I've worked at companies that had open source clusters floating around alongside contracts with SaaS observability providers.

It wouldn't be fair of me to only discuss the open source side. So, step one when you're dealing with the SaaS observability side of things: abstract your provider. Be very, very cautious of proprietary software.

I'm not saying that they do this intentionally, but it definitely benefits them: if they can get their code into your codebase as much as possible, your incentive to leave is much lower, because leaving is much more difficult. And if your bill is increasing, you have to add the engineering cost of migrating just to work out whether leaving is even worth the effort.

Vendor lock-in is a serious, serious problem; serious business. Things like OpenTelemetry are nice. Why is OpenTelemetry nice? Most providers integrate with OpenTelemetry; in fact, they have their own built-in exporters, so you don't even need to download extra modules or anything. You just use the image that OpenTelemetry provides.

If you don't like a certain provider, you change a few lines in the configuration and you're now pointing at a different provider. That's it. So OpenTelemetry, and tools similar to it, even things like Fluent Bit or Fluentd, anything open source that's config-driven, are essentially some of the last integrations you're ever going to have to do.
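For instance, with the OpenTelemetry Python SDK, the vendor choice collapses to an endpoint and a credentials header; both values here are placeholders for whatever your provider documents.

```python
# Point the OpenTelemetry SDK at a vendor: changing provider means
# changing the endpoint and the auth header, nothing else.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="ingress.my-provider.example:4317",  # placeholder: change this...
    headers={"authorization": "Bearer <token>"},  # ...and this, to switch vendor
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo")
with tracer.start_as_current_span("checkout"):
    pass  # application work; spans now flow to whichever vendor you chose
```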

Because after this point you can send to different providers; you can shop around. If, for example, you go to buy a car, walk into the Ford showroom, and say "I only want this car", the dealer knows that's the car you want and can charge you whatever they like, because you have no option to walk away.

If you can shop around, they have to be competitive, so your ability to move quickly and easily between vendors improves your negotiating position. And at scale, negotiation matters; as many of you will already know, if you have a lot of data, they never charge you the list price. They will always negotiate a rate with you.

This is really, really important. Being able to move is your leverage as a company: don't be locked in.

This is my favorite question. I love making sales reps sweat; I don't work in sales, so it's great. The essence of it is this: cost optimization is kind of a funny topic in the world of observability right now.

The reason is obvious: cost optimization means you're spending less money. I say this as an advocate for an observability company: we have to get better at this as an industry. It's a real failure to our customers, and for the product category in general, if we're not offering real solutions for cost optimization. Many platforms will say "oh, we do cost optimization", and when you actually get into it, it's a single document, about 500 words long, that says "delete the things you're not using". Not the most helpful piece of advice.

Instead, you need something that will tell you what you're using, how you're using it, and what can be dropped, and that lets you be pre-emptive about it. It's not just about where you're wasting money; it's about building a decent pipeline for your data.

So this is my favorite question to ask, because it's very specific: what tools do you offer for customers to optimize their costs? Why do I like this question? If they've built tools, they've dedicated some of their engineering time to this problem. If they've dedicated engineering time, it's been a priority. Engineering time is almost unanimously the most expensive resource a company has; they have to be very careful about how they allocate it.

If they've spent engineering time on this problem, you know they at least cared about it at some point. If they only have a single doc, a single page, or some very vague advice that isn't very useful, then you know they haven't really committed to solving that problem, and it gives you a sense that the journey ahead may not be as easy as you might expect.

All this is to say that there's a lot less you can do on the SaaS side, because obviously you don't own the infrastructure or the solution itself, so it's all about being a savvy consumer. But just to go back a step and make this point again: if you're locked in, nothing else matters. If you're locked in, they can do whatever they want, and they know they can do whatever they want, and it takes a very serious amount of push for you to move.

So I just really want to double down on that point: avoid the lock-in. If you can avoid the lock-in, that's 80% of the battle won right there. And the second thing, if you're looking to avoid those scary costs, is to make sure you ask this question. Like I say, it's my favorite. It points towards incentive; it points towards intention within the company.

Lots of companies have very slick people who can talk very well about various topics. This question is very difficult to talk around: they either have the tools or they don't. Documentation is not tools. Tools are interactive pieces of the platform that I can work with. What features have you built to solve this problem?

So, just a quick word on Coralogix. We do have cost optimization tools; we took a bit of a leap a few years ago and went all in on this. One of the things we noticed in the industry was that there are a few key problems with pricing.

One: it's not very transparent. There are various different ways you're going to be charged money, based on different services, different data types, different retention periods, different channels through which the deal came, and so on.

This is actually a bigger problem than costs simply being high. If it's expensive but you know how much you're going to pay, you can budget for that; it's rough, but it can be dealt with. If you don't know how much it's going to cost you until the invoice arrives, you're in serious trouble; that's very difficult to work with as a business. Every time a new invoice comes, it's a new war you have to fight to shave it down to the amount you think is fair.

So cost transparency is really, really important. That's why we show you very clearly how much you're spending. We also offer a use-case-driven data management approach, so you can apply, inside the platform, that engineering principle we talked about before of being data-focused and use-case-focused in how you optimize your data. It's called the TCO Optimizer.

Essentially, you categorize your data into three use cases: frequent search, monitoring, and compliance. Frequent search is indexed data that's used all the time, queried all the time. Monitoring data gets basically everything frequent search gets, just without the indexing.

So you can drive dashboards with it, trigger alarms, generate metrics, train machine learning models; you can do all sorts with it. It just doesn't get indexed into high-performance storage.

And the compliance level goes straight to an archive. Now, an archive in Coralogix is different: we store the data in S3, in your own cloud account. Because it's in your cloud account, the only thing you pay for is the cloud storage cost.

Plus the ingestion cost. Each of these levels has a different discount associated with it. If you're in frequent search, you'll pay essentially the list price, whatever the normal rate is, because you're getting all the features, all the indexing, everything. Monitoring immediately gets you a 65% discount, just by not indexing. That's because indexing is the most expensive thing you're going to do: you help us avoid it, and we charge you a lot less. Compliance comes with something like an 85 to 88% discount,

because it's going straight into an archive, the archive being cloud storage in your account. Because it's in your account, there's no markup involved; you pay just the cloud storage costs and the ingestion costs. That's it.

Now, one of the things I said before is that the relationship between indexing and usefulness is a very tightly coupled idea right now. So what we do is this: once your data is in the archive, if you ever want to use it again, you don't have to re-index. Instead, you go into the log explorer screen you can see here, and where this arrow is pointing, there's a button that says "archive". You're now in archive mode: you issue queries directly against your archive, at no extra cost, without the need to re-index. You can query as much data as you want.

The only limit I'm aware of is a five-minute timeout on the query, so you can cover a lot of data very quickly for no additional cost. What this means is that you can have a very large, growing data set without the very large, growing costs that usually come with it. You don't have to index everything, you don't have to manage a crazy cluster of stuff; it all just works really nicely.

S3, for example, is really nice because it can deal with huge volumes of data, and nobody has to care about it except our good host today, Amazon.

We also offer over 200 SaaS and open source integrations. So if you have an existing tool, you can use that, or you can connect with completely open source solutions. We actually contribute directly to OpenTelemetry as well;

we're approvers on some of the repositories, especially on the Kubernetes side of things. The reason we do that is because it gives you a nice negotiating position. We also do managed onboarding for you, which we offer at no extra cost.

We will install OpenTelemetry for you. We're not here giving you a proprietary magic Coralogix log collector that does the same thing as every other magic proprietary log collector. It's OpenTelemetry. Life's much, much easier.

It works really well, the documentation is great, and there's a big community around it. Why would we waste time reinventing the wheel when this thing already exists and works very well? We also offer various things like Kubernetes operators.

If you're big on infrastructure as code, or monitoring as code, you can declare Kubernetes CRDs and custom resources, and the operator will act on them: it'll create dashboards, alarms, TCO Optimizer rules, and much more. And finally, there are custom endpoints, if you have downstream services you wish to invoke.

So, for example, if an alarm fires and I want to invoke this endpoint here, which triggers a script that does whatever, you can do that with downstream integrations. You can also do it with custom endpoints internally: we can expose an API on the Coralogix side, and you can send your logs straight to that API.

You can send metrics and traces straight to this API as well. So it's very flexible, and the idea is that we connect precisely how you need us to, while using as much open source as possible.

So we're not bogging you down with vendor lock-in; it's a much easier proposition for you. If you don't like us, you can just switch the config and move on. No one's ever done that yet, so it's all good.

So that's most of the talk. I just want to do a very quick recap of some of the things I've covered.

One thing is the indexing. If there's anything you leave the room with today, be fearful of indexing. Wake up at three in the morning screaming the word "indexing". Be terrified of it. This is a thing to avoid; if you can avoid it, you will lower your costs, guaranteed, and at the very least you'll greatly reduce your operational complexity.

And the last thing is asking that question: what tools do you offer for cost optimization? Not what documentation, not what $25,000 consultancy you offer, and so on. What tools can I use on a self-service basis to optimize my costs? That will give you an idea of the intention and earnestness of the company you're working with.

So, if anyone's got any questions, feel free to grab me; I'm more than happy to answer them. There's a bunch of QR codes there for various social things.

Thank you very much for attending. I really appreciate it, and hopefully you've learned some lessons today about how to optimize your costs, be more efficient with your data, and make more elegant, more responsible decisions about the data you've got, to lower your costs but also make your lives much easier from an operational perspective. Thank you very much, everyone. Cheers.
