Observability best practices for hybrid environments (sponsored by Splunk)

Hi everybody. How's everybody doing? Come on. I know it's after lunch. I know everybody's in a food coma and tired from walking around the expo. We can do better than that. Come on. How are we doing? There we go, a little better. One more time. How are we doing? That's better.

Alright. Welcome to my session. Thank you for showing up. Today we're going to be talking about observability best practices in hybrid environments. So let's get right into it.

So that is me. My name is Chris Crocco. I am the Director of Observability Technical Interlock for Splunk, and really, that's just a fancy way of saying that I am the advocate for customers and partners to our product team and vice versa. So I let them know what it is that you're asking for in our products, and I let you know what we can realistically do and where we're going, and kind of make sure that everybody's level set so that you're successful. And we have a lot to cover today.

So we're going to talk a little bit about what hybrid means, because that means something different to everybody, so we're going to level set on a couple of different ways hybrid can exist. Observability versus monitoring: a lot of times that gets conflated, so we'll clarify that. Standardizing data collection with OTel, some best practices on using OTel and observability in general, and tiering your data. And then of course, I'm from Splunk, so I've got to show you our fancy new toys in Observability Cloud. We'll take a quick look at those, and then leave some time for Q&A today.

Alright, so common scenarios for hybrid environments. When we hear hybrid, it can mean a couple of different things. It can mean that you have an on-prem environment and a cloud environment, where that on-prem environment is for compliance, or it's your legacy monolith stack that is so old that you can't break it into microservices, or whatever else it may be. It could be workload migration, where you are in the midst of moving that on-prem or legacy environment into the cloud or into multiple clouds, and you're in some stage of that migration journey. And it can be geographic availability, right? You may have the same service running in multiple AWS regions or multiple AWS tenancies to support different geographies with the same services. So these are really common scenarios that we tend to see a lot when we talk about hybrid, and all of them present relatively the same challenges even though they have different requirements.

And what we're seeing a lot of, in terms of complications with those hybrid environments, is the amount of long-tail assets and the complexity that comes with that long tail, as well as your new technology stack, where you're having to support a lot more services, a lot more infrastructure, a lot more data sources and data emitters, and more points of failure in your environment. And usually what that means is you're going to wind up with a tool for each one of those things, right? You might have one tool for your old Microsoft stuff, another for all your old network gear, and AWS dashboards for your EC2 instances, and it's a whole bunch of stuff that you have to manage, which can be pretty complicated.

In addition to that long tail of assets and things that you're going to have to manage just in hybridization itself, there are other challenges of having hybrid environments on their own. The first one is lack of visibility. So even though you may have that long list of tools, one for each thing, not being able to see how that hybrid environment is operating is a problem, as is a lack of visibility when you're migrating something from a monolith into microservices and making sure that you don't lose fidelity or cause problems for your customers.

Again, those complex tool sets. So if you're going from something that was, you know, a compiled application back in the day, and you're moving that into a microservice environment that's on EKS, and you're doing Ansible and new languages and all sorts of things that add complexity, your tool set becomes a problem.

Poor MTTR. If you don't know where everything is, if you can't see it, if you have that complex tool set, you wind up with a lot more finger pointing than you had in the past, because there's a lot more responsibility and a lot more points of failure. And then scaling difficulties, right? If you have to scale vertically or horizontally and you're in a hybrid environment, that can create some challenges, because not all microservices are scaling at the same time. So you need to be able to understand what scaling one microservice does to others in your environment.

So, observability versus monitoring. At Splunk, we talk about this all the time; this gets conflated a lot. What we mean when Splunk talks about observability is a practice used by software developers, SREs, IT ops teams, and others to improve digital resilience and lower the cost of unplanned downtime.

So that doesn't mean that you're waiting to react to something; it means that you are being proactive in ensuring uptime and availability so that you don't have to react. So when you are talking to your boss about monitoring versus observability, these are some things that you can show that tell the difference, right?

Monitoring is, again, a very reactionary approach to interacting with your systems and your infrastructure. It tells you whether something is working or not. Usually you have some kind of alert or trigger, and it's a collection of metrics and logs specifically from a system for that purpose. It's very failure centric. Again, you're waiting to react to an adverse condition happening in your environment, waiting to respond, and it's something that you do, right?

We've all been in environments where you have a traditional NOC or a traditional SOC, and you have a room full of people with a giant flashy-looking board full of dashboards waiting to react to things. But it's something that you do, and then you are responsible for monitoring those environments, for being proactive and kind of herding the sheep, so to speak, at all times with monitoring.

Whereas observability is kind of the opposite of that. It's a measure of how things are performing, and it lets you know why it's not working. It's not just what, but why; or rather, it's what and why. It's useful insights from those data: not just the data telling you what went wrong, but being able to see the context of the first point of failure, of the root cause, and get to a faster MTTR. And it's about overall behavior.

So again, observability is not about reacting to adverse conditions; it's about measuring the current state of a service or a system. If you just need to make sure that the code push that you did, or the version that you put out, or that replication that you did into another region is working successfully, that's what it does, right? It's not just waiting for it to break. It's something that you have.

Most folks that are doing observability, that's not their day job, right? An SRE's day job is not just to stare at dashboards all day long. It's something that you have as an asset that allows you to do your job more effectively, and you make yourself observable. So when you create a microservice, when you create a containerized environment, you make it observable to your observability pipelines so that you're able to see it automatically.

And the building blocks for this are three primary data sets. I'm sure some of you here have heard of the three pillars, but for those of you who haven't: first is logs and events, second is metrics, and third is traces. Each of these gives you a very important component of observability, and you need all three in order to truly be successful.

Logs and events give you the why; it's the smoking gun and the context of what happened. It's your information-rich event. Metrics are telling you the performance of an environment, so think of them as the gauges on your car, right? It's the same thing: it lets you know when you need to put gas in, it lets you know how fast you're going. That's what metrics are there for, to let you know your performance. And traces tie all of that together to let you know how those systems are interacting with each other, and whether you have points of failure between your microservices, or between your monoliths and your microservices, or whatever other hybrid architecture you may have.

So let's talk about some best practices for implementing this. Everybody has a different approach to implementing observability, and nine times out of ten it's missing one crucial thing, which is: how do I get the data, how do I make it uniform, how do I make it consistent across everything that I'm looking at, whether it's on prem or in one cloud or multi-cloud? And the first best practice that we recommend is standardizing data collection with OpenTelemetry.

Splunk is the number one contributor to the OpenTelemetry open source project. We are putting all of our eggs in that basket from an observability perspective, because we understand the importance of those open standards and those best practices being shared, so that when you move from one service or one system or one piece of infrastructure to another, everything's familiar, consistent, and giving you what you need.

The second thing is to maximize the value of data tiering, or you may hear this referred to as data rebalancing. What that means is: if you're used to collecting metrics from logs, just collect them as metrics and use your logs for other things. We see this a lot where, if you're using a legacy system, all of your CPU, memory, IOPS, et cetera for maybe a hypervisor is in a log. When you're going to OpenTelemetry, you have the opportunity to refactor that into a higher fidelity, more granular metric that's going to be more performant for you in the long run. And that's just one example of data tiering; we'll talk more about that in a little bit.
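To make that rebalancing concrete, here's a minimal sketch of reporting a hypervisor-style CPU reading directly as an OpenTelemetry metric instead of writing it into a log line and re-parsing it downstream. This is illustrative, not Splunk guidance: the meter setup, metric name, attribute values, and the psutil dependency are all assumptions for the example.

```python
# Minimal sketch: emit utilization as a first-class OTel metric rather than a log line.
# Assumes opentelemetry-sdk and psutil are installed; names and attributes are illustrative.
import time

import psutil
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader

# In practice you would swap ConsoleMetricExporter for an OTLP exporter pointed at
# your collector; the console exporter keeps the sketch self-contained.
reader = PeriodicExportingMetricReader(ConsoleMetricExporter(), export_interval_millis=10_000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = metrics.get_meter("hypervisor.host")


def observe_cpu(options):
    # Called by the SDK on each collection cycle; yields one gauge data point
    # carrying the same standardized attributes you use everywhere else.
    yield metrics.Observation(
        psutil.cpu_percent(interval=None),
        {"host.name": "hv-01", "deployment.environment": "prod"},
    )


meter.create_observable_gauge(
    "system.cpu.utilization_percent",
    callbacks=[observe_cpu],
    unit="%",
    description="Host CPU utilization reported as a metric instead of a parsed log",
)

# Keep the process alive long enough for the periodic reader to export a few times.
time.sleep(30)
```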

So, standardizing data collection with OpenTelemetry. Again, we're very, very big advocates of OpenTelemetry at Splunk, and we continue to support and contribute to that project going forward. And the reason, the why behind OpenTelemetry and why we're so excited and invested in it, is that it addresses some of the common hybrid data collection issues that we see out there.

So if you're used to proprietary agents from multiple vendors, or writing everything into a syslog environment and pulling it out of there when you need it, or, you know, just sending everything through a Kinesis Firehose and hoping it makes it to a destination: am I speaking anybody's language here? Has this happened to anybody before? Yeah, a couple of folks. Those all create problems, because you're disconnected from the visibility across your entire environment.

It also increases a lot of noise, whether that's event noise or metric noise or trace noise. You're not able to see everything that you need in a consistent fashion, and you're having to go dig for those needles in the haystack, and that puts you back in that monitoring realm. It also deals with the problem of partial data collection or correlation, where you may not have a standardized set of tags or attributes or fields that allow you to correlate a metric with a log, or a trace with a log, or all three together.

So it addresses the fact that you need normalization and standardization in your environment, and then siloed data and no single source of truth. If you just let your developer do standard out on everything and call it a day, that can cause a lot of problems, right? So work on breaking down those silos, creating consistent standards, and making it easy to adopt for your dev and cloud ops teams.

So why OpenTelemetry? You know, there are a lot of other open source projects that do this kind of thing, but OpenTelemetry is kind of the conglomeration of a couple of other open source projects, starting with OpenCensus, and it's a way for you to do, again, not just a single type of data collection. So not just metrics, not just traces, not just logs, and again, not in a vendor-proprietary way.

And it allows you to extend a single collector for a lot of different use cases. There are SDKs and frameworks for that tracing and code instrumentation that continue to get built out and supported by the community.

And it allows you to do a lot of the processing that you might otherwise do further downstream right in the ingest stream, in kind of an aggregator or intermediary tier closer to the data emitter. So you have a few concepts, which we'll talk about, that allow you to do that normalization almost in real time.

So again, speaking of real time: latency. If you're shifting left, if you're shifting closer to your data emitters, whether that's Kubernetes or the application itself or some service that you're consuming from AWS, the closer you get to it, the lower your latency is going to be as you're processing in your ingest pipeline.

It also helps with your traffic. If you're not aggregating everything into an intermediary tier to do all of this processing, it makes it a lot faster to get it into your receiver and your back end, where you're going to consume all of those metrics, traces, and logs much more efficiently and at a higher scale.

It also lets you find errors in your environment a lot faster. If you're doing a lot of this from your OpenTelemetry collection and you're standardizing it, then latency, traffic, errors, and saturation all become very easy once the data is in that back end: you know exactly what you need to look for, and it's consistent every time. And then saturation of your environment: again, you're able to see very quickly when you're saturating a service, or when you have services that are underutilized that you might rebalance or scale down. These are the benefits, and these are what we call the golden signals.

These are things that are going to be applicable in every part of your hybrid environment, whether it's on prem, multi-cloud, or just AWS; all of them can be repeatable in every part of your environment. And this is kind of what we're talking about.

So I know I said a lot of buzzwords there, and a lot of, you know, what does that actually look like? But this is really the intent.

So who has microservices that look kind of like that, where that question mark is there, or like, I've got a whole bunch of stuff that I just decoupled and I have no idea who's talking to what right now? Look a little familiar? Yeah.

So one of the benefits of OTel, particularly with distributed tracing, is finding those relationships between your distributed cloud native services and those services that may be in other clouds or on prem, and showing the relationship between them as interactions are taking place in your environment.

So instead of finding needles in haystacks, this is really finding trails in the woods. And instrumenting with OpenTelemetry is actually relatively easy. It has a lot of things that are pretty common in individual components and other types of data collection, which have been, again, conglomerated to make it a lot easier to do.

The first thing is receivers. That's your inputs, what you're collecting data from. Those can be push based or pull based; those can be anything from sending HTTP Event Collector data from an old Splunk forwarder into it, or pulling from, you know, CloudFront or whatever it may be. But that's your input.

Processors are what you're going to do with it in the collector itself. So if you're going to do things like standardize your tags based on which tenant you are in a multi-tenant environment, as an example, or put a geographic tag or a service tag on something, all of that processing takes place in your OTel collector. And then exporters are how you get it out, where you're going to send it to. Obviously Splunk is a supported one, and we have the Splunk distribution that makes it really easy to do that, but it's not the only one. If you want to send it to S3 or Amazon Security Lake or some other analysis or visualization tool, there are a lot of exporters out there, and again, this continues to grow as the community contributes to this project.

And extensions are things that you can do typically outside of that standard receive-process-export flow. So this can be things like enrichment or other more complex capabilities that you want to run in the collector itself prior to egressing that environment.
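To show how those four pieces fit together, here's a minimal, generic collector configuration sketch. It assumes a collector build that bundles these components (the contrib and Splunk distributions do); the endpoint is a placeholder and the attribute values are illustrative, not a recommended schema.

```yaml
# Minimal collector sketch: receiver -> processors -> exporter, plus one extension.
extensions:
  health_check: {}            # extension: liveness endpoint, outside the data path

receivers:
  otlp:                       # receiver: accepts pushed OTLP traces, metrics, and logs
    protocols:
      grpc: {}
      http: {}

processors:
  resource:                   # processor: standardize tags/attributes close to the emitter
    attributes:
      - key: deployment.environment
        value: production
        action: upsert
  batch: {}                   # processor: batch before export for efficiency

exporters:
  otlphttp:                   # exporter: hand everything to the back end of your choice
    endpoint: https://backend.example.invalid:4318

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlphttp]
```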

So there are a couple of other things that you can do outside of the collector itself. What we just talked about is what you might think of traditionally as the agent piece; it's referred to as the collector.

But there are other components of OpenTelemetry that allow you to extend that capability and get a lot of what we've been talking about so far.

So, additional code for one-time inspection via the tracing system. Again, there are lots of SDKs and code libraries that continue to get enriched, enhanced, and published, that allow you to do anything from Ruby to Java to JavaScript, and that list keeps growing.

It captures function call duration and custom metadata, and can model the call graph where applied. So again, if you're getting into those more complex capabilities, where you may have some custom metadata for a net-new microservice that needs a lot more fidelity in that metadata so that you can interrogate it a little bit more once you receive it in your analytics system, it allows you to do that.

You can modify application behavior by injecting custom headers, and you can use standardized interfaces to track the app, client, and framework state. This is something that we hear a lot about in terms of issues and complexity, where it becomes really hard and not everybody is using the same thing. It gives you a standardized way to see that app and client interaction and framework, and to assign things in a way where you kind of have the same thing happening everywhere, and it can be applied dynamically.

I've had a lot of conversations with people just in the last day about how much of this they're going to have to do manually every time they spin up a service or horizontally scale a service. It can be done without much user involvement. Particularly if you're using specific distributions like the Splunk distribution that's pre-optimized, it's pretty close to plug and play, you can customize as you need, and there's not a lot of manual instrumentation required.

I've been talking a little bit about instrumentation libraries. Here's an example of some of the ones that are supported right now, and you can see we have instrumentation libraries for all of the different things that you might want to do. There are application libraries, there are RUM libraries, and then there are serverless libraries. I've heard the question a lot over the last couple of days: can this support Lambda? Yes, absolutely. There's a library for that, and it continues to be enhanced.
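As one hedged illustration of the serverless case, the sketch below uses the upstream Python contrib package for Lambda. The handler body is a placeholder, and this assumes the code runs inside the Lambda runtime; in practice a vendor distribution or the OTel Lambda layer often wires this up (plus the SDK and exporter) for you, so treat it as one option rather than the prescribed approach.

```python
# Minimal sketch, assuming the opentelemetry-instrumentation-aws-lambda contrib
# package is bundled with the function and an SDK/exporter is configured elsewhere
# (for example by a Lambda layer). The handler body is illustrative.
from opentelemetry.instrumentation.aws_lambda import AwsLambdaInstrumentor

# Wraps the configured handler so each invocation becomes a span, with inbound
# trace context extracted from the triggering request when it is present.
AwsLambdaInstrumentor().instrument()


def handler(event, context):
    # Your normal function logic; the surrounding span is created for you.
    return {"statusCode": 200, "body": "ok"}
```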

So all of these libraries provide you a lot of capability but still make it standard to collect all of these things regardless of what you're instrumenting.

And this looks a little complicated, but once we break it down, we're going to talk a little bit about how all of it comes together.

From an application perspective or a serverless perspective, you have your code-level instrumentation going to that collector piece, again, that agent piece, which is doing the receiving, the processing, and the exporting.

You'll have your collector gateway, which is that additional control if you need additional points of egress or additional aggregation taking place.

And all of this is happening within your edge, within your boundary, right? And then from there, those exporters are able to send the cumulative output of all of this OpenTelemetry work into things like Splunk Observability Cloud, which we'll look at in a little bit, or Splunk Enterprise if you're using that, or, like I said, a data lake or third-party tools.

So if there's stuff that's not operationally relevant, but you still need to keep it for SOC 2 or PCI compliance, OpenTelemetry can route that appropriately depending on what that data is and how you're going to use it.

All right. So we're going to talk a little bit about how to implement all of this. I know that was a lot of complex stuff. Let's talk about the best ways to put it into your hybrid environment.

So the first thing to talk about is implementing context propagation. So context is incredibly important and making sure that you're doing that fully and propagating that across your hybrid stack is really, really important so that you can get that end to end visibility once you receive the data.

Standardizing tags and attributes: having a standard set of bare-minimum tags and attributes that every single service and every single infrastructure component in your environment is going to use allows you to do that correlation much faster and drill down to see the relationships between infrastructure and application and their interactions with each other.

Defining golden signals: we talked a little bit about the golden signals earlier, and how defining them earlier and closer to your data emitters provides you a lot of value. But define those golden signals in addition to the ones that are out of the box and well known: if there's something else, like an SLO or another standard that your business needs, and it's going to impact every single component of your hybrid environment, make sure it's defined.

And then utilize standard detection and alerting for critical alerts. Understand what critical means, understand how to detect it, and make sure that those standards are propagated across your environment so that everybody is using them, regardless of what the service is, how ephemeral it is, or where it's rolling out.

We talked a little bit about context propagation and what that means; this is kind of the most important part of instrumentation, and we're talking mostly about the application layer at this point.

So we have a couple of do's and don'ts here when it comes to context propagation.

Use auto-instrumentation libraries as often as possible. One, it's going to save you a lot of time and a lot of headaches, but it's also going to allow you to use the out-of-the-box, standardized propagation components that are in those libraries, so that you're not building something from scratch or having something that's unique or outside of your standards in your environment.
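For instance, here's a minimal sketch using two of the upstream Python auto-instrumentation libraries (Flask and requests). The route, port, and downstream URL are placeholders, and a vendor distribution such as Splunk's can wire this up without code changes, so treat this as one illustrative option rather than the prescribed approach.

```python
# Minimal sketch, assuming Flask plus the opentelemetry-sdk,
# opentelemetry-instrumentation-flask, and opentelemetry-instrumentation-requests
# packages are installed. No exporter is configured here to keep it short; a
# distribution or the opentelemetry-instrument launcher would normally add one.
import requests
from flask import Flask
from opentelemetry import trace
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.sdk.trace import TracerProvider

trace.set_tracer_provider(TracerProvider())
app = Flask(__name__)

# Inbound requests get server spans, outbound `requests` calls get client spans,
# and trace-context headers are injected and extracted for you automatically.
FlaskInstrumentor().instrument_app(app)
RequestsInstrumentor().instrument()


@app.route("/checkout")
def checkout():
    # The downstream call carries the propagated context without extra code.
    downstream = requests.get("http://inventory.internal.example/stock")  # placeholder URL
    return {"inventory_status": downstream.status_code}


if __name__ == "__main__":
    app.run(port=8080)
```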

Standardize on a header format that aligns with your service distribution. There are a couple of different header formats out there that you can use; just make sure that you pick one and standardize it across all of your hybrid services.

Configure span kinds based on their use. A span is part of a distributed trace; it's that individual component of an interaction transiting all of your services, and you should configure those span kinds based on what that service's use is going to be.

And then, again, standardize attributes and tags. You're going to hear me say this a lot: attributes and tags. Make sure that those are common across your application and infrastructure, so that you're making it very easy to see the correlation and causation that's going to be there.
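Pulling those do's together, here's a minimal sketch with the upstream OTel Python SDK: one standardized header format (W3C Trace Context is shown, but the point is to pick one), an explicit span kind, and a small set of shared attributes. The service name and attribute values are illustrative assumptions, not a prescribed schema.

```python
# Minimal sketch using the upstream OpenTelemetry Python SDK; values are illustrative,
# and no exporter is configured so the spans stay local to this example.
from opentelemetry import trace
from opentelemetry.propagate import set_global_textmap
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.trace import SpanKind
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator

# Pick one header format and standardize on it across the hybrid estate;
# W3C Trace Context is used here as the example.
set_global_textmap(TraceContextTextMapPropagator())

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer("payments.api")

# Set the span kind to match how the code is actually used (SERVER for an inbound
# request handler, CLIENT for an outbound call, and so on), and attach the same
# standardized attributes you use everywhere else.
with tracer.start_as_current_span("process-payment", kind=SpanKind.SERVER) as span:
    span.set_attribute("service.version", "1.4.2")
    span.set_attribute("deployment.environment", "prod")
    span.set_attribute("cloud.region", "us-west-2")
```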

A common don't is using multiple auto-instrumentation libraries in the same application. If you're using the Splunk distribution of OpenTelemetry, you may not want to use another auto-instrumentation library on top of that, for performance reasons if nothing else. You're going to create a lot of back pressure and performance problems in the application itself, because it's having to run those libraries in addition to whatever else it's intended to be doing.

Don't manually instrument without knowing the performance implications. If you're tinkering with just the standard open source version of OpenTelemetry, don't just start throwing things in manually or changing code without understanding how that's going to impact the code. It's a similar potential problem to using too many auto-instrumentation libraries.

Don't instrument with unsupported headers. Again, this gets back to standardizing which header it is. If you know what the back end is that you're going to be sending to, and you have a standard for the headers it's going to be using, don't mess with it. Stick to that standard, otherwise you're going to start losing things, and it's a bad, bad day.

And set propagators locally: what this means is, if you're setting this at a global level, you may run into issues where there are components of your instrumentation or of your application that need to have that set locally, so that you're not running into an issue where that propagation is delayed or creating back pressure or latency elsewhere.

All right. You've heard me say tags and attributes, I think, four times now, but I'm going to say it probably four more times. Tags and attributes, again, are super duper important for being able to see what is actually broken and follow those breadcrumbs.

Use the same naming convention within your services. This isn't just camel case or snake case that we're talking about here. If you're naming everything in your AWS regions with us-east-1 or us-west-1, make sure that those are common and you're using the same convention, even if it's not the same name. That allows you to make sure that the propagation is going to work correctly and that the correlation is ensured.

Use tags, attributes, and workflows. Tags and attributes are super duper important, but a workflow is going to let you know, from a tracing perspective, what is actually taking place, how it's impacting the service, and what part of the workflow is having a problem, and it gives you a much cleaner view of what you're going to need once you're interrogating that data.

Use semantic conventions common to both apps and infrastructure. What this means is that even though you might be setting tags and attributes from an application instrumentation perspective, the application is running on infrastructure, probably a Kubernetes environment or something similar, and you need to be able to pass or find those tags and attributes for the infrastructure as well as the application.
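Here's a minimal sketch of that idea, using OTel semantic convention keys that exist on both sides. The values are placeholders, and in Kubernetes most of the k8s.* keys could equally be filled in by the collector's resource detection instead of in code.

```python
# Minimal sketch: declare shared resource attributes once, using semantic convention
# keys that apply to both the application and the infrastructure it runs on.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

resource = Resource.create({
    "service.name": "cart-service",
    "service.version": "2.7.0",
    "deployment.environment": "prod",
    "k8s.cluster.name": "prod-us-west-2",
    "k8s.namespace.name": "storefront",
})

# Every span emitted by this process (and, with matching providers, every metric
# and log) now carries the same correlation keys as the infrastructure signals.
trace.set_tracer_provider(TracerProvider(resource=resource))
```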

"So make sure that those symantec conventions are, are common a common don't and this is one that we see all the time is don't overuse custom tagging. So if you have some developers that are, are making some super neat custom metadata that they want to pull out and it's just for their one little microservice and they have 45 different tags that they want to add. I mean, that's, that's nice. But let's find how that works with the rest of your services, the rest of your hybrid environment and standardize there.

Custom tagging can create a lot of other issues when you're trying to find things in your environment. Allow missing tags and attributes in traces: the reason this is important, particularly in hybrid environments, is that you may not have those tags and attributes fully deployed everywhere, right? Or you may not be able to deploy those tags and attributes in particular spans within your trace, but you still need that span to come through so your trace is complete. And then there's setting tags and attributes without documenting them.

Who has had a developer set something, write some code and not document it, and then they move on to the next job and you're like, what were you doing? Why did you build this? And then you go find it on Stack Overflow and you're like, oh, copied and pasted, right? Anybody have that? Yeah. So document your tags and attributes just like you would any other piece of code. And then don't deprecate legacy schemas until the replacement is validated.

This is really, really important when you're doing a cloud migration or a lift and shift, where you might have some legacy schemas for an old code stack that you're breaking into microservices. Until you have validated that those microservices are working, you may need to roll back. So make sure that you have that capability, and make sure that you've validated everything that's going to be replacing it before you deprecate it in the environment.

So, standard detections and alerting. Again, this is that MTTR, MTTD, MTTx piece. One thing that's a really good common practice is to include traces and errors in events and logging, right? A good example of this is: if you have an invalid request come through in a span during a trace in APM, you may want to include that span's invalid-request information in your logging, so that you can tie it to something like an invalid API token. It gives you additional context and allows you to follow the breadcrumbs all the way through, across the three tiers of data.
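One hedged way to get that trace-to-log linkage in Python is the upstream logging instrumentation below, which stamps the active trace and span IDs onto standard-library log records. The tracer name and log message are illustrative, and other languages and log frameworks have equivalent hooks.

```python
# Minimal sketch, assuming the opentelemetry-sdk and
# opentelemetry-instrumentation-logging contrib packages are installed.
import logging

from opentelemetry import trace
from opentelemetry.instrumentation.logging import LoggingInstrumentor
from opentelemetry.sdk.trace import TracerProvider

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer("auth.api")

# set_logging_format=True configures a log format that includes the injected
# otelTraceID / otelSpanID fields on every record.
LoggingInstrumentor().instrument(set_logging_format=True)
log = logging.getLogger(__name__)

with tracer.start_as_current_span("validate-token"):
    # This record now carries the IDs of the active span, so the log event and
    # the corresponding APM trace can be correlated in the back end.
    log.warning("invalid request: API token rejected")
```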

Utilize automation before creating alerts. What this means is, if you have a mechanism to self-recover, use it. A great example: if you have a ReplicaSet out there that's going to help you with resource constraint issues, use that. If you have other technologies or other automations in your environment that are going to help you self-recover, use them first before creating alerts.

Now, that doesn't mean that you can't create a detector for something being wrong. But an alert is for situations where you need hands on keyboard or you have an impacting incident, and if it can self-recover, you don't want to break the velocity of your team if there's not something that they need to do. Use attributes and tags to define service ownership.

So again, attributes and tags: it's not just so you can tie your applications and infrastructure together. It also defines who is going to be the responsible party. So let's say you have a version that was just pushed out: you know who to go talk to if that's causing adverse conditions.

And then tie detection to CI/CD and version. We see this a lot in terms of, you know, I just did a push, I just did a merge, I just did a branch. Is that impacting prod, right? Who's ever pushed something directly to prod or a main branch, and it wasn't ready and it hadn't gone through testing yet, or has had that happen and then had to get on a phone call at three o'clock in the morning? Yeah, not fun.

Again, don't overuse custom tagging. You may want a lot of contextual information about what's going on with your detections and alerting; use your logs for that. You can put a lot of that contextual information into the logs without having to put it in the tags and attributes in your traces and your metrics.

I'm going to move forward just for the sake of time here. So we talked a little bit about how to implement your data collection, how to implement across a hybrid environment, and what to implement. One of the things that naturally comes up when we're talking about this is how to tier and prioritize these different data sets in terms of operational impact and value to your teams and to your company, making sure that you maximize that value.

The first thing is time, right? The more real time you are, the more operationally relevant the data is from an observability perspective, and the closer you are to things like your MTTDs, SLOs, et cetera. Now, as that time horizon expands, the value of those data types for observability use cases begins to diminish, and that's really important because what you're going to do with the data changes, right?

So the longer you keep it, particularly with things like logs: logs you may not need to keep for a very long time for your observability use cases, but they may have a lot of importance to your security team or your compliance team, and so the use case for them shifts. So be aware of that, be aware of where you're putting your data and what the value is, because that helps you not only optimize for operational performance, but it also optimizes for cost performance too.

Data tiering approaches are really pretty common regardless of what your hybrid environment is. Use an ABC model to determine how important that data is and where it should live in your environment. Tier A is business critical: you know, if your CTO talked about it in your 10-K release to your stockholders, that's a high-priority use case that gets first priority in terms of instrumentation, retention, and sending into your environments.

Tier B, again, is something that's either a lower signal or something with a longer age window to it, where you may need that data for things like post mortems or root cause analysis. It has value, but it's not as important and doesn't necessarily need to be retained or instrumented first, or for as long.

And then Tier C use cases are the ones where you really don't want to keep that data in a high-value data store, or as a metric necessarily. These are things like everything that you have to keep in S3 buckets for SOC 2 compliance audits. Who's been told to shove everything in S3 because compliance? A few people, right? If you work in banking or, you know, international stuff or health care, you probably have to do that. That's a Tier C use case, because you're never going to look at it, but it is important.

And then align the cost to value when applying Splunk capabilities. So we're moving away from the instrumentation piece; we're moving into our bread and butter at Splunk. Not only do we want you to tier how you're collecting data and what that data is, we want you to tier it based on what you're going to do with it once it gets into any part of your Splunk environment, whether it's Observability Cloud or Enterprise.

So filter and route based on the value to your enterprise. Everything that we just saw in that pyramid: make sure that you're using those capabilities to filter only the logs that are operationally relevant for observability into observability indexes, and anything that security needs into security indexes. And if it doesn't need to be in Splunk, put it in S3; we have ways for you to pull that back into Splunk if you need it later.
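As a hedged sketch of what that filtering and routing can look like at the OpenTelemetry layer, here's an illustrative gateway-style collector snippet. The token, endpoints, bucket, index names, and the retention.tier attribute are all placeholders, and the exporter fields follow the upstream contrib splunk_hec and awss3 components as documented there, so treat this as a starting point rather than a recommended configuration.

```yaml
# Illustrative only: route operationally relevant logs to an observability index,
# and land everything in S3 for compliance and later federated search.
receivers:
  otlp:
    protocols:
      grpc: {}

processors:
  batch: {}
  filter/ops_only:           # keep only operationally relevant records for the o11y index
    error_mode: ignore
    logs:
      log_record:
        - 'attributes["retention.tier"] != "ops"'   # drop anything not tagged for ops

exporters:
  splunk_hec/observability:
    token: "${env:SPLUNK_HEC_TOKEN}"
    endpoint: https://hec.example.invalid:8088/services/collector
    index: observability_logs
  awss3/compliance:          # cheap long-term store for audit/compliance data
    s3uploader:
      region: us-west-2
      s3_bucket: example-compliance-archive
      s3_prefix: otel-logs

service:
  pipelines:
    logs/observability:
      receivers: [otlp]
      processors: [filter/ops_only, batch]
      exporters: [splunk_hec/observability]
    logs/archive:            # everything, unfiltered, goes to S3
      receivers: [otlp]
      processors: [batch]
      exporters: [awss3/compliance]
```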

And then access your data via search where it's stored. If you need to search your metrics, if you just need to observe what's going on, we're going to store those metrics in a place that gives you that high-fidelity view. If you need to search something from 30 days ago in your logs that you didn't realize was important, but you need it as part of your post mortem, we give you that ability too. So make sure that you're putting the data where the value lies.

Now, let's talk about the new shiny toys from Splunk. You knew this was coming. Splunk Observability Cloud is a purpose-built part of the Splunk cloud ecosystem for observability. What it does is take everything we talked about with OpenTelemetry, once you've instrumented and implemented those best practices, and make it very, very easy to find what's going on in your environment, what the root cause is, and who is responsible, so you're quickly detecting and isolating issues anywhere in your environment.

What you're seeing here is Splunk APM. This is a near real-time service map of all of your services, and again, these could be services in any part of your hybrid environment. So if you're running, you know, Google Cloud Functions for a particular front end or checkout service, and you're using an AWS database of some kind for, like, a cart inventory, and you have some legacy thing that's maybe in somebody's data center, you can visualize it all in this way and find what's going on and what's impacting things upstream and downstream.

And this is utilizing all of that tracing capability that we talked about, so you're able to ask: what was the relationship of this interaction across all of these services in my environment, regardless of where they live? And you get visibility down to the code level. We also allow you to do always-on code profiling for certain libraries, to say, OK, what happened here? Let me take a look at what happened in the code when it was running to see if I had an issue there.

Measure how deployments impact other services, customers, and business outcomes. A lot of us that are here are really interested in: what is my service doing, what's my app doing, what's my containerized microservice environment doing? What often gets missed is not only how this impacts my upstream and downstream services, but how am I impacting the customers that pay my paycheck? How am I impacting the business and our SLAs and SLOs to other businesses? How does this make me look good to the business and show value to my business?

So using a lot of those code libraries that you're looking at, particularly for things like real user monitoring and front ends, helps you contextualize that experience not only from a technology service perspective, but from a business service and a customer experience perspective.

And one of the things that ties all of this together is that tagging and attributes piece. This is why I was talking about it so much: if you want to see what that real user monitoring interaction the customer had with your front-end environment looked like, and how that relates to what's going on in your APM back end, we allow you to do that, because we have those tags and those attributes that give you that seamless interaction. And then centralize your observability practice to control costs and usage as you grow.

What you're seeing here is our out-of-the-box infrastructure dashboards. We make it very easy to find all of your infrastructure in one place, so you have one tool instead of a multitude of individual siloed tools, letting you look at everything in one place in a much more cost-effective fashion and onboard that data a lot more easily.

Wait, there's more. I'm going to talk about some stuff that may not necessarily have to do with observability, but it is super duper cool, and it was recently announced by Splunk. One of the things that we talked about earlier, and that we are able to do now with Splunk, is search outside of Splunk, meaning you don't have to index the data in Splunk to search it, with Federated Search for Amazon S3. That means if you send something to an S3 bucket and you, or your security team, or somebody else needs to pull that back, we allow you to do that now directly from your S3 buckets.

And it's one centralized interface; it's the SPL that, if you're already a Splunk customer, you know how to use. So it's very, very simple and intuitive to get started with and use. And, you know, everybody's looking to reduce costs: it allows you to reduce costs by leveraging S3, keeping everything in that cost-effective data store and pulling it in just when you need it, at the time you need it.

So I talked about a ton of stuff today, and I'm guessing at least a few of the people that have been scribbling furiously throughout this thing want to know what to do. First thing: right after this session, I'm going to be at the Splunk booth down on level two. Come hang out with me, come say hi. I'm happy to talk your ear off, or you can talk my ear off, but let's have a conversation. We've got a lot of other smart people down there as well, and we have all of these things running in action with real data and real services, so you can actually see what this looks like.

Start a free trial. We will give you all of this for 14 days for free, and if you need it longer, talk to your Splunk account rep; we can make sure that you get a POV or a POC. Explore Splunk Lantern for best practices and product tips. Lantern is a community asset and artifact repository that allows you to ask questions, find out if other people have asked those same questions, and get answers, in addition to looking at official documentation. And then there are other related products.

So Splunk Application Performance Monitoring, Splunk Infrastructure Monitoring, and then the observability adoption board are all components of what we talked about today, but only a subset of it. So if you want to look at one aspect of this in particular, if you're more interested in how your applications are working, or you need to know what's going on with an architecture that you just moved from bare metal to EC2, we have a lot more information about those components of Observability Cloud

on our documentation page and at splunk.com. And with that, I was flying through because I wanted to make sure that we have time for questions. So who's got questions?
