SnapLogic and Amazon Redshift transform insights at Lumeris

Kisho: We have the mic. Hi everyone, are you able to hear us clearly? OK. I'm Kisho, and I'm here to present what we have done at Lumeris, along with my colleague, Parag. I have almost 25 years of experience, and I have been creating data strategies for organizations in multiple sectors such as finance, healthcare, and education. I'm a multi-cloud architect, and I always love to be a hands-on guy.

Parag: I hope you can hear me. I am an Associate Architect at Lumeris. I've been in the industry, more or less on the data management side, for the last 20 years or so. And just a fun fact: I play for the US team and am the third-ranked US player. So, we are here to talk about what Lumeris is and what it does.

Kisho: OK. Lumeris is not just a healthcare company; we are one of the leading providers of value-based care. We manage around $12 billion of medical spend across multiple clients, and we are in more than 12 markets across the US. In value-based care, we are not simply building a platform and handing it to clients; we are there with them in the trenches, in terms of both the risk and the reward. We manage things end to end for our clients and share the risk along with them. That's one of the USPs of what we do at Lumeris.

In terms of data: every organization has huge amounts of it. Our company makes money by finding the right insights into how a population is doing, down to the level of an individual patient. When we help health systems, we look at the overall population in that health system as well as at the lowest level, an individual patient and their doctor. From this data we derive as much information as possible about each individual as well as about the group, and then we give specific insights to the patients as well as to the doctors and providers: how do you keep people healthy so that they are not ending up needing emergency care or surgeries?

So we make predictions from the data: this may be a high-risk population; why don't you do A, B, C, D so that we can move these members from high risk to low risk? That is where all our analytics and predictions happen.

When we have the data, it's like a raw diamond. Once we get clear insights about those particular people, it's like a jump in value of 10 to 20x. And the quicker we do it, the more lives we save.

Parag: Like most technology companies, these are the common challenges we go through. Like any other company, we have an ecosystem where our systems are mostly on-prem, and we want to take them to the cloud. With the modern architectures and technologies available, how do we use them to achieve our objectives? And at the same time, what are the different challenges we face?

Most of the time in our case, the analyst or data scientist says, "I need the data, and I needed it yesterday." But with an on-prem system, there is a lag in delivering the data to the data scientist. On-prem systems have their challenges: you submit a request to the technology team saying, "Load this data for me," it takes some number of days to load, and by that time the opportunity has passed. So when we came up with the idea for this framework, we kept all of these challenges and objectives in mind.

Kisho: Now, coming to Project FUSE. Parag has explained the challenges we have, and one of the major ones is the legacy systems running on-prem. To find better insights, we need to liberate the data from on-prem into the cloud so we can generate insights much faster. Previously, if we needed to bring in a data set, say a few tables, a ticket had to be raised, the data engineers would work on it and load the data into the cloud, and only then could other people start working on it.

It used to take two to three weeks to get that whole thing from development to production. Now it happens in less than two to three days, which is almost a 10x gain in productivity just to bring in any data set.

And the framework doesn't just bring data from on-prem and the other sources I'll show on the next slide. It has a lot of built-in functionality. For example, if you want an audit log, or if you want data validation checks, those are plugins that sit on top of the framework, which makes it very extensible.

It also helps, for example, if you want to change column names from source to target. The naming standards in SQL Server are completely different from those in Redshift: SQL Server tends to use capitals or camel case, while in Redshift we use lowercase. This framework does all of that mapping built in, so you just put in a few metadata records and it loads the data from whatever source into whatever target.
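As an illustration only (this is not the framework's actual code, and the function and column names are hypothetical), a name-mapping step like the one described might look like this in Python:

```python
import re

def to_redshift_name(column: str) -> str:
    """Normalize a SQL Server-style identifier (CamelCase or ALLCAPS)
    to a Redshift-friendly lowercase snake_case name."""
    # Insert an underscore before a capital that follows a lowercase
    # letter or digit, then lowercase everything.
    snake = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", column)
    return snake.lower()

# Hypothetical source columns, mapped to the target naming standard.
source_columns = ["PatientID", "AdmitDate", "TotalSpendUSD"]
print({c: to_redshift_name(c) for c in source_columns})
# {'PatientID': 'patient_id', 'AdmitDate': 'admit_date', 'TotalSpendUSD': 'total_spend_usd'}
```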

If you look at this slide, we have multiple data sources: application data on-premises, Salesforce, and other cloud-based data. And here is another issue with data loads: my data load ran today, but did it load successfully? If there was an error, has the incident management team been informed?

We have another plugin in the framework for that. When errors happen, it decides, based on severity, whether it is a warning or a complete failure, creates incident tickets, and routes them by level, whether they need to go immediately to all the support engineers or elsewhere. That entire workflow is built into the framework.
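A minimal sketch of that kind of severity-based routing, with hypothetical queue names and ticket fields (not Lumeris's actual setup), could look like this:

```python
from dataclasses import dataclass

# Hypothetical severity-to-route table; queue names and the idea of
# paging on-call are assumptions made for illustration.
SEVERITY_ROUTES = {
    "warning": {"queue": "data-eng-backlog", "page_oncall": False},
    "failure": {"queue": "data-eng-urgent", "page_oncall": True},
}

@dataclass
class LoadError:
    job_name: str
    message: str
    severity: str  # "warning" or "failure"

def build_incident(error: LoadError) -> dict:
    """Build an incident-ticket payload, choosing the queue (and whether
    to page on-call) from the error's severity."""
    route = SEVERITY_ROUTES.get(error.severity, SEVERITY_ROUTES["failure"])
    return {
        "title": f"[{error.severity.upper()}] {error.job_name} load error",
        "description": error.message,
        "queue": route["queue"],
        "page_oncall": route["page_oncall"],
    }

print(build_incident(LoadError("claims_daily", "row count mismatch", "failure")))
```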

Similarly, we can go back and see all the logs and all the audits; these are built in. Data validation is another plugin in the framework: for every table, based on its KPIs, we can configure which rules to apply to ensure that whatever data we have in the source matches the target, including any transformations done along the way.
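For illustration, a configurable rule set of the kind described, with made-up rule names and tables rather than the framework's real schema, might be sketched like this:

```python
# Hypothetical per-table rule configuration.
VALIDATION_CONFIG = {
    "patients": [
        {"rule": "row_count_matches_source"},
        {"rule": "not_null", "column": "patient_id"},
    ],
}

def run_validations(table, source_count, target_count, null_counts):
    """Return human-readable failures for one table's daily load."""
    failures = []
    for check in VALIDATION_CONFIG.get(table, []):
        if check["rule"] == "row_count_matches_source":
            if source_count != target_count:
                failures.append(
                    f"{table}: source rows {source_count} != target rows {target_count}"
                )
        elif check["rule"] == "not_null":
            col = check["column"]
            if null_counts.get(col, 0) > 0:
                failures.append(f"{table}.{col}: {null_counts[col]} NULL values")
    return failures

print(run_validations("patients", 500_000, 499_998, {"patient_id": 0}))
```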

Kisho: So here I'm just putting it in a different format.

Parag: Before you go there, another aspect of this framework: as of now, even though the data source is on-prem, we are loading around half a million rows into the cloud every day in about 15 minutes, and that includes tracking all the stats and running all the validations on the data as it transitions from one place to another. All of those processes are done in 15 minutes.

And on top of that, once the data is loaded, if there are dependent processes that need to run further down the line after the loads are complete, those are also tracked and executed as part of this framework, and it's all configurable.

Kisho: The majority of our data is on-prem; all the regular transactional systems are still running there, mostly in SQL Server, so that's the main source for us. The second group is cloud-based: a few applications use PostgreSQL, so we bring data from PostgreSQL, then DynamoDB, then S3; we have Salesforce, and we integrate some data from ServiceNow. All of these SaaS platforms are integrated as well.

Once you have the source and target configured, any number of tables is just metadata: you keep inserting into the metadata, and you get whatever data you want in Redshift. For the metadata store itself, any transactional database, PostgreSQL or SQL Server for example, can be used. Our requirement is to bring the entire data set into Redshift, but the framework is being extended to other transactional targets as well, like SQL Server and PostgreSQL, and people are also using it to load some data in flight.
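To give a feel for the metadata-driven approach (the table and column names below are assumptions, not the real schema), onboarding a new table might amount to inserting one record like this:

```python
import psycopg2

# Hypothetical metadata row: one record per table to onboard, naming a
# pre-configured source and target.
new_table_config = {
    "job_name": "claims_daily",
    "source_connection": "sqlserver_prod",
    "source_object": "dbo.Claims",
    "target_connection": "redshift_analytics",
    "target_object": "staging.claims",
    "load_type": "incremental",
    "watermark_column": "ModifiedDate",
    "enabled": True,
}

def register_table(conn, cfg):
    """Onboard a new table by inserting one row into the metadata store
    (PostgreSQL here, per the talk; SQL Server would work the same way)."""
    columns = ", ".join(cfg)
    placeholders = ", ".join(["%s"] * len(cfg))
    with conn.cursor() as cur:
        cur.execute(
            f"INSERT INTO etl_metadata.jobs ({columns}) VALUES ({placeholders})",
            list(cfg.values()),
        )
    conn.commit()
```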

As you can see, this is just a representation of a single SnapLogic pipeline that does the job of the framework we just talked about, covering all the objectives and getting rid of all the roadblocks we had. The framework is plug and play: it asks, what is my source and what is my target? As the number of sources and targets increases, you just add a plugin there and the process takes care of it, nothing more than that. And it gives you all these benefits while loading the data.

Now that we've talked about the framework, how do I use it? Say I have data coming in from all those different sources we saw, with interdependencies between the loads. How do I configure that? This is the workflow you can see here, a simple workflow built on that framework, and it is what loads over half a million rows. It took me probably half a day to configure. That is how powerful the framework is and how easy it is to load that information.

Again, when we talk about the framework, what are the things you can capture in metadata? You set up the list of jobs you need to run and their job-level settings, the different entities you are loading from the source and the target, and then the dependencies you want to execute on the target side after the data loads are complete. It is all configuration; there is no coding involved.
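A toy sketch of the dependency piece of that configuration, with invented job names, might look like this:

```python
# Hypothetical dependency map: after the listed loads finish, run the
# named downstream processes.
DEPENDENCIES = {
    "claims_daily": ["refresh_claims_summary", "rebuild_risk_scores"],
    "patients_daily": ["refresh_patient_dim"],
}

def downstream_for(completed_jobs):
    """Return the dependent processes to trigger once their upstream
    loads have completed."""
    runnable = []
    for job in completed_jobs:
        runnable.extend(DEPENDENCIES.get(job, []))
    return runnable

print(downstream_for({"claims_daily"}))
# ['refresh_claims_summary', 'rebuild_risk_scores']
```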

To start with, we had a manual process for loading this configuration. We have since built another framework on top of it that figures out the source and target, understands the data types on both sides, and writes the configuration directly, which takes the manual process out of the picture. And toward the end of it, we have the error logs: what errors occurred during the daily loads, and where are they? We also have the load logs: what time a load ran and how long it took, how many rows it loaded, when the process started and when it ended.
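A minimal sketch of such a config generator, assuming a PostgreSQL source for simplicity and a hypothetical type map, could look like this:

```python
import psycopg2

def introspect_columns(conn, schema, table):
    """Read (column_name, data_type) pairs for a source table from
    information_schema."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT column_name, data_type
            FROM information_schema.columns
            WHERE table_schema = %s AND table_name = %s
            ORDER BY ordinal_position
            """,
            (schema, table),
        )
        return cur.fetchall()

# Very rough source-to-Redshift type mapping, for illustration only.
TYPE_MAP = {"nvarchar": "varchar", "datetime": "timestamp", "bit": "boolean"}

def generate_target_ddl(columns, target_table):
    """Emit a CREATE TABLE statement for the target from the
    introspected source columns."""
    cols = ",\n  ".join(
        f"{name.lower()} {TYPE_MAP.get(dtype, dtype)}" for name, dtype in columns
    )
    return f"CREATE TABLE {target_table} (\n  {cols}\n);"
```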

And we are now building a dashboard on top of it, which gives us more insight into whether changes are needed in the configuration to make the loads run faster.

OK. So as a result of all that, these are the different benefits.

What we have also achieved is that this configuration is not done only by the engineering or technology side; we are putting it in the hands of the data scientists and analysts, who can generate the configuration and load the data they want themselves. It has gone from a three-week turnaround to a two-minute process for a data scientist to load the data they want. Technology is out of the picture; they are able to run the process on their own.

And here are the different best practices we have followed. As you can see, we took a modular approach, plug and play for any new source that comes in, with error handling, logging, monitoring, and performance tracked on a daily basis.

Kisho: Just one thing I wanted to point out here. We have been talking about so many aspects of this framework that people might think it took a year, or a year and a half, or two years to build. Any wild guess from the audience how much time it took?

Ten months, someone guesses? No. The question was, how much time did it take us to come up with the framework? The main core was developed over a weekend. Just over a weekend, that's it. And with all the plugins, everything was developed by one and a half resources working part time, we were doing other projects at the same time, in less than a month. And it is in production; it has been loading data for the past 9 to 10 months. As we said, we load half a million rows on average, sometimes going well beyond that, in 15 minutes.

So the real power of SnapLogic is that it lets you build these kinds of frameworks very, very fast. Thank you so much for giving us this opportunity. If you have any questions, we'll be more than happy to take them. Thank you.

Host: Yeah, we have about a minute left. If anyone else has a question... if not, of course, you can get your answers at the SnapLogic booth, which is close by, right, Michael? Right over there. No questions? None? You're going to let us off that easy?

OK. All right. Thank you for coming. I want to thank these two rock stars, Kisho and Parag.
