0 to 25 PB in one year

We really appreciate it — re:Invent, day four. We know there are a lot of sessions you've already attended and many more you could be attending right now, so thank you for being here to hear a little about Caris Life Sciences and their journey from zero AWS usage to 25 petabytes in less than a year.

And it's funny — we were having breakfast yesterday, Mason logged into the console, and he said, "It's no longer 25, Joe, it's 29.4 petabytes." We didn't have time to update the title.

So we think this is going to be a great use of everyone's time. Before we hop into the goals of the session and what we're looking to accomplish, let's take a step back and do some introductions.

I'd like to introduce Mason Hensley, Chief Technology Officer at Caris Life Sciences. Mason has prior experience at various health tech companies, startups, and a number of public companies.

Mason received his biomedical engineering degree from Vanderbilt University. And Mason, I just want to give you a compliment and props: you've been a great partner to work with. He drove this technology transformation — his first large cloud transformation to AWS — and he did a really great job. So, brief recognition for you, Mason.

Next, David Kathy, who's really my partner in crime. He's been at AWS for two years as a Senior Solutions Architect, with over 30 years of experience in technology across numerous industries such as broadcast media, transportation, and e-commerce.

David's job is to work with customers and provide technical guidance in architecting solutions. He has a number of specialties, and the ones I want to call out are our storage technologies, which was perfect for everything Mason and Caris Life Sciences were looking to accomplish.

Last and certainly not least, myself, Joe. I'm a principal in enterprise sales here at AWS. I've been with AWS for six years, and I'm focused on partnering with customers and helping them adopt the AWS technologies best suited to achieving their business outcomes — for their organization and, most importantly, their customers, which in Caris's case means patients.

So you're going to hear "patients" and "customers" used interchangeably quite a bit today. As for a brief agenda — and I'll go through this quickly — we'll talk a little about the goal of the session and what we hope you'll get out of it today.

Mason's going to provide an overview of who Caris Life Sciences is, and then I'll quickly walk through the partnership summary: how we worked backwards from Caris's key objectives, and some of the outcomes we've been able to accomplish with Caris to date.

David Kathy is going to walk through the action plan and some of the technical diagrams and technologies that Caris adopted to make this transformation possible.

And then Mason's going to walk through the results Caris has achieved and the lessons learned as the CTO who led this transformation.

And we certainly want to save time at the end, if possible, for Q&A from the audience.

As far as the goal of this session, it's really about you. We realize many folks in the audience are new to AWS — maybe they've dipped a toe in the cloud, or maybe they're looking at doing a significant cloud transformation with AWS.

We want to walk through what that entails from a people, technology, and operational transformation perspective.

What I'm actually most excited about today is hearing directly from Mason, who led this transformation. This was his first time doing it, he did a great job with it, and he's going to talk about lessons learned: what went well, and, looking in the rearview mirror, opportunities to make things go even better moving forward.

So hopefully, if any of you are in a similar situation to Mason's, this will help you mitigate speed bumps or potential risks in a large technology transformation.

And finally, we want to make you aware of some of the partnership possibilities, both with AWS and with Amazon overall.

Mason, I'm going to turn it over to you briefly for an overview of Caris Life Sciences.

Sure. First off, we could not have done this without the rest of our leadership team — from the long-term vision of our CEO and president to our chief operating officer. It really took a village.

We had team members collaborating from finance as we built maturity in our FinOps practice, growing that muscle. But at the core of it — and I know I'm glossing over this for a lot of you —

Caris takes cancer biopsies, runs full genome sequencing on them, and provides feedback to oncologists to improve the treatment and outcomes of patients. We primarily do this with tissue biopsies, but we've been expanding into a newer field called liquid biopsies.

As we've been doing this over the years, we've worked with our data scientists to identify gaps in treatment opportunities for patients, and that has really invigorated a technology transformation to improve access to our data.

You'll see in a little bit — we'll show a picture of a tape vault that was a constant frustration for our data science teams. Moving into AWS has unlocked that data and made it far more available to us.

Anything else on this, Joe?

Well said. Awesome. Talking a little bit about what Caris was hoping to accomplish with this project:

It really came down to data collaboration and innovation — externally, by democratizing data in a safe manner, and internally, by giving the right folks at Caris access to the right data at the right times,

so they could boost R&D (research and development) capabilities to save patient lives. From an external standpoint, Caris has a best-in-class Precision Oncology Alliance with leading pharma and academic institutions, with the sole purpose of working together to develop best-in-class therapeutics to save patient lives.

So they wanted to collaborate with those folks as well, from a data perspective. From a cost-savings perspective, we saw an opportunity to provide a pretty significant ROI to Caris and make the bottom line more attractive.

And from a scale perspective, Caris's business is growing very rapidly. What that means in their business is more patient samples per day and ultimately more tests run over a given year or time frame.

It was really important to Caris that IT and infrastructure resources would not be a bottleneck to accepting patient samples and saving more patient lives.

So how do we engage with Caris at Amazon? We have a principle called working backwards. What does that mean? It comes down to figuring out the objectives most important to Caris.

In Caris's case — from their founder to their president to their entire C-suite — it all comes down to saving patient lives and helping patients.

We wanted to work backwards from the patient, then find ways, from a strategic organizational standpoint, to make Caris's business better — and last, but certainly not least, identify the enabling technologies to make this all possible.

David's going to talk about that quite a bit. From an executive-alignment standpoint, I can't stress how important this is to any organization: having top-to-top alignment between AWS and Caris to make sure we're marching toward the same objective and climbing the right hill together.

And you said it really well, Mason — Caris's entire leadership team was bought into this, from their founder to their president to their COO, and that made this all possible.

From a strategic-collaboration standpoint, we saw an opportunity to help Caris democratize data both internally and externally with their Precision Oncology Alliance, to help advance patient therapeutics through AWS Data Exchange.

From a commercial-construct perspective, we worked with Caris to understand their current landscape and put together the right economic package to return the right ROI for their business.

I think that's important for everyone in this audience today: work with the right line-of-business executives, as well as all of you in the room, to make sure you're putting together the right package for your business, not just for your IT organization.

And from a migration-horsepower perspective, there was an opportunity to come alongside Caris and accelerate this project with our Professional Services organization.

And there are a number of ways you can do that, both with AWS and with our third-party systems integrators.

Last slide for me: what are the outcomes we've accomplished to date? And I say "to date" because we're just getting started. All of this happened in one year, and we think for years to come there's so much more we can do together.

I talked about it already, but it's enhancing innovation both internally — getting data democratized to the right folks within the organization to help with R&D capabilities and give patients the best tests possible — and externally, with their Precision Oncology Alliance and the right data collaboration there.

And finally, Caris's blood pipeline test is growing rapidly, so we worked with Caris to move that blood pipeline to the cloud on AWS so that infrastructure and scale are no longer a constraint.

As more tests are consumed and as more patients ask for this test, infrastructure will no longer be a bottleneck. And from a bottom-line-contribution perspective — I've never heard a CFO say, "We want our bottom line to look worse."

We worked with them to find the right economic package to provide the right return on investment.

So David, I'll pass it over to you next for the action plan and how you engaged from a technology standpoint.

Well, Joe teed it up really well — getting the customer involved and wanting to move to AWS. Then it was up to me and the AWS team behind me to actually make it happen.

Now, we had multiple challenges to deal with. Caris had constant growth of their genomic data — the files that come off the genomic sequencers run to terabytes, tens to hundreds of terabytes a day sometimes.

So there's a huge amount of incoming data, but they had limited capacity on their GPFS disk drives to hold it. When you have that much data coming in, you can only keep a small fraction of it online.

And that includes not only incoming data but also the data the data scientists are working on. So they were constantly shuffling data — off the sequencers, off tape — while trying to keep track of how much they could allow the data scientists to use as well.

There wasn't much room to expand the data center, so they were running out of room there. And even as the data footprint in their environment kept increasing, their tape library was approaching its limits.

They already had several tape libraries — 32 drives between them, somewhere around 6,000 tapes' worth of data — and trying to manage all of that, with new data coming in and data staged for the data scientists, was really becoming a major headache for them.

There was also the latency of getting genomic data to the data scientists. As they said, their main mission is saving lives, and the best way to save lives is to have the data available as soon as possible.

When a data scientist wanted a cohort of data for an analysis, it could sometimes take up to weeks, because the data would be scattered across multiple tapes, and it took that long to get all those tapes retrieved and the needed files downloaded.

And not just for one data scientist — they were doing that for multiple data scientists at the same time. So there was a flurry of activity on the tape drives, constrained by the amount of space available online for doing the actual analysis.

So again, there was a lot of confusion at times, it was becoming a major headache, and there was significant latency before their data scientists could do their work.

The constraints we had to deal with started with the network topology. They have multiple data centers and labs with network interconnects between them, and they were transferring data from the places where they do the genomic sequencing on the actual hardware — Illumina sequencers, NovaSeqs and others.

So they had that data coming in.

There were also places where they were doing research that data had to be transferred to, so there was all that network traffic going on. Their network bandwidth, both internal and external, was definitely a factor we had to look at. So were their security and DevOps staff.

They have good staff, and they've learned a lot in the last year, but they were new to AWS. They didn't know some of the things they could do, they didn't know some of the things they should do, and they needed help with that.

We also had to interoperate with their current operations. We had to make this migration happen in step with what they were already doing — we could not disrupt their activity. They couldn't shut down for a while to move all the data, retool everything, and migrate; we had to migrate in place.

And that definitely created some challenges in making sure everything would work seamlessly.

The genomic data, as I mentioned, was fragmented across multiple tapes. When a flow cell is created — the genomic sequence output of a particular study — that information may be spread across multiple tapes. So it's not always as easy as saying "I want this particular data" and grabbing one tape; you might have to pull 10 or 20 tapes to do it. That was just an artifact of the way the system was originally built, and it was one of the challenges we had to deal with. Whenever we pulled data down, we had to make sure we could properly track and validate not just the individual files, but that the complete set of files for a flow cell was present, before we could actually retire the tapes and drives involved. That definitely added some extra challenge.
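To make that tracking requirement concrete, here is a minimal sketch of the kind of manifest check described above — comparing the expected file list for a flow cell against what has actually landed in S3 before retiring any tape. The bucket, prefix, and manifest helper are hypothetical, not Caris's actual tooling:

```python
import boto3

def flow_cell_complete(bucket: str, prefix: str, expected_keys: set) -> bool:
    """Return True only if every expected file for a flow cell exists in S3.

    Tapes are retired only after the *complete* set is verified, never on a
    per-file basis. All names here are illustrative placeholders.
    """
    s3 = boto3.client("s3")
    found = set()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            found.add(obj["Key"])
    missing = expected_keys - found
    if missing:
        print(f"{len(missing)} files still missing, e.g. {sorted(missing)[:3]}")
    return not missing

# Usage sketch: only mark the source tapes retirable once the whole set landed.
# expected = load_manifest("flowcell-1234.manifest")  # hypothetical helper
# if flow_cell_complete("genomics-archive", "flowcells/1234/", expected): ...
```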

So how could we transfer this data? One option: for their own data protection, they had data at Iron Mountain, and Iron Mountain was asked to propose a solution for moving all the tape data Caris had sent them into S3, or anywhere in AWS. They came up with a good proposal — there was nothing wrong with it — but in light of all the other options we had, it didn't align with Caris's long-term needs and views. So we had to skip that one.

Then there was the AWS Snow family. The Snow family is very good for moving large amounts of data very quickly. The only problem is when you have 25 petabytes of data. Let's do the math: the Snow family device held 80 terabytes (the 210-terabyte version wasn't available yet), which works out to roughly 320 Snow devices that would have to be cycled through their data center to migrate this data. Did I mention they sometimes had staffing constraints for this transformation? Trying to manage 320 Snow devices in rotation — that's quite a bucket brigade they'd have to keep going. That just wasn't going to be workable for them. And back to tracking: even once a tape was done, you'd really have to work to make sure you tracked it properly. So the Snow family was not going to be a good option.
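A quick back-of-the-envelope version of that math (rounding and usable-capacity overhead explain the ~320 figure quoted above):

```python
# Snow device count for the migration size described in the talk.
total_pb = 25
device_tb = 80                              # Snowball Edge capacity at the time
devices = (total_pb * 1000) / device_tb
print(f"{devices:.0f} devices at full capacity")  # ~313; ~320 with overhead
```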

That left DataSync. DataSync provides very good tracking and logging of all the data being transferred, so they know where it's going and that it's gotten there securely and completely. So DataSync was the way to go. It required some network upgrades, but those were already in scope for the project: they knew their data loads and network needs were increasing, so they were going to have to upgrade the network anyway. This provided a good opportunity — and a good impetus — to get those upgrades in play.

The solution we ended up on was Amazon Simple Storage Service (S3). We were able to use the S3 Glacier Instant Retrieval storage class, which gave them very fast access but was also very cost-effective for the large amount of data they had. AWS DataSync provided the managed transfer: logging, validation, and a way to analyze any problems, identify them, and restart the appropriate tasks. That made things much simpler for their data team, their security people, and their DevOps people — there was really a lot of great collaboration between all those groups, and with Professional Services, in understanding the problem, understanding what needed to happen, and making sure all the needed pieces were in place.
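As a rough illustration of that pattern — not Caris's actual configuration — here is how a DataSync task targeting S3 Glacier Instant Retrieval with verification and CloudWatch logging might look in boto3. All ARNs, hostnames, and names are placeholders:

```python
import boto3

datasync = boto3.client("datasync")

# Destination: an S3 location that writes objects straight into the Glacier
# Instant Retrieval storage class (low cost, still millisecond access).
dest = datasync.create_location_s3(
    S3BucketArn="arn:aws:s3:::genomics-archive",          # placeholder bucket
    Subdirectory="/flowcells/",
    S3StorageClass="GLACIER_INSTANT_RETRIEVAL",
    S3Config={"BucketAccessRoleArn": "arn:aws:iam::123456789012:role/datasync-s3"},
)

# Source: an on-prem NFS export in front of the GPFS file system (placeholder).
src = datasync.create_location_nfs(
    ServerHostname="gpfs-gw.example.internal",
    Subdirectory="/export/flowcells",
    OnPremConfig={"AgentArns": ["arn:aws:datasync:us-east-1:123456789012:agent/agent-0123"]},
)

# The task itself: checksum-verify what was transferred and log every file.
task = datasync.create_task(
    SourceLocationArn=src["LocationArn"],
    DestinationLocationArn=dest["LocationArn"],
    CloudWatchLogGroupArn="arn:aws:logs:us-east-1:123456789012:log-group:/datasync/migration",
    Options={
        "VerifyMode": "ONLY_FILES_TRANSFERRED",  # verify transferred files
        "LogLevel": "TRANSFER",                  # per-file logging for auditing
        "OverwriteMode": "NEVER",                # don't clobber landed data
    },
    Name="flowcell-migration",
)
```

Writing directly into Glacier Instant Retrieval at transfer time avoids paying Standard-class rates and a later transition charge on petabyte-scale data.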

The Professional Services team went through Control Tower and set up all the proper landing zones once they knew what needed to be built out, and that worked out very well. Their DevOps team was in Terraform, so we built out a large number of Terraform templates to build SageMaker environments when they were needed and to make sure the buckets were properly provisioned, set up, and secured. Everything was controlled through Terraform to get the environment set up.
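Their actual provisioning was Terraform; purely to illustrate the kind of bucket guardrails such templates typically encode (default encryption, public-access blocking), here is an equivalent boto3 sketch with placeholder names:

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-provisioned-bucket"   # placeholder; real names were Terraform-managed

s3.create_bucket(Bucket=bucket)

# Default encryption at rest for every object written to the bucket.
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}]
    },
)

# Block all public access — table stakes for patient-adjacent genomic data.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
```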

Then they went through all the workflows — mapping what they were doing and what all the individual processes were — and started interjecting AWS tools such as DataSync, Glue, and Athena to make the entire pipeline of getting data off the NovaSeqs and Illumina sequencers into S3 as seamless as possible. That also replaced some of the data movement to tape, so the final phase was to make sure that, now that data was going to S3, we could stop writing to tape and alleviate the problem of the tape libraries filling up.

Now, migration phase two. We had to do some math: 25 petabytes of data in a year. This is one of our standard Snow-versus-DataSync slides — should you use Snow or should you use DataSync? Extrapolating from it, on the 10-gigabit pipe they had, it was going to take about 232 days to migrate 25 petabytes of data into S3. That's most of a year — doable, but only assuming you have full access to that 10-gigabit pipe, and we didn't have full access to it. So we were going to have to do something a little different, but it was in their budget; that was one of the things we were looking at doing.

So, the data was on tape: they had 32 LTO-8 tape drives. An LTO-8 drive can do about 2.5 gigabits per second at max throughput on uncompressed data, so that's the effective throughput. With 32 tape drives, we were good — we had the bandwidth to read 25 petabytes of data off tape within a year.
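Roughly, the arithmetic behind both numbers — a sketch using only the figures quoted in the talk:

```python
# Transfer-budget math for 25 PB in a year.
PB = 25
bits = PB * 1000**5 * 8                  # total bits to move (decimal petabytes)

wan_gbps = 10                            # the shared 10 Gb/s pipe
days_on_wan = bits / (wan_gbps * 1e9) / 86_400
print(f"At a full {wan_gbps} Gb/s: {days_on_wan:.0f} days")        # ~231 days

drives, lto8_gbps = 32, 2.5              # 32 LTO-8 drives, ~2.5 Gb/s each
tape_gbps = drives * lto8_gbps           # 80 Gb/s aggregate read bandwidth
days_off_tape = bits / (tape_gbps * 1e9) / 86_400
print(f"Reading tape at {tape_gbps:.0f} Gb/s: {days_off_tape:.0f} days")  # ~29 days
```

In other words, the tape drives could feed data far faster than the WAN could drain it — the network, not the tape, was the bottleneck to plan around.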

We also looked at the disk they had freed up: by not sending as much to tape, we could free up some space on their drives. And as part of figuring out how to defragment some of this data on tape, their HPC vendor stepped in and said, "Let us help out — we'll get you some more disk space so you have more room to work with." So they were able to upgrade the amount of online storage they had, and with the extra capacity on the LTO-8 drives for staging migration data, that was a big help.

They still had some problems, though. They had to update some of their network infrastructure: some hosts had only a 10-gigabit NIC and had to be upgraded; some top-of-rack switches were a little slow and had to be upgraded; and their firewall needed to be upgraded because it didn't support higher speeds. And I have a typo here — it's actually a 60-gigabit internet connection, not 60 megabit. They upgraded their internet capacity so they'd have more headroom, and I think right now they're fairly consistently using about 30 to 40 gigabits per second. So that upgrade proved very valuable along the way.

They have still occasionally found bottlenecks in the network. We've had to add extra AWS DataSync tasks at times, and sometimes we've had to shut a DataSync task down because the network didn't balance right for other internal operations. So there are still some issues we're working out to make sure all the networking works, but we have a good relationship with Caris, we're all on board with making this happen, and we've kept good forward motion on it.
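One lever worth knowing about here — an aside, not something the talk confirms Caris used — is DataSync's per-task bandwidth limit, which lets you throttle a task instead of stopping it outright. A minimal boto3 sketch with a placeholder ARN:

```python
import boto3

datasync = boto3.client("datasync")
task_arn = "arn:aws:datasync:us-east-1:123456789012:task/task-0123"  # placeholder

# Cap this task at ~2 Gb/s so other internal traffic can breathe,
# rather than cancelling the task entirely.
datasync.update_task(
    TaskArn=task_arn,
    Options={"BytesPerSecond": 250_000_000},   # 250 MB/s ≈ 2 Gb/s
)

# Lifting the throttle later is the same call with -1 (unlimited).
# datasync.update_task(TaskArn=task_arn, Options={"BytesPerSecond": -1})
```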

So, the migration results. Starting back in December of 2022, we had almost no footprint within AWS on S3, and you can see that over time we migrated up and to the right — right now we're at about 29.4 petabytes of data, and at the rate they're still moving we'll probably hit 30 petabytes before the end of the year. Next week will be a good week.

You can see a couple of dips in the chart along the way. We identified cases where there was redundant data they didn't need to keep — data that was already in another bucket — so some of that was purged during the upgrades. They also found data that was duplicated across multiple versions, so they went through a purge operation on that. And because some of these DataSync tasks were being stopped, some multipart uploads of these large files were left incomplete, so they needed lifecycle rules to make sure those incomplete uploads were purged as well. As they improved their lifecycle policies, their operations, and their knowledge of AWS, they gained increased performance and savings. So we got the data moved, and we're moving well.
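The incomplete-multipart cleanup mentioned above maps to a standard S3 lifecycle rule. Here is a minimal boto3 sketch — the bucket name and the 7-day window are illustrative, not Caris's settings:

```python
import boto3

s3 = boto3.client("s3")

# Abort any multipart upload that hasn't completed within 7 days, so parts
# left behind by interrupted DataSync runs stop accruing storage charges.
s3.put_bucket_lifecycle_configuration(
    Bucket="genomics-archive",                   # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-stale-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},        # apply to the whole bucket
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```

Incomplete multipart parts don't show up in normal object listings, which is why they're easy to miss until the bill does the telling.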

So I'll pass it over to Mason to talk about the results that have been achieved and how they've worked out for them.

Sure. And I'll touch on this graph real quick. I know it has two different labels: one is our total storage — we're approaching 30 petabytes — and the other shows the object count, which is about a billion files. We did a lot of work with the Professional Services team to build a data pipeline to help manage and track the inventory on-prem and in AWS, so we could do a comparison. And we didn't just have data in our labs; we had data in the data center, and merging all of that together across the different file systems was at first a very daunting task, but the team helped make it simple and very consumable by team members across our organization. You'll see in a minute that some of that data ended up in executive dashboards to convey progress. But the main thing is we're trying to improve patient lives and outcomes.
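On the AWS side, nightly object counts like the ones described here are the kind of thing S3 Inventory produces out of the box. A minimal sketch of enabling a daily inventory report — all bucket names and IDs are placeholders, not Caris's pipeline:

```python
import boto3

s3 = boto3.client("s3")

# Emit a daily CSV listing of every object (key, size, storage class) into a
# reporting bucket; downstream jobs can diff it against the on-prem catalog.
s3.put_bucket_inventory_configuration(
    Bucket="genomics-archive",                       # placeholder source bucket
    Id="nightly-inventory",
    InventoryConfiguration={
        "Id": "nightly-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": "arn:aws:s3:::inventory-reports",  # placeholder
                "Format": "CSV",
                "Prefix": "genomics-archive",
            }
        },
    },
)
```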

I mentioned earlier that our data scientists — we have a couple here today — would start projects to identify gaps in treatment where patients were struggling to have good outcomes with the treatment their oncologists were giving. We've been able to identify gaps based on specific mutations in patients' cancers where there's no good drug on the market. Our teams have been identifying these gaps and bringing them to pharmaceutical companies, saying, "Here's a gap in treatment for patients with a certain mutation in small cell lung cancer, or a certain type of mutation in pancreatic cancer." That has been a great growth opportunity in terms of collaboration with the pharmaceutical companies. It's something we had dabbled in a little previously, but moving into AWS has really helped fuel it. We had a press release last week — I think it finally got out the door on Monday — announcing a five-year partnership with Moderna, where we're helping show them opportunities for drugs in the oncology space. That should help reduce the cost of discovering these drugs and getting them to market to help patients. If you think back some years, there's the proverbial picture of tens of thousands of molecules being created in labs and then people looking for opportunities for how they can treat patients. We're going in the opposite direction: saying, here is the problem. From those mutations we can determine the DNA change and see what proteins are affected, and the pharmaceutical companies can develop pinpoint targets from that.

On the cost savings: over the years we have spent an enormous amount of capex building out our data centers and the infrastructure in our labs. We showed the tape vault in one of the previous pictures — it was built out to the walls, with no room for expansion — and that was a forcing function that took us to a fork in the road: do we physically move this infrastructure, or do we have other opportunities for further, effectively limitless scale? That's helped us from the cost-savings perspective too, because when we were making these large capex outlays, it was hard to quantify the exact cost we could attribute to a test. Now, as we migrate our bioinformatics pipelines into AWS, we can see cost down to the penny for how much it costs to identify the mutations in these sequences.

In this graph — sorry, I've redacted some of the numbers — we had some color coding: orange denoted data that had moved into AWS, dark blue was data that had moved to disk, and light blue was data still on tape. This is a graph from earlier in the year, but you can see we still had quite a bit of data on tape. And one of the problems — not a problem with Snowball, but with the situation we were in — was that even if we had brought Snowballs to our facility, the data wasn't coming off tape fast enough. We'd have had to wait to fill each one up, and we wouldn't have been maximizing the network connection to the Snowball devices. We were able to shift our workloads away from that tape vault — we had been in a situation where we were essentially pouring water into a bucket while someone else was trying to empty it from the other side.
So that helped us.

David mentioned that we ended up buying quite a bit of disk. That allowed us to optimize the reads off tape and accelerate to where we are today: over the last three weeks we've been averaging about a petabyte and a half a week uploaded over our network connection.

I want to circle back: the work we did with Professional Services really helped accelerate building out that landing zone and was instrumental in our organization's comfort moving into a new environment. There was a lot of comfort in knowing our data was in our data center or our lab facilities, and this was a big leap of faith for a lot of team members. But it clicked with them as we brought in experts from the ProServe team. As we dove into specific topics, David and team would bring in experts from outside our core ProServe team — from different product teams around AWS — and that was instrumental in accelerating the work we did.

And here's our snapshot from yesterday. We weren't able to update the title of the presentation — we had to lock that in a little over a month ago. But it's been amazing to work with our teams, both within Caris and at AWS, identifying bottlenecks, troubleshooting, and accelerating. It's been amazing to see. Next up: lessons learned and advice.

We built dashboards showing the migration that were made available in our core executive dashboards, alongside the lab metrics people were already looking at, and that transparency helped show progress. In a lot of major IT and software development projects, it's really easy to find yourself in a situation where demonstrated value is always a month or two away, but the team was great at showing value and incremental progress on a daily basis. Our on-prem inventories get updated every night, our S3 inventories get updated every night, and a report went out every morning. It was one of the things I woke up waiting for every morning — actually it came about an hour after I woke up, based on the timing — but I was always waiting for that buzz on my phone to see the updated progress.

Another lesson learned: engage your finance teams, especially if you're in a very capex-heavy mindset. There are definitely a lot of changes in budget thinking — some budgets are set years in advance, and if they're really capex-heavy, it's going to take a lot of education and discussion to move to a more opex-oriented model. That flexibility has been great to see, but there's always a transition point for some teams. We have a great CFO who really wrapped his arms around it, and our finance team is salivating at being able to see a per-sample cost they can attribute back. The transparency we can now provide on different departments and projects — how much they're costing, and rolling that up — is something that continues to surprise and delight a lot of our team members.

Anything else I missed?

Well said. Awesome, well said. It's been great working with you. Thank you — it has been great. We've got about 24 minutes left, so don't feel like you need to stay for Q&A. But if there are questions — it's very hard for us to see hands in the crowd with the lights in our faces — we'll take Q&A. The way it will work: in an orderly fashion, just shout out your question; we're fine with that. I'll repeat the question so the video can pick it up, and then…
