Omics innovation with AWS HealthOmics: Amgen’s path to faster results

Good morning, everybody, and welcome to re:Invent. My name is Ariella Sasson. I'm a Principal Solutions Architect at AWS supporting customers across healthcare and life sciences. I am joined today by Itai, and together we are going to show you how to accelerate your bioinformatics analysis at scale.

So it's been a pretty amazing year for AWS HealthOmics. We've been very excited by how our customers have been using HealthOmics to support their workflows. It reminds us why we built this service with our customers' needs in mind.

Today, we're going to spend most of our time together talking about Itai's success at Amgen. But first we're going to spend a little bit of time reviewing the year and all the amazing things that have happened so far, and then lay the foundation by talking a little bit about the technical aspects of the service, to set up what Itai is going to talk about later today.

So here we go. At AWS, it always starts with our customers' needs in mind, and they have been doing some amazing things with the service. What reassures me and the service team that we're on the right path is that customers across many diverse areas of healthcare and life sciences are finding the value in their own way. Whether it's sequencing, clinical diagnostics, academic research centers, or pharma R&D, we're helping them address a multitude of problems.

How do I secure my data at a low cost, make it easily discoverable, and share it? How can I collaborate with other research groups? How can I scale up and achieve the throughput required for my clinical diagnostics tests while still achieving my turnaround times? And how do I accelerate protein structure prediction or molecular dynamics at scale for the scientists who need them? Ultimately, what we at AWS try to do is focus on removing the undifferentiated heavy lifting so you can focus on your science.

AWS has over 200 services. We heard about a bunch of new services this morning in Adam's keynote, and we're probably going to hear about more new services and exciting features coming out all week. But what I want to highlight here is that AWS is making investments in several areas of healthcare and life sciences.

For clinical records, we have AWS HealthLake, Amazon Comprehend Medical, and our newly announced AWS HealthScribe. For omics, we have AWS HealthOmics, which we'll be diving into deeper today. And for imaging, we have the recent general availability of AWS HealthImaging. What we've learned talking with our customers is that while this data isn't all the data they have, it is the vast majority of it, and they all have very similar patterns.

Customers need to extract metadata and do some normalization; create ETLs for FHIR data, which is heavily nested, and then query it; transform semi-structured variant data and make it queryable; and finally, extract key metadata from DICOM images. There are also aspects around how we compute on this data to generate insights. Knowing all this, our goal is to make it easier for you.

We want to make it easier to bring the tools of your choice and your data, and combine it with other data modalities for analysis. We're working hard to simplify this and bring it all together to generate novel insights.

AWS HealthOmics was born on the re:Invent stage about a year ago, in Adam's 2022 re:Invent keynote. We were known as Amazon Omics back then. We've had a lot of things happen this year, including a mini identity crisis resulting in a name change, but it's been a really exciting year.

AWS HealthOmics has been investing in three main areas for our customers. First, trying to solve the problem of how do I store large amounts of data cheaply and efficiently while still making it easily discoverable. This underpins all of our investments in the sequence stores. We can provide some pretty compelling cost optimizations for customers with very common access patterns for FASTQs, BAMs, uBAMs, and CRAMs.

On the other side, we hear a lot of our customers telling us that they need to structure, organize, and analyze their variant data at scale and combine it with other modalities. That's where we're making investments in zero-ETL transformations for the variant and annotation stores. With a single API call, you can take semi-structured data and make it structured and available to query.

And for the last piece required to generate insights, the bioinformatics compute, we offer managed workflows. We provide a couple of different capabilities there to make it easy for you to process your data. But what I want to leave you with here today is that if you don't see it in our service, please ask; we want to build on your behalf to make it easier. We want to make sure to continue to innovate and deliver in that area, and it isn't just services and service features.

It's really important for us to think about how our customers consume all of AWS and make that easier too. In addition to actual service features, our team, both the service team and our team of solutions architects, has been developing a lot of prescriptive guidance in that area.

One really good example is how to make bioinformatics analyses easier to run. The common pattern is you find a repo, you clone the repo, and you register the workflow with HealthOmics. So we've been working on this, bringing easily configurable workflows that can be brought into HealthOmics very simply, whether it's GATK best practices, nf-core pipelines, Variant Effect Predictor, AlphaFold, or ESMFold. There's a lot here, and it's not just AWS.

Our partners like Sentieon, NVIDIA, and Element Biosciences have done the same. And what's been the result? Our customers have been doing amazing things. While I can't go into details on everything we've seen, and we're going to hear from Itai in a little bit about the amazing work he's done at Amgen, I can provide the highlight reel of stats to help you understand how our customers are using AWS HealthOmics to achieve their goals.

Number one is scale: whether it's being able to support hundreds of concurrent runs or thousands of concurrent tasks, spinning up hundreds of GPUs or tens of thousands of vCPUs in a single set of analyses, or storing petabytes of data in their variant store, we have been very pleased at the diversity of ways our customers have been using the service to advance their science.

So now is the portion of the talk where we're going to go slightly deeper into the technical aspects of HealthOmics. I promise not too deep, just a few more arrows and some architecture diagrams to better understand how it all works together.

When we think about AWS HealthOmics, it's all about the end-to-end journey for our customer. You store your sequence data, you process your sequence data, and then you take the variants that are output from your whole genome sequence, let's say, and you store them and easily analyze them with Athena or EMR. The light blue lines you see here are intentional. When we think about the service, we want it to work seamlessly together, but we built it in a highly modular fashion. You can use one or all three services, whatever fits your needs, and we want to support you in your particular workloads.

This modularity is one of the key takeaways to keep in mind when you're starting to explore HealthOmics. As with all AWS services, security is our top priority. We never lost sight of this guiding principle as we built HealthOmics. AWS HealthOmics was built from the ground up to support and enable security, guiding users to security best practices such as resource isolation.

Every run in HealthOmics has its own dedicated resources. Customer-managed permissions: HealthOmics only uses the permissions and resources that you explicitly grant. End-to-end encryption. Data provenance and logging, so you can know who ran what, where, and when. And finally, compliance.

AWS HealthOmics is HIPAA eligible, HITRUST certified, and GDPR ready, and we have FedRAMP Moderate and ISO certifications to support your compliance needs. Storage: we have two stores that very often work hand in hand. The reference store lets you put all your references in a single location and tag them for easy identification. The sequence store supports FASTQs, BAMs, uBAMs, and CRAMs with their associated metadata.

Each file, once it's loaded, is immutable, and of course this data can always be easily exported or copied to a local file system with a simple API call. Analytics: the simplest way to think about this is managed ETL, and there are no compute costs for you. You can bring your VCFs, gVCFs, and annotation sources, and from there it's a simple API call to start importing and transforming your data for querying.
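As a rough illustration of what that single API call might look like in the Python SDK (boto3), here is a minimal sketch; the store name, IAM role ARN, and S3 paths are placeholders, and the exact parameters are worth checking against the current HealthOmics documentation.

```python
import boto3

# Minimal sketch of a HealthOmics variant import (names and ARNs are placeholders).
omics = boto3.client("omics")

response = omics.start_variant_import_job(
    destinationName="my_variant_store",   # an existing variant store
    roleArn="arn:aws:iam::111122223333:role/OmicsImportRole",
    items=[
        {"source": "s3://my-bucket/cohort1/sample1.vcf.gz"},
        {"source": "s3://my-bucket/cohort1/sample2.vcf.gz"},
    ],
)
print("Import job:", response["jobId"])
```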

One key benefit we've built into this service is the separation of the variant and annotation stores. They are independent and have independent update cycles, so you can update your annotation source and never have to re-annotate a VCF. All you need to do is import your new version and point your Athena query to the new version of your annotation store.
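To make that concrete, here is a hypothetical Athena query run through boto3. The database, table, and column names are placeholders for whatever your Glue catalog exposes for your variant and annotation stores; the point is simply that switching annotation versions is just a change to the table you join against.

```python
import boto3

athena = boto3.client("athena")

# Hypothetical table names: "my_variant_store" and a re-imported "clinvar_v2" annotation store.
query = """
SELECT v.sampleid, v.contigname, v.start, v.referenceallele, v.alternatealleles, a.*
FROM my_variant_store v
JOIN clinvar_v2 a
  ON  v.contigname = a.contigname
  AND v.start      = a.start
LIMIT 100;
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "omics_analytics"},  # placeholder database name
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
```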

We've talked about storage, we've talked about analytics; what's left but the compute? There are two modes of bioinformatics workflows. There are private workflows, which give you full flexibility: you own all aspects of the workflow, from the pipeline to the Docker containers to the workflow scripts. In Ready2Run mode, AWS HealthOmics manages it all for you, all for a fixed price.

In either case, these workflows can be run with one easy API call. And our customers are running a diverse set of bioinformatics pipelines. They run the gamut from a variety of sequencing pipelines to enable precision medicine, to protein folding to help drug discovery, to molecular dynamics to better understand the molecule of interest. And this is just some of the breadth of what our customers are using workflows for. We're going to hear from Itai very shortly about how he used private workflows to enable his scientists to accelerate innovation.

There are three main steps to a workflow. One, build your tools by containerizing them and putting them in your ECR. Two, build your workflow in WDL, Nextflow, or CWL and register it with HealthOmics. And finally, bring your data, whether it's in HealthOmics storage or S3, and run your workflow. You pay for the tasks and file systems you use while the run is executing, not during the spin-up times.
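A rough sketch of those last two steps in the Python SDK (boto3), assuming you have already pushed your containers to ECR and zipped your workflow definition; all names, ARNs, and parameters here are placeholders.

```python
import boto3

omics = boto3.client("omics")

# Register a private workflow from a zipped Nextflow definition (placeholder values).
with open("my-pipeline.zip", "rb") as f:
    wf = omics.create_workflow(
        name="rnaseq-demo",
        engine="NEXTFLOW",
        definitionZip=f.read(),
        parameterTemplate={
            "input":  {"description": "Samplesheet CSV in S3", "optional": False},
            "genome": {"description": "Reference genome key", "optional": True},
        },
    )

# Bring your data and start a run.
run = omics.start_run(
    workflowId=wf["id"],
    roleArn="arn:aws:iam::111122223333:role/OmicsWorkflowRole",
    name="rnaseq-demo-run-1",
    parameters={"input": "s3://my-bucket/samplesheet.csv"},
    outputUri="s3://my-bucket/results/",
)
print("Run ID:", run["id"])
```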

If you want to get started with workflows right away, or want to test AWS HealthOmics workflows, think about Ready2Run. It's the fastest way; think of it as your happy path. Bring your data, kick off a run, all for a set price.

AWS HealthOmics currently supports 35 Ready2Run workflows. If Ready2Run is working great but you need to run bigger files or longer protein sequences, we can help there too. We have blogs and tools to help you turn a Ready2Run workflow into a private workflow so you can run whatever data you need.
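If you go the Ready2Run route, starting a run looks much the same; a minimal sketch, assuming a Ready2Run workflow ID picked from the console (the ID, role, and parameter names below are made up for illustration).

```python
import boto3

omics = boto3.client("omics")

# Placeholder Ready2Run workflow ID, role, and S3 locations.
run = omics.start_run(
    workflowId="1234567",
    workflowType="READY2RUN",
    roleArn="arn:aws:iam::111122223333:role/OmicsWorkflowRole",
    parameters={"sample_fastq_1": "s3://my-bucket/sample_R1.fastq.gz",
                "sample_fastq_2": "s3://my-bucket/sample_R2.fastq.gz"},
    outputUri="s3://my-bucket/ready2run-results/",
)
```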

And now we've come to the part I know we've all been waiting for. I'm delighted to introduce Itai, so he can tell you about the amazing work he's done at Amgen to accelerate innovation.

Thanks, Ariella. My name is Itai. I have a PhD in human genetics, and I'm a senior scientist in Computational Biology and Bioinformatics Technologies at Amgen. I'm very excited to be here today to share with you the work that we've done so far using AWS HealthOmics.

So who are we at Amgen? Our mission is to serve patients, especially those who are suffering from serious illnesses. This mission defines the impact Amgen has in the world, and it informs almost every decision that we make. Currently, we have 27 FDA-approved medicines in our portfolio, and around two thirds of these are first-in-class innovative medicines for serious illnesses. Our pipeline focuses on high-quality candidates that demonstrate large, clinically relevant effects, and around three quarters of the molecules in our pipeline represent potential first-in-class medicines for serious diseases in which new treatments are very much needed.

Here are a few important facts and a little bit of history about us as a company. We were founded in 1980, in the early days of the biotech industry, with just a handful of staff in Thousand Oaks, California, and that location remains our headquarters today. Since then, we've come a very long way. Today, we have more than 25,000 staff operating in around 100 countries globally.

And that global reach enables us to serve approximately 10 million patients annually with our medicines with a total revenue of a little over 26 billion in 2022. We rolled $4.4 billion of that right back into our research and development efforts. And I'm also very proud to say we provided $2.2 billion of our medicines at no cost to eligible underinsured or uninsured patients around the world through the Amgen Safety Net Foundation.

At Amgen, we focus our efforts on three therapeutic areas, namely oncology, inflammation and general medicine. And particularly we like to focus on illnesses with significant unmet need where much needed treatments have the potential to impact millions of people globally. So we're talking about things like cancer, cardiovascular diseases, osteoporosis, severe asthma, rheumatoid arthritis, and all sorts of other inflammatory diseases.

And when it comes to actually doing this and how we do our science and actually make sure we're effective, we want to ensure that we follow some key strategic R&D priorities. First, we want to improve our success rates in our efforts to discover molecules and develop medicines that matter for patients. And in order to do that, we have to understand biology both from a disease perspective, as well as from an individual target and overall health perspective. We focus our strengths on the most promising technologies and activities including artificial intelligence and machine learning, precision medicine and innovative clinical trials.

Secondly, we wanna reduce our cycle times to ensure that we can get our new medicines to the patients who need them faster than ever. Speed to patients for us is mission critical. And hence we pursue innovation in clinical trials including improving clinical trial diversity, applying human data to enable precision medicine approaches to provide faster and more successful routes to approval.

And then lastly and super, super importantly for me, for every patient a solution thinking beyond purely FDA and other regulatory body approvals, and actually thinking about how do we improve access to and use of our medications for all patients all around the world. And in pursuit of that, we continue to make use of human and real world data to build our relationships with partners, regulators and payers to enable broader and faster access to our therapies.

So kind of putting it all together, what are we trying to do with our science at Amgen? We want to accelerate scientific research to bolster the fight against the world's toughest diseases. And indeed, Amgen's research and development strategic vision includes a world where science defeats all disease. To support this vision, we aim to discover, develop and deliver life-changing medicines to patients everywhere.

As part of this, I work as a biologist within Amgen's Computational Biology and Bioinformatics Technologies unit, helping enable a wide variety of omics data analysis for Amgen scientists across functions. With the union of technology and biotechnology, we really are at a hinge moment in drug discovery and development.

Amgen's scientific success is rooted in its unique capabilities that include cutting edge tools like artificial intelligence, generative biology, precision medicine and human data, as well as new clinical trial innovations to increase speed and accuracy as we discover and develop new types of medicines. And really none of this would be possible without our diverse workforce and the partnerships that help us innovate and drive progress. And so I'm very pleased to be here today to discuss our partnership that we've had with AWS.

But before we dive into that, just a quick little overview, what's the point of computational biology at a place like Amgen or any biotech or biopharma? And, you know, it's kind of funny sometimes I get people who are a little confused when I tell them I'm a computational biologist. Certainly more so if they haven't really heard of that as much, right? But I'll get some people who say to me, I thought those were kind of the opposite. I thought computers and biology were diametrically opposed to each other.

And then I'll get other people maybe with a little more knowledge about it, but they still think when I say computational biologist that must mean I'm working on some kind of synthetic biology. Um, but no, I, I do not. I'm sure as many of you know, uh, it turns out these days, especially you need computation for almost everything when it comes to biology, especially when you start to consider the scale at which we are collecting data.

So to develop the best therapies, we really need a deep understanding of biology, both the target biology, which is to say an individual gene or particular molecule, as well as the disease biology. And both of these understandings really hinge upon the generation, management, and analysis of many, many different types and vast swaths of molecular and omics data. And both storage and computation on these data are nontrivial and critical to help us make useful inferences for drug discovery.

And what I find really fascinating is that this really isn't limited to just one place in our pipeline. Uh it really comes into play in almost every stage of drug discovery from protein manufacturing to target discovery and truly, truly everything in between. So computation matters a lot for us.

Um and so now I want to talk specifically about a couple of use cases that we had before I get into some of the nitty gritty details of what we're doing with AWS Healthomics and provide some guidance for you for the future.

So here on the left, I'm demonstrating a particular use case we had where we used the nf-core/cutandrun pipeline to analyze some data. This is an interesting example because here what we had was wet lab scientists who wanted to ascertain which antibodies may be best for going after a particular target for epigenetics. And those of you who are familiar with epigenetics may be aware of the fact that these antibodies tend to be notoriously finicky and promiscuous, so it's really important to be able to make meaningful inferences and pick the best, most accurate one that you can.

And so here we had the ability to quickly analyze these data from multiple antibodies looking at the same target on AWS HealthOmics. Use of this pipeline and platform probably cut down what would have taken me about a week into more like a day. And with that, I was able to quickly come back to the wet lab scientists and say, for each target you're interested in, here are the best antibodies based on the enrichments at genes and genomic locations.

On the right here, I'm just listing out some of the implemented pipelines that we already have in our private workflows on AWS HealthOmics. These are listed by data type, but each of the pipelines in this case is from nf-core, and I'll get into that a little bit more in a few slides. But nf-core is a really good resource for looking at Nextflow bioinformatics pipelines and really thinking about which ones would work best for you.

And in addition to these currently implemented pipelines, we also have plans to migrate another 30 or so both nf-core as well as custom internal pipelines, given the utility that we've seen so far with this platform.

So a bit about the history of this particular problem, because it's not as if we weren't working on omics at all prior to AWS HealthOmics existing. We were; we've needed to analyze omics data since long before this service was announced, and we have done so for a long time.

And our prior solution for omics analysis also utilized AWS; it was powerful and effective in some ways. However, there were certainly downsides. It wasn't as easy to self-service; we had to have our IT come and set things up. It was very difficult to troubleshoot if anything didn't go smoothly, and I'm sure many of you with experience in bioinformatics know that things rarely go smoothly on the first try.

It would also lead to repeated effort much of the time, creating bespoke environments, with redundant work and less transparency in the analysis, because it wasn't easy or centralized to see everything.

And so for us, AWS Healthomics is helping address each of these problems and represents just another way that we're staying at the cutting edge of cloud computing to accelerate our research and help deliver to patients faster because that's our driving mission.

So why are we using AWS HealthOmics as opposed to some other potential solution here? In short, it's because we're focused on discovery, not IT. What this enables us to do is drive our understanding of biology, target and disease, ultimately informing various business decisions. For a company of Amgen's size, having everything in one place is tremendous for transparency, permissions, and security, and this already comes with that out of the box.

Additionally, it supports Nextflow and has numerous tutorials around nf-core, as well as being future-proof should we decide we want to expand to other workflow languages like WDL or CWL. And thinking about the future, we're also interested in using HealthOmics storage and HealthOmics analytics, which integrate seamlessly with the workflows.

Lastly, the other really big benefit here, why are we using this? HealthOmics gives us a scale and a stability that our previous solution really didn't. In the past, this is actually one of the things that our IT people really struggled with: toying with different queue sizes and different instance parameters, trying to get things to work, and in some cases really struggling. But now that's all taken care of for you on AWS.

And so ultimately, you know, why are we using this service? There's no need to reinvent the wheel. They've already got a really great service here, and I think it's likely going to serve a lot of different companies' needs. So again, there's no need to reinvent the wheel; they've got great stuff and you can just engage with it directly.

So what are some of the other advantages of using this service? Scientists, as I mentioned, are empowered to perform self-service. We don't have to wait for our IT people to set everything up. The shared pipelines being in one location lead to robustness, reproducibility and reduced cycle times, right? Nobody's gonna want to set up their own compute environment to go and do a bioinformatics workflow if I tell them, hey, it's already set up in the cloud for you. Just a few clicks and you're there.

Additionally, depending on your inputs and the particular pipeline that you're using, we've seen cost savings of between 40 to 60% relative to our prior solution. And the compute times for these workflows are either as fast or faster than our prior solution.

Really importantly, for someone like me who's engaging with these workflows all the time, the debugging is also a lot more transparent and the troubleshooting much more straightforward. The run metadata is retained in CloudWatch logs and enables real-time progress monitoring and rapid troubleshooting as well as connection to other systems like Laboratory Information Management Systems or LIMS. And this is really critical of course.
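To give a feel for that, here's a small boto3 sketch of the kind of progress monitoring this enables; the run ID is a placeholder, and the task fields shown are the ones I'd expect from the ListRunTasks response, so double-check against the API docs.

```python
import boto3

omics = boto3.client("omics")
run_id = "1234567"  # placeholder run ID

# High-level run status.
run = omics.get_run(id=run_id)
print(run["name"], run["status"])

# Per-task status for the same run.
for task in omics.list_run_tasks(id=run_id)["items"]:
    print(task["taskId"], task.get("name"), task["status"])
```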

And then lastly, perhaps not as excitingly but also very importantly, it provides run level visibility into billing. I'm not someone who deals with billing personally thankfully. Um but obviously, this has really big implications for a company of Amgen's scale and billing complexity.

So if we kind of put all this together about the background here, what do I think about this service broadly? I think that if you're interested in broadening accessibility, usage, and visibility of omics pipelines, AWS HealthOmics really represents a uniquely appealing and fit-for-purpose solution that I've gotten a lot of utility out of.

Um and so now I would love to dive a bit further into a little more detail about what goes into actually migrating a workflow to give you a taste of what this might look like for you if you're interested.

So first I'll start with some general broad tips. First of all, I'll note that every system is a bit different. I'm sure that's not news to anyone here, but it's important to keep in mind that just because something worked in the past on one system doesn't mean it's going to work identically in a new setting. It's always worth a try, but it's important to keep in mind, and that's part of why it's so important that you define your compute environment with things like containers.

And that brings me to my next tip: as Ariella noted, be sure to validate your container and S3 permissions. Your containers are going to be running the individual processes in your workflow, and they define the compute environments those processes run in, so you want to be sure the service has the appropriate permissions to pull from them.

Third, use the open-source helpers. I've been engaging with AWS around this service pretty frequently for about a year now, and I've been very satisfied with the back and forth and the development that we've had. Relatedly, I would say consult their AWS nf-core pipeline migration workshop if you have found that an nf-core pipeline is suitable for your needs and will work for what you're looking for.

I'm a big, big fan of nf-core for those of you who aren't familiar with it. It's Nextflow Core: a community effort to essentially settle on a gold-standard set of bioinformatics analyses for different data types. So, you know, as an analyst and a scientist, I'm always thinking about how we analyze these data correctly.

Thankfully, this abstracts away a little bit of that by enabling broad community, academic, and industry efforts to settle on best practices for bioinformatics for different data types.

Another interesting thing that I didn't really understand early on, but have come to understand and use very effectively in Nextflow, is to use Nextflow retry strategies per task to avoid losing progress. Here I noted out-of-memory issues, but it's not just out of memory; there are all sorts of things that could lead to a task failing.

And in Nextflow, you can actually set it up in your pipeline such that, given certain types of error failures, it retries the task with different parameters. One very common one is out of memory, where you can have it retry the task and give it some more RAM if it fails with an out-of-memory error. But of course, we don't want to bother doing that if it's not an out-of-memory error.

So conversely, I also have examples that I've worked on where the task fails not due to out of memory but due to a different error code, and then I say, OK, please retry the task but use slightly different parameters this time and see if that works.

And then lastly, I would say, really tell the service team what you need. I think AWS as, as we've seen at this conference, as many of you probably know already is very customer facing and we've had a really great experience with them in working on this, that's really kind of followed along the lines of us telling them, hey, we're really interested in this feature and building this out and, and them kind of responding. Ok, great. How can we build that for you?

So I want to dig a bit deeper into some specific tips for getting started with a workflow migration. And I was alluding to this just now.

Um but, but first I think the biggest thing you have to do is ascertain your needs. What data modality would you like to assess and how, because there are numerous, numerous ways to assess any given data, right? And again, I would recommend nf-core if you're not familiar. I recommend you check it out, go take a look, they have a pretty extensive website.

Um it's a great community resource. It really reduces your efforts and increases your confidence that what you're doing is a valid way to look at these data, as I mentioned before.

Make sure your permissions for the container registry are good to go.

That's pretty straightforward from the command line, I believe. It can be a little bit tricky from the console: you need to click the ECR repository's checkbox and then look for a little dropdown action bar at the top right to edit its permissions. But it's all very doable and laid out for you in the guides that AWS provides.
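For what it's worth, the same permission can be attached from code. Here is a minimal boto3 sketch; the repository name is a placeholder, and while the service principal and actions below match my understanding of what the HealthOmics docs require for pulling private images, treat the exact policy as something to verify for your own account.

```python
import json
import boto3

ecr = boto3.client("ecr")

# Allow the HealthOmics service to pull this private image (repository name is a placeholder).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowHealthOmicsPull",
        "Effect": "Allow",
        "Principal": {"Service": "omics.amazonaws.com"},
        "Action": [
            "ecr:GetDownloadUrlForLayer",
            "ecr:BatchGetImage",
            "ecr:BatchCheckLayerAvailability",
        ],
    }],
}

ecr.set_repository_policy(
    repositoryName="my-pipeline-tools",
    policyText=json.dumps(policy),
)
```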

Thirdly, parameterize everything. This is one that took me a bit to get to as well, but I highly recommend it: remove almost everything that's hard-coded in the pipeline. You want to be as flexible in your approach as possible, even if you believe that your company is only going to look at one particular reference genome with this particular analysis for now. It just behooves you to set it up such that when someone later down the line says, hey, I have a custom genome, or I have a different genome that I want to look at, it's very easy for you to put that parameter in. You don't have to make a new pipeline; you just tell them, oh, just change the parameter. I have it default to human, but if you want to do a different one, here's a new file with the parameters to put in. So parameterize everything, and just make it simple for yourself in the future.

And then lastly, of course, confirm the publish directory. You want to modify the script such that HealthOmics appropriately writes results back to S3. It would be very sad if you spent the time and effort and energy and money on the compute to do all the work, and it looks like it worked, but then it just fails to push the results to S3. You don't want to do that; it's very frustrating. So that's how you get started, broadly, just from thinking about it.

But then once you have the workflow modified as you actually need it, and you've gone through the AWS HealthOmics migration workshop and looked at those steps and you know you're good to go, you create a workflow artifact by zipping it up, and then you upload that to an S3 location. Then you prepare a JSON (JavaScript Object Notation) file of the parameters, and you really want two files here. One is a template for the parameters that describes what they are and whether or not they're optional. The other actually has some input test data that you can try running the pipeline with, ideally a small test case. And I'll emphasize that again: critically, you want to use a small test case so that you can move quickly through the pipeline and confirm that it works as you were expecting.
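As a sketch of what those two JSON files might contain (the parameter names and test paths here are made up; nf-core pipelines ship their own test data you can point at instead):

```python
import json

# 1) Parameter template: describes each parameter and whether it's optional.
parameter_template = {
    "input":  {"description": "Samplesheet CSV with FASTQ locations", "optional": False},
    "genome": {"description": "Reference genome key", "optional": True},
}

# 2) Test parameters: a small test case to run end to end quickly.
test_parameters = {
    "input":  "s3://my-bucket/test-data/samplesheet.small.csv",
    "genome": "GRCh38",
}

with open("parameter-template.json", "w") as f:
    json.dump(parameter_template, f, indent=2)
with open("test.parameters.json", "w") as f:
    json.dump(test_parameters, f, indent=2)
```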

And this is another place where nf-core is really helpful, because every pipeline that they release, they release with test data. So you can pull the test data directly from them and ensure that the pipeline works as you expect. And again, I'm hammering this home, I know I've said this a couple of times, but confirm those permissions and make sure you have the right IAM policies. You really don't want to spend the time and energy and effort to set it all up, or have it perform the compute, only to have a permission-denied error hinder your progress. That's always very frustrating, right?

And now I want to provide a few quick ideas about troubleshooting here as well. When it comes to troubleshooting on this platform, as I mentioned, it's a lot easier than our prior solution that we used at Amgen. I encourage you to use the logs both at the run level and at the task level. Individual runs are going to spin up many, many different tasks, and you can tell a lot about what failed with an individual process in the task log. But then you can also go one level higher to the run log, and essentially that's going to let you look at the Nextflow (or other workflow engine's) direct output. That can give you additional information about why the task failed, what kind of error message you got, how the actual call was made for the task, and what type of error you're getting, so that you can apply the appropriate retry strategy if you want to implement that.
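Here's roughly how those logs could be pulled programmatically with boto3; the log group and stream naming below ("/aws/omics/WorkflowLog", "run/<runId>/engine", "run/<runId>/task/<taskId>") reflects my understanding of how HealthOmics organizes them, so verify against your own account before relying on it, and the IDs are placeholders.

```python
import boto3

logs = boto3.client("logs")
run_id, task_id = "1234567", "7654321"   # placeholder IDs

# Assumed log group and stream layout for HealthOmics run and task logs.
LOG_GROUP = "/aws/omics/WorkflowLog"

for stream in (f"run/{run_id}/engine", f"run/{run_id}/task/{task_id}"):
    print(f"--- {stream} ---")
    events = logs.get_log_events(logGroupName=LOG_GROUP, logStreamName=stream, limit=50)
    for event in events["events"]:
        print(event["message"])
```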

With the retry strategies, again, as I was mentioning before, it really depends on the error, right? You don't want to up the RAM if you didn't get an out-of-memory error; you'll probably just get the same error again, there's no need to do that, and you'll just be wasting compute. But you could adjust RAM, or adjust the parameters being passed to the actual tool that you're using. There are all sorts of ways to do different retry strategies with Nextflow; it's natively built in.

And then lastly, really use your community. The nf-core Slack is very up to date with information and troubleshooting about all the pipelines, even more so than their GitHub in my experience, since there's a lot of development happening in the Slack. There have even been a couple of times where I thought a particular pipeline wasn't working for me on HealthOmics because I had implemented it wrong, and then I went and poked around in their Slack and it turned out, oh no, this version just isn't working for anyone right now and I just need to roll back one version. So yeah, the nf-core Slack is very helpful. And then the account team, the service team, AWS in general is really here to help you make the best use of this service possible. So when in doubt, ask them, not only for new features that you're interested in, but also if you're stuck trying to get something to work that you expect to work. Reach out to them; they really are here to help.

Lastly, I just want to briefly discuss the idea of integrating continuous integration and continuous delivery into your workflows. I think a lot of developers, myself included, care about this a lot so we can have a sense of whether or not something actually works once we've made a change. And I'm very happy to say that the HealthOmics platform is tailored to work with existing CI/CD; you can use your platform of choice. Indeed, at Amgen we've been using GitLab to set this up within our own environment. And again, here I just want to highlight, especially in the case of a CI/CD workflow, that it's really important to choose appropriate and relatively small test data so you don't waste time and compute.

And so that's kind of where we're at right now at Amgen: what we've done with AWS HealthOmics and some tips about how to use it effectively. I want to give a brief taste of what's next for us and what we're envisioning for the future of this platform. As I mentioned previously, we have 30-some-odd pipelines that we're migrating, and the vast majority of our internal pipelines are migrating to this service, both for cost and for shareability. I would like to update some of the nf-core pipelines that I've already implemented on the platform, as well as adding custom pipelines as Nextflow workflows. And I'm not a Nextflow developer; everything I know about Nextflow I picked up and learned in the process of engaging with AWS HealthOmics. So I just throw that out there for those of you who may be interested but are not yourselves Nextflow developers, to let you know it's not too hard to learn, and I don't have any prior Java background either. So it's possible to learn.

We want to connect this to our various internal systems. Right now, we're actually going through a transition with some of our laboratory information management systems, but we're working on connecting and enabling cross-system metadata management and, really excitingly, automated run parameter generation and automated run start. I'm sure a lot of people here in the audience are familiar with the idea of an end-to-end solution in pharma. At least when other stakeholders talk to me about this, they are literally talking about end to end: they say as soon as the data are sequenced, we should automatically generate a design file, automatically start a run, and have the data analyzed, such that the scientist comes in in the morning and the data are analyzed already. We don't want to require human intervention simply to do what many people would consider essentially preprocessing on omics data to get any useful inferences downstream. So that's the vision: fewer human interventions and the ability to automatically see the data get analyzed in a fast fashion.
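Purely as an illustration of that vision, not something built yet, here is a hypothetical S3-triggered Lambda handler that starts a HealthOmics run when a sequencing output lands in S3; the bucket layout, workflow ID, role, and parameter names are all invented for the sketch.

```python
import boto3

omics = boto3.client("omics")

WORKFLOW_ID = "1234567"  # hypothetical private workflow
ROLE_ARN = "arn:aws:iam::111122223333:role/OmicsWorkflowRole"

def handler(event, context):
    """Hypothetical S3-triggered Lambda: start a run for each newly delivered samplesheet."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if not key.endswith("samplesheet.csv"):
            continue  # only react to the assumed design-file naming convention
        omics.start_run(
            workflowId=WORKFLOW_ID,
            roleArn=ROLE_ARN,
            name=f"auto-{key.replace('/', '-')}",
            parameters={"input": f"s3://{bucket}/{key}"},
            outputUri=f"s3://{bucket}/results/",
        )
```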

And as I mentioned, I want to empower both myself and other Amgen scientists with Nextflow knowledge, because nf-core is one great way to implement pipelines on this platform, but it's not the only way; any Nextflow, CWL, or WDL pipeline could be implemented here. So it really does enable you to do your own custom pipeline development with any of these languages.

And then, as I mentioned on the prior slide, we're working on developing our CI/CD workflows to ensure that we have robust, standardized version control and regular updates to our pipelines. So, a little more vision casting for the future: I think the broad goal is to make all manner of omics analysis as simple as a point-and-click operation, or even better, as I was just discussing, mostly automated, with results ready to access shortly after the data come online.

And then I didn't really address this as much because most of our usage so far has focused on the workflows, but very excitingly, we are interested in using HealthOmics storage in the future. We're really interested in migrating a wide variety of currently S3-stored omics files, like FASTQs and BAMs, to HealthOmics storage, both for the integration with the platform and really for the cost savings of gigabase versus gigabyte pricing. To me, that's incredible; I don't understand how they accomplish that, but it's a very exciting prospect.

So I hope this has provided you with an interesting overview of how Amgen is using this service. For us, this is just another thing that we're doing to democratize omics analyses, and it represents one more way that we're working on the cutting edge to go all in for the patients that we serve. So thank you for your attention, and now I will hand it back to Ariella.

Oh stay up there.

Thank you, Itai. That was a great presentation. A world where science defeats all disease: that resonates very strongly with me. So now you might be wondering what's next. If you want to learn more about HealthOmics, you can take a look at the developer guide or the HealthOmics landing page. If you want a more hands-on, guided experience with all three services, check out the HealthOmics end-to-end workshop. You can get a guided experience with all three services using the console, the AWS CLI, and the SDK.

And here's another way our customers are doing amazing things. As I said, speed to patients is critical. The Children's Brain Tumor Network is building their multimodal data sharing and analytics platform on AWS, integrating data across 30 institutions and unlocking open science. This year, we're going to be distributing limited-edition pins designed by five-year-old Cameron, a patient with recurrent anaplastic ependymoma. This pin is a symbolic reminder of the continued impact of cloud-based innovations on patients and the importance of aggregating data across institutions.

Finally, as you go through re:Invent, if you want to see some really cool healthcare and life sciences-focused demos, have questions about HCLS at AWS, or just want to hang out, you can find a team of my very talented colleagues, and probably myself, ready to answer any of your questions at the Healthcare and Life Sciences demo pavilion.

And finally, if this wasn't enough for you and you want to hear more about AWS HealthOmics at re:Invent, we have a few other sessions. Later today, there's going to be one focused on how Stanford is building their precision medicine platform using AWS HealthOmics, over at the MGM. And we have several workshops; they are repeats of the same content, so look at the times that are available, on how to build a complete multimodal analysis for health data.

And finally, thank you so much for joining Itai and myself. Itai and I are happy to answer questions here, or once they kick us out, out in the hallway. But before you go, please remember AWS is a data-driven company, so please let us know your thoughts. Please take some time and fill out the survey.

Thank you very much for listening, thank you very much for your time, and have a wonderful rest of your re:Invent.
