Accelerating life sciences innovation with generative AI on AWS

All right, good evening everyone. Thank you for coming. My name is UJ, I lead Machine Learning for Healthcare and Life Sciences at AWS. And today I'm going to be joined by my two co-speakers from Gilead Sciences - Kevin Cox, who is the Chief Cloud Architect and Senior Director, and Jeremy Zhang, who's the Senior Director of Clinical Data Science and Advanced Analytics.

We are going to talk about how generative AI is transforming the life sciences industry. The session is going to be divided into three separate parts:

  1. I'll kick off by talking about how generative AI is being utilized by our customers in the life sciences space.

  2. We'll then cover some key considerations and differentiators that AWS brings to the table for life sciences organizations.

  3. Finally, we'll invite our speakers from Gilead to talk about how they are transforming the business of Gilead with generative AI and AWS.

I'm going to come back towards the end to conclude with a few talking points around best practices and how to use generative AI responsibly, because this is one of the things that is top of mind for our customers.

So I want to begin by saying that this field of generative AI has just taken off, and this is propelled not by just a few models or a few things you have seen since the beginning of the year. It is a sustained effort that has continued over the years: from 2017, when the concept of transformers was first introduced, to how they could be pre-trained to generate content, and now to a surge of applications that are all built on that technology.

This is propelled, of course, by the availability of a lot of data and models to choose from, and then all of the compute that turns this into use cases and workflows that are truly meaningful for businesses today. And the life sciences industry is no exception.

We have seen the application of generative AI across the spectrum of life sciences use cases, starting with research and discovery. Generative AI is really transforming the way drugs are discovered, with algorithms that can now predict the properties of proteins and different molecular structures.

We can also use generative AI in clinical development, where we are optimizing clinical trials. There is a lot of documentation, a lot of repeated work that used to be done manually, and generative AI is being utilized to create operationally efficient workflows.

We are also seeing the use of generative AI in the manufacturing space, where we are identifying things like false rejects or optimizing manufacturing processes.

In the commercial and medical affairs space, we are seeing use cases around patient outcome prediction and content generation, especially personalization. Each piece of content is utilized in very different ways across different geographies and personas, and generative AI models are being utilized to create personalized experiences for all of these use cases.

Finally, there is the space of patient support. We see generative AI being used as a concierge, where patients interact with chatbots front-ended by these algorithms, all through very user-friendly applications that are now available to patients.

Now, one of the biggest misconceptions we have heard from life sciences organizations, as we have watched this industry progress, is that everyone is chasing the size of the model; everyone thinks bigger is always better. That might be true for a lot of general use cases.

We have seen organizations use out-of-the-box models for use cases like summarizing content that doesn't depend on domain-specific tasks, or building chatbots with fairly generic knowledge about the world. But what we really feel is the differentiator in the life sciences space, especially when you go into really deep domain-specific use cases, is your ability to fine-tune on your domain-specific data. That is where we see differentiation, from early research like a study published by MIT through to the things we launched earlier today, which make fine-tuning these models on your own data as easy as possible, so that you are able to differentiate with your data.
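
To make that concrete, here is a minimal sketch, assuming boto3 and the Bedrock model-customization API, of what kicking off a fine-tuning job on your own labeled data might look like; the job name, model names, role ARN, S3 paths, and hyperparameter values are placeholders, not anything from the talk:

```python
import boto3

# Hypothetical sketch: fine-tune a Bedrock base model on domain-specific data.
# All names, ARNs, and S3 paths below are placeholders, not real resources.
bedrock = boto3.client("bedrock")  # Bedrock control-plane client

bedrock.create_model_customization_job(
    jobName="clinical-summarizer-ft-001",
    customModelName="titan-clinical-summarizer",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.titan-text-express-v1",
    customizationType="FINE_TUNING",
    # JSONL of prompt/completion pairs built from your own labeled domain data.
    trainingDataConfig={"s3Uri": "s3://my-bucket/domain-train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/ft-output/"},
    hyperParameters={"epochCount": "2", "learningRate": "0.00001"},
)
```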

In fact, the model is just one piece of it. What we really want you to think about is your overall generative AI choice, and constructing it well depends on a good balance between three different things. It starts with quality, and quality does not always mean more verbose responses or the most accurate model for one type of problem; it depends on the kinds of problems you are really looking to solve. It also depends on cost: as more and more of the POCs that were initially executed make it into production, organizations feel the need to run them at scale in a cost-efficient manner. And as the models grow larger, latency also plays a very important role, because large models take time to generate inference.

So a true generative AI strategy should take all three of these parameters into account and make sure the right model is utilized for the right use case. Only then can you create differentiation.

Now, let me walk you through an example of this. This is a prototype we recently developed, called Health Agent, to demonstrate how organizations are using generative AI in the healthcare and life sciences space. Health Agent allows you to connect both internal and external data sources and create meaningful responses in a conversational manner.

What Health Agent does is utilize the capabilities of agents and large language models built on top of Bedrock to create this conversational experience while preserving context. Let me play this video and show you what I mean.

Let's take the scenario of a hypothetical patient called John Hundred. I have John Hundred's clinical records sitting in HealthLake, which is our EHR data store; his medical imaging records sitting in HealthImaging; and his omics records sitting in HealthOmics. These are purpose-built healthcare and life sciences storage services created by AWS.

We can now create a workflow where you can interrogate a patient's records, in this case John Hundred's clinical, imaging, and genomic records, cross-reference them with public data sources like PubMed and ClinicalTrials.gov, look up gene annotations from public gene databases, and do all of this in a conversational manner.

So this is an assistant that can sit at the side of a researcher or a physician who is looking across all of these different records and patients and trying to determine the best outcome for that patient.

Here, in this case, you can see that this patient is clearly very sick. He has a variety of conditions, mostly cardiac related. And one of the things we notice in his genomic profile is a variant that may be contributing to these conditions. Researchers working on that variant can then figure out what medications or therapies can target it.

In this case, I found a variety of things this patient could benefit from, including a clinical trial that is recruiting patients with similar conditions. Another thing we would want to do is figure out whether there are any known therapies that target that variant.

None of these things can be done by a single large language model. But using the capability of agents, which treat each of these as separate workflows and orchestrate them with a large language model, allows you to create such experiences.
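
As an illustration of that orchestration pattern, a minimal sketch of invoking a Bedrock agent from an application might look like the following; the agent and alias IDs are placeholders, and the question is invented:

```python
import boto3

# Sketch of invoking a Bedrock agent that orchestrates the individual
# workflows (EHR lookup, imaging, omics, PubMed, ClinicalTrials.gov).
# The agent and alias IDs are placeholders for resources you would create.
runtime = boto3.client("bedrock-agent-runtime")

response = runtime.invoke_agent(
    agentId="AGENT_ID",                 # placeholder
    agentAliasId="AGENT_ALIAS_ID",      # placeholder
    sessionId="john-hundred-session",   # reusing the session preserves context
    inputText="Which recruiting trials match this patient's genomic variant?",
)

# The answer streams back as events; concatenate the text chunks.
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)
```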

And this has been happening for multiple years at AWS. From the time we started our machine learning journey with the launch of SageMaker, we have seen immense adoption of that service for a variety of use cases in the life sciences space. It is a true testament that nine of the top ten pharmaceutical organizations rely on AWS for their machine learning and analytics needs.

This is propelled by a few key tenets that we have heard from customers, and it is a testament to how we work backwards from those points and create services that address them.

One of the things we have repeatedly heard from customers is: why is it so difficult to fine-tune and train a model? We want to make that easy and remove the undifferentiated heavy lifting around it. The other thing we see organizations really care about is their data. Their data is their IP; you cannot give it away. And if you are not differentiating with your data in these large language models, there is not a lot of differentiation you can create.

The other thing we have heard from customers is that they want to see applications that actually increase productivity. Instead of training and fine-tuning models to no end, you need to create workflows that understand how the large language model will be used and deliver meaningful productivity for end users.

And lastly, when organizations think about deploying into production, cost plays a very important role, and cost reduction should not come at the expense of bad performance. Keeping the cost-to-performance ratio as favorable as possible has been a key tenet across everything we have done.

And this is happening through the launch of Amazon Bedrock. It is generally available, and it is a service that allows you to build generative AI applications on a variety of workflows with a choice of foundation models. You can also privately customize these foundation models with your own data; this includes fine-tuning with labeled data sets as well as continued pre-training.

And it has a range of foundation models: the Jurassic series models from AI21 Labs, the Claude models from Anthropic, the Command models from Cohere, and the Llama models from Meta.

Finally, for your image generation needs, we have the Stable Diffusion model, plus our own models, Titan Text and Titan Embeddings. All of these are available as easy-to-use APIs that you can integrate into your workflows to create experiences like the Health Agent demo I showed you earlier.
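
To show what "easy to use" means in practice, here is a minimal sketch of calling one of these models through the Bedrock InvokeModel API with boto3; the request body follows the Claude text-completion convention, and other model families use different schemas:

```python
import json
import boto3

# Sketch: call a Bedrock foundation model through the common InvokeModel API.
runtime = boto3.client("bedrock-runtime")

body = json.dumps({
    # Claude v2 text-completion format; other model families use other schemas.
    "prompt": "\n\nHuman: List three generative AI use cases in clinical development.\n\nAssistant:",
    "max_tokens_to_sample": 512,
})
response = runtime.invoke_model(modelId="anthropic.claude-v2", body=body)
print(json.loads(response["body"].read())["completion"])
```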

Kevin Cox, Chief Cloud Architect at Gilead:

Good afternoon everyone. Thank you, UJ, for the introduction, and thanks to all of you for coming today. I know it's a late session, but I wanted to share a little about Gilead's journey with generative AI so far.

My name is Kevin Cox, and I'm the Chief Cloud Architect at Gilead. I want to start by asking - how many of you are experimenting with or deploying generative AI solutions in your organizations? That's incredible. There is clearly a lot of interest in this area, and I'm very excited about the potential for this technology to really impact healthcare and life sciences.

At Gilead, our mission is to eradicate diseases in virology, oncology, and inflammation. We've made great strides - delivering a cure for hepatitis C, transforming HIV treatment and prevention, and taking on cancer with patient-centric solutions like CAR T-cell therapy. But that's just the beginning. We're investing in world-class science and technology to deliver 10 or more transformative therapies by 2030.

We can't do this alone. We rely on a wide network of partners, with over 40 relationships in oncology alone, to deliver these innovations. We've established a culture where our employees feel empowered to make a real difference in bringing health to the world. We believe continuing to invest in our data, AI, and ML capabilities will be critical to meeting our goals.

We've already built a data mesh within Gilead that allows all business units to curate, contribute to, and consume governed data products. Our goal is to align business needs into our data and cloud platforms, enabling our data scientists to leverage tools like SageMaker and Databricks in a modular, self-serve fashion. This approach has broad support across the organization, with data leveraged from research through commercialization.

None of this would be possible without our partnership with AWS. We've increased our partnership in recent years by becoming a cloud-first organization, migrating most applications from our data centers to AWS. Our cloud and data/analytics platforms allow us to innovate and build on new capabilities. This year we also deployed an enterprise AI solution and are building an HPC solution for research. We'll continue to innovate with technologies like generative AI.

Sometimes technology can feel like a hammer looking for a nail. But that's not the case with generative AI and life sciences - there are key opportunities ripe for this technology. Traditionally, our highly regulated industry relies on manual, inefficient processes that generate documentation and unstructured data. It's difficult to glean insights and improve processes from this information. Additionally, life sciences and healthcare data is very complex, often requiring ontologies to make sense of it.

We've heard many use cases from business teams hungry to use this technology. For example, our Medical Affairs team wants to glean insights from their data trove without building complex pipelines. Our Commercial team wants to jumpstart new content from their information treasure trove to work more efficiently.

Why partner with AWS? It's a natural extension of our existing AWS platform where our data and compute reside. AWS has broad generative AI experience and is continuously innovating. And AWS is deeply partnering with us, running workshops to identify use cases, providing education, early access, and helping run proofs of concept to demonstrate value.

We're already seeing the value of this effort in unlocking insights and streamlining work. Our strong cloud foundation enables teams to self-service new workloads, while our data platform facilitates reusable data products with built-in quality. On this foundation, we're enabling data science tools to leverage AWS' foundational model services.

We started by establishing an AI reference architecture and standards/guidelines like:

  • Keeping data and models in the Gilead cloud
  • Starting simple with foundation models and prompt engineering
  • Leveraging hosted services like Amazon CodeWhisperer
  • Choosing the right model for each use case
  • Addressing legal and security compliance

To scale solutions, we take a two-phase approach:

  1. Innovation factory: Identify POCs and use cases, create a registry, run workshops, build POCs to demonstrate value
  2. Product development: Transition successful POCs to our SDLC for build, test, deploy, run

Underpinning this is our AI Center of Excellence to ensure responsible deployment. Some patterns have emerged, like "question and answer" chatbots to interact conversationally with Gilead data. We've seen great success so far and are excited to continue scaling generative AI to transform our business.

It also allows us to include references to where the information was found, which is critical. The architecture starts with documents being uploaded into a knowledge base, and we index that information. When a search comes in, we first query against past questions that have been asked and return that answer if it matches. If not, we go into our index, look for sections of documents that match, feed those into an LLM on Bedrock, and that generates a summarized response that goes back to the user.
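
A minimal sketch of that flow might look like the following, assuming Titan embeddings, a simple in-memory index standing in for a real vector database and question cache, and a Claude model for the final summarization:

```python
import json
import math
import boto3

runtime = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    # Titan text embeddings; returns a single embedding vector.
    resp = runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

qa_cache: list[tuple[list[float], str]] = []   # (past question embedding, answer)
doc_index: list[tuple[list[float], str]] = []  # (section embedding, section text)

def answer(question: str, threshold: float = 0.9) -> str:
    q = embed(question)
    # 1. If a previously asked question matches closely enough, reuse its answer.
    for vec, cached in qa_cache:
        if cosine(q, vec) >= threshold:
            return cached
    # 2. Otherwise retrieve the best-matching document sections from the index.
    top = sorted(doc_index, key=lambda p: -cosine(q, p[0]))[:3]
    context = "\n\n".join(text for _, text in top)
    # 3. Feed them to an LLM on Bedrock to generate a summarized, cited response.
    body = json.dumps({
        "prompt": f"\n\nHuman: Answer using only this context, citing the source "
                  f"section for each claim.\n\nContext:\n{context}\n\n"
                  f"Question: {question}\n\nAssistant:",
        "max_tokens_to_sample": 512,
    })
    resp = runtime.invoke_model(modelId="anthropic.claude-v2", body=body)
    result = json.loads(resp["body"].read())["completion"]
    qa_cache.append((q, result))
    return result
```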

The second pattern is an extension of the first, and it's a more advanced use case where we want to go beyond unstructured documents: we pull in other structured data and do broader summarization and potentially content generation as well. One example is summarizing Gilead research papers. Our research team has a treasure trove of information and wants to be able to identify new targets, and this is a tool we're using to do that.

The other example I mentioned earlier, from our commercial team, is creating new marketing material based on previous product releases. In this pattern, the main differences are at the bottom: we have a broader set of data, pulling in both structured and unstructured sources, and we're creating a vector store using the embeddings from a large language model so that we have more control over the data and how it gets matched in the index.

The other thing you'll see is that we've got multiple LLMs in this architecture, so we can use the right model for the right phase. In the execution phase we're doing information retrieval, so we want the model that does that best. Then we have a second phase where, once we've retrieved the information, we use an LLM to verify that it actually answered the question correctly. So this is an extension of the first pattern, and we're building this solution for the research team as we speak.
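
That second verification phase could be sketched like this, with the model choices being purely illustrative:

```python
import json
import boto3

runtime = boto3.client("bedrock-runtime")

def answers_the_question(question: str, draft_answer: str) -> bool:
    # Second phase: a (cheaper) model checks the retrieval phase's output.
    body = json.dumps({
        "prompt": f"\n\nHuman: Question: {question}\nAnswer: {draft_answer}\n"
                  "Does the answer actually address the question? "
                  "Reply with only YES or NO.\n\nAssistant:",
        "max_tokens_to_sample": 5,
    })
    resp = runtime.invoke_model(modelId="anthropic.claude-instant-v1", body=body)
    return "YES" in json.loads(resp["body"].read())["completion"].upper()
```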

So at this point, I want to hand it over to Jeremy so that he can talk about how we're using generative AI in the clinical space.

Jeremy Zhang, Senior Director of Clinical Data Science and Advanced Analytics at Gilead:

Hi everyone. My name is Jeremy Zhang, and I lead AI for drug development at Gilead. As Kevin mentioned earlier, Gilead is actually a pretty large company now; it grew really rapidly during the pandemic. We're a very global company, and in development you have clinical site partners, MSLs, and local affiliates all over the world. So when you're building systems and technology solutions, especially in the data science space, you can't build them to be very US-centric, even though a lot of our real-world data, medical claims, and electronic medical records are mostly US based. We have a lot of data in the US, but we really have to consider how the solutions we're building, the optimizations we're bringing to clinical programs and portfolio programs, and the insights and evidence being generated will be used broadly, globally.

And that's something that a lot of data scientists, who mainly concern themselves with technology, don't always consider; there's a lot in this domain that you really have to think about when you're developing solutions. And now, going into the large language model, GPT-like, generative AI space, those considerations are really front and center, especially in the EU, where regulation is coming swiftly for a lot of the things we're trying to do.

As Kevin mentioned, we concern ourselves not just with bringing transformative therapies to people, but cures. A little while ago we brought a medication for HCV to bear, where for all intents and purposes we were able to cure hepatitis C, and we are looking forward to doing the same thing for HIV and HBV. Our mission is to create a healthier world for people, and data science is very much part of that. When you think about what Gilead as a company is trying to do and its mission, data has been something we've used since our founding; you can think of biostatisticians as the earliest data scientists. But now, with technology like large language models and generative AI, and more broadly applying AI not only to boost the efficiency of your workforce but to drive competitive advantage, that's something we're becoming more and more aware of, especially in the clinical space.

Our transformation is very much underway. For those of you in the crowd from other pharma or biotech companies, you know Gilead traditionally as a virology company. Being founded in the Bay Area, we've mostly focused on the HIV epidemic, and that's been our legacy and our history. But as Gilead has moved into oncology, into a more competitive environment in lung, breast, and colorectal cancer, you can no longer rest on what brought you to where you are. You're going up against some of the companies you all work at, and it can be a very crowded space, especially in antibody-drug conjugates and immuno-oncology.

And AI for us in the clinical space is becoming a differentiator. It's not just something interesting we use to boost efficiency or find documents, but something we use to design clinical trials, optimize protocols, and reach community centers, sites, and key investigators that our competition may be overlooking. That's how we envision AI making a difference and driving competitive advantage for us in clinical.

As Kevin mentioned earlier, our ambition spans three therapeutic areas. Virology has been our legacy, and with COVID-19 we used a ton of real-world data to drive a lot of value there. But looking forward, in oncology, and especially continuing our mission in inflammatory disease areas like NASH, AI is becoming more and more part of our working norms, not just interesting use cases that we execute. So how do we get this done? What's the secret sauce?

So I'll try to share with you as much as our legal team will allow me to. There are really three key ways, if I break it down. There are a lot of different use cases, but how do you prioritize and figure out what's actually going to make a difference and move the ROI needle, both for data science teams and for the program and portfolio teams you're working with?

These three key ways are, first, accelerating data management tasks. It doesn't sound sexy, but believe it or not, one of the key ways we use AI is in data engineering. The second is augmenting decision making. Physicians, clinicians, and MSLs are still the lifeblood of how we run our programs, and today you're not going to have some generative AI model replace endpoint selection or decide which biomarkers to use between your phase two and phase three; that's still very much a decision that belongs with your development lead.

So a lot of the ways we're using generative AI are augmentations to help them with their decision making. And the last is increasing the efficiency of tasks. Kevin mentioned many of those use cases: RAG search, translating documents, information extraction, ontologies. There are various ways you can use generative AI models to pick up little efficiencies here and there, and that really stacks up.

I think finding information is something a lot of pharma and biotech companies struggle with. You don't have just internal data but a lot of external data; you're buying data from vendors like Citeline and GlobalData and all kinds of other data vendors. How do you bring all of that together and make sense of the actual insights you should be making your decisions off of?

So I'm going to dive into a system we built over the last year to optimize clinical trials. In a lot of what I'm showing here, you'll see how we're using various LLM technologies in multiple different areas.

Before that, a discussion of generative AI, or AI in general, in pharma is not complete without talking about regulations, so let me talk about that a little.

There are some key challenges we face in our industry. They aren't unique; other regulated industries face them too, but I think the stakes are higher, because if we mess up, it's patients who will feel the impact. There's trust, explainability, safety concerns, and privacy. Traditionally, we've segregated data into primary and secondary use, and we go to sleep at night knowing we have good data governance and that the right teams are using the right data. But with large language models, if you're not careful, you can combine that data, make connections across disparate data sources, and actually reveal private information. It's very dangerous to put these kinds of capabilities in the hands of analysts or data scientists who aren't aware of how those connections can create privacy concerns, which is something we're very keenly aware of in R&D.

I put "regulatory, or lack thereof," but that's not quite fair; I created this slide before President Biden's executive order. But it's coming. On Monday I had a chance to see a presentation from some of our Bay Area colleagues at Stanford, and there was discussion about the California attorney general going to different companies and asking them to show what they're doing with AI. It is coming; California might want to take the lead, but even nationwide, and especially with the EU AI Act and the FDA making moves here, the regulations are coming. Some of the technologies we're using, like GPT-4, could even become non-compliant if regulations emerge around how much emissions large language models create. So teams in our space that are looking at fine-tuning models or leveraging large language models from providers such as Amazon Bedrock need to be aware of each individual model and what its implications are. It's no longer just a conversation about whether your legal team allows you to use it or whether the IP is there, but also what the EU AI Act thinks about it, and what the FDA, NIST, or other regulatory bodies think about it too.

Then there's cost and scale; I think Kevin and UJ both touched on that, and it is a concern. There are big numbers being thrown around about the cost of large language models, but those costs are being pushed down. We recently had a chance to talk to one of our partners about what it would take to fine-tune a 13-billion-parameter model, and it's actually in the $10,000 range.

So it's not out of reach for pharma companies. Lastly, expertise and talent. The number one limiting resource I've noticed in our industry isn't compute; it's actually talent. It's very difficult to convince data scientists who could easily get another job in the Bay Area to come to a pharma company where they're going to see a lot more restrictions around the data.

So with all these challenges, within development we've really scaffolded, or structured, our use cases into three different areas, and we had to make a decision for each. The first is foundation models. And even there, you have to think about some key questions before you rush into using FMs. These are just a few, but I'll use them as examples.

Foundation models are great because they allow you to go fast. For somebody on my team to download something from Hugging Face, put it in a model registry, and connect it to an app, that's a lot of steps. Something like Bedrock, a foundation model service, can get you from point A to point B and at least get you going on a POC much faster.

But what we've noticed, and my data scientists tell me this all the time, is that the big foundation models often just don't perform as well as something like BioBERT. And it's true: for a lot of our use cases, you can download a much smaller, domain-specific model from Hugging Face and it will drastically outperform a large foundation model.

So you have to really think about, if you're going to fine-tune, what the cost is. There are also various healthcare and life sciences foundation models, smaller models that are being built for task-specific purposes.

And a lot of the time those are better than your large general models. Lastly, DIY. This is probably the hardest, because a lot of the really good real-world data and medical records come from vendors, and they're becoming increasingly aware of that as a revenue source.

So contracts with real-world data vendors are becoming more and more complicated, because maybe they don't want to let you train your own large language model. And then internally, to gather our own data, we have to be aware that if we train our own large language model on some of our proprietary data and it ever leaks through information leakage, is that a security risk? So the cost is only one of the considerations.

I think security is also something you have to be super aware of for DIY. All right. So in drug development we have a lot of different use cases: evidence generation, biostatistics and statistical programming, creating tables, figures, and listings for trials in development.

We decided we wanted to pick a business process, a less risky one, to really pilot and see how we could drive a lot of value with LLMs. We looked at everything from study approval to post-market monitoring: between the development of a program and the marketing, the sell-through, and the manufacturing, what are some of the steps we could address initially?

That would be less risky for us. We chose protocol design and trial execution, which is an extremely common use case in our space. And lastly, being the data science team, there's a lot of data management as well as data analysis, things like biomarkers, multi-omic analysis, and multimodal fusion, that a lot of AI technology can be used for now.

So we picked these three areas, and I'll talk about some of them today and show you how we're driving some of that value. So how do we do this? This isn't rocket science. Just because large language models are involved, you still need data. I'll go more in depth on the next slide, but every single large language model project starts with data engineering.

So for us, the strategy was to think about three things. The first is our data strategy: if we're going to set out on a project to use large language models to drive value, where's the data and how are we going to manage it? And now we have to think about things like vector stores; a foundational common data model for that project is necessary.

The second piece is the actual AI. And we made a very specific decision early on in the development space that we wanted to own this IP. So we became very deliberate about even foundation model services and where they play: the AI models we were going to use to optimize clinical trials would be our own.

And lastly, it's all about execution. You can build the best models, the best bot, whatever; if the clinician, the study team, the clinical program manager, the MSL don't want to use your app, you're pretty much not driving any value.

So how you actually deliver those insights to your study teams and your portfolio and program teams is extremely important. About a year ago, we decided we wanted to change the paradigm around how we optimize clinical trials. We were going to skip just doing reporting, and we decided to see if we could use AI to augment a lot of the decisions around study feasibility, protocol design, and the execution of the clinical trial.

So, end to end, we aggregated both internal and external data from some of the common sources, Medidata, HealthVerity, and other partners that many of you are probably working with. We combined that with our internal clinical data from EDC, IxRS, and other systems, as well as operational data like contracting costs.

And we created what we call a common data model; it's an operational strategy model. It's a very common data paradigm: we had three layers, raw, curated, and business, which we call bronze, silver, and gold. Ultimately, what we wanted to do was create a fit-for-purpose data model.

A combination of clinical and real-world data would become the engine of our AI. And we had a longer-term vision that this wouldn't just be for the execution of clinical trials, but also for the design of trials and the design of programs.

So we leveraged a bunch of AWS services; if I drew that diagram for you, it would probably span multiple pages. Some key ones are S3, obviously, for data lakes, RDS, and Airflow. And we made a very specific decision early on that we were going to use Databricks.
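
As a rough sketch of that bronze/silver/gold layering, here is what one hop through the layers might look like in PySpark on Databricks; the paths and column names are invented stand-ins for the real clinical and operational feeds:

```python
from pyspark.sql import SparkSession, functions as F

# Illustrative medallion (bronze/silver/gold) flow; paths and columns are
# invented stand-ins for the real clinical and operational sources.
spark = SparkSession.builder.appName("trial-common-data-model").getOrCreate()

# Bronze: raw landing of a site-enrollment extract, as delivered.
bronze = spark.read.json("s3://example-bucket/raw/enrollment/")

# Silver: curated - deduplicated, typed, standardized identifiers.
silver = (
    bronze.dropDuplicates(["site_id", "subject_id"])
          .withColumn("enroll_date", F.to_date("enroll_date"))
)

# Gold: fit-for-purpose business layer feeding the trial-optimization models.
gold = silver.groupBy("site_id").agg(
    F.count("subject_id").alias("subjects_enrolled"),
    F.avg(F.col("screen_failed").cast("int")).alias("screen_fail_rate"),
)
gold.write.mode("overwrite").parquet("s3://example-bucket/gold/site_metrics/")
```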

So one of the things we noticed as we built this app and started using it with study teams was that, believe it or not, the development of drugs is a conversation. You don't just create a report, send it to somebody, and they just use the data; the data gets discussed.

It's a cross-functional discussion between regulatory affairs, medical affairs, your clinicians, your CRO, the CTM, and it's a room, or a Zoom call, full of people all talking about these insights that you're generating. And we realized: wait a minute, that's multiple meetings. People are busy, you have to find time, and the actual number of days it takes to have these discussions becomes a limiting factor.

So we decided to work with AWS, with Bedrock, to essentially create a chatbot. It was sort of a chatbot, but really it's a way for you to interrogate clinical protocols and do natural language query, allowing you to replace some of that conversation, at least, with conversations with AI.

That way you can identify the right clinical trial comparators, get the right cohorts, and get the right data sets, and skip some of those conversations that would normally have to happen. At the very least, the data science team using these tools can drill down to what the protocol design currently is at a much faster rate than constantly having meetings with the cross-functional teams.

So we built this application. It's not just a data set or a data lake; it's actually a Python app that we created in parallel with the data foundation and all the modeling and connections to Bedrock.

And it's being used across our portfolio in oncology, virology, as well as inflammation. Oh, and I did want to make a plug: we also use it to optimize the diversity of clinical trials. Many of you know that starting in January the FDA is going to be very concerned about that.

So our models not only take into account things like enrollment rate, dropouts, and the cost of trials, but also whether or not we can actually recruit diverse populations that represent the real-world data or real-world evidence cohorts for the protocol design.

So, on the generative AI piece: when you think about a chatbot in a conversation, it feels like something for efficiency. But what I like to think of it as, and I think our research team is doing this too, as Kevin mentioned, is really an analysis tool. It helps you analyze the data in ways you never thought of before.

I'm showing you some of the questions we can now ask our optimization tool. Things like: I only want first-line patients, because my therapy is first line, so remove all the data related to metastatic conditions. I only want phase two studies. Please include indications from similar drug studies. Only recurrent conditions. You don't even have to filter anymore; if you've ever used Citeline or ClinicalTrials.gov, there are all these filters, but now you can just use plain English to talk to these protocols and get exactly what you're looking for.
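
A minimal sketch of how such an English request could be turned into structured filters with a Bedrock model might look like this; the JSON filter schema is invented for illustration, not Gilead's actual one:

```python
import json
import boto3

runtime = boto3.client("bedrock-runtime")

# Illustrative only: the filter schema below is made up for this sketch.
question = "Only phase 2 studies, first-line therapy, exclude metastatic conditions."

prompt = (
    "\n\nHuman: Convert this request into JSON with keys "
    '"phase" (int), "line_of_therapy" (string), and '
    '"exclude_conditions" (list of strings). Reply with JSON only.\n'
    f"Request: {question}\n\nAssistant:"
)
resp = runtime.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 200}),
)
filters = json.loads(json.loads(resp["body"].read())["completion"])
# e.g. {"phase": 2, "line_of_therapy": "first", "exclude_conditions": ["metastatic"]}
print(filters)
```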

Here I have an example, for an oncology study from Gilead, of how this data actually looks. OK, so this data foundation that we built: like I said, there's a longer-term vision. We created this product for clinical trial optimization, but it's really meant for more than just that.

So, combining all this internal and external data, our vision was to create a platform, so to speak, a way to bring together all these partner services and data services and then use it for other things. We use it for the trial optimization app, but we're now also using it for a lot of other interesting new things, such as the design and optimization of the protocol.

We're also looking at multi-omics: leveraging radiographic biomarkers, real-world data, and genomic profiling together in a combined data model of patients. So there are a lot of really cool ways we can now use this common data model.

So here's just one piece of advice from me: if you're thinking about embarking on creating something like this, have a vision toward the other use cases that could potentially be useful in your area.

So here's an example of the same paradigm we created, and again, it's not rocket science; many of you probably have the same paradigm. But we're not just using it for structured data like clinical and real-world data; we're also using it for the complete wild west of unstructured data.

We've got public APIs, PDF parsing, OCR of images, radiographic biomarkers, and other imaging modalities. And we're able to use the same paradigm to create ontologies, extract information, and use generative AI across the entire thing, not just for chatbots but also for the data engineering aspects, and then deploy data sets, applications, and tools that optimize KOL engagement, draft protocols, look at publications, and create scientific knowledge graphs.

These are some of the cool ones; most are in the POC stages, and some of them are heading into actual use. But creating this paradigm has created an explosion of different ways we can leverage generative AI and large language models on actual business cases that will move that ROI needle.

So with Bedrock, like I said, we realized the development of drugs is not one data set or one report; it's a conversation with your cross-functional stakeholders. And that's a lot of the way we're using it.

So we're using it for natural language query, which I think is a huge paradigm shift for us. Many of you are probably using real-world data and probably spending a lot of money transforming that data into different common data formats.

Imagine if you didn't actually have to do that, which would bring the cost of your data engineering on AWS down. Then you can leverage large language models to do things like zero-shot matching between clinical protocols and patient profiles, or replace the laborious effort of creating patient concepts and cohorts with natural language query. These are just some of the ways we are now looking at using Bedrock and other large language models to drive even more efficiency.
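
As a sketch of the zero-shot matching idea, with an invented patient profile and eligibility snippet:

```python
import json
import boto3

runtime = boto3.client("bedrock-runtime")

# Invented example data; a real pipeline would pull these from the data model.
patient_profile = "62-year-old female, stage III NSCLC, first line, ECOG 1."
protocol_criteria = "Adults with stage III-IV NSCLC, no prior systemic therapy."

prompt = (
    f"\n\nHuman: Patient: {patient_profile}\n"
    f"Protocol eligibility: {protocol_criteria}\n"
    "Does the patient plausibly meet these criteria? "
    "Reply MATCH or NO MATCH, then one sentence of reasoning.\n\nAssistant:"
)
resp = runtime.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 100}),
)
print(json.loads(resp["body"].read())["completion"])
```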

Domain-specific entity extraction: the domain specificity of the large language models is so important to us. We realized very early on that we can drive a lot of efficiency by having data scientists understand which domain models are good, and that's improving all the time.

We do a lot of chat with Bedrock. Synthetic data creation is something we're looking at, and it's really interesting. One of the biggest barriers to using AI for radiology or pathology is how much data you have access to; typically on a trial you're not going to get that many subjects, especially in the early stages, like in phase two going into phase three. So creating synthetic data using large models for your radiographic biomarker modeling is actually quite efficient, and it can allow you to de-risk trials in ways you simply couldn't before, because you just didn't have enough data.

All right, the last thing I'll end with, before I pass it back to UJ, is the broader vision of what you want to do with AI in this kind of space. It's not just about creating the one use case; that's all good, but keep your eye on these other use cases as well. These are just some of the different ways, many of them leveraging Bedrock and obviously on AWS: in about a year's time we've created many different models that optimize the way we conduct trials, and we're using multiple different NLQ applications, optimizing costs, and doing TA- and indication-specific modeling for different indications.

And all of that has taught us that having that longer-term vision, and that tighter coupling between the business case and what you're actually doing with the technology, is really what's going to move the needle on driving ROI with these AI capabilities.

Cool. So I'd like to invite UJ back up to the stage to finish the talk. Thanks.

UJ, AWS:

Thank you, Jeremy, and thank you, Kevin, for sharing all of these important insights. As you might have seen, the theme throughout this talk has been that generative AI is at a tipping point, but don't forget all of the key foundational things you have to build to create meaningful generative AI applications.

I know this is not as cool as the end-user applications you might be interested in developing, but having access to foundation models and analytics, and having a solid data strategy, is extremely important. That is why we have a comprehensive set of services for creating your data foundations, starting with ingestion services that allow you to ingest a variety of data, store it centrally in data lakes, and query it without worrying about the modality or size of the data.

What we believe is that the world is moving toward a future where there are no barriers to storing a variety of data and then querying and generating insights from it. What we are trying to do is create APIs and services that abstract all of that away, in terms of the potential you can create with applications directly for end users. That is where you see the success of machine learning: when you have a solid data strategy coupled with a choice of models and everything you can do with machine learning on that data, that is where the true transformation happens.

The other very important thing to keep in mind is the ability to create generative AI responsibly. As Jeremy mentioned in his talk, there are a lot of regulatory aspects that come into play here, especially for a highly regulated industry like pharmaceuticals. You have to think about how to avoid the dangers and pitfalls of generative AI when you're building with this technology.

That is why we have kept the idea of creating generative AI responsibly at the core of our services. This is happening through the launch of services we are putting in the hands of our customers, who are then using them to validate and build proof points for regulatory agencies, who can in turn interrogate all of the reports and data we provide. Doing this in a transparent manner is extremely important as well.

A lot of these models cannot explain why they generate something, and they have the ability to hallucinate. How you control all of this is extremely important in use cases that go into decision making where a life can be impacted. That is why we have services that allow you to define specific metrics and evaluate which model is better than another; in fact, having a choice of models is critical in such use cases. Having the ability to tie all of that together and do proper enablement, because this field is also very new, is something we consider extremely important.

We do this through a variety of webinars, immersion days, and teaching sessions that generate awareness and make sure that this technology, while broadly available, is used with proper guardrails and proper utilization.

Lastly, we are investing in science. We are making sure our own scientists are creating methods and publishing them for everyone to consume, keeping responsible use a central element of generative AI. All of this translates to the benefit of our products; we launch those methods as services so they can benefit you when you access them and build applications on top of them.

So I want to end with some key takeaways from the talk, to try to make sense of everything you have heard, along with some best practices. Start with this: AI is evolving very fast. We are still in step one, and a lot will happen over the coming years. One thing we advise organizations is to think about a strategy that allows you to change quickly; the ability to fail fast and pivot to other approaches is extremely important.

How do you invest in strategies that allow you to create modular workflows, where maybe you change a model or an API but the end-user experience doesn't change? Think about that when it comes to generative AI. The other thing we see from a lot of customers is that they get swayed by information they see on the internet without asking how it benefits the use cases they care about. Working backwards from the use case has been one of the key lessons from our own experiences with customers. While there are a lot of models available, not all of them are applicable to the problems organizations are trying to solve, so working backwards from the key tenets of your use cases is extremely important.

I have talked about the role data plays in all of this. But one of the things data also does is differentiate you from others: only you have access to the data sets you own. Making that a key element of your generative AI strategy is extremely important.

The other thing we have seen customers utilize a lot is managed services from AWS, because some managed services abstract away that key decision point of which model is better. For example, CodeWhisperer is a service that allows you to generate code based on instructions; you don't have to choose a model for that, you just utilize the service. You also saw Amazon Q being announced today, which allows you to create responses to a variety of questions on your own data sources. That is where I think the value is, that's where organizations are moving, and that's how you should think about adopting generative AI.

Model choice is critical when you're building your own applications, but it's not the end of it. There's so much more beyond model choice to think about: What is your data strategy? How do you scale these applications at minimum cost? How do you handle access control, governance, and audit? These are all things you have to keep in mind, and they are not driven by a single model; they depend on your entire strategy and architecture, and on introducing governance policies using services that let you build applications that can be interrogated when policymakers need to do so. You should think about adopting those strategies in your own organizations.

So I want to end today by saying that we have talked a lot so far about the what and the how, but the thing that makes this job most fulfilling for me, and one of the things we care deeply about, is the why. And the why is actually making a difference in patients' lives.

So this year, AWS has decided to partner with the Children's Brain Tumor Network. Here's an artwork by one of the patients, named Cameron, a five-year-old patient who drew a picture of herself swimming; she calls it her superpower. We are distributing pins just like it, which you can collect at the Venetian, where we are hosting a lot of different demos in a pavilion where you can interact with a host of peer industry colleagues.

You can also see a lot of demos there, available for you to interact with, including the Health Agent one I showed you today; that was just one of them. You'll see a drug discovery workbench, a variety of chatbots created with RAG using our services, and a lot of different healthcare use cases.

So I encourage you, if you have the time, to go and check out all of the things on display for life sciences.

Lastly, I would say there are a lot of sessions still going on that are specific to life sciences. My team is running a session on fine-tuning protein models, where we take AWS technologies like SageMaker and use distributed workflows to create fine-tuned versions of those models. These are all hands-on workshops, so you can get access to the code, try it out for yourself, and customize it to your own liking.

The other one is the scientific workbench environment. This is also something that resulted from a POC we did for one of our customers, and we are now making it open source for you all to access. You can get more details about these sessions by scanning the QR code; I'll leave it up for a few seconds for you to scan before moving on.

All right. Thank you so much for spending your time with us.
