LLM inference and fine-tuning on secure enterprise data

All right, welcome everybody. This session is LLM inference and fine-tuning on secure enterprise data with Snowflake. My name is Miles Atkins. I'm a Data Cloud Principal for our AI/ML workload at Snowflake, joining you from Chicago today. As you can tell, the Las Vegas dry weather gets to my nose pretty quickly, so bear with me.

The good news is that when I left Chicago it was about 15 degrees and snowing, so Vegas is a little bit better than that. Thank you all for joining today. I'm looking to have a lot of fun here over the next hour, walking you through everything that you're going to be able to do when it comes to Snowflake and generative AI large language models.

So many acronyms. Again, thank you for your attention today. To kick us off, for those of you who might not have seen Snowflake marketing or been through a Snowflake pitch before, I'm going to give everybody a quick platform overview of who we are, how we came to be, and how we're different.

From there, we're going to jump right into it, and I'm going to walk us through three different categories for how our Snowflake product teams are thinking about integrating, working with, and partnering on everything related to generative AI.

The first will be AI in seconds. This is really tailored toward how Snowflake can bring LLM-enabled experiences to you, the customer, where we manage all the complexity. We do all the fine-tuning, we do all of the operations of these large language models, and we deliver new, exciting experiences to you very quickly - experiences that increase revenue, reduce costs, and make you more productive. The second category, aimed at the Snowflake developer, is what we call applications in minutes.

This is us bringing to market the fundamental building blocks so you can build your own LLM applications rather quickly. No need for you to worry about hosting the base models yourself. We're going to provide serverless functions and vector-store capabilities so you can cobble these together to build your own applications.

And the last category is customization in hours: hey, I do want to be able to play around with the weights of my model. I have a very specific application in mind, and I need to customize my model or add some specific content or context to it. What building blocks does Snowflake provide for me there, from a more raw storage and compute perspective? I'm going to walk through what you're able to do with that on Snowflake as well.

So to kick us off: Snowflake. How did we go from startup to IPO in 2020 and beyond? Fortunately for us, some things had to occur in the past for us to even exist today. First and foremost, on-prem data warehouses were getting bottlenecked by the growing volume of data over time. Companies were no longer able to bear the large infrastructure cost of running on-prem data centers.

That was a growing challenge that was only getting worse over time. The second was that first-generation data lakes were actually very poor when it came to query and workload performance. So that was another issue.

One of the enablers was the arrival of the public cloud: now we are able to just rent infrastructure when we need it, and scale up our needs for storage and compute on demand.

We can start to do a whole lot of new, cool things. What Snowflake did, and what really separates us from any other place where you can run SQL, is that we completely re-architected away from traditional on-prem technologies to solve challenges like: how can I include all of my data? How can I get built-in scalability? How can we make it completely managed, so you, the customer, never have to think about hiring an on-prem IT management arm or services ever again?

All of these things coming together is what has given us the confidence to say we are now a full-fledged data platform, also known as the Snowflake Data Cloud. What this has turned into is Snowflake now being able to handle at least seven distinct types of workloads inside an enterprise: obviously data warehousing and data lake, but also newer, more innovative workloads like AI and ML, which we'll definitely dig into deeper today, data applications, cybersecurity, as well as getting into the online transactional processing space with Unistore. And the other beauty of Snowflake is that we are built on the cloud.

The walls don't stop at what you can do with your own enterprise data. Through all of our marketplace technology, you can share your data and work with your second-party customers, partners, and suppliers to augment your data and get even more value. You can also look to the Snowflake application marketplace to find partner applications that you might otherwise not be able to procure because of security or procurement issues. With our marketplace, we make it very easy to deploy partner technology into your single-tenant Snowflake account: you pay with your Snowflake credits, and you get all the benefits of the governance and security you have already set up for your data architecture.

When it comes to AI and ML specifically, this is where I focus all of my time here at Snowflake. Let me start diving into what we call generative AI for enterprise data, where we at Snowflake are really chasing two things: one, to be easy to use, and two, of course, to be secure, because we think your enterprise data is the one thing that is going to be your core differentiator when it comes to everything generative AI. OK.

So again, as I alluded to earlier, the way we're thinking about this is really two different attack angles from a product perspective. Starting on the left-hand side here: LLMs in everyday analytics within seconds. This is our experiences bucket; Snowflake engineers do all the heavy lifting to bring to market applications that are simply gen AI enabled. On the right-hand side, we're also giving developers the building blocks to build and deploy your own LLM applications on top of your own data as easily as possible.

And of course, like everything else, this entire foundation of capability has to be built on top of a security and governance framework. We're going to do that for your data as well as for the models that you might look to deploy on Snowflake.

So getting to a portfolio view here, I'm going to walk through each of these different boxes and give you a 30-second pitch on everything that we have going on. To start us off, let's focus on the use of AI in seconds.

We have Document AI, Universal Search, and our most recently announced Snowflake Copilot, which we announced about a month ago at Snowday, a virtual event. OK, so Document AI: this has been out there for a little bit of time now. We announced Document AI at Snowflake Summit in June of last year, and it has been in private preview.

Document AI is the ability for you to take unstructured data - your PDFs, your image files - and ask questions about the values you want to extract. I might have a W-2, I might have some sort of order form, and in natural language I can say, hey, I want to know who the employee name was for this specific W-2.

Document AI will go ahead and understand not just the image itself but also the semantics of the words in that image, and will try to find the value you asked for and give you a confidence score for it.

Now, the beauty of Document AI over more traditional text-extraction services is that for documents with what we call a changing schema - they look a little bit different from one another - Document AI is able to handle that sort of change much more gracefully.

In addition, if the base model you're working with is not quite doing its job on your specific documents, you have the ability to go through a no-code fine-tuning exercise to help it achieve better accuracy.

Net net, we think that with Document AI you're going to be able to produce more structured data from a lot of your unstructured data, which can then be used in downstream workflows.

Next up is Snowflake Copilot, and I do have a demo of Copilot to show today. Snowflake Copilot is your LLM-powered assistant for everything you want to do in Snowflake. At its core, it takes your natural language inquiries and generates the appropriate SQL code. Maybe you don't know how to write SQL at all, maybe you know a little SQL but are looking for a tool to make you go faster, or maybe you don't quite know how to approach a specific SQL query - Snowflake Copilot is going to help you do that.

When we think about differentiators in generative AI, Snowflake is very much of the mind that, yes, everyone will be able to host models, and everyone will have their own vector-store capability of some flavor. With Snowflake Copilot, we fully intend to compete to be the best natural-language-to-SQL LLM on the market.

This is what we see as one of our differentiators that you might not find on another platform. The last piece, which goes hand in hand with some of what powers Snowflake Copilot, is our Universal Search capability.

All of the metadata - database names, column names, schemas, tags, applications, and models that you might have in Snowflake - we have been able to combine with an LLM behind the scenes to give you much better search over the objects you want to interact with in Snowflake, and much better discoverability of datasets, not only in your Snowflake account but also in our marketplace, which again might help you augment your own data for the use cases you are going after.

So let's take a look and see what we can do with Snowflake Copilot. OK, here I have pulled up our Snowsight UI. I am in one of the demo Snowflake accounts that I have assigned to myself - this is a production Snowflake account in AWS - and I have a few different datasets loaded for a couple of the demos that I'll be showing today.

"We are interested in Snowflake Copilot. You'll notice at the very bottom of the right hand screen here. Once we get to public preview, you will have a “Ask Copilot” button that you can then prompt at the bottom of your screen.

And I can go ahead and start a new chat here for one dataset that I have - Lending Club, which was a peer-to-peer lending company where investors could loan out their money to individuals without really needing to go through a bank. Making this a little bit bigger, we can see here we have some loan IDs, loan amounts, how much they were funded, interest rates, their grade, and ultimately whether they did actually default or not - were they bad? A very traditional machine learning dataset that you might have seen before.

And what I wanna do is can I just ask a question to better understand some of my data here? And maybe the question is, what is the total sum of loans by grade that turned out to be bad in my table? Very simple question - a risk manager, maybe a P&L manager for a bank might want to know. Maybe they have never worked with SQL before - instead of bogging down their IT department, their analytics team, they could self serve this question on their own.

You'll see here, Snowflake will pump out the relevant SQL query. A few things to note - while Snowflake Copilot is powerful, it is not fully aware of everything going on in the world. So you do need to give it a few explicit instructions, for example: hey, what table should I actually be looking at?

Now, while we might need to prompt it to specific things, it still does have the ability to understand the metadata associated in your account. And so while I didn't explicitly tell it that GRADE should be capitalized, that LOANS referred to LOAN_AMOUNT, it was able to infer that. But you can get, like everything else with poor prompt engineering, answers that don't make sense if you're not quite giving it enough context.

And so what I can do here is click Run and you will then get your result. And here we can see exactly what we asked for - by loan grade, the sum of my outstanding losses for that specific portfolio.
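
For reference, a query of roughly this shape is what Copilot produced; here is a minimal Snowpark Python sketch of it, assuming a hypothetical LENDING_CLUB table with columns GRADE, LOAN_AMNT, and BAD (1 = defaulted):

```python
# Minimal sketch of the kind of SQL Copilot generated for this question.
# Table and column names (LENDING_CLUB, GRADE, LOAN_AMNT, BAD) are assumptions for illustration.
from snowflake.snowpark import Session

connection_parameters = {
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<database>", "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

session.sql("""
    SELECT GRADE,
           SUM(LOAN_AMNT) AS TOTAL_BAD_LOAN_AMOUNT
    FROM LENDING_CLUB
    WHERE BAD = 1                 -- only loans that ended up defaulting
    GROUP BY GRADE
    ORDER BY GRADE
""").show()
```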

We can get a little bit more complicated. Maybe I'm curious about flight delays at our Boston Logan Airport. I was in Boston for a summer in 2019 starting at a new company. And so it seemed like every time I was in Boston, the flight was always delayed. So I found this dataset a little amusing, maybe a little bit of a more complicated query.

Hey, on a weekly basis, can you show me how many flights departed, how many ended up being delayed, and what was the ratio between the two? Maybe there's something that I can do from an operations perspective to find the anomalies in our delay patterns.

Copilot will give you the relevant query and you can go ahead and execute. So what do we do here? We're gonna do some conversion from some timestamped flight delays, aggregate them on a weekly level, figure out some percentages between two columns - a little bit more complicated.
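
As a hedged sketch of what that weekly delay query can look like, again with hypothetical table and column names (BOSTON_FLIGHTS, SCHEDULED_DEPARTURE_TS, DEPARTURE_DELAY_MIN), run through the same Snowpark session as before:

```python
# Weekly departures, delays, and delay ratio for Boston Logan.
# Table and column names are illustrative; `session` is the Snowpark Session created earlier.
session.sql("""
    SELECT DATE_TRUNC('WEEK', SCHEDULED_DEPARTURE_TS)          AS WEEK_START,
           COUNT(*)                                            AS FLIGHTS_DEPARTED,
           SUM(IFF(DEPARTURE_DELAY_MIN > 0, 1, 0))             AS FLIGHTS_DELAYED,
           SUM(IFF(DEPARTURE_DELAY_MIN > 0, 1, 0)) / COUNT(*)  AS DELAY_RATIO
    FROM BOSTON_FLIGHTS
    WHERE ORIGIN = 'BOS'
    GROUP BY 1
    ORDER BY 1
""").show()
```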

Here, we can go even a little bit harder than that. For this last dataset, being from Chicago, I have some weather data on an hourly basis. And maybe I want to think about not just prompting it with a natural analytic question, but doing feature engineering. How can I think about generating features more quickly with Copilot as well?

Of course, we can give Copilot a much more verbose prompt here. I am interested in weather in Chicago: for each unique latitude and longitude pair, can we create four different rolling averages of temperature based off the current value and the 2, 6, 12, and 24 preceding values, which correspond to hours in the case of our dataset here.

And Copilot will go ahead and understand the complexity of building the window functions necessary for calculating these averages over the different time windows we're looking to build as potential features. We can run this, get our result, and that is Snowflake Copilot.
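
The generated query is essentially a set of window functions; here is a sketch of the shape it takes, assuming a hypothetical CHICAGO_WEATHER_HOURLY table with LATITUDE, LONGITUDE, OBSERVATION_TS, and TEMPERATURE columns:

```python
# Four rolling temperature averages per latitude/longitude pair, over 2, 6, 12, and 24
# preceding hourly rows plus the current row. Table and column names are assumptions.
session.sql("""
    SELECT LATITUDE, LONGITUDE, OBSERVATION_TS, TEMPERATURE,
           AVG(TEMPERATURE) OVER (PARTITION BY LATITUDE, LONGITUDE ORDER BY OBSERVATION_TS
                                  ROWS BETWEEN 2  PRECEDING AND CURRENT ROW) AS TEMP_ROLLING_3H,
           AVG(TEMPERATURE) OVER (PARTITION BY LATITUDE, LONGITUDE ORDER BY OBSERVATION_TS
                                  ROWS BETWEEN 6  PRECEDING AND CURRENT ROW) AS TEMP_ROLLING_7H,
           AVG(TEMPERATURE) OVER (PARTITION BY LATITUDE, LONGITUDE ORDER BY OBSERVATION_TS
                                  ROWS BETWEEN 12 PRECEDING AND CURRENT ROW) AS TEMP_ROLLING_13H,
           AVG(TEMPERATURE) OVER (PARTITION BY LATITUDE, LONGITUDE ORDER BY OBSERVATION_TS
                                  ROWS BETWEEN 24 PRECEDING AND CURRENT ROW) AS TEMP_ROLLING_25H
    FROM CHICAGO_WEATHER_HOURLY
""").show()
```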

Ok, moving on to our next category here - applications in minutes. This is where at our Snowflake event about a month ago, we announced Snowflake Cortex.

Snowflake Cortex is gonna be a collection of AI - I'll get to that in a second - and LLM serverless functions that you can just call as a function in Snowflake and very soon as just an API call from any environment that you might have.

In addition to LLM functions and AI functions, Cortex also is going to include vector support as well. So we announced a new Snowflake data type, Vectors, which is gonna be your storage capability for all manner of word embeddings that you're looking to both store as well as serve to your LLM applications in what we call retrieval augmented generation.

And then on top of that, Streamlit - a company that we acquired roughly 18 months ago - we now have Streamlit in Snowflake, where we're going to automatically host all of your Streamlit applications, make them very easy to share, and have them be the interface for some of these LLM applications that you might think about building.

So diving into Cortex a little bit more - it's a serverless SQL function portfolio. There's really two different groups here. One is Snowflake Cortex specialized functions. Our specialized functions are gonna be specific task-based functions that you're going to be able to call.

Some of them are backed by a large language model, like doing translation, doing summarization, doing sentiment detection, and some of them are just going to be your standard supervised machine learning models - forecasting, classification, anomaly detection.

Our second group is going to be generalized functions. Again generalized in the fact that you supply the specific model that you're looking to run when it comes to doing some sort of chat completion or embedding of some natural language text, for example.

And we're going to have a growing list of models, open-source models, where today we already have Llama 2, the Llama 2 family, up and running. You're going to be able to just call this function, and it'll be very much a serverless experience for you.

And then of course, Streamlit in Snowflake - the ability to develop your applications on top of some of these functions as well. Let's take a look and see what that can look like.

Ok, so here I am back in my Snowflake UI. I have another dataset for us - call transcripts. Maybe I'm a company that has a multilingual customer service operation and I have a bunch of call transcripts from different locations - Germany, France, United Kingdom - coming in in different languages.

And I, as part of a centralized global data science team, want to be able to do something useful with this transcript data using our new Snowflake ML TRANSLATE and SUMMARIZE functions. It's now very easy for us to do these translations and summarizations.

I'm going to grab just 10 samples here for the purpose of this demo, focusing on our German transcripts. And you can see here, it's very easy for us to go from a transcript to an English translation to even a summarization of that call log. Now it's just a matter of what we want to do next, but this makes it very easy to put our data into a much more standardized format for building downstream use cases.
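
The demo used the preview function names; the same call, sketched with the generally available SNOWFLAKE.CORTEX names and a hypothetical CALL_TRANSCRIPTS table (TRANSCRIPT and COUNTRY columns), looks roughly like this:

```python
# Translate German call transcripts to English and summarize them.
# The talk used preview function names; SNOWFLAKE.CORTEX.* are the generally available equivalents.
session.sql("""
    SELECT TRANSCRIPT,
           SNOWFLAKE.CORTEX.TRANSLATE(TRANSCRIPT, 'de', 'en') AS TRANSCRIPT_EN,
           SNOWFLAKE.CORTEX.SUMMARIZE(
               SNOWFLAKE.CORTEX.TRANSLATE(TRANSCRIPT, 'de', 'en')) AS CALL_SUMMARY
    FROM CALL_TRANSCRIPTS
    WHERE COUNTRY = 'Germany'
    LIMIT 10
""").show()
```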

What can I do next? Maybe I want to go a little bit further. Maybe I have some sort of service and I want to actually bundle and build a JSON record out of some of this data. We can use one of our generalized functions, ML.COMPLETE, with our Llama 2 model - the 70-billion-parameter version - and give it a prompt here:

"Hey, can we summarize this transcript in less than 200 words? And I want the following fields to be outputted in a JSON format for me to later use and treat in a much more semi-structured format going forward as well."

Very easy for us to build this prompt, pass it into our COMPLETE function, and get an output that can then be stored inside of Snowflake as a semi-structured data type.
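
A hedged sketch of that COMPLETE call, assuming the generally available SNOWFLAKE.CORTEX.COMPLETE function and the 'llama2-70b-chat' model name; the JSON fields and table name are illustrative, not the exact ones used in the demo:

```python
# Ask Llama 2 (70B) to turn an English transcript into a small JSON record, then parse it
# into Snowflake's VARIANT type for downstream, semi-structured use. Fields are illustrative.
session.sql("""
    SELECT TRY_PARSE_JSON(
               SNOWFLAKE.CORTEX.COMPLETE(
                   'llama2-70b-chat',
                   CONCAT(
                       'Summarize this call transcript in fewer than 200 words and respond ',
                       'with ONLY a JSON object containing the fields customer_name, product, ',
                       'issue, and resolution. Transcript: ',
                       TRANSCRIPT_EN
                   )
               )
           ) AS CALL_RECORD
    FROM CALL_TRANSCRIPTS_EN
    LIMIT 10
""").show()
```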

And then of course, maybe you don't want to be doing batch analytics with this function, but would rather have a workflow where, on the fly, a boots-on-the-ground worker is interacting with a Streamlit application to do this and be enabled in some way.

We can just pass this to a Streamlit application running that same COMPLETE function with the translation and the summarization, and get the result back through a much more interactive Streamlit application.
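
As a rough idea of that interactive version, here is a small Streamlit-in-Snowflake sketch wrapping the same COMPLETE function; the model name and prompt are assumptions, and the Cortex call again uses the generally available function name:

```python
# Minimal Streamlit-in-Snowflake sketch: a text box whose contents are summarized by Cortex.
import streamlit as st
from snowflake.snowpark.context import get_active_session  # available inside Streamlit in Snowflake

session = get_active_session()

st.title("Call transcript assistant")
transcript = st.text_area("Paste a call transcript (any language)")

if st.button("Translate and summarize") and transcript:
    prompt = "Translate this transcript to English if needed, then summarize it: " + transcript
    row = session.sql(
        "SELECT SNOWFLAKE.CORTEX.COMPLETE('llama2-70b-chat', ?) AS ANSWER",
        params=[prompt],  # parameter binding keeps the transcript text safely quoted
    ).collect()[0]
    st.write(row["ANSWER"])
```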

We can do the same with our generalized functions for a different type of application that we see quite commonly amongst our customers: how do I think about introducing my knowledge base? Maybe I have a bunch of content - a wiki of product documentation, or logs, or something like that. How do I let a user have a natural language conversation with my knowledge base?

I have a dataset which is a wiki of a bunch of fictitious Snowflake-themed products. Let's see how easily we can build a functioning retrieval-augmented-generation-based chat application on top of it with Snowflake.

Again, Cortex makes it very easy to do embedding generation, with a built-in EMBED_TEXT function you will see in Cortex as well. I can go ahead and pass in our initial unstructured product wiki and get back the word embeddings that represent that knowledge.

For those of you not familiar with word embeddings, they are basically big lists - usually 768 elements long - filled with floating point numbers, meant to represent natural language text in a numerical format. From there, we are able to compare different pieces of text using Snowflake's vector similarity functions, which also come as part of Cortex.

This is going to allow us to retrieve what we think is relevant context when building an LLM-based chatbot. So what I can do here is go back to my Snowflake ML COMPLETE function - I'll be using Llama 2 again here.

My prompt could be "Who are the suppliers of that Ski Max Pro 9000?" as part of my prompt. Very easy to then build an inner query here to do that retrieval augmented generation - go take these two different vectors, find the ones that are the most similar, retrieve them, put them back into its natural language format, put that back into my final prompt.

And what we get back here is that the suppliers for the Ski Max Pro 9000 were The Mountain Gear Company. And of course, we can do this in Streamlit as well - wrap up this function, put it into a Streamlit app where maybe a user just types in

their question - here's another one - and what they get back is the answer that they're looking for. So that sums up how we think about applications in minutes: giving builders the right types of building blocks to start to cobble together LLM-based applications that perform some specific custom function for your enterprise, for your company.

OK. The last one here: fully custom, in hours. One of our big announcements back at Summit, again in June of last year, was Snowpark Container Services. You saw us do a big opening kickoff with our CEO, Frank, and NVIDIA's CEO, Jensen, talking about how Snowflake and NVIDIA were going to build a partnership. The product behind that is Snowpark Container Services, where you are now going to be able to access GPUs as part of your Snowflake platform. And we're going to give you a managed Kubernetes layer on top of that for you to do model fine-tuning and other job-based work, as well as long-running services for hosting and serving large language models on top of GPU infrastructure.

The cool thing about Snowpark Container Services is that, again, Snowflake at its core is not just about what you can do inside your own enterprise Snowflake account, but also about how we let partners interact with you in a very secure way. Partners are also going to be able to deploy, build, fine-tune, and then make available to you, the customer, proprietary commercial large language models and large language model applications. And all of that can run on top of Snowpark Container Services.

As I said, it's a fully managed Kubernetes platform, very much built for the quote-unquote data science developer. If you've worked with Kubernetes or containers in the past, there's a lot of complexity around networking, setting up a scalable endpoint, authentication, failover - a lot of different knobs traditionally found in a Kubernetes offering. Snowflake has really built our service to target the data developer persona: you just need to know a little bit of Python and a little bit about how to build Docker containers, and we're going to manage the ability for you to scale your applications with just a couple of knobs.

And then of course, we already have a budding partner ecosystem building on top of Snowpark Container Services, wanting to be the first applications made available, where again you, the customer, get to browse the catalog and choose a specific partner technology you want to work with. It gets deployed single-tenant into your secure and governed Snowflake account, you pay with your Snowflake credits, and the partner is able to very easily distribute their technology so you can get value from your data in whichever way they see fit.

For the last bit here, we've got about 22 minutes left, so hopefully we should have some time for questions as well. I'm going to go into how we think about doing some customization of an embedding model to boost the performance of a retrieval application.

OK. For those of you who have seen Snowflake before, everything in Snowflake can be done as infrastructure as code. It's very easy to spin up a Snowflake warehouse, which our customers have been doing for nearly a decade at this point, with all manner of scalability built into a very declarative DDL statement. You're going to be able to do the same thing for compute pools, which are the new compute modality behind Snowpark Container Services, to choose a specific SKU of infrastructure you want to work with. In our LLM example, I'm going to spin up a GPU_7 pool - on AWS, a configuration of four NVIDIA A10Gs - and we're going to do some fine-tuning on that GPU. You can see we have auto-resume and auto-suspend, and this compute pool is already up and running for the purposes of this demo.
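
The DDL referred to here looks roughly like the following; the pool name is made up, and the GPU_7 instance family reflects the preview naming used in the demo (current accounts expose different family names):

```python
# Declare a GPU compute pool for Snowpark Container Services as infrastructure-as-code.
# Pool name and sizing are illustrative; INSTANCE_FAMILY values vary by cloud, region, and release.
session.sql("""
    CREATE COMPUTE POOL IF NOT EXISTS LLM_GPU_POOL
      MIN_NODES = 1
      MAX_NODES = 1
      INSTANCE_FAMILY = GPU_7
      AUTO_RESUME = TRUE
      AUTO_SUSPEND_SECS = 3600
""").collect()
```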

What I'm going to do is actually spin up a Jupyter notebook on top of this container infrastructure, and that's where I'll do my fine-tuning example. It's very easy to build the specification of what I want this cluster to do - you can see I have an image that I'll be using, I've mounted some persistent storage as well, and it's very easy to then make this a publicly accessible application if needed. A Jupyter notebook is obviously a long-running sort of UI for running and executing Python code, so I can get my public endpoint here. I've already gone ahead and logged in. I've mounted some code that I had written and stored in Git - through our Git integration I was able to mount that code into this specific Jupyter notebook. And from here, we can get into the meat of this last demo.
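
The service specification the demo describes (an image, mounted storage, a public endpoint) can be sketched like this; the image path, stage name, and port are assumptions:

```python
# Launch a long-running Jupyter service on the GPU compute pool.
# Image repository path, stage name, and port are illustrative placeholders.
session.sql("""
    CREATE SERVICE IF NOT EXISTS JUPYTER_SERVICE
      IN COMPUTE POOL LLM_GPU_POOL
      FROM SPECIFICATION $$
        spec:
          containers:
          - name: jupyter
            image: /ml_db/public/image_repo/jupyter-gpu:latest
            volumeMounts:
            - name: workspace
              mountPath: /home/jupyter/workspace
          volumes:
          - name: workspace
            source: "@ml_db.public.notebook_stage"
          endpoints:
          - name: notebook
            port: 8888
            public: true
      $$
""").collect()
```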

Here I can go ahead and import my packages and connect to my Snowflake warehouse for data processing. The dataset we'll be looking at is a fictitious musical instrument e-commerce website with a bunch of customer reviews about the products they bought, along with the rating for each product. For example: "I'm amazed by the sound quality of this guitar. The pickups are incredible, definitely worth the price."

Now, it's very easy for me to grab one of my favorite embedding models from the open-source Hugging Face model repo. Sentence Transformers is a Python SDK that makes it easy for me to take a model, put it onto my local hardware - onto a GPU specifically - and start working with it to do a quick evaluation of the base model we grabbed from Hugging Face.

We can zero in on two sentences here - let me make this a bit bigger. The first one: "My Martin 16 is a beautiful piece of craftsmanship." Sentence number three: "My guitar is a beautiful piece of craftsmanship." To me, as a musical instrument store SME, they're pretty much saying the same thing, but our embedding model only gives them a cosine similarity of about 0.55 - not similar, not dissimilar, somewhere in the gray area. And if you were to build a retrieval-augmented generation application on top of this, you would be pretty disappointed with the performance you'd get, because the model isn't really understanding the similarity between a question and what your wiki, your product catalog, your context might include.
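
The baseline check is just a couple of lines with the open-source sentence-transformers SDK; the base model name below is an assumption, since the talk doesn't name the exact checkpoint:

```python
# Load a base embedding model from Hugging Face onto the GPU and score one sentence pair.
# "all-mpnet-base-v2" is an illustrative choice of base model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2", device="cuda")

s1 = "My Martin 16 is a beautiful piece of craftsmanship."
s2 = "My guitar is a beautiful piece of craftsmanship."

emb = model.encode([s1, s2], convert_to_tensor=True)
print(float(util.cos_sim(emb[0], emb[1])))  # the demo's base model landed around 0.55 for this pair
```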

It's then very easy to take a fine-tuning dataset. Here I have a dataset with a set of sentence pairs as well as a label: as an SME, here is sentence one, here is sentence two - what would you rate the similarity of those two sentences to be? This is where, again, your real competitive advantage when it comes to the applications you build is going to be in developing, procuring, and building fine-tuning datasets like this to make your applications performant - and performance is what makes the customer happy.

So I'm going to take this fine-tuning set, put it into a model.fit() call, and go ahead and fine-tune this model. We can see that my GPU utilization has ticked up, so I'm fine-tuning this model on a GPU - pretty fast - and then we can run that similarity analysis again. If our baseline was 0.55 for that sentence pair, we've bumped the cosine similarity up to 0.75. As a data scientist or an SME, you would do this fine-tuning over and over again until you got to a comfortable place with the similarities your model produces for different sentence pairs.
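
A minimal sketch of that fine-tuning loop, using the classic sentence-transformers fit API with SME-labelled sentence pairs; the pairs, labels, and epoch count here are illustrative:

```python
# Fine-tune the embedding model on labelled sentence pairs with a cosine-similarity loss,
# then re-score the original pair. The training data shown is a tiny illustrative stand-in.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util

model = SentenceTransformer("all-mpnet-base-v2", device="cuda")

train_examples = [
    InputExample(texts=["My Martin 16 is a beautiful piece of craftsmanship.",
                        "My guitar is a beautiful piece of craftsmanship."], label=0.9),
    InputExample(texts=["These strings buzz on every fret.",
                        "Fast shipping and great packaging."], label=0.1),
    # ... the real fine-tuning set holds many SME-labelled pairs
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=4, warmup_steps=10)

emb = model.encode(["My Martin 16 is a beautiful piece of craftsmanship.",
                    "My guitar is a beautiful piece of craftsmanship."], convert_to_tensor=True)
print(float(util.cos_sim(emb[0], emb[1])))  # in the demo this pair moved from ~0.55 to ~0.75
```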

Now, the last piece: it doesn't just stop at fine-tuning a model. How do we actually deploy this custom fine-tuned model back into Snowflake so that an application or a downstream user can use it very easily? This is where, with our Snowflake model registry - one of the features in our Snowpark ML portfolio - I can take this model artifact and deploy it as a real-time inference endpoint on that same GPU cluster we have up and running.

I'm going to be using our custom model framework here for registering this model. It comes with a signature: what goes in (natural language text) and what comes out of this model (a word embedding, since this is an embedding model). I can then log it with a specific name - FINETUNED_EMBED_TEXT - and give it some dependencies: here are the packages I want you to deploy this model on top of when you build a container for it, and if you want to use GPUs, give it a CUDA version. And then this last cell, which I've already run, is a .deploy().

So where do you want to run this model? I want to target Snowpark Container Services, which gives you that real-time model inference endpoint. Targeting a Snowflake warehouse instead makes the warehouse the place where you do large-scale batch model inference. So those are two different profiles: low-latency model inference and high-throughput model inference. In this case, we want to get a response back pretty fast. I've gone ahead and deployed this model, so we can go back to our Snowflake UI - I was in my database and schema where this model is now deployed - and I can finally run my FINETUNED_EMBED_TEXT function, which gets created for us. Let me run the whole thing: you can see that we can now go from text to embedding with that fine-tuned model.
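
The registry and deployment API was in preview at the time of this talk and has changed since, so treat this as a loose sketch assuming the current snowflake-ml-python Registry interface and its support for sentence-transformers models; the names, dependencies, and sample input are all illustrative:

```python
# Register the fine-tuned embedder in the Snowflake model registry so it can be called
# from SQL. This assumes the snowflake-ml-python Registry API; the .deploy() flow shown
# in the demo used an earlier, preview interface.
import pandas as pd
from snowflake.ml.registry import Registry

reg = Registry(session=session, database_name="ML", schema_name="PUBLIC")

reg.log_model(
    model,                                    # the fine-tuned SentenceTransformer from above
    model_name="FINETUNED_EMBED_TEXT",
    version_name="V1",
    conda_dependencies=["sentence-transformers"],
    sample_input_data=pd.DataFrame({"TEXT": ["My guitar is a beautiful piece of craftsmanship."]}),
)

# Once the model is served (on Snowpark Container Services for low latency, or run on a
# warehouse for batch scoring), it can be invoked like any other Snowflake function.
```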

And if I'm fast enough here - doesn't look like it - you should be able to see a small uptick in GPU utilization, giving you the signal that the inference of that model actually happened on the Snowpark Container Services instance where we deployed it. And so that wraps up how you can think about deploying custom model artifacts and doing fine-tuning entirely inside of Snowflake, very easily.

And with that, that sums up the presentation. We walked through how Snowflake is thinking about you using AI in seconds, how Snowflake is thinking about you quickly building and deploying applications in minutes, and then going fully custom with much more flexible storage and compute to build your own specific applications rather quickly - hopefully in hours.

So with that, thank you for joining us today. I see we've got about 11 minutes left, so I'm happy to field some questions, and I appreciate you all coming out today.
