Generative AI panel: Moving beyond the hype and realizing value

Interviewer: With that, let's begin. So panel, my first question is really about the history of how we got to where we are with generative AI. Over the last decade or so, we've seen investment in infrastructure and tooling lead to more sophisticated models, which unlocked many new use cases, which in turn led to wider adoption of AI and ML technologies. It has created this virtuous cycle, and all of you have touched various aspects of this flywheel. So talk to us a little bit about that from your vantage point. What have you seen unfold that got us to where we are today? Ron, maybe you want to start.

Ron: Yeah, sure. AI has been in a self-reinforcing cycle for more than a decade now. For me, it started around 2012 with AlexNet, and just to set the context, neural nets have been around for many decades. But in the latest era of AI, over the last decade or so, we've been seeing again and again that as we scale model sizes and training data sets, we continually hit new state-of-the-art results. And this fits into the flywheel you just described: we get bigger and better models, and making the models bigger drives more use cases, applications, and compute demand, which enables more infrastructure investment. And the better infrastructure allows us to build the next, bigger, state-of-the-art models. So we see this flywheel accelerating, and we don't see any signs of it slowing down anytime soon.

As for me specifically, I work on the infra side, building infrastructure for training, as Shruti mentioned. Our goal is to accelerate this flywheel even further by enabling our customers with best-in-class infrastructure for their AI compute needs. Just today, we announced Trainium 2, which is 4x faster than Trainium 1, and having been in the AI space for a while, I can say 4x in a single generation is significant. Our hope is that this will continue to feed the AI flywheel and that the community will use that infrastructure to build the next generation of state-of-the-art models.

Interviewer: Awesome.

Shruti: So my AI journey started around 2009, and it actually started in the context of Apache Spark. Many of you don't associate Apache Spark with machine learning platforms, but that was one of the first applications on top of Spark. It happened at Berkeley, where we have these labs, and in these labs the faculty had decided to give up our offices and move to an open floor plan with cubicles, to increase interaction. In the same space we had people from different areas: machine learning, with Michael Jordan and his students, plus databases, systems, and networking. All these people in a very small area. And how did we get there? Because there was a problem to solve, and the problem we chose was that Lester Mackey, a machine learning student, and a group of other machine learning students wanted to participate in the Netflix challenge. Anyone remember the Netflix challenge? It was $1 million for improving the recommendation engine, right?

And the students came to us, since I'm on the systems side, software infrastructure, and said, OK, how do you do it? We have all this data; what should we use? And we happily sent them off to use Hadoop. And they came right back and said, but it's slow. Because with all this iteration, every iteration is a MapReduce job, and you write to and read from storage all the time. So then, OK, we have a solution: Matei Zaharia, who was working with me at that time, and who is also one of the cofounders and the CTO of Databricks,

came up with the first prototype of Spark to solve this problem, and it was a few hundred lines of code. And the reason Spark is written in Scala is that it allowed him to prototype much more quickly than doing it in Java. So that's the story, and then you need your first customers. At Databricks, we sold to companies because they said they wanted to use machine learning. It was aspirational, right? That was 2013. So we sold them the product we had back then so they could do machine learning. But after a few months we went back to them, because they were happy customers, and asked, what did you build? And none of them were doing machine learning. Why? They said, well, it's a data problem: I don't have the data, I need to clean up the data, there is this other data I need, and so forth. We were lucky, because Spark is also great at data analytics and engineering. So that's the story. Since then I have really focused, from the software systems side, on how to scale these workloads, and we built a bunch of open source systems at Berkeley to do that. And why scale? Because it was clear that the demands of these machine learning workloads were growing much faster than the capabilities of the hardware, even with Trainium 2 and so forth. That was pretty clear by 2014 or 2015, because it had been going on since 2010. So after Spark came a bunch of systems like Ray, to scale up machine learning workloads like reinforcement learning, and more recently vLLM, for improving the performance of inference. All of them open source. OK.
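To make the iteration point concrete, here is a minimal PySpark sketch of the pattern: an iterative gradient loop that caches its parsed data in memory once, instead of re-reading it from storage on every pass the way a chain of MapReduce jobs would. The data path and the one-feature update rule are illustrative placeholders, not anything from the talk.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iterative-demo").getOrCreate()

# Parse (feature, label) pairs once and pin them in cluster memory.
points = (spark.sparkContext.textFile("data/points.txt")  # hypothetical path
          .map(lambda line: [float(x) for x in line.split()])
          .cache())

w, lr = 0.0, 0.1  # a single weight and learning rate, kept tiny on purpose
for _ in range(10):
    # Every pass reuses the cached RDD; no per-iteration round trip to disk.
    grad = points.map(lambda p: (w * p[0] - p[1]) * p[0]).mean()
    w -= lr * grad

print("fitted weight:", w)
```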

Interviewer: Awesome. Thanks for sharing that anecdote.

Dharshi: Yeah. So Shruti is, of course, one of the cofounders of Databricks, where I now work. I came there through the acquisition of MosaicML, the company I founded back in 2021. And really, we were focused on how to bring these large-scale neural networks to more people. This was back in the ancient times of 2021, when we didn't call it generative AI yet; the models had generative abilities, but really it was about scale. How do I scale something and make it reliable and repeatable? And when I do that, I make it more accessible to many more customers and users. And I think many of the problems we talk about, AI safety and so on, are solved by having many users: distributed capability, decentralization. So that was really what fueled us then, and we attacked cost through algorithmic optimization and building better learning models, as well as software optimization. But my AI journey actually started earlier. Interestingly, I predate you; I don't know if that makes me old or what. I've been in this field for a long time. I did research even as an undergrad in the nineties, using neuromorphic machines, circuits that can behave like neurons, some of Carver Mead's work. Then I came out to Silicon Valley and designed chips, CPUs and other kinds of chips, and built software stacks on top of them. So always at the systems level: how do I translate algorithms to run efficiently on hardware?

Actually, in 2007, I wanted to go after this question of how we can build better computers, computers that can learn from data. And in 2007, we just didn't think like this yet. Machine learning was an obscure academic field; there was a small community of people. I remember seeing Yann LeCun come and give a talk, and the reaction was kind of, this is a joke, why are we even talking about neural networks? And I was like, no, no, this is great, this is going to be amazing. At that time, everyone was talking about SVMs and regression methods. But anyway, I went back and got a PhD in computational neuroscience so I could understand a little bit about what the brain does and how it works, and view that from the perspective of a computer architect. And actually, a lot of concepts from there do translate over. I ended up founding a company called Nervana, which was one of the first dedicated AI chip companies; it was acquired by Intel. So I've been in this field for a long time and seen a lot of these transitions.

And you know, it is really interesting to see this flywheel involve hardware, which is almost the rate-limiting step here. Hardware capabilities enable scale with more data; they enable bigger, better models; those models then enable more data and new capabilities that feed back into hardware, which takes another two years to develop. So it's a cycle, but it is somewhat rate-limited there. What I've seen from the late nineties to today is this push from fully prescribed systems, where I dictate and delineate upfront what they do, to learning systems, which actually start discovering aspects of the task from the data. I think the watershed moment Ron was referring to in 2012 was the ImageNet moment, when we saw a convolutional neural network finally have enough scale, both on the data and the computation side, to start discovering features that were much better than what a human could engineer. That was, for me, a light-bulb moment: OK, now it's time to start thinking about this as a new computational paradigm.

And I think now we've stumbled upon the next one, which is not discovering features but discovering how those features fit together. You can call it knowledge, or whatever you want to call it, but it's not over; there's a lot more to do, and at each step of the way we build new capabilities and new tools. Whichever wave it was, the 2012 moment gave us the capability to start building face detectors and things that dealt with noisy data. Now we can deal with very large corpuses of knowledge that we can query in more natural and imprecise ways, and I think that's the tool that's going to fuel the next five or six years of growth. After that, agents, things that have agency and can actually take action and plan, are probably five to seven years out. But it's a very exciting time, and hardware just keeps getting better and serving all of this.

One thing I'll leave you with before I hand it over: the human brain runs on about 20 watts of energy. That's it. For me, that is exactly our path forward. We've got to figure out how to get close to that in synthetic systems. Thanks.

Jeremy: So my background, I guess, is more in traditional machine learning: linear models, random forests, that kind of stuff. I got into it around 2012 to 2014. And what really strikes me about gen AI is how broad the appeal is. One of the great things about it is how accessible it is, and that's driven a lot of the interest and hype around the space, because everybody can go and use these tools, see what they do, and intuitively understand what's possible.

Whereas before, machine learning was a very niche, nerdy thing under the hood: fraud detection, spam detection, things like that.

Interviewer: Awesome. OK, thank you for sharing your backgrounds and experience. Now that we've talked about how we got here, let's discuss what here is. On the screen, you see a limerick that was written by generative AI about itself, and I don't know that this was possible four or five years ago. So today, are we in a hype cycle for generative AI or not? And regardless of how you answer that question, what are some of the challenges we need to overcome before productizing this technology at scale?

Ron: I can start. I think we're definitely seeing a time of heightened expectations from generative AI; it's very popular these days. But underneath the expectations, there's a genuinely transformative technology that we cannot ignore. I think it has the capacity to change many aspects of our lives, and if you think about it, it has already changed the way we interact with computers. Just a few years ago, when we wanted to search the internet, we would type a couple of keywords into the search bar, get a list of URLs, and then go through different websites to sift out the information we were trying to find. Over time, search has become far more conversational, and with today's AI assistants we can ask a question and get an answer with the distilled information we are looking for. That's a much more natural way to interact with computers and with the internet. And the reason many of us here are very excited about this technology is that if you extrapolate, you can see significant impact on our societies. With just a line-of-sight extrapolation, you can imagine situations where every child in the world has a world-class expert as a personal tutor for every subject they're trying to learn, or where everyone in the world can get access to world-class health care. And again, that is just line-of-sight extrapolation; we're not going super far here. If we go further than that, we could imagine generative AI helping us solve problems that we as a society, as humanity, don't know how to solve today, including curing diseases. As for whether it's hype or not, I think the reality is that some of these gen AI promises will not live up to expectations, but some will, and the ones that do will fundamentally transform our lives. I wouldn't be spending my time and energy on this if I didn't believe that.

Just to close, I think one of the major issues we need to solve to make this technology more accessible and dependable for the community is, first and foremost, guarantees of correctness, or prevention of hallucination. Today, these models can sometimes generate wrong results with what seems like perfect confidence: this is the answer to the question you're asking. We need to build capabilities to prevent that, so we can rely more and more on this technology.

Shruti: Yeah. So, on the question of whether it's hype: it absolutely is hype. But with any new technology, you are bound to get hype. It was the same with electricity: a hundred years ago, people believed everything would be electric. Nuclear power, the same thing: everything would be powered by nuclear, including cars. Every time there is something innovative, people imagine everything that could possibly, or impossibly, be done with it. So there is hype, but, like you mentioned, there is also a revolution. And the way to look at that is to look at the facts. What we have right now is a technology that performs some tasks well enough that we can actually use the results. It's generative AI; it's about generating solutions. And it works when, for some task, the solutions are good enough, they are hard for humans to generate, and it's relatively easy to assess their quality. That's the combination: good-enough solutions that are hard for humans to generate and relatively easy to check. Look at the first generative application, text to image, DALL-E. You get pictures that are good enough to use in some of your presentations, they would be hard for you to generate yourself, and they're easy to evaluate: you look and say, OK, that's what I want. When you use ChatGPT to write an email or marketing material, you tell it what to do, and presumably it's harder for you to generate that material yourself, but at the end of the day you read it and say, OK, this is good, maybe I'll make some changes. So I think that's the key. And by the way, Dharshi reminded me of my own prehistory. I come from Romania, and in Romania I started a PhD in neural networks. Then I gave up, because it was the end of the nineties and I came to the US for something much hotter at the time: the internet. I went to CMU, which was a bit of a cradle, at least in the United States, of neural networks, and I took two classes on them during my PhD. But those networks were kind of toys; they were more proofs of concept. We have come a long, long way since then, starting with ImageNet in 2012 and so on. But really, what you have right now is good-enough solutions for some tasks, not for all tasks. It's just a matter of time.

Dharshi: Yeah, and I think we're all going to agree that we're in a bit of a bubble right now, which is fine. I was a young engineer during the dot-com bubble you referenced, and as an engineer and a technical person I thought, oh, it's crap, it's all hype, it's ridiculous. But I've come to realize that's kind of how things work, and it works because once something has demonstrated enough capability to capture the imagination, people put focus on it, they put energy and investment into it. And then you actually see college kids studying that thing. I started to see this happen with AI and machine learning; you're a professor, and maybe around 2015 or 2016 we started to see computer science departments teaching ML to undergrads. And so what happened? Those students from 2016 graduated in 2020, and those people are the ones innovating and pushing some of these things forward. So I think the hype is actually a mechanism that enables these big transitions. I see it not just as a bubble or something to look down upon; it's how we make these transitions happen.

Jeremy: Yeah, I have to agree, there's definitely a lot of hype. In 2017 I was working for an insurance company, and they were petrified of driverless cars, because if cars could drive themselves there would be no accidents and no one would need insurance. Right. And I think it demonstrates how hard it is to actually take an AI system and turn it into a viable product that's robust and reliable in the real world. The point about being good enough is really how I look at gen AI: it's something that's good enough, and computers are really good at doing the same thing very quickly, a lot. Gen AI fits best where you have something that's good enough and you need to do it a lot, very quickly. I think those use cases are going to be the initial ones; as we get more robust models and better validation, we'll be able to use it more broadly. But for now, I think there's a lot of overinvestment and a lot of overpromising, and we'll see how it plays out. There are absolutely some valid use cases there.

Interviewer: Yeah. Awesome. You know, we talked about the history of generative AI, and the birth of the transformer models was a pivotal moment. Today we see there are so many options out there; the open source innovation has been great. On the one hand, you have these really large-scale, sophisticated models, and on the other hand, you have smaller but still capable models. I guess my question is: if you had to bet your money, which approach would you bet on? And jokes aside, how should developers and organizations think about choosing the right model for their use case: the right size of model, the right type of model, the right model architecture? Dharshi, I know you have some thoughts on this.

Dharshi: Yeah. I think, again, at the beginning of these things, one new tool becomes the hammer and everything looks like a nail. But it's actually a lot more subtle than that, because when a technology takes over, it inherently becomes scaled; when something becomes scaled, it becomes expensive; and when it becomes expensive, you start thinking about the dollars. The economics of the world will always come back at you; it's the physics of business, or something like that. So I look at it like this: big models are great, they have some interesting capabilities, but do you want to use them for everything? In an interview recently, I said that using GPT-4 for most use cases is like having two PhDs do customer support for Nerf guns. It doesn't make sense, right? It's overkill. It's very expensive for inference, because it's a very large model that uses a lot of compute, and it doesn't make sense if you know the precise set of outputs you care about. So I think what we're going to see is these big models that do some interesting things, and we'll keep pushing the forefront on them, and then this long tail of small models that actually power a lot of industrial use cases, because the economics matter, the latency matters, and the ease of deployment of a smaller model matters. Better hardware from folks like Ron and others is going to enable that to grow over time, and what is big today will be small tomorrow. But I think we're going to have this long tail. You're not going to see one type of model, or one model, fit everything. That is just not going to happen.

Shruti: Yeah, not surprisingly, I totally agree with that. There will be, and we already see it, right? It depends on the task, but for many tasks you already see that if you fine-tune a relatively small model, 7 billion or so parameters, you can get results as good as GPT, if not better. So again, the proof is in the pudding, so to speak. But think about human assistants in our lives; we do the same thing. When we have something legal, we go to a legal expert; when you want to prepare your taxes, you go to an expert for that. We have experts in different areas. The more tasks you have, the harder it is to have one thing that works equally well everywhere. It used to be possible to be a Renaissance man in the 15th and 16th centuries, Leonardo da Vinci and others, but that time is gone. So, like in real life, we are going to have models that are tuned to perform one particular task best. And it's not only about better accuracy. As Dharshi said, it's also about cost, and it's not only about cost, it's also about speed, about latency.

A smaller model is going to have lower latency, right? So that's what you are going to get. And the complexity then migrates, in a sense, one level up: now I have all these hundred models, so which one am I going to use for a particular query?
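That "one level up" routing problem can start as simply as a dispatcher in front of two models. A hedged sketch follows; the model names, the difficulty heuristic, and call_model are all hypothetical placeholders, not systems named on the panel.

```python
SMALL_MODEL = "small-7b-finetuned"    # cheap, low latency: routine queries
LARGE_MODEL = "large-frontier-model"  # expensive: reserved for hard queries

def call_model(model: str, query: str) -> str:
    raise NotImplementedError("plug in your inference client here")

def is_hard(query: str) -> bool:
    # In practice this could be a tiny trained classifier; keyword rules
    # stand in for it here.
    return any(m in query.lower() for m in ("prove", "derive", "plan"))

def route(query: str) -> str:
    # Send each query to the cheapest model that is good enough for it.
    return call_model(LARGE_MODEL if is_hard(query) else SMALL_MODEL, query)
```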

Interviewer: So choosing the right model type, size, and architecture goes hand in hand with choosing the right technique to build that model. You could use something like fine-tuning, take a RAG approach, or go all the way to pre-training from scratch. How do you think about choosing the right technique? What are the factors at play? What are the considerations?

Shruti: Yeah. So I think there are several. It depends on your use case, and it depends on how much data you have and the quality of that data. It's as simple as that. If you are a company like, I don't know, Bloomberg or something like that, and you have a huge amount of data, maybe it's worth it to build and pre-train your own model; I think Dharshi can say more about that.

If you have a bit less data, but it's high quality, you can fine-tune one of the existing models, and you should try that. Even at Berkeley, again, it's a lab, so we didn't have a lot of resources, but early on we fine-tuned a model, Vicuna, on high-quality data, the ShareGPT data, and for a while it was the best open source model in terms of accuracy, competing with ChatGPT at that time. And then, of course, you have prompt tuning and all of these things. So you have all of these methods and approaches, and all of them are good depending on where you are. But one constant across all of them is the quality of the data you have. It's incredible how much of a difference high-quality data can make in the ultimate result of building a model that solves a task very well.
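For the fine-tuning path, here is a minimal sketch using the Hugging Face Trainer. The base model name and the JSONL file of curated text are placeholder assumptions, and a real Vicuna-style run would involve careful prompt formatting and far more data.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "openlm-research/open_llama_7b"  # illustrative open base model
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token  # LLaMA-style tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Assumes a curated file where each JSON line has a "text" field.
data = load_dataset("json", data_files="curated_conversations.jsonl")["train"]
data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
                remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out",
                           per_device_train_batch_size=1,
                           num_train_epochs=1,
                           learning_rate=2e-5),
    train_dataset=data,
    # mlm=False gives plain next-token (causal) language modeling labels.
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```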

Ron: Maybe just to add a bit to that. I completely agree with what Shruti said. As someone who is optimizing models day in, day out, just to touch on the performance optimization techniques: it's really hard to track all the different performance optimization techniques that get published all the time. But essentially, there is a group of techniques that have become table stakes that everyone needs to implement. Think of tensor parallelism, FlashAttention, and also KV caching during deployment. There's no question; you'll see them implemented every single time.

There's also a set of techniques that are quite case-dependent. As an example, I can give continuous batching, but there are others, like pipelining. So there are cases where you'll need to evaluate a technique against the application you're trying to build. And I think one good mechanism that many try to employ, and that I think is very useful, is to track what others in the field are doing, like MosaicML, now Databricks, or Hugging Face. It's good to track which critical optimizations many folks are implementing in their models.
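As one concrete example, vLLM (which Shruti mentioned earlier) packages several of these optimizations, paged KV caching, continuous batching of requests, and optional tensor parallelism, behind a small API. The model name here is illustrative.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # any HF causal LM checkpoint
    tensor_parallel_size=2,  # shard the model across 2 GPUs if one is too slow
)
params = SamplingParams(temperature=0.7, max_tokens=128)

# Prompts submitted together (or from concurrent callers, via the server
# frontend) are batched continuously; the KV cache is managed in pages.
outputs = llm.generate(["Why does KV caching speed up decoding?"], params)
print(outputs[0].outputs[0].text)
```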

Dharshi: By the way, the way we get around that: we have a guy who literally does an exhaustive search. He reads every paper on arXiv and summarizes the good ones, and he has a newsletter. That's how we stay on top of it. It's literally exhaustive search. It's insane, right?

Well, I was just thinking, maybe we'll build a model that can do that for us. We do have a summarization model that's supposed to pick out the interesting ones, and it's actually pretty good; we're basically taking the guy who did this as a data set.

Interviewer: Excellent.

Shruti: Just one thing to say: the reason summarization works well for papers is that the models can pick out the information, especially if the paper is well written, from the introduction and the abstract. That's what they do. But be careful using summarization for things that are really, really important, like summarizing legal documents. We are not there yet.

Jeremy: Yeah, I think the choice of which approach to take when building the model should also be a function of capability and experience. I see a lot of people dive straight into large-scale pre-training, and you just don't need to do that. There are so many amazing open source models. You can start very simply, work with what data you have, and come up with a great model. And then, once you've gone through that process and understood what you're getting and how it works, you can look at whether it's really worth taking the next step and getting into more advanced forms of training.

Interviewer: Yeah. So say you've chosen a model and decided how to train it. Now, to deliver a delightful user experience, you need to deploy the model in production for inference, because the magic is built in fine-tuning and pre-training, but your users really experience it when it's deployed. What are some of the technical considerations to make sure users have a great experience, and what are some of the pitfalls to avoid? Jeremy, you have several models in production at Leonardo.Ai.

Jeremy: Yeah, I think to deliver a great user experience, you probably don't want to think about the technical side of things. As technical people, it's very easy to get caught up in how amazing these models can be. But when people are trying to achieve a task and you want to give them a good experience, you want to think about what outcome they're trying to achieve. So you take an outcome-based approach rather than just a technology-based approach.

And I think particularly with gen AI, there's a novelty to generation; it's very new, and you can go and talk to ChatGPT and make images. But after a while you realize you don't need a thousand pictures of a cool dog every day, right? And I think that plays into building great products. One of the other things about making a good UX is chaining several smaller models. You have so many models that can do so many different things, and they're all very powerful. In the image case, you might take a basic prompt and use an LLM to make it more elaborate, which gives the user a better image, and they don't necessarily have to know that's happening under the hood.
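Here is a hedged sketch of that chaining pattern; call_llm and call_image_model are hypothetical stand-ins for whatever inference clients you use, not APIs named on the panel.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your text-model client here")

def call_image_model(prompt: str) -> bytes:
    raise NotImplementedError("plug in your image-model client here")

def generate_image(user_prompt: str) -> bytes:
    # Step 1: quietly enrich the terse prompt; the user never sees this hop.
    enriched = call_llm(
        "Rewrite this as a detailed image prompt with style, lighting, "
        "and composition cues: " + user_prompt
    )
    # Step 2: the image model receives the richer prompt, so output quality
    # improves without the user learning any prompt-engineering tricks.
    return call_image_model(enriched)
```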

And then finally, I think you can use these models to engage better with your users. You can engage with them in their native language and through different modalities; it doesn't just have to be text, you can use images to make your content more engaging. All of these things happen under the hood, and I think that's what creates a great user experience around AI. Anyone else want to add?

Dharshi: Yeah, we're seeing from some of the customers that have built models with us, or tuned models, or serve them with us, that, to your point, they're actually putting models together in these ensembles, either chaining them or using routers to send different information to different models. It really depends on the application, but if you think backward from the product and what you're trying to deliver, then you think of the LLM or the diffusion model or whatever as a set of modules that you can bring in. And I think this is going to be explored a lot over the next several years; I don't think anyone really understands it yet. There are a couple of companies doing really cool stuff here, like Perplexity AI and a few others, but it's really new, and no one has fully made it work. People got really excited about agents for a little while; they don't work yet, because autoregressive models don't really do that. I think there are a lot of dead ends that people have gotten excited about, but we're going to find this application layer on top, which is actually where I think a lot of value is going to accrue per vertical. So this is an area where there's a lot of opportunity if you're thinking about building something.

Ron: Just to tie it back to choosing the right model size: user experience ties back to that as well, because if you're building a speech-based AI assistant, you don't want a long pause between asking a question and getting a response. So user experience is another reason not to go with a gigantic model, but rather something where you can squeeze the latency and get a very instantaneous, interactive use case.

Interviewer: Now, you know, we'd be remiss if we did not discuss the topic of capacity today. Due to the global AI chip shortage, resource availability, and not just resource cost, is becoming a gating factor for innovation, especially at smaller companies. What is your advice to these organizations? How should they navigate this situation?

Ron: I think, yeah, we're seeing it day to day: with all the AI innovation and experimentation going on, getting the right amount of compute capacity is a challenge that folks need to deal with.

It does tie to things we talked about before. If you don't have to pre-train your model, if you can start with fine-tuning just for a proof of concept, that requires a much smaller cluster. And of course, this is AWS, so I would definitely advise embracing the cloud for that, because you get elasticity: the capability to scale your compute cluster up and down according to your needs. That flexibility matters a lot, because you typically don't know what your exact compute demand will be over the next two years. So if you can get more flexibility there, it matters a lot.

Another advantage is that you get a very broad set of hardware options, so you can pick and choose the most optimal hardware for each and every use case, and you might be running quite a few models at the same time. And one very concrete piece of advice, because I've seen teams not doing this: optimize for performance before you scale to a very large cluster. It doesn't make sense to gather all that compute and then use it only 20% of the time. Make sure you squeeze every bit of performance from your compute cluster before you go big.

Dharshi: That's why you use Databricks Mosaic. That's what we do.

But I think what's really interesting is that in this dense-compute field, basically matrix and vector math engines like Trainium and GPUs, you can't separate the neural network workload that far from the hardware; the hardware is kind of consumed by that workload. Whereas with CPUs, the entire cloud, and we're at an AWS conference, the entire cloud was predicated on the fact that the hardware was sitting idle most of the time, so you could time-slice it very effectively and sell it over and over again. With AI, you can't really do that, and that's part of why we have a shortage. And building these devices is actually complicated. I don't think people understand, when they say we need more capacity, that we literally can't build more. We don't have enough jigs and machines to do the packaging of these chips.

I was in that world for a long time, and I think we're building more of those, but you have to build more jigs before you can build more capacity in the world. This is not just a US problem or anything like that. So capacity is going to be important. I think we did have a bit of tulip-mania excitement over the last couple of years, and that is going to die down. But like we said about the hype, there's something underlying it that's real, and we're going to see a greater share of wallet move toward compute. Humans become a little less important in the cost equation; compute becomes extremely important. It's not going to go away.

Shruti: Yeah. So, regarding capacity, a few things. First, that's another reason to use small models, to state the obvious. But I think what's happening today is that the complexity of our hardware environment is increasing dramatically, right?

When we started Spark, the big thing was that the engine had to be distributed, but the cluster was homogeneous: CPUs and some hard drives at that time, later SSDs. When we started Ray, we were already thinking, oh, we also need to support GPUs in the language itself, to expose them to the programmer.

But right now there are two axes: you need to distribute these workloads, and on the other axis you need to deal with all this heterogeneity of resources. You have NVIDIA, and you have Trainium, and Inferentia if you do inference, and you have TPUs, and AMD, and Maia from Microsoft.

And by the way, even if you look at only one vendor, there is a plethora of chips that are good for different workloads and different models; think of the NVIDIA L4s, A10s, and so forth. So if you are a small company, you need to be very thoughtful and agile about what resources you are going to use.

Let me give you an example. Another problem in the cloud, aside from what Ron was saying, is that elasticity breaks if the resources are not available when you want them. Go and try to get some H100s. Good luck, right?

So what do people do? When you don't have food, you hoard. You buy and reserve the GPUs for a long time, and you are almost creating a virtual data center for yourself, which, by the way, you are not going to use all the time. But you get the nodes because you have a fear that when you need them, they won't be there, right?

So instead of doing that, look at the GPUs that are already available, like the A10G. It's a good GPU, for instance, for 7B models; it's actually the most cost-effective for summarization, for instance.

But for that, you need software, and I think we are still scratching the surface there, because the software needs to fill the gap. It needs to make things easy for you as a developer and abstract away all this increasing complexity.

So you need to stay on top of, and look out for, software infrastructure that can provide this abstraction, to accelerate your development and be very effective in using whatever resources are out there: in your region, across regions, maybe in other data centers, and so forth.
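Ray, which Shruti mentioned earlier, is one example of such an abstraction layer: tasks declare the resources they need, and the scheduler places them on whatever matching hardware the cluster has. A minimal sketch, with illustrative function bodies:

```python
import ray

ray.init()  # connect to (or start) a cluster

# Runs on any node with a free GPU; Ray can also pin specific accelerator
# types when a workload really needs one chip family over another.
@ray.remote(num_gpus=1)
def gpu_inference(batch):
    return [len(x) for x in batch]  # stand-in for real model inference

# CPU-only preprocessing lands on cheaper nodes.
@ray.remote(num_cpus=2)
def preprocess(raw):
    return [s.strip() for s in raw]

cleaned = ray.get(preprocess.remote([" hello ", " world "]))
print(ray.get(gpu_inference.remote(cleaned)))
```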

Jeremy: Yeah, I think you just work with what you've got, right? That's what Shruti is effectively saying. Most people don't need to get thousands of H100s to build the next ChatGPT, and even having the knowledge and the know-how to do that is a journey you have to go on.

A lot of the companies I see trying to adopt AI aren't there yet. So even if that's what you should actually be doing, the right answer to your problem, and you can't get the capacity, there are still other things you can be doing with AI, building the institutional and organizational knowledge. Because building a large-scale model is a lot of work, it's very detailed, expert work, and most organizations probably don't have that skill set already.

And I think that's something you can start working on today with, for want of a better phrase, lower-tier GPUs, and bet that in time they are going to become more available, they are going to get faster, and the models will get smaller. All of these things will happen; it's almost inevitable.

So even if today isn't perfect, you work with what you've got, and you can bet that the future will eventually get there.

Interviewer: Great. What you said, Shruti, reminded me of two things. One is, we have an AWS customer based in Europe; their name is Codeway, and they came to us asking for A100s. Through conversations with our solution architect team, and through their own exploration, because they couldn't get the capacity they wanted,

they figured out a really cool solution. They realized that some subset of their models didn't need the A100s, because their latency requirements were not as tight as the others', and they could move to our A10G-based G5 instances instead of the scarce and expensive P4d. And that just made things so much easier for them.

Shruti: Sorry, and if one A10G doesn't give you low enough latency, hey, you can stripe the model across multiple A10Gs to reduce the latency. There are techniques for doing that.

Interviewer: Yeah. And the second thing you reminded me of is that people have this fear, so they reserve capacity. I'd be a bad marketer if I did not mention that AWS just launched a capability, about a month ago, called EC2 Capacity Blocks for ML.

It's the only consumption model today that lets you reserve short-term compute capacity. It's literally like going and making a hotel reservation, and the only one that...

Dharshi: We actually started that.

Interviewer: Oh, OK.

Dharshi: I think it's actually a copy of this idea of, I can reserve a large block for pretraining, right? On a short-term basis.

Interviewer: OK, I will have to go back and check our facts. We did advertise it as the first-of-its-kind consumption model from a major cloud service provider; let's just say that. Anyway, please check it out. The intention behind it is to give you a lot more predictability, so you can plan your development cycles and reserve capacity only for the time you need it.

So those are the two things that come to mind. OK, I'm very cognizant of our time; we have about 12 minutes left, and I really want to leave some time for audience Q&A. So with that, this will be my last question to you, panel. Today, so many people are being asked about their generative AI strategy, and it almost seems like regardless of what the question or the problem is, generative AI is the answer.

What is your advice, or what are your top three takeaways, for organizations looking to build generative AI-powered applications?

Jeremy: I think I've probably got two points here. One is: start small and start simply. I've probably said that a few times, and it's more important to get something up and running, see it in practice, end to end, integrated with your systems, to understand what that actually looks like, and to understand what all the vendors and everyone else are actually talking about. It gives you the context to make sense of what's going on.

And then I think the second thing is kind of on the other end of the spectrum. I've seen people try to lift and shift AI into their existing business processes, where they have a process and say, AI is going to do this, AI is going to do that. That can work, but I think the real value comes from sitting back, looking at these new models and their amazing capabilities, and thinking about the nature of the work you actually do in your business.

So rather than just mapping a process to an AI and saying, AI can do this for me, ask: now, what should my humans be doing, and what should my computers be doing?

Dharshi: Yeah, we've seen this; we deal with a lot of customers who are going down this journey, and some of the advice we give is: be very clear about what the problem statement is.

When you do that, you actually start to understand the constraints. Is it latency-bound? Does that matter? If it does, that's something you need to think about. What are you replacing, and what was the scale of that thing? Because when you understand the scale, you can make very rational choices.

So it's probably not going to be the case that GPT-4 is the best solution, most likely, if it's a high-scale problem. You need to think back: OK, if latency really matters and I'm going to have a billion inferences a day, I need to start thinking about something smaller, and then I need a strategy to get there.

So really being crisp about those things is important. I hate to say this as a technologist, but it's almost like a discounted cash flow (DCF) model or something: understand what you're going to generate, and then back-calculate what you're going to put into it. That's an important consideration.

Shruti: Yeah. So, a few things. The first: you need to be very clear about the problem you want to solve, the task you want to accomplish. And then, again, because I've seen this: remember, generative AI, at least at this stage, is about generating solutions, not necessarily correct solutions, right?

That's what we were talking about: being very confident while generating something wrong. So it's very important to know that there is a relatively easy way to assess the quality of the solution. That is very important, because otherwise, if there is no easy way to assess the quality of the solution, you can get into trouble, right?

So this is a good pattern: something whose solution is hard to generate but easy to check. And what "check" means depends on the problem. Even for the same task, like we were discussing earlier: if I summarize a blog or a paper, it's fine, but if you do it for legal documents and there are mistakes, it is a problem.

And the only real way to check that is to read the legal document, which is a lot of work; and if I have to read the legal document anyway, generating the summary myself is probably not a big deal. So I think that's the way I think about it.
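Code generation is one concrete instance of that hard-to-generate, easy-to-check pattern: the artifact is expensive to write, but a unit test verifies it mechanically. In this hedged sketch, generate_candidate is a hypothetical stand-in for an LLM call, and generated code should be sandboxed in real use.

```python
def generate_candidate(task: str) -> str:
    raise NotImplementedError("plug in your code-generation model here")

def generate_and_check(task, test_cases, attempts=3):
    """Accept a generated function only if it passes the cheap checks."""
    for _ in range(attempts):
        source = generate_candidate(task)
        namespace = {}
        try:
            exec(source, namespace)  # sandbox this in production
        except Exception:
            continue  # didn't even parse or run: regenerate
        fn = namespace.get("solution")
        if fn and all(fn(*args) == want for args, want in test_cases):
            return source  # easy to check, even though it was hard to write
    return None
```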

The other thing I would always say is: look, can an open source model do the job if I fine-tune it? And again, it depends on the task. For some tasks, fine-tuned open source models can match ChatGPT. However, there are other tasks, like reasoning and solving math problems, where it seems that the size of the model still matters, right?

OK. And the other one is obviously scale. You need to have some idea about the scale, because sometimes lift and shift will not work: if you have huge traffic and you want to replace a component of your existing backend system with an LLM, it can be a hundred times more expensive.

Ron: I'll step back a little bit and say: if you're not using generative AI, just don't overthink it; start experimenting with it. It's probably easier than you think with all the foundation models out there. You can fine-tune something and get a proof of concept up and running very quickly.

And I think it matters a lot because, again, it's a revolutionary technology; it's good to know what its capabilities are, in order to understand its impact on your business and how you should approach it. Of course, if you're in this room, you're probably already embracing gen AI. The other piece of advice I would give is related to what Shruti and Dharshi said: think about deployment early on.

I see too frequently that people train a model and only bother to think about getting the best quality out of the trained model. But if the point after the model is trained is where you start thinking about deployment, you're going to be boxed in, and you won't have any flexibility to shift left or right to the place you want to get to.

Interviewer: Awesome. Thank you so much for such an insightful conversation. I definitely learned a lot, and I hope you all did too.
