Building Falcon LLM: A top-ranked open source language model

Good morning, and welcome to re:Invent 2023. It's exciting - you can feel the energy and the passion for innovation. My name is Cameron Brooks. I'm the leader for Amazon Web Services Public Sector for Europe, Middle East and Africa, and it's my honor to kick off the day with a very exciting topic: generative AI.

In a minute, I'll go through a little bit of the background of what we have today. But I know you're really excited about generative AI, one of the leading topics in the world right now. You're in this room because you know that large language models, or LLMs, are ushering in a new era of possibilities - from personalizing learning experiences to summarizing 200-page manuals. These powerful models have cracked the code of natural language processing, and the impacts are endless in a world that is more connected than ever.

These models facilitate communication across borders, fostering understanding and collaboration on a global scale. In the face of global challenges like climate change or public health crises, they help researchers analyze vast amounts of data, accelerating the pace of discovery and innovation. And how about education? Imagine a world where every student, regardless of location or background, has access to a personalized learning experience that unlocks their full potential. By harnessing the power of language, LLMs are helping us confront some of the most pressing issues of our time.

Lucky for us, we have the team that developed one of the most powerful LLMs here today. Falcon 180B is the top-performing pre-trained open source model among the 100-plus models listed on the Hugging Face leaderboard. Today, we will dive into how the team built it, what makes Falcon unique, the AWS training environment, and practical applications to help you get started.

In a minute, I'll turn it over to Dr. Ebtesam Almazrouei, a trailblazer and a visionary leader whose remarkable journey has left an indelible mark on the industry and has inspired countless individuals to reach for new heights in their pursuits.

Dr. Almazrouei is the Executive Director and Acting Chief AI Researcher at the Technology Innovation Institute, or TII. TII is a leading global research center dedicated to pursuing the frontiers of knowledge. TII's team of scientists, researchers and engineers works to deliver discovery science and transformative technologies, focusing on breakthroughs that will future-proof our society.

Dr. Almazrouei also co-founded the AI Cross-Center Unit at the Technology Innovation Institute and leads the Big Data subcommittee at the UAE Council for AI and Blockchain. She was featured on the list of Leading AI Women in the World in 2023, and she was the first Emirati woman to hold a PhD in artificial intelligence for wireless communication engineering and computer science.

And my favorite anecdote about Dr. Almazrouei is that her first name, Ebtesam, means "smile" in Arabic. So she was linked to Amazon from the very beginning through her name.

So please help me welcome Dr. Almazrouei to the stage.

Good morning, ladies and gentlemen. Thank you, Cameron, for the great introduction - you did a better job than me of introducing TII and the work we are doing in the United Arab Emirates.

Thank you, ladies and gentlemen, for being with us today as we shed light on one of the most powerful open source models in the world, Falcon 180B, which was trained using Amazon SageMaker.

On today's agenda, it's not only me: the AWS team, who were with us through the whole journey of training this powerful open source model, are here as well. Later, Will and Ben will join me to introduce the large language model and how we trained it using Amazon SageMaker, and they will also speak about prompt engineering, function calling and plugins using Falcon 180B.

So, a little about where I come from. I am Dr. Ebtesam Almazrouei. I founded the AI Cross-Center Unit at the Technology Innovation Institute in January 2022, with a mission to build one of the leading AI centers in the world. It is not only my vision; it is the vision of the Advanced Technology Research Council. Our mandate is to build an advanced technology research and development hub in the United Arab Emirates, and the best way to do that is to focus on recent trends and the latest advanced technology, and on how we can shape our human resources - our talents from around the world.

The Technology Innovation Institute is the applied research center of the Advanced Technology Research Council. We have people from more than 70 to 80 nationalities around the world and more than 1,000 employees. We believe that through the collective effort of people from around the world, we can contribute to the advancement of technology, not only in AI but also across different disciplines and sectors such as robotics, cryptography, directed energy, telecommunications, cybersecurity and many others. As you can see, we also have other subsidiaries under the Advanced Technology Research Council: VentureOne is the commercialization arm.

In a few days - maybe this week - you will hear about the spin-out of the first AI company from the United Arab Emirates, and it will be based on Falcon. In fact, tomorrow is the launch. We also have ASPIRE, the program management arm.

At the Technology Innovation Institute, we don't develop technology only for the sake of publication and research; we want to solve real challenges in the world. We listen to our customers globally and address those challenges by developing technology up to technology readiness level 4 and beyond. Then we take it to different customers to run pilots, and we can spin it out as a startup company or commercialize it under VentureOne.

Where I come from, we believe that advanced technology will have a significant impact on the Sustainable Development Goals. There are 17 Sustainable Development Goals, spanning poverty, education and healthcare. AI technology is a tool, and we believe that through open science, open research, and open LLMs and AI models, it should be equitable: everyone should have access to that knowledge, that science and those advanced models. Because once you have access to such a model, you can develop advanced technology that solves global challenges across different sectors.

Also under the Sustainable Development Goals, collaboration and openness are important for harnessing the technology while safeguarding human values and safety considerations as we develop and deploy these advanced models.

Our achievements in generative AI don't start with Falcon 180B; they started a long time ago, when we introduced the largest Arabic LLM in the world, Noor, in April 2022. Back then, ChatGPT was not yet out, so only the technical community was aware of Noor and its capabilities as a large language model with 10 billion parameters.

Then the work started: we already had the roadmap, we had built the human resources, we had the right financial investment, and we believed we could contribute to the advancement of the technology. So we set the roadmap for Falcon LLM. It is incremental work that led to Falcon 180B, but before that, there was Falcon 40B.

Falcon 40B was trained on one trillion tokens with a size of only 40 billion parameters. Then, of course, we continued our journey by training Falcon with 180 billion parameters on 3.5 trillion tokens. This did not come out of nowhere. Thanks to open research in AI, we saw the Hoffmann et al. (Chinchilla) scaling laws, which showed that to get a better model you don't necessarily need a larger model: instead of scaling the size, you can train your model on more tokens.

That is where the idea came from: let's train a model smaller than GPT-3 but on more tokens, and see how it performs. And that's why Falcon 40B, at a fraction of the size of the GPT-3 model, already outperforms GPT-3 in zero-shot accuracy.
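
To make the scaling-law intuition concrete, here is a rough back-of-the-envelope comparison of tokens per parameter. It is only a sketch: the GPT-3 figures (175B parameters, roughly 300B training tokens) and the "about 20 tokens per parameter" Chinchilla rule of thumb are commonly cited reference numbers, not figures from this talk.

```python
# Rough tokens-per-parameter comparison (illustrative only).
models = {
    "GPT-3":       {"params_b": 175, "tokens_b": 300},    # commonly cited figures (assumption)
    "Falcon 40B":  {"params_b": 40,  "tokens_b": 1000},   # from the talk
    "Falcon 180B": {"params_b": 180, "tokens_b": 3500},   # from the talk
}

CHINCHILLA_RULE = 20  # ~20 tokens per parameter (Hoffmann et al. rule of thumb)

for name, m in models.items():
    ratio = m["tokens_b"] / m["params_b"]
    print(f"{name:12s}: {ratio:5.1f} tokens/parameter "
          f"(Chinchilla-optimal would be ~{CHINCHILLA_RULE})")
```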

What is an LLM? It's a class of generative AI model, and the main concept behind these large language models is the foundation model. You train the large language model on public web data: you crawl the web, filter the data, build the data pipeline, and train a transformer using a distributed architecture. The main purpose is that once you have developed this foundation model, you can build many different use cases on top of it for different customers and different business sectors.

How are LLMs different from traditional machine learning algorithms? Yes, they are part of machine learning, but with traditional machine learning algorithms you have massive amounts of data that require labeling, and when you train them you only solve a specific task. With a large language model, you have massive amounts of data that are not labeled, and once you finish training you have a general foundation model that can solve multiple tasks - from text generation to automation, summarization, chatbots and many other applications across different spheres and sectors.

Of course, building LLMs is not easy. The journey starts with the vision and the mandate, and then we need to gather our resources to overcome the challenges. One of the main challenges we faced is data collection: the data is out there, but you have to make sure it is clean. We remove bias and toxic content, so we developed a thorough data pipeline infrastructure to remove duplication and to filter toxic or biased information from our training dataset.

Not only that - you have to build your own data pipeline infrastructure to be able to train a large language model. The story doesn't end there. One of the major factors you always have to consider while building LLMs is compute power: you need infrastructure that can handle a massive computational load. Here you have a choice: either you rely on an on-premises HPC cluster or on cloud infrastructure. We chose AWS, for many reasons we can discuss later. We also have to run regular health checks to maintain the integrity of the training and make sure that whatever we train is trained correctly.

Orchestration is also an essential factor, to make sure that everything you build integrates seamlessly across the whole process while you are training one of the best models in the world. Data scale matters too: do you have enough variety in your data? Do you have the right proportions of data? These are questions you need to answer before training your LLM, because the type and mix of the data can make or break it. And you need strategic thinking: even if the mandate is to build one of the best models in the world, you have to build it incrementally. So we started building Falcon with 1 billion parameters,

then 3 billion parameters, conducting different experiments with different architectures and training them using SageMaker, learning at each stage what the best recipe for training the model is. Then we built Falcon 7B and scaled it to Falcon 40B. Once everything was set and we were confident the performance was scaling with our recipe, we scaled to Falcon 180B, trained on up to around 4,000 GPUs on high-bandwidth AWS cloud infrastructure.

One of the most important factors is cost, because there is a financial investment you are putting in, and training will be an expensive task. So what are you trying to solve? Are you building a model just for the sake of building it, or are you building one of the best models so that it will be used by and accessible to researchers, developers and entrepreneurs around the world? That is the main value, and the main thing to consider, when committing any financial investment to training models.

And here I am introducing Falcon. As I mentioned before, regardless of where the machine learning application or technology starts - whether for a business, an organization, or a mission to solve a particular problem - we don't use machine learning just for the sake of using it; we want to solve a problem. The mandate was to build one of the most powerful foundation models for research and commercialization.

So how did we do it? We built one of the pioneering open source models, Falcon 40B, trained on one trillion tokens using 384 GPUs on Amazon. Then we have Falcon 180B, trained on 3.5 trillion tokens and scaled up to around 4,000 GPUs during the last month of training.

Here you can see the different models in the Falcon series: 7B, 40B and 180B parameters. All of them use a decoder-only transformer architecture. We use a high-quality web corpus assembled from massive web crawl data, and we ensure there is no duplication, which limits memorization, especially in the smaller models. We also train on this massive number of tokens to make sure we reach the performance our mission requires.

In the table you can see that the Falcon family covers a wide range of capabilities and inference requirements, which enables different kinds of usage. Falcon 7B, for example, can run on modest hardware, while Falcon 180B requires eight A100s to run inference. You can also see that we report steady zero-shot performance across the Falcon series. Falcon 7B and Falcon 40B are available under Apache 2.0, and Falcon 180B is available under a license with some restrictions to emphasize the responsible use of AI.

So how did we build one of the most powerful LLMs in the world? One of the main criteria, as I mentioned before, is the data: you have to make sure your data is of high quality and you have to remove any duplication in your dataset. There are multiple stages to consider, and as you increase the pretraining compute budget you go through those stages: you either increase the size of your model or you train it longer.

We started with stage one, where we perform filtration and extract the content we want - in our case text, so we only extract the text from the HTML pages. Then comes language identification: there are more than 170 languages in the raw data, and we focus only on English and Latin-script European languages. Why is that? As I mentioned before, we did a lot of experimentation to make sure that multilingual data would not degrade the performance of our large language model.

We also make sure we remove any duplication, to minimize memorization in our LLM. And this is the data mixture after filtration: you can see that web data for Falcon 40B and Falcon 180B ranges between 82% and 85%, depending on the model, and it is mainly high-quality web data.
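
As a rough illustration of the kind of filtering and deduplication pass described above - a minimal sketch, not TII's actual pipeline - the snippet below keeps only documents dominated by Latin-script characters and drops exact duplicates by content hash; the threshold and helper names are assumptions.

```python
import hashlib
import re
import unicodedata

LATIN_RE = re.compile(r"[A-Za-z\u00C0-\u024F]")  # basic Latin plus Latin-1/Extended ranges

def looks_latin(text: str, min_ratio: float = 0.8) -> bool:
    """Keep documents whose alphabetic characters are mostly Latin-script (threshold is illustrative)."""
    letters = [c for c in text if unicodedata.category(c).startswith("L")]
    if not letters:
        return False
    latin = sum(1 for c in letters if LATIN_RE.match(c))
    return latin / len(letters) >= min_ratio

def dedup_and_filter(documents):
    """Yield documents that pass the language filter and are not exact duplicates."""
    seen = set()
    for doc in documents:
        text = doc.strip()
        if not text or not looks_latin(text):
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:          # exact duplicate: skip
            continue
        seen.add(digest)
        yield text

if __name__ == "__main__":
    corpus = ["The falcon is a bird of prey.", "The falcon is a bird of prey.", "数据重复示例"]
    print(list(dedup_and_filter(corpus)))   # duplicate and non-Latin documents are dropped
```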

Then, of course, we also have a fraction of around 17% to 20% from curated data. For that curated data we did a lot of experimentation to find the right proportions of conversational datasets, books and technical data, including the conversations and books embedded in our datasets, and also to address the multilingual capability of our LLM.

We did a lot of experimentation, and in the end we decided to stick with English and Latin-script European languages. One of the main questions you may want to ask is: can web data alone, with filtering and deduplication, be used to train models that outperform models trained on curated data, as measured by natural language zero-shot performance?

What we found is that if you already have a strong web-data baseline, once the curated fraction goes beyond about 50% the performance of your model starts to degrade. So you have to pay attention to the proportion of curated data, especially if you have a massive, high-quality web dataset.

So when you start from a strong web baseline, we found that adding curated data can hurt performance, especially beyond 50%. The other question is: can limited amounts - 5% to 10% - of code and multilingual data be substituted into the pretraining data without compromising the English performance of the model?

As you know, some large language models have very strong code capabilities - BLOOM, for example, includes code data. What people usually do is either train a large language model specifically for coding, or take their foundation model and fine-tune it on additional code data. You can do this with Falcon today: Falcon is open source, so you can take Falcon 40B and build a Falcon Code model on top of it.

But at the beginning, back in 2022 - around September to November - we were discussing what the right fraction of code data to include in Falcon would be. We did a lot of experimentation, partly to make sure the zero-shot performance was not degraded, and we concluded that we would include only a small fraction of code, 5% to 10%.

In the end we selected the top 30 programming languages from public GitHub and substituted only 5% of the global training data size in our training process. That was the data side.
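
To illustrate how a mixture like the one described above might be turned into a sampling schedule, here is a small sketch that draws training documents according to fixed proportions; the 85/10/5 split and the source names are illustrative assumptions, not TII's exact recipe.

```python
import random

# Illustrative mixture weights (fractions of the training data), not TII's exact numbers.
MIXTURE = {
    "web": 0.85,       # filtered, deduplicated web data
    "curated": 0.10,   # books, conversations, technical documents
    "code": 0.05,      # top programming languages from public GitHub
}

def sample_source(rng: random.Random) -> str:
    """Pick a data source with probability proportional to its mixture weight."""
    r = rng.random()
    cumulative = 0.0
    for source, weight in MIXTURE.items():
        cumulative += weight
        if r < cumulative:
            return source
    return "web"  # fallback for floating-point edge cases

if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_source(rng) for _ in range(100_000)]
    for source in MIXTURE:
        print(source, round(draws.count(source) / len(draws), 3))
```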

What about the architecture and the distributed training? The Falcon architecture and the recipe for efficient inference and training stability are important steps to consider when training your LLM, because they help scalability. For example, we replaced the classical attention inside the transformer block with multi-query attention, which helps the model scale efficiently.

We also decided to have no biases in our linear layers; removing the biases improves the stability of training. We combine 3D parallelism for fine-grained control with ZeRO for scalability, so we have data parallelism, tensor parallelism and pipeline parallelism. We also use optimizer sharding to split the large optimizer state across multiple degrees of parallelism, which reduces the memory footprint and further improves the scalability of the training.
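
The following is a minimal PyTorch sketch - not Falcon's actual implementation - of the two architectural choices just mentioned: bias-free linear layers and multi-query attention, where all query heads share a single key/value head. Dimensions and naming are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQuerySelfAttention(nn.Module):
    """Minimal multi-query self-attention: many query heads share one key/value head.
    All linear layers are bias-free, mirroring the stability choice described in the talk."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)             # one projection per query head
        self.kv_proj = nn.Linear(d_model, 2 * self.head_dim, bias=False)  # single shared key/value head
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)  # (b, h, t, hd)
        k, v = self.kv_proj(x).split(self.head_dim, dim=-1)                          # (b, t, hd) each
        # Broadcast the single K/V head across all query heads.
        k = k.unsqueeze(1).expand(b, self.n_heads, t, self.head_dim)
        v = v.unsqueeze(1).expand(b, self.n_heads, t, self.head_dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)                # (b, h, t, hd)
        out = out.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)

if __name__ == "__main__":
    attn = MultiQuerySelfAttention(d_model=128, n_heads=8)
    y = attn(torch.randn(2, 16, 128))
    print(y.shape)  # torch.Size([2, 16, 128])
```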

After training Falcon 1B, 3B, 7B and 40B, at each stage we computed different evaluation metrics, and what we found after analyzing the training of Falcon 40B in March 2023 is that it already outperforms GPT-3 despite being only a fraction of its size. As I mentioned at the beginning of my talk, this emphasizes that it's not only about the size of the model; it's about the quality and the amount of the data. Trained on one trillion tokens at a fraction of the size of GPT-3, we already outperform the GPT-3 model.

In the Falcon 180B evaluation results, you can see that it is already on par with PaLM 2 Large in terms of performance on one-shot NLP task benchmarks as reported in the PaLM 2 paper. Averaging performance across tasks, Falcon 180B recovers 99.5% of the performance of PaLM 2 Large.

Falcon 180B also delivers downstream performance between GPT-3.5 and GPT-4. It performs well on common-sense tasks, where it is well ahead of GPT-3.5, and on multiple-choice question answering Falcon 180B performs above the GPT-3.5 model.

Beyond PaLM 2 Large, you can see that Falcon 180B improves significantly over other state-of-the-art models such as Llama 2, released by Meta. Falcon 180B also improves significantly over GPT-3, BLOOM and Llama 2 on question answering datasets.

Maybe you will ask: you already had the best open source model in May 2023, when you released Falcon 40B - why continue the journey and scale to Falcon 180B?

You can see that the larger model size unlocks new capabilities in terms of better reasoning and better mathematical explanation. Here is an example of answers to the same prompt from Falcon 40B and Falcon 180B: the larger model, Falcon 180B, gives a more nuanced, more detailed answer with better reasoning.

In terms of multilingual capabilities as well, Falcon 180B is much stronger than Falcon 40B. The other critical piece is the training environment. Large-scale distributed training runs either on a cloud service or on your own HPC cluster, and you have to decide which resources are best for training your large language model.

In our case, we chose Amazon SageMaker, and my colleagues from AWS are here today to show you how we used Amazon SageMaker to train one of the best LLMs in the world. Thank you.

In the first generation of experimentation, TII started with 384 GPUs. As we scaled up and started training Falcon 180B, they wanted to keep performance consistent, so that the impact on performance was not too high. Going from 384 to 4,000 GPUs is about 10 times the scale, and we wanted to minimize the performance impact on the training process.

And of course, things always fail. When they fail, we needed a mechanism in place to identify the faulty nodes or faulty issues, replace them in a timely manner, and resume the training process as fast as we could, so that we didn't disrupt the compute-day budget that was specified.

Lastly - and I think this is one of the core challenges we've seen - we went through optimizing the storage. Storage sits in the middle: communication happens between all of the nodes and the storage, and if that communication is not well optimized and efficient, it's going to slow down the whole training process. We had to make sure we leveraged every single trick in our pocket to make it fast and optimized.

Based on all of the challenges you've just seen, we decided to follow a simple approach: we used Amazon SageMaker because we wanted to get started very quickly. The team at TII didn't want to bother with building HPC clusters and configuring all of the underlying instances and the communication between them, so they decided to use Amazon SageMaker for that.

For the container itself, they rebuilt it from the ground up with a custom setup and custom configuration. The distributed training libraries were built from scratch by them: all of the base code, all of the training code, and all of the communication with the storage, which is S3 in this case - streaming the data down from S3, and, when we save the model state (which we call checkpointing), uploading the data in a timely manner.

On the right-hand side, you see the data preparation cluster. We used c5n.18xlarge instances - about 257 of them. For the training, we used p4d instances, each containing eight NVIDIA A100 GPUs. On top of the SageMaker API, they built a custom agent that orchestrates all of the different processes to make the training happen: pull the container from the ECR repository, start the training process, monitor the training, check for faulty nodes, and resume the training if needed from the last healthy checkpoint.
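
As a rough illustration of launching such a job with a custom container through the SageMaker Python SDK - a sketch, not TII's actual agent - the image URI, IAM role, bucket names and hyperparameters below are placeholders.

```python
from sagemaker.estimator import Estimator

# Placeholder values: replace with your own ECR image, IAM role and S3 locations.
estimator = Estimator(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/falcon-training:latest",
    role="arn:aws:iam::<account>:role/SageMakerTrainingRole",
    instance_type="ml.p4d.24xlarge",          # 8x A100 per instance
    instance_count=64,                        # scale out across many instances
    checkpoint_s3_uri="s3://<bucket>/falcon/checkpoints/",   # periodic model-state uploads
    max_run=5 * 24 * 3600,                    # per-job wall-clock limit in seconds
    hyperparameters={"tokens_per_batch": 4_000_000},         # illustrative only
)

# Start the job; the training channel streams data from S3 into the container.
estimator.fit({"train": "s3://<bucket>/falcon/tokenized-data/"}, wait=False)
```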

Now I want to zoom in a little on the training cluster. For the Falcon training, we built a SageMaker cluster of more than 500 p4d.24xlarge instances, each equipped with eight NVIDIA A100 GPUs. Having eight GPUs inside a single instance gave us a throughput advantage, because the A100 GPUs are connected through NVSwitch, which lets each GPU communicate with every other GPU within the same instance at 600 gigabytes per second.

We also had EFA, or Elastic Fabric Adapter, which allowed all of these instances to communicate with each other and with services outside through the network interface at about 400 gigabits per second. So we were able to download data from and upload data to S3 efficiently at that speed, which is really good.

However, communication with the storage is one of the most complicated challenges here, because if we don't get it right, it's going to be problematic. Amazon S3 automatically scales to high request rates: if the application stays within about 3,500 PUT transactions per second or 5,500 GET (download) transactions per second, you don't need to do anything - and that limit is per prefix. The great thing about S3 is that you can create an unlimited number of prefixes within the same bucket, meaning you can scale and optimize read and write performance as much as you need. And that's exactly what we've done.
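
Here is a small sketch of one way to spread checkpoint shards across many S3 prefixes so each prefix stays within the per-prefix request rates mentioned above; the bucket name, key layout and shard naming are assumptions, not the actual layout TII used.

```python
import hashlib
import boto3

BUCKET = "my-falcon-checkpoints"   # placeholder bucket name
NUM_PREFIXES = 64                  # more prefixes -> more aggregate request throughput

def shard_key(step: int, shard_name: str) -> str:
    """Route each shard to one of NUM_PREFIXES prefixes based on a hash of its name."""
    bucket_index = int(hashlib.md5(shard_name.encode()).hexdigest(), 16) % NUM_PREFIXES
    return f"prefix-{bucket_index:02d}/step-{step:07d}/{shard_name}"

def upload_checkpoint_shards(step: int, shard_paths: list[str]) -> None:
    s3 = boto3.client("s3")
    for path in shard_paths:
        name = path.rsplit("/", 1)[-1]
        s3.upload_file(path, BUCKET, shard_key(step, name))  # each prefix absorbs part of the load

if __name__ == "__main__":
    print(shard_key(12000, "optimizer_state_rank_0042.pt"))
```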

Now I want to talk about some of the best practices we found while training Falcon and overcoming some of these challenges. When we started the training job, we made sure that all of the training nodes in the cluster were started in the same Availability Zone, because we wanted very close proximity between all of the nodes in that cluster.

We even collaborated with the SageMaker service team to place some of these nodes, within the Availability Zone, behind the same backbone or spine. That helped increase performance by about 3%. You might look at 3% and say it's a small number, but it actually saved a couple of days of training, which is quite expensive when you scale it to 500 p4d instances.

The second challenge, if you remember, is data processing. We used what we call a parallel shared-nothing architecture, meaning that all of the data preprocessing tasks Dr. Almazrouei was talking about - deduplication, filtering, classifying for profanity and other terms - happen independently inside the cluster nodes.

Anything that could be done while avoiding communication between the nodes, we did separately on the C5 instances. We ended up using 18,000 vCPUs and about 37 terabytes of memory, which produced about 11 terabytes of very clean data, mapping to about five trillion tokens.

Then we had quite a few silent GPU failures, some network timeouts, and other failures coming from different parts of the cluster. Initially, we set up CloudWatch to look at GPU utilization, but that ended up not catching all of the issues.

So we used three dimensions: GPU utilization, training job logs, and the throughput coming out of the network interfaces. All three combined gave us a signal as to whether something was happening and whether the training was continuing or not.
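
As a sketch of one of those three signals, here is how GPU utilization for a training host could be pulled from CloudWatch with boto3; the namespace, metric name and dimension format shown are the ones SageMaker training jobs commonly emit, but treat them as assumptions and verify against your own CloudWatch console.

```python
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")

def recent_gpu_utilization(training_job_name: str, host_index: int = 1, minutes: int = 15):
    """Pull average GPU utilization for one training host over the last few minutes."""
    end = datetime.datetime.utcnow()
    start = end - datetime.timedelta(minutes=minutes)
    resp = cloudwatch.get_metric_statistics(
        Namespace="/aws/sagemaker/TrainingJobs",          # assumed namespace for instance metrics
        MetricName="GPUUtilization",
        Dimensions=[{"Name": "Host", "Value": f"{training_job_name}/algo-{host_index}"}],
        StartTime=start,
        EndTime=end,
        Period=60,
        Statistics=["Average"],
    )
    return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])

# High-but-flat GPU utilization combined with no new training-log progress and low
# network throughput was the kind of combined signal that indicated a silent failure.
```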

In some instances, we would see GPU utilization quite high, but nothing was actually happening. It's also worth knowing that a SageMaker training job has a maximum runtime of about 28 days.

That's what the custom agent TII built was taking care of: it chained jobs together and resumed from the last checkpoint using a Lambda function. Lastly, we had to scale and partition the S3 bucket to prepare for bursting workloads.
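
A minimal sketch of what such job chaining could look like in a Lambda handler - not TII's actual agent - copying the key settings from a finished job into a new one that resumes from the same checkpoint location; the field selection and naming convention are assumptions.

```python
import time
import boto3

sm = boto3.client("sagemaker")

def handler(event, context):
    """Chain a new training job off a finished one, reusing its configuration and checkpoints."""
    prev_name = event["training_job_name"]
    prev = sm.describe_training_job(TrainingJobName=prev_name)

    new_name = f"{prev_name.rsplit('-part', 1)[0]}-part{int(time.time())}"
    sm.create_training_job(
        TrainingJobName=new_name,
        AlgorithmSpecification=prev["AlgorithmSpecification"],
        RoleArn=prev["RoleArn"],
        InputDataConfig=prev["InputDataConfig"],
        OutputDataConfig=prev["OutputDataConfig"],
        ResourceConfig=prev["ResourceConfig"],
        StoppingCondition=prev["StoppingCondition"],
        # Same checkpoint S3 URI, so the container can resume from the last healthy checkpoint.
        CheckpointConfig=prev["CheckpointConfig"],
        HyperParameters=prev.get("HyperParameters", {}),
    )
    return {"started": new_name}
```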

That covers one of the heaviest tasks when you train a model: saving the model state, doing the checkpoint, and uploading that checkpoint to the storage, which is S3. For Falcon 180B, we had checkpoints of about four terabytes, and we were saving a checkpoint every two hours - so you can imagine the amount of traffic going through all of the network interfaces to the S3 bucket.

Thankfully, all of these best practices are built into the AWS Common Runtime. The CRT is basically a set of native tools and libraries underpinning many of the AWS SDKs. It also includes a native S3 client that implements automatic request parallelization, request timeouts and retries, and connection reuse. This is really important because it helps avoid overloading the network interfaces.

For example, if we have a very large object, like a checkpoint we need to resume from, and we download it using the CRT client, the CRT client automatically downloads multiple byte ranges of that file in parallel. That increases the throughput and fully saturates the network interfaces, so you can make the most out of them.

I'm also super excited that we announced, literally just a couple of days ago, the Amazon S3 Connector for PyTorch, which also helps optimize a lot of these tasks if you have a PyTorch container.

The checkpointing part of the training job can now happen directly to S3, instead of saving to internal storage and then uploading to S3, which makes checkpointing up to about 40% faster.
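
Here is a short sketch of checkpointing straight to S3 with the S3 Connector for PyTorch; the region, S3 URI and model are placeholders, and the checkpointing interface shown reflects the connector as documented around its launch, so double-check it against the current library.

```python
import torch
import torch.nn as nn
from s3torchconnector import S3Checkpoint

# Placeholder region and S3 URI.
checkpoint = S3Checkpoint(region="us-east-1")
model = nn.Linear(1024, 1024)

# Write the model state directly to S3, skipping the local-disk round trip.
with checkpoint.writer("s3://my-falcon-checkpoints/step-0001000/model.pt") as writer:
    torch.save(model.state_dict(), writer)

# Later, stream the checkpoint back when resuming training.
with checkpoint.reader("s3://my-falcon-checkpoints/step-0001000/model.pt") as reader:
    state_dict = torch.load(reader)
model.load_state_dict(state_dict)
```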

Now that we've covered the architecture that underpins Falcon, it's really important to understand how TII evaluated the performance and accuracy of the model. Model evaluation is a really important part of any machine learning training lifecycle. Apart from running all of the offline evaluations that Dr. Almazrouei presented, we also did human evaluation to understand how good the model is.

This is really important because we needed to assess not just the technical qualities of the model but also how ethically sound it is. We built a very simple architecture here, leveraging Slack channels. We hosted the model behind SageMaker endpoints, and every day we would send requests to the model, generate responses, send them to the Slack channel, and have a few people from TII and from the AWS team look at the answers and evaluate them.

We would rate them from 1 to 5 and also say whether each answer was appropriate or fabricated. That was a really important part of the evaluation process.
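
A small sketch of the kind of daily evaluation loop just described - invoke a SageMaker endpoint, then post the prompt and response to a Slack channel for human raters. The endpoint name, payload format and webhook URL are placeholders, and the response shape follows the Hugging Face text-generation convention as an assumption.

```python
import json
import urllib.request
import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "falcon-180b-eval"                                 # placeholder endpoint name
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"     # placeholder webhook URL

def generate(prompt: str) -> str:
    """Invoke the hosted model; payload shape assumes a Hugging Face text-generation container."""
    body = json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 256}})
    resp = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME, ContentType="application/json", Body=body
    )
    return json.loads(resp["Body"].read())[0]["generated_text"]

def post_to_slack(prompt: str, answer: str) -> None:
    """Send the pair to a Slack channel so raters can score it 1-5 and flag fabrications."""
    msg = {"text": f"*Prompt:* {prompt}\n*Falcon:* {answer}\nRate 1-5, flag if fabricated."}
    req = urllib.request.Request(
        SLACK_WEBHOOK, data=json.dumps(msg).encode(), headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)

if __name__ == "__main__":
    p = "Explain what a foundation model is in two sentences."
    post_to_slack(p, generate(p))
```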

This summarizes the collaboration between TII and AWS on Falcon. Now, to truly appreciate the power and potential of Falcon, let's see it in action.

I'd like to invite my colleague Ben to demonstrate some of the capabilities of the model through prompting strategies and to show you how Falcon can be a gateway to AI possibilities. Thank you.

These are essentially the roles of the interaction. You could have one role saying, "Falcon, generate content and pretend you're a teacher, a programmer, an expert in this domain." So it could be a single role, or, if it's a conversation, multiple roles or personas that are defined, and those show up in the various tags we're going to see in a moment. The context comes after those personas, and that really helps scope the results for Falcon. Do you want it to use only the information provided later in the prompt? Do you want it to output JSON? How do you want the interaction to be scoped within the context of the request? And then finally, after the context come the guardrails. These are really important to have Falcon - and other large language models, but in this case Falcon - respond the way you want it to respond. Do you want it to say "I don't know" if it doesn't have the information? Do you want it to think step by step and show a little of that reasoning power under the covers? We're going to show examples of these advanced prompting techniques on the coming slides, but this structure really sets the stage to have Falcon respond how you want it to and to get the really powerful results you're going to see.
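
As a sketch of assembling the persona / context / guardrails structure just described into a single prompt string - the section labels and wording here are illustrative, not a format Falcon requires:

```python
def build_prompt(persona: str, context: str, guardrails: str, question: str) -> str:
    """Assemble a persona, scoping context, guardrails and the user question into one prompt."""
    return (
        f"{persona}\n\n"
        f"Context: {context}\n\n"
        f"Rules: {guardrails}\n\n"
        f"User: {question}\n"
        f"Falcon:"
    )

prompt = build_prompt(
    persona="You are a helpful assistant and an expert cloud architect.",
    context="Answer only using the article provided below. Respond in plain English.",
    guardrails="If the answer is not in the article, say 'I don't know'. Think step by step.",
    question="Summarize the article in 2-3 sentences.",
)
print(prompt)
```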

From there, one question you might be asking is: language is naturally ambiguous. If I ask a model to summarize an article - here we have an article featuring Werner Vogels and two distinguished scientists from Amazon - and we ask Falcon to summarize it, that's an ambiguous task. You might get various types of responses back depending on how you ask for the summary. One advanced prompting technique that Falcon supports is the directional stimulus. This is auxiliary information added to your prompt that guides the answer, so it clears up that ambiguity. In this case, we're providing a reference and a hint. Rather than having Falcon summarize the information based on different year ranges - five to six years - or different roles, those sorts of things, we can say: these are the important elements when you're summarizing. So instead of iterating over and over, trying to refine the prompt and ask very specific questions, we can use this directional stimulus technique to say: focus on Werner Vogels, generative AI scientists, and the 30-year span. And what we see is that Falcon generates two to three sentences, because that's what we asked for in the task, based on that direction. It's a very powerful technique: the prompt stays generic, but you guide the model through the ambiguity by telling it what to focus on when doing the summary.

Another advanced technique is ReAct - reasoning and acting. Here we're integrating with different APIs and tools. With Falcon, you might be asking yourself: how can I bring it into my solution? How can I customize the task and bring in external data sources? In this case, we're bringing in data sources for news and weather, and we're telling Falcon: try to answer the question, but if you can't, use these data sources, these tools, to get the information you need. You can see how that could relate to integrating with your enterprise tools or other tools in your organization to bring data in. Through the prompt we say: you're a helpful assistant (the high-level task); only answer based on what you know, or use these tools - that sets up the context and the guardrails - and then follow this structure: start with a thought, then an action, an action input, and an observation, and iterate over that until you can answer the question. Here we see the question: what's the weather today in Stockholm? If we look at what Falcon does, it goes through that thought process first. It realizes: I don't have this information - which tool should I use? We're providing multiple tools; we're not just saying "use weather." We're saying: use this set of tools and determine the right one. Here we see it determining that the weather tool is needed and which city is specified, Stockholm. So that's the thought - I need to figure out which tool to use because I don't know this information - the action is to call the weather tool, and the action input is Stockholm. Then we see the observation: now I know the weather in Stockholm. And ultimately the answer: the weather is two degrees Celsius. That shows how multiple tools can be brought in - another advanced prompt engineering technique that Falcon supports - and how the model can determine the right tool to integrate into your system.
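
Below is a minimal sketch of the ReAct loop just walked through: the model emits Thought / Action / Action Input lines, a small driver parses them, calls the named tool, feeds an Observation back, and repeats until a final answer appears. The tag names, stub tools and the `generate` callable are illustrative assumptions, not a fixed Falcon interface.

```python
import re

def get_weather(city: str) -> str:
    return f"The weather in {city} is 2 degrees Celsius."   # stub tool for the sketch

def get_news(topic: str) -> str:
    return f"No major news about {topic} today."             # stub tool for the sketch

TOOLS = {"weather": get_weather, "news": get_news}

def react_loop(question: str, generate, max_steps: int = 5) -> str:
    """Drive a Thought/Action/Action Input/Observation loop; `generate` is the LLM call."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = generate(transcript)          # the model continues the transcript
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.*)", step)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)", step)
        action_input = re.search(r"Action Input:\s*(.*)", step)
        if action and action_input and action.group(1) in TOOLS:
            observation = TOOLS[action.group(1)](action_input.group(1).strip())
            transcript += f"Observation: {observation}\n"
    return "I don't know."
```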

You might be asking: how can I actually do function calls? How can I call a weather service that might be an API, or do other types of functions and integrations - calling APIs, functions, code, or other systems? That leads to function calling. With function calling, we have a function definition in JSON form saying we have a weather function, and it takes these inputs: the country, the city, and the date you want the forecast for. Then we ask Falcon a question. We're not asking Falcon to fill in a function parameter directly; we're using natural language to say we want this information - the weather. We say: today is November 28th, 2023, I'm driving from LA to Las Vegas, and what's the weather in two days from today? From there, we see Falcon derive the country from the city, work out the date - we said what today's date is, so what is the date in two days - and generate the function call to invoke the get-weather service.
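
A small sketch of that flow: a JSON function definition, a dispatcher that parses the model's output and executes the matching function, and an example of what a well-formed model response might look like. The schema and output format are assumed for illustration; Falcon does not require this exact format.

```python
import json

WEATHER_FUNCTION = {
    "name": "get_weather",
    "description": "Get the weather forecast for a city on a given date.",
    "parameters": {"country": "string", "city": "string", "date": "YYYY-MM-DD"},
}

def get_weather(country: str, city: str, date: str) -> str:
    return f"Forecast for {city}, {country} on {date}: sunny, 18C."   # stub implementation

def dispatch(model_output: str) -> str:
    """Parse the model's JSON function call and execute the matching Python function."""
    call = json.loads(model_output)
    if call.get("name") == "get_weather":
        return get_weather(**call["arguments"])
    raise ValueError(f"Unknown function: {call.get('name')}")

# Example: what a well-formed model response might look like for the Las Vegas question.
model_output = json.dumps({
    "name": "get_weather",
    "arguments": {"country": "USA", "city": "Las Vegas", "date": "2023-11-30"},
})
print(dispatch(model_output))
```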

Those are a small sample of the advanced prompting techniques; there are many others as well - we could spend a whole session on optimizing how to get the best value. But those are some of the techniques that show the power of Falcon: things like ReAct, calling functions, determining tools. The key thing is the structure of the prompt. Prompt structure is often specific to an LLM, and structuring the prompt in the right way will help you get optimal results out of Falcon.

So from there, how can you get started in your AWS environment? We have the Falcon LLMs available through SageMaker JumpStart - the easy way to integrate them into your AWS environment. It provides single-click integration through SageMaker Studio, which is an integrated environment for end-to-end ML development. Through that environment you can see Falcon 180B, Falcon 40B, Falcon 7B and the other models. From there you can do single-click deployment and fine-tuning. Not only that - you can also access those open LLM models through JumpStart via the AWS console, view the model cards (and the corresponding model cards on Hugging Face), jump into the code itself, see coding samples, and get snippets of code to integrate into your own logic and code repositories, all with something like three lines of code to deploy Falcon. In this case we're deploying Falcon 7B, but it's as easy as changing that string to deploy the other models, so you can integrate a top-performing open LLM into your application. Once it's deployed, you can call Falcon through the SageMaker endpoint using our SDK.
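
Here is roughly what those "three lines" look like with the SageMaker Python SDK's JumpStart interface; the model_id string is the one JumpStart has used for Falcon 7B, but confirm it in the JumpStart catalog before relying on it.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Deploy Falcon 7B from SageMaker JumpStart; swap the model_id to deploy 40B or 180B.
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-bf16")
predictor = model.deploy()

# Invoke the endpoint; the payload follows the Hugging Face text-generation convention.
response = predictor.predict({
    "inputs": "What is a foundation model?",
    "parameters": {"max_new_tokens": 128, "temperature": 0.6},
})
print(response)

# Clean up when finished to stop paying for the endpoint.
predictor.delete_endpoint()
```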

And just to give a small perspective of what's under the covers: when you deploy to a SageMaker endpoint, you don't have to worry about managing everything on the left-hand side. You define it in a declarative statement - here's the infrastructure I want to deploy Falcon onto - and we manage the auto scaling and all the undifferentiated heavy lifting behind a managed, secure endpoint. So from there, you can see the different elements of deploying Falcon.

I'd like to invite Will and Dr. Almazrouei back on stage so that we can highlight some key takeaways.

Hello again - we're back with you. One of the main takeaways today is that we would not be here without the collective effort of different team members and different players, from the people who built Falcon to those who helped us deploy it. So, to build a top-ranked LLM, or any advanced AI model, openness and collaboration are the key. Thank you.

I think the second most important takeaway is that everybody assumes a bigger model - more parameters - means a better model. Yes, more parameters can help capture the nuances of language. However, Falcon is one of the strongest pieces of evidence that a higher parameter count doesn't automatically make a model better than one with double or triple the number of parameters. It all depends on the data, and it all depends on the architecture of the model. I expect that in the future model sizes will get smaller and smaller while models keep getting better, and that's where we should spend our efforts. Thank you.

And with all of this, you can integrate Falcon into your applications today very easily. You can do it through code, through wizards, or with a single click - across all the different Falcon models: 7B, 40B and 180B. You can start doing prompt engineering very easily through the integration in AWS. Thank you for being with us today. Thank you.
