Generative AI with Large Language Models Study Notes - 1.1.8 Introduction to Large Language Models and the Generative AI Project Lifecycle - Lab 1 - Generative AI Use Case: Summarize Dialogue

Lab 1 walkthrough

This is Lab 1, and like I said, we are going to grab a dataset of conversations happening between people. What we plan to do is summarize these dialogues. Think of a support dialogue between you and your customers: maybe at the end of the month you want to summarize all of the issues that your customer support team has dealt with that month. A few other things to note: I'm zoomed in a bit much here, but you can see that we have eight CPUs and 32 GB of RAM, we're using Python 3, and these are some of the pip installs. If I do a Shift+Enter here, this will start the Python library installs, and we see that we're going to be using PyTorch. We are also installing a library called torchdata, which helps with data loading and some other dataset-related aspects of PyTorch.

Here we see Transformers. This is a library from Hugging Face, a really cool company that has built a whole lot of open-source tooling for large language models. They also built a Python library called Datasets, which can load many of the common public datasets that people use to train models, fine-tune models, or just experiment with. If you click Shift+Enter there, this will run for a bit. Keep in mind this does take a few minutes to load, and this whole notebook depends on these libraries, so make sure they do install. Just ignore the errors and warnings; we always try to mitigate them, they always show up anyway, and things will still work. Trust me, these libraries and these notebooks do run. We've pinned all of the Python library versions so that as new versions come out, they won't break these notebooks, so keep that in mind. It does say to restart the kernel, but I don't think you have to do that. Let's just keep going.
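
For reference, the install cell looks roughly like the sketch below. The specific version pins are illustrative placeholders, not necessarily the exact ones pinned in the lab notebook.

```python
# Illustrative install cell -- the version pins here are assumptions,
# not the exact pins from the lab notebook.
%pip install \
    torch==1.13.1 \
    torchdata==0.5.1 \
    transformers==4.27.2 \
    datasets==2.11.0
```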

Now we're going to actually do the imports. This imports a function called load_dataset, along with some of the models and tokenizers needed for our lab. We're going to use a dataset called DialogSum, a public dataset that the Hugging Face Datasets library exposes and gives us access to, so all we do is call load_dataset, which was imported up above, and pull in this dataset. From here on out, we're going to explore some of the data and then try to summarize with just the base FLAN-T5 model. Before we get there, though, let me load the dataset and take a look at some examples from it.
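
A minimal sketch of that import-and-load cell, assuming the publicly hosted "knkarthick/dialogsum" copy of DialogSum on the Hugging Face Hub:

```python
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig

# DialogSum: dialogues paired with human-written summaries.
# The Hub identifier below is an assumption; use whatever name the notebook references.
huggingface_dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(huggingface_dataset_name)
print(dataset)  # shows the train/validation/test splits and their fields
```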

Here's a sample dialogue between Person 1 and Person 2. Person 1 says, "What time is it, Tom?" It looks like Person 2's name is actually Tom. "Just a minute. It's ten to nine by my watch," and so on. Here's the baseline human summary: this is what a human has labeled as the summary of that conversation. We will try to improve upon that summary using our model. Again, no model has even been loaded yet; this is purely the actual data. Here's the conversation; think of it as the training sample, and this is what a human has labeled it. We will compare the human summary, which is what we're considering the baseline, against what the model predicts the summary to be. The model will actually generate a summary.
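
Printing a couple of examples looks something like this; the example indices are arbitrary, and the "dialogue" and "summary" field names follow the public DialogSum schema:

```python
# Look at a few raw examples: the dialogue and the human baseline summary.
example_indices = [40, 200]  # arbitrary indices chosen for illustration

for i in example_indices:
    example = dataset["test"][i]
    print("INPUT DIALOGUE:")
    print(example["dialogue"])
    print("\nBASELINE HUMAN SUMMARY:")
    print(example["summary"])
    print("-" * 80)
```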

Here's a second example. You can see it's got some familiar terms that a lot of us know, like a CD-ROM and a painting program for your software. Now, here's where we actually load the model. FLAN-T5, which we spoke about in the videos, is a very nice general-purpose model that can do a whole lot of tasks, and today we'll be focused on FLAN-T5's ability to summarize conversations. After loading the model, we have to load the tokenizer. These are all coming from the Hugging Face Transformers library. To give you some context, before Transformers came along, we had to write a lot of this code ourselves. There are now many different language models, and some of them do things very differently than others, so there were a lot of bespoke, ad hoc libraries out there all trying to do similar things. Then Hugging Face came along with a very well-optimized implementation of all of these.
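
A sketch of the loading step, assuming the "google/flan-t5-base" checkpoint from the Hugging Face Hub:

```python
# Load FLAN-T5 (base size) and its matching tokenizer.
model_name = "google/flan-t5-base"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```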

Now, here's the tokenizer. This is what's going to be used to convert the raw text from our conversation into the vector space that can then be processed by our FLAN-T5 model. Just to give you an idea, let's take a sample sentence: "What time is it, Tom?", the first sentence from our conversation up above. We see that the encoded sentence is actually these numbers here, and if you decode it, it decodes right back to the original. The tokenizer's job is to convert raw text into numbers. Those numbers point to a set of vectors, or embeddings as they're often called, which are then used in mathematical operations like deep learning, back-propagation, linear algebra, all that fun stuff. Now, let's run this cell and continue to explore. Now that we've loaded our model and our tokenizer, we can run some of these conversations through the FLAN-T5 model and see what summary it actually generates for them.
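
The encode/decode round trip on that sample sentence looks roughly like this:

```python
sentence = "What time is it, Tom?"

# Text -> token ids (the numbers that index into the model's embeddings).
encoded = tokenizer(sentence, return_tensors="pt")
print("ENCODED:", encoded["input_ids"][0])

# Token ids -> text; this should reproduce the original sentence.
decoded = tokenizer.decode(encoded["input_ids"][0], skip_special_tokens=True)
print("DECODED:", decoded)
```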

Here again we have the conversation, and here again is the baseline summary. Then we see that without any prompt engineering at all, just taking the actual conversation and passing it to our FLAN-T5 model, it doesn't do a very good job of summarizing. We see "It's ten to nine." That's not very helpful; there are more details in this conversation that are not coming out at this point. Same with the conversation about our CD-ROM: the baseline summary is "Person 1 teaches Person 2 how to upgrade the software and hardware in Person 2's system," while the model generated "Person 1 is thinking about upgrading their computer." Again, lots of details in the original conversation do not come through in the summary. Let's see how we can improve on this. In the lesson, you learned how to use instructions to tell your model what you're trying to do with the data that you're passing it. Here's an example. This is called in-context learning, specifically zero-shot inference with an instruction.
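
A sketch of that no-prompt-engineering baseline, using an arbitrary example index:

```python
# Pass the raw conversation straight into FLAN-T5, with no instruction at all.
dialogue = dataset["test"][40]["dialogue"]   # index chosen for illustration
summary = dataset["test"][40]["summary"]

inputs = tokenizer(dialogue, return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], max_new_tokens=50)

print("BASELINE HUMAN SUMMARY:", summary)
print("MODEL GENERATION - WITHOUT PROMPT ENGINEERING:",
      tokenizer.decode(output_ids[0], skip_special_tokens=True))
```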

Here's the instruction, which is "Summarize the following conversation." Here is the actual conversation, and then we tell the model where it should print the summary, which is after the word "Summary:". This seems very simple, so let's see how it does and whether things get better. Not much better here. The baseline is still "Person 1 is in a hurry, Tom tells Person 2 there's plenty of time." The zero-shot in-context learning with a prompt just says "The train is about to leave." Again, not the greatest. And here is the zero-shot result for the computer sample: it still thinks Person 1 is trying to upgrade, so not much better. Let's keep going. There is a different prompt we can use here, where we just say "Dialogue:". These are really up to you; this is the prompt engineering side of these large language models, where we're trying to find the best prompt, and in this case just zero-shot inference. No fine-tuning of the model, nothing like that. All we're doing is trying different instructions and seeing if the model does any better with slightly different phrasing. Let's see how this does. Really, this is the inverse of before: here we're just saying "here's the dialogue," and then afterward we ask what was going on in that dialogue. Let's see if this does anything better. "Tom is late for the train," so it's picking that up, but still not great. Here we see "Person 1: You could add a painting program. Person 2: That would be a bonus." A little bit better. It's not exactly right, but it's getting better; it's at least picking up some of the nuance.
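
Both instruction templates can be tried with a small loop like the sketch below; the exact wording of each template is the part you are free to vary:

```python
dialogue = dataset["test"][40]["dialogue"]

# Template 1: explicit instruction, with a cue for where the summary goes.
prompt_instruction = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

# Template 2: the inverse phrasing -- dialogue first, question afterward.
prompt_question = f"""
Dialogue:

{dialogue}

What was going on?
"""

for prompt in (prompt_instruction, prompt_question):
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(inputs["input_ids"], max_new_tokens=50)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```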

Now, as part of in-context learning, you learned there's something called one-shot and then few-shot inference. Let's get hands-on with one-shot and then few-shot. Earlier we were doing zero-shot, which means we're not giving the model any samples of a prompt with its completion; all we're doing is giving it a prompt, asking the model to do something, and seeing what it generates. With one-shot and then few-shot, we will actually give it samples that are correct, using the human baseline. That gives the model a little more information to work with. Let's see how one-shot works here. All we're doing is taking a full example, including the summary from the human baseline, then giving it a second example but without the actual summary; that's the dialogue we want the model to summarize. Let's see how this looks. One-shot means I'm giving it one complete example, including the correct answer as dictated by the human baseline, then giving it a second example and asking the model what's going on. Let's see how we do here. Here we're just going to do the software-upgrade dialogue. "Person 1 wants to upgrade, Person 2 wants to add a painting program, Person 1 wants to add a CD-ROM." I think it's a little better; let's keep going. There's something called few-shot inference as well.
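
A sketch of a helper that builds one-shot (and, later, few-shot) prompts out of dataset indices; the index values are arbitrary illustrations:

```python
def make_prompt(example_indices_full, example_index_to_summarize):
    """Build a prompt from complete examples plus one dialogue to summarize."""
    prompt = ""
    # Each complete example includes the human baseline summary as the "answer".
    for i in example_indices_full:
        dialogue = dataset["test"][i]["dialogue"]
        summary = dataset["test"][i]["summary"]
        prompt += f"Dialogue:\n\n{dialogue}\n\nWhat was going on?\n{summary}\n\n"

    # The final dialogue is left without a summary for the model to complete.
    dialogue = dataset["test"][example_index_to_summarize]["dialogue"]
    prompt += f"Dialogue:\n\n{dialogue}\n\nWhat was going on?\n"
    return prompt

one_shot_prompt = make_prompt([40], 200)   # one full example, then the target dialogue
inputs = tokenizer(one_shot_prompt, return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```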

Now some of you might be asking, well, this seems like cheating because we're actually giving it one answer and then asking it. It's not really cheating; it's more that you're helping the model help itself. In future lessons and labs, we will actually fine-tune the model, where we can go back to the zero-shot inference that you would normally expect from a good language model. But here we're just building up some intuition. Keep in mind, this is a very inexpensive way to try out these models and even to figure out which model you should fine-tune. We chose FLAN-T5 because it works across a large number of tasks. But if you have no idea how good a model is, if you just get it off of some model hub somewhere, this is the first step: prompt engineering with zero-shot, one-shot, and few-shot is almost always the first step when you're trying to learn the language model and dataset you've been handed. It's also very dataset-specific and task-specific. Few-shot means that we're giving three full examples, including the human baseline summary, 1, 2, 3, and then a fourth but without the human summary. Yes, even though we have it, we're just exploring our model right now. We're saying, tell us the summary of that fourth dialogue. Just ignore some of these errors; some of these sequences are a bit larger than the 512-token context length of the model. Typically, you would probably want to filter out any inputs that are larger than 512 tokens, but here it still does a pretty good job. Here we see a case where few-shot didn't do much better than one-shot. This is something you want to pay attention to, because in practice people often try to just keep adding more and more shots: five shots, six shots.
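
Few-shot reuses the same helper with more complete examples up front; the length check below is one simple way to spot prompts that run past the 512-token context mentioned above:

```python
# Three complete examples followed by the dialogue to summarize (indices illustrative).
few_shot_prompt = make_prompt([40, 80, 120], 200)

inputs = tokenizer(few_shot_prompt, return_tensors="pt")
if inputs["input_ids"].shape[1] > 512:
    print("Prompt exceeds 512 tokens; in practice you might filter or shorten it.")

output_ids = model.generate(inputs["input_ids"], max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```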

Typically, in my experience, above five or six shots, meaning full prompts with completions, you really don't gain much after that. Either the model can do it or it can't, and going above five or six doesn't change that. Here we see that for this particular sample, one-shot was really good enough. Now, the last part of this lab is going to be fun. This is where you can actually play with some of the configuration parameters that you learned about during the lessons, things like sampling and temperature. You can play with these, try them out, and build your intuition for how they impact what the model actually generates. For example, by raising the temperature toward one or even closer to two, you will get very creative kinds of responses. If you lower it, I believe 0.1 is the minimum for the Hugging Face implementation of this GenerationConfig class that's used when you actually generate; I can pass the generation config right here. If you go down to 0.1, that will make the response more conservative and will often give you the same response over and over. If you go higher, I believe 2.0 is the highest; if you try 2.0, that will start to give you some very wild responses. It's fun. You should try it.
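
A sketch of playing with GenerationConfig; note that do_sample must be enabled for temperature to have any effect:

```python
# Sampling parameters: lower temperature -> more conservative, repeatable output;
# higher temperature -> more varied (and eventually wild) output.
generation_config = GenerationConfig(
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,   # try values between roughly 0.1 and 2.0
)

inputs = tokenizer(one_shot_prompt, return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], generation_config=generation_config)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```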


Lab 1 - Generative AI Use Case: Summarize Dialogue

In this lab, you will do the dialogue summarization task using generative AI. You will explore how the input text affects the output of the model, and perform prompt engineering to direct it towards the task you need. By comparing zero-shot, one-shot, and few-shot inference, you will take the first step towards prompt engineering and see how it can enhance the generative output of Large Language Models.

The labs are accessible to learners who purchased the course. If you have not yet purchased access, you can do so through the "Upgrade to Submit" button below.

If you have already paid for the course, start the lab by first ticking the checkbox below indicating you will adhere to the Coursera Honor Code, then click the "Launch App" button.

The lab is formally ungraded, but you will need to click on the Submit button to complete the lab. This button is on the top right of the Vocareum page and not on the AWS console.

Please refer to this topic in our community platform for common questions and troubleshooting regarding the labs. If you can't view it, please create an account following the instructions here, then click on the topic again.

This course uses a third-party app, Lab 1 - Generative AI Use Case: Summarize Dialogue, to enhance your learning experience. The app will reference basic information like your name, email, and Coursera ID.

