Unlocking your full potential with the power of generative AI on AWS

Good morning, everyone. I'd like to start by telling you about an edition of Scientific American published in March 1973. The magazine ran an analysis to determine which animal on the planet is the most efficient at locomotion. The researchers found that a bird, the condor, was the most efficient animal on the planet in locomotion.

We human beings were also evaluated in this study, and we ended up near the bottom of the ranking. So the researchers had the idea of evaluating us on a bicycle. When they did that, we jumped from near the bottom of the ranking to first place, far ahead of the condor.

Based on this study, Steve Jobs said something in one of his talks that I would like to share with you today. This is the first reflection for today: human beings are tool builders. We, human beings, are tool builders. It means we can build tools that extend our capabilities. For example, you can use a bicycle to overcome our locomotion limitations. You can build glasses or binoculars to enhance our vision. And you can build computers and artificial intelligence to enhance our cognitive and mental capacities. That's what we are talking about today.

I'm Vinicius Caridad and it's a pleasure to be here at re:Invent, the biggest technology event in the world. Speaking about generative AI at re:Invent is a great honor. Please feel free to add me on social media, and if you post about this talk today, please tag me - I'd love to connect.

I work at a bank in Brazil, the largest bank in South America. I'm also a professor, and I love engaging with the tech community, which is why I'm here today as an AWS Machine Learning Hero to talk a little about this technology and how it can create value for society and business.

Let's continue. I want to draw an analogy with the evolution of cars. The automobile arrived in society around 1886-1887. When cars first appeared on the streets, society had a lot of fear about this new technology - it could cause accidents and kill people. Amid all this uncertainty and distress about automobiles, newspapers published accident photos and reports comparing cars unfavorably to horses.

Because of this fear, a law known as the Red Flag Act (the UK Locomotive Act of 1865, amended in 1878) was created. It said that if you bought a car, you needed to hire someone to walk in front of the car waving a red flag to signal the car's arrival. At night, this person needed to carry a lamp.

Of course, automotive technology continued to evolve. An important milestone came in 1959, when Volvo invented the three-point seat belt, the design we still use today. Importantly, Volvo made the patent freely available, much like open source today: when Volvo invented the seat belt, they realized it was too important to keep closed and could save lives across the whole industry.

In addition to seat belts, we as a society have adopted other advances like ABS brakes. New safety technologies continuously need to be developed to create good solutions for society and business.

This brings me to the second reflection for today, a quote from Arthur Schopenhauer: "All truth passes through three stages. First, it is ridiculed. Second, it is violently opposed. Third, it is accepted as being self-evident."

In other words, every new technology arises with uncertainty and fear at first. You need to understand how it works, evolve it, and bring real benefits to society and business.

This is the same situation with artificial intelligence and generative AI. Though it reached people's hands en masse only in the last year, it has been researched for over 30 years. For example, here I have a doctoral thesis from Harvard from 1993 on generating text descriptions for images - 30 years ago!

Last year, Gartner placed generative AI near the "peak of inflated expectations" in its Hype Cycle. This year, 2023, generative AI sits at the very top of that peak, and it likely now begins moving into the "trough of disillusionment" phase. But this is a good thing - it means we start to really understand generative AI as a tool that can solve business and societal problems, not a solution for everything.

How can we realize the full potential of AI? This video plays on the illusion that AI outperforms humans across all tasks. In reality, there are many things humans inherently do better than AI. We can use AI as a tool, like a hammer, to help us do certain tasks better.

Researcher Kai-Fu Lee proposes a helpful matrix. On one axis, you have tasks demanding creativity or strategy versus tasks that can be optimized. The other axis has tasks needing compassion versus not.

AI will dominate the quadrant of optimizable tasks not needing compassion - repetitive work like data processing. But for optimizable tasks needing compassion, AI can do analysis while humans provide the nuance.

For creative tasks not needing compassion, like writing a book, AI can provide an initial draft to be refined. And creative tasks needing compassion and empathy will be dominated by humans, though AI can still assist. The key is using AI as a tool.

This viewpoint is backed up by researchers analyzing AI's impact across industries - marketing, sales, software development, and more. AI can bring more productivity and enhance human capabilities. Studies have found consultants using AI can complete more tasks, faster and with higher quality. AI is a tool to augment our abilities.

It's important to understand AI to use it effectively, like the bicycle analogy. In a race, having some people on foot and others on bikes isn't fair. And if you give someone a bike who can't ride it, it's even worse. You need to know how to utilize the technology as a tool.

That brings me to the third message for today: You won't necessarily be replaced by AI, but you could be replaced by someone who knows how to use it.

"Will I be replaced by AI?" is a very common question. You won't be, but you could be replaced by someone who knows how to use AI to extend our capabilities. Let's understand a little more about how AI works.

In a typical application, the user provides some input, the input goes through the generative AI application, and the application generates output for the user. For example, if I ask an application, "I would like a text about generative AI," the system will produce text like "Generative AI is a breakthrough technology for us," and so on. This is a typical LLM-based generative AI application.

If I want a more specialized service like Amazon CodeWhisperer, the user input is a natural-language request for code, such as "parse a CSV string," and the output of the system is code that satisfies the request.
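For instance, here is a minimal sketch of the kind of completion a code assistant like CodeWhisperer might produce from a comment-style request; the function name and prompt wording are illustrative, not taken from the talk:

```python
import csv
import io

# Prompt given to the assistant (as a comment):
# "Parse a CSV string and return a list of dictionaries, one per row."
def parse_csv(text: str) -> list[dict]:
    """Parse CSV text using the header row as dictionary keys."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]

rows = parse_csv("name,role\nVini,speaker\nJeff,evangelist")
print(rows)  # [{'name': 'Vini', 'role': 'speaker'}, {'name': 'Jeff', 'role': 'evangelist'}]
```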

It's important to open the generative AI application toolbox and understand the main stages behind these applications:

  1. Understanding
  2. Reasoning and Knowledge
  3. Generation
  4. Safety and Responsibility

These are the four main stages behind generative AI applications.

The first thing the application needs to do is understand what the input means - the user gives some input, some demand, and the system needs to understand it. Then you need to see whether that input can be improved. This is prompt engineering, because the prompt is very important for enriching the input and achieving better results.
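As a concrete illustration, here is a minimal prompt-enrichment sketch; the template wording is my own, not a prescribed standard:

```python
# A minimal sketch of prompt enrichment: the raw user input is wrapped in a
# template that adds a role, constraints, and an output format.
def enrich_prompt(user_input: str) -> str:
    return (
        "You are a helpful technical writing assistant.\n"
        f"Task: {user_input}\n"
        "Constraints: answer in at most 200 words, use plain language, "
        "and include one concrete example."
    )

print(enrich_prompt("Write a text about generative AI"))
```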

Then you move to the second big stage - reasoning and knowledge. Once I understand the input, I need to know how to solve it. Here we start to talk about frameworks like chain-of-thought, RAG, etc. that allow us to build reasoning to solve problems.
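For example, a chain-of-thought prompt can be as simple as appending an instruction that elicits intermediate steps; the question below is hypothetical:

```python
# A minimal chain-of-thought sketch: the prompt asks the model to expose its
# intermediate reasoning before the final answer.
cot_prompt = (
    "Q: A branch has 120 customers; 45 use the card and 30 of those also "
    "use the app. How many card users do not use the app?\n"
    "A: Let's think step by step."
)
# Sending cot_prompt to an LLM typically yields intermediate steps such as
# "45 card users - 30 app users = 15" before the final answer.
```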

And once I know how to solve it, I need to know where the answer is - which data holds the answer I need to give the user. So here we start to think about retrievers and other components that help find where the answer lives.

Of course, once I have the data, the information, I start to generate the answer. Sometimes I need to customize that answer for a domain, a customer, etc. And I need to do it in a safe way - generative AI is powerful, but with great power comes great responsibility.

With that, we have a good solution - safe, responsible, and complete. Monitoring and improvement are very important, because this is a life cycle, not a one-way workflow. You need to keep monitoring and improving all these steps continuously.

Again, AI is not one tool that solves all problems. You need to think of it as a toolbox - AWS has an entire AI/ML stack to allow us to choose the best tools to solve business and societal problems. This stack is divided into three layers - Provider, Tuner, and Consumer.

The Provider layer is frameworks and infrastructure to train models or put models into production. The Tuner layer is tools to build our own models. The Consumer layer is AI services with pre-built models to accelerate time-to-market.

Looking only at the generative AI tools in this stack, we have specialized compute for training big models, Amazon SageMaker JumpStart for using open-source models, and services like Amazon CodeWhisperer that generate code. At the top sit Amazon Bedrock, for building and scaling generative AI applications, and Amazon Titan, the AWS family of foundation models.

You can choose to use an AWS foundation model, or use SageMaker JumpStart to load a model from Hugging Face. But it's important to integrate other components to build a full generative AI solution that brings business and societal results.
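As an illustration, here is a minimal sketch of invoking a foundation model through Amazon Bedrock with boto3; the model ID and request schema below follow the Amazon Titan Text format as I understand it, so check the Bedrock documentation for your model and region:

```python
import json
import boto3

# A minimal sketch of calling a foundation model through Amazon Bedrock.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=json.dumps({
        "inputText": "I would like a text about generative AI",
        "textGenerationConfig": {"maxTokenCount": 256, "temperature": 0.7},
    }),
)
print(json.loads(response["body"].read())["results"][0]["outputText"])
```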

I'll explain some of these important components. First is the prompt - there are techniques like prompt chaining, analogical prompting, and others to build good prompt text. Then you need to transform the text into tokens and embedding representations.
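Here is a minimal sketch of that text-to-tokens-to-embeddings step, using a small open model from Hugging Face (the model choice is illustrative):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Text -> integer token IDs -> one embedding vector per token.
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

tokens = tokenizer("Generative AI is a tool", return_tensors="pt")
print(tokens["input_ids"])          # the token IDs the model actually sees

with torch.no_grad():
    hidden = model(**tokens).last_hidden_state  # one vector per token
embedding = hidden.mean(dim=1)      # simple mean pooling -> one sentence vector
print(embedding.shape)              # torch.Size([1, 384])
```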

LLMs have pre-training and fine-tuning stages. Fine-tuning techniques like RLHF align the model with your data and preferences. When you use a prompt directly with a foundation model, you leverage its pre-trained knowledge. But sometimes you need your own data, not just world knowledge.

So in the Tuner layer, frameworks like RAG and RAG-Fusion let you use your own data through retrievers and indexes. You may need to chunk the data to structure it. To store and retrieve your data, use vector databases and vector search. The retrieved content enriches the prompt fed into the foundation model.
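To make that concrete, here is a minimal RAG sketch; in production a vector database would replace the in-memory search, and the document chunks here are invented for illustration:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# A minimal RAG sketch: embed document chunks, retrieve the most similar one
# for a question, and use it to enrich the prompt for the foundation model.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "The 'Multiple' card offers debit and credit functions on one card.",
    "Branch opening hours are 10:00 to 16:00 on weekdays.",
]
chunk_vecs = encoder.encode(chunks, normalize_embeddings=True)

question = "What is the Multiple card?"
q_vec = encoder.encode([question], normalize_embeddings=True)[0]

best = chunks[int(np.argmax(chunk_vecs @ q_vec))]  # cosine-similarity search
prompt = f"Answer using this context:\n{best}\n\nQuestion: {question}"
print(prompt)
```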

You can also fine-tune the model for your domain using frameworks like LoRA and QLoRA for parameter-efficient tuning.
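For example, a minimal LoRA setup with the Hugging Face peft library might look like this; the base model and target modules are illustrative:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Wrap a small causal LM with low-rank adapters so only a tiny fraction of
# the parameters is trained.
base = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the updates
    target_modules=["c_attn"],  # GPT-2's attention projection layer
    lora_dropout=0.05,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```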

Once the application is ready, build guardrails, hallucination metrics, bias studies, etc. for safety. And to serve models with billions of parameters efficiently, use techniques like GPTQ quantization and FlashAttention.
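As one concrete example of serving with quantization, here is a hedged sketch that loads a model in 4-bit precision with bitsandbytes, a related technique with a simpler API than the GPTQ method named in the talk; the model name is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load weights in 4-bit precision to cut serving memory roughly 4x versus
# fp16. (GPTQ-quantized checkpoints can be loaded similarly via optimum.)
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=quant,
    device_map="auto",  # place layers on available GPUs automatically
)
```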

In short, AWS helps with generative AI in four main ways. But generative AI is just a tool - you need an end-to-end data strategy. Start with the business use case, understand your data, select algorithms, build pipelines and models, evaluate, and deploy. Monitoring and improvement are critical.

Let's look at another use case. If I ask for "a beautiful galaxy", Stable Diffusion can generate a matching image. To do that, it needs both text and image representations. Images are transformed into pixel matrices - each pixel becomes a number - and convolutional neural networks extract image features to understand both the parts and the whole.
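Here is a tiny sketch of that "each pixel becomes a number" idea; the file name is illustrative:

```python
import numpy as np
from PIL import Image

# Load an image and inspect the matrix a neural network actually sees.
img = Image.open("galaxy.png").convert("RGB")
pixels = np.asarray(img)            # shape (height, width, 3), values 0-255

print(pixels.shape)                 # e.g. (512, 512, 3)
print(pixels[0, 0])                 # RGB triple of the top-left pixel
normalized = pixels / 127.5 - 1.0   # scale to [-1, 1], as diffusion models expect
```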

Diffusion models are trained by adding noise to images and learning to remove that noise and reconstruct the original image. After training, generation starts from heavy noise and produces new images guided by the text prompt.
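A minimal sketch of the forward (noising) step, using the diffusers scheduler API; the image tensor is a random stand-in:

```python
import torch
from diffusers import DDPMScheduler

# Add scheduler-controlled noise to a (fake) image batch - exactly the
# corruption the model learns to undo during training.
scheduler = DDPMScheduler(num_train_timesteps=1000)

clean = torch.randn(1, 3, 64, 64)   # stand-in for a real image batch
noise = torch.randn_like(clean)
timesteps = torch.tensor([750])     # a late timestep means heavy noise

noisy = scheduler.add_noise(clean, noise, timesteps)
# During training, a U-Net predicts `noise` from `noisy` and the timestep;
# at inference it starts from pure noise and denoises step by step.
```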

We use techniques like convolutional networks and feature extraction to do this learning. We also need to combine text and image representations into one - a variational autoencoder can be used for this.

Then a U-Net convolutional neural network acts as a noise predictor: it learns the difference between the original image and the noise, which is what allows it to generate new images. Its architecture mixes techniques like ResNet blocks, attention layers, etc.

The full architecture has components for the diffusion process, text encoding, the noise schedule, and reconstructing the original image. It starts with an image, adds noise, encodes the text prompt, learns to predict the noise, and tries to reconstruct the original image. Training this architecture produces the desired results.

So in summary, generative AI has many components working together - techniques for prompts, knowledge retrieval, data encoding, neural architectures, training workflows, etc. When combined appropriately, it can generate useful and safe outputs for various applications.

"Ok. And this is a lot of techniques algorithms. You have some complexity in these algorithms too, but we can do it in a very easy and fast way using AWS A Maker to the lab here. I have an example. All this code is available in my GitHub. You can use it and here AWS A Maker Studio, you can do it using only six line of code and more less than 10 minutes and spending no money, it's free, you cannot spend any money here.

In the first block of code, we load a Stable Diffusion model available on the Hugging Face portal. It's a Brazilian fine-tuned model that some colleagues and I published on Hugging Face; you're welcome to use it, test it, and send us reviews. And you can use it for free in Amazon SageMaker Studio Lab.

In the second block of code, we load all the components that are already packaged - the variational autoencoder, the U-Net convolutional network, the text encoder, the scheduler, and the tokenizer - one line of code per component. It's very easy; you don't need a deep understanding of any of these algorithms.

In the last block of code, we have the prompt you'd like to use as input to the application. Here I ask it to generate a photo of a CEO of a technology company, and then you set some parameters such as the image size.
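For reference, here is a compact sketch of what those blocks look like with the diffusers library; the model ID is a stand-in for the fine-tuned checkpoint used in the talk:

```python
import torch
from diffusers import StableDiffusionPipeline

# Block 1-2: load the model and its packaged components, move them to GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Block 3: the prompt plus generation parameters such as image size.
image = pipe(
    "a photo of a CEO of a technology company",
    height=512, width=512, num_inference_steps=50,
).images[0]
image.save("ceo.png")
```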

When I used this, I asked, for example, for a photo of a person from the United States, and the solution generated this image here. Then I tried to enrich the prompt a little more: I asked for a photo of a person from the United States who works at Amazon Web Services as a vice president and chief evangelist, and the model generated this image here. It's the model, not me! It may remind you of someone, but that's OK, let's move on.

What is very important here: you can use this model for free, very easily, in Amazon SageMaker Studio Lab. But in the real world you sometimes need to specialize the model - for a particular business, a particular domain, and so on. If you don't need to specialize, as you can see, it's very easy to use. But sometimes the data you need is outside the training data. In this toy example, the model knows what a cat is and what a dog is, but it doesn't know what a rabbit is, because there are no pictures of rabbits here. Of course, this is a toy example. But at the bank where I work, we have a product whose name translates to "Multiple." If you look up what "multiple" means in ordinary usage, it relates to mathematics; but for the bank, "Multiple" is the name of a product with two functions at the same time, debit and credit. If I ask the model about something like that, it doesn't know, because it has no data to understand it. So I need to specialize the model.

And of course, I don't want to embarrass any company here - I need to play it safe. So I chose to pick on a person instead, and he's a very famous person: Jeff Barr. I think everyone knows Jeff Barr; he is very famous at AWS. So I chose him for this example.

If I ask Stable Diffusion for "a photo of Jeff Barr," the application generates this image here - and this is not Jeff Barr. So I enrich my prompt: "please generate a photo of Jeff Barr, who works at AWS as a vice president and chief evangelist." Even with this new prompt, the model generates another image, very different from the first, but it still doesn't look like Jeff Barr, because the model doesn't know who Jeff Barr is; it doesn't have that data, that information. So you need to teach the model - to specialize it with this new concept, this new information. And this, of course, is the real Jeff Barr.

Jeff Barr is a very famous, fantastic guy from AWS, and I have had the opportunity to meet him three times in my life. Twice I asked to take a picture with him - maybe after this talk he won't accept pictures with me again. I can use a technique to teach this model. There are many techniques available, and I chose textual inversion. This is the high-level architecture for textual inversion: we specialize the model using a few example images and a few example prompts that describe those images.

As you can see, one example prompt is "a photo of S*", where S* is the placeholder for the new concept I want to teach the model. On the other side, I have some input samples - this kind of clock here. I use these two inputs to specialize the model. Some parts of the architecture are shown with a lock, because we don't want to lose the model's previous knowledge: we freeze those neural network weights and only add the new concept to the model.

For example, I can teach the model a new concept word with just a few examples. I can teach it a new yoga pose, and we know the model has learned the pose when it can generate images of it. I can teach the model about a specific cat and have it generate new images of that cat.

To build this specialized model, I now move from Amazon SageMaker Studio Lab to Amazon SageMaker, because I need more infrastructure power - GPUs and so on. Again, all of this code is available in my GitHub; you can use it, for example with your own family photos. It's a lot of fun.

In the first block of code, we set some parameters: I want to teach the model who Jeff Barr is, Jeff Barr is a person, and this is the placeholder token that will carry the new concept. In the second block of code, I load the Stable Diffusion foundation model from Hugging Face again; I take this foundation model and then specialize it.
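Here is a hedged sketch of that setup with diffusers; the placeholder token and model ID are illustrative, and the actual training loop (handled by the diffusers textual-inversion example script) is omitted:

```python
from diffusers import StableDiffusionPipeline

# Register a placeholder token for the new concept and freeze everything
# except the text encoder's embedding table.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
tokenizer, text_encoder = pipe.tokenizer, pipe.text_encoder

tokenizer.add_tokens("<jeff-barr>")                    # new placeholder token
text_encoder.resize_token_embeddings(len(tokenizer))   # add an embedding row for it

# Freeze the U-Net, VAE, and text encoder: this is the "lock" in the diagram.
for module in (pipe.unet, pipe.vae, text_encoder):
    module.requires_grad_(False)

# Re-enable gradients only on the embedding table; real training also masks
# the gradients of all rows except the new token's, so old knowledge is kept.
text_encoder.get_input_embeddings().weight.requires_grad_(True)
```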

To do that, I had to stalk Jeff Barr a little on social media. I went to his social media profiles and collected some pictures - only nine, as you can see, very few examples, and low quality, because I took screenshots and cropped just the face. So, very few examples, and very low quality.

I take this data and feed it into the model following the architecture we saw: the nine photos of Jeff Barr, plus some example prompts like "a photo of Jeff Barr," "a painting of Jeff Barr," and so on. Then, in the last block of code, you train the model.

As you can see, I trained this model for only 15 epochs. That's very little, to show you it's possible with little data and few resources: with only about 20 dollars and three hours of training, you can have your own personalized model.

In the first block of code I have the training setup: I say I want to train the text encoder, and I load the pretrained Stable Diffusion foundation model, the placeholder token, and the other pieces needed for training. At the end, I can ask the model to generate something like "a painting of Jeff Barr as a vice president." When I did this before, the model generated nonsense, because it didn't know who Jeff Barr is. But now, using only nine photos - just a few examples - I have taught the model with the textual inversion technique. Let's see the results.
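And a minimal sketch of inference after training: load the learned embedding and use the placeholder token in a prompt; the file and token names are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base model, attach the embedding learned by textual inversion,
# and generate with the placeholder token in the prompt.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_textual_inversion("learned_embeds.bin", token="<jeff-barr>")

image = pipe("a painting of <jeff-barr> as a vice president").images[0]
image.save("jeff_painting.png")
```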

Here are some paintings the model generated of Jeff Barr, and you can judge whether they look like him. I'm very honored - and a little nervous too - to ask Jeff Barr to join me up here. Jeff, please. Thank you. Thank you again.

So, what a crazy world we live in that we can do these amazing kinds of things. An awesome demo. A couple of months ago, I was going through my email, which usually takes the first three hours of my day, and Vini sent me a very nice email saying he was getting ready to do a re:Invent talk. He kind of walked me through it and asked, would it be OK if I used you as the example? And I said, of course, that would be totally awesome, super neat. I didn't quite grasp the entire intent of what he was doing, and I said, you know, I've been collecting selfies for almost the last three years - I have like 900 selfies - and I volunteered to send them. He said, no, no, I can do it with just nine, I've already got my training data, which was super cool. If you need 900 selfies, let me know; I'll happily share them with you. I do think we need to keep the universe in balance, and that if someone wants to do something a little bit cool, you kind of have to respond in kind and say, I'd be more than happy to help out and to be here and just say, wow, this is so amazing, what you've done here. I look at all of those layers you've walked us through - how we set up the training, how we generate, all that edge detection and all those matrices - and I'm thinking there are like nine layers of things I will never, ever understand. But you've packaged it up for us in this really amazing way that makes it understandable and approachable, very usable, and very, very powerful. So, awesome work. I really appreciate all that you've done here to make this something amazing. Thank you.

Thank you, Jeff. It's very kind of you to be here with me. Thank you very much. Can we take a photo the way Jeff Barr does, with the audience? OK, perfect. How about that? We're good, we've got the light. Let's move over to the right a little. Awesome. Great. Thank you very much, Jeff, for being here. Thank you, everyone. It was a pleasure to be here talking about generative AI. Here's a QR code with my social media links - please add me.
