Jupyter AI: Open source brings LLMs to your notebooks

Good afternoon, everyone. Welcome to our presentation on Jupyter AI.

Hello to this room. Hello to the overflow room. Hello, everyone watching this on YouTube.

My name is Jason. I'm a senior front end engineer on open source Jupyter at AWS. I'm here with my colleague, Piyush, and we're here to talk to you about Jupyter AI. That's an open source extension that our team built to bring the power of large language models to your Jupyter notebooks.

First, a bit of introductions.

My name is Jason. I am a senior front end engineer on the open source Jupyter team. I'm also a member of the JupyterLab Council, and I serve on the Jupyter Diversity, Equity, and Inclusion Standing Committee. I've also been a member of Jupyter's Security Working Group.

I'd like to introduce my coworker Piyush here.

Thanks. Hi, everyone. My name is Piyush Jain. I'm a senior software engineer on the open source Jupyter team at AWS. I'm also a part of the Jupyter Server Council, and I'm a contributor on the JupyterLab and LangChain projects. LangChain is an open source framework for building generative AI applications, and it's a key framework that we use for Jupyter AI.

Alright, thanks Piyush.

So who here has used Jupyter products before - JupyterLab, Jupyter Notebook, AWS SageMaker Studio? I see a lot of hands going up. That's good. For anyone who's unfamiliar, Jupyter is an open source project that began nearly 20 years ago, and our boss, Brian Granger, is one of the co-founders of what is now Project Jupyter.

It's an open source project for interactive computing. It was originally designed for Julia, Python and R - hence the name. But as it gained in popularity, there are now what are called kernels for just about any programming language you can think of.

And a lot of companies, including AWS, have built commercial products on top of Jupyter. This is what a Jupyter notebook looks like. You see on top we have some rich text styled using Markdown; in the middle we have some Python code. And unlike when you run, for example, a Python script in your terminal, you're running this usually in a web browser. So anything your web browser can display, like the data visualization you see here, can be rendered inside a Jupyter notebook.

So we built Jupyter AI. What is Jupyter AI? Jupyter AI is an official Project Jupyter extension. It is officially governed by Project Jupyter, so everyone in this room can download it. It's licensed under a permissive open source license. You can even file bugs and enhancement requests and open pull requests to contribute code to it.

Now, Jupyter AI is not itself a large language model. We didn't build our own model; instead, we built an interface, an open source interface, to connect you with the large language models of your choice.

And there are two ways to interact with it. You can use a chat interface that's in the left panel of JupyterLab, or you can use magic commands with any Jupyter application that you want, and those run inside your notebook.

We're going to be going through some of the things that you can do with Jupyter AI.

Starting with the basics, we're going to show how you can use generative models to generate text and code. We're also going to show you how you can explain code, debug errors in code, and even rewrite code using language models.

Jupyter AI's chat interface can also learn from your local data, so you can then ask questions about it using embedding and generative models.

And my favorite, which we're saving for last, is that you can build an entire Jupyter notebook from a single text prompt.

So before we get to the demos, I want to talk a little bit about the design principles that we used when we were building Jupyter AI.

We talked a lot about these when we were first building it. We still talk about these when we talk about adding new features or improving existing features.

The first one is that Jupyter AI is vendor neutral. Now, there are a lot of vendors out there who provide large language models. AWS is one of them, but it's not the only one. We use an open source library called LangChain, which works with many different model providers and many different models, and as new capabilities are added to LangChain, which happens very often, we can add those capabilities more easily to Jupyter AI.

And we make all of our features work as well as possible with as many vendors as possible. We put the user in control of whatever model they want to use.

Jupyter AI is transparent and traceable. There's a lot of concern out there about people using, or misusing, generative AI to mislead. We have tried to address that: when you generate a code cell using a magic command, we tag it in the cell metadata to indicate that it was generated using Jupyter AI. When you use the generate command in the chat interface to generate an entire notebook, that notebook indicates that it was generated using Jupyter AI. And we think that this is going to improve trust in the technology.

Jupyter AI is collaborative. When you have a Jupyter server running, you can have multiple people using the same server at the same time; you can even collaborate on notebooks in real time. The chat interface is designed so that it's not going to be just you talking with an intelligent agent. We can have multiple users in a shared chat session, communicating among themselves with an AI assistant.

Now, you're not going to see that today, because we're just going to be doing one-person demos. But that's our vision for Jupyter AI going forward.

Jupyter AI is exclusively user driven. You are in the driver's seat. When you use Jupyter AI, we are not passively scanning your code. We are not scanning your file system and sending it to a language model unless you explicitly ask it to. We want to make sure that your data stays in your control as much as possible when you use our software.

And Jupyter AI is human centered. Our boss, Brian Granger, is one of the co-founders of Project Jupyter, and he really wanted Jupyter AI to feel like software you've used before. So the chat interface looks, feels, and works like any chat program you've used. And as for the magic commands, if any of you have run magic commands in a Jupyter notebook before, the ones we created look and work just like the ones you've used before.

In short, Jupyter AI is middleware. Jupyter AI uses the open source LangChain library to communicate with whatever language model you select. You can choose different language models for each magic command. You can change language models in the chat interface.

From the user's perspective, that's pretty much all you have to do: just choose your language model and provide whatever API key is needed. Then Jupyter AI will handle all of the prompt engineering, all of the transmission of requests, and the interpretation of the results.
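As a minimal sketch of that setup, assuming a pip-based environment and an Anthropic key (the package name and environment variable here follow Jupyter AI's documentation, but check the current docs for your provider):

    # Install Jupyter AI once from a terminal:
    #   pip install jupyter-ai

    import os

    # Jupyter AI reads provider credentials from environment variables;
    # for Anthropic models that variable is ANTHROPIC_API_KEY.
    os.environ["ANTHROPIC_API_KEY"] = "sk-..."  # placeholder, not a real key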

Let's take a look at how an example Jupyter AI magic command works. This is an example of what's called a cell magic, because it starts with a double percent. Anything that starts with either percent AI or double percent AI is going to be handled by Jupyter AI.

In this case, we've chosen a model made by Anthropic - the Claude version 1.2 model. And because this is a cell magic, the prompt starts on line two, and it can go on for however long you want. In this case, we say: write a haiku about Jupyter. When we run this, Jupyter AI is going to parse it, send the appropriate request over to Anthropic's Claude 1.2 model, and we get a response back: code blocks interspersed with Markdown and raw output.
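For reference, that cell looks roughly like this in a notebook (the model ID claude-v1.2 is the one named in the talk; use whatever model your provider currently exposes):

    %%ai anthropic:claude-v1.2
    Write a haiku about Jupyter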

Jupyter notebook. I didn't hear any snaps on that, but that's OK. Claude doesn't mind.

The chat interface lives on the left side of JupyterLab, so you can make as many prompts as you want without modifying your notebook. You can ask it freeform questions, you can choose whatever model you want, and Jupyternaut, your intelligent assistant, will answer your questions.

The chat interface can also read from and write to your notebooks. It can even read from your local file storage and write notebooks into your file storage. This is where some of our advanced features you'll see later in the presentation live.

But first, let me hop over to a demo and show you how you can build up a notebook using Jupyter AI.

OK. So here we have a JupyterLab session that's running locally on our laptop. The first thing that we're going to do is load our extension, the Jupyter AI magics extension. Now, this is an extension that you can use with anything that's compatible with Jupyter notebooks. So you can run this on the command line. You can even run this through third-party editors.

You'll notice that all of the commands that we have start with either percent AI, which are what are called line magics, one-line commands, or double percent AI, which is what's called a cell magic, where everything in the cell starting from line two is part of the prompt.

So the first thing that we want to do is list all of the models that we have. Now, I'm going to collapse the left side panel, because AI list generates a lot of information. This is going to show me many, but not all, of the AI models that I can use with magic commands.

So when I run this, I get a very large table here, and it shows me, organized by provider, all the different models I can use. It will also tell me whether I have set the environment variables that many of these need, for example an API key.
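Putting those steps together, the commands look roughly like this (the optional provider argument to the list command is a convenience for narrowing the output; verify against your installed version):

    # Enable the %ai and %%ai magics in this kernel.
    %load_ext jupyter_ai_magics

    # List every provider and model Jupyter AI knows about...
    %ai list

    # ...or narrow the listing to a single provider.
    %ai list anthropic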

You can see here that Anthropic has a long list. Bedrock has a long list. Bedrock doesn't use an API key.

We also have what are called registry providers, like Hugging Face. I don't know if anyone here has used Hugging Face before, but Hugging Face has over 300,000 models in it, and we can't represent even 1% of those here. So instead we say: just go to this website, find a list of models, and here is how you reference one of those models.

So you can play around with these. As long as you have API keys and you're OK with their terms of service, you can use whatever model you want.
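As an illustration of that registry-provider pattern, a Hugging Face Hub model is referenced by prefixing its repo name with the provider ID, something like the sketch below; gpt2 is just a small example model, and the exact prefix may differ in your version of Jupyter AI:

    %%ai huggingface_hub:gpt2
    Write a one-line summary of Project Jupyter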

So if I jump down here, I can send this command over to Anthropic's Claude 2 model. I can say: generate a pandas DataFrame to depict airport delay times. And if you notice, I passed in an extra parameter here; this is called a format parameter. I want to get source code.

When I run this, I'm going to get an actual code cell that I can run in JupyterLab. So when I run this, it sends the prompt over to Anthropic's Claude version 2 model, and it's going to write code to generate a pandas DataFrame to depict airport delay times.
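That cell magic, with the format flag mentioned above, looks roughly like this (-f code asks Jupyter AI to emit a runnable code cell rather than Markdown; the model ID claude-2 follows the "Claude 2" name used in the talk):

    %%ai anthropic:claude-2 -f code
    Generate a pandas DataFrame to depict airport delay times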

And you see what it's generated here: a code listing with a little bit of extra Markdown there. So we have import pandas as pd. It's going to create a DataFrame. It has picked four airports, and it has picked an average delay for each one.

Now, I just flew into and out of Kennedy Airport last week, and I got lucky; I didn't have a 24-minute delay. I would not recommend relying on this for your future travel plans. I do not know where these numbers came from. This code was written by someone else. If a human being on your team wrote code and gave it to you, you'd probably want to do a code review before you committed it and put it into production. The same goes for AI-generated code. This is your coding assistant, but you can't blindly trust everything that it sends you. So I need the developers in the crowd here to review this code. If this code looks good, I need a "ship it." Do I hear a "ship it"? Just need one, folks. Ship it! Thank you. I got several "ship its." That means it must be high quality. I run it, and it runs successfully. It doesn't print anything, but that's OK; we didn't ask it to print anything. We're going to come back to this code in a little bit.

I also want to show you how you can use Jupyter AI to help correct errors in your code. Now, for the novice Python developers out there, you might not realize this is not valid Python code. We have a and b as two variables, and when we print a plus b, we get an error, a TypeError to be specific. It says: can only concatenate str (not "int") to str. And that may not be intuitive to everyone out there; I respect that. So we have this command called AI error. What this is going to do is send the entirety of the most recent error we got from running a cell, along with a prompt that says something to the effect of: could you please explain this error to me in plain English? And just like with all the other AI commands we have, we're going to pick a model. In this case, we're going to pick AI21's J2 Jumbo model. When I run that, I'm going to send that prompt along with the error to the J2 Jumbo model, and I'm going to get back, hopefully, a response in plain English.
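As a sketch, the failing cell and the follow-up error command would look something like this, run as two separate cells (the variable values are illustrative, and the model ID ai21:j2-jumbo-instruct is an assumption based on the J2 Jumbo model named in the talk):

    # Cell 1: intentionally broken code, mixing a str and an int.
    a = "1"
    b = 2
    print(a + b)  # TypeError: can only concatenate str (not "int") to str

    # Cell 2: ask Jupyter AI to explain the most recent error in plain English.
    %ai error ai21:j2-jumbo-instruct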

This error message indicates that the type of a is str, which is a string; on the other hand, the type of b is an integer, and when we try to combine a and b in the print function, that's not possible. And this is interesting; this is not something I've often seen in our rehearsals. We actually got two ways to solve this: the J2 Jumbo model came through with two possible fixes for that, which is really cool. We could copy and paste this code, after reviewing it, of course. But there's another way that you can rewrite the code, and that's using a trick called interpolation.

So this is a cell magic. We're getting code as the output, and we're going to use the Claude version 1.2 model: rewrite the following code so it doesn't have an error. Now, this in curly braces here is a Python expression; we say In[5]. In is a special list variable in IPython that holds the inputs of all the cells. If you notice, on the left of the cells we have little numbers; this is In[5], the source code that's the input to cell number 5. So when I run this, it's going to be interpolated into that prompt, and Claude version 1.2 sends back code. It looks like we already have a bit of consensus, because J2 Jumbo's first suggestion actually matches; it looks like exactly the same solution. So I need a "ship it" from a developer out there. Is this good code? Does this fix it? Ship it. We run it, and it runs successfully.
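That interpolation cell looks roughly like this (In[5] assumes the broken code happens to be the input of cell 5, as in this demo; the number will differ in your own notebook):

    %%ai anthropic:claude-v1.2 -f code
    Rewrite the following code so it does not have an error:
    {In[5]}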

Let's take a look at Jupyternaut. This is our chat interface, your programming assistant that uses whatever model you choose in its settings pane. So I can ask it a question in plain text. I can ask it, for example: what's the weather usually like during re:Invent? I send that, and it goes to the language model of choice. And it says: as an AI assistant, I don't have real-time data. But it knows that re:Invent is an AWS conference, it's here in Las Vegas, Nevada, and the average high temperature ranges from the mid-sixties to low seventies Fahrenheit, 18 to 23 °C, with a lower average low temperature. Now, I was out early this morning, and this seems a bit optimistic compared to today; I think we're a bit colder than normal today. However, it says it's important to note that weather patterns can vary, so you should check a reliable weather source. So that's useful information. It hedges a little bit, but you've seen demos like that before. This is Jupyter AI.

So this can read from and write to your Jupyter notebook. If I highlight this cell (remember, this is the code that we generated using a large language model), on the left we get a check box that says "include selection," so I can send it a prompt and say: what does this code do? When I hit enter, the highlighted code from the current cell is going to be inserted into that prompt, and it's going to send that to the language model of my choice. And it says: this code snippet is written in Python, it uses the pandas library, it imports it, it calls DataFrame, it has a couple of columns, and it has an average delay. So this is a way of explaining it in plain English. In fact, we've used a couple of language models to explain it in other languages. So if you speak French, for example, it can explain this in plain French.

If I click on this second check box, though, "replace selection," I'm allowing Jupyternaut to write into my notebook. So, for example, this gave me four airports: it looks like we got Chicago O'Hare, New York Kennedy, LAX, and SFO, which is where my team at AWS just flew in from. So if I have this "replace selection" box checked, I can say: rewrite this code to include delays for more airports. Now, my home airport is SeaTac Airport; that wasn't in there, and I did have a delay, so I would have liked to see it and any other airports that are omitted there. I'll add YYZ as well. All right, so now we're going to send that over. Now, everything that the language model sends back is going to be brought back into the cell, replacing the selection, including some of the explanatory text. With a little prompt engineering, I could say something like: don't provide any extra information. So now we see the code looks very similar to what it did before, but we have these three new airports. My delay was longer than 10 minutes, but it is an average, and it's also probably made up. So we have average delays for those. Do I have a "ship it" on this new code? Does this look good? Ship it? Thank you. I run it, and it runs successfully.

So, using Jupyter AI, we wrote a little bit of code, we explained some code, we debugged errors in code, and we even rewrote the code to include some extra information. But that's just the beginning of what you can do with Jupyter AI, and for more on this, I want to send it over to my co-presenter, Piyush.

In this case, the query "what is Jupyter AI" is sent to the embedding model, and it gets back the vector embedding for that query. It then sends that, along with the original query, to the vector database and asks the vector database: is there any relevant data, based on the learned data, the vector embeddings it already has for the learned documents?

Using the vector embedding for your query and the learned documents, it's able to provide back the relevant passages for that query. Jupyter AI is then able to combine your original query with the relevant data from your learned documents and send that to the language model.

Using these two pieces of data, the context from your documents and the relevant passages, along with the query, the language model is able to answer your query more accurately than trying to create something that it has no knowledge of. And once it gets a response, Jupyter AI sends that response back to the user.

So using this technique with learn and ask, you can connect documents, for example tutorials, docs, APIs, and research papers, to the large language model, documents that it has no knowledge of, and use that to ask questions about your local documents.
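As a rough, self-contained sketch of that learn-and-ask flow, with toy embeddings and an in-memory list standing in for the real embedding model and vector database that Jupyter AI uses:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Stand-in embedding: hash characters into a small fixed-size vector.
        # A real deployment calls an embedding model instead.
        vec = np.zeros(64)
        for i, ch in enumerate(text.lower()):
            vec[(i + ord(ch)) % 64] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    # "Learn": embed each document chunk and keep it alongside its text.
    chunks = [
        "Jupyter AI is an extension for JupyterLab.",
        "The /learn command indexes your local documents.",
        "Aliases shorten long provider and model names.",
    ]
    store = [(embed(c), c) for c in chunks]

    # "Ask": embed the query, retrieve the most similar chunk, and build a
    # prompt that combines the retrieved context with the original question.
    query = "What does the learn command do?"
    q = embed(query)
    context = max(store, key=lambda item: float(np.dot(item[0], q)))[1]
    prompt = f"Context: {context}\n\nQuestion: {query}"
    print(prompt)  # this combined prompt is what goes to the language model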

But my favorite feature of Jupyter AI is that you can use something called the generate command to create complete notebooks from just a single prompt. So let's look at how that actually works.

There's a command called generate available in the chat interface, and you can type the slash command /generate followed by the topic that you want to generate a notebook for. So, for example, here I'm trying to create a notebook on a sample topic. Once you submit that, Jupyter AI is going to give you a response like this, and it's going to go into the background and start working on creating the notebook.

Once it's done creating the notebook, it will give you a response like this. You can see that it has that long name for the file that it generated. This is also something that is generated by the large language model; it's not something that you have to input.

So let's look in a little bit more detail at how the generate command works in the background.

Once Jupyter AI receives the generate command and the query, it sends a task to the large language model to create an outline for the notebook. Once it receives the outline, it sends a series of tasks to the language model: for example, one of the tasks will be for generating the title, another for generating the summary for the notebook, and then it will send another set of tasks for creating the different sections for that notebook and the cell contents for those sections.

And all of this happens asynchronously; it's non-blocking. While Jupyter AI is doing this, you can still keep asking other questions within the chat UI, and you can keep using the magic commands in your notebooks; those will still work.
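Conceptually, that pipeline looks something like this toy sketch, where a stubbed model call stands in for the real LangChain-based calls (this is not the actual Jupyter AI implementation):

    import asyncio

    async def call_model(prompt: str) -> str:
        # Stand-in for a real LLM call; Jupyter AI delegates this to LangChain.
        await asyncio.sleep(0.1)
        return f"<generated for: {prompt!r}>"

    async def generate_notebook(topic: str) -> dict:
        # First task: ask the model for an outline of the notebook.
        outline = await call_model(f"Outline a notebook about {topic}")
        # Then fan out: title, summary, and sections are separate tasks run
        # concurrently, so the chat stays responsive while the notebook builds.
        title, summary, *sections = await asyncio.gather(
            call_model(f"Write a title for this outline: {outline}"),
            call_model(f"Write a summary for this outline: {outline}"),
            *(call_model(f"Write section {i} of this outline: {outline}")
              for i in range(3)),
        )
        return {"title": title, "summary": summary, "sections": sections}

    # Example: asyncio.run(generate_notebook("regular expressions"))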

Here is an example of an actual notebook that was generated by the generate command that we saw before. You can see the long title that matched the file name. Then there is a summary section here that explains what the notebook contains. And then there are these different sections and the cell contents that are generated by Jupyter AI; all of these are created by the model.

And so, with that, I'm going to jump to a demo and show these features. Let me switch over here. I'm going to close this notebook and clear the screen.

The first command that we're going to look at is the learn command. I have a folder called docs in my JupyterLab instance here, and this contains all the documentation for Jupyter AI. So I'm going to use that for my learn command.

I'm going to type /learn docs. This is going to submit a learn instruction to Jupyter AI to use the docs folder for learning. So I'm going to submit that.

What Jupyter AI is doing at this time is going into that directory, reading all the text files, chunking that data, sending it to an embedding model, getting the vector embeddings back, and then storing those along with the text chunks in the vector database.

And at this point, it's done, and it's saying it's ready to use the ask command. So once Jupyter AI has learned your documents like this, you can use the ask command.

So I'm going to use the ask command here. I'm not going to use the same question that we looked at; I'm going to use something new that we haven't looked at. I'm going to ask about aliases.

Aliases are another feature that is available in magic commands, so let's see if it's able to find that: how do aliases work in Jupyter AI magic commands? What it's doing at this time is that it knows it needs to get the relevant passages from the vector store.

It takes that query, gets the vector embedding for that query, sends those vector embeddings to the vector database, gets the relevant passages, combines those with the query, and sends that to the language model. The language model, looking at that context, is able to give you an accurate answer based on it.

And you can see that it's telling us that you can use the register command, the line magic register command, to register the provider and the model name, so that you don't have to repeat those long model and provider names.

For example, here it gives an example of registering an alias called claude for the Anthropic Claude v1.2 model, and then you can use that alias in your magic commands in the notebook.
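As a quick sketch, that alias workflow would look something like this, run as two separate cells (the alias name and the claude-v1.2 model ID follow the example in the answer; pick whatever names suit you):

    # Register a short alias for a long provider:model name.
    %ai register claude anthropic:claude-v1.2

    %%ai claude
    Explain what a JupyterLab extension is in one sentence.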

So this is a case where it's not trying to hallucinate; it's able to use the context from your learned documents and respond based on that context.

Let's look at another example. There are some advanced options available in the learn command, and I want to see if it's able to find that context.

What options are available in the learn command in Jupyter AI? Let's see if it's able to find that. I want to see what other options are available that you can use when you're using the learn command.

All right, so it did come back with something. There's a delete option where you can delete all the learned data. So if you want to restart, or you want to use a different folder, or you have updated a folder, you can use delete to remove all the learned data and then submit the learn command again.

There are also some advanced options: a chunk size and a chunk overlap. These let you specify how big of a chunk size and overlap you want, and this affects how the search works, so you can tweak the search based on these parameters.

So this is another example where it was able to use the context from that documentation, pass it to the language model, and give a more accurate response based on that.

So I'm going to clear that. The next thing we're going to look at is the generate command, so I'm going to type /generate. Before I type a topic in, does anybody have a topic that they're interested in that we should try here today? We can try any topic. Regular expressions? That's a good one. OK.

Let's see. All right, I'm going to say: generate a notebook to learn how to use regular expressions. OK. So right now it's going to go into the background and work on this, and like I said before, it's going to send a series of tasks to create the outline for the notebook first, and then it's going to send the next series of tasks to fill in the different parts of the outline.

Actually, it's already done. While it's doing that, you don't have to wait for the notebook to be done; if you have other questions that you want to ask, you can continue asking them, because this is all happening in a non-blocking way. But as I said, it's already done here.

You can see that it's telling me it has created a new file here called regular expressions learning notebook. So let's take a look at what that looks like. All right, I'm going to move this here, and you can see there's a title at the top.

There's an introduction section, and you can also see that it did indicate that this notebook was created by Jupyter AI, and it also included the generate command that we used to generate it. And then here is a summary at the bottom that tells what the notebook contains.

And then these are the various sections and the cell contents of each section. I'm not that familiar with regular expressions, so I hope that this is useful, but given my knowledge of regular expressions, this seems pretty detailed in terms of learning about them.

I'm definitely going to check it out after the demo, but I'm really impressed by what it has created so far. So you can use the generate command to create notebooks for topics that you want to learn, that you are trying to understand.

You can also use generate to get started with things that you want to share with peers, because once it generates the notebook, you can edit it, learn something new, and then share what you have learned with your peers.

I find this very useful for new things that I'm trying to learn. And so, with that, I'm going to jump back to the slides.

All right, so let's review what we looked at today. We saw that Jupyter AI provides magic commands, and you can use magic commands in the notebook to generate text, to generate code, to debug errors in your code, and also to rewrite and correct errors in the code.

It also provides a chat UI, which you can use as an assistant to ask questions of the language model. You can use the learn and ask commands to learn from your local documents, and then you can ask questions related to them.

And then finally, we looked at the generate command, where you can give just a single prompt and create a complete notebook. With those capabilities, we feel that Jupyter AI is really here not to replace the work that you do, but to help you in your day-to-day workflows and help you become a better builder.

So let's review Jupyter AI's design principles, which motivated us when we started this project.

Jupyter AI is vendor neutral. It supports a wide variety of model providers, including AWS and third-party providers, and their models.

It is transparent and traceable. All the notebooks that you generate and the cells you generate are clearly indicated as having been generated by AI.

It is collaborative. Our vision with the chat UI is to provide a chat interface not just between you and the assistant, but among all the users that are present on your shared instance.

It is exclusively user driven. That means that it is not passively scanning any source code or your file tree, and it will only read data when you explicitly ask it to.

It is human centered.

All the user interfaces that we have shown should be familiar to you if you have used a chat application before or if you have worked with IPython magic commands in the past.

I want to thank Brian Granger, who is our manager and a senior principal technologist at AWS, and who is also a co-founder of Project Jupyter. We have been fortunate to have his guidance on not just Jupyter AI but all the other open source projects that my team works on.

I want to give a big shout-out to my teammates, Andrii Ieroshenko and David Qiu. This project wouldn't have been possible without their contributions and hard work. And I also want to acknowledge all the open source community members who have contributed to the project or given feedback to make Jupyter AI a better project.

Jupyter AI is an open source project available at github.com/jupyterlab/jupyter-ai, where all the code, documentation, and installation instructions are available. You can also use it to report bugs or enhancement requests and find more information on how you can contribute to the project.

If you have any ideas or suggestions to make Jupyter AI better, feel free to open an issue on GitHub, or you can reach out to us after the presentation and we can help answer any queries regarding this project.

If you want to learn more about open source at AWS, please visit the developer solutions zone, where there will be AWS experts you can talk to and network with. Jason and I will also be available there to answer any questions about Jupyter AI.

Thank you so much for joining us. I hope you have a great rest of re:Invent 2023.
