Power accessible solutions with gen AI & eye gaze direction detection

Hello, everyone. Many organizations today find themselves considering how to optimize and modernize their existing application landscape to include the benefits and unique offerings of generative AI, with a view to transforming the user experiences they build for their own customers. But they may find the journey challenging: it can be difficult to embrace the new trends while also thinking about ethical considerations and the responsible use of the same technology.

My name is Amir Ravi. I work for Amazon Web Services as a Solutions Architect, and today I will showcase a rather unique use case: how to build and power accessible solutions with generative AI and eye gaze direction detection. Welcome to the presentation and thank you very much for being here.

According to the UK's Office for National Statistics, there were 11 million disabled people using the internet in 2020. This use case is about building a solution that helps disabled people interact with a generative AI enabled application simply by looking at the screen, using their eye gaze. That makes generative AI much more inclusive and, in doing so, helps disabled people access the benefits and unique offerings of the technology.

So today, I will showcase how we can build inclusive solutions on AWS using a fusion of traditional ML services and the newer generative AI services. I will show you a demo of the application, which hopefully will be very interesting to see. That will be followed by a deep dive into the architecture, so we can see how it all hangs together and how it works. And at the end, I shall conclude today's session by talking about how we can help and enable you to build your own accessible, inclusive solutions on AWS.

Now, this story has a rather personal resonance for me. Let me tell you about David - by the way, this is not David in the picture, it's just there to show a working environment. David was a friend and colleague; we used to work together some time ago. He was a disabled person, a wheelchair user with a speech impairment and limited mobility. He was a truly wonderful character, and one of his favorite things to do while we were in the office was to joke at my expense, especially when my football team was getting beaten, and that created a rather unique bond between the two of us.

On top of that, he was an excellent SQL developer, and he used voice assistive technology to help him write his SQL code. The technology did work pretty well and it enabled him to deliver his work, but I could also see that it made mistakes, it was error prone, it led to frustration, and it failed him on the inclusion and accessibility side of things. That made a big impact on me.

So I embarked on building a proof of concept application. Our inclusion and diversity ethos is embedded in the principle that inclusion must come first before we can have a fully diverse and equitable AWS and world. So when I found out that in May 2023 Amazon Rekognition released an upgrade that includes the ability to track the user's eye gaze in near real time, I immediately thought about David, and how he would appreciate a solution based on capturing a user's eye gaze and mapping it to the particular point or control on the screen they are currently looking at. That would enable the eye gaze to be the primary driver of the communication between the human and the computer application. That was one aspect of the solution.

Another aspect of the solution is to democratize generative AI and to make it more inclusive for disabled people, given the benefits that it can bring when used responsibly.

So far I've mentioned Amazon Rekognition a few times, so let me tell you about this service - what it does and how it works. It's a computer vision machine learning service that makes it easy to add video and image analysis to your applications. It can detect faces, objects, scenes, activities and inappropriate content. It can also derive emotional insights simply by analyzing facial expressions. And, what we are really interested in, it can deliver the eye direction attribute - and that's what we are going to use today.

In the demo, the eye direction attribute delivers two values: an x value and a y value. The x value represents the eye movement around the x axis, while the y value represents the eye movement around the y axis, and both values range from -180 to +180 degrees. The service also delivers a confidence score that ranges from 0 to 100, where a high score indicates the service is confident that the returned values are close to the true gaze direction.
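To make that concrete, here is a minimal sketch of how the eye direction attribute can be requested with the AWS SDK for Python (boto3). The helper name and the idea of passing raw webcam-frame bytes are my own illustration for this write-up, not the exact code from the demo.

```python
import boto3

rekognition = boto3.client("rekognition")

def get_eye_direction(image_bytes: bytes):
    """Return (x, y, confidence) for the first detected face, or None."""
    response = rekognition.detect_faces(
        Image={"Bytes": image_bytes},
        Attributes=["ALL"],  # requesting all facial attributes, including EyeDirection
    )
    faces = response.get("FaceDetails", [])
    if not faces:
        return None
    eye = faces[0]["EyeDirection"]
    # The SDK exposes the two angles as Yaw and Pitch (the x and y values
    # described above), plus a 0-100 Confidence score.
    return eye["Yaw"], eye["Pitch"], eye["Confidence"]
```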

So having now explained what the service does, it's about time to see the demo of the application.

[Demo]

Let's recap. We've got the screen that captures the video, and that video is dissected into a sequence of images. The images are sent to the back end, which generates the text; we display that text on the screen and then run the key insights extraction. And then we repeat that process. So let's see how this works.

This is the part where we use Amazon Rekognition to tell us the point on the screen that we are looking at. The web user sends the image to Amazon API Gateway, the service we use to build APIs at scale. API Gateway then calls AWS Lambda, and Lambda makes a call to Rekognition to give us the x and y values and the confidence score. That result is sent back through Lambda and API Gateway to the front end.
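As an illustration of how that Lambda might look behind API Gateway, here is a hedged sketch. The request and response shapes (a base64-encoded frame under an "image" key, and x/y/confidence in the reply) are assumptions for this example, not the demo's actual contract.

```python
import base64
import json
import boto3

rekognition = boto3.client("rekognition")

def lambda_handler(event, context):
    """Receive a webcam frame via API Gateway and return the gaze values."""
    # Assumes a standard API Gateway proxy integration where the front end
    # POSTs the frame as a base64 string under a hypothetical "image" key.
    body = json.loads(event["body"])
    image_bytes = base64.b64decode(body["image"])

    response = rekognition.detect_faces(
        Image={"Bytes": image_bytes},
        Attributes=["ALL"],
    )
    faces = response.get("FaceDetails", [])
    if not faces:
        return {"statusCode": 404, "body": json.dumps({"error": "no face detected"})}

    eye = faces[0]["EyeDirection"]
    return {
        "statusCode": 200,
        "body": json.dumps({
            "x": eye["Yaw"],
            "y": eye["Pitch"],
            "confidence": eye["Confidence"],
        }),
    }
```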

Once we have that, we move on to prompt creation. The prompt is created and we use the same line of execution: it goes to Amazon API Gateway, then to AWS Lambda, and Lambda speaks to the Falcon 7B endpoint on Amazon SageMaker. Falcon 7B is a large language foundation model that we host on a SageMaker endpoint. Once the text has been generated, it goes all the way back through Lambda and API Gateway to the front-end user.
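Calling a SageMaker-hosted model from Lambda is a single runtime API call. The sketch below assumes a hypothetical endpoint name and the common Hugging Face text-generation payload schema; the exact schema depends on how the model was deployed.

```python
import json
import boto3

sm_runtime = boto3.client("sagemaker-runtime")

# Hypothetical endpoint name; the real one depends on the deployment.
ENDPOINT_NAME = "falcon-7b-instruct-endpoint"

def generate_text(prompt: str) -> str:
    # Payload follows the widely used Hugging Face text-generation schema;
    # adjust it to match the container actually serving the model.
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 256, "temperature": 0.7},
    }
    response = sm_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    result = json.loads(response["Body"].read())
    # Deployments of this kind typically return a list with "generated_text".
    return result[0]["generated_text"]
```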

If you remember from the application, all those large boxes holding the key insights were created by a service called Amazon Comprehend. Amazon Comprehend is a natural language processing service that does many things, such as document analysis, topic modeling, key phrase extraction, and so on. It can also be used to determine the sentiment of your product user reviews, for example. In this instance, we give Amazon Comprehend the text generated by the large language model and ask it to give us the key insights from that.

It follows the same path: the API request goes to Amazon API Gateway, which passes it down to Lambda. Lambda calls Amazon Comprehend, and the extracted key insights are returned through Lambda and API Gateway to the web user.
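As a rough sketch, the "key insights" step can be implemented with Comprehend's key phrase detection. The helper name, the top-n cut-off and the sorting by score below are my own choices for illustration, not necessarily what the demo does.

```python
import boto3

comprehend = boto3.client("comprehend")

def extract_key_insights(generated_text: str, top_n: int = 6) -> list[str]:
    """Pull the highest-scoring key phrases out of the generated text."""
    response = comprehend.detect_key_phrases(
        Text=generated_text,
        LanguageCode="en",
    )
    # Each key phrase comes back with a confidence Score between 0 and 1.
    phrases = sorted(response["KeyPhrases"], key=lambda p: p["Score"], reverse=True)
    return [p["Text"] for p in phrases[:top_n]]
```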

And this is the complete architecture of the system. One of its main principles is simplicity - there isn't really that much going on in the back end. We see five AWS services: Amazon Rekognition, Amazon SageMaker, Amazon Comprehend, AWS Lambda (used in each scenario), and Amazon API Gateway.

If this application were developed further and delivered as a final product, it could of course be made much more sophisticated. But the main tenet of this demo is simplicity.

So how can this help you? To me, innovation happens when you think outside the box, when you're not afraid of trying different things, failing and then starting again, pushing boundaries. And I quite often find these ideas for minimal viable products when I'm in the gym exercising or when I'm outside running my 10k, because it really does help: it takes your mind off the fact that you are running, you're getting tired, you want to stop but you don't want to stop. Thinking about ideas really does help, because when you realize you've already done half an hour of running, it feels like you've not only been productive for your health but also been working on things you really like and enjoy.

So when I heard that Amazon Rekognition had released this upgrade and made it available to us engineers, developers and architects to capture the user's eye gaze in near real time, I had a vision of an application with a large number of boxes on the screen, where each box represents a conversational topic that drives the conversation between the front end and the generative AI enabled application purely through the user's eye gaze. That was rather compelling to me, because the use case was quite unique: it was different from your everyday typical chatbot application; as a fusion of different ML and AI services it delivered a more visual set of features; and it made generative AI more inclusive for disabled people, giving them access to this technology and making it easier for them to use.

And how can this be of help to you? If you, or the organization you work for, are currently undertaking similar projects building inclusive solutions, or if you're at the point of thinking about starting this journey, I would be happy to share the experiences and challenges I encountered while building the proof of concept you've seen.

The back end of this demo application is fairly simple; there isn't that much going on. Most of the challenges I encountered were on the front end, trying to map the values I get from the back end and make them work on the screen. You should have seen the first and second versions of this - they didn't look anything like it. That red circle on the screen was quite manic, it was all over the place. At some point I thought to myself, should I just drop this or should I continue? And I'm glad I didn't choose the first option and went with the second.
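For anyone attempting something similar, here is a rough illustration of the kind of mapping and smoothing that tames that "manic" cursor. It is written in Python for readability (the actual front end would be browser JavaScript), and the usable gaze range, sign conventions and smoothing factor are all assumptions, not values taken from the demo.

```python
# Map the eye direction angles to a screen position, then smooth the result
# with an exponential moving average so the gaze cursor stops jumping around.

SCREEN_W, SCREEN_H = 1920, 1080
MAX_ANGLE = 30.0   # assumed degrees of gaze that span the full screen
ALPHA = 0.2        # smoothing factor: lower = smoother but laggier cursor

def clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))

def angles_to_screen(x_angle: float, y_angle: float) -> tuple[float, float]:
    """Map gaze angles to pixel coordinates, with a (0, 0) gaze at screen centre."""
    px = (clamp(x_angle, -MAX_ANGLE, MAX_ANGLE) / MAX_ANGLE + 1) / 2 * SCREEN_W
    py = (1 - (clamp(y_angle, -MAX_ANGLE, MAX_ANGLE) / MAX_ANGLE + 1) / 2) * SCREEN_H
    return px, py

def smooth(prev: tuple[float, float], new: tuple[float, float]) -> tuple[float, float]:
    """Blend the new estimate with the previous one to damp jitter."""
    return (
        ALPHA * new[0] + (1 - ALPHA) * prev[0],
        ALPHA * new[1] + (1 - ALPHA) * prev[1],
    )
```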

So I will be happy to speak to you about that if you are thinking about building a similar project. On the next slide you will see my LinkedIn details, so please feel free to contact me there and I'd be happy to help you out. I also hope to get an AWS blog post written on this subject at some point before re:Invent, so if you follow me on LinkedIn you will be notified and we can certainly stay in touch.

That brings us to the end of this presentation. I feel really privileged and happy to be able to stand in front of you here today, talk about a subject that means a lot to me, and show you the work I've been busy with over the last few months.

Thank you very much for being here. I hope you found this worthwhile and I wish you good luck on building your own inclusive solutions.
