Plug and play with AI sign language recognition

Alright, please join me in welcoming our next speakers, Nicholas and Brian, to the stage.

Are you all feeling lost? That is how it is for me as a deaf and hard of hearing person using technology, products, and services without adequate accessibility. ASL is my primary language. My first language is not English. ASL has a different grammatical structure than English, and there are different methods that I use to communicate daily.

My first method is texting, which leads to frustration and is prone to error. The second option is a TTY, a teletypewriter device; however, nowadays not many deaf and hard of hearing people use TTY. The third option is a sign language interpreter, who provides accessibility. However, there is a global interpreter shortage, and most places do not provide interpreters either.

This is where innovation comes in. This is where SignSpeak is innovating, so that ASL and spoken English work in parallel as equivalents.

Awesome. Thank you Brian.

Brian's story is not unique. Across America there are 50 million deaf and hard of hearing people, and they represent $90 billion in purchasing power every year. It makes sense to provide them accessibility not only because it's the right thing to do and because they have huge economic power, but also because it is mandated by regulation. It is required under many different regulations, such as the Americans with Disabilities Act and the Fair Housing Act, among many others.

In fact, the FCC recently released a rule change that says all video conferencing platforms, anything with a video system where two people are communicating, even things like Apple FaceTime or Snapchat live video, must now follow the CVAA. They must be accessible to deaf individuals: you have to transcribe, you have to interpret where necessary, and you have to provide that.

Now, SignSpeak allows you to get ahead of these regulatory shifts and provide the best accessibility to deaf and hard of hearing individuals. We do this through a bidirectional solution. In the first direction, a deaf individual can sign into the system, and the system will recognize what they are signing and convert it to text. That text can then be voiced out loud for a hearing individual, presented as text, or used as input for some other system. For example, think about systems such as Google Home or Amazon Echo.

We as hearing people can buy them and talk into them all day, and they'll respond back. But if a deaf person walks up and tries to sign into one, nothing is going to happen. That's where functional equivalency comes in: we can capture what they're signing, convert it to text, and provide it as input into these systems to allow for functional equivalency.
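To make that idea concrete, here is a minimal conceptual sketch of routing recognized sign language into any system that already accepts text commands. The recognition step is a placeholder, and the interface names are made up for illustration; they are not SignSpeak's actual API.

```typescript
// Conceptual sketch of functional equivalency: recognized signing becomes text,
// and the downstream system treats it like any other text input.
// The recognize() function and TextCommandSystem interface are hypothetical.
interface TextCommandSystem {
  handleText(input: string): Promise<string>; // e.g. a voice assistant's text interface
}

async function signToCommand(
  recognize: (video: Blob) => Promise<string>, // sign-to-text step (placeholder)
  video: Blob,
  assistant: TextCommandSystem,
): Promise<string> {
  const text = await recognize(video); // e.g. "open Netflix", "what's the weather"
  return assistant.handleText(text);   // the downstream system never knows it was signed
}
```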

On the other side, we can take text or voice and convert that back into American Sign Language. This is important because, as Brian mentioned, American Sign Language is a fundamentally different language than English. The grammatical structure is different. It's almost like English and Mandarin. For example, my first language is English and my second language is Mandarin. If you give me a piece of text in Mandarin, I can try to read it and maybe get some of it. But realistically, I use English every day; that's my preferred language. If you give it to me in English, I will understand it immediately.

That is why it is important to provide accessibility across the range, because there are many deaf and hard of hearing people whose first language is American Sign Language, and that is the language in which they are most fluent.

To show you an example of this, here is a video of Brian. I'm sitting off screen because I'm a little bit camera shy, so ignore that. He's going to sign into the system, and as he signs, it's going to interpret that into textual input.

"Hello, how are you?" That was voiced out through my computer, so then I can respond, and my response is then rendered, maybe into a deepfake avatar of someone actually signing it. And this conversation goes back and forth.

Now, you may be wondering why we present both the text and the sign language. The reason is that everyone's accessibility needs are unique. Everyone has their own preferred modality, and we strongly believe in providing optionality and choice. Some deaf people prefer to look at the captions, other deaf people prefer to look at the sign language, and still others rely on a combination of both. This allows full accessibility to everyone who is deaf, hard of hearing, or late-deafened; whatever the case, this can provide accessibility.

And I'm glad to announce that our API is now in limited availability. We are currently working with different developers to integrate this into innovative applications to provide more accessibility throughout their products, and we're planning on releasing it for general availability in Q1 of 2024. To show you how easy this is, I'm going to show you the way that I call our API in this system.

Now, understand that we also have SDKs; we have a React SDK, among others. However, I want to peel back the layers, because one line of React is not really going to show you what's going on. It's quite simple: all you need to do is call our endpoint and provide the video.
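As a rough illustration of what that call could look like, here is a minimal TypeScript sketch. The endpoint URL, field names, and response shape are assumptions for illustration; they are not the documented SignSpeak API.

```typescript
// Hypothetical sketch: send a pre-captured video to a sign-to-text endpoint.
// URL, form field names, and response shape are illustrative assumptions only.
async function recognizeSign(video: Blob, apiKey: string): Promise<string> {
  const form = new FormData();
  form.append("video", video, "signing.mp4"); // pre-captured clip of signing

  const res = await fetch("https://api.example-signspeak.com/v1/recognize", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Recognition failed: ${res.status}`);

  const { text } = await res.json(); // e.g. { text: "Hello, how are you?" }
  return text;
}
```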

Now, there are many customizable options, because everyone's use case is unique. For example, you can stream video in and get streamed output back through WebSockets, you can pre-capture an MP4 file and provide it, and you can also fine-tune the model so that it understands the context of the area it's operating in.
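For the streaming path, a sketch of what that could look like in the browser is below, assuming a WebSocket endpoint that accepts video chunks and emits partial transcripts. The URL and message schema are hypothetical.

```typescript
// Hypothetical sketch of streaming recognition over WebSockets.
// The endpoint and message format are assumptions for illustration only.
function streamSigning(onPartialText: (text: string) => void): WebSocket {
  const ws = new WebSocket("wss://api.example-signspeak.com/v1/recognize/stream");

  ws.onopen = async () => {
    // Capture the camera and forward encoded chunks as they become available.
    const media = await navigator.mediaDevices.getUserMedia({ video: true });
    const recorder = new MediaRecorder(media, { mimeType: "video/webm" });
    recorder.ondataavailable = (e) => {
      if (e.data.size > 0 && ws.readyState === WebSocket.OPEN) ws.send(e.data);
    };
    recorder.start(250); // emit a chunk roughly every 250 ms
  };

  // Each server message is assumed to carry a partial or final transcript.
  ws.onmessage = (event) => {
    const { text } = JSON.parse(event.data);
    onPartialText(text);
  };

  return ws;
}
```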

Then, as you can see, you just get the result back asynchronously; you can put it into whatever state you want and then surface it. Similarly, for the sign language avatar, you can hit our endpoint and provide text, and it will give you back a video. Again, you can stream the video back if you prefer; I chose not to here because that would double the lines of code, and I don't want a giant block of code there. But really, it's quite simple.
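A sketch of that reverse direction, text in and signed-avatar video out, might look like the following. Again, the endpoint, parameter names, and response shape are assumptions, not the documented API.

```typescript
// Hypothetical sketch of the reverse direction: text in, signed-avatar video out.
// Endpoint, parameters, and response shape are illustrative assumptions.
async function renderAvatar(text: string, apiKey: string): Promise<string> {
  const res = await fetch("https://api.example-signspeak.com/v1/avatar", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // The talk mentions customizable appearance and signing style;
    // these option names are made up for illustration.
    body: JSON.stringify({ text, avatar: "default", signingStyle: "casual" }),
  });
  if (!res.ok) throw new Error(`Avatar render failed: ${res.status}`);

  const { videoUrl } = await res.json(); // assumed: a URL to the rendered video
  return videoUrl; // e.g. set as the src of a <video> element
}
```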

You can also customize the appearance and the style of signing. Some of our partners really care about who that avatar is; they want a particular person, and we allow them to do that because that is their brand. Other people don't have a strong preference, so we use one of our own. With this, we've built different products, and we've seen it integrated in interesting ways.

For example, one of my favorites is the video calling application that we showed you earlier. Another thing we're working on is integrating this into a closed captioning system, so that deaf and hard of hearing people can just sign in a video and it's automatically captioned, and an AI voice can automatically voice for them in videos.

We have partners exploring other interesting concepts. One of our partners is in the smart TV space. With a smart TV, I can speak into my remote and say "open Netflix." But by the same logic of functional equivalency, a deaf individual can't sign into their remote. So what they're working on is using our API in their application to allow a deaf individual to sign "open Netflix" and have Netflix open. And we've seen that this really improves product stickiness among this large population.

I also want to mention that we strongly believe in doing this right. That's why we consistently run user acceptability testing, constantly audit our system, and make sure it is responsibly deployed, so that this is by the community and for the community. Most of the money we receive goes back into the community through data collection, and we always bring controversial topics to the community to let them decide how this technology is used.

So with that, if you want access to our API, scan this; you can schedule a meeting with our CEO, Yami. She really wanted to be up here on stage, but unfortunately someone needed to staff the booth, so now she lives there. That's her home now. So please visit her.

And then also, we are currently hiring. So if you like this, you think it's cool, you like the mission, the technology, or you like us, feel free to scan this and see if there are any matches. If you don't think any of the job descriptions match, feel free to just send us your resume anyway; we're a startup, we don't do things quite by the book. So that's what these QR codes are for. And with that —

Thank you, guys. I'm Nicholas, this is Brian, and our third co-founder is Yami; she's over there. Thank you all very much. Are there any questions?
