
by Dan Ruta

How you can use AI, AR, and WebGL shaders to assist the visually impaired

Today, about 4% of the world’s population is visually impaired. Tasks as simple as navigating across a room or walking down a street pose real dangers they have to face every day. Current technology-based solutions are too inaccessible, or too difficult to use.

As part of a university assignment, we (myself, Louis, and Tom) devised and implemented a new solution. We used configurable WebGL shaders to augment a video feed of a user’s surroundings in real time. We rendered the output in an AR/VR format, with effects such as edge detection and color adjustments. Later, we also added color blindness simulation for designers to use, along with some AI experiments.

We did a more in-depth literature review in our original research paper. ACM published a shorter, two-page version here. This article focuses on the technologies used, as well as some further uses and experiments, such as AI integration.

A popular approach we found in our studies of existing solutions was the use of edge detection for detecting obstacles in the environment. However, most solutions fell short in terms of usability, hardware accessibility, or portability.

The most intuitive way we could think of to give feedback to the user was through a VR headset. While this meant that the system would not help the most severely visually impaired people, it would be a much more intuitive system for those with partial sight, especially those with blurry vision.

Edge detection

Feature detection, such as edge detection, is best done using 2D convolutions, which are even used in deep learning (convolutional neural networks). Simply put, a convolution is the dot product of a grid of image data (pixels) against the weights in a kernel/filter. In edge detection, the output is higher (more white) where the pixel values line up with the filter values, indicating an edge.

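The per-pixel dot product described above can be sketched in plain JavaScript (a minimal illustration, not the shader code we shipped; the image is assumed to be a 2D array of grayscale values):

```javascript
// Apply a single 3x3 kernel at position (x, y) of a grayscale image.
// Returns the dot product of the 3x3 neighborhood against the kernel weights.
function convolveAt(image, x, y, kernel) {
  let sum = 0;
  for (let ky = -1; ky <= 1; ky++) {
    for (let kx = -1; kx <= 1; kx++) {
      sum += image[y + ky][x + kx] * kernel[ky + 1][kx + 1];
    }
  }
  return sum;
}

// A vertical edge: dark pixels (0) on the left, bright pixels (1) on the right.
const image = [
  [0, 0, 1],
  [0, 0, 1],
  [0, 0, 1],
];

// Horizontal Sobel kernel, which responds strongly to vertical edges.
const sobelX = [
  [-1, 0, 1],
  [-2, 0, 2],
  [-1, 0, 1],
];

console.log(convolveAt(image, 1, 1, sobelX)); // strong response: 4
```

On a flat region (all pixels the same value), the positive and negative weights cancel out and the response is zero, which is why only edges light up in the output.
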
There are a few available options for edge detection filters. The ones we included as configurations are Frei-Chen, and the 3x3 and 5x5 variants of Sobel. They each achieve the same goal, but with slight differences. For example, the 3x3 Sobel filter was sharper than the 5x5 filter, but picked up more noise from textures such as fabric.

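For reference, the 3x3 Sobel variant actually uses a pair of kernels, one per axis; the gradient responses gx and gy come from convolving each kernel over a pixel's neighborhood, and the final edge strength is their magnitude. A sketch of the kernels and the combining step:

```javascript
// 3x3 Sobel kernels for the horizontal (Gx) and vertical (Gy) gradients.
const sobelGx = [
  [-1, 0, 1],
  [-2, 0, 2],
  [-1, 0, 1],
];
const sobelGy = [
  [-1, -2, -1],
  [ 0,  0,  0],
  [ 1,  2,  1],
];

// Combine the two per-pixel gradient responses into a single edge strength.
function edgeStrength(gx, gy) {
  return Math.sqrt(gx * gx + gy * gy);
}

console.log(edgeStrength(3, 4)); // 5
```
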
The web platform

The primary reason we chose the web as a platform was its wide availability and compatibility across almost all mobile devices. It also benefits from easier access compared to native apps. However, this trade-off came with a few issues, mostly in terms of the set-up steps a user would need to take:

  1. Ensure network connectivity

  2. Navigate to the web page

  3. Turn the device to landscape mode

  4. Configure the effect

  5. Enable VR mode

  6. Activate full screen mode (by tapping the screen)

  7. Slot the phone into a VR headset


To avoid confusing non-technical users, we created the website as a PWA (progressive web app), allowing them to save it to their Android home screen. This ensures that it always starts on the correct page, landscape mode is forced on, the app is always full screen, and it is not reliant on a network connection.

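Most of these guarantees come from the web app manifest. A minimal sketch of what such a manifest could look like (the field values here are illustrative, not our exact configuration):

```json
{
  "name": "WebSight",
  "short_name": "WebSight",
  "start_url": "/index.html",
  "display": "fullscreen",
  "orientation": "landscape",
  "background_color": "#000000",
  "theme_color": "#000000"
}
```

Offline use additionally requires a service worker to cache the page's assets, which the browser also checks for before offering to install the PWA.
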
Performance

Early JavaScript prototypes ran nowhere near our 60fps target, due to the very expensive convolution operations. We suspected that the bottleneck was JavaScript itself, so we attempted a WebAssembly version. The resulting prototype ran even slower, most likely due to the overhead of passing the video frame data to the WebAssembly code and back.

So instead, we turned to WebGL shaders. Shaders are awesome because of their extreme parallelization of a small bit of code (the shader) across the texture (video feed) pixels. To maintain high performance while keeping a high level of customization, the shader code had to be spliced together and re-compiled at run-time as configurations changed. With this, we managed to stay within the 16.7ms frame budget needed for 60fps.

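The splicing itself can be as simple as string concatenation: keep each effect as a GLSL snippet, join only the enabled ones into the fragment shader source, and hand the result to `gl.compileShader` whenever the configuration changes. A simplified sketch (the snippet names and config shape are illustrative, not our exact code):

```javascript
// GLSL snippets for each toggleable effect. Each one transforms `color` in place.
const snippets = {
  invert: "color.rgb = 1.0 - color.rgb;",
  edges: "color.rgb = vec3(sobel(uTexture, vUv));", // assumes a sobel() helper is defined above main()
};

// Build the fragment shader source from only the enabled effects.
function buildFragmentShader(config) {
  const body = Object.keys(snippets)
    .filter((name) => config[name])
    .map((name) => snippets[name])
    .join("\n  ");
  return [
    "precision mediump float;",
    "uniform sampler2D uTexture;",
    "varying vec2 vUv;",
    "void main() {",
    "  vec4 color = texture2D(uTexture, vUv);",
    "  " + body,
    "  gl_FragColor = color;",
    "}",
  ].join("\n");
}

const source = buildFragmentShader({ invert: true, edges: false });
// In the real app, this source would then be compiled with
// gl.createShader(gl.FRAGMENT_SHADER), gl.shaderSource, and gl.compileShader.
```

Because disabled effects are excluded from the source entirely, rather than branched over at run-time, the compiled shader does no wasted work per pixel.
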
Feedback

We carried out some user testing. We tested some basic tasks like navigation, and collected some qualitative feedback. This included adjustments to the UI, a suggestion to add an option to configure the colors of the edges and surfaces, and a remark that the field of view (FoV) was too low.

Both software improvement suggestions were applied. The FoV was not something that could be fixed through software, due to camera hardware limitations. However, we managed to find a solution in the form of cheaply available phone-camera fish-eye lenses, which expanded the FoV optically instead of digitally.

Other than that, the system surpassed initial expectations, but fell short on reading text, as each character produced two sets of edges. Low-light performance was also usable, despite the introduction of more noise.

Some other configurations we included were the radius of the effect, its intensity, and color inversion.

Other use cases

An idea we had was to add shader effects to simulate various types of color blindness, providing an easy way for designers to detect color blindness related accessibility issues in their products, be they software or otherwise.

Using RGB ratio values found here, and turning off edge detection, we were able to add basic simulations of all major types of color blindness through extra, toggle-able components in the shaders.

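The simulation itself is just a per-pixel 3x3 matrix multiply over the RGB channels. A sketch in JavaScript, using one commonly circulated approximation for protanopia (treat this matrix as illustrative; the exact ratios we used came from the linked source):

```javascript
// Approximate protanopia simulation matrix (rows produce output R, G, B).
const protanopia = [
  [0.567, 0.433, 0.0],
  [0.558, 0.442, 0.0],
  [0.0,   0.242, 0.758],
];

// Apply a 3x3 color matrix to an [r, g, b] pixel (channels in the 0..1 range).
function applyColorMatrix(matrix, [r, g, b]) {
  return matrix.map(([mr, mg, mb]) => mr * r + mg * g + mb * b);
}

// Pure red collapses toward a dim yellow-brown under protanopia.
console.log(applyColorMatrix(protanopia, [1, 0, 0]));
```

In the shader, this is a one-line `mat3` multiply on the sampled color, which is why the simulations could be added as cheap, toggle-able components.
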
AI and future work

Although it’s an experiment still in its very early stages, higher-level object detection can be done using tensorflowjs and tfjs-yolo-tiny, a tensorflowjs port of tiny-yolo, which is a smaller and faster version of the YOLO object detection model.

The next step is to get instance segmentation working in a browser, with something similar to Mask R-CNN (though it may need to be smaller, like tiny-yolo), and add it to WebSight, to highlight items with a color mask instead of boxes with labels.

The GitHub repo is here, and a live demo can be found at https://websight.danruta.co.uk. Do note that until Apple provides support for the camera API in browsers, it might not work on Apple phones.

Of course, I had some extra fun with this as well. Being able to edit what you can see around you in real time opens up a world of opportunities.

For example, using a Matrix shader, you can feel like The One.

Or maybe you just enjoy watching the world burn.

You can tweet more shader ideas at me here: @DanRuta

Originally published at: https://www.freecodecamp.org/news/how-you-can-use-ai-ar-and-webgl-shaders-to-assist-the-visually-impaired-3df5bdf3b3e2/
