Automating Online Proctoring Using AI


With the advent of COVID-19, remote learning has blossomed. Schools and universities may have been shut down, but they switched to applications like Microsoft Teams to finish their academic years. However, there has been no good solution for examinations. Some institutions changed them to an assignment format, where students can simply copy and paste from the internet, while others canceled them outright. If the way we are living now is to be the new norm, there needs to be a solution.


ETS, which conducts the TOEFL and GRE among others, is allowing students to take exams from home, where they are monitored by a proctor for the whole duration of the exam. Implementing this scheme at a large scale is not plausible because of the workforce required. So let's create an AI in Python that can monitor students through the webcam and the laptop microphone themselves, and enable teachers to monitor multiple students at once. The entire code can be found in my GitHub repo.


The AI will have four vision-based capabilities which are combined using multithreading so that they can work together:


  1. Gaze tracking
  2. Mouth open or close
  3. Person counting
  4. Mobile phone detection

Apart from this, the speech from the microphone will be recorded, converted to text, and compared with the text of the question paper to report the number of common words spoken by the test-taker.


Requirements

  • OpenCV
  • Dlib
  • TensorFlow
  • Speech_recognition
  • PyAudio
  • NLTK

Vision-Based Techniques

Gaze Tracking

Photo by S N Pattenden on Unsplash

We shall aim to track the eyeballs of the test-taker and report if they look to the left, right, or up, which they might do to glance at a notebook or signal to someone. This can be done using Dlib's facial keypoint detector and OpenCV for further image processing. I have already written an article on real-time eye tracking that explains in detail the methods that will be used here.

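As a rough sketch of the approach (the threshold value and the use of image moments for the pupil centroid are my choices here, not necessarily the exact ones from the linked article):

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Assumes the standard 68-point landmark model file is present locally.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

LEFT_EYE = range(36, 42)   # landmark indices of the left eye
RIGHT_EYE = range(42, 48)  # landmark indices of the right eye

def pupil_centroid(gray, landmarks, indices):
    """Threshold the eye region and return the pupil centroid (x, y)."""
    pts = np.array([(landmarks.part(i).x, landmarks.part(i).y)
                    for i in indices], dtype=np.int32)
    x, y, w, h = cv2.boundingRect(pts)
    eye = gray[y:y + h, x:x + w]
    # The dark pupil survives an inverted binary threshold.
    _, mask = cv2.threshold(eye, 55, 255, cv2.THRESH_BINARY_INV)
    m = cv2.moments(mask)
    if m["m00"] == 0:
        return None
    return x + int(m["m10"] / m["m00"]), y + int(m["m01"] / m["m00"])

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for face in detector(gray):
        landmarks = predictor(gray, face)
        left = pupil_centroid(gray, landmarks, LEFT_EYE)
        right = pupil_centroid(gray, landmarks, RIGHT_EYE)
        # Comparing each centroid with the eye-corner landmarks tells us
        # whether the test-taker is looking left, right, or up.
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```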

Mouth Detection

Mouth Tracking Results

This is very similar to eye detection. Dlib's facial keypoints are again used for this task. The test-taker is required to sit straight (as they would in the test), and the distances between the lip keypoints (5 outer pairs and 3 inner pairs) are recorded for 100 frames and averaged.


If the user opens their mouth, the distances between the points increase; if the increase is more than a certain value for at least three outer pairs and two inner pairs, an infringement is reported.

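A hedged sketch of that check, reusing the `landmarks` object from the snippet above (the exact keypoint pairings and the pixel margin are my assumptions; dlib's 68-point model numbers the outer lip points 48–59 and the inner ones 60–67):

```python
import numpy as np

# Upper/lower lip keypoint pairs: 5 outer and 3 inner.
OUTER_PAIRS = [(49, 59), (50, 58), (51, 57), (52, 56), (53, 55)]
INNER_PAIRS = [(61, 67), (62, 66), (63, 65)]

def lip_distances(landmarks):
    """Vertical gap between each upper/lower lip keypoint pair."""
    return np.array([abs(landmarks.part(a).y - landmarks.part(b).y)
                     for a, b in OUTER_PAIRS + INNER_PAIRS])

# Calibration: average the distances over the first 100 frames, with the
# test-taker sitting straight and the mouth closed, e.g.:
# baseline = np.mean([lip_distances(lm) for lm in calibration_landmarks], axis=0)

def mouth_open(landmarks, baseline, margin=4):
    """Flag when >= 3 outer and >= 2 inner pairs widen past the baseline."""
    diff = lip_distances(landmarks) - baseline
    return np.sum(diff[:5] > margin) >= 3 and np.sum(diff[5:] > margin) >= 2
```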

Person Counting and Mobile Phone Detection

I used the pre-trained weights of YOLOv3 trained on the COCO dataset to detect people and mobile phones in the webcam feed. For an in-depth explanation of how to use YOLOv3 in TensorFlow 2 and perform people counting, you can refer to this article.


If the person count is not equal to one, an alarm can be raised. The index of mobile phones in the COCO dataset is 67, so we check whether any detected class index equals 67 and report a mobile phone as well.

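In sketch form, only the class-checking logic is shown here; `detected_classes` stands in for the class indices returned by whichever YOLOv3 port is loaded:

```python
PERSON = 0   # "person" in the 0-indexed COCO label list
PHONE = 67   # "cell phone" in the 0-indexed COCO label list

def check_detections(detected_classes):
    """Raise alerts from the class indices detected in one frame."""
    people = sum(1 for c in detected_classes if c == PERSON)
    if people != 1:
        print(f"Alert: {people} people in frame")
    if PHONE in detected_classes:
        print("Alert: mobile phone detected")
```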

Combining Using Multithreading

Let's dive into the code now. As eye tracking and mouth detection are based on dlib, we can create a single thread for them, and another thread can be used for the YOLOv3 tasks: people counting and mobile phone detection.


First, we import all the necessary libraries along with the helper functions. Then the dlib and YOLO models are loaded. In the eyes_mouth() function, we find the facial keypoints and work on them. For mouth detection, the baseline distances between the outer and inner points are already defined, and we calculate the current ones. If enough of them exceed the predefined ones, the proctor is notified. For the eyes, we find their centroids as shown in the linked article and then check which facial keypoints they are closest to. If both centroids are off to a side, it is reported accordingly.


In the count_people_and_phone() function, YOLOv3 is applied to the webcam feed. The classes of the detected objects are then checked, and appropriate action is taken if more than one person or a mobile phone is detected.


These functions are run in separate threads and contain infinite loops, which the proctor can break by pressing 'q' twice.

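As a minimal sketch of the thread layout, assuming eyes_mouth() and count_people_and_phone() are defined as described above:

```python
import threading

# Each target runs its own infinite loop until the proctor quits.
t1 = threading.Thread(target=eyes_mouth)
t2 = threading.Thread(target=count_people_and_phone)
t1.start()
t2.start()
t1.join()
t2.join()
```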

Audio

Photo by Ethan McArthur on Unsplash

The idea is to record audio from the microphone and convert it to text using Google's speech recognition API. The API needs a complete voice clip, which is not plausible from a continuously open microphone, so the audio is recorded in chunks; this also means there is no heavy storage requirement (a ten-second WAV file was about 1.5 MB, so a three-hour exam would otherwise take roughly 1.6 GB). A separate thread is used to call the API so that the recording can continue without interruptions; the API processes the last chunk stored, appends its text to a text file, and then deletes the chunk to save space.

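A minimal sketch of the chunked recording with PyAudio (the sample rate, buffer size, and chunk length here are assumptions, not necessarily the repo's values):

```python
import wave
import pyaudio

RATE = 44100           # sample rate (an assumption)
FRAMES = 1024          # frames read per buffer
CHUNK_SECONDS = 10     # length of each recorded chunk

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                input=True, frames_per_buffer=FRAMES)

def record_chunk(path):
    """Record one chunk from the open microphone stream into a WAV file."""
    data = [stream.read(FRAMES)
            for _ in range(int(RATE / FRAMES * CHUNK_SECONDS))]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
        wf.setframerate(RATE)
        wf.writeframes(b"".join(data))
```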

After that, using NLTK, we remove the stopwords from the text. The question paper (in text format) is processed the same way, its stopwords removed, and the two contents are compared. We assume that if someone wants to cheat, they will speak something from the question paper. Finally, the common words along with their frequencies are presented to the proctor, who can also look at the text file containing everything the candidate said during the exam.


Until line 85 in the code, we are continuously recording, converting, and storing text data in a file. The function read_audio(), as its name suggests, records audio using a stream passed to it by stream_audio(). The function convert() uses the API to convert the audio to text and appends it, along with a blank space, to a file test.txt. This part runs for the entire duration of the examination.

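A sketch of the convert step with the speech_recognition package (the chunk is assumed to be saved as a WAV file, and test.txt matches the file named above):

```python
import os
import speech_recognition as sr

recognizer = sr.Recognizer()

def convert(path):
    """Convert one recorded chunk to text, append it, then delete the chunk."""
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    try:
        text = recognizer.recognize_google(audio)  # Google's free speech API
        with open("test.txt", "a") as f:
            f.write(text + " ")   # blank space between chunks
    except sr.UnknownValueError:
        pass  # the chunk contained no intelligible speech
    os.remove(path)  # delete the processed chunk to save space
```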

After this, using NLTK, we convert the stored text to tokens and remove the stop-words. The same is done for the text file of the question paper, and then the common words are found and reported to the proctor.

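A hedged sketch of that comparison (paper.txt is a hypothetical filename for the question paper's text):

```python
from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
STOP = set(stopwords.words("english"))

def content_words(path):
    """Tokenize a text file and drop stop-words and punctuation."""
    with open(path) as f:
        tokens = word_tokenize(f.read().lower())
    return [t for t in tokens if t.isalpha() and t not in STOP]

spoken = content_words("test.txt")        # everything the candidate said
paper = set(content_words("paper.txt"))   # question paper (filename assumed)
common = Counter(t for t in spoken if t in paper)
print(common.most_common())               # common words with their frequencies
```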

This system can be combined with a secure browser to prevent cheating. The project does not eliminate the need for a proctor, who is still required to perform certain operations. There are also ways to cheat through this system, such as a person sitting behind the laptop communicating with the test-taker in writing. To completely stop cheating, we would need external hardware like a spectacle camera covering the test-taker's whole field of view, with computer vision applied to its feed. But that would defeat the goal of making an AI that anyone can use with nothing more than a standard laptop, and with which a proctor can monitor multiple students at once.


For the full code refer here.


Translated from: https://towardsdatascience.com/automating-online-proctoring-using-ai-e429086743c8
