Spirit: An Offline Social Recognition System

We introduce a social recognition system that uses two machine learning models to classify emotions across cultures. The current models classify the universal emotions of contempt, disgust, and anger.


[Image: Observing Persian facial expressions using OpenFace]

Motivation

Social recognition systems are capable of identifying emotions in humans based on their perceived facial expressions. This entails observing data, such as a set of frames from a collection of videos containing human faces, and then training a machine learning model on that data.


The current state of the literature indicates that there is a lack of research on social recognition systems and contempt across different cultures. Cultures display emotions differently, which can affect how a social recognition system perceives a given emotion.


Our main contribution, which the current project aims to make, is filling the gap in recognizing contempt cross-culturally. This will provide further support for computers that utilize recognition systems to detect emotions in humans.


Two machine learning models are used to classify the emotions of contempt, disgust, and anger based on action units (AUs). The models classify contempt across three different cultures: Filipino, Persian, and North American. The two models used are a Support Vector Machine (SVM) and a Multi-Layer Perceptron (MLP).


An SVM attempts to separate data into categories. SVMs separate the data using hyperplanes: among the candidate hyperplanes, the model looks for the one with the largest margin, which gives the best separation. The AU attributes of each video are converted into a vector, called a feature vector, which is input to the model. Using the feature vectors and a Gaussian radial basis function as a kernel, the model maps the AU attributes into a high-dimensional space in order to find the optimal separating hyperplanes. The separation classifies the data into emotions across a variety of different cultures.

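As a concrete illustration, an RBF-kernel SVM of this kind can be sketched with scikit-learn. The feature matrix `X` (one 16-dimensional AU vector per sample) and the labels `y` below are synthetic placeholders, not the actual dataset.

```python
# Sketch of the RBF-kernel SVM described above (scikit-learn).
# X holds one 16-dimensional AU feature vector per sample; y holds the
# emotion labels (0 = anger, 1 = disgust, 2 = contempt). Both are
# synthetic stand-ins for the real AU data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((300, 16))           # 300 samples x 16 AU attributes
y = rng.integers(0, 3, size=300)    # 3 emotion classes

# The Gaussian RBF kernel implicitly maps the AU features into a
# high-dimensional space, where the SVM finds maximum-margin hyperplanes.
clf = SVC(kernel="rbf", gamma="scale", C=1.0)
clf.fit(X, y)
preds = clf.predict(X)
```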

An MLP is a deep, artificial neural network composed of multiple perceptrons. MLPs consist of an input layer which takes in data and an output layer which makes a prediction about the data. Between these two layers are a number of hidden layers which serve as the computational engine of the MLP. In other words, MLPs learn to model the correlation between data (input) and predictions about that data (output).


The current MLP consisted of 3 hidden layers with 24, 12, and 6 units, respectively. The 16 AU attributes from the dataset were fed into the input layer, and the output layer used a softmax activation to classify each input into one of the 3 emotions. After finding the model that performed best on the validation sets, it was further evaluated on a separate test dataset.

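A minimal sketch of this architecture, using scikit-learn's `MLPClassifier` rather than the authors' (unspecified) framework, with synthetic data standing in for the AU features:

```python
# Sketch of the MLP described above: a 16-feature input, three hidden
# layers of 24, 12, and 6 units, and a softmax output over the 3
# emotions (MLPClassifier applies softmax automatically for multiclass
# targets). X and y are synthetic placeholders for the real AU dataset.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((300, 16))
y = rng.integers(0, 3, size=300)

mlp = MLPClassifier(hidden_layer_sizes=(24, 12, 6),
                    max_iter=100,      # the post trains for 100 epochs
                    random_state=0)
mlp.fit(X, y)
probs = mlp.predict_proba(X)   # per-class softmax probabilities
```

On synthetic data this may stop before converging, which is harmless for the sketch; the shapes and the softmax output are what matter here.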

Dataset Collection and Preprocessing

We collected 236 YouTube videos containing anger, disgust, or contempt, covering Filipino, North American, and Persian cultures. Due to copyright, we will not be publishing the dataset.


Each video was analyzed using OpenFace. The OpenFace software breaks each video down into individual frames and identifies facial movements and landmarks. For each frame, we extracted the AUs along with the frame number, face ID, confidence, and success flag into a CSV file.


In the next step, we cleaned the data by removing frames with a confidence below 0.8 or a success value not equal to 1.

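The cleaning rule can be expressed in a few lines of pandas. The column names below (`confidence`, `success`) mirror OpenFace's CSV output, though the exact headers may vary by version; the rows are illustrative.

```python
# Sketch of the cleaning step: keep only frames with confidence >= 0.8
# and success == 1, dropping everything else.
import pandas as pd

frames = pd.DataFrame({
    "frame":      [1, 2, 3, 4],
    "confidence": [0.95, 0.60, 0.85, 0.99],
    "success":    [1, 1, 0, 1],
})

# Frame 2 fails the confidence threshold; frame 3 fails the success check.
clean = frames[(frames["confidence"] >= 0.8) & (frames["success"] == 1)]
```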

Results

The results for each model were compared with one another. Before compiling the models, k-fold cross-validation was conducted to assess the true predictive power of each model. The training dataset was created by removing all rows belonging to 10% of the videos; the removed videos were used as the test dataset.

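One way to implement this video-level hold-out is scikit-learn's `GroupShuffleSplit`, grouping rows by video ID so that no video contributes frames to both sets. The IDs below are hypothetical:

```python
# Sketch of the video-level split: hold out 10% of the videos, together
# with all of their rows, as the test set. Grouping rows by video ID
# guarantees that no video appears in both training and test data.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_rows = 200
video_ids = rng.integers(0, 20, size=n_rows)   # 20 hypothetical videos
X = rng.random((n_rows, 16))                   # 16 AU attributes per row

splitter = GroupShuffleSplit(n_splits=1, test_size=0.1, random_state=0)
train_idx, test_idx = next(splitter.split(X, groups=video_ids))
```

Splitting per video rather than per frame matters: frames from the same video are highly correlated, so a frame-level split would leak test information into training.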

The MLP was run for 100 epochs and produced an accuracy of 66% on the validation data and 49% on the test data, while the SVM produced an accuracy of 96% on the validation data and 40% on the test data. The SVM's much higher accuracy on the validation data might indicate a degree of overfitting. In addition to accuracy, the F1-score on the test dataset was 0.29 for the SVM and 0.45 for the MLP. The MLP, therefore, proved to be the better model for the current dataset.


[Image: Result of our model on validation and test set]
[Image: Training and validation loss over 100 epochs]

Although the accuracy was relatively low on the test data, this might indicate that cultural differences do affect the outcome of the model.


In addition to these models, a Gaussian mixture model was used to determine which images were most strongly related to each action unit associated with contempt.

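The post does not detail this procedure, but one plausible sketch is to fit a two-component Gaussian mixture to a single AU's intensity and rank frames by their posterior probability under the high-intensity component. The intensities below are synthetic:

```python
# Hypothetical sketch: use a 2-component Gaussian mixture over one
# contempt-related AU's intensity (e.g., AU10) to pick out the frames
# that depict that AU most strongly. The data here is synthetic: mostly
# low intensities plus a small high-intensity cluster.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
au = np.concatenate([rng.normal(0.5, 0.2, 180),
                     rng.normal(3.0, 0.3, 20)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(au)
high = int(np.argmax(gmm.means_))          # high-intensity component
post = gmm.predict_proba(au)[:, high]      # responsibility per frame
top_frames = np.argsort(post)[-5:]         # 5 strongest depictions
```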

Amongst the five AUs (AU4, AU7, AU10, AU25, and AU26), the Filipino culture provided the best depiction of AU4, the Persian culture provided the best depictions of AU7, AU10, and AU26, whereas the North American culture provided the best depiction of AU25.


[Image: Best depiction of AU4 was from the Filipino category]

Discussion

Contempt, anger, and disgust are closely related to each other, making it difficult even for humans to differentiate between them. Given the low accuracy, there were several cases where the model misclassified images that did not contain an emotion at all. For example, the image below was labelled contempt but clearly did not display contempt.


[Image: A frame labelled contempt that does not display contempt]

Although our models produced relatively low accuracy scores on the test dataset, the results do suggest either that the dataset requires more data or that cross-cultural variations in contempt affect the results of the model.


Further research could look into using audio, or even sentiment analysis of the dialogue, to add further context to the dataset, as well as compiling more data, possibly across more cultures instead of just three.


Conclusion

In this blog post, we reported on our efforts in the automatic recognition of three negative emotions: anger, contempt, and disgust. To this end, we collected videos containing these emotions from YouTube, then labeled and analyzed them. We developed and evaluated a deep neural network and an SVM model. The best result was achieved by our DNN model, with F1 = 0.649 on the validation set and F1 = 0.451 on the test set. Analysis of the experiments showed that these three emotions have indistinct boundaries, which makes them difficult to classify in some cases, especially when we want to generalize to other cultures. Future work could improve the results by incorporating other modalities, such as text or audio pitch, or by extending the dataset.


Feel free to share your thoughts and comment on this project.


Translated from: https://medium.com/@kevinsangabriel/spirit-an-offline-social-recognition-system-bc0da0ae549b
