Identifying Complaint-Relevant Posts on Social Media


Through this blog, I aim to explain the work presented in the paper “Semi-Supervised Iterative Approach for Domain-Specific Complaint Detection in Social Media”, accepted at the 3rd Workshop on e-Commerce and NLP at the Association for Computational Linguistics (ACL) 2020. This venue brings together research spanning computational linguistics, cognitive modeling, information extraction, and semantics, among other areas. Our work is one of the first attempts to leverage the discursive social media landscape to surface complaints and identify grievances. We demonstrate the utility of our approach by evaluating it on transport-related services on the social media platform Twitter. This post gives a brief overview of the motivation, methodology, and applications of the research; more technical details can be found in the paper. Our team eagerly looks forward to any suggestions and improvements regarding this work.


Motivation

Social media has lately become one of the primary venues where users express their opinions about various products and services. These opinions are extremely useful in understanding users’ perceptions of and sentiment about these services. They are also valuable for identifying potential defects and critical to the execution of downstream customer service responses. Public-sector services such as transport and logistics are strongly affected by public opinion and form a critical aspect of a country’s economy. Businesses often rely on social media to ascertain customer feedback and to initiate a response. Therefore, automatic detection of user complaints on social media could prove beneficial to both the clients and the service providers.


[Figure: A social media user tagging the relevant authorities with their grievances.]
[Figure: Transportation-related companies with significant market shares. Source: https://money.cnn.com/data/sectors/transportation/?sector=4600&industry=4610]

Traditionally, listing complaints involves social media users tagging the relevant individuals or authorities in their complaints. However, this approach has a certain set of drawbacks that reduce its utility. The prevalence of such posts, in which the concerned authorities are tagged, is low compared to other complaint posts. Additionally, media platforms are plagued with redundancy, where posts are rephrased or structurally morphed before being re-posted. Also, vast amounts of inevitable noise make it hard to identify posts that may require immediate attention.


Our Contribution

To build such detection systems, we could employ supervised approaches, which would typically require a large corpus of labeled training samples. However, as discussed, labeling social media posts that capture complaints about a particular service is challenging. Prior work in event detection has demonstrated that simple linguistic indicators (phrases or n-grams) can be useful in the accurate discovery of events in social media. Though user complaints are not the same as events, being closer to a speech act, we posit that similar indicators can be used in complaint detection. To pursue this hypothesis, we propose a semi-supervised iterative approach to identify social media posts that complain about a specific service. In our experimental work, we started with an annotated set of 326 samples of transportation complaints, and after four iterations of the approach, we had collected 2,840 indicators and over 3,700 tweets. We annotated a random sample of 700 tweets from the final dataset and observed that over 47% of the samples were actual transportation complaints. We also characterize the performance of basic classification algorithms on this dataset. In doing so, we also study how different linguistic features contribute to the performance of a supervised model in this domain.


Methodology and Approach

Our proposed approach begins with a large corpus of transport-related tweets and a small set of annotated complaints. We use this labeled data to create a set of seed indicators that drive the rest of our iterative complaint detection process.


Data Collection

We focused our experimentation on the period from November 2018 to December 2018. Our first step towards creating a corpus of transport-related tweets was to identify linguistic markers related to the transport domain. To this end, we scraped random posts from transport-related web forums. These forums involve users discussing their grievances and raising awareness about a wide array of transportation-related issues. We then processed this data to extract words and phrases (unigrams, bigrams, and trigrams) with high tf-idf scores, and had human annotators prune them further to remove duplicates and irrelevant items. A minimal sketch of this extraction step is shown below.

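The sketch below assumes the scraped forum posts are available as a list of raw strings; the function and variable names are illustrative, not the authors' released code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def top_tfidf_phrases(posts, k=75):
    """Return the k unigrams/bigrams/trigrams with the highest mean tf-idf score."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 3), stop_words="english",
                                 lowercase=True, min_df=2)
    tfidf = vectorizer.fit_transform(posts)      # shape: (n_posts, n_phrases)
    mean_scores = tfidf.mean(axis=0).A1          # average score of each phrase
    phrases = vectorizer.get_feature_names_out()
    ranked = sorted(zip(phrases, mean_scores), key=lambda x: -x[1])
    return ranked[:k]

# forum_posts is assumed to be the list of scraped forum post strings.
# candidates = top_tfidf_phrases(forum_posts, k=200)   # then pruned by annotators
```

In our case, the pruned list eventually contained the 75 transport-related phrases used for tweet collection.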

We used Twitter’s public streaming API to query for tweets that contained any of the 75 phrases over the chosen time range. We then excluded non-English tweets and any tweets with fewer than two tokens, which resulted in a collection of 19,300 tweets. We will refer to this collection as corpus C. We chose a random sample of 1,500 tweets from this collection for human annotation and employed two human annotators to identify traffic-related complaints from these 1,500 tweets. The annotation details are given in the manuscript. In cases where the annotators disagreed, the labels were resolved through discussion. After the disagreements were resolved, the final seed dataset had 326 samples of traffic-related complaints. We will refer to this set as Ts. A sketch of the filtering step is shown below.

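A minimal sketch of the post-collection filtering, assuming each collected tweet is a dict carrying the text and lang fields returned by the Twitter API; the variable and function names are illustrative:

```python
def filter_tweets(raw_tweets, phrases):
    """Keep English tweets that contain at least one domain phrase and at least 2 tokens."""
    kept = []
    for tweet in raw_tweets:
        text = tweet["text"].lower()
        if tweet.get("lang") != "en":               # drop non-English tweets
            continue
        if len(text.split()) < 2:                   # drop tweets with fewer than two tokens
            continue
        if not any(p in text for p in phrases):     # keep only tweets matching a domain phrase
            continue
        kept.append(tweet)
    return kept

# corpus_c = filter_tweets(raw_tweets, transport_phrases)   # ~19,300 tweets in our experiments
```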

[Figure: A sample tweet from the curated dataset identified as complaint-relevant. Personally identifiable information has been removed from the image.]

Iterative Algorithm

Our proposed iterative approach is summarized in the figure below. First, we use the seed data Ts to build a set of linguistic indicators I for complaints. We then use these indicators to retrieve potential new complaints Tl from the corpus C, and merge Ts and Tl to build our new dataset. From this new dataset we extract a new set of indicators Il, which are combined with the original indicators I to extract the next version of Tl. This process is repeated until we can no longer find any new indicators. A sketch of the loop is shown after the figure.


[Figure: Iterative complaint detection algorithm.]
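A minimal sketch of this loop. The two extractor callables stand in for the tf-idf selection and the domain-relevance filtering described in the next section; everything here is an illustration of the procedure, not the authors' implementation.

```python
def iterative_complaint_detection(T_s, corpus_C, seed_extractor, candidate_extractor):
    """Grow a complaint dataset from seed tweets T_s and corpus C.

    seed_extractor: returns tf-idf-based indicators from the annotated seed set.
    candidate_extractor: returns domain-relevance-filtered indicators from new tweets.
    """
    I = set(seed_extractor(T_s))            # seed indicators
    dataset = set(T_s)
    while True:
        # retrieve tweets from C containing at least one known indicator
        T_l = [t for t in corpus_C if any(ind in t.lower() for ind in I)]
        dataset |= set(T_l)
        # extract indicators from the newly retrieved (unlabeled) tweets
        new_indicators = set(candidate_extractor(T_l)) - I
        if not new_indicators:              # converged: no new indicators found
            break
        I |= new_indicators
    return dataset, I
```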

Extracting Linguistic Indicators

As shown in the algorithm, extracting linguistic indicators (n-grams) is one of the most important steps in the process. These indicators are critical to identifying tweets that are most likely domain-specific complaints. We employ two different approaches for extracting these indicators. For the seed data Ts, which is annotated, we simply select the n-grams with the highest tf-idf scores. In our experimental work, Ts had 326 annotated tweets, and we identified the 50 n-grams with the highest tf-idf scores to initialize I. Some examples included terms like problem, station, services, toll-fee, reply, fault, provide information, driver, district, and passenger.


When extracting indicators from Tl, which is not annotated, it is possible that there could be frequently occurring phrases that are not necessarily indicative of complaints. These phrases could lead to concept drift in subsequent iterations. To avoid these digressions, we use a measure of domain relevance when selecting indicators. It is defined as the ratio of the frequency of an n-gram in Tl to its frequency in Tr, where Tr is a collection of randomly chosen tweets that does not intersect with C. We defined Tr as a random sample of 5,000 tweets from a different time range than that of C. A sketch of this score is shown below.

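A minimal sketch of the domain relevance computation, assuming tweets are plain strings and using simple whitespace tokenization; the function names and the division smoothing are illustrative assumptions:

```python
from collections import Counter

def ngram_counts(tweets, n_max=3):
    """Count all 1- to n_max-grams across a collection of tweets."""
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        for n in range(1, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts

def domain_relevant_indicators(T_l, T_r, threshold=2.5):
    """Keep n-grams whose frequency ratio between T_l and the random set T_r exceeds the threshold."""
    freq_l, freq_r = ngram_counts(T_l), ngram_counts(T_r)
    # max(..., 1) avoids division by zero for phrases absent from T_r (an assumption for this sketch);
    # the 2.5 default matches the cut-off we observed in our experiments (see below).
    scores = {g: freq_l[g] / max(freq_r[g], 1) for g in freq_l}
    return {g: s for g, s in scores.items() if s > threshold}
```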

Our iterative approach converged in four rounds, after which it did not extract any new indicators. Over these four iterations, the approach selected 3,732 tweets and generated 2,840 unique indicators. We also manually inspected the indicators chosen during the process and observed that only indicators with a domain relevance score greater than 2.5 were chosen for subsequent iterations.


[Figure: Examples of some strong and weak indicators; the numbers in brackets denote the respective domain relevance scores.]
[Figure: Frequency of indicators and tweets collected after each iteration.]

We chose a random set of 700 tweets from the final complaints dataset T and annotated them manually to help assess its quality. The annotation guidelines are discussed in the manuscript, and we employed the same annotators as before. The annotators obtained a high agreement score of kappa = 0.83. After resolving the disagreements, we observed that 332 tweets were labeled as complaints, which accounts for 47.4% of the sampled 700 tweets. This demonstrates that nearly half the tweets selected by our semi-supervised approach were traffic-related complaints. This is a significantly higher proportion than in the original seed data Ts, where only 21.7% were actual complaints.

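As a side note, this kind of inter-annotator agreement can be computed with scikit-learn; the toy labels below are illustrative only (the paper reports kappa = 0.83 on its 700-tweet sample):

```python
from sklearn.metrics import cohen_kappa_score

# labels_a / labels_b: the two annotators' binary labels (1 = complaint) over the same tweets
labels_a = [1, 0, 1, 1, 0, 0, 1, 0]
labels_b = [1, 0, 1, 0, 0, 0, 1, 0]
print(cohen_kappa_score(labels_a, labels_b))
```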

Modelling

We conducted a series of experiments to understand whether we could automatically build simple machine learning models to detect complaints. These experiments also helped us evaluate the quality of the final dataset. Additionally, this experimental work studies how different types of linguistic features contribute to the detection of social media complaints. For these experiments, we used the annotated sample of 700 posts as the test dataset. We built our training dataset by selecting another 2,000 posts from the original corpus C and annotating them once again. To evaluate the predictive strength of machine learning algorithms, we used various linguistic features, which can be broadly broken down into four groups.


(i) Semantic: The first group of features is based on simple semantic properties such as n-grams, word embeddings, and part-of-speech tags.


(ii) Sentiment: The second group of features is based on pre-trained sentiment models or lexicons.


(iii) Orthographic: The third group of features uses orthographic information such as hashtags, user mentions, and intensifiers.


(iv) Request: The last group of features again uses pre-trained models or lexicons associated with requests, a closely related speech act.


For experimentation purposes, we used either a quantitative or a normalized score over the complete tweet from each of the pre-trained models or lexicons. More details about the prior literature on these types of features can be found in the paper. A small sketch of the orthographic feature group is shown below.

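A minimal sketch of the orthographic feature group, assuming tweets are plain strings; the intensifier and pronoun lists below are illustrative stand-ins for whatever lexicons were actually used:

```python
import re

# small illustrative lexicons; the paper's lexicons may differ
INTENSIFIERS = {"very", "so", "really", "extremely", "totally", "absolutely"}
PRONOUNS_1ST = {"i", "we", "me", "us", "my", "our"}
PRONOUNS_2ND = {"you", "your", "yours"}

def orthographic_features(tweet):
    """Count hashtags, user mentions, intensifiers, pronoun types, and exclamations in a tweet."""
    tokens = re.findall(r"[#@]?\w+", tweet.lower())
    return {
        "n_hashtags": sum(t.startswith("#") for t in tokens),
        "n_mentions": sum(t.startswith("@") for t in tokens),
        "n_intensifiers": sum(t in INTENSIFIERS for t in tokens),
        "n_pronouns_1st": sum(t in PRONOUNS_1ST for t in tokens),
        "n_pronouns_2nd": sum(t in PRONOUNS_2ND for t in tokens),
        "n_exclamations": tweet.count("!"),
    }

# orthographic_features("@citytransit the bus is SO late again!!")   # example usage
```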

Results

We trained a logistic regression model for complaint detection using each of the features described above. The best performing model is based on unigrams, with an accuracy of 75.3%. There is no significant difference in the performance of the different sentiment models. It is also interesting to observe that simple features like the counts of different pronoun types and counts of intensifiers have strong predictive ability. Overall, we observe that most of the features studied here have some ability to predict complaints. A minimal sketch of the unigram baseline is shown after the figure.


[Figure: Predictive accuracy and F1-score for the different types of features. The classifier is logistic regression with Elastic Net regularization, which gave the best performance compared to its counterparts.]
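A minimal sketch of the unigram baseline with elastic-net-regularized logistic regression in scikit-learn; the data variables and hyperparameters are placeholders, not the paper's exact setup:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# train_texts/train_labels and test_texts/test_labels are the annotated splits described above
unigram_baseline = make_pipeline(
    CountVectorizer(ngram_range=(1, 1), lowercase=True),      # unigram features
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, max_iter=5000),           # elastic-net regularization
)
# unigram_baseline.fit(train_texts, train_labels)
# accuracy = unigram_baseline.score(test_texts, test_labels)   # ~75.3% reported in the paper
```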

Potential Use-Cases of this Research

The utility of the proposed architecture is multi-fold, as discussed: (i) We believe that our work could be a first step towards improving complaint-relevant downstream tasks such as chat-bot development, building automatic query resolution tools, or gathering low-cost public opinion about services. (ii) Our methodology could help linguists understand the language used in criticism and complaints from a lexical or semantic point of view. (iii) The proposed approach is highly flexible, as it can be expanded to other domains based on the lexicons used in the seed data. (iv) The iterative nature of the architecture reduces human intervention, and hence any unintentional bias during the training phase; it also makes the approach robust to lexical variations in posts over time. (v) Since the approach is semi-supervised, it reduces the dependence on a large number of pre-labeled samples for complaint detection and mitigates the class imbalance problem that is highly prevalent in supervised approaches.


Conclusion and Future Work

As a part of this work, we presented an iterative semi-supervised approach for the automatic detection of complaints. Complaint resolution is a significant part of the product improvement initiatives of various product-based companies; hence, we believe that our proposed method could be effectively leveraged for a low-cost assessment of public opinion or for routing grievances to the appropriate platforms. We manually validated the usefulness of the proposed approach and observed a significant improvement in the collection of complaint-relevant tweets. In the future, we aim to deploy clustering mechanisms for isolating event-relevant tweets of diverse nature. We also plan to use additional meta-data context and the conversational nature of tweets to augment the system's performance. Our team eagerly looks forward to any feedback or suggestions regarding the paper. Please feel free to reach out to any of the authors. I hope that this post will motivate another young researcher like me to take up a relevant social problem and harness the potential of Artificial Intelligence and Data Science to solve it.


Relevant Links

Translated from: https://towardsdatascience.com/identification-of-complaint-relevant-posts-on-social-media-4bc2c8b625ca

