Humans in the Loop: Using AI to Get Big Tasks Done Fast
Peter Baldridge, Data Scientist at AI@T-Mobile
Picture This:
You and your team just spent two weeks collecting survey data. You now have thousands of responses including a bunch of free-form text answers. You look in your inbox and see an email marked “important.” It turns out executive management would like to see a report on the survey in three days. 😨
So what do you do?
Scenario A)
You hire a fleet of consultants to clean, label, and organize your free text responses. The process costs tens of thousands of dollars, but you get your results back in two days instead of three. Plus, they throw in an extremely polished PowerPoint deck.
Scenario B)
You call a data scientist. 😎
The Opportunity:
Every year, T-Mobile conducts an enterprise fraud risk assessment. Part of the assessment is to send a survey to several thousand employees. The survey contains multiple choice questions as well as free text questions. Processing multiple choice questions is easy but processing the free text responses is not. Each comment must be hand-labelled according to its risk category before a summary can be generated.
You could label these comments one-by-one, but that approach would be very time-consuming (not to mention boring). Or you could figure out a way to do it faster…🏃♀️
The Approach:
In principle, we want to bucket similar survey responses so that our fraud managers can label them in one fell swoop.
To do this, we need to create a similarity score for each pair of survey responses. If we do it the right way, phrases such as, “T-Mobile is great,” and “T-Mobile is awesome” will be considered similar.
TF-IDF and Cosine Similarity:
TF-IDF, which stands for term frequency-inverse document frequency, is a way to vectorize documents by highlighting unique keywords (iPhone, Galaxy, Pixel) while downplaying common words (the, of, from). It’s a way of bringing the most important information to the surface, almost like sifting for gold.
Next, we calculate similarity scores for each pair of phrases by looking at the angles between different vectors. The key idea is that similar phrases will also be closer in space.
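A quick sketch of that pairwise comparison, again on made-up phrases:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

responses = [
    "T-Mobile is great",
    "T-Mobile is awesome",
    "I want a new iPhone",
]
tfidf = TfidfVectorizer().fit_transform(responses)

# cosine_similarity compares the angle between every pair of TF-IDF vectors:
# 1.0 means identical direction, 0.0 means no shared terms at all
sims = cosine_similarity(tfidf)

# The two "T-Mobile is ..." phrases overlap heavily; the iPhone one doesn't
print(sims.round(2))
</<!---->```

The first two phrases score well above zero against each other, while the iPhone phrase scores exactly zero against both, since it shares no vocabulary with them.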
Hierarchical Clustering (with Maximum Linkage):
Finally, we bucket our responses. We use hierarchical clustering for this step and, more importantly, complete linkage, which ensures a minimum degree of similarity between every pair of responses within each cluster.
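Here is a hedged sketch of this step with SciPy. The 0.7 distance cutoff and the phrases are illustrative choices, not the values used on the real survey. The key property of complete (maximum) linkage is that a cluster’s diameter is the distance between its two most dissimilar members, so cutting the tree at cosine distance 0.7 guarantees every pair inside a bucket has similarity of at least 0.3.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.feature_extraction.text import TfidfVectorizer

responses = [
    "T-Mobile is great",
    "T-Mobile is awesome",
    "I want a new iPhone",
    "I want a new Galaxy",
]
tfidf = TfidfVectorizer().fit_transform(responses).toarray()

# Cosine distance = 1 - cosine similarity
dists = pdist(tfidf, metric="cosine")

# Complete linkage merges clusters by their *farthest* pair, so cutting the
# dendrogram at distance 0.7 caps how dissimilar any two bucket-mates can be
Z = linkage(dists, method="complete")
buckets = fcluster(Z, t=0.7, criterion="distance")
print(buckets)  # two buckets: the T-Mobile phrases and the phone phrases
```

With single or average linkage, a chain of pairwise-similar responses could pull two very different comments into the same bucket; complete linkage rules that out, which matters when a manager is about to label the whole bucket at once.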
The Results:
By bucketing our survey responses, we were able to fit 2,649 survey responses into 110 buckets. That’s a reduction of more than 95 percent in the number of items to label! 😲
What the Risk Management Team Found:
The Risk Management team was able to make the labelling process even more efficient. They did it with a couple of ingenious tricks.
Clustering ²:
In many cases there were several clusters related to the same category. Sorting in different ways as well as adding some rudimentary Excel formulas for certain keywords allowed the team to label several clusters at once. They were also able to easily identify exceptions.
To put it another way: they clustered the clusters.
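The team’s actual tool was Excel, but the keyword trick translates directly. Here is a hedged Python sketch; the cluster summaries, keywords, and category names are all invented for illustration.

```python
# Invented cluster summaries and keyword map, for illustration only
clusters = {
    1: "discounts applied without manager approval",
    2: "manager approval skipped for account credits",
    3: "phishing emails asking for customer PINs",
}
keyword_to_category = {
    "approval": "authorization risk",
    "phishing": "social engineering",
}

# One keyword match labels a whole cluster of responses at once
labels = {}
for cluster_id, summary in clusters.items():
    for keyword, category in keyword_to_category.items():
        if keyword in summary:
            labels[cluster_id] = category
            break

print(labels)
```

Clusters 1 and 2 both match "approval" and collapse into one category, which is exactly the second-level grouping the team achieved with sorting and Excel formulas.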
🤯
We know what you’re thinking
Information-Rich vs. Information-Poor Responses:
The audit team found that there were a lot of survey responses but not a lot of meaningful responses. During previous years, responses had to be manually tagged one-by-one (often with errors). The Fraud Management team found an ingenious way to separate information-rich responses from information-poor responses. Here’s a quote from Jonathan Arras, T-Mobile’s Director of Fraud Strategy, about how he tackled the issue:
“I ran through a sample of about 50 records and personally evaluated what I believed the significance of each comment to be. Then I calculated the number of characters in each cell and without looking at my manual results drew some significance cuts based on word count. The results of each approach were almost identical and while it may seem crude to say “number of words = significance” it really did prove out and automating the rest of the counting/labeling took just seconds after that.”
In effect, he created a classifier for identifying information-richness of a response based on response length alone.
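As a sketch, the rule Arras describes reduces to a one-line classifier. The five-word cutoff and the sample comments below are invented; the real threshold was calibrated against his roughly 50 hand-labelled records.

```python
MIN_WORDS = 5  # illustrative cutoff, not the team's actual threshold

def is_information_rich(response: str, min_words: int = MIN_WORDS) -> bool:
    """Treat longer responses as information-rich, per the word-count rule."""
    return len(response.split()) >= min_words

comments = [
    "n/a",
    "No concerns.",
    "Vendors can approve their own invoices without a second reviewer.",
]
rich = [c for c in comments if is_information_rich(c)]
print(rich)
```

Crude as it looks, this is the kind of rule that only someone with domain knowledge can validate cheaply: Arras checked it against a hand-labelled sample before trusting it, which is what made the shortcut safe.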
TL;DR
The Fraud Management team could have simply taken our results and called it a day. But instead they applied their own understanding of both the domain and analytics to make the process even faster.
Data scientists: I think the lesson here is that there’s really no substitute for proper domain knowledge. Data science can only take a problem so far. But when you combine analytical techniques with proper domain knowledge: that’s when you unlock a solution’s full potential!
Other Interesting Findings:
ELMo and Bert:
We love using new tools as much as the next team! We tried using ELMo or BERT as opposed to TF-IDF, but neither embedding structure worked as well as our TF-IDF embeddings.
We suspect that this is because we were using pre-trained ELMo and BERT embeddings. Had these embeddings been trained on a T-Mobile vocabulary, they may have done much better. It just goes to show how powerful the classical tools are, especially in a pinch.
Fat-Tailed Distribution:
Interestingly, bucket sizes followed a fat-tailed distribution. This is a common phenomenon in nature.
Citations:
https://en.wikipedia.org/wiki/Gold_panning#/media/File:Gold_panning_at_Bonanza_Creek.JPG
Original article: https://medium.com/tmobile-tech/humans-in-the-loop-using-ai-to-get-big-tasks-done-fast-ea334418d3e9