What NLP Has to Say About Radiologists

Natural language processing (NLP) uses computers to analyze human language. spaCy is a powerful open-source NLP library for Python that takes much of the heavy lifting out of working with NLP.

Below is a quick tongue-in-cheek example of using NLP with spaCy to show briefly how it works for semantic analysis using word vectorization.

To install, use either pip:

$ pip install -U spacy

or conda:

$ conda install -c conda-forge spacy

Once installed, you will need to download a language model. For this exercise, I will use the large core English model (this takes a while to download):

$ python -m spacy download en_core_web_lg

Now, you’re all set to start analyzing the English language!

You can characterize text in many ways using NLP. One such way is word vectorization, in which text is converted into numeric values that computers can manipulate more easily. In particular, the spaCy model used here represents each word as a 300-dimensional vector. While it is difficult for me to conceptualize vectors in 300-dimensional space, it is much easier for computers to do so!

Let’s start with some imports:

import spacy
from scipy import spatial

spaCy stores the word vectors as NumPy arrays, but this all happens behind the scenes. The spatial module from SciPy provides the cosine distance function we need to compare the angle between vectors, which indicates how similar two words are (smaller angle = more similar).
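
To make the "smaller angle = more similar" idea concrete, here is a minimal pure-Python sketch of the cosine distance, defined as 1 minus the cosine of the angle between two vectors. The helper name `cosine_distance` and the 3-dimensional toy vectors are made up for illustration; the article itself relies on `spatial.distance.cosine` from SciPy.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Toy 3-dimensional vectors (hypothetical; the real word vectors have 300 dimensions)
same_direction = cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # angle of 0 degrees
perpendicular = cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])   # angle of 90 degrees
opposite = cosine_distance([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0])       # angle of 180 degrees

print(same_direction, perpendicular, opposite)
```

The distance is close to 0 for vectors pointing the same way, 1 for perpendicular vectors, and 2 for opposite vectors, so sorting by it in ascending order puts the most similar words first.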

Now, load the English language model we downloaded before:

nlp = spacy.load('en_core_web_lg')

Let’s think of some English language words to play around with. As a radiologist, I have been accused of not seeing patients and not having much of a personality — two claims I dispute, but let’s see what NLP thinks!

We’ll first load the words, ‘physician’, ‘stethoscope’, and ‘personality’ from the NLP vocabulary of over 1.3 million words and get the vectors for each. We’ll also load the word ‘xray.’

word1 = nlp.vocab['physician'].vector
word2 = nlp.vocab['stethoscope'].vector
word3 = nlp.vocab['personality'].vector
word4 = nlp.vocab['xray'].vector

Since these variables all store mathematical representations of 300-dimensional vectors, we can do simple arithmetic with them. To see what happens when we have a physician, take away his/her stethoscope and personality and give him/her a double dose of xrays, we can calculate:

new_calculated_vector = word1 - word2 - word3 + 2*word4
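
The arithmetic above works element by element, because NumPy applies `-`, `+`, and `*` to each of the 300 components in turn. As a rough sketch of the same idea, here it is with plain Python lists and invented 3-dimensional stand-ins for the word vectors (the numbers are hypothetical and chosen only to keep the arithmetic easy to follow):

```python
# Hypothetical 3-dimensional stand-ins for the real 300-dimensional word vectors
physician = [1.0, 0.5, 0.5]
stethoscope = [0.5, 0.0, 0.25]
personality = [0.0, 0.5, 0.25]
xray = [0.25, 0.0, 1.0]

# physician - stethoscope - personality + 2 * xray, one component at a time
toy_vector = [
    p - s - q + 2 * x
    for p, s, q, x in zip(physician, stethoscope, personality, xray)
]

print(toy_vector)
```

The result has the same number of dimensions as the inputs, which is why it can be compared directly with any other word vector.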

This new_calculated_vector is not a “word” itself but a vector that we can compare to the vectors of other words. To do this, let’s loop through the entire lexicon and compute the cosine distance between each word vector and our newly calculated vector.

similarities = []
for word in nlp.vocab:
    if word.has_vector and word.is_alpha and word.is_lower:
        similarities.append((word,
            spatial.distance.cosine(new_calculated_vector, word.vector)))

Above, we first created an empty list of word similarities. We then cycled through each word in the entire vocabulary, filtering out words that have no vector, that contain non-letter characters, or that are not entirely lowercase. Finally, for each remaining word, we added a tuple to the list consisting of the word and the cosine distance between its vector and the calculated vector from before.
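
To see what the letter and lowercase filters let through, here is a small sketch using Python's built-in string methods. As I understand the spaCy documentation, `is_alpha` and `is_lower` behave like `str.isalpha()` and `str.islower()` on the word's text; `has_vector` has no string equivalent and is left out. The candidate list is invented for illustration.

```python
# Invented candidate strings; only the all-lowercase, letters-only ones should survive
candidates = ["xray", "Xray", "MRI", "x-ray", "covid19", "radiology"]

# Assumed plain-string analogue of `word.is_alpha and word.is_lower`
kept = [text for text in candidates if text.isalpha() and text.islower()]

print(kept)
```

Capitalized forms, acronyms, hyphenated words, and words containing digits are all dropped, which keeps near-duplicate spellings out of the final ranking.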

We can then sort this list of tuples by the distance values in ascending order (remember, a smaller distance means a smaller angle, and so a more similar word).

sorted_similarities = sorted(similarities, key=lambda item: item[1])
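
Since only the closest few words are needed, one possible alternative to sorting the whole list is `heapq.nsmallest` from the standard library. The sketch below uses made-up (word, distance) pairs in place of the real `similarities` list, which holds spaCy lexemes rather than strings; the distances are hypothetical and not actual model output.

```python
import heapq

# Made-up (word, cosine distance) pairs standing in for the real similarities list
similarities = [
    ("radiology", 0.41), ("banana", 0.93), ("xray", 0.12), ("mri", 0.35),
    ("guitar", 0.88), ("xrays", 0.20), ("radiologist", 0.38),
]

# Approach used in the article: sort everything by distance, then slice
sorted_similarities = sorted(similarities, key=lambda item: item[1])

# Alternative: pull out only the five smallest distances
top_five = heapq.nsmallest(5, similarities, key=lambda item: item[1])

for word, distance in top_five:
    print(word, distance)
```

Both approaches give the same five entries in the same order; the heap version simply avoids ordering the entries that will never be printed.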

Finally, we can print out the five words whose vectors are most similar to the vector we calculated by “physician minus stethoscope minus personality plus 2*xray.”

for top_similar_word in sorted_similarities[:5]:
    print(top_similar_word[0].text)

Our results:

xray
xrays
mri
radiologist
radiology

So, it appears that NLP agrees that, as a radiologist, I should recede to my dark room and stay cut off from the outside world! Joking aside, this brief demonstration shows just a fraction of the power of NLP and its ability to analyze language.

Full code:

import spacy
from scipy import spatial

nlp = spacy.load('en_core_web_lg')

word1 = nlp.vocab['physician'].vector
word2 = nlp.vocab['stethoscope'].vector
word3 = nlp.vocab['personality'].vector
word4 = nlp.vocab['xray'].vector

new_calculated_vector = word1 - word2 - word3 + 2*word4

similarities = []
for word in nlp.vocab:
    if word.has_vector and word.is_alpha and word.is_lower:
        similarities.append((word,
            spatial.distance.cosine(new_calculated_vector, word.vector)))

sorted_similarities = sorted(similarities, key=lambda item: item[1])

for top_similar_word in sorted_similarities[:5]:
    print(top_similar_word[0].text)

Translated from: https://medium.com/@lancereinsmithtx/what-nlp-has-to-say-about-radiologists-9c0b878a7dea
