Transfer Learning for Classification in Ultra-small Biomedical Datasets


Do you see cancer in the mammogram above? If you’re struggling, don’t worry, you’re not alone. Biomedical imagery is a domain where computer vision and artificial intelligence may be well suited to outperform human judgement. A recent Google study that used over 25,000 medical images supports this claim; the team was able to build a machine learning model for breast cancer detection that outperformed humans, presumably in part because breast cancer detection is difficult even for trained professionals. Machines often excel at these texture-based challenges. (In case you are curious, there’s no cancer in that mammogram.)

But before computer vision can broadly assist in evaluating biomedical imagery, there’s a data problem to solve: many possible biomedical applications have access to only a few hundred labeled images. If machine learning researchers have only hundreds, and not tens of thousands, of biomedical images, can a useful, predictive tool still be built?

In this series of posts, we will empirically explore some of the options and tools that data scientists can use when working on extremely small biomedical imagery datasets. Our focus will be on classification tasks that do not require segmentation; for example, these datasets could be for identifying if there is a tumor in a brain scan, not where the tumor is in the brain scan. This series will explore questions such as:

  • How applicable is transfer learning from existing, general-purpose models? Do ImageNet models (trained to distinguish between a thousand classes of things like cats and dogs) help us detect differences in cell-based photographs?

  • If not, can we build a general-purpose “CellNet” model that can be used successfully for transfer learning for cell-based biomedical images?

  • What kind of data augmentation and pre-processing works for this biomedical domain?

  • What other approaches, such as dataset purification or using only high-confidence predictions for image triage, could we employ to make these models, built off ultra-small datasets, more usable in real-world settings?

Does Transfer Learning Work for Ultra-small Biomedical Datasets?

In this first post, we’re going to tackle the common problem of limited training data by examining how, when, and why ImageNet-based transfer learning can be used effectively (or not). Transfer learning refers to the idea that a large, pre-trained model can be reused on a new dataset, recycling its learned weights for a new classification task. For example, you can download a model that was pre-trained on millions of images from ImageNet to predict common objects. Then you can replace its final layer so that it predicts, say, four different types of white blood cells instead of birds or cars.

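To make that final-layer swap concrete, here is a minimal sketch assuming PyTorch and torchvision (the framework choice and the four-class white-blood-cell head are illustrative assumptions, not something this post prescribes):

```python
import torch.nn as nn
from torchvision import models

# Download a ResNet-18 pre-trained on ImageNet (1,000 general-purpose classes).
model = models.resnet18(pretrained=True)

# Swap the 1,000-way classification layer for a new head that predicts
# 4 classes, e.g. four types of white blood cells.
num_features = model.fc.in_features  # size of the penultimate feature vector
model.fc = nn.Linear(num_features, 4)
```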

These pre-trained models have learned to recognize lower-level features like straight lines versus curves, which could help distinguish the outlines of a cat’s whiskers from a dog’s snout. With transfer learning, these simple features can then be recycled when trying to differentiate blood cell types, obviating the need for thousands of blood cell images and countless hours of model retraining. That’s the theory, at least.

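In practice, this recycling is usually implemented by freezing the pre-trained backbone and training only the new head, so the limited labeled data is spent on the task-specific layer rather than on relearning edges and curves. A minimal sketch, again assuming PyTorch/torchvision:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)

# Freeze every pre-trained parameter so the low-level filters
# (edges, curves, textures) learned on ImageNet are reused unchanged.
for param in model.parameters():
    param.requires_grad = False

# The replacement head is created fresh, so its weights remain trainable.
model.fc = nn.Linear(model.fc.in_features, 4)

# Only the head's parameters are optimized; with so few trainable weights,
# a few hundred labeled images can be enough to fit the classifier.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```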

[Image: ‘ImageNet Large Scale Visual Recognition Challenge,’ 2015]

How well transfer learning works depends, in part, on the similarity between your dataset and ImageNet, pictured above (assuming you’re using a model built on ImageNet, like a pre-trained VGG or ResNet). On one hand, transfer learning seems to work well for many biomedical applications. On the other hand, it often doesn’t. Recent work out of NeurIPS by Raghu and colleagues from Google Brain (‘Transfusion: Understanding Transfer Learning for Medical Imaging,’ 2019) found that, for several medical imaging tasks, transfer from ImageNet offered surprisingly little benefit over training smaller networks from scratch.
