Medical Image Analysis Notes 2

Latest recommended article published 2023-02-13 00:17:05

穹镜

Category: medical image analysis notes. Tags: python

Article link: https://blog.csdn.net/weixin_42890793/article/details/118596949

AI for medical image analysis

2. Counting labels (course code)

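One way to keep class imbalance from distorting the loss function is to weight the loss terms differently.
To choose the weights, you first need to compute the class frequencies.
In this exercise you will only compute the count for each label.
Later, you will use the concepts practiced here to compute the frequencies in the assignment.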

# Import the necessary packages
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Read the CSV file containing the training data
train_df = pd.read_csv("nih/train-small.csv")

# Count the instances of each class (drop the non-class columns before summing)
class_counts = train_df.drop(columns=['Image', 'PatientId']).sum()

for column in class_counts.keys():
    print(f"The class {column} has {class_counts[column]} samples")
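The counts above become class frequencies once they are divided by the number of examples. The following is a minimal sketch of that step on a tiny made-up table; `toy_df` and its label columns are hypothetical stand-ins, not the contents of `nih/train-small.csv`.

```python
import pandas as pd

# Hypothetical toy labels; column names and values are illustrative only.
toy_df = pd.DataFrame({
    "Image": ["a.png", "b.png", "c.png", "d.png"],
    "PatientId": [1, 2, 3, 4],
    "Mass": [1, 0, 0, 0],
    "Edema": [1, 1, 1, 0],
})

# Keep only the 0/1 label columns.
label_df = toy_df.drop(columns=["Image", "PatientId"])

# Count of positives per label.
toy_counts = label_df.sum()

# Each label column is 0/1, so its mean is the positive frequency.
toy_freq_pos = label_df.mean()
toy_freq_neg = 1 - toy_freq_pos

for name in label_df.columns:
    print(f"{name}: count={toy_counts[name]}, "
          f"positive frequency={toy_freq_pos[name]:.2f}, "
          f"negative frequency={toy_freq_neg[name]:.2f}")
```

Taking the column mean works only because every label is encoded as 0 or 1; with another encoding the count would have to be divided by `len(label_df)` explicitly.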


# Plot the distribution of counts (recent seaborn versions require x= and y= as keywords)
sns.barplot(x=class_counts.values, y=class_counts.index, color='b')
plt.title('Distribution of Classes for Training Dataset', fontsize=15)
plt.xlabel('Number of Patients', fontsize=15)
plt.ylabel('Diseases', fontsize=15)
plt.show()


Weighted Loss function

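Below is an example of computing a weighted loss.
In the assignment, you will compute a weighted loss function.
This sample code gives you some intuition for what the weighted loss function does and lets you practice some of the syntax you will use in the graded assignment.
For this example, you first define a hypothetical set of true labels and then a set of predictions.
Run the next cell to create the "ground truth" labels.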

# Generate an array of 4 binary label values, 3 positive and 1 negative
y_true = np.array(
        [[1],
         [1],
         [1],
         [0]])
print(f"y_true: \n{y_true}")

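Two models

To better understand the loss function, consider two models:
Model 1 always outputs 0.9 for any example it is given.
Model 2 always outputs 0.1 for any example it is given.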

# Make model predictions that are always 0.9 for all examples
y_pred_1 = 0.9 * np.ones(y_true.shape)
print(f"y_pred_1: \n{y_pred_1}")
print()
y_pred_2 = 0.1 * np.ones(y_true.shape)
print(f"y_pred_2: \n{y_pred_2}")

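Problem with the regular loss function

The learning goal here is to notice that, with a regular (unweighted) loss function, the model that always outputs 0.9 has a smaller loss (appears to perform better) than model 2. This is because there is a class imbalance: 3 of the 4 labels are 1. If the data were perfectly balanced (two labels of 1 and two labels of 0), model 1 and model 2 would have the same loss.
Each model would get two examples right and two examples wrong.
However, because the data is imbalanced, the regular loss function implies that model 1 is better than model 2.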
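The balanced case can be checked directly. This is a small sketch, assuming a made-up balanced label array `y_balanced` and a hypothetical helper `regular_loss` that implements the same unweighted loss used in this section.

```python
import numpy as np

def regular_loss(y_true, y_pred):
    """Unweighted binary cross-entropy, summed over all examples."""
    return float(-np.sum(y_true * np.log(y_pred))
                 - np.sum((1 - y_true) * np.log(1 - y_pred)))

# Hypothetical balanced labels: two positives and two negatives.
y_balanced = np.array([[1], [1], [0], [0]])

# One constant predictor at 0.9 and one at 0.1.
pred_high = 0.9 * np.ones(y_balanced.shape)
pred_low = 0.1 * np.ones(y_balanced.shape)

loss_high = regular_loss(y_balanced, pred_high)
loss_low = regular_loss(y_balanced, pred_low)

print(f"balanced data, always 0.9: {loss_high:.4f}")
print(f"balanced data, always 0.1: {loss_low:.4f}")
```

Both lines print the same value to four decimals, since each model is confidently right on two examples and confidently wrong on the other two.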

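Notice the shortcoming of the regular unweighted loss

Look at the loss you get from these two models (model 1 always predicts 0.9, model 2 always predicts 0.1) by computing the regular (unweighted) loss for each.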

loss_reg_1 = -1 * np.sum(y_true * np.log(y_pred_1)) + \
                -1 * np.sum((1 - y_true) * np.log(1 - y_pred_1))
print(f"loss_reg_1: {loss_reg_1:.4f}")

loss_reg_2 = -1 * np.sum(y_true * np.log(y_pred_2)) + \
                -1 * np.sum((1 - y_true) * np.log(1 - y_pred_2))
print(f"loss_reg_2: {loss_reg_2:.4f}")

print(f"When the model 1 always predicts 0.9, the regular loss is {loss_reg_1:.4f}")
print(f"When the model 2 always predicts 0.1, the regular loss is {loss_reg_2:.4f}")

When the model 1 always predicts 0.9, the regular loss is 2.6187
When the model 2 always predicts 0.1, the regular loss is 7.0131
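These two numbers can be reproduced by hand. Writing $y_i$ for the true label and $\hat{y}_i$ for the prediction (notation introduced here, not in the course code), the regular loss computed above is a sum over the four examples:

$$
\mathcal{L}_{\text{reg}} = -\sum_{i} \Big[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \Big]
$$

With three positive labels and one negative label:

$$
\begin{aligned}
\text{Model 1: } & -3\ln(0.9) - \ln(0.1) \approx 0.3161 + 2.3026 = 2.6187 \\
\text{Model 2: } & -3\ln(0.1) - \ln(0.9) \approx 6.9078 + 0.1054 \approx 7.0131
\end{aligned}
$$

Model 2 pays the large penalty $-\ln(0.1)$ three times, while model 1 pays it only once.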

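Notice that the loss is larger when the predictions are always 0.1, because the data is imbalanced: three labels are 1 but only one label is 0. Given a class imbalance with more positive labels, the regular loss function implies that the model with the higher prediction of 0.9 performs better than the model with the lower prediction of 0.1.

How a weighted loss treats both models the same

With a weighted loss function, you get the same weighted loss when the predictions are all 0.9 as when they are all 0.1.
Notice that a prediction of 0.9 is 0.1 away from the positive label of 1.
Notice also that a prediction of 0.1 is 0.1 away from the negative label of 0. Models 1 and 2 are therefore "symmetric" about the midpoint of 0.5 when plotted on a number line from 0 to 1.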
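Written out, the weighted loss computed in the rest of this section takes the following form. The symbols $N_{\text{pos}}$, $N_{\text{neg}}$ and $N$ (number of positive labels, negative labels, and examples) are notation assumed here for illustration:

$$
\mathcal{L}_{w} = -\sum_{i} \Big[ w_p\, y_i \log \hat{y}_i + w_n\,(1 - y_i) \log(1 - \hat{y}_i) \Big],
\qquad w_p = \frac{N_{\text{neg}}}{N}, \quad w_n = \frac{N_{\text{pos}}}{N}
$$

With these weights the positive examples contribute with total weight $w_p N_{\text{pos}} = N_{\text{pos}} N_{\text{neg}} / N$, and the negative examples with total weight $w_n N_{\text{neg}} = N_{\text{pos}} N_{\text{neg}} / N$. The two classes therefore carry equal total weight, which is why two constant predictors that mirror each other about 0.5 end up with the same weighted loss.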

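Because this example dataset is small, you can work out by hand the positive weight to use in the weighted loss function.
To get the positive weight, count the NEGATIVE labels and divide by the total number of examples.
Here there is one negative label and four examples in total.
Similarly, the negative weight is the fraction of positive labels.
Run the next cell to define the positive and negative weights.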

# calculate the positive weight as the fraction of negative labels
w_p = 1/4

# calculate the negative weight as the fraction of positive labels
w_n = 3/4

print(f"positive weight w_p: {w_p}")
print(f"negative weight w_n: {w_n}")
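For larger label arrays the same weights can be derived from the labels instead of typed in by hand. A minimal sketch, assuming the labels are stored as 0/1 in a NumPy array; the names `w_p_from_labels` and `w_n_from_labels` are hypothetical and chosen to avoid clashing with `w_p` and `w_n` above.

```python
import numpy as np

# Same toy labels as in this section: 3 positive, 1 negative.
y_true = np.array([[1], [1], [1], [0]])

# Positive weight = fraction of negative labels.
w_p_from_labels = float(np.mean(y_true == 0))

# Negative weight = fraction of positive labels.
w_n_from_labels = float(np.mean(y_true == 1))

print(f"positive weight from labels: {w_p_from_labels}")
print(f"negative weight from labels: {w_n_from_labels}")
```

For this array the result matches the hand-computed values of 1/4 and 3/4.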

# Calculate and print out the first term in the loss function, which we are calling 'loss_pos'
loss_1_pos = -1 * np.sum(w_p * y_true * np.log(y_pred_1 ))
print(f"loss_1_pos: {loss_1_pos:.4f}")

loss_1_pos: 0.0790

# Calculate and print out the second term in the loss function, which we're calling 'loss_neg'
loss_1_neg = -1 * np.sum(w_n * (1 - y_true) * np.log(1 - y_pred_1 ))
print(f"loss_1_neg: {loss_1_neg:.4f}")

loss_1_neg: 1.7269

# Sum positive and negative losses to calculate total loss
loss_1 = loss_1_pos + loss_1_neg
print(f"loss_1: {loss_1:.4f}")

loss_1: 1.8060
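Despite the class imbalance (three positive labels but only one negative label), the weighted loss compensates by giving the negative label more weight than the positive labels.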
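These notes only show the weighted loss for model 1. The sketch below repeats the same two-term calculation for model 2 to check the claim that both models receive the same weighted loss; the helper `weighted_loss` is a hypothetical wrapper around the expressions used above, not a function from the course.

```python
import numpy as np

# Labels, predictions and weights as defined earlier in this section.
y_true = np.array([[1], [1], [1], [0]])
y_pred_1 = 0.9 * np.ones(y_true.shape)
y_pred_2 = 0.1 * np.ones(y_true.shape)
w_p, w_n = 1 / 4, 3 / 4

def weighted_loss(y_true, y_pred, w_p, w_n):
    """Sum of the weighted positive term and the weighted negative term."""
    loss_pos = -1 * np.sum(w_p * y_true * np.log(y_pred))
    loss_neg = -1 * np.sum(w_n * (1 - y_true) * np.log(1 - y_pred))
    return float(loss_pos + loss_neg)

loss_1 = weighted_loss(y_true, y_pred_1, w_p, w_n)
loss_2 = weighted_loss(y_true, y_pred_2, w_p, w_n)

print(f"weighted loss, model 1 (always 0.9): {loss_1:.4f}")
print(f"weighted loss, model 2 (always 0.1): {loss_2:.4f}")
```

For model 2 the roles of the two terms swap: the positive term becomes the large one and the negative term the small one, but their sum is unchanged.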
Weighted loss for multiple classes
