The Fastest Way into Kaggle: Top Solutions from 318 Competitions, Study Them and Get Started!

计算机与软件考研

Published on 2023-09-06 14:00:51

Tags: deep learning, artificial intelligence, machine learning, neural networks

Original link: https://mp.weixin.qq.com/s?__biz=MzU2ODAzNTMyMg==&mid=2247583765&idx=1&sn=a6d4c4827e1d089856227aef05ad167f&chksm=fc97da80cbe05396059dfa88e21bfa9fd55c2a9c2fdb1a31144dc468ee520997396e3f9f8369&scene=126&sessionid=0

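Kaggle competitions are one of the most cost-effective ways for students to strengthen their background. Experience as a Kaggle prize winner helps you stand out among many competitors and gives strong backing to applications and job hunting.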

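Many students worry that they lack competition experience and cannot get started. When you are starting out, studying the top solutions written by strong Kaggle competitors is an efficient way to find an approach to a competition. I have compiled top solutions from 318 competitions; students who need them can scan the QR code to get them.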

Scan the QR code and reply "比赛" ("competition")

to receive the 318 top solutions

Kaggle has also recently launched a computer vision (CV) competition: RSNA 2023 Abdominal Trauma Detection. It is well suited to hands-on practice.

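Together with Mozak, a Kaggle competitor ranked in the top 1000, I have prepared a lecture on this competition that walks through a high-scoring baseline in detail. The lecture is free to watch; scan the QR code to unlock it.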

Scan the QR code and reply "比赛" ("competition")

to watch the competition lecture for free

The approach to this competition is as follows:

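Data preprocessing and image feature extraction

Use a Python library such as pydicom to load the images. Because the DICOM data can contain multiple image series, organize the files by patient_id and series_id.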
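The grouping step can be sketched as follows. This is a minimal illustration, not the lecture's code: the function name `group_dicom_paths` is hypothetical, and it assumes the directory layout `<root>/<patient_id>/<series_id>/<name>.dcm`, which matches the glob pattern used in the prediction code of this article. It also assumes that numeric file names are slice numbers, which may not hold for every dataset.

```python
from collections import defaultdict
from pathlib import Path


def group_dicom_paths(root):
    """Group DICOM file paths by (patient_id, series_id).

    Assumes the layout <root>/<patient_id>/<series_id>/<name>.dcm.
    """
    groups = defaultdict(list)
    for path in Path(root).glob("*/*/*.dcm"):
        series_id = path.parent.name
        patient_id = path.parent.parent.name
        groups[(patient_id, series_id)].append(path)

    def slice_key(p):
        # Assumption: numeric file stems are slice numbers, so sort them as
        # integers; any non-numeric names are placed afterwards in name order.
        if p.stem.isdecimal():
            return (0, int(p.stem), "")
        return (1, 0, p.stem)

    return {key: sorted(paths, key=slice_key) for key, paths in groups.items()}
```

Each path in a group can then be read with `pydicom.dcmread(path)`. Sorting numerically matters because a plain string sort puts `10.dcm` before `2.dcm`.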

Model selection

A classic convolutional neural network architecture such as ResNet, DenseNet, or EfficientNet can be used.

Model training and validation

Split the data into a training set and a validation set, which are used to train and tune the model.
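One way to do the split is sketched below. It assumes the split is made per patient, so that all series of one patient land on the same side; the article does not say how the lecture splits the data, and `split_by_patient`, `val_fraction`, and `seed` are names introduced here for illustration.

```python
import random


def split_by_patient(patient_ids, val_fraction=0.2, seed=42):
    """Split patient IDs into disjoint training and validation sets."""
    unique_ids = sorted(set(patient_ids))  # deduplicate; sort for reproducibility
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_val = max(1, int(len(unique_ids) * val_fraction))
    val_ids = set(unique_ids[:n_val])
    train_ids = set(unique_ids[n_val:])
    return train_ids, val_ids


# Usage with made-up IDs; duplicates stand for patients with several series.
train_ids, val_ids = split_by_patient(list(range(100)) + list(range(50)))
```

Splitting on patients rather than on individual images avoids the situation where near-identical slices of one scan appear in both sets and inflate the validation score.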

Model inference and test set evaluation

Run inference with the trained model to produce predictions for the images in the test set.

Some of the key code follows:

Converting DICOM to JPG

import numpy as np
from pydicom.multival import MultiValue


def dicom_to_image(dicom_image):
    """
    Convert a pydicom dataset into a windowed, normalized uint8 image.
    """
    pixel_array = dicom_image.pixel_array

    # Signed data stored in fewer bits than allocated: sign-extend the stored bits.
    if dicom_image.PixelRepresentation == 1:
        bit_shift = dicom_image.BitsAllocated - dicom_image.BitsStored
        dtype = pixel_array.dtype
        pixel_array = (pixel_array << bit_shift).astype(dtype) >> bit_shift

    # Transform to Hounsfield units (rescale applied exactly once, in floating point).
    intercept = float(dicom_image.RescaleIntercept)
    slope = float(dicom_image.RescaleSlope)
    pixel_array = pixel_array.astype(np.float32) * slope + intercept

    # Windowing; WindowCenter/WindowWidth may be multi-valued, so take the first value.
    window_center = dicom_image.WindowCenter
    window_width = dicom_image.WindowWidth
    if isinstance(window_center, MultiValue):
        window_center = window_center[0]
    if isinstance(window_width, MultiValue):
        window_width = window_width[0]
    img_min = int(window_center) - int(window_width) // 2
    img_max = int(window_center) + int(window_width) // 2
    pixel_array = np.clip(pixel_array, img_min, img_max)

    # Normalization to [0, 1]; guard against a constant image.
    value_range = pixel_array.max() - pixel_array.min()
    if value_range > 0:
        pixel_array = (pixel_array - pixel_array.min()) / value_range
    else:
        pixel_array = np.zeros_like(pixel_array)

    # MONOCHROME1 stores inverted intensities; flip after normalization.
    if dicom_image.PhotometricInterpretation == "MONOCHROME1":
        pixel_array = 1 - pixel_array

    return (pixel_array * 255).astype(np.uint8)
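To make the windowing step concrete, here is a small worked example on a synthetic array of Hounsfield units. It is a sketch only: `apply_window` is a hypothetical helper, the center and width values are illustrative, and unlike the function above it scales by the window bounds rather than by the minimum and maximum of the clipped array.

```python
import numpy as np


def apply_window(hu, center, width):
    """Clip Hounsfield units to a window and scale the result to uint8."""
    lo = center - width / 2
    hi = center + width / 2
    clipped = np.clip(hu.astype(np.float32), lo, hi)
    scaled = (clipped - lo) / (hi - lo)  # window bounds map to 0.0 and 1.0
    return (scaled * 255).astype(np.uint8)


# Synthetic values from far below to far above the window [-150, 250].
hu = np.array([[-1000, -150, 50], [250, 400, 1000]])
img = apply_window(hu, center=50, width=400)
print(img)
# [[  0   0 127]
#  [255 255 255]]
```

Everything at or below the lower bound becomes 0, everything at or above the upper bound becomes 255, and the window center lands at mid-gray.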

Model prediction

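import glob

import pydicom
import torch
from PIL import Image

# train_pids, transform, and model are assumed to be defined earlier in the notebook.
model.eval()
pred_list = []
for pid in train_pids[:10]:
    # All DICOM files of this patient, across every series.
    pid_paths_dcm_paths = glob.glob('/kaggle/input/rsna-2023-abdominal-trauma-detection/train_images/' + str(pid) + '/*/*')
    pid_paths_dcm_paths.sort()  # lexicographic order of the path strings

    # Keep only the last five files of the sorted list.
    pid_paths_dcm_paths = pid_paths_dcm_paths[-5:]
    imgs = [Image.fromarray(dicom_to_image(pydicom.dcmread(x))) for x in pid_paths_dcm_paths]
    imgs = [transform(x) for x in imgs]

    # Each transformed image is expected to be (1, H, W); build an (N, 1, H, W) batch.
    imgs = torch.cat(imgs, 0)
    imgs = imgs[:, None, :, :]
    with torch.no_grad():
        imgs = imgs.cuda()
        output = model(imgs)[:, :-1]  # drop the last output column
        pred = torch.sigmoid(output).cpu().numpy().round(3)
        pred = pred.mean(0)  # average the per-slice probabilities

    pred_list.append(pred)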
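The aggregation at the end of the loop (sigmoid per slice, then a mean over slices) can be reproduced in isolation with NumPy. The sketch below uses made-up logits and a hypothetical helper name, `aggregate_slice_logits`; it only illustrates the arithmetic and is not part of the original pipeline.

```python
import numpy as np


def aggregate_slice_logits(logits):
    """Apply a sigmoid to each slice's logits, then average over the slices."""
    logits = np.asarray(logits, dtype=np.float64)
    probs = 1.0 / (1.0 + np.exp(-logits))  # shape: (n_slices, n_targets)
    return probs.mean(axis=0)              # shape: (n_targets,)


# Two slices, two targets (made-up values).
logits = np.array([[0.0, 2.0], [0.0, -2.0]])
patient_pred = aggregate_slice_logits(logits)
```

Averaging probabilities is a design choice: it is generally not the same as applying the sigmoid to the averaged logits, because the sigmoid is nonlinear.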
