Kaggle in Practice


See this link for the competition details.

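The approach shown in this article trains three separate neural network models.

  • The data is fed in, each model is trained, and the models are then evaluated with K-fold cross-validation.
  • This solution does not meet the competition's requirements, however: it does not output the predicted bounding boxes the competition asks for, and its evaluation metric also differs from the one the competition uses.
  • Strengths: simple to implement and easy to understand. During model evaluation it shows high prediction accuracy.
  • Weaknesses: running it on the resources Kaggle provides fails with an out-of-memory error.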
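
The K-fold evaluation mentioned above can be sketched roughly as follows. This is a minimal illustration, not the article's actual pipeline: the synthetic data, the `LogisticRegression` stand-in for the neural networks, and the choice of five folds are all assumptions made so the example runs quickly on its own. `KFold` and `roc_auc_score` are the same scikit-learn utilities imported later in the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

# Synthetic binary-classification data (assumption: stands in for image features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_aucs = []

for train_idx, val_idx in kf.split(X):
    # A fresh model per fold, so no information leaks between folds.
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])

    # Score the held-out fold with ROC AUC on the predicted probabilities.
    probs = model.predict_proba(X[val_idx])[:, 1]
    fold_aucs.append(roc_auc_score(y[val_idx], probs))

mean_auc = float(np.mean(fold_aucs))
print(f"AUC per fold: {np.round(fold_aucs, 3)}, mean: {mean_auc:.3f}")
```

Each sample is used for validation exactly once, so the mean over folds is a less noisy estimate than a single train/validation split.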

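Building a Neural Network to Predict Pneumonia

See this link for the full details.

Part 0: Preparation

See this link for the data source.
First, import the required packages.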

# Imports
import os
import cv2
import glob
import time
import pydicom
import skimage
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from skimage import feature, filters
%matplotlib inline

from functools import partial
from collections import defaultdict
from joblib import Parallel, delayed
from lightgbm import LGBMClassifier
from tqdm import tqdm

# Tensorflow / Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import *
from tensorflow.keras import Model
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras import models
from tensorflow.keras import layers

# sklearn
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV

sns.set_style('whitegrid')
import warnings
warnings.filterwarnings('ignore')	# suppress warnings that do not affect execution

Next, set up the data paths.

# Paths to the data
trainImagesPath = "../input/rsna-pneumonia-detection-challenge/stage_2_train_images"
testImagesPath = "../input/rsna-pneumonia-detection-challenge/stage_2_test_images"

labelsPath = "../input/rsna-pneumonia-detection-challenge/stage_2_train_labels.csv"
classInfoPath = "../input/rsna-pneumonia-detection-challenge/stage_2_detailed_class_info.csv"

# Read the labels and class info
labels = pd.read_csv(labelsPath)
details = pd.read_csv(classInfoPath)
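
Before combining the two tables it helps to know their shape. The sketch below uses small made-up frames rather than the real CSV files, and assumes the usual column layout of this dataset (`patientId`, `x`, `y`, `width`, `height`, `Target` in the labels file; `patientId`, `class` in the class-info file), with one row per bounding box in both. Under that assumption, a naive join on `patientId` multiplies rows for patients with several boxes, so the class info is deduplicated first.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (values are made up for illustration).
toy_labels = pd.DataFrame({
    "patientId": ["p1", "p2", "p2", "p3"],
    "x": [None, 10.0, 300.0, None],
    "y": [None, 20.0, 40.0, None],
    "width": [None, 100.0, 120.0, None],
    "height": [None, 150.0, 160.0, None],
    "Target": [0, 1, 1, 0],
})
toy_details = pd.DataFrame({
    "patientId": ["p1", "p2", "p2", "p3"],
    "class": ["Normal", "Lung Opacity", "Lung Opacity",
              "No Lung Opacity / Not Normal"],
})

# A naive join pairs every label row of p2 with every class row of p2.
naive = toy_labels.merge(toy_details, on="patientId")

# Deduplicating the class info first keeps one output row per label row.
details_per_patient = toy_details.drop_duplicates("patientId")
merged = toy_labels.merge(details_per_patient, on="patientId", how="left")

# Number of boxes per patient (0 for patients without pneumonia).
boxes_per_patient = merged.groupby("patientId")["Target"].sum()
print(len(naive), len(merged))
print(boxes_per_patient)
```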

Part 1: Getting the training and test data into a suitable format

"""
@Description: Reads the DICOM files at the given paths and returns the resulting datasets
@Inputs: An array of filepaths for the images
@Output: An array of the datasets read from those files (metadata only; the pixel data is skipped)
"""
def readDicomData(data):
    
    res = []
    
    for filePath in tqdm(data): # Loop over data
        
        # stop_before_pixels=True reads only the metadata and skips the pixel data (saves time and memory)
        f = pydicom.dcmread(filePath, stop_before_pixels=True)
        res.append(f)
    
    return res

# Collect the training and test file paths
trainFilepaths = glob.glob(f"{trainImagesPath}/*.dcm")
testFilepaths = glob.glob(f"{testImagesPath}/*.dcm")

# Read the data into arrays (only the first 5000 training files are used)
trainImages = readDicomData(trainFilepaths[:5000])
testImages = readDicomData(testFilepaths)

100%|██████████| 5000/5000 [00:46<00:00, 107.50it/s]
100%|██████████| 3000/3000 [00:27<00:00, 110.21it/s]

Part 2: Balancing the data

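# Note: stage_2_train_labels.csv has one row per bounding box, so a patient
# with several boxes is counted several times in COUNT_PNE
COUNT_NORMAL = len(labels.loc[labels['Target'] == 0]) # label rows without pneumonia
COUNT_PNE = len(labels.loc[labels['Target'] == 1]) # label rows with pneumonia
TRAIN_IMG_COUNT = len(trainFilepaths) # total number of training images

# Compute the weight for each class
weight_for_0 = (1 / COUNT_NORMAL)*(TRAIN_IMG_COUNT)/2.0
weight_for_1 = (1 / COUNT_PNE)*(TRAIN_IMG_COUNT)/2.0

classWeight = {0: weight_for_0, 1: weight_for_1}

print(f"Weights: {classWeight}")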

Weights: {0: 0.6454140866873065, 1: 1.3963369963369963}
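
The weights above follow the common "balanced" heuristic, weight_c = N / (K · n_c), where N is the sample count, K the number of classes and n_c the count of class c. The sketch below is an illustration on made-up data, not the article's code: `balanced_class_weights` and the toy frame are hypothetical names. It shows how the result changes depending on whether rows (one per bounding box) or patients are counted. Note that the article's version mixes the two, using the image count as N and row counts as n_c.

```python
import pandas as pd

def balanced_class_weights(targets):
    """Return {class: n_samples / (n_classes * class_count)}."""
    counts = pd.Series(targets).value_counts()
    n_samples = counts.sum()
    n_classes = len(counts)
    return {int(c): float(n_samples / (n_classes * n)) for c, n in counts.items()}

# Toy stand-in for the labels file: patient "b" has two boxes, hence two rows.
toy = pd.DataFrame({
    "patientId": ["a", "b", "b", "c", "d"],
    "Target":    [0,   1,   1,   0,   0],
})

# Counting rows: the two boxes of patient "b" both count as positives.
per_row = balanced_class_weights(toy["Target"])

# Counting patients: each patient contributes exactly one label.
per_patient = balanced_class_weights(toy.drop_duplicates("patientId")["Target"])

print("per row:    ", per_row)
print("per patient:", per_patient)
```

A dictionary of this form is what Keras accepts as the `class_weight` argument of `Model.fit`, so the minority class contributes more to the loss.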

Part 3: Obtaining train_y and test_y

"""
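@Description: Parses the metadata contained in a medical (DICOM) image

@Inputs: A DICOM image after it has been read

@Output: Returns the unpacked data and the (group, element) to keyword mapping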
"""
def parseMetadata(dcm):
    
    unpackedData = {}
    groupElemToKeywords = {}
    
    for d in dcm: # Iterate here to force conversion from lazy RawDataElement to DataElement
        pass
    
    # Un-pack Data
    for tag, elem in dcm.items():
        tagGroup = tag.group
        tagElem = tag.elem
        keyword = elem.keyword
        groupElemToKeywords[(tagGroup, tagElem)] = keyword
        value = elem.value
        unpackedData[keyword] = value
        
    return unpackedData, groupElemToKeywords

# Parse the metadata into dictionaries
trainMetaDicts, trainKeyword = zip(*[parseMetadata(x) for x in tqdm(trainImages)])
testMetaDicts, testKeyword = zip(*[parseMetadata(x) for x in tqdm(testImages)])

100%|██████████| 5000/5000 [00:04<00:00, 1123.70it/s]
100%|██████████| 3000/3000 [00:02<00:00, 1279.92it/s]
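
The next function expects a DataFrame of metadata, so the per-image dictionaries have to be turned into one. A minimal sketch of that step is shown here, assuming plain Python values; `toy_meta_dicts` is a hypothetical stand-in for `trainMetaDicts`, and real DICOM values may be pydicom-specific types that need converting first.

```python
import pandas as pd

# Hypothetical stand-ins for the dictionaries returned by parseMetadata.
toy_meta_dicts = (
    {"PatientID": "p1", "PatientSex": "F", "SeriesDescription": "view: PA"},
    {"PatientID": "p2", "PatientSex": "M", "SeriesDescription": "view: AP"},
    {"PatientID": "p3", "PatientSex": "M"},  # a missing key becomes NaN
)

# zip(*...) yields tuples, so convert to a list of dicts before building the frame.
meta_df = pd.DataFrame(list(toy_meta_dicts))

# Column-wise comparison gives a boolean Series; NaN compares as False.
is_pa = meta_df["SeriesDescription"] == "view: PA"
print(meta_df)
print(is_pa.tolist())
```

Each dictionary key becomes a column, so a keyword such as `SeriesDescription` can then be selected by name.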

"""
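@Description: Goes through the DICOM image information and returns 1 or 0 (depending on whether or not the image contains pneumonia)
@Inputs: A DataFrame containing the metadata

@Output: Returns Y (i.e. the target Y for our training and test data)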
"""
def createY(df):
    y = (df['SeriesDescription'] == 'view: PA')
    Y = np.zeros(len(y)