After setting up the nnUNet_raw folder, the next step is to preprocess the raw data.
The entry point for the nnUNetv2_plan_and_preprocess command is the plan_and_preprocess_entry function in nnunetv2/experiment_planning/plan_and_preprocess_entrypoints.py.
The function does three things: it generates dataset_fingerprint.json, generates nnUNetPlans.json, and preprocesses the dataset based on those two files. The code structure is fairly clear.
Generating dataset_fingerprint.json
Reading plan_and_preprocess_entry, we find the entry point for generating dataset_fingerprint.json:
# fingerprint extraction
print("Fingerprint extraction...")
extract_fingerprints(args.d, args.fpe, args.npfp, args.verify_dataset_integrity, args.clean, args.verbose)
Stepping into extract_fingerprints:
def extract_fingerprints(dataset_ids: List[int], fingerprint_extractor_class_name: str = 'DatasetFingerprintExtractor',
                         num_processes: int = default_num_processes, check_dataset_integrity: bool = False,
                         clean: bool = True, verbose: bool = True):
    """
    clean = False will not actually run this. This is just a switch for use with nnUNetv2_plan_and_preprocess where
    we don't want to rerun fingerprint extraction every time.
    """
    fingerprint_extractor_class = recursive_find_python_class(join(nnunetv2.__path__[0], "experiment_planning"),
                                                              fingerprint_extractor_class_name,
                                                              current_module="nnunetv2.experiment_planning")
    # fingerprint_extractor_class defaults to
    # nnunetv2.experiment_planning.dataset_fingerprint.fingerprint_extractor.DatasetFingerprintExtractor
    for d in dataset_ids:
        extract_fingerprint_dataset(d, fingerprint_extractor_class, num_processes, check_dataset_integrity, clean,
                                    verbose)
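The recursive_find_python_class call above scans a package for a class with a given name, which is what lets a custom extractor be selected purely by its class name on the command line. A minimal stdlib-only sketch of the idea (find_class_by_name and the xml.etree demo are illustrative stand-ins, not nnU-Net code):

```python
import importlib
import pkgutil

def find_class_by_name(package_name: str, class_name: str):
    """Walk every module under a package and return the first attribute whose
    name matches class_name -- the idea behind recursive_find_python_class."""
    package = importlib.import_module(package_name)
    for _, mod_name, _ in pkgutil.walk_packages(package.__path__, package_name + "."):
        try:
            module = importlib.import_module(mod_name)
        except ImportError:
            continue  # skip modules with missing optional dependencies
        if hasattr(module, class_name):
            return getattr(module, class_name)
    return None

# demo: locate the Element class somewhere inside the stdlib xml.etree package
print(find_class_by_name("xml.etree", "Element"))
```

Because the lookup is by name only, dropping a file that defines a class called DatasetFingerprintExtractor anywhere under nnunetv2/experiment_planning makes it discoverable without touching the entry-point code.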
fingerprint_extractor_class defaults to DatasetFingerprintExtractor; its full path is given in the comment above. Stepping into extract_fingerprint_dataset:
def extract_fingerprint_dataset(dataset_id: int,
                                fingerprint_extractor_class: Type[
                                    DatasetFingerprintExtractor] = DatasetFingerprintExtractor,
                                num_processes: int = default_num_processes, check_dataset_integrity: bool = False,
                                clean: bool = True, verbose: bool = True):
    """
    Returns the fingerprint as a dictionary (additionally to saving it)
    """
    dataset_name = convert_id_to_dataset_name(dataset_id)
    print(dataset_name)
    if check_dataset_integrity:
        verify_dataset_integrity(join(nnUNet_raw, dataset_name), num_processes)
    fpe = fingerprint_extractor_class(dataset_id, num_processes, verbose=verbose)
    return fpe.run(overwrite_existing=clean)
The function first resolves the dataset name, optionally verifies that the directory layout under nnUNet_raw/<dataset name> is correct, then instantiates the fingerprint extractor class and finally calls its run method.
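The clean flag, forwarded as overwrite_existing, is essentially a cache switch: reuse the fingerprint saved on disk unless a recomputation is forced. A hedged sketch of that pattern (run_cached is an illustrative name, not nnU-Net's API):

```python
import json
import os

def run_cached(output_file: str, compute, overwrite_existing: bool = False):
    """Reuse a previously saved JSON result unless overwrite_existing is True --
    the same switch extract_fingerprints exposes as `clean`."""
    if os.path.isfile(output_file) and not overwrite_existing:
        with open(output_file) as f:
            return json.load(f)   # cached: skip the expensive computation
    result = compute()            # expensive step, e.g. scanning all images
    with open(output_file, 'w') as f:
        json.dump(result, f)
    return result
```

This is why rerunning nnUNetv2_plan_and_preprocess without --clean does not re-extract the fingerprint every time.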
For a walkthrough of DatasetFingerprintExtractor, see:
阅读nnUNet V2代码——生成dataset_fingerprint.json-CSDN博客
Generating nnUNetPlans.json
Reading plan_and_preprocess_entry, we find the entry point for generating nnUNetPlans.json:
# experiment planning
print('Experiment planning...')
plans_identifier = plan_experiments(args.d, args.pl, args.gpu_memory_target, args.preprocessor_name,
                                    args.overwrite_target_spacing, args.overwrite_plans_name)
Stepping into plan_experiments:
def plan_experiments(dataset_ids: List[int], experiment_planner_class_name: str = 'ExperimentPlanner',
                     gpu_memory_target_in_gb: float = None, preprocess_class_name: str = 'DefaultPreprocessor',
                     overwrite_target_spacing: Optional[Tuple[float, ...]] = None,
                     overwrite_plans_name: Optional[str] = None):
    """
    overwrite_target_spacing ONLY applies to 3d_fullres and 3d_cascade fullres!
    """
    if experiment_planner_class_name == 'ExperimentPlanner':
        print("\n############################\n"
              "INFO: You are using the old nnU-Net default planner. We have updated our recommendations. "
              "Please consider using those instead! "
              "Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_presets.md"
              "\n############################\n")
    experiment_planner = recursive_find_python_class(join(nnunetv2.__path__[0], "experiment_planning"),
                                                     experiment_planner_class_name,
                                                     current_module="nnunetv2.experiment_planning")
    # experiment_planner defaults to
    # nnunetv2.experiment_planning.experiment_planners.default_experiment_planner.ExperimentPlanner
    print(experiment_planner)
    plans_identifier = None
    for d in dataset_ids:
        _, plans_identifier = plan_experiment_dataset(d, experiment_planner, gpu_memory_target_in_gb,
                                                      preprocess_class_name,
                                                      overwrite_target_spacing, overwrite_plans_name)
    return plans_identifier
experiment_planner defaults to the ExperimentPlanner class; its full path is given in the comment above. Stepping into plan_experiment_dataset:
def plan_experiment_dataset(dataset_id: int,
                            experiment_planner_class: Type[ExperimentPlanner] = ExperimentPlanner,
                            gpu_memory_target_in_gb: float = None, preprocess_class_name: str = 'DefaultPreprocessor',
                            overwrite_target_spacing: Optional[Tuple[float, ...]] = None,
                            overwrite_plans_name: Optional[str] = None) -> Tuple[dict, str]:
    """
    overwrite_target_spacing ONLY applies to 3d_fullres and 3d_cascade fullres!
    """
    kwargs = {}
    if overwrite_plans_name is not None:
        kwargs['plans_name'] = overwrite_plans_name
    if gpu_memory_target_in_gb is not None:
        kwargs['gpu_memory_target_in_gb'] = gpu_memory_target_in_gb
    planner = experiment_planner_class(dataset_id,
                                       preprocessor_name=preprocess_class_name,
                                       overwrite_target_spacing=[float(i) for i in overwrite_target_spacing] if
                                       overwrite_target_spacing is not None else overwrite_target_spacing,
                                       suppress_transpose=False,  # might expose this later
                                       **kwargs
                                       )
    ret = planner.plan_experiment()
    return ret, planner.plans_identifier
The function first assembles up to two optional keyword arguments, then instantiates the experiment planner class, runs its plan_experiment method, and finally returns the plan together with the plans identifier.
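The kwargs dict is built conditionally so that, when no override is given, the planner's own default arguments stay in effect; passing plans_name=None explicitly would clobber them. A toy illustration of the pattern (make_kwargs and Planner are stand-ins, not the real nnU-Net classes):

```python
def make_kwargs(**overrides):
    """Keep only the overrides the caller actually provided."""
    return {k: v for k, v in overrides.items() if v is not None}

class Planner:
    # stand-in with the same default-argument behaviour as ExperimentPlanner
    def __init__(self, plans_name='nnUNetPlans', gpu_memory_target_in_gb=8.0):
        self.plans_name = plans_name
        self.gpu_memory_target_in_gb = gpu_memory_target_in_gb

# no overrides: the class defaults survive
p = Planner(**make_kwargs(plans_name=None, gpu_memory_target_in_gb=None))
print(p.plans_name)  # nnUNetPlans
# one override: only that default is replaced
q = Planner(**make_kwargs(plans_name='MyPlans', gpu_memory_target_in_gb=None))
print(q.plans_name, q.gpu_memory_target_in_gb)
```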
For a walkthrough of ExperimentPlanner, see:
阅读nnUNet V2代码——生成nnUNetPlans.json—__init__函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—determine_fullres_target_spacing函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—determine_transpose函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—static_estimate_VRAM_usage函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—get_plans_for_configuration函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—determine_resampling函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—determine_segmentation_softmax_export_fn函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—determine_normalization_scheme_and_whether_mask_is_used_for_norm函数-CSDN博客
阅读nnUNet V2代码——生成nnUNetPlans.json—plan_experiment函数-CSDN博客
Other functions in the ExperimentPlanner class, such as load_plans and save_plans, are not covered here.
Preprocessing the data
Entry point:
if not args.no_pp:
    print('Preprocessing...')
    preprocess(args.d, plans_identifier, args.c, args.np, args.verbose)
Stepping into preprocess:
def preprocess(dataset_ids: List[int],
               plans_identifier: str = 'nnUNetPlans',
               configurations: Union[Tuple[str], List[str]] = ('2d', '3d_fullres', '3d_lowres'),
               num_processes: Union[int, Tuple[int, ...], List[int]] = (8, 4, 8),
               verbose: bool = False):
    for d in dataset_ids:
        preprocess_dataset(d, plans_identifier, configurations, num_processes, verbose)
The parameters are the dataset IDs, the plans identifier (nnUNetPlans), the configurations (2d, 3d_fullres, etc.), the number of processes, and a verbosity flag.
The function preprocesses each dataset in turn. Stepping into preprocess_dataset:
def preprocess_dataset(dataset_id: int,
                       plans_identifier: str = 'nnUNetPlans',
                       configurations: Union[Tuple[str], List[str]] = ('2d', '3d_fullres', '3d_lowres'),
                       num_processes: Union[int, Tuple[int, ...], List[int]] = (8, 4, 8),
                       verbose: bool = False) -> None:
    if not isinstance(num_processes, list):
        num_processes = list(num_processes)
    if len(num_processes) == 1:
        num_processes = num_processes * len(configurations)
    if len(num_processes) != len(configurations):
        raise RuntimeError(
            f'The list provided with num_processes must either have len 1 or as many elements as there are '
            f'configurations (see --help). Number of configurations: {len(configurations)}, length '
            f'of num_processes: '
            f'{len(num_processes)}')
    dataset_name = convert_id_to_dataset_name(dataset_id)
    print(f'Preprocessing dataset {dataset_name}')
    plans_file = join(nnUNet_preprocessed, dataset_name, plans_identifier + '.json')
    plans_manager = PlansManager(plans_file)
    for n, c in zip(num_processes, configurations):
        print(f'Configuration: {c}...')
        if c not in plans_manager.available_configurations:
            print(
                f"INFO: Configuration {c} not found in plans file {plans_identifier + '.json'} of "
                f"dataset {dataset_name}. Skipping.")
            continue
        configuration_manager = plans_manager.get_configuration(c)
        preprocessor = configuration_manager.preprocessor_class(verbose=verbose)
        preprocessor.run(dataset_id, c, plans_identifier, num_processes=n)
    # copy the gt to a folder in the nnUNet_preprocessed so that we can do validation even if the raw data is no
    # longer there (useful for compute cluster where only the preprocessed data is available)
    from distutils.file_util import copy_file
    maybe_mkdir_p(join(nnUNet_preprocessed, dataset_name, 'gt_segmentations'))
    dataset_json = load_json(join(nnUNet_raw, dataset_name, 'dataset.json'))
    dataset = get_filenames_of_train_images_and_targets(join(nnUNet_raw, dataset_name), dataset_json)
    # only copy files that are newer than the ones already present
    for k in dataset:
        copy_file(dataset[k]['label'],
                  join(nnUNet_preprocessed, dataset_name, 'gt_segmentations', k + dataset_json['file_ending']),
                  update=True)
The function first checks that the num_processes argument is compatible with the configurations.
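That validation boils down to broadcasting: a single value is repeated once per configuration, otherwise the lengths must match. A small sketch of the logic (the function name is illustrative; the int branch is an added convenience, the real code expects a tuple or list):

```python
from typing import List, Sequence, Union

def broadcast_num_processes(num_processes: Union[int, Sequence[int]],
                            configurations: Sequence[str]) -> List[int]:
    """Repeat a single value once per configuration; otherwise require
    matching lengths -- the check at the top of preprocess_dataset."""
    if isinstance(num_processes, int):
        num_processes = [num_processes]        # convenience branch, added here
    if not isinstance(num_processes, list):
        num_processes = list(num_processes)
    if len(num_processes) == 1:
        num_processes = num_processes * len(configurations)
    if len(num_processes) != len(configurations):
        raise RuntimeError('num_processes must have length 1 or one entry per configuration')
    return num_processes

print(broadcast_num_processes((8,), ('2d', '3d_fullres', '3d_lowres')))  # [8, 8, 8]
```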
It then resolves the dataset name and loads nnUNetPlans.json by instantiating the PlansManager class.
Next, a for loop preprocesses the dataset once per configuration (2d, 3d_fullres, etc.); the internal details come later.
After the loop, dataset.json is read to determine the training cases, and the mask files under nnUNet_raw/<dataset name>/labelsTr (see the get_filenames_of_train_images_and_targets walkthrough for exactly which files) are copied into the nnUNet_preprocessed/<dataset name>/gt_segmentations folder.
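copy_file(..., update=True) only copies when the destination is missing or older than the source. Since distutils was removed in Python 3.12 (PEP 632), here is an equivalent stdlib sketch (copy_if_newer is an illustrative name):

```python
import os
import shutil

def copy_if_newer(src: str, dst: str) -> bool:
    """Mirror distutils copy_file(update=True): copy only when dst is missing
    or older than src. Returns True if a copy actually happened."""
    if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
        return False              # destination is already up to date
    shutil.copy2(src, dst)        # copy2 preserves timestamps, as distutils did
    return True
```

This makes rerunning the preprocessing cheap: unchanged ground-truth masks are skipped on the second pass.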
Back inside the for loop: the three lines starting at configuration_manager involve three classes: the PlansManager class mentioned above, the ConfigurationManager class defined in the same file as PlansManager, and the DefaultPreprocessor class.
PlansManager and ConfigurationManager are service classes: they supply DefaultPreprocessor with information drawn from the two JSON files generated above (first the fingerprint statistics, then the plan derived from them). DefaultPreprocessor performs the actual data preprocessing; see the DefaultPreprocessor link above for its code.
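That division of labour can be sketched with a toy pair of classes (TinyPlansManager and TinyConfigurationManager are illustrative stand-ins, not the real nnU-Net API; the plans dict shape below only approximates nnUNetPlans.json):

```python
class TinyConfigurationManager:
    """Stand-in for ConfigurationManager: a view onto one configuration's
    sub-dictionary of the plans."""
    def __init__(self, config: dict):
        self.config = config

    @property
    def patch_size(self):
        return self.config['patch_size']


class TinyPlansManager:
    """Stand-in for PlansManager: owns the whole plans dict and hands out
    per-configuration views, as the three lines in the for loop do."""
    def __init__(self, plans: dict):
        self.plans = plans

    @property
    def available_configurations(self):
        return list(self.plans['configurations'].keys())

    def get_configuration(self, name: str) -> TinyConfigurationManager:
        return TinyConfigurationManager(self.plans['configurations'][name])


plans = {'configurations': {'2d': {'patch_size': [512, 512]},
                            '3d_fullres': {'patch_size': [128, 128, 128]}}}
pm = TinyPlansManager(plans)
print(pm.available_configurations)            # ['2d', '3d_fullres']
print(pm.get_configuration('2d').patch_size)  # [512, 512]
```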
This completes the walkthrough of the nnUNetv2_plan_and_preprocess command.
Next up is the training code.