python -- Matching paths, removing specified paths, and saving the result

简朴-ocean

Category: python-ocean data processing. Tags: python, windows, programming languages

Article link: https://blog.csdn.net/weixin_44237337/article/details/132620130

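When processing NetCDF (nc) data, some files end up with too many zero values after interpolation, which makes label generation fail for those time steps. For a full year of data, losing a few time steps does not matter, so I simply record the paths of the files that raised errors. This makes it easy to exclude them when reading other variables later and keeps the downstream processing running.
The main code is recorded below and consists of the following parts:
1. Record the paths of the files that raised errors
2. Remove the failing paths from the original path lists
3. Match the paths of the other dataset after removal, and save them

1. Record the paths of the files that raised errors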

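import time

import numpy as np

# `data` (a list of nc file paths) and `process_cloud` (the label-making
# function) are defined elsewhere in the processing script.
skipped_files = []  # paths of the files that raised an error and were skipped
cloud_label = []
start = time.time()
for filename in data:
    print(filename)
    try:
        cloud_data = process_cloud(filename)
        cloud_label.append(cloud_data)
    except Exception as e:
        print(f"Error occurred while processing {filename}: {e}")
        skipped_files.append(filename)
print(f"Elapsed: {time.time() - start:.1f} s")

cloud_label = np.array(cloud_label)
np.savez_compressed('cloud_label', cloud_label=cloud_label)
# Write one skipped path per line so the list can be reused later
if skipped_files:
    with open("skipped_files.txt", "w") as f:
        f.write("\n".join(skipped_files))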
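The record-and-skip pattern above can be tried in isolation. The following is a minimal sketch, assuming a stand-in function `fake_process` in place of `process_cloud` and made-up file names; neither comes from the original script.

```python
def collect_with_skips(items, process):
    """Apply `process` to each item; collect results and the items that failed."""
    results, skipped = [], []
    for item in items:
        try:
            results.append(process(item))
        except Exception as exc:
            print(f"Skipping {item}: {exc}")
            skipped.append(item)
    return results, skipped


def fake_process(name):
    # Hypothetical stand-in for process_cloud: fails on names containing "bad"
    if "bad" in name:
        raise ValueError("too many zero values")
    return len(name)


results, skipped = collect_with_skips(["a.nc", "bad_b.nc", "cc.nc"], fake_process)
print(results)
print(skipped)
```

Catching `Exception` broadly keeps the loop running for any failure, at the cost of hiding unexpected bugs, so it is worth reading the printed messages after a run.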

2. Remove the failing paths from the original path lists

The original path lists still contain the paths that raised errors.

import pickle

import pandas as pd


def read_pickle_file(file_path):
    """Load a pickled list of paths and return it sorted."""
    with open(file_path, 'rb') as file:
        data = sorted(pickle.load(file))
    return data


sate_path = read_pickle_file(r'./match_sate_list_2018_2018.pkl')
gpm_path = read_pickle_file(r'./match_gpm_list_2018_2018.pkl')
# `squeeze=True` was removed in pandas 2.0; squeeze the single column explicitly.
skip_path = pd.read_csv('./skipped_files.txt', header=None).squeeze('columns')

# Indices (in sate_path) of the files that raised errors
removed_indices = []

for index, path in enumerate(sate_path):
    print(index, path)
    if any(skip in path for skip in skip_path):
        removed_indices.append(index)

# sate_path and gpm_path are assumed to be aligned by index,
# so the same indices are dropped from both lists.
remaining_sate_path = [path for index, path in enumerate(sate_path) if index not in removed_indices]
remaining_gpm_path = [path for index, path in enumerate(gpm_path) if index not in removed_indices]

# Print the paths that remain after removal
print("Remaining sate_path:", remaining_sate_path)
print("Remaining gpm_path:", remaining_gpm_path)

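After running the code, the recorded indices can be inspected and checked against the original paths. They line up: the first entry in skip_path corresponds to index 12 in the original path list, so the result is correct.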
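A check of this kind can also be scripted. The sketch below uses made-up paths (15 hypothetical files, one of them skipped) and a hypothetical helper `locate_skipped`; it only illustrates how each skipped entry can be mapped back to its index.

```python
def locate_skipped(paths, skipped):
    """Map each skipped entry to the indices of the paths that contain it."""
    return {
        skip: [i for i, path in enumerate(paths) if skip in path]
        for skip in skipped
    }


# Hypothetical data: 15 files, the one at index 12 was skipped
paths = [f"./sate/file_{i:02d}.nc" for i in range(15)]
skipped = ["file_12.nc"]

hits = locate_skipped(paths, skipped)
print(hits)
```

An empty list for some entry would mean that the skipped path no longer appears in the original list, which is worth investigating before dropping anything.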

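The idea behind the code above is:

1. Read the target path files: the two original path lists, sate and gpm, plus the skip list that records the paths that raised errors
2. Loop over the original paths and record the index of each path that raised an error
3. Loop again to drop the paths at those indices from the original lists, then save the remaining paths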
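The steps above can also be written as a single pass over the paired lists. This is an alternative sketch, not the code used in this post; it assumes both lists are aligned by index, and the function name `drop_skipped_pairs` and the sample paths are made up for illustration.

```python
def drop_skipped_pairs(sate_paths, gpm_paths, skipped):
    """Drop every (sate, gpm) pair whose sate path matches a skipped entry."""
    if len(sate_paths) != len(gpm_paths):
        raise ValueError("sate and gpm lists must have the same length")
    kept = [
        (sate, gpm)
        for sate, gpm in zip(sate_paths, gpm_paths)
        if not any(skip in sate for skip in skipped)
    ]
    kept_sate = [sate for sate, _ in kept]
    kept_gpm = [gpm for _, gpm in kept]
    return kept_sate, kept_gpm


# Hypothetical, index-aligned path lists
sate = ["sate/a.nc", "sate/b.nc", "sate/c.nc"]
gpm = ["gpm/a.nc", "gpm/b.nc", "gpm/c.nc"]

kept_sate, kept_gpm = drop_skipped_pairs(sate, gpm, ["sate/b.nc"])
print(kept_sate)
print(kept_gpm)
```

Filtering the pairs together avoids keeping a separate index list, and the length check catches lists that are not aligned.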

3. Save the processed paths

Saving method 1 (plain text, one path per line):

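def save_paths_to_file(file_path, data):
    """Write one path per line to a plain-text file."""
    with open(file_path, 'w') as file:
        for path in sorted(data):
            file.write(path + '\n')

# Arguments are (file_path, data). The output is plain text,
# so a .txt extension is used here instead of .pkl.
save_paths_to_file('seafog_sate_path.txt', remaining_sate_path)
save_paths_to_file('seafog_gpm_path.txt', remaining_gpm_path)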

Saving method 2 (pickle):

def save_to_pickle(data, file_path):
    """Serialize the path list to a pickle file (requires `import pickle`)."""
    with open(file_path, 'wb') as f:
        pickle.dump(data, f)
        print(file_path, 'saved.')

save_to_pickle(remaining_sate_path, '2018_seafog_sate_path.pkl')
save_to_pickle(remaining_gpm_path, '2018_seafog_gpm_path.pkl')
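To confirm that a saved list can be read back unchanged, a quick round trip helps. The sketch below is self-contained and writes to a temporary directory; the sample paths and the `load_from_pickle` helper are assumptions added for illustration.

```python
import os
import pickle
import tempfile


def save_to_pickle(data, file_path):
    """Serialize `data` to `file_path`."""
    with open(file_path, "wb") as f:
        pickle.dump(data, f)


def load_from_pickle(file_path):
    """Read back an object written by save_to_pickle."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Hypothetical paths, used only to exercise the round trip
example_paths = ["./sate/2018_0101.nc", "./sate/2018_0102.nc"]

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "example_paths.pkl")
    save_to_pickle(example_paths, target)
    restored = load_from_pickle(target)

print(restored == example_paths)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.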