Talking with Scenery_Interactive Travel Recommendation System_Data Collection and Preprocessing

Article link: https://blog.csdn.net/chenxucn/article/details/139910112

4. Data Integration and Merging

Merging the JSON Files

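After the processing described above, we have a large number of JSON files, all located in the same directory, /res.
[Figure: contents of the /res directory]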

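The next step is to merge these JSON files. Only the JSON files need to be merged. The txt files can be ignored, because every txt file was already converted to JSON in the previous step. The merge is carried out as follows.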

A detailed walk-through of the merge function:

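import json
import os

def merge_json_files(directory, output_filepath):

The script relies on the standard-library json and os modules, so they are imported first. The function merge_json_files takes two parameters: directory (the path of the directory containing the JSON files) and output_filepath (the path of the merged output file).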

    merged_data = []

Initializes an empty list, merged_data, which will hold every JSON object collected during the merge.

    # Iterate over the entries in the directory, looking for JSON files
    for filename in os.listdir(directory):

os.listdir(directory) returns the names of all files and subdirectories in the given directory, and the loop iterates over those names.
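Because os.listdir returns subdirectory names as well, and in arbitrary order, a slightly stricter listing step can be useful. The following is a minimal sketch, not part of the original script: list_json_files is a hypothetical helper that keeps only regular files ending in .json and sorts the names so the merge order is reproducible.

```python
import os


def list_json_files(directory):
    """Return sorted full paths of regular files ending in .json (hypothetical helper)."""
    paths = []
    # sorted() gives a deterministic order; os.listdir makes no ordering guarantee
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # os.path.isfile filters out a subdirectory that happens to be named *.json
        if name.endswith('.json') and os.path.isfile(path):
            paths.append(path)
    return paths
```

The returned paths could be fed straight to open(), replacing the endswith check inside the loop.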

        if filename.endswith('.json'):
            filepath = os.path.join(directory, filename)
            with open(filepath, 'r', encoding='utf-8') as file:
                # Read the file content and strip newlines
                file_content = file.read().replace('\n', '')

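Checks whether the current file name ends with .json. If it does, the full path is built and the file is opened in read-only mode with UTF-8 encoding. The content is read, all newline characters (\n) are removed, and the result is stored in file_content.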

                # Split the content into JSON object strings and repair the braces
                json_objects = file_content.split('}{')
                for i in range(len(json_objects)):
                    if not json_objects[i].startswith('{'):
                        json_objects[i] = '{' + json_objects[i]
                    if not json_objects[i].endswith('}'):
                        json_objects[i] = json_objects[i] + '}'

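Splits the file content on '}{' into separate JSON object strings, then repairs each piece so that it starts with '{' and ends with '}'. This works for files made of flat objects written back to back. It does not split correctly when an object ends with a nested object (the piece already ends in '}', so the missing brace is not restored), when a string value contains '}{', or when objects are separated by spaces rather than newlines. Such pieces fail to parse and are reported by the error handling in the next step.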
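For inputs that the '}{' split cannot handle, a different technique is to let the standard library find the object boundaries. The sketch below swaps the string split for json.JSONDecoder.raw_decode, which parses one JSON value starting at a given index and returns the index where it ended. split_json_objects is a hypothetical helper name, not part of the original script.

```python
import json


def split_json_objects(text):
    """Parse a string holding zero or more concatenated JSON values (hypothetical helper)."""
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while pos < len(text):
        # raw_decode rejects leading whitespace, so skip it between values
        if text[pos].isspace():
            pos += 1
            continue
        # Returns the parsed value and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
    return objects
```

Malformed input still raises json.JSONDecodeError, so the same try/except pattern applies. With this approach there is no need to strip newlines or patch braces.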

                for obj in json_objects:
                    try:
                        json_obj = json.loads(obj)
                        merged_data.append(json_obj)
                    except json.JSONDecodeError as e:
                        print(f"Error decoding JSON from file {filename}: {e}")

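Iterates over the repaired JSON strings, parses each one into a Python object, and appends it to merged_data. If parsing fails (for example, because the JSON is malformed), the json.JSONDecodeError exception is caught and an error message naming the file is printed.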

    # Write the merged data to the output file
    with open(output_filepath, 'w', encoding='utf-8') as output_file:
        json.dump(merged_data, output_file, ensure_ascii=False, indent=4)

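The with open statement opens the output file in write mode with UTF-8 encoding. json.dump then writes the merged data to the file; ensure_ascii=False keeps non-ASCII characters as they are instead of escaping them, and indent=4 formats the output with four-space indentation.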
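The effect of ensure_ascii is easiest to see side by side. This is a small illustration with a made-up record (the "name" field and its value are assumptions for the example, not data from the project):

```python
import json

record = {"name": "西湖"}  # hypothetical sample record

escaped = json.dumps(record)                        # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)   # keep the characters as written

print(escaped)   # non-ASCII characters appear as \uXXXX escapes
print(readable)  # non-ASCII characters appear unchanged
```

Both strings decode back to the same object; the setting only affects how readable the file is when opened in a text editor.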

    print(f"Merged {len(merged_data)} JSON objects into {output_filepath}")

Prints a message once the merge is complete, showing the number of JSON objects merged and the output file path.

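# Run the script
directory_path = r'E:\OneDrive\桌面\spider\res'  # replace with the directory containing your JSON files
output_path = r'E:\OneDrive\桌面\spider\merged.json'  # replace with your desired output file path
merge_json_files(directory_path, output_path)

Sets the paths used by the script: directory_path is the directory containing the JSON files and output_path is the output file. merge_json_files is then called to perform the merge. The paths are written as raw strings (r'...') so that backslashes in Windows paths are never interpreted as escape sequences.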
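After the script has run, it can be worth reloading the output to confirm that it is valid JSON and holds the number of records reported. The following is a minimal sketch under that assumption; count_merged_records is a hypothetical helper, not part of the original script.

```python
import json


def count_merged_records(output_filepath):
    """Load the merged file and return how many records it holds (hypothetical helper)."""
    with open(output_filepath, 'r', encoding='utf-8') as f:
        data = json.load(f)
    # merge_json_files writes a list, so anything else indicates a problem
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array, got {type(data).__name__}")
    return len(data)
```

Calling count_merged_records(output_path) should return the same count that merge_json_files printed.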