Data Preprocessing (CSV -> JSON)

Latest recommended article published on 2024-08-06 13:58:32

铭铭铭铭铭铭铭铭

Category: Data. Tags: python, big data, csv, json

Article link: https://blog.csdn.net/weixin_44540204/article/details/109609187

Preprocessing data from a CSV file

1. Read the CSV file and store it as a list

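First, we look at how to take data from a CSV file and store it in JSON format. We have a sample CSV file, colors.csv, that contains data about colors. Loading and printing its contents shows that each row corresponds to one color and holds an ID, the color name, the hex code of the color, and its red, green and blue values for the RGB code. The file has no header row, so the first row is already data.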

import csv, json
from pprint import pprint

# Open with newline="" (as the csv module recommends) and close the file when done
with open("./data/colors.csv", "r", newline="") as f:
    color_lists = list(csv.reader(f))
pprint(color_lists[:8])

[['air_force_blue_raf', 'Air Force Blue (Raf)', '#5d8aa8', '93', '138', '168'],
 ['air_force_blue_usaf', 'Air Force Blue (Usaf)', '#00308f', '0', '48', '143'],
 ['air_superiority_blue',
  'Air Superiority Blue',
  '#72a0c1',
  '114',
  '160',
  '193'],
 ['alabama_crimson', 'Alabama Crimson', '#a32638', '163', '38', '56'],
 ['alice_blue', 'Alice Blue', '#f0f8ff', '240', '248', '255'],
 ['alizarin_crimson', 'Alizarin Crimson', '#e32636', '227', '38', '54'],
 ['alloy_orange', 'Alloy Orange', '#c46210', '196', '98', '16'],
 ['almond', 'Almond', '#efdecd', '239', '222', '205']]
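
The reading step can also be tried without a file on disk. Below is a minimal sketch, assuming a two-row in-memory sample: `sample_csv` is a hypothetical stand-in for colors.csv, with rows copied from the output above. It illustrates that csv.reader returns every field as a string, including the numeric RGB columns.

```python
import csv
import io

# Hypothetical in-memory stand-in for ./data/colors.csv (two rows from the output above)
sample_csv = (
    "air_force_blue_raf,Air Force Blue (Raf),#5d8aa8,93,138,168\n"
    "alice_blue,Alice Blue,#f0f8ff,240,248,255\n"
)

# csv.reader accepts any iterable of lines, so a StringIO behaves like an open file
rows = list(csv.reader(io.StringIO(sample_csv)))

print(len(rows))          # number of rows parsed
print(rows[0][1])         # color name of the first row
print(type(rows[0][3]))   # numeric columns are still str at this point
```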

2. Move the list data into dictionaries

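Now we can iterate over this list of colors and create a dictionary that stores the values associated with each color. We can collect these dictionaries in another list: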

color_dicts = []
for row in color_lists:
    
    color = {
        "id" : row[0],
        "name" : row[1],
        "hex_value" : row[2],
        "r" : row[3],
        "g" : row[4],
        "b" : row[5]
    }
    
    color_dicts.append(color)
pprint(color_dicts[:5])

Output:

[{'b': '168',
  'g': '138',
  'hex_value': '#5d8aa8',
  'id': 'air_force_blue_raf',
  'name': 'Air Force Blue (Raf)',
  'r': '93'},
 {'b': '143',
  'g': '48',
  'hex_value': '#00308f',
  'id': 'air_force_blue_usaf',
  'name': 'Air Force Blue (Usaf)',
  'r': '0'},
 {'b': '193',
  'g': '160',
  'hex_value': '#72a0c1',
  'id': 'air_superiority_blue',
  'name': 'Air Superiority Blue',
  'r': '114'},
 {'b': '56',
  'g': '38',
  'hex_value': '#a32638',
  'id': 'alabama_crimson',
  'name': 'Alabama Crimson',
  'r': '163'},
 {'b': '255',
  'g': '248',
  'hex_value': '#f0f8ff',
  'id': 'alice_blue',
  'name': 'Alice Blue',
  'r': '240'}]
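
In the output above the r, g and b values are still strings. If numeric values are wanted in the JSON, one possible variation is sketched below. The names `FIELDS` and `row_to_color` are hypothetical helpers introduced only for this illustration, and the sketch assumes every row has exactly six fields in the order shown.

```python
# Field names follow the dictionary keys used above (hypothetical constant)
FIELDS = ["id", "name", "hex_value", "r", "g", "b"]

def row_to_color(row):
    """Build one color dict from a 6-item row, converting r, g, b to int."""
    color = dict(zip(FIELDS, row))
    for channel in ("r", "g", "b"):
        color[channel] = int(color[channel])
    return color

# Sample row copied from the output above
sample_row = ["alice_blue", "Alice Blue", "#f0f8ff", "240", "248", "255"]
color = row_to_color(sample_row)
print(color)
```

With the list from step 1, this would be applied as `[row_to_color(row) for row in color_lists]`.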

3. The dictionaries can easily be written to a JSON file

# Use a with block so the file is flushed and closed after writing
with open("data/color-dicts.json", "w") as f:
    json.dump(color_dicts, f)
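
To check that the written file holds the same data, it can be loaded back and compared. This is a minimal sketch under stated assumptions: it writes to a temporary directory instead of data/, and `color_dicts` here is a one-element sample rather than the full list.

```python
import json
import os
import tempfile

# One-element sample in place of the full color_dicts list
color_dicts = [
    {"id": "alice_blue", "name": "Alice Blue", "hex_value": "#f0f8ff",
     "r": "240", "g": "248", "b": "255"},
]

# Write to a temporary directory so the sketch does not touch data/
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "color-dicts.json")
    with open(path, "w") as f:
        json.dump(color_dicts, f)
    with open(path, "r") as f:
        loaded = json.load(f)

# The round trip should give back an equal list of dicts
print(loaded == color_dicts)
```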

4. Notes

(1). The difference between json.dumps() and json.dump()

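dumps serializes a Python object such as a dict into a str, and loads parses a str back into a Python object.

dump and load do the same thing, but they write to and read from a file object instead of a string.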
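
The two pairs can be compared side by side. In this small sketch, io.StringIO stands in for an open file so that nothing is written to disk; the `color` dict is a shortened sample record.

```python
import io
import json

color = {"id": "almond", "name": "Almond", "hex_value": "#efdecd"}

# dumps / loads work on strings
text = json.dumps(color)
restored = json.loads(text)

# dump / load work on file-like objects; StringIO stands in for an open file
buffer = io.StringIO()
json.dump(color, buffer)
buffer.seek(0)  # rewind before reading back
restored_from_file = json.load(buffer)

print(type(text))
print(restored == color, restored_from_file == color)
```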

(2). How can the data in a JSON file be made more readable?

with open("data/nobel-laureates.json", "r") as f:
    nobel_laureates = json.load(f)

# Write one laureate per line so each record can be read on its own
with open("./data/nobel-laureates-scalable.json", "w") as f:
    for laureate in nobel_laureates["laureates"]:
        f.write(json.dumps(laureate) + "\n")
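
A file written this way holds one JSON document per line, so it can be read back one record at a time. The sketch below assumes placeholder records, since the actual structure of nobel-laureates.json is not shown here, and uses io.StringIO in place of the file. It also shows the `indent` argument of json.dumps, which is another way to make a single record easier to read.

```python
import io
import json

# Hypothetical placeholder records, not real data from nobel-laureates.json
records = [{"id": "1", "surname": "Example A"}, {"id": "2", "surname": "Example B"}]

# Write one JSON document per line, as the loop above does
buffer = io.StringIO()
for record in records:
    buffer.write(json.dumps(record) + "\n")

# Read back line by line; each line is parsed independently
buffer.seek(0)
parsed = [json.loads(line) for line in buffer if line.strip()]

# indent=2 spreads one record over several lines for human readers
pretty = json.dumps(records[0], indent=2)
print(pretty)
```

The one-record-per-line layout suits line-by-line processing, while indent is better suited to inspecting individual records by eye.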