很累想被爱

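This post describes a Python script that walks a directory tree of JSON files and filters the records in each file: it skips records whose problem cause mentions foreign matter (异物), and keeps those whose defect name mentions solder (锡) or whose first defect name is empty. It collects the selected fields of each remaining record into a DataFrame and writes the rows in batches to a SQLite database for later analysis.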
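The filtering rule can be written on its own as a small predicate. This is a minimal sketch, assuming every field holds a string. The function name `keep_record` and the sample records are hypothetical; the field names `defectName-1`, `defectName-2`, `problemCause-1` and `problemCause-2` are the ones the script reads.

```python
def keep_record(item: dict) -> bool:
    """Return True if a record passes both filters (hypothetical helper)."""
    causes = (item.get("problemCause-1", ""), item.get("problemCause-2", ""))
    # Drop records whose problem cause mentions foreign matter (异物).
    if any("异物" in cause for cause in causes):
        return False

    name_1 = item.get("defectName-1", "")
    name_2 = item.get("defectName-2", "")
    # Drop records that have a first defect name but no mention of solder (锡)
    # in either of the first two defect names.
    if name_1 != "" and "锡" not in name_1 and "锡" not in name_2:
        return False
    return True


# Hypothetical sample records, for illustration only.
samples = [
    {"defectName-1": "少锡"},
    {"defectName-1": "偏移", "defectName-2": ""},
    {"defectName-1": "少锡", "problemCause-1": "异物污染"},
]
kept = [record for record in samples if keep_record(record)]
```

Note that a record with an empty `defectName-1` passes the second filter regardless of `defectName-2`, which mirrors the condition in the script.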
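import json
import os

import pandas as pd

# col_name_file (a text file with one column name per line) and data_dir
# (the root directory of the JSON files) are assumed to be defined earlier.
col_names = []
with open(col_name_file, 'r') as f:
    # Read the file line by line; each line holds one column name
    for line in f:
        # Strip the trailing newline
        col_names.append(line.strip("\n"))
print(col_names)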

from tqdm import tqdm
import sqlite3

conn = sqlite3.connect('data3.db')
t = []
i = 0

for root, dirs, files in os.walk(data_dir):
    for file in tqdm(files):
        if not file.endswith(".json"):
            continue

        # Use a separate name for the handle so it does not shadow the loop variable
        with open(os.path.join(root, file), "r", encoding="utf-8") as fh:
            jsonData = json.load(fh)
        # Name of the parent of the directory that contains this file
        defectDir = os.path.dirname(root).split(os.path.sep)[-1]

        for item in jsonData:
            defectName_1 = item.get('defectName-1', "")
            defectName_2 = item.get('defectName-2', "")
            defectName_3 = item.get('defectName-3', "")

            # Skip records whose problem cause mentions foreign matter (异物)
            if "异物" in item.get('problemCause-1', "") or "异物" in item.get('problemCause-2', ""):
                continue
            # Skip records with a first defect name but no mention of solder (锡)
            if "锡" not in defectName_1 and "锡" not in defectName_2 and defectName_1 != "":
                continue

            # Collect the requested columns, using "" for missing fields
            row = []
            for col_name in col_names:
                row.append(item.get(col_name, ""))
            row.append(defectDir)

            t.append(row)
            i += 1

            # Flush to SQLite every 500000 rows to bound memory use
            if len(t) == 500000:
                df = pd.DataFrame(t, columns=col_names + ["defectDir"])
                df.to_sql("data_defect2", conn, if_exists="append")
                t = []


# Write the remaining rows (fewer than 500000), then close the connection
df = pd.DataFrame(t, columns=col_names + ["defectDir"])
df.to_sql("data_defect2", conn, if_exists="append")
t = []
conn.close()
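
The buffer-and-flush pattern used above can be tried on a small scale before running it on the full dataset. This is a minimal sketch, assuming an in-memory database. The helper names `flush_batch` and `write_in_batches`, the column names and the sample records are hypothetical; only the table name `data_defect2` comes from the script.

```python
import sqlite3

import pandas as pd


def flush_batch(rows, columns, conn, table="data_defect2"):
    """Append the buffered rows to the table and return an empty buffer."""
    if rows:
        pd.DataFrame(rows, columns=columns).to_sql(
            table, conn, if_exists="append", index=False
        )
    return []


def write_in_batches(records, columns, conn, batch_size=2):
    """Buffer rows, flush every batch_size rows, then flush the remainder."""
    buffer = []
    for record in records:
        buffer.append([record.get(col, "") for col in columns])
        if len(buffer) >= batch_size:
            buffer = flush_batch(buffer, columns, conn)
    flush_batch(buffer, columns, conn)


# Hypothetical columns and records, for illustration only.
columns = ["defect_name", "problem_cause"]
records = [
    {"defect_name": "少锡", "problem_cause": ""},
    {"defect_name": "连锡"},
    {"defect_name": "多锡", "problem_cause": "参数"},
]

conn = sqlite3.connect(":memory:")
write_in_batches(records, columns, conn)

# Read the table back into a DataFrame for analysis.
result = pd.read_sql_query("SELECT * FROM data_defect2", conn)
row_count = len(result)
conn.close()
```

The sketch passes `index=False`, unlike the script, so the DataFrame index is not stored. With the default `index=True`, each batch writes an `index` column that restarts at 0.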