Deleting a row from a CSV in Java: how do I remove the rows that two CSVs do not share?

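This post shows how to create and parse CSV files, then filter the data in Python to remove anomalous data points. The author first creates sample files containing "bad data" and "good data", then reads each file and converts it into a list of dictionaries. The timestamps are extracted from the "good data", and every entry in the "bad data" whose timestamp is not among them is discarded, which leaves a clean data set. This is a basic data-preprocessing step of the kind commonly needed before data analysis or machine learning.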

Create the demo data:

# bad data, the weird ones are bad
data = """
ts1,d001,d002,d003
ts2,d001,d002,d003
ts3,d001,d002,d003
weird1,d001,d002,d003
weird2,d001,d002,d003
ts4,d001,d002,d003
"""

# the good data
other = """
ts1,f001,f002,f003
ts2,f001,f002,f003
ts3,f001,f002,f003
ts4,f001,f002,f003
"""

# create demo files
fn1 = "d1.csv"
fn2 = "d2.csv"

with open(fn1, "w") as f:
    f.write(data)

with open(fn2, "w") as f:
    f.write(other)

Now parse the files:

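import csv

def readFile(name):
    """Return the rows of a 4-column CSV file as a list of dicts."""
    result = []
    # newline="" is what the csv module documentation recommends for file objects
    with open(name, "r", newline="") as f:
        # the files have no header row, so the column names are given explicitly
        k = csv.DictReader(f, fieldnames=["ts", "dp1", "dp2", "dp3"])
        for l in k:
            result.append(l)
    return result

badData = readFile(fn1)
goodData = readFile(fn2)

print(badData)
print(goodData)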

Output:

# weird data
[{'dp3': 'd003', 'ts': 'ts1', 'dp1': 'd001', 'dp2': 'd002'},
 {'dp3': 'd003', 'ts': 'ts2', 'dp1': 'd001', 'dp2': 'd002'},
 {'dp3': 'd003', 'ts': 'ts3', 'dp1': 'd001', 'dp2': 'd002'},
 {'dp3': 'd003', 'ts': 'weird1', 'dp1': 'd001', 'dp2': 'd002'},
 {'dp3': 'd003', 'ts': 'weird2', 'dp1': 'd001', 'dp2': 'd002'},
 {'dp3': 'd003', 'ts': 'ts4', 'dp1': 'd001', 'dp2': 'd002'}]

# good data
[{'dp3': 'f003', 'ts': 'ts1', 'dp1': 'f001', 'dp2': 'f002'},
 {'dp3': 'f003', 'ts': 'ts2', 'dp1': 'f001', 'dp2': 'f002'},
 {'dp3': 'f003', 'ts': 'ts3', 'dp1': 'f001', 'dp2': 'f002'},
 {'dp3': 'f003', 'ts': 'ts4', 'dp1': 'f001', 'dp2': 'f002'}]
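
Neither list contains an empty record, even though each triple-quoted demo string begins with a newline. A minimal sketch of why, assuming `io.StringIO` as a stand-in for the demo files (the `sample` text and the `rows` name are illustrative and not part of the original post): `csv.DictReader` skips rows that have no fields at all.

```python
import csv
import io

# Illustrative sample with blank lines before and between the records,
# mimicking the triple-quoted demo strings.
sample = "\nts1,d001,d002,d003\n\nts2,d001,d002,d003\n"

reader = csv.DictReader(io.StringIO(sample), fieldnames=["ts", "dp1", "dp2", "dp3"])
rows = [dict(row) for row in reader]

# Blank lines yield no records, so only the two real rows remain.
print(len(rows))
print([row["ts"] for row in rows])
```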

Now remove the bad data points:

# get all the "good" ts
goodTs = set(oneDict["ts"] for oneDict in goodData)

# clean the bad data, only keep those "ts" that are in goodTs
cleanedData = [x for x in badData if x["ts"] in goodTs]

print(cleanedData)

Output:

# filtered weird data
[{'dp3': 'd003', 'ts': 'ts1', 'dp1': 'd001', 'dp2': 'd002'},
 {'dp3': 'd003', 'ts': 'ts2', 'dp1': 'd001', 'dp2': 'd002'},
 {'dp3': 'd003', 'ts': 'ts3', 'dp1': 'd001', 'dp2': 'd002'},
 {'dp3': 'd003', 'ts': 'ts4', 'dp1': 'd001', 'dp2': 'd002'}]
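
To check which rows the filter discards, the same membership test can be inverted. The sketch below is self-contained and uses small in-memory stand-ins for the two parsed lists; the names `bad_rows`, `good_rows`, `kept`, and `removed_ts` are hypothetical and do not appear in the original post.

```python
# Small in-memory stand-ins for the parsed "bad" and "good" lists.
bad_rows = [{"ts": ts, "dp1": "d001"} for ts in ["ts1", "ts2", "weird1", "weird2", "ts4"]]
good_rows = [{"ts": ts, "dp1": "f001"} for ts in ["ts1", "ts2", "ts4"]]

# Collect the timestamps that count as valid.
good_ts = {row["ts"] for row in good_rows}

# Rows whose timestamp is known are kept; the others are reported as removed.
kept = [row for row in bad_rows if row["ts"] in good_ts]
removed_ts = [row["ts"] for row in bad_rows if row["ts"] not in good_ts]

print([row["ts"] for row in kept])
print(removed_ts)
```

Because `good_ts` is a set, each lookup takes constant time on average, so the filter stays roughly linear in the number of rows.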

Done.
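
The post stops at the in-memory list. If the goal is a cleaned CSV file on disk, one possible follow-up is to write the rows back with `csv.DictWriter`. This is only a sketch under assumptions: the output name `d1_cleaned.csv`, the helper `write_rows`, and the inline `cleaned` list are made up for illustration.

```python
import csv
import os
import tempfile

FIELDNAMES = ["ts", "dp1", "dp2", "dp3"]

def write_rows(path, rows):
    """Write a list of dicts to a headerless CSV, matching the demo files."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writerows(rows)

# Illustrative stand-in for the filtered list.
cleaned = [
    {"ts": "ts1", "dp1": "d001", "dp2": "d002", "dp3": "d003"},
    {"ts": "ts4", "dp1": "d001", "dp2": "d002", "dp3": "d003"},
]

# Hypothetical output location in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "d1_cleaned.csv")
write_rows(out_path, cleaned)

# Read the file back to confirm what was written.
with open(out_path, newline="") as f:
    written = f.read().splitlines()
print(written)
```

No header row is written here because the demo files have none; calling `writer.writeheader()` before `writerows` would add one.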
