Filling in missing data in a CSV with Python: replacing blank entries with 0

I am parsing a CSV file to create charts. This works without problems except in one case: when there is an empty cell in the CSV file. For example:

Col1 Col2 Col3 Col4 Col5

45   34     23     98     18

66            25     0

18            52     56    100

There are blank entries in columns 2 and 5 of the file, and I want to fill them with 0. I'm fairly new to Python. Because my CSV file sometimes contains blanks, I get the error TypeError: unsupported operand type(s) for -: 'int' and 'str'. Opening the CSV file to look for empty cells and filling them with zeros by hand is tedious, so I would like to do this in the script. Here is my code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

file_name = "myfile.csv"
df = pd.read_csv(file_name)

names = df['name'].values
x = np.arange(len(names))*2
w = 0.40

col2 = df.columns[1]
col3 = df.columns[2]
col4 = df.columns[3]
col5 = df.columns[4]

dif = df[col4] - df[col3]
colors = ['Red' if d < -5 else 'Blue' for d in dif]

plt.bar(x-w, df[col2].values, width=w*0.7, label=col2, color="cyan")
plt.bar(x, df[col3].values, width=w*0.7, label=col3, color="green")
plt.bar(x+w, df[col4].values, width=w*0.7, label=col4, color=colors)
plt.plot(x, df[col5].values, lw=2, label="Goal", color="red")

plt.xticks(x, names, rotation='vertical')
plt.ylim([0,100])
plt.show()

Note: As I mentioned above, I'm reading the dataframe from a csv file.

EDIT:

I have added this line to my code:

df.replace(r'^\s*$', 0, regex=True)

#For testing purposes, I also added this:

print(df.replace(r'^\s*$', 0, regex=True))

I can see that the empty slots are now filled with zeros but I am still getting the error TypeError: unsupported operand type(s) for -: 'str' and 'int' for dif = df[col4] - df[col3]. Is it possibly reading those inserted 0 as strings?

I also tried wrapping df[col3] and df[col4] in int(), but no luck there: it gives the error TypeError: cannot convert the series to <class 'int'>. I then tried df[col4].astype(int) - df[col3].astype(int) and got the error ValueError: invalid literal for int() with base 10.

EDIT 2:

I just added the line print(df.dtypes). For some reason the fourth column (which was containing the replaced 0 in this case) is being seen as an object instead of int64 like the rest of the columns.
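
The observations in the two edits can be reproduced with a small in-memory CSV. This is only a sketch: the column names and values are made up, and it assumes the "blank" cell actually contains spaces rather than being truly empty. In that case read_csv keeps the cell as a string, so the whole column is non-numeric. Note also that DataFrame.replace returns a new DataFrame by default, so calling it without assigning the result leaves df unchanged.

```python
import io

import pandas as pd

# Hypothetical sample: the second data row has a whitespace-only cell in Col4.
csv_text = "Col1,Col2,Col3,Col4\n45,34,23,98\n66,12,25,  \n18,7,52,56\n"
df = pd.read_csv(io.StringIO(csv_text))

# Col4 is not inferred as a numeric column because of the whitespace cell.
print(df.dtypes)

# replace() returns a modified copy; df itself is left as it was.
replaced = df.replace(r"^\s*$", 0, regex=True)

print(repr(df["Col4"].iloc[1]))        # still the whitespace string
print(repr(replaced["Col4"].iloc[1]))  # the inserted 0
```

Even in the replaced copy, the other cells of Col4 remain strings, so the column is not numeric and arithmetic on it would still fail.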

Solution

import pandas as pd

file_name = "myfile.csv"
df = pd.read_csv(file_name)

# fillna replaces every NaN value with 0. Any other fill value works as well,
# for example a column's mean or median.
df.fillna(0, inplace=True)
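
Note that fillna only affects cells that pandas has already parsed as NaN. If a cell contains only whitespace, which the object dtype reported in EDIT 2 suggests, the column is read as strings and fillna alone will not make it numeric. The following is a minimal sketch under that assumption, using a made-up in-memory CSV (the column names and values are hypothetical, not taken from the asker's file). It converts each value column with pd.to_numeric(errors="coerce"), which turns cells that cannot be parsed into NaN, and then fills those with 0.

```python
import io

import pandas as pd

# Hypothetical sample: empty cells in Col2 and Col5, a whitespace-only cell in Col4.
csv_text = (
    "name,Col2,Col3,Col4,Col5\n"
    "a,34,23,98,18\n"
    "b,,25,  ,\n"
    "c,,52,56,100\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Every column except the first ("name") is expected to hold numbers.
value_cols = df.columns[1:]

for col in value_cols:
    # Unparseable cells (such as whitespace) become NaN, then NaN becomes 0.
    df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0)

# The subtraction from the question now works on numeric columns.
dif = df["Col4"] - df["Col3"]
print(df.dtypes)
print(dif.tolist())
```

Coercion also turns any other non-numeric text into 0, so it may be worth checking the raw file once if unexpected zeros appear in the chart.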
