I'm trying to query a pandas DataFrame to get a filtered version of the original:
self.waferInfo = pd.read_csv(fileName, index_col=None, na_values=['NA', ""], usecols=[18, 5, 6, 8, 2])
print(self.waferInfo.head(5))
self.df2 = self.waferInfo[(self.waferInfo.FILE_FINISH_TS >= dateBegin) & (self.waferInfo.FILE_FINISH_TS <= dateEnd) ]
print(self.df2.head(5))
The first print shows the expected rows, but the second one comes back empty. I figured out that this happens because the original DataFrame has some blank cells.
For example:
18 5 6 8 2
A B C E
D E T Y P
F R B A L
I would want my DataFrame to return:
18 5 6 8 2
D E T Y P
F R B A L
Because column 8 has an empty cell, I get back a completely empty DataFrame. I know this because I deleted all the rows that had empty cells in Excel, and the DataFrame worked fine after that.
Is there any way to ignore rows that have a missing value?
Solution
I do not think that your assumptions about the root cause of the problem are correct. See below.
"""
18 5 6 8 2
A B C E
D E T Y P
F R B A L
"""
import pandas as pd
import numpy as np
df = pd.read_clipboard()
print(df)
print("\n")
print(df.dropna())
Output:
18 5 6 8 2
0 A B C E None
1 D E T Y P
2 F R B A L
18 5 6 8 2
1 D E T Y P
2 F R B A L
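The same check can be reproduced without the clipboard. This is a minimal sketch, assuming a hypothetical CSV string that stands in for your file, with the blank cell placed in column 8 as you describe; the `subset` argument of `dropna` restricts the check to particular columns.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the question's sample data;
# the blank cell sits in column "8" of the first row.
csv_text = """18,5,6,8,2
A,B,C,,E
D,E,T,Y,P
F,R,B,A,L
"""

# Same na_values as in the question; empty fields are read as NaN.
df = pd.read_csv(io.StringIO(csv_text), na_values=["NA", ""])

# Drop rows that have a missing value in any column.
complete_rows = df.dropna()

# Drop rows only when one specific column is missing.
rows_with_col_8 = df.dropna(subset=["8"])

print(complete_rows)
```

Only the row with the blank cell is dropped; the other rows are kept.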
If df2.head(5) prints nothing, then df2 itself is empty, and that is not caused by the NaNs in your DataFrame.
Perhaps
self.waferInfo[(self.waferInfo.FILE_FINISH_TS >= dateBegin) & \
(self.waferInfo.FILE_FINISH_TS <= dateEnd) ]
should be
self.waferInfo.loc[(self.waferInfo.FILE_FINISH_TS >= dateBegin) & \
(self.waferInfo.FILE_FINISH_TS <= dateEnd) ]
I can't say for sure because you haven't provided enough sample data.
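To check whether missing values alone can empty the result, here is a rough sketch on made-up data. Only the column name FILE_FINISH_TS comes from your code; the LOT column, the dates, and the snake_case variable names are assumptions for illustration, as is the idea that the column holds timestamps that can be parsed with `pd.to_datetime`.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column name FILE_FINISH_TS is from the question.
wafer_info = pd.DataFrame({
    "LOT": ["a", "b", "c", "d"],
    "FILE_FINISH_TS": ["2015-01-05", np.nan, "2015-02-10", "2015-03-15"],
})

# Parse to datetime so comparisons are chronological; missing values become NaT.
wafer_info["FILE_FINISH_TS"] = pd.to_datetime(wafer_info["FILE_FINISH_TS"])

date_begin = pd.Timestamp("2015-01-01")
date_end = pd.Timestamp("2015-02-28")

# Comparisons against NaT evaluate to False, so only that row is excluded.
mask = (wafer_info["FILE_FINISH_TS"] >= date_begin) & (
    wafer_info["FILE_FINISH_TS"] <= date_end
)

filtered = wafer_info[mask]          # plain boolean indexing
filtered_loc = wafer_info.loc[mask]  # the .loc form of the same selection

print(filtered)
```

In this sketch the row with the missing timestamp is simply left out and the in-range rows survive, with both indexing forms selecting the same rows. If your real data still gives an empty result, it may be worth inspecting the dtype of FILE_FINISH_TS and the actual values of dateBegin and dateEnd.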