The official documentation for the two DataFrame methods involved (append and drop_duplicates) is referenced below.
import pandas as pd

# Attempt 1: append the new rows, then deduplicate in memory -- this fails.
df0 = pd.read_csv(file_name)
df0 = df0.append(df, ignore_index=True)  # note: append was removed in pandas 2.0; pd.concat([df0, df], ignore_index=True) is the replacement
df0.drop_duplicates(subset="trade_date", inplace=True)
print(df0)
df0.to_csv(file_name, index=False)

# Attempt 2: write to CSV, read it back, then deduplicate -- this works.
df0 = pd.read_csv(file_name)
df0.drop_duplicates(subset="trade_date", inplace=True)
print(df0)
First print (after append + drop_duplicates) -- the appended rows survive as duplicates:
    trade_date  ggt_ss  ggt_sz      hgt      sgt  north_money  south_money
0 20190829 1960.0 418.0 567.11 199.79 766.90 2378.0
1 20190828 1687.0 493.0 -112.55 -303.25 -415.80 2180.0
2 20190827 2621.0 762.0 7371.21 4394.13 11765.34 3383.0
3 20190826 4005.0 1599.0 -2107.11 -530.21 -2637.32 5604.0
4 20190823 2041.0 1013.0 91.95 1446.33 1538.28 3054.0
10 20190822 2384.0 554.0 643.87 1268.83 1912.70 2938.0
11 20190821 2089.0 927.0 1432.15 891.26 2323.41 3016.0
12 20190820 1978.0 1007.0 -367.52 -471.31 -838.83 2985.0
13 20190819 2075.0 1395.0 3861.04 4621.52 8482.56 3470.0
14 20190816 3811.0 1726.0 -102.61 253.84 151.23 5537.0
25 20190829 1960.0 418.0 567.11 199.79 766.90 2378.0
26 20190828 1687.0 493.0 -112.55 -303.25 -415.80 2180.0
27 20190827 2621.0 762.0 7371.21 4394.13 11765.34 3383.0
28 20190826 4005.0 1599.0 -2107.11 -530.21 -2637.32 5604.0
29 20190823 2041.0 1013.0 91.95 1446.33 1538.28 3054.0
30 20190822 2384.0 554.0 643.87 1268.83 1912.70 2938.0
31 20190821 2089.0 927.0 1432.15 891.26 2323.41 3016.0
32 20190820 1978.0 1007.0 -367.52 -471.31 -838.83 2985.0
33 20190819 2075.0 1395.0 3861.04 4621.52 8482.56 3470.0
34 20190816 3811.0 1726.0 -102.61 253.84 151.23 5537.0
Second print (after to_csv, read_csv, drop_duplicates) -- the duplicates are gone:
    trade_date  ggt_ss  ggt_sz      hgt      sgt  north_money  south_money
0 20190829 1960.0 418.0 567.11 199.79 766.90 2378.0
1 20190828 1687.0 493.0 -112.55 -303.25 -415.80 2180.0
2 20190827 2621.0 762.0 7371.21 4394.13 11765.34 3383.0
3 20190826 4005.0 1599.0 -2107.11 -530.21 -2637.32 5604.0
4 20190823 2041.0 1013.0 91.95 1446.33 1538.28 3054.0
5 20190822 2384.0 554.0 643.87 1268.83 1912.70 2938.0
6 20190821 2089.0 927.0 1432.15 891.26 2323.41 3016.0
7 20190820 1978.0 1007.0 -367.52 -471.31 -838.83 2985.0
8 20190819 2075.0 1395.0 3861.04 4621.52 8482.56 3470.0
9 20190816 3811.0 1726.0 -102.61 253.84 151.23 5537.0
The data being read back is exactly what df.to_csv wrote, and the two look identical in format, yet deduplication still fails. As a last resort I took the appended data, wrote it with to_csv, read it back with read_csv, and only then called drop_duplicates -- and to my surprise the duplicates were now removed. I had no idea why. The essence must be that pandas does not consider these rows duplicates at all, but why?
Update, 2020-02-10: I found the cause, so let me answer my own question:
It turns out the data types differ. By default, pd.read_csv converts columns to int, float, and so on, while the rows I appended were all str, so the "duplicate" rows never compare equal and drop_duplicates cannot remove them. Converting everything to str first, e.g. df1 = df1.astype(str), makes deduplication work. (My original code used df1.astype(type("1")), which is just a roundabout way of writing astype(str).) For changing DataFrame dtypes, see https://blog.csdn.net/python_ai_road/article/details/81158376
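A minimal sketch of the mismatch (hypothetical two-row data, with pd.concat standing in for the removed append): the same value stored as int in one row and as str in another is not a duplicate as far as pandas is concerned, and astype(str) makes the rows comparable again.

```python
import pandas as pd

# Rows as read_csv would infer them: numeric dtypes.
df0 = pd.DataFrame({"trade_date": [20190829], "hgt": [567.11]})
# Newly appended rows arrive as strings.
df_new = pd.DataFrame({"trade_date": ["20190829"], "hgt": ["567.11"]})

merged = pd.concat([df0, df_new], ignore_index=True)

# The printed output looks identical, but int 20190829 != str "20190829",
# so drop_duplicates keeps both rows.
print(len(merged.drop_duplicates(subset="trade_date")))  # 2

# Cast everything to str and the rows finally compare equal.
print(len(merged.astype(str).drop_duplicates(subset="trade_date")))  # 1
```

This is exactly why the to_csv/read_csv round trip "fixed" it: writing to CSV and reading back forced every row through the same dtype inference.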
Of course, you can also specify the column dtypes when reading the CSV; see the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html?highlight=read_csv#pandas.read_csv
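For example, passing dtype=str to read_csv forces every column to come back as strings, matching rows that were appended as str (a sketch using an in-memory CSV; the two columns shown are placeholders for the real file's columns):

```python
import pandas as pd
from io import StringIO

# Stand-in for the real CSV file; only two of its columns are shown.
csv_text = "trade_date,hgt\n20190829,567.11\n"

# dtype=str reads every column as str instead of inferring int/float.
df = pd.read_csv(StringIO(csv_text), dtype=str)
print(type(df["trade_date"].iloc[0]))  # <class 'str'>, not int
```

A dict such as dtype={"trade_date": str} works too if only some columns need forcing.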
With too many columns to specify one by one, it is simplest to just convert everything to str.