I have a column cash_date in a pandas DataFrame whose dtype is object. I am not able to convert it with the pandas to_datetime function. The shape of my DataFrame is (47654566, 5). My DataFrame looks like this:
cash_date                        amount  id
02-JAN-13 12.00.00.000000000 AM  100     1
13-FEB-13 12.00.00.000000000 AM  200     2
09-MAR-13 12.00.00.000000000 AM  300     3
03-APR-13 12.00.00.000000000 AM  400     4
02-JAN-13 06.26.02.438000000 PM  500     7
17-NOV-18 08.31.47.443000000 PM  700     8
I tried the following:
df.cash_date = pd.to_datetime(df['cash_date'], errors='coerce')  # Not working
for i in range(len(df)):
    df.cash_date = df.cash_date.astype(str).str.split('\d\d.\d\d.\d\d.\d\d\d\d\d\d\d\d\d')[i][0]  # Not working
I want the DataFrame to look like this:
cash_date                        amount  id  date
02-JAN-13 12.00.00.000000000 AM  100     1   02-JAN-13
13-FEB-13 12.00.00.000000000 AM  200     2   13-FEB-13
09-MAR-13 12.00.00.000000000 AM  300     3   09-MAR-13
03-APR-13 12.00.00.000000000 AM  400     4   03-APR-13
02-JAN-13 06.26.02.438000000 PM  500     7   02-JAN-13
17-NOV-18 08.31.47.443000000 PM  700     8   17-NOV-18
Solution
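Specify a format=... argument. Because the times use a 12-hour clock with an AM/PM marker, the hour directive must be %I; when the hour is parsed with %H, the %p directive has no effect on the result (12 AM would come back as 12:00 and 06 PM as 06:00).
pd.to_datetime(df['cash_date'], format='%d-%b-%y %I.%M.%S.%f %p', errors='coerce')
0   2013-01-02 00:00:00.000
1   2013-02-13 00:00:00.000
2   2013-03-09 00:00:00.000
3   2013-04-03 00:00:00.000
4   2013-01-02 18:26:02.438
5   2018-11-17 20:31:47.443
Name: cash_date, dtype: datetime64[ns]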
Details about acceptable formats may be found at http://strftime.org.
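To check the format string on a small sample before running it on the full 47-million-row column, something like the sketch below may help. The three-row frame and the names FMT and parsed are made up for illustration; the rows are copied from the question.

```python
import pandas as pd

# Small made-up sample using rows from the question.
df = pd.DataFrame({
    "cash_date": [
        "02-JAN-13 12.00.00.000000000 AM",
        "02-JAN-13 06.26.02.438000000 PM",
        "17-NOV-18 08.31.47.443000000 PM",
    ],
    "amount": [100, 500, 700],
    "id": [1, 7, 8],
})

# %I is the 12-hour clock directive, so %p (AM/PM) is taken into account.
FMT = "%d-%b-%y %I.%M.%S.%f %p"
parsed = pd.to_datetime(df["cash_date"], format=FMT, errors="coerce")

# 12 AM should map to hour 0, and the PM times to 18 and 20.
print(parsed.dt.hour.tolist())  # [0, 18, 20]

# With errors='coerce', rows that do not match the format become NaT,
# so counting NaT values shows how many rows failed to parse.
print(parsed.isna().sum())  # 0
```

Counting the NaT values after a coerced parse is a cheap way to notice a wrong format string, since a mismatch would otherwise silently turn the whole column into NaT.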
Once the column is parsed, you can drop the time component by flooring the datetimes to the day with dt.floor:
df['date'] = pd.to_datetime(
    df['cash_date'], format='%d-%b-%y %I.%M.%S.%f %p', errors='coerce'
).dt.floor('D')
df
                         cash_date  amount  id       date
0  02-JAN-13 12.00.00.000000000 AM     100   1 2013-01-02
1  13-FEB-13 12.00.00.000000000 AM     200   2 2013-02-13
2  09-MAR-13 12.00.00.000000000 AM     300   3 2013-03-09
3  03-APR-13 12.00.00.000000000 AM     400   4 2013-04-03
4  02-JAN-13 06.26.02.438000000 PM     500   7 2013-01-02
5  17-NOV-18 08.31.47.443000000 PM     700   8 2018-11-17
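If the date column should keep the DD-MON-YY text layout shown in the question rather than hold datetime64 values, one option is to format the parsed values back into strings. This is a minimal sketch on made-up sample rows; it assumes an English locale, since %b produces locale-dependent month abbreviations.

```python
import pandas as pd

# Made-up sample using rows from the question.
df = pd.DataFrame({
    "cash_date": [
        "02-JAN-13 12.00.00.000000000 AM",
        "02-JAN-13 06.26.02.438000000 PM",
        "17-NOV-18 08.31.47.443000000 PM",
    ],
})

parsed = pd.to_datetime(
    df["cash_date"], format="%d-%b-%y %I.%M.%S.%f %p", errors="coerce"
)

# strftime gives e.g. '02-Jan-13' in an English locale; upper-case it
# to match the 'JAN' style of the source column.
df["date"] = parsed.dt.strftime("%d-%b-%y").str.upper()
print(df["date"].tolist())
```

The result is an object (string) column again, so it is only worth doing when the text layout matters more than being able to do date arithmetic.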
On the other hand, if you only want to extract the date component as a string, without parsing it, there are a couple of options.
Using str.split:
df['date'] = df['cash_date'].str.split(n=1).str[0]
df
                         cash_date  amount  id       date
0  02-JAN-13 12.00.00.000000000 AM     100   1  02-JAN-13
1  13-FEB-13 12.00.00.000000000 AM     200   2  13-FEB-13
2  09-MAR-13 12.00.00.000000000 AM     300   3  09-MAR-13
3  03-APR-13 12.00.00.000000000 AM     400   4  03-APR-13
4  02-JAN-13 06.26.02.438000000 PM     500   7  02-JAN-13
5  17-NOV-18 08.31.47.443000000 PM     700   8  17-NOV-18
Or, using a list comprehension:
df['date'] = [x.split(None, 1)[0] for x in df['cash_date']]
df
                         cash_date  amount  id       date
0  02-JAN-13 12.00.00.000000000 AM     100   1  02-JAN-13
1  13-FEB-13 12.00.00.000000000 AM     200   2  13-FEB-13
2  09-MAR-13 12.00.00.000000000 AM     300   3  09-MAR-13
3  03-APR-13 12.00.00.000000000 AM     400   4  03-APR-13
4  02-JAN-13 06.26.02.438000000 PM     500   7  02-JAN-13
5  17-NOV-18 08.31.47.443000000 PM     700   8  17-NOV-18
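One difference between the two string-based options shows up when the column contains missing values: str.split passes NaN through, while a plain x.split(...) in a list comprehension raises on a float NaN. The question does not say whether cash_date has gaps, so the sketch below is only an assumption-driven illustration; the Series s and the names via_str and via_list are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing value in the middle.
s = pd.Series([
    "02-JAN-13 12.00.00.000000000 AM",
    np.nan,
    "17-NOV-18 08.31.47.443000000 PM",
])

# The .str accessor propagates NaN without raising.
via_str = s.str.split(n=1).str[0]

# The list comprehension needs an explicit guard for non-string values.
via_list = [
    x.split(None, 1)[0] if isinstance(x, str) else np.nan
    for x in s
]

print(via_str.tolist())
print(via_list)
```

With the guard in place, both approaches give the same values, so the choice between them can be made on speed or readability.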