python read_excel converters_使用python中的大熊猫的read_excel函数将日期作为字符串

本文探讨了如何使用pandas的read_excel函数读取Excel文件时,不让日期列自动转换为datetime类型,而是保留为字符串。尝试了不同方法,包括设置parse_dates参数和使用converters。解决方案是在read_excel中使用converters={'Date': str}选项来实现。
摘要由CSDN通过智能技术生成

Python 2.7.10

Tried pandas 0.17.1 -- function read_excel

Tried pyexcel 0.1.7 + pyexcel-xlsx 0.0.7 -- function get_records()

When using pandas in Python is it possible to read excel files (formats: xls|xlsx) and leave columns containing date or date + time values as strings rather than auto-converting to datetime.datetime or timestamp types?

If this is not possible using pandas can someone suggest an alternate method/library to read xls|xlsx files and leave date column values as strings?

For the pandas solution attempts the df.info() and resultant date column types are shown below:

>>> df.info()

Int64Index: 117 entries, 0 to 116

Columns: 176 entries, Mine to Index

dtypes: datetime64[ns](2), float64(145), int64(26), object(3)

memory usage: 161.8+ KB

>>> type(df['Start Date'][0])

Out[6]: pandas.tslib.Timestamp

>>> type(df['End Date'][0])

Out[7]: pandas.tslib.Timestamp

Attempt/Approach 1:

def read_as_dataframe(filename, ext):

import pandas as pd

if ext in ('xls', 'xlsx'):

# problem: date columns auto converted to datetime.datetime or timestamp!

df = pd.read_excel(filename) # unwanted - date columns converted!

return df, name, ext

Attempt/Approach 2:

import pandas as pd

# import datetime as datetime

# parse_date = lambda x: datetime.strptime(x, '%Y%m%d %H')

parse_date = lambda x: x

elif ext in ('xls', 'xlsx', ):

df = pd.read_excel(filename, parse_dates=False)

date_cols = [df.columns.get_loc(c) for c in df.columns if c in ('Start Date', 'End Date')]

# problem: date columns auto converted to datetime.datetime or timestamp!

df = pd.read_excel(filename, parse_dates=date_cols, date_parser=parse_date)

And have also tried pyexcel library but it does the same auto-magic convert behavior:

Attempt/Approach 3:

import pyexcel as pe

import pyexcel.ext.xls

import pyexcel.ext.xlsx

t0 = time.time()

if ext == 'xlsx':

records = pe.get_records(file_name=filename)

for record in records:

print("start date = %s (type=%s), end date = %s (type=%s)" %

(record['Start Date'],

str(type(record['Start Date'])),

record['End Date'],

str(type(record['End Date'])))

)

解决方案Using converters{'Date': str} option inside the pandas.read_excel which helps.

pandas.read_excel(xlsx, sheet, converters={'Date': str})

you can try convert your timestamp back to the original format

df['Date'][0].strftime('%Y/%m/%d')

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值