Batch-reading DBF files in Python: working with DBF files using dbfread

Latest recommended article published on 2024-07-01 20:58:38

weixin_39758041

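Today I needed to pull some statistics out of a file a colleague sent me. It turned out to be a database file with a .DBF extension, nearly 1 GB in size, and I had no database management software such as Access installed. I found a DBF viewer that could open the file, but filtering was awkward and there was no convenient way to export to Excel. So what to do? Naturally, see whether Python can help.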

I looked through a number of Python projects for working with DBF files and in the end found dbfread the most convenient. Project page: https://github.com/zycool/dbfread

1. Installation

pip install dbfread

2. Opening a DBF file

>>> from dbfread import DBF
>>> table = DBF('people.dbf')
>>> for record in table:
...     print(record)
OrderedDict([('NAME', 'Alice'), ('BIRTHDATE', datetime.date(1987, 3, 1))])
OrderedDict([('NAME', 'Bob'), ('BIRTHDATE', datetime.date(1980, 11, 12))])
>>> len(table)
2
>>> for record in table.deleted:
...     print(record)
OrderedDict([('NAME', 'Deleted Guy'), ('BIRTHDATE', datetime.date(1979, 12, 22))])
>>> len(table.deleted)
1

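3. Normally only one record is held in memory at a time; everything else is read straight from disk as you iterate. Passing load=True loads all records into memory, which gives you random access to them by index:

>>> table = DBF('people.dbf', load=True)
>>> print(table.records[1]['NAME'])
Bob
>>> print(table.records[0]['NAME'])
Alice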

Alternatively, you can load the records later by calling table.load(). This is useful when you want to look at the header before you commit to loading anything. For example, you can make a function which returns a list of tables in a directory and load only the ones you need.
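The directory-scanning idea described above could be sketched roughly as follows. This is only an illustration: the function name find_tables, the required_fields argument and the open_table hook are hypothetical, not part of dbfread. The sketch relies on DBF(path) reading just the header when load is left at its default, and on the field_names attribute listing the column names.

```python
from pathlib import Path

def find_tables(directory, required_fields, open_table=None):
    """Return the tables in `directory` whose header contains all required fields.

    `open_table` is a hypothetical hook (used here to make the helper easy to
    try without real files); by default it is dbfread's DBF class.
    """
    if open_table is None:
        from dbfread import DBF  # imported lazily so the helper itself has no hard dependency
        open_table = DBF

    matches = []
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() != ".dbf":
            continue
        table = open_table(str(path))  # only the header is read at this point
        if set(required_fields) <= set(table.field_names):
            matches.append(table)
    return matches
```

Each returned table can then be loaded with table.load() once you have decided you actually need its records.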

If you just want a list of records and you don’t care about the other table attributes you can do:

>>> records = list(DBF('people.dbf'))

You can unload records again with table.unload().

If the table is not loaded, the records and deleted attributes return RecordIterator objects.

Loading or iterating over records will open the DBF and memo file once for each iteration. This means the DBF object doesn’t hold any files open, only the RecordIterator object does.
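Because records are streamed during iteration, a large file can be filtered and exported without loading it all into memory, which is the use case that motivated this post. The following is a minimal sketch under stated assumptions: the helper names export_csv and export_people, the keep predicate, and the 1985 cut-off are all made up for illustration, and CSV is used as the Excel-readable output format.

```python
import csv

def export_csv(records, field_names, out, keep=lambda record: True):
    """Write the records accepted by `keep` to `out` as CSV and return the row count.

    `records` is any iterable of dict-like records, for example a dbfread DBF table.
    """
    writer = csv.writer(out)
    writer.writerow(field_names)  # header row
    count = 0
    for record in records:
        if keep(record):
            writer.writerow([record[name] for name in field_names])
            count += 1
    return count

def export_people(dbf_path, csv_path):
    """Hypothetical usage: export everyone born in 1985 or later."""
    from dbfread import DBF  # imported here so export_csv can be used without dbfread

    def born_1985_or_later(record):
        birthdate = record["BIRTHDATE"]  # may be None when the field is empty
        return birthdate is not None and birthdate.year >= 1985

    table = DBF(dbf_path)  # records are read from disk one at a time
    # utf-8-sig adds a BOM, which helps Excel recognise the file as UTF-8
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as out:
        return export_csv(table, table.field_names, out, keep=born_1985_or_later)
```

Keeping the filter as a plain function means the same export helper can be reused with any condition, and only one record is in memory at a time.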

Character Encodings

All text fields and memos (except binary ones) will be returned as unicode strings.

dbfread will try to detect the character encoding (code page) used in the file by looking at the language_driver byte. If this fails it reverts to ASCII. You can override this by passing encoding='my-encoding'. The encoding is available in the encoding attribute.
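To see why the override matters, the sketch below works on plain bytes rather than a real file: text stored in a non-ASCII code page cannot be decoded with the ASCII fallback, but decodes cleanly once the right encoding is named. The GBK code page, the sample string and the helper names are assumptions chosen for illustration; only the encoding= keyword comes from dbfread.

```python
def decode_field(raw, encoding="ascii"):
    """Decode raw text-field bytes (a simplified stand-in for what a DBF reader does)."""
    return raw.decode(encoding)

# Bytes as they might be stored in a DBF file written with the GBK code page.
raw = "张三".encode("gbk")

try:
    decode_field(raw)  # the ASCII fallback cannot decode these bytes
except UnicodeDecodeError as exc:
    print("ASCII fallback failed:", exc.reason)

print(decode_field(raw, "gbk"))  # decodes correctly once the encoding is given

def open_gbk_table(path):
    """Hypothetical helper: open a table whose text is known to be GBK-encoded."""
    from dbfread import DBF  # imported lazily; the demo above does not need dbfread
    return DBF(path, encoding="gbk")
```

If text fields come back garbled or raise UnicodeDecodeError, an explicit encoding of this kind is the first thing to try.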

Memo Files

If there is at least one memo field in the file dbfread will look for the corresponding memo file. For buildings.dbf this would be buildings.fpt (for Visual FoxPro) or buildings.dbt (for other databases).

Since the Windows file system is case preserving, the file names may end up mixed case. For example, you could have:

Buildings.dbf BUILDINGS.DBT

This creates problems in Linux, where file names are case sensitive. dbfread gets around this by ignoring case in file names. You can turn this off by passing ignorecase=False.

If the memo file is missing you will get a MissingMemoFile exception. If you still want the rest of the data you can pass ignore_missing_memofile=True. All memo field values will now be returned as None, as would be the case if there was no memo.

dbfread has full support for Visual FoxPro (.FPT) and dBase III (.DBT) memo files. It reads dBase IV (also .DBT) memo files, but only if they use the default block size of 512 bytes. (This will be fixed if I can find more files to study.)

Record Factories

If you don’t want records returned as collections.OrderedDict you can use the recfactory argument to provide your own record factory.

A record factory is a function that takes a list of (name, value) pairs and returns a record. You can do whatever you like with this data. Here’s a function that creates a record object with fields as attributes:

from dbfread import DBF

class Record(object):
    def __init__(self, items):
        # Turn each (name, value) pair into an attribute on the record.
        for (name, value) in items:
            setattr(self, name, value)

for record in DBF('people.dbf', recfactory=Record, lowernames=True):
    print(record.name, record.birthdate)

If you pass recfactory=None you will get the original (name, value) list. (This is a shortcut for recfactory=lambda items: items.)
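Since a record factory is just a callable that receives the (name, value) list, a plain function works as well as a class. The sketch below is an assumption-laden illustration: clean_record and read_clean are hypothetical names, and the cleaning rules (lower-case keys, stripped strings) are arbitrary choices.

```python
def clean_record(items):
    """Record factory: build a plain dict with lower-case keys and stripped strings."""
    record = {}
    for name, value in items:
        if isinstance(value, str):
            value = value.strip()
        record[name.lower()] = value
    return record

def read_clean(path):
    """Hypothetical usage: read every record of a table through clean_record."""
    from dbfread import DBF  # imported lazily so clean_record can be tried on its own
    return list(DBF(path, recfactory=clean_record))
```

Because the factory only sees a list of pairs, it can be tried out on hand-written data before pointing it at a real file.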

Custom Field Types

If the included field types are not enough you can add your own by subclassing FieldParser. As a silly example, here’s how you can read text (C) fields in reverse:

from dbfread import DBF, FieldParser

class MyFieldParser(FieldParser):
    def parseC(self, field, data):
        # data is bytes: strip trailing space/NUL padding, decode, then reverse.
        return data.rstrip(b' \0').decode()[::-1]

for record in DBF('files/people.dbf', parserclass=MyFieldParser):
    print(record['NAME'])

and here’s how you can return invalid values as InvalidValue instead of raising ValueError:

from dbfread import DBF, FieldParser, InvalidValue

class MyFieldParser(FieldParser):
    def parse(self, field, data):
        try:
            return FieldParser.parse(self, field, data)
        except ValueError:
            # Wrap the raw bytes instead of aborting the whole read.
            return InvalidValue(data)

table = DBF('invalid_value.dbf', parserclass=MyFieldParser)
for i, record in enumerate(table):
    for name, value in record.items():
        if isinstance(value, InvalidValue):
            print('records[{}][{!r}] == {!r}'.format(i, name, value))

This will print:

records[0]['BIRTHDATE'] == InvalidValue(b'NotAYear')
