I see plenty of examples of importing a single CSV into a PostgreSQL db, but what I need is an efficient way to import 500,000 CSVs into a single PostgreSQL db. Each CSV is a bit over 500KB (so a grand total of approximately 272GB of data).
The CSVs are identically formatted and there are no duplicate records (the data was generated programmatically from a raw data source). I have been searching and will continue to search online for options, but I would appreciate any direction on getting this done in the most efficient manner possible. I do have some experience with Python, but will dig into any other solution that seems appropriate.
Thanks!
Solution
If you start by reading the PostgreSQL guide "Populating a Database" you'll see several pieces of advice:
Load the data in a single transaction.
Use COPY if at all possible.
Remove indexes, foreign key constraints, etc. before loading the data, and restore them afterwards.
PostgreSQL's COPY statement already supports the CSV format:
COPY table (column1, column2, ...) FROM '/path/to/data.csv' WITH (FORMAT CSV)
so it looks as if you are best off not using Python at all, or using Python only to generate the required sequence of COPY statements. (Note that server-side COPY ... FROM reads files on the database server's filesystem and requires appropriate privileges; from a client machine you can use psql's \copy instead, which reads files locally.)
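As a minimal sketch of that generator approach: the script below globs a directory of CSVs and emits one SQL script that loads them all inside a single transaction, per the guide's advice. The directory, table, and column names here are placeholders, not from the original question.

```python
import glob
import os

def build_load_script(csv_dir, table, columns):
    """Emit one SQL script that COPYs every CSV in csv_dir
    into `table` inside a single transaction."""
    cols = ", ".join(columns)
    lines = ["BEGIN;"]
    for path in sorted(glob.glob(os.path.join(csv_dir, "*.csv"))):
        # Double single quotes so the path is a valid SQL string literal.
        safe = path.replace("'", "''")
        lines.append(f"COPY {table} ({cols}) FROM '{safe}' WITH (FORMAT CSV);")
    lines.append("COMMIT;")
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical paths/names; write the result to a file and
    # run it with e.g.: psql -d mydb -f load.sql
    print(build_load_script("/data/csvs", "mytable", ["column1", "column2"]))
```

Since COPY ... FROM runs on the server, the generated script must be executed where the database can read those paths; otherwise emit \copy lines and feed the script to psql from the client.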