I'm reading a 6 million entry .csv file with Python, and I want to be able to search through this file for a particular entry.
Are there any tricks to search the entire file? Should you read the whole thing into a dictionary or should you perform a search every time? I tried loading it into a dictionary but that took ages so I'm currently searching through the whole file every time which seems wasteful.
Could I possibly utilize that the list is alphabetically ordered? (e.g. if the search word starts with "b" I only search from the line that includes the first word beginning with "b" to the line that includes the last word beginning with "b")
I'm using import csv.
(a side question: it is possible to make csv go to a specific line in the file? I want to make the program start at a random line)
Edit: I already have a copy of the list as an .sql file as well, how could I implement that into Python?
解决方案
If the csv file isn't changing, load in it into a database, where searching is fast and easy. If you're not familiar with SQL, you'll need to brush up on that though.
Here is a rough example of inserting from a csv into a sqlite table. Example csv is ';' delimited, and has 2 columns.
import csv
import sqlite3
con = sqlite3.Connection('newdb.sqlite')
cur = con.cursor()
cur.execute('CREATE TABLE "stuff" ("one" varchar(12), "two" varchar(12));')
f = open('stuff.csv')
csv_reader = csv.reader(f, delimiter=';')
cur.executemany('INSERT INTO stuff VALUES (?, ?)', csv_reader)
cur.close()
con.commit()
con.close()
f.close()