Reading Text Tables with Python
Reading tables is a common task, and there are several ways to do it besides writing a read function yourself. None of them is a magic bullet, though. Every table is different and can have its own eccentricities. If you find yourself reading the same type of quirky file over and over again, it could be worth your effort to write your own reader that does things just the way you like. That said, here are some other options.
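As a rough illustration of what "your own reader" might look like, here is a minimal sketch in plain Python. The file format is an assumption invented for this example: lines starting with # are comments, the first remaining line holds the column names, values are separated by white space, and ? marks a missing entry. The function name read_quirky_table is hypothetical. An io.StringIO object stands in for an open file.

```python
import io
import math


def read_quirky_table(f, missing="?"):
    """Read a hypothetical whitespace-delimited table into a dict of columns.

    Assumed format: '#' starts a comment line, the first non-comment line
    holds the column names, and `missing` marks a missing value (stored as NaN).
    """
    names = None
    columns = {}
    for line in f:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if names is None:
            # The first real line is the header.
            names = fields
            columns = {name: [] for name in names}
            continue
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(fields)}: {line!r}")
        for name, field in zip(names, fields):
            columns[name].append(math.nan if field == missing else float(field))
    return columns


text = """# instrument log
x y z
0.2536 0.1008 ?
0.4839 0.4536 0.3561
"""
table = read_quirky_table(io.StringIO(text))
print(table["x"])  # [0.2536, 0.4839]
```

The advantage of a hand-written reader is that every quirk (comment style, missing-value marker, header handling) is explicit and under your control; the cost is that you have to maintain it.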
numpy.loadtxt
numpy.loadtxt is a very simple reader. It can be coaxed into more advanced tasks, such as handling missing data or reading non-numeric columns, but doing so is tedious, so this function is best used with well-behaved tables.
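To see what "well-behaved" means in practice, the sketch below feeds numpy.loadtxt two small made-up tables held in io.StringIO objects (which stand in for files). The second table has an empty field; with the default float dtype, numpy.loadtxt rejects it with a ValueError rather than guessing. The exact error message differs between NumPy versions.

```python
import io

import numpy as np

clean = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")
messy = io.StringIO("1.0,2.0,3.0\n4.0,,6.0\n")  # second row has an empty field

good = np.loadtxt(clean, delimiter=",")
print(good.shape)  # (2, 3)

try:
    np.loadtxt(messy, delimiter=",")
    messy_failed = False
except ValueError as err:
    # loadtxt cannot convert the empty string to a float
    messy_failed = True
    print("loadtxt rejected the table:", err)
```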
numpy.loadtxt has a few useful keywords. Use the skiprows keyword to skip header lines. By default numpy.loadtxt splits columns on white space, but you can specify another separator with the delimiter keyword. If you want to select only certain columns from the table, use the usecols keyword. That can be useful if you want to skip a text column.
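The keywords above can be combined. In this sketch the table is held in an io.StringIO object instead of a file, and its contents (a header line plus a text column of names) are made up for illustration. skiprows=1 drops the header, delimiter=',' splits on commas, and usecols=(1, 2) keeps only the numeric columns. Passing dtype=str together with usecols=0 is one way to pull the text column out separately.

```python
import io

import numpy as np

text = """name,x,y
alpha,0.2,0.33
beta,9.1,1.29
gamma,17.0,0.96
"""

# Numeric columns only: skip the header line and the text column.
numbers = np.loadtxt(io.StringIO(text), skiprows=1, delimiter=",", usecols=(1, 2))

# The text column on its own, read as strings.
names = np.loadtxt(io.StringIO(text), skiprows=1, delimiter=",", usecols=0, dtype=str)

print(numbers.shape)  # (3, 2)
print(names)          # ['alpha' 'beta' 'gamma']
```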
Normally the data is returned as one large 2D array, but setting unpack=True returns the columns as individual arrays.
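The NumPy documentation describes unpack=True as returning the transposed array, so each row of the result is one column of the table. The sketch below checks that equivalence on a small made-up table held in an io.StringIO object.

```python
import io

import numpy as np

text = "0.1 0.2 0.3\n0.4 0.5 0.6\n"

table = np.loadtxt(io.StringIO(text))                  # shape (2, 3): rows x columns
a, b, c = np.loadtxt(io.StringIO(text), unpack=True)   # one 1D array per column

# Each unpacked array matches a column of the 2D result.
assert np.array_equal(a, table[:, 0])
assert np.array_equal(c, table[:, 2])
# Equivalently, the unpacked result is the transpose of the 2D array.
assert np.array_equal(np.loadtxt(io.StringIO(text), unpack=True), table.T)
print(a)  # [0.1 0.4]
```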
Examples
Reading a well-formatted, whitespace-delimited table into a single array:
>>> import numpy as np
>>> np.loadtxt('data_table.txt', skiprows=1)
array([[ 0.2536, 0.1008, 0.3857],
[ 0.4839, 0.4536, 0.3561],
[ 0.1292, 0.6875, 0.5929],
[ 0.1781, 0.3049, 0.8928],
[ 0.6253, 0.3486, 0.8791]])
Reading a well-formatted, whitespace-delimited table into three arrays:
>>> a,b,c = np.loadtxt('data_table.txt', skiprows=1, unpack=True)
>>> a
array([ 0.2536, 0.4839, 0.1292, 0.1781, 0.6253])
>>> b
array([ 0.1008, 0.4536, 0.6875, 0.3049, 0.3486])
>>> c
array([ 0.3857, 0.3561, 0.5929, 0.8928, 0.8791])
Reading a table with NaN values:
>>> np.loadtxt('data_table2.txt', skiprows=1)
array([[ 0.4839, 0.4536, 0.3561],
[ 0.1292, 0.6875, nan],
[ 0.1781, 0.3049, 0.8928],
[ nan, 0.5801, 0.2038],
[ 0.5993, 0.4357, 0.741 ]])
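Once nan entries are in the array, ordinary reductions such as mean propagate them. The sketch below shows two common follow-ups using the NumPy functions np.isnan and np.nanmean: counting missing values per column, and averaging while ignoring them. The table is a small made-up example held in an io.StringIO object, not the contents of data_table2.txt.

```python
import io

import numpy as np

text = """x y z
0.5 1.0 nan
1.5 nan 3.0
nan 2.0 5.0
"""

data = np.loadtxt(io.StringIO(text), skiprows=1)

missing_per_column = np.isnan(data).sum(axis=0)  # count NaNs in each column
plain_means = data.mean(axis=0)                  # every column holds a NaN, so every mean is NaN
clean_means = np.nanmean(data, axis=0)           # NaNs are skipped

print(missing_per_column.tolist())  # [1, 1, 1]
print(clean_means.tolist())         # [1.0, 1.5, 4.0]
```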
Reading a nicely formatted CSV file. Skip the first column since it contains strings:
>>> np.loadtxt('exoplanetData_clean.csv', skiprows=1, delimiter=',', usecols=(1,2,3))
array([[ 0.2 , 0.33 , 29.329 ],
[ 9.1 , 1.29 , 60.3251],
[ 17. , 0.96 , 143.213 ],
[ 6.8 , 0.38 , 20.86