Sorry for resurrecting an ancient thread, but I recently wanted to do this myself and wanted a 100% portable bash script to do it. So here's my solution using only curl, grep, tr, and sed.
The one-liner below was bashed out very quickly, so it could be made much more elegant, but I'm really just getting started with sed/awk etc...
curl "http://www.webpagewithtableinit.com/" 2>/dev/null | grep -i -e '</\?TABLE\|</\?TD\|</\?TR\|</\?TH' | sed 's/^[\ \t]*//g' | tr -d '\n\r' | sed 's/<\/TR[^>]*>/\n/Ig' | sed 's/<\/\?\(TABLE\|TR\)[^>]*>//Ig' | sed 's/^<T[DH][^>]*>\|<\/\?T[DH][^>]*>$//Ig' | sed 's/<\/T[DH][^>]*><T[DH][^>]*>/,/Ig'
As you can see I've got the page source using curl, but you could just as easily feed in the table source from elsewhere.
Here's the explanation:
Get the Contents of the URL using cURL, dump stderr to null (no progress meter)
curl "http://www.webpagewithtableinit.com/" 2>/dev/null
.
I only want table elements (return only lines containing TABLE, TR, TH, or TD tags, opening or closing).
| grep -i -e '</\?TABLE\|</\?TD\|</\?TR\|</\?TH'
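To see what this step keeps, here's a quick check on a made-up page fragment (the HTML and the variable names are just for illustration; also note that `\?` and `\|` in a basic regex are GNU grep extensions, so this isn't strictly POSIX):

```shell
# Hypothetical page fragment standing in for the curl output.
html='<html>
<body>
<p>not table data</p>
<table>
<tr><td>a</td><td>b</td></tr>
</table>
</body>
</html>'

# Keep only the lines containing TABLE/TR/TD/TH tags (open or close).
filtered=$(printf '%s\n' "$html" | grep -i -e '</\?TABLE\|</\?TD\|</\?TR\|</\?TH')
printf '%s\n' "$filtered"
```

The `<p>`, `<html>`, and `<body>` lines are dropped; only the three table lines survive.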
.
Remove any Whitespace at the beginning of the line.
| sed 's/^[\ \t]*//g'
.
Remove newlines
| tr -d '\n\r'
.
Replace closing </TR> tags with newlines (one table row per line).
| sed 's/<\/TR[^>]*>/\n/Ig'
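For example, on a single flattened line like the one the tr step produces (a sketch with made-up data; the `I` case-insensitivity flag and `\n` in the replacement are GNU sed extensions):

```shell
# One flattened line of hypothetical table source, as left by the tr step.
flat='<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>'

# Every closing </TR> becomes a newline, giving one table row per line.
rows=$(printf '%s' "$flat" | sed 's/<\/TR[^>]*>/\n/Ig')
printf '%s\n' "$rows"
```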
.
Remove TABLE and TR tags
| sed 's/<\/\?\(TABLE\|TR\)[^>]*>//Ig'
.
Remove <TD> and <TH> tags at the beginning of the line, and </TD> and </TH> tags at the end of the line.
| sed 's/^<T[DH][^>]*>\|<\/\?T[DH][^>]*>$//Ig'
.
Replace the remaining </TD><TD> (and </TH><TH>) cell boundaries with a comma.
| sed 's/<\/T[DH][^>]*><T[DH][^>]*>/,/Ig'
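These last two steps together turn a one-row-per-line stream into CSV. A quick sketch on a single made-up row:

```shell
# A single row line as it looks after the TABLE/TR tags are gone.
row='<TD>apple</TD><TD>banana</TD><TD>cherry</TD>'

# Strip the leading <TD> and trailing </TD>, then turn the remaining
# </TD><TD> cell boundaries into commas.
csv=$(printf '%s' "$row" \
  | sed 's/^<T[DH][^>]*>\|<\/\?T[DH][^>]*>$//Ig' \
  | sed 's/<\/T[DH][^>]*><T[DH][^>]*>/,/Ig')
printf '%s\n' "$csv"
```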
.
Note that if any of the table cells contain commas, you may need to escape them first, or use a different delimiter.
Hope this helps someone!
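In case it's useful, here's the whole pipeline run end-to-end on an inline sample instead of a live URL (the sample table is made up; everything after the variable is the pipeline exactly as above):

```shell
# Hypothetical page source standing in for the curl output.
page='<html><body>
  <table>
    <tr>
      <th>Name</th><th>Qty</th>
    </tr>
    <tr>
      <td>apples</td><td>3</td>
    </tr>
  </table>
</body></html>'

# Same pipeline as the one-liner, minus the curl fetch.
out=$(printf '%s\n' "$page" \
  | grep -i -e '</\?TABLE\|</\?TD\|</\?TR\|</\?TH' \
  | sed 's/^[\ \t]*//g' \
  | tr -d '\n\r' \
  | sed 's/<\/TR[^>]*>/\n/Ig' \
  | sed 's/<\/\?\(TABLE\|TR\)[^>]*>//Ig' \
  | sed 's/^<T[DH][^>]*>\|<\/\?T[DH][^>]*>$//Ig' \
  | sed 's/<\/T[DH][^>]*><T[DH][^>]*>/,/Ig')
printf '%s\n' "$out"
```

This prints a header row followed by one CSV line per table row.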