I have a compressed folder called gziptest.tar.gz which contains several plaintext files.
I'd like to be able to get the filenames and corresponding contents of the files, but the examples of usage for the gzip library don't cover this.
The following code:
import gzip
in_f = gzip.open('/home/cholloway/gziptest.tar.gz')
print(in_f.read())
produces the output:
gzip test/file2000664 001750 001750 00000000016 12621163624 015761 0ustar00chollowaycholloway000000 000000 I like apples
gzip test/file1000664 001750 001750 00000000025 12621164026 015755 0ustar00chollowaycholloway000000 000000 hello world
line two
gzip test/000775 001750 001750 00000000000 12621164026 015035 5ustar00chollowaycholloway000000 000000
I could use regular expressions to detect the start of each new file and extract the filename, but I'm wondering whether this functionality already exists in gzip or another standard Python library.
Solution
For that file, don't use the gzip library. Use the tarfile library.
The file you are working with is a gzip-compressed tar archive of the files in gzip test/.
If you only want to recover the tar archive, use gzip to decompress the file. The result is (as you discovered) a tar archive of the files you want.
Logically, to access the files inside the tar archive, you must first use the gzip library to recover the tar archive and then use the tarfile library to recover the files.
In practice, you only need the tarfile library: it invokes the gzip library automatically on your behalf.
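To make the difference between the two routes concrete, here is a minimal sketch that lists the member names both ways. The function names list_names_two_step and list_names_one_step are made up for illustration; the calls themselves (gzip.open, tarfile.open with fileobj and the "r:" / "r:gz" modes, getnames) are from the standard library.

```python
import gzip
import tarfile


def list_names_two_step(path):
    """Decompress with gzip first, then parse the resulting tar stream."""
    with gzip.open(path, "rb") as gz:
        # Mode "r:" tells tarfile to expect an uncompressed tar archive.
        with tarfile.open(fileobj=gz, mode="r:") as tar:
            return tar.getnames()


def list_names_one_step(path):
    """Let tarfile handle the gzip layer itself."""
    # Mode "r:gz" tells tarfile to read a gzip-compressed tar archive.
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()
```

Both functions should return the same list of names for the same archive; the second is simply the shorter way to write it.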
The following example is taken from the examples section of the tarfile documentation:
import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()
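If you want the filenames and contents in memory rather than extracted to disk, which is what the question asks for, something along these lines should work. This is a sketch, not part of the tarfile documentation example: read_members is a hypothetical helper name, and it assumes the archive is small enough to hold every file in memory.

```python
import tarfile


def read_members(path):
    """Return {member name: file contents as bytes} for regular files."""
    contents = {}
    with tarfile.open(path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other non-file entries
            f = tar.extractfile(member)  # file-like object for this member
            contents[member.name] = f.read()
    return contents


# Example usage (path taken from the question):
# for name, data in read_members("/home/cholloway/gziptest.tar.gz").items():
#     print(name)
#     print(data.decode("utf-8"))
```

The contents come back as bytes, so decode them (UTF-8 is assumed in the usage comment) if you need text.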