I have several large text files that all have the same structure. I want to delete the first 3 lines and then remove illegal characters from the 4th line. I don't want to read the entire dataset into memory and then modify it, as each file is over 100MB with over 4 million records.
Range 150.0dB -64.9dBm
Mobile unit 1 Base -17.19968 145.40369 999.8
Fixed unit 2 Mobile -17.20180 145.29514 533.0
Latitude Longitude Rx(dB) Best unit
-17.06694 145.23158 -050.5 2
-17.06695 145.23297 -044.1 2
So lines 1, 2 and 3 should be deleted and, in line 4, "Rx(dB)" should become just "Rx" and "Best unit" should be changed to "Best_Unit". Then I can use my other scripts to geocode the data.
I can't use command-line programs like grep (as in this question) to match the first 3 lines, as they are not the same in every file: the numbers (such as 150.0dB and -64.9dBm) change from file to file. So lines 1-3 have to be deleted by position, and then grep or similar can do the search-and-replace on line 4.
Thanks guys,
=== EDIT: I tried a new, more Pythonic way to handle larger files, suggested by @heltonbiker, but it raises an error.
import os, re
##infile = arcpy.GetParameter(0)
##chunk_size = arcpy.GetParameter(1) # number of records in each dataset
infile='trc_emerald.txt'
fc= open(infile)
Name = infile[:infile.rfind('.')]
outfile = Name+'_db.txt'
line4 = fc.readlines(100)[3]
line4 = re.sub('\([^\)].*?\)', '', line4)
line4 = re.sub('Best(\s.*?)', 'Best_', line4)
newfilestring = ''.join(line4 + [line for line in fc.readlines[4:]])
fc.close()
newfile = open(outfile, 'w')
newfile.write(newfilestring)
newfile.close()
del lines
del outfile
del Name
#return chunk_size, fl
#arcpy.SetParameterAsText(2, fl)
print "Completed"
Traceback (most recent call last):
  File "P:\2012\Job_044_DM_Radio_Propogation\Working\FinalPropogation\TRC_Emerald\working\clean_file_1c.py", line 13, in <module>
    newfilestring = ''.join(line4 + [line for line in fc.readlines[4:]])
TypeError: 'builtin_function_or_method' object is unsubscriptable
Solution
As wim said in the comments, sed is the right tool for this. The following command should do what you want:
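sed -i -e '4 s/(dB)//' -e '4 s/Best [Uu]nit/Best_Unit/' -e '1,3 d' yourfile.whatever
To explain the command a little:
-i edits the file in place, that is, it writes the output back into the input file (this form is for GNU sed; BSD/macOS sed expects a backup suffix after -i, e.g. -i '')
-e adds a command to the script; it can be given several times
'4 s/(dB)//' on line 4, substitute '' for '(dB)'
'4 s/Best [Uu]nit/Best_Unit/' same as above, with different find and replace strings; [Uu] matches both "Best unit" (as in the sample header) and "Best Unit"
'1,3 d' from line 1 to line 3 (inclusive), delete the entire line; the addresses refer to input line numbers, so the header is still line 4 even though lines 1-3 are dropped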
sed is a really powerful tool that can do much more than this, and it is well worth learning.
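If you would rather stay in Python, the same three steps can be done in a streaming fashion. The TypeError in the question comes from fc.readlines[4:], which subscripts the method object itself instead of calling it. Even fc.readlines()[4:] would load the whole file into memory, which is what you wanted to avoid. Below is a minimal sketch, not a definitive implementation: the function names clean_header and clean_file are made up for illustration, it is written for Python 3, and it assumes the header is always the 4th line.

```python
import re
import shutil


def clean_header(line):
    """Drop the '(dB)' suffix and join 'Best unit' with an underscore."""
    line = line.replace('(dB)', '')
    # Accept both 'Best unit' (sample data) and 'Best Unit' (question text).
    return re.sub(r'Best [Uu]nit', 'Best_Unit', line)


def clean_file(infile, outfile, skip=3):
    """Copy infile to outfile without its first `skip` lines, fixing the header.

    Hypothetical helper: only one line at a time (plus a small copy buffer)
    is held in memory, so the file size does not matter.
    """
    with open(infile) as src, open(outfile, 'w') as dst:
        for _ in range(skip):
            src.readline()                        # discard lines 1-3
        dst.write(clean_header(src.readline()))   # line 4 is the header
        shutil.copyfileobj(src, dst)              # stream the remaining records
```

With the names from the question, this would be called as clean_file('trc_emerald.txt', 'trc_emerald_db.txt'). Unlike sed -i, it writes a new file and leaves the original untouched.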