Deleting specified content from a txt file in Python: removing specific lines from a large text file

I have several large text files that all have the same structure. I want to delete the first 3 lines and then remove illegal characters from the 4th line. I don't want to have to read the entire dataset and then modify it, as each file is over 100MB with over 4 million records.

Range 150.0dB -64.9dBm

Mobile unit 1 Base -17.19968 145.40369 999.8

Fixed unit 2 Mobile -17.20180 145.29514 533.0

Latitude Longitude Rx(dB) Best unit

-17.06694 145.23158 -050.5 2

-17.06695 145.23297 -044.1 2

So lines 1, 2 and 3 should be deleted, and in line 4 "Rx(dB)" should become just "Rx" and "Best Unit" should be changed to "Best_Unit". Then I can use my other scripts to geocode the data.

I can't use command-line programs like grep (as in this question) because the first 3 lines are not the same in every file: the numbers (such as 150.0dB, -64*) change from file to file. So lines 1-3 have to be deleted as a whole, and then grep or a similar tool can do the search-and-replace on line 4.

Thanks guys,

=== EDIT: a new, more Pythonic way to handle larger files, from @heltonbiker. It raises an error.

import os, re

##infile = arcpy.GetParameter(0)

##chunk_size = arcpy.GetParameter(1) # number of records in each dataset

infile='trc_emerald.txt'

fc= open(infile)

Name = infile[:infile.rfind('.')]

outfile = Name+'_db.txt'

line4 = fc.readlines(100)[3]

line4 = re.sub('\([^\)].*?\)', '', line4)

line4 = re.sub('Best(\s.*?)', 'Best_', line4)

newfilestring = ''.join(line4 + [line for line in fc.readlines[4:]])

fc.close()

newfile = open(outfile, 'w')

newfile.write(newfilestring)

newfile.close()

del lines

del outfile

del Name

#return chunk_size, fl

#arcpy.SetParameterAsText(2, fl)

print "Completed"

Traceback (most recent call last):
  File "P:\2012\Job_044_DM_Radio_Propogation\Working\FinalPropogation\TRC_Emerald\working\clean_file_1c.py", line 13, in <module>
    newfilestring = ''.join(line4 + [line for line in fc.readlines[4:]])
TypeError: 'builtin_function_or_method' object is unsubscriptable

Solution

As wim said in the comments, sed is the right tool for this. The following command should do what you want:

sed -i -e '4 s/(dB)//' -e '4 s/Best Unit/Best_Unit/' -e '1,3 d' yourfile.whatever

To explain the command a little:

-i edits the file in place, that is, it writes the output back into the input file

-e adds a command to execute

'4 s/(dB)//' on line 4, substitute '' for '(dB)'

'4 s/Best Unit/Best_Unit/' same as above, with different find and replace strings

'1,3 d' delete lines 1 through 3 (inclusive)

sed is a really powerful tool that can do much more than this; it is well worth learning.
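
If sed is not available, or the cleanup has to stay inside a Python script, the same three operations can be sketched in Python by streaming the file one line at a time, so the 100MB input is never held in memory. This is only a sketch under some assumptions: it is written for Python 3, the function names clean_file and fix_header are made up for illustration, and the header regex accepts both "Best unit" (as in the sample data) and "Best Unit" (as in the question text).

```python
import re


def fix_header(line):
    """Tidy the column-header line (line 4 of the original file)."""
    # Remove a parenthesised unit such as "(dB)", like sed's '4 s/(dB)//'.
    line = re.sub(r"\([^)]*\)", "", line)
    # Join "Best unit" or "Best Unit" with an underscore, keeping the case.
    return re.sub(r"Best\s+([Uu]nit)", r"Best_\1", line)


def clean_file(src_path, dst_path, skip=3):
    """Copy src_path to dst_path without the first `skip` lines.

    The line right after the skipped ones is passed through fix_header.
    Iterating over the file object reads one line at a time.
    """
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for lineno, line in enumerate(src, start=1):
            if lineno <= skip:
                continue  # same effect as sed's '1,3 d'
            if lineno == skip + 1:
                line = fix_header(line)
            dst.write(line)
```

With the file names from the question this would be called as clean_file('trc_emerald.txt', 'trc_emerald_db.txt'). Unlike sed -i, it writes a separate output file and leaves the input untouched. The attempt in the question failed because fc.readlines[4:] subscripts the method object instead of calling it; even with the call fixed, readlines() would load every remaining line into a list, which is what the streaming loop avoids.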
