The text looks roughly like this:
Energy Usage:
----------------------------------------------------------------
         Usage      Avg.     Kw-hr      Avg.      Peak      Cost
Pump    Factor    Effic.       /m3        Kw        Kw      /day
----------------------------------------------------------------
6       100.00     75.00      0.34     81.19    156.78      0.00
----------------------------------------------------------------
Demand Charge: 0.00
Total Cost: 0.00
Node Results at 0:00:00 hrs:
----------------------------------------------
                    Demand      Head  Pressure
Node                   L/s         m         m
----------------------------------------------
2                    20.00     27.81     27.81
3                    20.00     31.85     31.85
4                    15.00     32.01     32.01
5                    30.00     37.15     37.15
6                  -322.92      0.00      0.00  Reservoir
1                   237.92     15.00     15.00  Tank
Node Results at 1:00:00 hrs:
----------------------------------------------
                    Demand      Head  Pressure
Node                   L/s         m         m
----------------------------------------------
2                    20.00    105.42    105.42
3                    20.00    105.42    105.42
4                     0.00    105.51    105.51
5                     0.00    105.60    105.60
6                   -40.00      0.00      0.00  Reservoir
1                     0.00     20.00     20.00  Tank
Node Results at 2:00:00 hrs:
----------------------------------------------
                    Demand      Head  Pressure
Node                   L/s         m         m
----------------------------------------------
2                    20.00    105.42    105.42
3                    20.00    105.42    105.42
4                     0.00    105.51    105.51
5                     0.00    105.60    105.60
6                   -40.00      0.00      0.00  Reservoir
1                     0.00     20.00     20.00  Tank
Node Results at 3:00:00 hrs:
----------------------------------------------
                    Demand      Head  Pressure
Node                   L/s         m         m
----------------------------------------------
2                    20.00    101.58    101.58
3                    20.00    101.60    101.60
4                    15.00    101.63    101.63
5                    30.00    101.85    101.85
6                   -85.00      0.00      0.00  Reservoir
1                     0.00     20.00     20.00  Tank
Analysis ended Tue Jul 3 16:22:49 2018
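Goals:
1. Extract the Pressure values, only the first four node rows (nodes 2, 3, 4 and 5) of each time step.
2. For a given pressure, find the time at which it occurs.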
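Approach:
1. First use a regular expression (re) to locate where the information sits in the text. Each line is matched after all of its spaces have been removed, which is why the pattern is written NodeResultsat without spaces: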
pattern = re.compile(r'NodeResultsat')
match(pattern,joinedlines)
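2. Extract the time from its position: in the space-stripped header line, the hour digits start right after the 13 characters of NodeResultsat, i.e. at index 13.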
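3. Extract the pressure: pick out the numbers on the row and append them to a list, so the Pressure value (the fourth and last number) can be read directly by index.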
pressure = get_pressure(joinedlines1)
ss.append(pressure[3])
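The three steps above rely on a fixed character position (str1[13]) and a hand-written tokenizer. A minimal sketch of the same steps, assuming the report lines look like the excerpt: a regex capture group reads the hour whatever its number of digits, and str.split() separates the columns without stripping spaces first. The names HEADER, parse_hour and pressure_of are hypothetical, not part of the original code.

```python
import re

# Hypothetical helpers: \s+ and \s* tolerate any amount of whitespace,
# so the line does not need its spaces removed before matching.
HEADER = re.compile(r'^\s*Node Results at\s+(\d+):(\d{2}):(\d{2})\s+hrs')


def parse_hour(line):
    """Return the hour of a 'Node Results at H:MM:SS hrs:' line, or None."""
    m = HEADER.match(line)
    if m is None:
        return None
    return int(m.group(1))  # group 1 holds all hour digits, however many


def pressure_of(row):
    """Return the Pressure column (4th field) of a node row as a float."""
    fields = row.split()  # splits on runs of whitespace, drops the newline
    if len(fields) < 4:
        raise ValueError(f'expected at least 4 fields, got {fields!r}')
    return float(fields[3])


print(parse_hour('Node Results at 12:00:00 hrs:\n'))  # 12
print(parse_hour('Demand Head Pressure'))             # None
print(pressure_of('2 20.00 27.81 27.81\n'))           # 27.81
```

Trailing labels such as Reservoir or Tank become a fifth field and are simply ignored by the index lookup.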
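Don't worry about the finer details of this code; the point is the general approach to pulling information out of a text file: first find the regularities in the layout, then use re to locate the relevant lines, and finally extract the values with ordinary string methods.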
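That recipe (find the regularity, locate with re, slice by a fixed offset) can be packed into one reusable function. A sketch, assuming a hypothetical helper rows_after that is not part of the original code; skip=5 and count=4 reproduce the range(item+5, item+9) offsets that process() in the full code uses.

```python
import re


def rows_after(lines, pattern, skip, count):
    """Hypothetical helper: for every line matching `pattern`, return the
    match object together with `count` lines starting `skip` lines below it.
    Blocks cut short by the end of the input are simply truncated."""
    regex = re.compile(pattern)
    blocks = []
    for i, line in enumerate(lines):
        m = regex.search(line)  # search, so leading spaces do not matter
        if m:
            start = i + skip
            blocks.append((m, lines[start:start + count]))
    return blocks


# One time step from the excerpt
report = """Node Results at 0:00:00 hrs:
----------------------------------------------
Demand Head Pressure
Node L/s m m
----------------------------------------------
2 20.00 27.81 27.81
3 20.00 31.85 31.85
4 15.00 32.01 32.01
5 30.00 37.15 37.15
6 -322.92 0.00 0.00 Reservoir
""".splitlines()

for m, rows in rows_after(report, r'Node Results at\s+(\d+):', skip=5, count=4):
    print(m.group(1), [row.split()[3] for row in rows])
# 0 ['27.81', '31.85', '32.01', '37.15']
```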
Full code:
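import re

# Read the whole report as a list of lines (the file is closed automatically)
with open('d://QQ数据//1.txt') as file:
    lines = file.readlines()

# Lines are matched after their spaces are removed, hence no spaces in the pattern
pattern = re.compile(r'NodeResultsat')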
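# Accepts a line (or a list of lines) and returns one string; with replace=True
# all spaces are removed, so 'Node Results at' becomes 'NodeResultsat'
def joinlines(lines, replace=True):
    new = ''.join(lines)
    if replace:
        new = new.replace(' ', '')
    return new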
# Check whether the line starts with the pattern (re.match only matches at the start)
def match(pattern, test):
    return re.match(pattern, test) is not None
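# The input here is a space-stripped line, e.g. 'NodeResultsat0:00:00hrs:\n'
def get_hour(str1):
    pattern = re.compile(r'NodeResultsat')
    if match(pattern, str1):
        # 'NodeResultsat' has 13 characters, so the hour starts at index 13
        # (hours with three or more digits are not handled)
        if str1[14] == ':':
            return str1[13]             # one-digit hour
        else:
            return str1[13] + str1[14]  # two-digit hour
    else:
        print('No time found, error!')
        return None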
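# The input here is the original line, spaces not removed,
# e.g. '2 20.00 27.81 27.81\n'. Split it into its numbers first;
# the caller then picks the last one. (Equivalent to str1.split().)
def get_pressure(str1):
    s = []      # the numbers found on the line, as strings
    yucun = []  # characters of the number currently being read
    for ch in str1:
        if ch.isspace():
            # a space or the trailing newline ends the current number
            if yucun:
                s.append(''.join(yucun))
                yucun = []
        else:
            yucun.append(ch)
    if yucun:
        # the line did not end with whitespace: keep its last number too
        s.append(''.join(yucun))
    if len(s) < 4:
        print('Did not collect all four numbers')
    return s

# s1 = '1 32 34 12'
# print(get_pressure(s1))  # ['1', '32', '34', '12']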
def process(lines):
    s = []      # line indices of the 'Node Results at' headers
    hours = []  # hour of each time step
    for i in range(len(lines)):
        joinedlines = joinlines(lines[i])
        if match(pattern, joinedlines):
            hour = get_hour(joinedlines)  # hour of this time step
            hours.append(hour)
            s.append(i)                   # remember the line index
    ss = []
    for item in s:
        # The node rows start 5 lines below the header (dashes, two
        # column-title lines, dashes); read the first four of them
        for j in range(item + 5, item + 9):
            joinedlines1 = joinlines(lines[j], replace=False)
            pressure = get_pressure(joinedlines1)
            ss.append(pressure[3])        # 4th number = Pressure
    print(ss)
    print(hours)
    return ss


process(lines)
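process() prints hours and returns ss but never pairs them, which is what goal 2 (the time at which a given pressure occurs) needs. A sketch of that pairing, assuming both lists have the shape process() produces for the excerpt (four pressures per time step, in node order 2, 3, 4, 5). The values are copied by hand from that output, and times_for_pressure, ROWS_PER_STEP and NODES are hypothetical names.

```python
import math

# Hand-copied output of process() for the excerpt: hours has one entry per
# time step, ss has four pressures per time step (nodes 2, 3, 4, 5).
hours = ['0', '1', '2', '3']
ss = ['27.81', '31.85', '32.01', '37.15',
      '105.42', '105.42', '105.51', '105.60',
      '105.42', '105.42', '105.51', '105.60',
      '101.58', '101.60', '101.63', '101.85']

ROWS_PER_STEP = 4             # range(item+5, item+9) reads four rows
NODES = ['2', '3', '4', '5']  # assumption: node IDs of those rows, as in the excerpt


def times_for_pressure(hours, ss, target, tol=0.005):
    """Return (hour, node) pairs whose pressure is within tol of target."""
    hits = []
    for step, hour in enumerate(hours):
        chunk = ss[step * ROWS_PER_STEP:(step + 1) * ROWS_PER_STEP]
        for node, p in zip(NODES, chunk):
            if math.isclose(float(p), target, abs_tol=tol):
                hits.append((hour, node))
    return hits


print(times_for_pressure(hours, ss, 105.42))
# [('1', '2'), ('1', '3'), ('2', '2'), ('2', '3')]
```

The tolerance of 0.005 is half the report's 0.01 resolution, so it avoids exact float equality without merging neighbouring values; comparing the raw strings would also work because the report always prints two decimals.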