Accurately Interpreting Geological Time in the Literature Using Geological Timescale Charts
Project background:
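Geological time is a fundamental label for Earth science data: dating results directly affect the analysis of geological materials and our interpretation of Earth's evolution. Knowing the geological age of a sample accurately is therefore essential for geoscience analysis. As research under the Deep-time Digital Earth (DDE) program advances, more and more attention is being paid to the geological samples, and their physical and chemical properties, that are recorded in historical literature. For example, the data in Friend et al. (1976, pp. 43-46) can be used for paleoclimate modeling. However, the "Middle Devonian" recorded in that paper reflects the scientific understanding of the 1980s, and its absolute age differs from that of the Middle Devonian in the current ICS 2020 International Chronostratigraphic Chart. Accurately interpreting the geological time stated in the literature is therefore one of the open problems in the geosciences.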
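As the DDE big-science program has progressed, a large number of geological timescale charts (1917-2020) have been built and published. These charts from different eras provide relative time reference frames, and these frames make it possible to interpret the geological time stated in the literature accurately.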
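To make the vast amount of geological time information in the literature widely usable, the organizers set the task "Accurately interpreting geological time in the literature using geological timescale charts" and rank the submitted algorithms by performance. Besides exploring algorithms, participants are expected to understand the knowledge contained in the literature. In particular, they should understand how sample attributes such as name, spatial location and sample type relate to the version of the geological timescale used. The aim is to deepen participants' understanding of Earth science data and knowledge.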
Task description
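The knowledge group of the DDE big-science program has built 53 standard geological timescale charts. Using these charts, design an algorithm (implemented in Python) for the organizers' dataset. For every sample record, the algorithm should determine the absolute geological time interval and its uncertainty. For example, a sample labeled Devonian should be assigned [419.2 Ma ± 3.2 Ma, 358.9 Ma ± 2.4 Ma].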
- Competition dataset: 2,479 records with the fields LithologyCode, LithologyIDNumber, OldIDNumber, LAT (latitude), NS (North or South), LONG (longitude), EW (East or West), Continent, Country, GeogComments, LMU, Period, Stage, AgeComments, Lithology, Formation, LithComments, PrimaryReference and SeeAlso.
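The matching code later in this post reads LMU, Period and Stage by column position (indices 10, 11 and 12). It writes its five result values starting at column 19. If the sheet's columns follow exactly the field order listed above (an assumption), these numbers can be derived rather than hard-coded. Here is a small sketch; the names `DATASET_FIELDS`, `COLUMN` and `FIRST_OUTPUT_COLUMN` are illustrative and not part of the original project:

```python
# Field order as listed for the competition dataset (assumed to match the sheet's column order)
DATASET_FIELDS = ["LithologyCode", "LithologyIDNumber", "OldIDNumber", "LAT", "NS", "LONG", "EW",
                  "Continent", "Country", "GeogComments", "LMU", "Period", "Stage", "AgeComments",
                  "Lithology", "Formation", "LithComments", "PrimaryReference", "SeeAlso"]

# 0-based column index of every field, the form xlrd's sheet.col_values(index) expects
COLUMN = {name: i for i, name in enumerate(DATASET_FIELDS)}

# The result columns are appended right after the last input column (SeeAlso)
FIRST_OUTPUT_COLUMN = len(DATASET_FIELDS)

print(COLUMN["LMU"], COLUMN["Period"], COLUMN["Stage"], FIRST_OUTPUT_COLUMN)  # 10 11 12 19
```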
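Problem description
This is the official competition page. As I recall, the data can be downloaded after registering, so it should be fine to show it here. Link: http://hellodde2022.org/algorithm?type=1. The timescale data look like this (excerpt):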
utf:middle_pennsylvanian
    skos:prefLabel "middle_pennsylvanian"@en ;
    rdf:type utf:Interval_time ;
    utf:start 311.7;
    utf:start_uncertainty 0.0 ;
    utf:end 306.5 ;
    utf:end_uncertainty 0.0 ;
    utf:unit "Ma"@en ;
    utf:reference_system utf:BGSGT12 ;
    utf:partOf utf:pennsylvanian;
.

utf:phanerozoic
    skos:prefLabel "phanerozoic"@en ;
    rdf:type utf:Interval_time ;
    utf:start 542.0;
    utf:start_uncertainty 0.0 ;
    utf:end 0.0 ;
    utf:end_uncertainty 0.0 ;
    utf:unit "Ma"@en ;
    utf:reference_system utf:BGSGT12 ;
.

utf:cenozoic
    skos:prefLabel "cenozoic"@en ;
    rdf:type utf:Interval_time ;
    utf:start 65.5;
    utf:start_uncertainty 0.0 ;
    utf:end 0.0 ;
    utf:end_uncertainty 0.0 ;
    utf:unit "Ma"@en ;
    utf:reference_system utf:BGSGT12 ;
    utf:partOf utf:phanerozoic;
.

utf:mesozoic
    skos:prefLabel "mesozoic"@en ;
    rdf:type utf:Interval_time ;
    utf:start 251.0;
    utf:start_uncertainty 0.0 ;
    utf:end 65.0 ;
    utf:end_uncertainty 0.0 ;
    utf:unit "Ma"@en ;
    utf:reference_system utf:BGSGT12 ;
    utf:partOf utf:phanerozoic;
.

utf:palaeozoic
    skos:prefLabel "palaeozoic"@en ;
    rdf:type utf:Interval_time ;
    utf:start 542.0;
    utf:start_uncertainty 0.0 ;
    utf:end 251.0 ;
    utf:end_uncertainty 0.0 ;
    utf:unit "Ma"@en ;
    utf:reference_system utf:BGSGT12 ;
    utf:partOf utf:phanerozoic;
.

utf:early_palaeozoic
    skos:prefLabel "early_palaeozoic"@en ;
    rdf:type utf:Interval_time ;
    utf:start 542.0;
    utf:start_uncertainty 0.0 ;
    utf:end 416.0 ;
    utf:end_uncertainty 0.0 ;
    utf:unit "Ma"@en ;
    utf:reference_system utf:BGSGT12 ;
    utf:partOf utf:palaeozoic;
.

utf:late_palaeozoic
    skos:prefLabel "late_palaeozoic"@en ;
    rdf:type utf:Interval_time ;
    utf:start 416.0;
    utf:start_uncertainty 0.0 ;
    utf:end 251.0 ;
    utf:end_uncertainty 0.0 ;
    utf:unit "Ma"@en ;
    utf:reference_system utf:BGSGT12 ;
    utf:partOf utf:palaeozoic;
.
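For each sample record, use its LMU, Period and Stage fields to find the matching entry in the timescale data above. Then fill that entry's start, start_uncertainty, end, end_uncertainty and reference_system values into the yellow-highlighted columns.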
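Each entry above consists of a subject line, several `predicate value ;` lines and a closing `.`. Below is a minimal sketch that turns such entries into dictionaries keyed by their prefLabel. It uses the standard library only and assumes every entry follows exactly the layout shown above. It is not a full Turtle parser, and `parse_intervals` is an illustrative name:

```python
import re

# One "predicate value ;" pair per (stripped) line, e.g.  utf:start 311.7;  or  skos:prefLabel "cenozoic"@en ;
PAIR = re.compile(r'^(\S+)\s+(.+?)\s*[;.]?$')


def parse_intervals(text):
    """Map each prefLabel to a dict of its predicates (values kept as strings)."""
    intervals = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('@prefix'):
            continue
        if line == '.':                     # a lone "." closes the current entry
            current = None
        elif current is None:               # the first line of an entry is its subject
            current = {'subject': line}
        else:
            match = PAIR.match(line)
            if match:
                key, value = match.groups()
                value = value.replace('"', '').replace('@en', '')
                current[key.split(':')[-1]] = value   # "utf:start" -> "start"
                if key == 'skos:prefLabel':
                    intervals[value] = current
    return intervals


sample = '''
utf:middle_pennsylvanian
    skos:prefLabel "middle_pennsylvanian"@en ;
    rdf:type utf:Interval_time ;
    utf:start 311.7;
    utf:start_uncertainty 0.0 ;
    utf:end 306.5 ;
    utf:end_uncertainty 0.0 ;
    utf:unit "Ma"@en ;
    utf:reference_system utf:BGSGT12 ;
    utf:partOf utf:pennsylvanian;
.
'''

mp = parse_intervals(sample)['middle_pennsylvanian']
print(float(mp['start']), float(mp['end']), mp['reference_system'])  # 311.7 306.5 utf:BGSGT12
```

Keeping the values as strings mirrors how the matching code below copies them into the spreadsheet. Convert them with `float()` only where numbers are compared.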
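Data preprocessing:
I used only nine of the TTL files. The official download comes as .ttl (Turtle) files, which was a bit annoying, so the first thing I did was change their extension to .txt.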
import os


def renaming(file_path):
    """Change the extension of one file from .ttl to .txt."""
    name, ext = os.path.splitext(file_path)   # split "xxx.ttl" into ("xxx", ".ttl")
    if ext == '.ttl':
        os.rename(file_path, name + '.txt')


def tree(path):
    """Recursively walk `path` and rename every .ttl file in it."""
    for file in os.listdir(path):              # all files and folders in this directory
        file_path = os.path.join(path, file)   # full path, so no os.chdir() is needed
        if os.path.isdir(file_path):
            tree(file_path)                    # recurse into sub-folders
        else:
            renaming(file_path)


tree(r'D:\pythonProject\DDE\参考地质年表\database')
Next, merge the nine .txt files into one large .txt file.
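# -*- coding:utf-8 -*-
"""Merge several .txt files into one."""
import os

# Folder that holds the converted .txt files (raw strings keep the backslashes literal)
path = r"D:\pythonProject\DDE\参考地质年表\database"
result = os.path.join(path, "merge.txt")
# Open merge.txt for writing (it is created if it does not exist)
with open(result, 'w', encoding="utf-8") as out:
    for filename in os.listdir(path):
        filepath = os.path.join(path, filename)
        # Skip merge.txt itself (e.g. on a re-run) and anything that is not a .txt file
        if filepath == result or not filename.endswith('.txt'):
            continue
        # Copy the file line by line, then add a blank line so entries from different files stay separated
        with open(filepath, encoding="utf-8") as f:
            for line in f:
                out.write(line)
        out.write('\n')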
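Renaming is not strictly necessary, because Python reads a .ttl file as plain UTF-8 text just like a .txt file. Below is an alternative sketch using `pathlib`. It merges the .ttl files directly, including those in sub-folders, and never picks up its own output file because that file ends in .txt. `merge_ttl` and the temporary file names are illustrative, not from the original project:

```python
import tempfile
from pathlib import Path


def merge_ttl(folder, output):
    """Concatenate every .ttl file under `folder` (recursively) into `output`; return the file count."""
    folder, output = Path(folder), Path(output)
    count = 0
    with output.open('w', encoding='utf-8') as out:
        for ttl in sorted(folder.rglob('*.ttl')):
            text = ttl.read_text(encoding='utf-8')
            out.write(text)
            # A blank line keeps the last entry of one file apart from the first entry of the next
            out.write('\n' if text.endswith('\n') else '\n\n')
            count += 1
    return count


# Usage on a throwaway folder with two tiny files
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'a.ttl').write_text('utf:cenozoic\n    skos:prefLabel "cenozoic"@en ;\n.\n', encoding='utf-8')
    Path(tmp, 'sub').mkdir()
    Path(tmp, 'sub', 'b.ttl').write_text('utf:mesozoic\n    skos:prefLabel "mesozoic"@en ;\n.', encoding='utf-8')
    n = merge_ttl(tmp, Path(tmp, 'merge.txt'))
    merged_text = Path(tmp, 'merge.txt').read_text(encoding='utf-8')

print(n, merged_text.count('skos:prefLabel'))  # 2 2
```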
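Algorithm:
The code below can be used as-is, and the comments describe each step. I don't fully understand the geological reasoning behind the matching rules myself; the code simply implements the requirement. It proceeds in three steps:
1. Read the timescale .txt file(s) in ./test/.
2. Match each row of test.xlsx by LMU, Period and Stage.
3. Write the five result values back into test.xlsx.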
import os

import xlrd                      # needs xlrd==1.2.0: xlrd 2.x only opens legacy .xls files
from xlutils.copy import copy

# The five values written for every sample row, in output-column order
FIELDS = ('start', 'start_uncertainty', 'end', 'end_uncertainty', 'reference_system')


def get_field(block, key):
    """Return the value of `utf:<key>` in one cleaned block, or '' if the block lacks it."""
    prefix = 'utf:' + key + ' '
    for line in block:
        if line.startswith(prefix):
            value = line[len(prefix):].strip()
            # reference_system is stored as "utf:BGSGT12"; keep only "BGSGT12"
            return value[len('utf:'):] if value.startswith('utf:') else value
    return ''


def pick_min(blocks, uncertainty_key):
    """Return the candidate block with the smallest uncertainty (the first one wins ties)."""
    return min(blocks, key=lambda block: float(get_field(block, uncertainty_key) or 0.0))


def extract():
    """Split every .txt file under ./test/ into one cleaned block per geological interval."""
    for root, dirs, files in os.walk('./test/'):
        for file in files:
            if not file.endswith('.txt'):
                continue
            with open(os.path.join(root, file), 'r', encoding='utf-8') as f:
                lines = f.readlines()
            # A block starts on the subject line just above each skos:prefLabel and runs
            # until the next block's subject line (or the end of the file)
            starts = [i - 1 for i, line in enumerate(lines) if 'skos:prefLabel' in line]
            ends = starts[1:] + [len(lines)]
            for start, end in zip(starts, ends):
                block = [x.strip() for x in lines[start:end] if x.strip() != '']
                # Keep only the lines before the Turtle terminator "."
                if '.' in block:
                    block = block[:block.index('.')]
                # Clean up and spell early/late/middle the way the LMU column does
                block = [x.replace(';', '').replace('@en', '').replace('early', 'Lower')
                         .replace('late', 'Upper').replace('middle', 'Middle').strip()
                         for x in block]
                # Every block has skos:prefLabel but not always utf:partOf; add a placeholder
                # so that results, prefLabel and partOf stay index-aligned
                if not any('utf:partOf' in x for x in block):
                    block.append('utf:partOf utf:NULL')
                results.append(block)
                prefLabel.append(next(x for x in block if 'prefLabel' in x))
                partOf.append(next(x for x in block if 'utf:partOf' in x))


def targetxlsx():
    """Load every column of the first sheet of test.xlsx into totalTagert."""
    workbook = xlrd.open_workbook('test.xlsx')
    worksheet = workbook.sheet_by_index(0)
    for col in range(worksheet.ncols):
        totalTagert.append(worksheet.col_values(col))


def targetMatch():
    """For every sample row, store the indices of the matching blocks in indexresults."""
    LMU, Period, Stage = totalTagert[10], totalTagert[11], totalTagert[12]
    labels = [p.lower() for p in prefLabel]
    parents = [p.lower() for p in partOf]
    for row in range(1, len(Stage)):                  # row 0 is the header
        lmu, period, stage = (str(c).strip() for c in (LMU[row], Period[row], Stage[row]))
        period_l, stage_l = period.lower(), stage.lower()
        if stage and period and lmu:
            # Case 1: Stage, Period and LMU all given -> Stage in the label, Period in partOf
            hits = [k for k in range(len(labels))
                    if stage_l in labels[k] and period_l in parents[k]]
        elif period and '-' not in lmu:
            # Case 2: Period with a spelled-out LMU such as "Upper" (or no LMU at all)
            hits = [k for k in range(len(labels))
                    if period_l in labels[k] and lmu.lower() in labels[k]]
        elif period:
            # Case 3: LMU is an abbreviated range: L-U, L-M or M-U
            if 'L' in lmu and 'U' in lmu:
                words = ('lower', 'upper')
            elif 'L' in lmu and 'M' in lmu:
                words = ('lower', 'middle')
            elif 'M' in lmu and 'U' in lmu:
                words = ('middle', 'upper')
            else:
                words = ()
            hits = [k for k in range(len(labels))
                    if period_l in labels[k] and any(w in labels[k] for w in words)]
            # A leading -2 tells writeToTarger to combine the start of one part with the end of another
            if hits:
                hits = [-2] + hits
        else:
            hits = []
        indexresults.append(hits or [-1])             # -1 means "no match"


def writeToTarger(path, writeresult):
    """Turn the matched blocks into five values per row and write them into the sheet."""
    for hits in indexresults:
        if hits == [-1]:
            writeresult.append(['NULL'])
        elif hits[0] == -2:
            # Sort the candidate blocks into Lower / Middle / Upper parts
            parts = {'lower': [], 'middle': [], 'upper': []}
            for k in hits[1:]:
                for word in parts:
                    if word in prefLabel[k].lower():
                        parts[word].append(results[k])
            L, M, U = parts['lower'], parts['middle'], parts['upper']
            if L and U:
                first, last = L, U
            elif L and M:
                first, last = L, M
            elif M and U:
                first, last = M, U
            else:
                first, last = [], []
            if first and last:
                # start and reference from the older part, end from the younger part,
                # each taken from the candidate with the smallest uncertainty
                s = pick_min(first, 'start_uncertainty')
                e = pick_min(last, 'end_uncertainty')
                writeresult.append([get_field(s, 'start'), get_field(s, 'start_uncertainty'),
                                    get_field(e, 'end'), get_field(e, 'end_uncertainty'),
                                    get_field(s, 'reference_system')])
            else:
                writeresult.append([''] * len(FIELDS))
        else:
            # One or more direct matches: keep the one with the smallest start uncertainty
            best = pick_min([results[k] for k in hits], 'start_uncertainty')
            writeresult.append([get_field(best, key) for key in FIELDS])

    # Write the values into column 19 onwards (right after SeeAlso), below the header row
    workbook = xlrd.open_workbook(path)
    new_workbook = copy(workbook)                    # xlrd workbook -> writable xlwt workbook
    new_worksheet = new_workbook.get_sheet(0)
    for i, row in enumerate(writeresult):
        for j, value in enumerate(row):
            new_worksheet.write(i + 1, j + 19, value)
    new_workbook.save(path)   # note: xlwt always saves in the legacy .xls format
    print('Results written to', path)


if __name__ == '__main__':
    results = []        # one cleaned block per interval
    prefLabel = []      # the skos:prefLabel line of each block
    partOf = []         # the utf:partOf line of each block (placeholder if absent)
    extract()
    totalTagert = []    # all columns of the competition sheet
    targetxlsx()
    indexresults = []   # matching block indices for each sample row
    targetMatch()
    writeresult = []
    writeToTarger('test.xlsx', writeresult)
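The matching logic above comes down to three cases plus a tie-break: among several candidates, keep the one with the smallest uncertainty. The sketch below restates those rules on plain dictionaries so they can be tried without any Excel files. The record layout (`label`, `parent`, `start`, `start_unc`, ...) and the function names are hypothetical. The substring matching mirrors the code in this post, not any official DDE rule, and the numbers in the usage example are made up:

```python
def smallest(records, key):
    """Return the record with the smallest value under `key` (the first one wins ties)."""
    return min(records, key=lambda r: r[key])


def match_row(records, lmu, period, stage):
    """Return (start, start_unc, end, end_unc, reference) for one sample row, or None."""
    period, stage = period.lower(), stage.lower()
    if stage and period and lmu:
        # Case 1: the Stage name must be in the label, the Period in the parent interval
        hits = [r for r in records if stage in r['label'] and period in r['parent']]
    elif period and '-' not in lmu:
        # Case 2: spelled-out LMU (e.g. "Upper") and Period both in the label
        hits = [r for r in records if period in r['label'] and lmu.lower() in r['label']]
    elif period:
        # Case 3: abbreviated range such as "L-U": start from the older part, end from the younger
        words = {'L': 'lower', 'M': 'middle', 'U': 'upper'}
        parts = [words[c] for c in 'LMU' if c in lmu]
        if len(parts) < 2:
            return None
        first = [r for r in records if period in r['label'] and parts[0] in r['label']]
        last = [r for r in records if period in r['label'] and parts[-1] in r['label']]
        if not first or not last:
            return None
        s, e = smallest(first, 'start_unc'), smallest(last, 'end_unc')
        return s['start'], s['start_unc'], e['end'], e['end_unc'], s['reference']
    else:
        return None
    if not hits:
        return None
    best = smallest(hits, 'start_unc')
    return best['start'], best['start_unc'], best['end'], best['end_unc'], best['reference']


# Toy records with made-up numbers, only to exercise the rules (not real timescale values)
records = [
    {'label': 'lower_examplian', 'parent': 'examplian', 'start': 30.0, 'start_unc': 2.0,
     'end': 20.0, 'end_unc': 1.0, 'reference': 'A'},
    {'label': 'lower_examplian', 'parent': 'examplian', 'start': 31.0, 'start_unc': 0.5,
     'end': 21.0, 'end_unc': 0.5, 'reference': 'B'},
    {'label': 'upper_examplian', 'parent': 'examplian', 'start': 20.0, 'start_unc': 1.0,
     'end': 10.0, 'end_unc': 0.2, 'reference': 'A'},
]
print(match_row(records, 'L-U', 'Examplian', ''))    # (31.0, 0.5, 10.0, 0.2, 'B')
print(match_row(records, 'Upper', 'Examplian', ''))  # (20.0, 1.0, 10.0, 0.2, 'A')
```

Note that the Stage and Period tests use substring matching, as in the original, so a short Period name can also match unrelated labels that happen to contain it. Exact matching on normalized names would be stricter.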
GitHub: https://github.com/zhichen-roger/Accurately-interpreting-the-geological-time-in-the-literature-by-using-the-geological-time-map.git