Python: how to extract records from a natural-language file when the only delimiter starts 5 characters into the record...

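I need to extract individual records from a log file generated by a rather old system and get them ready for database import. These flat files are all I can pull out of it (just getting the query formatted took weeks). Below is an example of a file with two records. The only delimiter I can see is "/11 S11-", which always sits at a regular position 5 characters into the record, not at its beginning or end.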

For those watching: yes, this is related to my other newb question. I have looked at the Python docs, some Google results, and some related questions. So, my questions are:

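a) How do I use a delimiter that starts 5 characters into the record?

b) How do I grab these big chunks of natural language?

c) How do I get rid of the whitespace after the newlines? This is probably the easiest part: I can specify in the query how long each field is. Right now accessionDate is 10 characters long, accessionNumber is 10 characters long, and patMedicalRecordNum is 15 characters long, so the leading whitespace on the continuation lines of finalDxText is 35 characters.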

01/01/11 S11-55555 20/444-55-6666 A. PROSTATE AND SEMINAL VESICLES, PROSTATECTOMY:

- ADENOCARCINOMA.

TOTAL GLEASON SCORE: GLEASON 5+4=9

TUMOR LOCATION: BILATERAL

TUMOR QUANTITATION: 15% OF PROSTATE INVOLVED BY TUMOR

EXTRAPROSTATIC EXTENSION: PRESENT AT RIGHT POSTERIOR

SEMINAL VESICLE INVASION: PRESENT

MARGINS: UNINVOLVED

LYMPHOVASCULAR INVASION: PRESENT

PERINEURAL INVASION: PRESENT

LYMPH NODES (SPECIMENS B AND C):

NUMBER EXAMINED: 25

NUMBER INVOLVED: 1

DIAMETER OF LARGEST METASTASIS: 1.7 mm

ADDITIONAL FINDINGS: HIGH-GRADE PROSTATIC INTRAEPITHELIAL NEOPLASIA,

ACUTE AND CHRONIC INFLAMMATION, INTRADUCTAL EXTENSION OF INVASIVE

CARCINOMA

PATHOLOGIC STAGE: pT3b N1 MX

B. LYMPH NODES, RIGHT PELVIC, EXCISION:

- ONE OF SEVENTEEN LYMPH NODES POSITIVE FOR METASTASIS (1/17).

C. LYMPH NODES, LEFT PELVIC, EXCISION:

- EIGHT LYMPH NODES NEGATIVE FOR METASTASIS (0/8).

01/02/11 S11-4444 20/111-22-3333 PROSTATE AND SEMINAL VESICLES, PROSTATECTOMY:

- ADENOCARCINOMA.

GLEASON SCORE: 3 + 3 = 6 WITH TERTIARY PATTERN OF 5.

TUMOR QUANTITATION: APPROXIMATELY 10% BY VOLUME.

TUMOR LOCATION: BILATERAL.

EXTRAPROSTATIC EXTENSION: NOT IDENTIFIED.

MARGINS: NEGATIVE.

PERINEURAL INVASION: IDENTIFIED.

LYMPH-VASCULAR INVASION: NOT IDENTIFIED.

SEMINAL VESICLE/VASA DEFERENTIA INVASION: NOT IDENTIFIED.

LYMPH NODES: NONE SUBMITTED.

OTHER: HIGH GRADE PROSTATIC INTRAEPITHELIAL NEOPLASIA.

PATHOLOGIC STAGE (pTNM): pT2c NX.
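
One way to approach (a) through (c) together, sketched here rather than offered as a definitive answer: instead of splitting on a delimiter that sits 5 characters in, treat any line that begins with a date followed by an accession number as the start of a new record, and attach every other non-blank line to the record currently being built. The pattern below assumes that dates always look like MM/DD/YY, that accession numbers look like S11-55555, and that patMedicalRecordNum contains no spaces; these are guesses based only on the two sample records. The names `HEADER` and `parse_records` are made up for this illustration.

```python
import re

# Assumed header shape (inferred from the two sample records only):
# date, accession number, medical record number, then the first text line.
HEADER = re.compile(
    r"^(?P<accessionDate>\d{2}/\d{2}/\d{2})\s+"
    r"(?P<accessionNumber>S\d{2}-\d+)\s+"
    r"(?P<patMedicalRecordNum>\S+)\s*"
    r"(?P<text>.*)$"
)


def parse_records(lines):
    """Yield one dict per record; finalDxText is a list of text lines."""
    record = None
    for raw in lines:
        # strip() removes the newline, trailing padding, and the
        # 35-character indent on continuation lines in one step.
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between text lines
        match = HEADER.match(line)
        if match:
            # A header line closes the previous record and opens a new one.
            if record is not None:
                yield record
            record = match.groupdict()
            record["finalDxText"] = [record.pop("text")]
        elif record is not None:
            # Anything else is free text belonging to the current record.
            record["finalDxText"].append(line)
    if record is not None:
        yield record
```

Passing an open file object works because the function only iterates over lines, for example `list(parse_records(open("reports.txt")))`, where `reports.txt` is a placeholder file name. Joining `finalDxText` with "\n" gives a single string for a database column. One caveat: a free-text line that happens to start with a date and something shaped like an accession number would be mistaken for a new record, so the pattern may need tightening against the real files.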
