Instant Markup in Python

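This post shows how to use Python to split a text file into paragraphs: lines are collected until a blank line is reached, and the collected lines form one text block. It covers a text block generator in util.py, and a handler.py that may contain a parser, rules, filters, and handlers. The sample text presents the company history, products, and contact details of World Wide Spam, Inc.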

A plain-text file, test.txt:

Welcome to World Wide Spam, Inc.


These are the corporate web pages of *World Wide Spam*, Inc. We hope
you find your stay enjoyable, and that you will sample many of our
products.

A short history of the company

World Wide Spam was started in the summer of 2000. The business
concept was to ride the dot-com wave and to make money both through
bulk email and by selling canned meat online.

After receiving several complaints from customers who weren't
satisfied by their bulk email, World Wide Spam altered their profile,
and focused 100% on canned goods. Today, they rank as the world's
13,892nd online supplier of SPAM.

Destinations

From this page you may visit several of our interesting web pages:

  - What is SPAM? (http://wwspam.fu/whatisspam)

  - How do they make it? (http://wwspam.fu/howtomakeit)

  - Why should I eat it? (http://wwspam.fu/whyeatit)

How to get in touch with us

You can get in touch with us in *many* ways: By phone (555-1234), by
email (wwspam@wwspam.fu) or by visiting our customer feedback page
(http://wwspam.fu/feedback).
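
First implementation

First, the text needs to be split into paragraphs.

A simple way to find a block is to collect every line you encounter until you reach a blank line, and then return the lines collected so far. Those returned lines make up one block.

The text block generator, util.py: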

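def lines(file):
    # Yield every line, then one extra blank line so the last block is flushed
    for line in file:
        yield line
    yield '\n'

def blocks(file):
    # Collect non-blank lines; a blank line ends the current block
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            # Join the collected lines and trim surrounding whitespace
            yield ''.join(block).strip()
            block = []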
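
To see what the generator yields, here is a small sketch that feeds it an in-memory file instead of test.txt. The sample string and the use of io.StringIO are assumptions made for illustration; the two functions are repeated from util.py only so the snippet runs on its own.

```python
import io

def lines(file):
    # Same as util.py: yield each line, then a final blank line
    for line in file:
        yield line
    yield '\n'

def blocks(file):
    # Same as util.py: group consecutive non-blank lines into one block
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []

# Hypothetical input: a title, two blank lines, a two-line paragraph,
# and a last paragraph with no trailing newline
sample = io.StringIO(
    "Title line\n"
    "\n"
    "\n"
    "First paragraph,\n"
    "second line.\n"
    "\n"
    "Last paragraph without trailing newline"
)

result = list(blocks(sample))
for b in result:
    print(repr(b))
```

Consecutive blank lines do not produce empty blocks, because the elif branch only fires when something has been collected. The extra '\n' appended by lines() is what lets the last paragraph come out even when the file does not end with a blank line.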

Adding some HTML markup, simple.py:

import sys, re
from util import blocks

print('<html><head><title>...</title><body>')

# The first block is treated as the page title
title = True
for block in blocks(sys.stdin):
    # Turn *emphasized* text into <em>emphasized</em>
    block = re.sub(r'\*(.+?)\*', r'<em>\1</em>', block)
    if title:
        print('<h1>')
        print(block)
        print('</h1>')
        title = False
    else:
        print('<p>')
        print(block)
        print('</p>')

print('</body></html>')
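
The only real markup logic in simple.py is the re.sub call, so it may help to look at it in isolation. The sketch below wraps the same pattern in a helper named emphasize (a hypothetical name, not part of the original script) and compares it with a greedy variant to show why the `?` matters.

```python
import re

# Same pattern as in simple.py: text between two asterisks, non-greedy
EMPHASIS = re.compile(r'\*(.+?)\*')

def emphasize(text):
    # Hypothetical helper: replace *text* with <em>text</em>
    return EMPHASIS.sub(r'<em>\1</em>', text)

line = "You can get in touch with us in *many* ways"
print(emphasize(line))

# With two emphasized words on one line, the non-greedy pattern
# handles each pair of asterisks separately
two = "*one* and *two*"
non_greedy = emphasize(two)

# A greedy pattern (no '?') would instead span from the first
# asterisk to the last one
greedy = re.sub(r'\*(.+)\*', r'<em>\1</em>', two)

print(non_greedy)
print(greedy)
```

Because simple.py applies the substitution to a whole block rather than line by line, the non-greedy form is what keeps several emphasized phrases in the same paragraph from being merged into one.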

To run it from the command prompt, enter: python simple.py < test.txt > test.html

The resulting test.html begins:


<html><head><title>...</title><body>
<h1>
Welcome to World Wide Spam, Inc.
</h1>
<p>
These are the corporate web pages of <em>World Wide Spam</em>, Inc. We hope
you find your stay enjoyable, and that you will sample many of our
products.
</p>
<p>
A short history of the company
</p>
<p>
World Wide Spam was started in the summer of 2000. The business
concept was to ride the dot-com wave and to make money both through
bulk email and by selling canned meat online.
</p>
...