Python Project: Instant Markup

笑着的程序员

Article link: https://blog.csdn.net/weixin_45541762/article/details/113532051

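I have been learning to program for quite a while now. I want to improve my programming skills quickly and turn the knowledge I have accumulated into useful output, working hard to land a satisfying job in the coming year, 2021!
The most effective approach is of course to build small projects: they let me quickly integrate and consolidate what I have learned, help me grow fast, and give me some material to show for my next job. So it's time to start preparing. Today I'm starting with a small Python project, and more Python and C++ projects will follow. Right here and now, I hope I can grow quickly, do my best, find what I love, and be with the person I love: that is happiness!
The program reads a plain-text file, splits it into blocks separated by blank lines, and prints each block as HTML. Here is the code first; I will add more notes of my own later: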

import re
import sys  # only needed for the commented-out stdin variant at the end


def lines(file):
    """Yield every line of the file, then one extra blank line so the last block is flushed."""
    for line in file:
        yield line
    yield '\n'


def blocks(file):
    """Collect consecutive non-blank lines into blocks; a blank line ends a block."""
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []


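class Handler:
    """
    Base class for handlers. Calls such as start('paragraph') or
    sub('emphasis') are dispatched to methods named start_paragraph,
    sub_emphasis, etc., if the subclass defines them.
    """

    def callback(self, prefix, name, *args):
        # Build the method name (e.g. 'start_' + 'paragraph') and call it if it exists
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)

    def start(self, name):
        self.callback('start_', name)

    def end(self, name):
        self.callback('end_', name)

    def sub(self, name):
        # Return a function suitable as the replacement argument of re.sub()
        def substitution(match):
            result = self.callback('sub_', name, match)
            if result is None:
                # No sub_<name> method: keep the matched text unchanged
                result = match.group(0)
            return result

        return substitution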


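class HTMLRenderer(Handler):
    """
    A handler that prints HTML: each start_/end_ method emits an opening
    or closing tag, and each sub_ method returns the replacement text
    for one kind of inline markup.
    """
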
    def start_document(self):
        print('<html><head><title>...</title></head><body>')

    def end_document(self):
        print('</body></html>')

    def start_paragraph(self):
        print('<p>')

    def end_paragraph(self):
        print('</p>')

    def start_heading(self):
        print('<h2>')

    def end_heading(self):
        print('</h2>')

    def start_list(self):
        print('<ul>')

    def end_list(self):
        print('</ul>')

    def start_listitem(self):
        print('<li>')

    def end_listitem(self):
        print('</li>')

    def start_title(self):
        print('<h1>')

    def end_title(self):
        print('</h1>')

    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)

    def sub_url(self, match):
        return '<a href="%s">%s</a>' % (match.group(1), match.group(1))

    def sub_mail(self, match):
        return '<a href="mailto:%s">%s</a>' % (match.group(1), match.group(1))

    def feed(self, data):
        print(data)


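class Rule:
    """
    Base class for rules. The default action wraps the block in start/end
    events named after the rule's type and returns True, which tells the
    parser to stop trying further rules for this block.
    """

    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block)
        handler.end(self.type)
        return True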


class HeadingRule(Rule):
    """A heading is a single line of at most 70 characters that does not end in a colon."""
    type = 'heading'

    def condition(self, block):
        return '\n' not in block and len(block) <= 70 and block[-1] != ':'


class TitleRule(HeadingRule):
    """The title is the first block of the document, provided it looks like a heading."""
    type = 'title'
    first = True

    def condition(self, block):
        if not self.first:
            return False
        self.first = False
        return HeadingRule.condition(self, block)


class ListItemRule(Rule):
    """A list item is a block starting with a hyphen; the hyphen is stripped off."""
    type = 'listitem'

    def condition(self, block):
        return block[0] == '-'

    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block[1:].strip())
        handler.end(self.type)
        return True


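class ListRule(ListItemRule):
    """
    Opens a list before the first list item and closes it at the first
    block that is not a list item. Its action returns False, so the block
    is still passed on to the remaining rules (ListItemRule, ParagraphRule, ...).
    """
    type = 'list'
    inside = False

    def condition(self, block):
        return True

    def action(self, block, handler):
        if not self.inside and ListItemRule.condition(self, block):
            handler.start(self.type)
            self.inside = True
        elif self.inside and not ListItemRule.condition(self, block):
            handler.end(self.type)
            self.inside = False
        return False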


class ParagraphRule(Rule):
    """Fallback rule: any block not handled by an earlier rule is a paragraph."""
    type = 'paragraph'

    def condition(self, block):
        return True


class Parser:
    """
    A parser reads a text file, applying rules and controlling a handler.
    """

    def __init__(self, handler):
        self.handler = handler
        self.rules = []
        self.filters = []

    def addRule(self, rule):
        self.rules.append(rule)

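    def addFilter(self, pattern, name):
        # A filter rewrites inline markup in a block with re.sub(),
        # using the handler's sub_<name> method to build the replacement
        def filter(block, handler):
            return re.sub(pattern, handler.sub(name), block)

        self.filters.append(filter)

    def parse(self, file):
        self.handler.start('document')
        for block in blocks(file):
            # First apply all inline filters (emphasis, URLs, e-mail) ...
            for filter in self.filters:
                block = filter(block, self.handler)
            # ... then try the rules in order until an action returns True
            for rule in self.rules:
                if rule.condition(block):
                    last = rule.action(block, self.handler)
                    if last:
                        break
        self.handler.end('document')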


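class BasicTextParser(Parser):
    """A parser preconfigured with the rules and filters used in this project."""

    def __init__(self, handler):
        Parser.__init__(self, handler)
        # Order matters: ListRule must run before ListItemRule so it can
        # emit <ul>/</ul>, and ParagraphRule is the catch-all at the end
        self.addRule(ListRule())
        self.addRule(ListItemRule())
        self.addRule(TitleRule())
        self.addRule(HeadingRule())
        self.addRule(ParagraphRule())

        # Inline markup: *emphasis*, http:// URLs and e-mail addresses
        self.addFilter(r'\*(.+?)\*', 'emphasis')
        self.addFilter(r'(http://[\.a-zA-Z/]+)', 'url')
        self.addFilter(r'([\.a-zA-Z]+@[\.a-zA-Z]+[a-zA-Z]+)', 'mail')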


handler = HTMLRenderer()
parser = BasicTextParser(handler)

file = 'test.txt'
with open(file) as f:  # open() returns a file object; 'with' closes it automatically
    parser.parse(f)

# Alternatively, read the text from standard input:
# handler = HTMLRenderer()
# parser = BasicTextParser(handler)
#
# parser.parse(sys.stdin)
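The heart of the Handler class is name-based dispatch: callback glues a prefix and a name together and looks the method up with getattr, so a renderer only has to define methods such as start_paragraph or sub_emphasis. The function returned by sub plugs directly into re.sub, which accepts a callable as its replacement argument. The standalone sketch below isolates that mechanism using only the standard library; the class name MiniHandler and the sample sentence are hypothetical and not part of the original project.

```python
import re


class MiniHandler:
    """Hypothetical stripped-down handler showing getattr-based dispatch."""

    def callback(self, prefix, name, *args):
        # Look up e.g. 'sub_emphasis'; a missing method just yields None
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)

    def sub(self, name):
        # Build a replacement function that re.sub calls once per match
        def substitution(match):
            result = self.callback('sub_', name, match)
            if result is None:
                # No sub_<name> method: leave the matched text unchanged
                result = match.group(0)
            return result
        return substitution

    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)


handler = MiniHandler()
text = 'This is *important* and *urgent*.'
print(re.sub(r'\*(.+?)\*', handler.sub('emphasis'), text))
# prints: This is <em>important</em> and <em>urgent</em>.
print(re.sub(r'\*(.+?)\*', handler.sub('strong'), text))
# prints the text unchanged, because MiniHandler has no sub_strong method
```

The fallback to match.group(0) matters: without it, a filter whose handler lacks the matching sub_ method would replace each match with nothing instead of leaving it alone.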

The input file test.txt contains the following text:

World Wide Spam was started in the summer of 2000. The business 
concept was to ride the dot-com wave and make money both through
bulk email and by selling canned meat online.


After receiving several complaints from customers who weren't 
satisfied by their bulk email.


From this page you may visit several of our interesting web pages:


    -what is SPAM? (http://wwspam.fu/whatisspam)
    
    -How do they make it? (http://wwspam.fu/howtomakeit)
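To see how this input is cut into blocks before any rule runs, the sketch below reuses lines and blocks unchanged and feeds them an excerpt of the sample text from an in-memory io.StringIO instead of test.txt (the excerpt and the variable name sample_text are for illustration only).

```python
import io


def lines(file):
    # Same as in the project: yield each line, then one extra blank line
    for line in file:
        yield line
    yield '\n'


def blocks(file):
    # Same as in the project: join runs of non-blank lines into one block
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []


# An excerpt of the sample text, held in memory instead of test.txt
sample_text = (
    "From this page you may visit several of our interesting web pages:\n"
    "\n"
    "    -what is SPAM? (http://wwspam.fu/whatisspam)\n"
    "    \n"
    "    -How do they make it? (http://wwspam.fu/howtomakeit)\n"
)

for block in blocks(io.StringIO(sample_text)):
    print(repr(block))
# 'From this page you may visit several of our interesting web pages:'
# '-what is SPAM? (http://wwspam.fu/whatisspam)'
# '-How do they make it? (http://wwspam.fu/howtomakeit)'
```

The whitespace-only line between the two items still separates them, because line.strip() is empty, and strip() also removes the indentation, so each item starts with '-' and matches ListItemRule. Note that the list items are the last blocks in this input: ListRule only closes a list when it sees a following non-item block, so with this test.txt the output appears to go straight from the last </li> to </body></html> without a closing </ul>. A possible remedy, not part of the original code, is to have Parser.parse close any list that is still open before calling handler.end('document').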
