Instant Markup: Second Solution

_芝士就是力量_

Category: python

Article link: https://blog.csdn.net/muyuxiaozi_2013/article/details/20636385

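The first half is in the post "Instant Markup: Simple Solution": http://blog.csdn.net/muyuxiaozi_2013/article/details/20635991

Next, following the book, the code is reworked to make the program more extensible and more modular. As a program grows more complex, some abstraction is needed to keep it manageable. To start, here is a list of candidate modules.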

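Parser: an object that reads the text and manages the other classes.

Rules: one rule can be written for each kind of block. A rule detects whether it applies to a block and formats the block accordingly.

Filters: filters wrap the regular expressions that handle inline elements.

Handlers: the parser uses a handler to produce output. Each handler can produce a different kind of markup.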

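1. Handlers (handlers.py)

This module contains two main classes.

Handler: defines the naming convention for the per-tag callbacks. For example, when a block is identified as a paragraph, the methods start_paragraph and end_paragraph are called for it.

HTMLRenderer: implements the concrete output for each tag. For example, start_paragraph prints <p> and end_paragraph prints </p>.

The code is as follows: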

class Handler:
    """Dispatches start_/end_/sub_ calls to methods named after the block type."""

    def callback(self, prefix, name, *args):
        # Look up e.g. 'start_paragraph'; do nothing if no such method exists.
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)

    def start(self, name):
        self.callback('start_', name)

    def end(self, name):
        self.callback('end_', name)

    def sub(self, name):
        # Returns a replacement function that can be passed to re.sub.
        def substitution(match):
            result = self.callback('sub_', name, match)
            # No sub_<name> method: leave the matched text unchanged.
            if result is None:
                result = match.group(0)
            return result
        return substitution


class HTMLRenderer(Handler):
    """Renders each block type as HTML by printing the matching tags."""

    def start_document(self):
        print('<html><head><title>...</title></head><body>')
    def end_document(self):
        print('</body></html>')
    def start_paragraph(self):
        print('<p>')
    def end_paragraph(self):
        print('</p>')
    def start_heading(self):
        print('<h2>')
    def end_heading(self):
        print('</h2>')
    def start_list(self):
        print('<ul>')
    def end_list(self):
        print('</ul>')
    def start_listitem(self):
        print('<li>')
    def end_listitem(self):
        print('</li>')
    def start_title(self):
        print('<h1>')
    def end_title(self):
        print('</h1>')
    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)
    def sub_url(self, match):
        return '<a href="%s">%s</a>' % (match.group(1), match.group(1))
    def feed(self, data):
        print(data)
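
To make the prefix-based dispatch more concrete, here is a minimal, self-contained sketch. The class names TinyHandler and EmRenderer are hypothetical and exist only for this illustration, and the emphasis pattern (text wrapped in asterisks) is an assumption rather than something taken from the listing above.

```python
import re


class TinyHandler:
    """Cut-down dispatcher (hypothetical): finds methods by prefix + name."""

    def callback(self, prefix, name, *args):
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)

    def sub(self, name):
        def substitution(match):
            result = self.callback('sub_', name, match)
            # No handler method for this name: keep the matched text as-is.
            if result is None:
                result = match.group(0)
            return result
        return substitution


class EmRenderer(TinyHandler):
    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)


renderer = EmRenderer()
pattern = r'\*(.+?)\*'  # assumed emphasis syntax: *text*

# 'emphasis' has a sub_ method, 'unknown' does not.
marked = re.sub(pattern, renderer.sub('emphasis'), 'This *is* a test')
untouched = re.sub(pattern, renderer.sub('unknown'), 'This *is* a test')
print(marked)
print(untouched)
```

When no sub_ method exists for a name, the fallback returns the matched text unchanged, so an unknown filter name leaves the input intact.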

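2. Rules (rules.py)

The rules serve two purposes: first, to determine what kind of block has been passed in (heading, paragraph, list, and so on); second, to process the block according to its type. For example, a block that begins with a hyphen is a list item, so it is handled by the list item rule, which strips the leading hyphen.

The code is as follows. (I could not get the output of the book's list rule to lay out nicely, so I did not pursue the list layout any further.)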

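class Rule:
    """Base class: wrap the block in start/end calls for the rule's type."""

    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block)
        handler.end(self.type)
        return True  # True means the block needs no further rules


class HeadingRule(Rule):
    """A heading is a single line of at most 70 characters not ending in a colon."""

    type = 'heading'

    def condition(self, block):
        return '\n' not in block and len(block) <= 70 and not block[-1] == ':'


class TitleRule(HeadingRule):
    """The title is the first block of the document, if it is also a heading."""

    type = 'title'
    first = True

    def condition(self, block):
        if not self.first:
            return False
        self.first = False
        return HeadingRule.condition(self, block)


class ListItemRule(Rule):
    """A list item is a block that begins with a hyphen; the hyphen is stripped."""

    type = 'listitem'

    def condition(self, block):
        return block[0] == '-'

    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block[1:].strip())
        handler.end(self.type)
        return True


class ListRule(ListItemRule):
    """Opens a list before its first item and closes it after its last item."""

    type = 'list'
    inside = False

    def condition(self, block):
        return True

    def action(self, block, handler):
        if not self.inside and ListItemRule.condition(self, block):
            handler.start(self.type)
            self.inside = True
        elif self.inside and not ListItemRule.condition(self, block):
            # The first non-item block after a run of items closes the list.
            handler.end(self.type)
            self.inside = False
        return False  # always let the remaining rules handle the block


class ParagraphRule(Rule):
    """Fallback rule: any block not handled earlier is a paragraph."""

    type = 'paragraph'

    def condition(self, block):
        return True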
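
As a rough illustration of how a rule's condition and action work together, the sketch below runs a list item rule against a handler that records calls instead of printing. RecordingHandler and is_heading are hypothetical helpers introduced only for this example; is_heading repeats the heading test as a plain function so the block stays self-contained.

```python
class RecordingHandler:
    """Hypothetical stand-in for a handler: records calls instead of printing."""

    def __init__(self):
        self.events = []

    def start(self, name):
        self.events.append('start_' + name)

    def end(self, name):
        self.events.append('end_' + name)

    def feed(self, data):
        self.events.append(data)


class ListItemRule:
    type = 'listitem'

    def condition(self, block):
        return block[0] == '-'

    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block[1:].strip())  # drop the leading hyphen
        handler.end(self.type)
        return True


def is_heading(block):
    # Same test as the heading rule: one line, short, no trailing colon.
    return '\n' not in block and len(block) <= 70 and not block[-1] == ':'


handler = RecordingHandler()
rule = ListItemRule()
block = '- first item'
if rule.condition(block):
    rule.action(block, handler)

print(handler.events)
print(is_heading('A short heading'), is_heading('Items follow:'))
```

Recording the calls rather than printing them makes it easy to check which handler methods a rule triggers and in what order.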

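3. Main program (makeup.py)

This part contains the two most important pieces: the parser and the filters. The parser reads the text file, applies the rules, and drives the handler. In this example the filters handle inline elements: URLs, email addresses, and emphasized text.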
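
The parser and filters described in this section can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's makeup.py: the names CollectingHandler, Parser, add_rule, add_filter, and parse are hypothetical, the emphasis pattern is assumed, and parse takes an iterable of already-split blocks instead of reading a file.

```python
import re


class CollectingHandler:
    """Hypothetical handler that stores output in a list instead of printing."""

    def __init__(self):
        self.out = []

    def callback(self, prefix, name, *args):
        method = getattr(self, prefix + name, None)
        if callable(method):
            return method(*args)

    def start(self, name):
        self.callback('start_', name)

    def end(self, name):
        self.callback('end_', name)

    def sub(self, name):
        def substitution(match):
            result = self.callback('sub_', name, match)
            return match.group(0) if result is None else result
        return substitution

    def feed(self, data):
        self.out.append(data)

    def start_paragraph(self):
        self.out.append('<p>')

    def end_paragraph(self):
        self.out.append('</p>')

    def sub_emphasis(self, match):
        return '<em>%s</em>' % match.group(1)


class ParagraphRule:
    type = 'paragraph'

    def condition(self, block):
        return True

    def action(self, block, handler):
        handler.start(self.type)
        handler.feed(block)
        handler.end(self.type)
        return True


class Parser:
    """Applies filters to each block, then tries the rules in order."""

    def __init__(self, handler):
        self.handler = handler
        self.rules = []
        self.filters = []

    def add_rule(self, rule):
        self.rules.append(rule)

    def add_filter(self, pattern, name):
        def apply_filter(block, handler):
            return re.sub(pattern, handler.sub(name), block)
        self.filters.append(apply_filter)

    def parse(self, blocks):
        self.handler.start('document')
        for block in blocks:
            for apply_filter in self.filters:
                block = apply_filter(block, self.handler)
            for rule in self.rules:
                if rule.condition(block):
                    # A rule returning True means the block is fully handled.
                    if rule.action(block, self.handler):
                        break
        self.handler.end('document')


handler = CollectingHandler()
parser = Parser(handler)
parser.add_rule(ParagraphRule())
parser.add_filter(r'\*(.+?)\*', 'emphasis')  # assumed emphasis syntax: *text*
parser.parse(['Hello *world*'])
print(handler.out)
```

The order in which rules are added matters in a design like this: the first rule whose action returns True ends processing for that block, so a catch-all paragraph rule would be added last.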


4. util.py, written earlier

This module reads the text and splits it into blocks. The code is as follows:

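def lines(file):
    # Yield every line, then one extra blank line so the last block is flushed.
    for line in file:
        yield line
    yield '\n'


def blocks(file):
    # Group consecutive non-blank lines into one block; blank lines separate blocks.
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []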
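
A short usage sketch shows what the block splitter produces. The sample text is made up for the illustration, and io.StringIO stands in for an open file; the two functions are repeated so the example runs on its own.

```python
import io


def lines(file):
    for line in file:
        yield line
    yield '\n'  # sentinel blank line that flushes the final block


def blocks(file):
    block = []
    for line in lines(file):
        if line.strip():
            block.append(line)
        elif block:
            yield ''.join(block).strip()
            block = []


# Made-up sample: three blocks separated by blank lines.
sample = io.StringIO('Title\n\nfirst line\nsecond line\n\n- item\n')
result = list(blocks(sample))
print(result)
```

Lines inside a block stay joined by their original newline, which is why the heading rule can use the absence of a newline as a sign that a block is a single line.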
