Fancy paragraph formatting with Python

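Hi everyone, I'm 阿闽 (Amin). Today's post continues my Python standard library study series.
Lately I've realized there's no need to pore over the source code when studying the standard library, because as a beginner I find it hard to follow. From now on, my main goal is to get familiar with the interfaces, with reading the source as a supporting tool.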

The fill function

Function signature

textwrap.fill(
	text,
 	width=70, *,
	initial_indent='', 
	subsequent_indent='',
	expand_tabs=True,
	replace_whitespace=True,
	fix_sentence_endings=False, 
	break_long_words=True,
	drop_whitespace=True,
	break_on_hyphens=True, 
	tabsize=8, 
	max_lines=None
	)

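We can see that it has 1 required parameter, text, and 11 optional parameters.
Let's start with the official documentation:

Wraps the single paragraph in text, and returns a single string containing the wrapped paragraph.

So at the very least we know that both its input and its output are strings.
Let's jump straight into some code: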

import textwrap

samp = '''If you're referring to this property from JavaScript as 
a member of the HTMLElement.style object, modern browsers support float, 
but in older browsers you have to spell it as cssFloat, with Internet 
Explorer versions 8 and older using styleFloat. This was an exception to the rule, 
that the name of the DOM member is the camel-case name of the dash-separated CSS name 
(because "float" is a reserved word in JavaScript, as seen in the need to escape "class" as "className"
 and escape <label>'s "for" as "htmlFor").'''

wrapped_samp = textwrap.fill(samp)
print(wrapped_samp)

Output:

If you're referring to this property from JavaScript as  a member of
the HTMLElement.style object, modern browsers support float,  but in
older browsers you have to spell it as cssFloat, with Internet
Explorer versions 8 and older using styleFloat. This was an exception
to the rule,  that the name of the DOM member is the camel-case name
of the dash-separated CSS name  (because "float" is a reserved word in
JavaScript, as seen in the need to escape "class" as "className"  and
escape <label>'s "for" as "htmlFor").

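Notice that the returned string breaks lines in places where the original had none, and that each original line break has been replaced by a space.
The new line breaks are the work of the width parameter, which defaults to 70.

The width parameter

Type: positive integer. (Please don't pass a non-positive integer, thanks; a width of 0 or less raises ValueError.)
width is the maximum length of each output line. The text is broken at whitespace (and, by default, right after hyphens) so that no line is longer than width characters; it does not simply insert a newline after every width characters. A word is cut in the middle only when it is longer than width on its own (see break_long_words below).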
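Here is a quick sketch of what width means in practice. It uses wrap(), which returns the lines as a list and is covered later, so the line lengths are easy to check. The sample sentence is my own:

```python
import textwrap

text = "The quick brown fox jumps over the lazy dog"

# Break at whitespace so that no line is longer than 10 characters.
lines = textwrap.wrap(text, width=10)
print(lines)  # ['The quick', 'brown fox', 'jumps over', 'the lazy', 'dog']

# No word here is longer than 10, so nothing is cut mid-word.
print(all(len(line) <= 10 for line in lines))  # True

# A non-positive width is rejected with ValueError.
try:
    textwrap.wrap(text, width=0)
except ValueError as exc:
    print("ValueError:", exc)
```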

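Whitespace replacement

Meanwhile, each original newline has been replaced by a space. For example, the newline after "as" at the end of the first line of samp shows up as a space in the output. This is another effect of fill(), controlled by the replace_whitespace parameter: every whitespace character is replaced by a single space. Runs of whitespace are not collapsed into one, though. The first line of samp ends with "as " (with a trailing space) followed by a newline, which is why the output reads "as  a member" with two spaces.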
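The following is a minimal check of that point on a made-up string. fill() turns the newline into a space but keeps the existing space as well. If you really want runs of whitespace collapsed, you have to do it yourself, for example with " ".join(text.split()), which is the same trick shorten() uses internally:

```python
import textwrap

text = "as \na member"  # a trailing space, then a newline

# replace_whitespace turns the newline into one space; the existing space stays.
print(repr(textwrap.fill(text)))  # 'as  a member'

# Collapsing runs of whitespace has to be done explicitly before wrapping.
collapsed = " ".join(text.split())
print(repr(textwrap.fill(collapsed)))  # 'as a member'
```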

Next, let's study the keyword-only parameters defined after the "*".

The initial_indent parameter

Type: string.

This string is prepended to the first line of the output, and its length counts toward width.

Code example:

import textwrap
samp = '''If you're referring to this property from JavaScript as 
a member of the HTMLElement.style object, modern browsers support float, 
but in older browsers you have to spell it as cssFloat, with Internet 
Explorer versions 8 and older using styleFloat. This was an exception to the rule, 
that the name of the DOM member is the camel-case name of the dash-separated CSS name 
(because "float" is a reserved word in JavaScript, as seen in the need to escape "class" as "className"
 and escape <label>'s "for" as "htmlFor").'''
print(textwrap.fill(samp, initial_indent=" " * 4))

Here we try adding four spaces to the start of the first line:

    If you're referring to this property from JavaScript as  a member
of the HTMLElement.style object, modern browsers support float,  but
in older browsers you have to spell it as cssFloat, with Internet
Explorer versions 8 and older using styleFloat. This was an exception
to the rule,  that the name of the DOM member is the camel-case name
of the dash-separated CSS name  (because "float" is a reserved word in
JavaScript, as seen in the need to escape "class" as "className"  and
escape <label>'s "for" as "htmlFor").

As you can see, the first line now starts with four spaces.
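Because the indent counts toward width, it leaves less room for words on the first line. The following is a small sketch with made-up text that makes this visible:

```python
import textwrap

text = "aaa bbb ccc"

# Without an indent, two words fit on the first 8-character line.
print(textwrap.wrap(text, width=8))                       # ['aaa bbb', 'ccc']

# The 2-character indent eats into the first line's budget.
print(textwrap.wrap(text, width=8, initial_indent="> "))  # ['> aaa', 'bbb ccc']
```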

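The subsequent_indent parameter

Type: string.

This string is prepended to every line of the output except the first, and its length counts toward width on each of those lines.

Code example: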

import textwrap

samp = '''If you're referring to this property from JavaScript as 
a member of the HTMLElement.style object, modern browsers support float, 
but in older browsers you have to spell it as cssFloat, with Internet 
Explorer versions 8 and older using styleFloat. This was an exception to the rule, 
that the name of the DOM member is the camel-case name of the dash-separated CSS name 
(because "float" is a reserved word in JavaScript, as seen in the need to escape "class" as "className"
 and escape <label>'s "for" as "htmlFor").'''

wrapped_samp = textwrap.fill(samp, subsequent_indent='>>>')
print(wrapped_samp)

Output:

If you're referring to this property from JavaScript as  a member of
>>>the HTMLElement.style object, modern browsers support float,  but
>>>in older browsers you have to spell it as cssFloat, with Internet
>>>Explorer versions 8 and older using styleFloat. This was an
>>>exception to the rule,  that the name of the DOM member is the
>>>camel-case name of the dash-separated CSS name  (because "float" is
>>>a reserved word in JavaScript, as seen in the need to escape
>>>"class" as "className"  and escape <label>'s "for" as "htmlFor").
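One handy use of subsequent_indent is a hanging indent for list items, where the continuation lines line up under the item text. The following is a small sketch of my own (the bullet text is made up):

```python
import textwrap

# A bulleted item whose continuation lines line up under the text.
item = "* first item is long"
lines = textwrap.wrap(item, width=10, subsequent_indent="  ")
print("\n".join(lines))
# * first
#   item is
#   long
```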

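The expand_tabs parameter

Type: bool
Default: True

I couldn't find a description of this parameter in the documentation for fill() itself (the official docs describe it among the attributes of the TextWrapper class), but reading the source told me what it means.

The standard library has a TextWrapper class. fill() builds a TextWrapper from all the arguments you pass it (width included) and then calls that object's fill() method.

expand_tabs is one of that class's attributes and controls one step in how the input text is processed.

The TextWrapper docstring for this attribute:

Expand tabs in input text to spaces before further processing. Each tab will become 0 .. 'tabsize' spaces, depending on its position in its line. If false, each tab is treated as a single character.

tabsize is also one of fill()'s parameters.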

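The tabsize parameter

Type: integer
Default: 8
When expand_tabs is true, tabs in the original string are expanded into spaces with tab stops every tabsize columns, so each tab becomes however many spaces are needed to reach the next tab stop.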
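The following is a small check of how tabsize plays out on a made-up string. The expansion matches what str.expandtabs() does with the same tab size:

```python
import textwrap

text = "a\tb"

# Tab stops every 4 columns: 'a' sits in column 0, so the tab becomes 3 spaces.
print(repr(textwrap.fill(text, tabsize=4)))  # 'a   b'

# Default tabsize=8: the tab becomes 7 spaces.
print(repr(textwrap.fill(text)))             # 'a       b'

# Same result as expanding the tabs with str.expandtabs().
print(text.expandtabs(4) == "a   b")         # True
```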

The replace_whitespace parameter

Type: bool
Default: True

Again, the fill() documentation doesn't describe this parameter (it is listed among the TextWrapper attributes), so I checked the source once more.

The TextWrapper docstring for this attribute:

Replace all whitespace characters in the input text by spaces after tab expansion. Note that if expand_tabs is false and replace_whitespace is true, every tab will be converted to a single space!

In other words, if the text contains tabs and you set expand_tabs to False while replace_whitespace is True, each tab is turned into a single space, which is usually not what you want.

import textwrap

samp = "\t哈   哈"
print(samp)
print(textwrap.fill(samp, expand_tabs=True, replace_whitespace=True))
print(textwrap.fill(samp, expand_tabs=False, replace_whitespace=True))
print(textwrap.fill(samp, expand_tabs=True, replace_whitespace=False))
print(textwrap.fill(samp, expand_tabs=False, replace_whitespace=False))

Output:

	哈   哈
        哈   哈
 哈   哈
        哈   哈
	哈   哈

As you can see, in the third line of output (from the second fill() call) the tab has been turned into a single space.

The fix_sentence_endings parameter

Type: bool
Default: False

Docstring from the source:

Ensure that sentence-ending punctuation is always followed by two spaces. Off by default because the algorithm is (unavoidably) imperfect.

That seems understandable to me: for a project used all over the world, an algorithm like this can hardly work for most of the world's languages. Let's test the switch on some English text.

import textwrap

samp = "Ensure that sentence-ending punctuation is always followed by two spaces. Off by default because the algorithm is (unavoidably) imperfect."
print(samp)
print(textwrap.fill(samp))
print(textwrap.fill(samp, fix_sentence_endings=True))

Output:

Ensure that sentence-ending punctuation is always followed by two spaces. Off by default because the algorithm is (unavoidably) imperfect.
Ensure that sentence-ending punctuation is always followed by two
spaces. Off by default because the algorithm is (unavoidably)
imperfect.
Ensure that sentence-ending punctuation is always followed by two
spaces.  Off by default because the algorithm is (unavoidably)
imperfect.

In the last output, the period that ends the first sentence is now followed by two spaces, even though the original text had only one.
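Why is the algorithm "unavoidably imperfect"? As far as I can tell from the source, it treats a lowercase letter followed by ".", "!" or "?" (optionally followed by a quote) and then a single space as a sentence end, so abbreviations get caught too. The following is a made-up example:

```python
import textwrap

text = "I met Mr. Smith. He waved."

# The period after the abbreviation "Mr." is also taken as a sentence end.
print(textwrap.fill(text, fix_sentence_endings=True))
# I met Mr.  Smith.  He waved.
```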

The break_long_words parameter

Type: bool
Default: True

Docstring from the source:

Break words longer than 'width'. If false, those words will not be broken, and some lines might be longer than 'width'.

Let's verify this quickly:

import textwrap

samp = "Breakwordslongerthan'width'.Iffalse,thosewordswillnotbebroken,andsomelinesmightbelonger than 'width'"
print(samp)
print(textwrap.fill(samp, width=71))
print(textwrap.fill(samp, width=71, break_long_words=False))

Output:

Breakwordslongerthan'width'.Iffalse,thosewordswillnotbebroken,andsomelinesmightbelonger than 'width'
Breakwordslongerthan'width'.Iffalse,thosewordswillnotbebroken,andsomeli
nesmightbelonger than 'width'
Breakwordslongerthan'width'.Iffalse,thosewordswillnotbebroken,andsomelinesmightbelonger
than 'width'

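As you can see, the second and third lines of output break in the middle of a word because that word is too long. The fourth and fifth lines, with iron will, refuse to break inside a word: the long word stays on one line even though it is longer than width=71.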
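The following is a smaller version of the same comparison on a made-up string, small enough to check by eye:

```python
import textwrap

text = "abcdefghij kl"  # one 10-character "word", then a short one

# Default: the long word is chopped into width-sized pieces.
print(textwrap.wrap(text, width=4))                          # ['abcd', 'efgh', 'ij', 'kl']

# break_long_words=False: the long word stays whole, so that line exceeds width.
print(textwrap.wrap(text, width=4, break_long_words=False))  # ['abcdefghij', 'kl']
```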

The drop_whitespace parameter

Type: bool
Default: True

Docstring from the source:
Drop leading and trailing whitespace from lines.

In practice, this parameter comes into play far less often than replace_whitespace.
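One detail worth knowing: the whitespace at the very start of the paragraph is kept when non-whitespace follows it. The following is a quick sketch on a made-up string padded with spaces:

```python
import textwrap

text = "  hello   world  "

# Default: whitespace at line ends is dropped, but the indentation at the very
# start of the paragraph survives because non-whitespace follows it.
print(textwrap.wrap(text, width=8))                         # ['  hello', 'world']

# drop_whitespace=False keeps every whitespace chunk where it falls.
print(textwrap.wrap(text, width=8, drop_whitespace=False))  # ['  hello', '   world', '  ']
```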

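The break_on_hyphens parameter

Type: bool
Default: True

Docstring from the source:
Allow breaking hyphenated words. If true, wrapping will occur
preferably on whitespaces and right after hyphens part of
compound words.

In other words, with the default True, lines are preferably broken at whitespace and right after the hyphen inside a hyphenated compound word. With False, only whitespace is considered a good place to break (a word longer than width can still be cut unless break_long_words is also False).

Let's take a quick look at the effect: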

Output:

first:

Allow breaking hyphenated words. If true, wrapping will occur preferably on white-spaces and right after hyphens part of compound words.

second:
 Allow breaking hyphenated words. If true, wrapping will occur preferably on white-
spaces and right after hyphens part of compound words.
third:
 Allow breaking hyphenated words. If true, wrapping will occur preferably on
white-spaces and right after hyphens part of compound words.
--end--
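The code behind the output above isn't included. The following is a smaller, self-contained sketch of my own that compares the two settings on a short phrase (it is not the script that produced that output):

```python
import textwrap

text = "preferably on white-spaces"

# Default: "white-spaces" may be split right after its hyphen.
print(textwrap.wrap(text, width=20))                          # ['preferably on white-', 'spaces']

# break_on_hyphens=False: the hyphenated word is kept together.
print(textwrap.wrap(text, width=20, break_on_hyphens=False))  # ['preferably on', 'white-spaces']
```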

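The max_lines parameter

Type: positive integer. (Again, please respect the interface contract.)
Default: None (no limit on the number of lines)

Docstring from the source:
Truncate wrapped lines.

This parameter is easy to understand. If it is set to a valid value and the wrapped text would take more than max_lines lines, the output is cut to max_lines lines and the placeholder string (" [...]" by default) is appended to the last line. Words are dropped from that line as needed so that it still fits within width.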
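The following is a sketch with made-up text showing the truncation. The placeholder itself can be changed through the placeholder keyword, which fill() passes on to TextWrapper:

```python
import textwrap

text = "one two three four five six seven eight"

# At most two lines; the second is trimmed so that " [...]" still fits in width.
print(textwrap.fill(text, width=15, max_lines=2))
# one two three
# four five [...]

# A custom placeholder.
print(textwrap.fill(text, width=15, max_lines=2, placeholder=" ..."))
# one two three
# four five ...
```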

Now let's put fill() to some simple use:

import textwrap

text = '''
Each key in kw is an option and each value should be a list or a tuple (usually) containing statespecs grouped in tuples, lists, or some other preference. A statespec is a compound of one or more states and then a value.
'''

processed_text = textwrap.fill(text, initial_indent=' ' * 4, subsequent_indent="-> " )
print(processed_text)

Output:

     Each key in kw is an option and each value should be a list or a
-> tuple (usually) containing statespecs grouped in tuples, lists, or
-> some other preference. A statespec is a compound of one or more
-> states and then a value.

The wrap function

Function signature

textwrap.wrap(
text,
width=70, *,
initial_indent='',
subsequent_indent='',
expand_tabs=True,
replace_whitespace=True,
fix_sentence_endings=False,
break_long_words=True,
drop_whitespace=True,
break_on_hyphens=True,
tabsize=8,
max_lines=None
)
Its parameter list is exactly the same as fill()'s. Here is how the official documentation describes it:
Wraps the single paragraph in text (a string) so every line is at most width characters long. Returns a list of output lines, without final newlines.

At first glance it looks as if wrap() takes each line of fill()'s return value, drops the newline, and returns the lines as a list. In the source, the relationship actually runs the other way: fill() calls wrap() and joins the resulting list with "\n".

import textwrap

text = '''
Each key in kw is an option and each value should be a list or a tuple (usually) containing statespecs grouped in tuples, lists, or some other preference. A statespec is a compound of one or more states and then a value.
'''

processed_text = textwrap.fill(text, initial_indent=' ' * 4, subsequent_indent="-> " )
print(processed_text)

wrapped = textwrap.wrap(text, initial_indent=' ' * 4, subsequent_indent='-> ')
for w in wrapped:
    print(w)

Output:

     Each key in kw is an option and each value should be a list or a
-> tuple (usually) containing statespecs grouped in tuples, lists, or
-> some other preference. A statespec is a compound of one or more
-> states and then a value.
     Each key in kw is an option and each value should be a list or a
-> tuple (usually) containing statespecs grouped in tuples, lists, or
-> some other preference. A statespec is a compound of one or more
-> states and then a value.

Just as expected.
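The relationship between the two functions can be checked directly. The following is a quick sketch on a made-up sentence:

```python
import textwrap

text = "Each key in kw is an option and each value should be a list or a tuple."

# fill() gives the same result as joining the list from wrap() with newlines.
lines = textwrap.wrap(text, width=30)
print(lines)
print(textwrap.fill(text, width=30) == "\n".join(lines))  # True

# The list items carry no trailing newline.
print(any(line.endswith("\n") for line in lines))          # False
```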

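The shorten function

Function signature

textwrap.shorten(
text, width, *,
fix_sentence_endings=False,
break_long_words=True,
break_on_hyphens=True,
placeholder=' [...]'
)

Apart from placeholder, we have already met every parameter here, and even placeholder looks familiar. We'll look into it shortly.
The official documentation describes the function as:
Collapse and truncate the given text to fit in the given width.

Let's look at the effect in code first: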

import textwrap

text = "Collapse and truncate the given text to fit in the given width."

print(textwrap.shorten(text, 30))

Output:

Collapse and truncate [...]

Notice that the string has been truncated and the placeholder appended at the end.
The part kept before truncation, "Collapse and truncate", is 21 characters long. Adding the 6-character placeholder " [...]" gives 27 characters, which fits within width=30. In other words, width limits the length of the result including the placeholder.
Now let's look at the function's docstring:

Collapse and truncate the given text to fit in the given width.
The text first has its whitespace collapsed. If it then fits in the width, it is returned as is. Otherwise, as many words as possible are joined and then the placeholder is appended:
>>> textwrap.shorten("Hello  world!", width=12)
'Hello world!'
>>> textwrap.shorten("Hello  world!", width=11)
'Hello [...]'

The source of this function is also simple, so we can read it directly:

def shorten(text, width, **kwargs):
    """Collapse and truncate the given text to fit in the given width.

    The text first has its whitespace collapsed.  If it then fits in
    the *width*, it is returned as is.  Otherwise, as many words
    as possible are joined and then the placeholder is appended::

        >>> textwrap.shorten("Hello  world!", width=12)
        'Hello world!'
        >>> textwrap.shorten("Hello  world!", width=11)
        'Hello [...]'
    """
    w = TextWrapper(width=width, max_lines=1, **kwargs)
    return w.fill(' '.join(text.strip().split()))

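It first creates a TextWrapper with max_lines set to 1. It then uses the string method strip() to remove whitespace at both ends, split() to break the text apart at every run of whitespace, and join() to glue the pieces back together with single spaces, which collapses all the internal whitespace. The result is passed to the wrapper's fill() method. Very simple. (Strictly speaking, split() with no arguments already ignores leading and trailing whitespace, so the strip() call is redundant but harmless.)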
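To confirm this reading of the source, the following is a sketch of my own that repeats the same two steps with the public TextWrapper class and compares the result with shorten(). It also shows a custom placeholder:

```python
import textwrap

text = "Hello  world!"

# Same steps as the source: collapse whitespace, then fill a single line.
collapsed = " ".join(text.split())
manual = textwrap.TextWrapper(width=11, max_lines=1).fill(collapsed)
print(manual)                                       # Hello [...]
print(manual == textwrap.shorten(text, width=11))   # True

# The placeholder keyword replaces the default " [...]".
print(textwrap.shorten(text, width=11, placeholder="..."))  # Hello...
```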

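The dedent function

Function signature

textwrap.dedent(text)

This is another simple function; see the official documentation for dedent for details.
The official documentation explains it thoroughly: it removes any leading whitespace that is common to every line of text. Only the shared prefix is removed, so the relative indentation of the lines is preserved. Let's see the effect: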

import textwrap

text = '''
    Remove any common leading whitespace from every line in text.
    '''
print(text)
print('---')
print(textwrap.dedent(text))


Output:


    Remove any common leading whitespace from every line in text.
    
---

Remove any common leading whitespace from every line in text.

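The effect is immediate.
The source of this function relies on the re module (essentially to match whitespace), which is the topic of next week's post. Stay tuned!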

def dedent(text):
    """Remove any common leading whitespace from every line in `text`.

    This can be used to make triple-quoted strings line up with the left
    edge of the display, while still presenting them in the source code
    in indented form.

    Note that tabs and spaces are both treated as whitespace, but they
    are not equal: the lines "  hello" and "\\thello" are
    considered to have no common leading whitespace.

    Entirely blank lines are normalized to a newline character.
    """
    # Look for the longest leading string of spaces and tabs common to
    # all lines.
    margin = None
    text = _whitespace_only_re.sub('', text)
    indents = _leading_whitespace_re.findall(text)
    for indent in indents:
        if margin is None:
            margin = indent

        # Current line more deeply indented than previous winner:
        # no change (previous winner is still on top).
        elif indent.startswith(margin):
            pass

        # Current line consistent with and no deeper than previous winner:
        # it's the new winner.
        elif margin.startswith(indent):
            margin = indent

        # Find the largest common whitespace between current line and previous
        # winner.
        else:
            for i, (x, y) in enumerate(zip(margin, indent)):
                if x != y:
                    margin = margin[:i]
                    break

    # sanity check (testing/debugging only)
    if 0 and margin:
        for line in text.split("\n"):
            assert not line or line.startswith(margin), \
                   "line = %r, margin = %r" % (line, margin)

    if margin:
        text = re.sub(r'(?m)^' + margin, '', text)
    return text
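The following checks on made-up strings show the "common" part in action, along with the tabs-versus-spaces note and the blank-line normalization from the docstring above:

```python
import textwrap

# Only the whitespace shared by all lines is removed; relative indentation stays.
print(repr(textwrap.dedent("    a\n      b\n")))  # 'a\n  b\n'

# Tabs and spaces are not treated as equal, so nothing is common here.
print(repr(textwrap.dedent("  hello\n\thello")))  # '  hello\n\thello'

# Whitespace-only lines are normalized and do not shrink the margin.
print(repr(textwrap.dedent("  x\n \n  y")))       # 'x\n\ny'
```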

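The indent function

Function signature

textwrap.indent(text, prefix, predicate=None)
The official documentation also explains this function in detail (see the entry for indent):
Add prefix to the beginning of selected lines in text.
The function has two required parameters, text and prefix, and one optional parameter, predicate.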

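Parameters

The text parameter

Type: string
The text whose lines receive the prefix.

The prefix parameter

Type: string
The prefix added in front of the selected lines.

The predicate parameter

Type: a callable that takes one line (str) and returns a bool, e.g. def foo(p: str) -> bool: ...
If it is provided, each line of text (including its line ending) is passed to this function, and prefix is added to every line for which it returns True. If it is omitted, prefix is added to every line that is neither empty nor whitespace-only.

Example

Code: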

import textwrap

text = '''
Add
prefix
to 
the 
beginning
of
selected
lines 
in 
ext.
    '''
print(text)
print('---')
print(textwrap.indent(text, 'OK ', lambda a: a.startswith('t')))

Output:

Add
prefix
to 
the 
beginning
of
selected
lines 
in 
ext.
    
---

Add
prefix
OK to 
OK the 
beginning
of
selected
lines 
in 
ext.
    

Source

The source of indent is very simple, so let's study it:

def indent(text, prefix, predicate=None):
    """Adds 'prefix' to the beginning of selected lines in 'text'.

    If 'predicate' is provided, 'prefix' will only be added to the lines
    where 'predicate(line)' is True. If 'predicate' is not provided,
    it will default to adding 'prefix' to all non-empty lines that do not
    consist solely of whitespace characters.
    """
    if predicate is None:
        def predicate(line):
            return line.strip()

    def prefixed_lines():
        for line in text.splitlines(True):
            yield (prefix + line if predicate(line) else line)
    return ''.join(prefixed_lines())

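The function first checks whether predicate was supplied. If it was not, the name predicate is bound to a function that calls the string's strip() method to remove leading and trailing whitespace. For a blank or whitespace-only line the result is an empty (falsy) string, so such lines get no prefix. Next, it defines a generator function, prefixed_lines(), that walks over the lines (splitlines(True) keeps each line ending). It yields each line with the prefix added if predicate returns a true value, and unchanged otherwise. Finally, ''.join() reassembles the lines into one string. (str.join() collects a generator into a sequence internally, so the generator saves little memory here; it mainly keeps the code tidy.) True to standard-library form, the code is concise, elegant, and delightfully small.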
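The following sketch shows the default predicate at work on a made-up string with blank lines, next to a predicate that accepts every line:

```python
import textwrap

text = "a\n\n  \nb\n"

# Default predicate: empty or whitespace-only lines get no prefix.
print(repr(textwrap.indent(text, "> ")))                     # '> a\n\n  \n> b\n'

# A predicate that always returns True prefixes every line, blank ones included.
print(repr(textwrap.indent(text, "> ", lambda line: True)))  # '> a\n> \n>   \n> b\n'
```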

That's it for this week's Python standard library study. See you next week~
