A Python script to fix smart quotes in text

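About six years ago, a question came up on the Scribus mailing list: did anyone know of an automatic way to convert typewriter quotes into typographic quotes? In case you don't know what that means, typographic quotes (such as “ ” and ‘ ’) are sometimes called curly quotes, as opposed to the improper typewriter versions (" and '). The quote marks on your keyboard also serve as abbreviations for feet and inches, or for minutes and seconds, but most of the time you really want curly quotes in your text. (Editor's note: Opensource.com style is to use straight quotes wherever possible.)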

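Although most word processors automatically replace typewriter quotes with typographic ones, Scribus had no such built-in feature. That seemed like an interesting scripting challenge to me, so I took it on. The result was a script and a wiki page on quote conversion, and eventually the script (called Autoquote) was judged useful enough to be included in the Scribus package. (You'll find the current version of Autoquote at the bottom of this article.) Since then, we have added enhancements in Autoquote2, the version that ships with Scribus. It offers an option for French dialogue and tries to place spaces between the quotation marks and the text, as French typography does.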
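
The Autoquote2 code is not reproduced in this article, so the following is only a sketch of the French spacing idea, not its actual implementation. It assumes that the space in question is a no-break space (U+00A0) placed just inside the guillemets (some styles prefer the narrow no-break space, U+202F); the function name french_spacing is hypothetical.

```python
import re

# Assumption: an ordinary no-break space; swap in "\u202f" for a narrow one.
NBSP = "\u00a0"


def french_spacing(text):
    """Ensure exactly one no-break space just inside each « and »."""
    # Collapse any existing ordinary, no-break or narrow no-break spaces
    # after « and before », then put back a single no-break space.
    text = re.sub("\u00ab[ \u00a0\u202f]*", "\u00ab" + NBSP, text)
    text = re.sub("[ \u00a0\u202f]*\u00bb", NBSP + "\u00bb", text)
    return text


print(french_spacing("\u00abBonjour\u00bb, dit-il."))
```

Because existing spaces are collapsed first, running the function twice gives the same result as running it once, which matters for a script that may be applied to the same frame more than once.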

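The idea is simple enough. Generally speaking, if a quote mark comes after a space or at the beginning of a paragraph and is followed by some character, you want a left (opening) quote; most of the time you want a right (closing) quote when the mark is followed by a space or comes at the end of a paragraph. But then what about contractions such as "aren't", and what about nested quotes? (LibreOffice, for example, gets nested quotes wrong, but Autoquote handles them correctly.) Finally, typographic quotes are not the same in every language. French and Russian use guillemets (« »), while other languages put the quotes in different positions, and in some cases use different glyphs altogether (such as „ “ and ‚ ‘). There really is an astonishing amount of variation.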
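
The basic rule in the previous paragraph can be sketched in plain Python, outside Scribus. This is a simplified illustration, not Autoquote's actual logic, which adds special cases for punctuation and for quotes that sit next to other quotes; choose_quote and convert_paragraph are hypothetical names.

```python
# Sketch of the basic neighbour rule only; Autoquote itself adds further
# special cases. choose_quote and convert_paragraph are hypothetical names.

def choose_quote(prev, nxt, opening, closing):
    """Pick the typographic mark for one straight quote.

    prev and nxt are the neighbouring characters, or None at the start or
    end of the paragraph.
    """
    at_start = prev is None or prev.isspace()
    at_end = nxt is None or nxt.isspace()
    if at_start and not at_end:
        return opening  # after a space or paragraph start, before text
    return closing      # before a space, at paragraph end, or inside a word


def convert_paragraph(text, opening="\u201c", closing="\u201d", mark='"'):
    """Return text with every straight `mark` replaced by opening/closing."""
    out = []
    for i, ch in enumerate(text):
        if ch == mark:
            prev = text[i - 1] if i > 0 else None
            nxt = text[i + 1] if i + 1 < len(text) else None
            out.append(choose_quote(prev, nxt, opening, closing))
        else:
            out.append(ch)
    return "".join(out)


print(convert_paragraph('He said "hello" to me.'))          # He said “hello” to me.
print(convert_paragraph("don't", "\u2018", "\u2019", "'"))  # don’t
```

An apostrophe inside a word has a letter on both sides, so this rule already turns don't into don’t. Its weak spots are a quote directly after another quote (nesting) and word-initial contractions like 'twas, which it turns into an opening quote.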

As you can see at the beginning of the script, you first have to tell it which language you're using:

    lang = scribus.valueDialog("Choose by language or country", 'Language: af, be, ch, cs, de, en, es, et, fi, fr,\nhu, is, lt, mk, nl, pl, ru, se, sk, sl, sq and uk\nare current choices', 'en')

This is followed by a long chain of assignments, such as:

if (lang == 'en'):
    lead_double = u"\u201c"
    follow_double = u"\u201d"
    lead_single = u"\u2018"
    follow_single = u"\u2019"
elif (lang == 'de'):
    # German uses „…“ for double and ‚…‘ for single quotes
    lead_double = u"\u201e"
    follow_double = u"\u201c"
    lead_single = u"\u201a"
    follow_single = u"\u2018"
elif (lang == 'fr'):
    lead_double = u"\u00ab"
    follow_double = u"\u00bb"
    lead_single = u"\u2018"
    follow_single = u"\u2019"
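
The if/elif chain could also be written as a lookup table. The sketch below is an alternative layout, not how Autoquote is written: it includes only a few languages, assumes the tuple order (lead_double, follow_double, lead_single, follow_single), and QUOTE_STYLES and quote_style are hypothetical names.

```python
# Hypothetical dictionary version of the language table (not Autoquote's code).
# Each value is (lead_double, follow_double, lead_single, follow_single).
EN = ("\u201c", "\u201d", "\u2018", "\u2019")
FR = ("\u00ab", "\u00bb", "\u2018", "\u2019")
NORDIC = ("\u201d", "\u201d", "\u2019", "\u2019")  # 'se' and 'fi' share one style

QUOTE_STYLES = {
    "en": EN,
    "fr": FR,
    "se": NORDIC,
    "fi": NORDIC,
}


def quote_style(lang):
    """Return the four quote characters for lang, or None if unsupported."""
    # Extra normalisation (an assumption, not in the original script):
    # " EN " from the dialog is treated the same as "en".
    return QUOTE_STYLES.get(lang.strip().lower())


lead_double, follow_double, lead_single, follow_single = quote_style("en")
print(lead_double + "quoted" + follow_double)  # “quoted”
```

Returning None for an unknown code leaves the caller free to show the same "Language Error" message box the script uses, and pointing several codes at one shared tuple replaces the chained `or` comparisons.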

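The if/elif chain assigns the correct Unicode character to each leading and following quote. After that, the script can get to work. It does this by parsing the entire text of the selected text frame, which here means examining the text one character at a time. To complicate matters, many characters in a Scribus document are invisible. You can't actually see a carriage return, but it's there. There are also control characters that change character or paragraph styles, and you don't want to mess those up in the process: any quote-assignment scheme has to ignore them while leaving them intact. If you check these characters with Python's len() function, you'll find that they have a length of 0, and that's how the script recognizes them.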
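
Here is a minimal sketch of that length-0 test, run outside Scribus. It assumes the frame contents have already been read one position at a time (as the script does with selectText() and getText()) into a list of strings; the sample data is made up, with "" standing in for an invisible style marker, and visible_positions is a hypothetical name.

```python
# Sketch of the length-0 test run outside Scribus. The list stands in for the
# per-position strings the script reads with selectText()/getText(); "" marks
# an invisible style/control item (sample data made up for this example).

def visible_positions(chars):
    """Yield (index, char) for every position holding exactly one character."""
    for index, ch in enumerate(chars):
        if len(ch) != 1:
            continue  # zero length: a control item, skip it but keep its index
        yield index, ch


frame = ['"', "H", "i", "", '"', "!"]
quote_positions = [i for i, ch in visible_positions(frame) if ch in "\"'"]
print(quote_positions)  # [0, 4]
```

The skipped items still occupy their positions, so the indices that come back can be used directly when replacing characters in the frame.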

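In case it isn't clear: the script takes the contents of the text frame apart character by character, swaps typewriter single and double quotes for the appropriate typographic ones as needed, and so rebuilds the frame's contents. The wiki page shows how this logic plays out.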
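
The replacement step can be sketched on a plain Python list standing in for the text frame. In Scribus the script does the same thing with selectText(), deleteText() and insertText(); the list and the helper name swap_at are assumptions made for this example. Because each straight quote is swapped for exactly one typographic character, the length never changes and the running index stays valid as the loop moves forward.

```python
# Sketch of delete-then-insert on a list that stands in for the text frame.
# swap_at is a hypothetical helper; the real script calls Scribus functions.

def swap_at(chars, pos, new_char):
    """Delete the character at pos, then insert new_char at the same place."""
    del chars[pos]               # like deleteText() on a one-character selection
    chars.insert(pos, new_char)  # like insertText(new_char, pos, frame)


frame = list('say "hi"')
swap_at(frame, 4, "\u201c")
swap_at(frame, 7, "\u201d")
print("".join(frame))  # say “hi”
```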

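One flaw in the logic that I haven't managed to solve is a contraction at the beginning of a word (such as 'twas). Even LibreOffice turns it into a left single quote, but it should be a right one, as in ’twas.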
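
One possible workaround, sketched here purely as an assumption and not part of Autoquote, is to check the word that follows a word-initial apostrophe against a short list of known elisions. The list LEADING_ELISIONS and the function single_quote_for are hypothetical.

```python
import re

# Hypothetical list of words that begin with an elided letter.
LEADING_ELISIONS = {"twas", "tis", "em", "cause"}


def single_quote_for(text, i):
    """Choose U+2018 or U+2019 for the straight apostrophe at text[i]."""
    prev = text[i - 1] if i > 0 else " "
    nxt = text[i + 1] if i + 1 < len(text) else " "
    if prev.isspace() and not nxt.isspace():
        match = re.match(r"[A-Za-z]+", text[i + 1:])
        if match and match.group(0).lower() in LEADING_ELISIONS:
            return "\u2019"  # an apostrophe standing for missing letters
        return "\u2018"      # an ordinary opening single quote
    return "\u2019"          # closing quote or in-word apostrophe


print(single_quote_for("'Twas brillig", 0))       # ’
print(single_quote_for("'Tweedle,' he said", 0))  # ‘
```

The ambiguity does not go away: a quotation that genuinely opens with one of those words would now get the wrong mark, so a list like this trades one kind of error for another.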

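That's the story of how the Autoquote script came to be. I later realized that this text-frame parsing process could be used in other situations too, which I'll discuss in my next article on Python and Scribus.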

The Autoquote.py script

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Autoquote.py - changes typewriter quotes to typographic quotes
# © 2010.08.28 Gregory Pittman
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
"""
USAGE

You must have a document open, and a text frame selected.
There will be a valueDialog asking for your language for the quotes,
the default is 'en', but change the default to suit your needs.
Detected errors shut down the script with an appropriate message.

"""

import sys

import scribus

if scribus.haveDoc() > 0:
    c = 0
    lang = scribus.valueDialog("Choose by language or country", 'Language: af, be, ch, cs, de, en, es, et, fi, fr,\nhu, is, lt, mk, nl, pl, ru, se, sk, sl, sq and uk\nare current choices', 'en')
    if (lang == 'en'):
        lead_double = u"\u201c"
        follow_double = u"\u201d"
        lead_single = u"\u2018"
        follow_single = u"\u2019"
    elif (lang == 'de'):
        # German uses „…“ for double and ‚…‘ for single quotes
        lead_double = u"\u201e"
        follow_double = u"\u201c"
        lead_single = u"\u201a"
        follow_single = u"\u2018"
    elif (lang == 'fr'):
        lead_double = u"\u00ab"
        follow_double = u"\u00bb"
        lead_single = u"\u2018"
        follow_single = u"\u2019"   # am hoping this will cover contractions like je t'aime
    elif (lang == 'pl'):
        lead_double = u"\u201e"
        follow_double = u"\u201d"
        lead_single = u"\u201a"
        follow_single = u"\u2019"
    elif ((lang == 'se') or (lang == 'fi')):
        lead_double = u"\u201d"
        follow_double = u"\u201d"
        lead_single = u"\u2019"
        follow_single = u"\u2019"
    elif (lang == 'af'):
        lead_double = u"\u201c"
        follow_double = u"\u201d"
        lead_single = u"\u2018"
        follow_single = u"\u2019"
    elif (lang == 'sq'):
        lead_double = u"\u201e"
        follow_double = u"\u201c"
        lead_single = u"\u2018"
        follow_single = u"\u2019"
    elif ((lang == 'be') or (lang == 'ch') or (lang == 'uk') or (lang == 'ru')):
        lead_double = u"\u00ab"
        follow_double = u"\u00bb"
        lead_single = u"\u2039"
        follow_single = u"\u203a"
    elif (lang == 'es'):
        lead_double = u"\u00ab"
        follow_double = u"\u00bb"
        lead_single = u"\u2018"
        follow_single = u"\u2019"
    elif ((lang == 'lt') or (lang == 'is') or (lang == 'sk') or (lang == 'sl') or (lang == 'cs') or (lang == 'et')):
        lead_double = u"\u201e"
        follow_double = u"\u201c"
        lead_single = u"\u201a"
        follow_single = u"\u2018"
    elif (lang == 'mk'):
        lead_double = u"\u201e"
        follow_double = u"\u201c"
        lead_single = u"\u2019"
        follow_single = u"\u2018"
    elif ((lang == 'hu') or (lang == 'nl')):
        lead_double = u"\u201e"
        follow_double = u"\u201d"
        lead_single = u"\u00bb"
        follow_single = u"\u00ab"
    else:
        scribus.messageBox('Language Error', 'You need to choose an available language', scribus.ICON_WARNING, scribus.BUTTON_OK)
        sys.exit(2)

else:
    scribus.messageBox('Usage Error', 'You need a Document open', scribus.ICON_WARNING, scribus.BUTTON_OK)
    sys.exit(2)

if scribus.selectionCount() == 0:
    scribus.messageBox('Scribus - Usage Error',
        "There is no object selected.\nPlease select a text frame and try again.",
        scribus.ICON_WARNING, scribus.BUTTON_OK)
    sys.exit(2)
if scribus.selectionCount() > 1:
    scribus.messageBox('Scribus - Usage Error',
        "You have more than one object selected.\nPlease select one text frame and try again.", scribus.ICON_WARNING, scribus.BUTTON_OK)
    sys.exit(2)
textbox = scribus.getSelectedObject()
pageitems = scribus.getPageItems()
boxcount = 1
for item in pageitems:
    if (item[0] == textbox):
        if (item[1] != 4):
            scribus.messageBox('Scribus - Usage Error', "This is not a textframe. Try again.", scribus.ICON_WARNING, scribus.BUTTON_OK)
            sys.exit(2)
contents = scribus.getTextLength(textbox)
prevchar = ' '   # nothing precedes the first character; treat it like a space
while c <= (contents - 1):
    if ((c + 1) > contents - 1):
        nextchar = ' '
    else:
        scribus.selectText(c + 1, 1, textbox)
        nextchar = scribus.getText(textbox)
    scribus.selectText(c, 1, textbox)
    char = scribus.getText(textbox)
    # zero-length "characters" are style/control markers: skip them untouched
    if (len(char) != 1):
        c += 1
        continue
    # Compare characters directly instead of using ord(), which raises
    # TypeError when nextchar is one of those zero-length markers.
    if ((char == '"') and (c == 0)):
        scribus.deleteText(textbox)
        scribus.insertText(lead_double, c, textbox)
    elif (char == '"'):
        if ((prevchar == '.') or (prevchar == ',') or (prevchar == '?') or (prevchar == '!')):
            scribus.deleteText(textbox)
            scribus.insertText(follow_double, c, textbox)
        elif ((prevchar == "'") and ((nextchar != ' ') and (nextchar != ',') and (nextchar != '.'))):
            scribus.deleteText(textbox)
            scribus.insertText(lead_double, c, textbox)
        elif ((nextchar == '.') or (nextchar == ',')):
            scribus.deleteText(textbox)
            scribus.insertText(follow_double, c, textbox)

        elif ((prevchar == ' ') or ((nextchar != ' ') and (nextchar != "'"))):
            scribus.deleteText(textbox)
            scribus.insertText(lead_double, c, textbox)
        else:
            scribus.deleteText(textbox)
            scribus.insertText(follow_double, c, textbox)

    if ((char == "'") and (c == 0)):
        scribus.deleteText(textbox)
        scribus.insertText(lead_single, c, textbox)
    elif (char == "'"):
        if ((prevchar == '.') or (prevchar == ',') or (prevchar == '?') or (prevchar == '!')):
            scribus.deleteText(textbox)
            scribus.insertText(follow_single, c, textbox)
        elif ((prevchar == '"') and ((nextchar != ' ') and (nextchar != ',') and (nextchar != '.'))):
            scribus.deleteText(textbox)
            scribus.insertText(lead_single, c, textbox)
        elif ((prevchar != ' ') and (prevchar != '"') and (nextchar != ' ')):
            scribus.deleteText(textbox)
            scribus.insertText(follow_single, c, textbox)
        elif ((prevchar == ' ') or ((nextchar != ' ') and (nextchar != '"'))):
            scribus.deleteText(textbox)
            scribus.insertText(lead_single, c, textbox)
        else:
            scribus.deleteText(textbox)
            scribus.insertText(follow_single, c, textbox)

    c += 1
    prevchar = char

scribus.setRedraw(1)
scribus.docChanged(1)
endmessage = 'Successfully ran script\nLast character read was ' + str(char)  # Change this message to your liking
scribus.messageBox("Finished", endmessage, icon=scribus.ICON_NONE, button1=scribus.BUTTON_OK)

Translated from: https://opensource.com/article/17/3/python-scribus-smart-quotes
