Implementing the Flesch Reading Ease Readability Formula in Python

Latest recommended article published 2023-05-09 16:03:31

剪刀石头.布

Article link: https://blog.csdn.net/Granery/article/details/88912059

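This article introduces the background of the Flesch Reading Ease Readability Formula and how it is calculated, including the score bands used to rate text difficulty and a Python implementation.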


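What is the Flesch Reading Ease Readability Formula?

The Flesch Reading Ease Readability Formula is considered one of the oldest and most accurate readability formulas. Rudolph Flesch, an author, writing consultant, and supporter of the Plain English Movement, developed it in 1948. Raised in Austria, Flesch studied law and later earned a Ph.D. in English from Columbia University. Through his writings and speeches he advocated a return to phonics. He presented the formula in his article "A New Readability Yardstick", published in the Journal of Applied Psychology in 1948.
The Flesch Reading Ease formula is a simple way to assess the grade level of the reader. It is also one of the few accurate measures we can rely on without too much scrutiny. The formula works best on school texts. It has become a standard readability formula used by many US government agencies, including the US Department of Defense. Primarily, though, the formula is used to assess the difficulty of a reading passage written in English.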

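The conclusion to draw from the Flesch Reading Ease formula is that the best text uses shorter sentences and shorter words. A score between 60 and 70 is largely considered acceptable. The following table helps to assess how easy a document is to read:
90-100: Very Easy
80-89: Easy
70-79: Fairly Easy
60-69: Standard
50-59: Fairly Difficult
30-49: Difficult
0-29: Very Confusing
Translated from: http://www.readabilityformulas.com/flesch-reading-ease-readability-formula.php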
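
The score table above can be turned into a small lookup helper. This is only a sketch: the function name `reading_ease_label` is made up for illustration, and it assumes that fractional scores fall into the band of their lower bound, that scores above 100 count as "Very Easy", and that scores below 0 count as "Very Confusing".

```python
def reading_ease_label(score):
    """Map a Flesch Reading Ease score to a difficulty label from the table."""
    # (lower bound, label) pairs, checked from the easiest band downwards
    bands = [
        (90, 'Very Easy'),
        (80, 'Easy'),
        (70, 'Fairly Easy'),
        (60, 'Standard'),
        (50, 'Fairly Difficult'),
        (30, 'Difficult'),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    # anything below 30 (assumption: including negative scores)
    return 'Very Confusing'


print(reading_ease_label(65.2))  # Standard
```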

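Implementing the Flesch Reading Ease Readability Formula

Runtime environment: Python 3.6

Preparation before calculating the RE value

Formula: RE = 206.835 - (1.015 * ASL) - (84.6 * ASW)

RE = Reading Ease
ASL = average sentence length (the number of words divided by the number of sentences)
ASW = average number of syllables per word (the number of syllables divided by the number of words)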
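
To make the formula concrete, here is a minimal sketch that computes RE directly from the three counts. The function name `flesch_reading_ease` and the sample counts (100 words, 5 sentences, 150 syllables) are invented purely for illustration.

```python
def flesch_reading_ease(word_count, sentence_count, syllable_count):
    """Compute RE = 206.835 - (1.015 * ASL) - (84.6 * ASW) from raw counts."""
    if word_count <= 0 or sentence_count <= 0:
        raise ValueError('word_count and sentence_count must be positive')
    asl = word_count / sentence_count   # average sentence length
    asw = syllable_count / word_count   # average syllables per word
    return 206.835 - (1.015 * asl) - (84.6 * asw)


# Hypothetical counts: ASL = 100 / 5 = 20, ASW = 150 / 100 = 1.5
score = flesch_reading_ease(100, 5, 150)
print(round(score, 3))  # 59.635
```

With these made-up counts the score works out to 206.835 - 20.3 - 126.9 = 59.635, which falls in the "Fairly Difficult" band.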

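Before calculating the RE value, we need the following numbers:

The number of words in the text
The number of sentences in the text
The number of syllables in the text

I trust that everyone can work out the word count and the sentence count with basic Python, so the real question is how to count the syllables in the text.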

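Counting the syllables in a word

There is not much material online about counting the syllables in a word, but there is no need to panic: while searching I came across this
https://pronouncing.readthedocs.io/en/latest/index.html
It is a Python library for English pronunciation, named 'pronouncing', which wraps many methods for working with English pronunciation, including counting the syllables in a word.

What we need is the short syllable-counting snippet from its tutorial.
Before using it, we need to install "pronouncing".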

Installing pronouncing

pip install pronouncing

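Calculating the RE value

As an example, I run the calculation on an English article saved in a text file named 'detail.txt' in the project directory.
Part of its content is shown below: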

Sitting on a simple mattress on the floor, Jili Yigu was still in a panic while receiving a transfusion at a hospital in Xiangshui county, Jiangsu province, on Friday afternoon.
The 36-year-old thought he had been struck by thunder when he heard a big explosion in his sleep. He found himself buried by debris inside the collapsed building where he lived.
“I was on night shift and was sleeping. If it were not a single-floor building, I could have been dead,” he said in describing how he survived a blast at the chemical plant where he worked.
His desperate shouting, however, got no reply at all. With his left leg stuck in debris, Jili thought there was no hope of survival until his wife came to look for him after about two hours…

Counting words

Because the word list itself is needed later, the function returns the list of words; the word count is then obtained by calling len() on it.

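import re

def word_list(filename):
    '''Return the list of lowercase words in the file.'''
    try:
        with open(filename, 'r', encoding='UTF-8') as f:   # read the file contents
            content = f.read()
    except FileNotFoundError:
        print(filename + ' does not exist')   # report the missing file
        return []                             # nothing to count
    # a word is a run of letters, optionally joined by straight or curly apostrophes
    word_re = re.compile(r'[a-z]+(?:[’\'][a-z]+)*')
    words = word_re.findall(content.lower())   # lowercase first, then match
    print('Word count: ' + str(len(words)))
    return words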

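Note: the word list obtained this way includes words such as "isn’t" and "people’s". I am not sure whether such forms should count as one word or two, and they would need extra handling during extraction; for now I leave them as they are.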
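
One possible way to handle these forms is to normalize each token before looking it up. The sketch below is an assumption about how one might do it, not part of the original code: the helper name `normalize_word` is hypothetical, it treats a contraction as a single word, converts curly apostrophes to straight ones, and drops a trailing possessive 's (which also turns "it's" into "it").

```python
import re

def normalize_word(word):
    """Lowercase a token, unify apostrophes, and drop a trailing 's."""
    word = word.lower().replace('’', "'")   # curly apostrophe -> straight
    word = re.sub(r"'s$", '', word)         # people's -> people
    return word.strip("'")                  # remove stray quote marks


print([normalize_word(w) for w in ["People’s", "isn't", "'quoted'"]])
# ['people', "isn't", 'quoted']
```

Dropping 's rarely changes the syllable count, which is why this shortcut is usually tolerable for a readability estimate.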

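Counting sentences
The number of sentences is obtained simply by splitting the text on periods; the code is much the same as for counting words.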

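def sentence_count(filename):
    '''Return the number of sentences in the file.'''
    try:
        with open(filename, 'r', encoding='UTF-8') as f:
            content = f.read()
    except FileNotFoundError:
        print(filename + ' does not exist')
        return 0
    # split on periods and drop empty pieces (e.g. the one after the final period)
    point_re = re.compile(r'\.')
    sentences = [s for s in point_re.split(content) if s.strip()]
    print('Sentence count: ' + str(len(sentences)))
    return len(sentences)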
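
Splitting on periods alone misses sentences that end in "?" or "!" and also splits decimals such as 3.14. As a hedged alternative, the sketch below counts a sentence wherever a run of ending punctuation, optionally followed by a closing quote or bracket, is followed by whitespace or the end of the text. The name `count_sentences` is hypothetical, and this is still a heuristic: abbreviations such as "Mr. Smith" would be miscounted.

```python
import re

def count_sentences(text):
    """Count sentences ending in ., !, ? or … followed by whitespace or end of text."""
    # ending punctuation, optional closing quotes/brackets, then whitespace or end
    boundary = re.compile(r'[.!?…]+["”’\')\]]*(?:\s+|$)')
    parts = boundary.split(text)
    # ignore empty pieces such as the one left after the final boundary
    return sum(1 for part in parts if part.strip())


print(count_sentences('He woke up. Was it thunder? Nobody answered!'))  # 3
print(count_sentences('Pi is about 3.14. Really!'))                     # 2
```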

Counting syllables
Syllables are counted with the pronouncing library mentioned above. The link is given below for anyone who wants to learn more.
https://pronouncing.readthedocs.io/en/latest/tutorial.html#counting-syllables

import pronouncing

def get_pronouncing_num(word):
    '''Return the number of syllables in a single word.'''
    # https://pronouncing.readthedocs.io/en/latest/tutorial.html#counting-syllables
    pronunciation_list = pronouncing.phones_for_word(word)
    if not pronunciation_list:
        # Some proper nouns have no entry in the dictionary, and forms such as
        # isn’t or people’s may not match either. They are not handled
        # specially for now and are counted as 1 syllable.
        print('No pronunciation found for word: "' + word + '"')
        return 1
    # use the first listed pronunciation
    return pronouncing.syllable_count(pronunciation_list[0])

def get_pronouncing_nums(words):
    '''Return the total number of syllables in the text.'''
    counts = 0
    for word in words:
        counts += get_pronouncing_num(word)
    print('Total syllables: ' + str(counts))
    return counts
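
Counting every unmatched word as one syllable underestimates longer proper nouns. One alternative, which is not part of the pronouncing library and is only a rough approximation, is to fall back on counting vowel groups. The helper name `estimate_syllables` is hypothetical.

```python
import re

def estimate_syllables(word):
    """Roughly estimate syllables by counting groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r'[aeiouy]+', word))
    # a trailing silent 'e' (as in "make") usually adds no syllable,
    # but a final "le" (as in "table") usually does
    if word.endswith('e') and not word.endswith('le') and count > 1:
        count -= 1
    return max(count, 1)   # every word has at least one syllable


for w in ['make', 'table', 'debris', 'jiangsu']:
    print(w, estimate_syllables(w))
```

A function like this could replace the fixed return value of 1 for words the dictionary does not know, at the cost of occasional miscounts on irregular spellings.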

With the functions above in place, we can now compute the RE value.
Here is the main block:

if __name__ == '__main__':
    filename = 'detail.txt'
    words = word_list(filename)   # build the word list once and reuse it
    word_num = len(words)
    sentence_num = sentence_count(filename)

    # ASL = number of words / number of sentences
    ASL = word_num / sentence_num

    # ASW = number of syllables / number of words
    pronouncing_nums = get_pronouncing_nums(words)
    ASW = pronouncing_nums / word_num

    # RE = 206.835 - (1.015 x ASL) - (84.6 x ASW)
    RE = 206.835 - (1.015 * ASL) - (84.6 * ASW)
    print('RE: ' + str(RE))

That completes the calculation of the Flesch Reading Ease Readability Formula.

Complete code:

# 1. Count the words
# 2. Count the sentences
# 3. Count the syllables
# Compute the RE value
# RE = 206.835 - (1.015 x ASL) - (84.6 x ASW)
# RE = Reading Ease
# ASL = average sentence length (number of words / number of sentences)
# ASW = average syllables per word (number of syllables / number of words)
import re
import pronouncing


def word_list(filename):
    '''Return the list of lowercase words in the file.'''
    try:
        with open(filename, 'r', encoding='UTF-8') as f:
            content = f.read()
    except FileNotFoundError:
        print(filename + ' does not exist')
        return []
    # a word is a run of letters, optionally joined by straight or curly apostrophes
    word_re = re.compile(r'[a-z]+(?:[’\'][a-z]+)*')
    return word_re.findall(content.lower())


def sentence_count(filename):
    '''Return the number of sentences in the file.'''
    try:
        with open(filename, 'r', encoding='UTF-8') as f:
            content = f.read()
    except FileNotFoundError:
        print(filename + ' does not exist')
        return 0
    # split on periods and drop empty pieces (e.g. the one after the final period)
    point_re = re.compile(r'\.')
    sentences = [s for s in point_re.split(content) if s.strip()]
    return len(sentences)


def get_pronouncing_num(word):
    '''Return the number of syllables in a single word.'''
    # https://pronouncing.readthedocs.io/en/latest/tutorial.html#counting-syllables
    pronunciation_list = pronouncing.phones_for_word(word)
    if not pronunciation_list:
        # words missing from the dictionary are counted as 1 syllable
        print('No pronunciation found for word: "' + word + '"')
        return 1
    return pronouncing.syllable_count(pronunciation_list[0])


def get_pronouncing_nums(words):
    '''Return the total number of syllables in the text.'''
    counts = 0
    for word in words:
        counts += get_pronouncing_num(word)
    return counts


if __name__ == '__main__':
    filename = 'detail.txt'
    words = word_list(filename)   # build the word list once and reuse it
    word_num = len(words)
    sentence_num = sentence_count(filename)
    print(str(word_num) + ',' + str(sentence_num))

    if word_num == 0 or sentence_num == 0:
        # an empty or missing file would otherwise cause a division by zero
        print('No words or sentences found, cannot compute RE')
    else:
        # ASL = number of words / number of sentences
        ASL = word_num / sentence_num

        # ASW = number of syllables / number of words
        pronouncing_nums = get_pronouncing_nums(words)
        ASW = pronouncing_nums / word_num

        # RE = 206.835 - (1.015 x ASL) - (84.6 x ASW)
        RE = 206.835 - (1.015 * ASL) - (84.6 * ASW)

        print('ASW: ' + str(ASW))
        print('ASL: ' + str(ASL))
        print('RE: ' + str(RE))

Output:
[Screenshot of the console output]