Natural Language Processing: Python's spaCy Library and Counting Person Names in an Article

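This article shows how to use Python's spaCy library for basic natural language processing tasks, including tokenization, sentence segmentation, part-of-speech tagging, and named entity recognition. It pays particular attention to person-name recognition and applies it to a sample text about the revival of the town of Maplewood.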

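In the fast-moving field of natural language processing, Python's spaCy library stands out for being both powerful and easy to use. These study notes walk through basic NLP tasks with spaCy: tokenization, sentence segmentation, part-of-speech tagging, and named entity recognition, followed by a practical example that identifies person names in a text.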

Installing spaCy

spaCy · Industrial-strength Natural Language Processing in Python

On the spaCy website, click USAGE, select the options that match your setup, and run the generated command in the Anaconda prompt.
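After installation it can be useful to confirm that both the library and the English model used below are importable. The following is a minimal sketch that relies only on the standard library; the helper name `is_installed` is an illustrative assumption and not part of spaCy. If the model is reported as missing, it is normally fetched with `python -m spacy download en_core_web_sm`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when the module cannot be located
    return importlib.util.find_spec(module_name) is not None


# "spacy" is the library; "en_core_web_sm" is the model package loaded later
for name in ("spacy", "en_core_web_sm"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

The check only looks up the packages and does not load the model, so it runs quickly even in an environment where spaCy is absent.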

Basic Features

import spacy
from spacy import displacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")

doc = nlp("Only when you understand the true meaning of life can you live truly. Bittersweet as life is, it is still wonderful, and it's fascinating even in tragedy.")

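# Tokenization: iterating over a Doc yields its tokens
for token in doc:
    print(token)
# Sentence segmentation: doc.sents yields one span per sentence
for sent in doc.sents:
    print(sent)
# Part-of-speech tagging: token.pos_ is the coarse-grained tag
for token in doc:
    print('{} - {}'.format(token, token.pos_))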

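In this example the sentences happen to be split at the periods; spaCy's default English pipeline derives sentence boundaries from the dependency parse rather than from punctuation alone. Part-of-speech tagging builds on tokenization and assigns a tag to each token, for example: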

Only - ADV
when - SCONJ
you - PRON
understand - VERB
the - DET
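The abbreviations in this output are Universal POS tags. As a small sketch, the lookup table below spells out the five tags that appear in the sample; the dictionary `UPOS_DESCRIPTIONS` and the helper `describe` are illustrative names, and the table covers only these five tags. Within spaCy itself, `spacy.explain(tag)` serves the same purpose for the full tag set.

```python
# Descriptions for the Universal POS tags seen in the sample output only
UPOS_DESCRIPTIONS = {
    "ADV": "adverb",
    "SCONJ": "subordinating conjunction",
    "PRON": "pronoun",
    "VERB": "verb",
    "DET": "determiner",
}


def describe(tag: str) -> str:
    """Return a readable description, or a fallback for tags not in the table."""
    return UPOS_DESCRIPTIONS.get(tag, "unknown tag")


# (token, tag) pairs copied from the sample output
tagged = [
    ("Only", "ADV"),
    ("when", "SCONJ"),
    ("you", "PRON"),
    ("understand", "VERB"),
    ("the", "DET"),
]

for token, tag in tagged:
    print(f"{token:<12}{tag:<7}{describe(tag)}")
```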

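Named Entity Recognition

Common named entity types:

PERSON: names of people, e.g. "Alan".

ORG: names of organizations, including companies, government agencies, and NGOs, e.g. "Wuhan University".

GPE: geopolitical entities such as countries, cities, and states, e.g. "China", "New York".

DATE: dates or periods, e.g. "today", "1992", "20th century".

TIME: times of day or durations shorter than a day, e.g. "8:00 AM", "two hours".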

# Named entity recognition
doc_2 = nlp("Alan went to Wuhan University today")
for ent in doc_2.ents:
    print('{} - {}'.format(ent, ent.label_))
# jupyter=True renders the highlighted entities inline in a Jupyter notebook
displacy.render(doc_2, style='ent', jupyter=True)

Output: displaCy renders the sentence with each recognized entity highlighted and tagged with its label.
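Outside a notebook it is often more convenient to collect the recognized entities into plain Python data. The sketch below groups (text, label) pairs by label; the function name `group_by_label` is an illustrative assumption, and the sample pairs are hypothetical data assembled from the entity-type examples listed earlier, not actual model output.

```python
from collections import defaultdict


def group_by_label(pairs):
    """Group entity texts by their label, preserving the order of appearance."""
    groups = defaultdict(list)
    for text, label in pairs:
        groups[label].append(text)
    return dict(groups)


# Hypothetical pairs built from the entity-type examples, not real model output
sample_pairs = [
    ("Alan", "PERSON"),
    ("Wuhan University", "ORG"),
    ("China", "GPE"),
    ("New York", "GPE"),
    ("today", "DATE"),
    ("8:00 AM", "TIME"),
]

grouped = group_by_label(sample_pairs)
for label, texts in grouped.items():
    print(label, texts)
```

With a real document, the pairs could be built as `[(ent.text, ent.label_) for ent in doc_2.ents]` and passed to the same function.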

Counting Person Names in an Article

(The text used here is included at the end of the article.)

def read_file(file_name):
    # Read the whole file as UTF-8 text
    with open(file_name, 'r', encoding='utf-8') as file:
        return file.read()

text = read_file('text04.txt')
prd_text = nlp(text)               # run the spaCy pipeline on the full article
sentences = list(prd_text.sents)   # materialize the sentence spans
print(len(sentences))   # 26

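def find_person(doc):
    # Count how often each PERSON entity occurs in the given doc
    c = Counter()
    for ent in doc.ents:   # use the doc argument rather than the global prd_text
        if ent.label_ == 'PERSON':
            c[ent.lemma_] += 1
    return c
print(find_person(prd_text))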

Output:

Counter({'James': 5, 'Liz': 3, 'Mary Jenkins': 2, 'Jessica Morales': 2, 'GW': 2, 'James Thompson': 1, 'Elizabeth "Liz" Harper': 1, 'Gazette': 1, 'Aaron Lee': 1, 'Lee': 1, 'Sarah Nguyen': 1, 'Ethan Smith': 1, 'George Washington': 1, 'Amelia Richardson': 1, 'Richard': 1, 'Maplewood': 1, 'Mary': 1, 'Aaron': 1, 'Jessica': 1, 'Sarah': 1, 'Ethan': 1, 'Amelia': 1})
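The raw counts split one person across several keys (for example 'James' and 'James Thompson') and include strings such as 'Gazette' and 'Maplewood' that are not people. One possible post-processing step, sketched below, folds each single-word mention into the full name that contains it as a whole word, provided exactly one full name matches. The function `merge_name_variants` and the hand-picked `NOT_PEOPLE` set are assumptions made for this illustration and are not part of spaCy.

```python
from collections import Counter

# Raw counts copied from the output above
raw = Counter({
    'James': 5, 'Liz': 3, 'Mary Jenkins': 2, 'Jessica Morales': 2, 'GW': 2,
    'James Thompson': 1, 'Elizabeth "Liz" Harper': 1, 'Gazette': 1,
    'Aaron Lee': 1, 'Lee': 1, 'Sarah Nguyen': 1, 'Ethan Smith': 1,
    'George Washington': 1, 'Amelia Richardson': 1, 'Richard': 1,
    'Maplewood': 1, 'Mary': 1, 'Aaron': 1, 'Jessica': 1, 'Sarah': 1,
    'Ethan': 1, 'Amelia': 1,
})

# Assumption: strings judged by hand to be mislabeled as PERSON in this text
NOT_PEOPLE = {"Gazette", "Maplewood"}


def merge_name_variants(counts, exclude=()):
    """Fold single-word mentions into the unique full name containing them."""
    kept = {name: n for name, n in counts.items() if name not in exclude}
    full_names = [name for name in kept if " " in name]
    merged = Counter()
    for name, n in kept.items():
        if " " in name:
            merged[name] += n
            continue
        # Match on whole words only, ignoring quotes around nicknames
        matches = [
            full for full in full_names
            if name in full.replace('"', "").split()
        ]
        # Merge only when the match is unambiguous; otherwise keep the key
        target = matches[0] if len(matches) == 1 else name
        merged[target] += n
    return merged


merged = merge_name_variants(raw, exclude=NOT_PEOPLE)
for name, n in merged.most_common():
    print(name, n)
```

With these counts, 'James' folds into 'James Thompson' for a total of 6, while 'Richard' stays separate because it matches 'Richardson' only as a substring and not as a whole word. Nicknames such as 'GW' are not resolved by this heuristic and would need a manual alias table.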

Appendix

In the quaint town of Maplewood, nestled in the heart of America, the winds of change began to stir with the return of James Thompson, a decorated veteran, to his once vibrant hometown. Maplewood, known for its annual apple festival and community spirit, had seen better days. The local park, where James had spent countless childhood hours, was now neglected, and Main Street's bustling shops were struggling to stay afloat.

Determined to restore Maplewood to its former glory, James reached out to his childhood friend, Elizabeth “Liz” Harper, now the editor of the Maplewood Gazette, to share his vision. Liz, always a champion for local causes, saw an opportunity to rally the community and offered the Gazette as a platform to spark interest in James's project.

Together, they organized a town hall meeting, inviting not just long-time residents like the ever-energetic Mary Jenkins, who ran the local bakery, but also newcomers like Dr. Aaron Lee, a young physician passionate about public health, and Jessica Morales, a tech entrepreneur interested in sustainable living.

The meeting was a turning point. Inspired by James’s passion, Liz’s enthusiasm, and the shared love for Maplewood, the attendees brainstormed a series of initiatives. Mary Jenkins proposed a “Clean and Green” weekend, rallying volunteers to beautify the park and plant trees. Dr. Lee suggested free health screenings at these events, emphasizing the importance of health in community well-being. Jessica Morales, seeing the potential for technology to enhance community engagement, offered to create a mobile app to keep residents informed and involved in local events.

The ripple effect was immediate. High school students, led by the ambitious Sarah Nguyen and tech-savvy Ethan Smith, formed a “Youth for Maplewood” group, organizing fundraisers and social media campaigns to support the initiatives. Local historian and retired teacher, Mr. George Washington Carver (affectionately known as “Mr. GW”), offered history walks, sharing stories of Maplewood’s heritage, further instilling a sense of pride and belonging among the residents.

One of the most touching contributions came from Mrs. Amelia Richardson, a widow who donated her late husband’s collection of historical photographs of Maplewood for a public exhibition. “Richard loved this town as much as anyone,” she said, tears glistening in her eyes. “I know he’d be proud to see us all coming together like this.”

As months passed, Maplewood transformed. The park was no longer a place to avoid but a community hub, vibrant with laughter and activities. Main Street thrived as locals and visitors alike were drawn to its renewed energy and charm. The “Clean and Green” weekends became a beloved tradition, symbolizing the community’s commitment to their town and to each other.

Reflecting on the journey, James remarked, “I came back hoping to find the Maplewood I remembered. What I found was something even better – a community ready to fight for its future. It’s been an honor to stand alongside Liz, Mary, Aaron, Jessica, Sarah, Ethan, Mr. GW, Amelia, and every single person who believed in what we could achieve together.”

The story of Maplewood’s revival serves as a beacon of hope, a testament to the power of community when hearts and hands come together for a common cause. In Maplewood, the spirit of unity, fueled by the dedication of its residents, turned the tide, proving that even small towns could achieve big dreams.
