Reading Notes on Natural Language Processing with Python: Chapter 1

Latest recommended article published 2023-05-28 17:48:31

Radiumm

Category: Python and natural language processing. Tags: natural language processing

Article link: https://blog.csdn.net/zhlbjtu2016/article/details/79823013

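These are reading notes on Chapter 1 of Natural Language Processing with Python. They cover searching text, counting vocabulary, basic programming concepts, and the analysis of statistical distributions, along with NLP fundamentals such as word sense disambiguation and pronoun resolution, and they touch on the challenges and limits of automatic natural language understanding.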

Summary generated automatically by CSDN.

Natural Language Processing with Python (Chapter 1)

Searching Text

  
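  from nltk.book import *  # loads text1 ... text9; requires the NLTK "book" data collection
  text1.concordance("monstrous")  # show every occurrence of a word together with its surrounding context
  text3.concordance("lived")
  text1.similar("monstrous")  # words that appear in contexts similar to "monstrous" (not necessarily synonyms)
  text2.common_contexts(["monstrous", "very"])  # contexts shared by the two words
  text4.dispersion_plot(['citizens', 'democracy', 'freedom', 'duties', 'America'])  # this function depends on NumPy and Matplotlib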
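To make the idea behind concordance concrete without downloading any corpus, here is a minimal sketch that works on a plain Python list of tokens. The function name simple_concordance, its width parameter, and the toy sentence are all assumptions made for illustration; this is not how NLTK implements concordance, only a rough picture of "show each hit with a little context".

```python
def simple_concordance(tokens, word, width=2):
    """Return each occurrence of `word` with up to `width` tokens of context on each side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == word.lower():
            left = tokens[max(0, i - width):i]   # context before the hit
            right = tokens[i + 1:i + 1 + width]  # context after the hit
            hits.append(" ".join(left + [tok] + right))
    return hits

# A toy token list (hypothetical, not taken from any NLTK text)
tokens = "the monstrous whale and the most monstrous size".split()
for line in simple_concordance(tokens, "monstrous"):
    print(line)
```

The real text1.concordance additionally aligns the hits in a fixed-width column, which makes patterns easier to spot by eye.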

Counting Vocabulary

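  set(text3)  # the set of distinct tokens (words and punctuation) in text3
  sorted(set(text3))  # the distinct tokens of text3, in sorted order
  len(set(text3))  # number of distinct tokens in text3, i.e. the number of word types
  print(len(text3) / len(set(text3)))  # average number of times each distinct token is used
  print(text3.count("smote"))  # number of times a word occurs in a text
  print(100 * text4.count('a') / len(text4))  # percentage of all tokens in the text taken up by one word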
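The same counting operations can be checked by hand on a small list. The sketch below assumes a made-up seven-token sentence in place of an NLTK text; since an NLTK Text behaves like a list of tokens for these operations, the calls are the same.

```python
# A toy token list standing in for an NLTK text (hypothetical data)
tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat', '.']

vocabulary = sorted(set(tokens))  # distinct tokens, sorted; punctuation sorts before letters
print(vocabulary)                 # ['.', 'cat', 'mat', 'on', 'sat', 'the']
print(len(tokens), len(vocabulary))  # 7 tokens, 6 types

the_count = tokens.count('the')
print(the_count)                  # 2

percent = 100 * the_count / len(tokens)  # true division in Python 3
print(round(percent, 2))          # 28.57
```

Note that under Python 2 the `/` operator performs integer division on integers, so these ratios would need `from __future__ import division` there.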

Functions

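  # The def keyword defines a function; lexical_diversity is the function name and text is its parameter.
  def lexical_diversity(text):
      return len(text) / len(set(text))  # average number of uses per distinct token
  print(lexical_diversity(text3))
  def percentage(count, total):
      return 100 * count / total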
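As a quick sanity check, both functions can be called on a small list whose answers are easy to work out by hand. The token list below is invented for illustration; the two function definitions are repeated so that the snippet runs on its own.

```python
def lexical_diversity(text):
    # total tokens divided by distinct tokens
    return len(text) / len(set(text))

def percentage(count, total):
    return 100 * count / total

# Hypothetical toy data: 8 tokens, 4 distinct types, 'a' occurs 4 times
tokens = ['a', 'b', 'a', 'c', 'a', 'b', 'a', 'd']

print(lexical_diversity(tokens))                    # 8 / 4 = 2.0
print(percentage(tokens.count('a'), len(tokens)))   # 100 * 4 / 8 = 50.0
```

Wrapping the arithmetic in named functions means the same calculation can be reused on text3, text4, or any other token list without retyping the expression.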

Treating Text as a List of Words

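  a = ['Call', 'me', 'Ishmael', '.']
  print(a[1])  # indexing starts at 0, so this prints 'me'
  print(text4[173])  # the element at a given index
  print(text4.index('awaken'))  # index of the first occurrence of an element
  print(text5[16715:16735])  # a slice: any contiguous run of elements from the list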
  
  sent = ['word1', 'word2', 'word3', 'word4', 'word5', 'word6', 'word7', 'word8', 'word9', 'word10']
  print(sent[5:8])  # sent[m:n] returns the elements at indices m through n-1
  print(sent[:3])  # omitting m starts the slice at index 0
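The slicing rules above can be verified directly. This sketch reuses the same sent list and adds two cases the notes do not show, a negative start index and the index method, as assumptions about what a reader might try next.

```python
sent = ['word1', 'word2', 'word3', 'word4', 'word5',
        'word6', 'word7', 'word8', 'word9', 'word10']

middle = sent[5:8]  # indices 5, 6, 7; index 8 is excluded
head = sent[:3]     # same as sent[0:3]
tail = sent[-2:]    # negative indices count from the end

print(middle)  # ['word6', 'word7', 'word8']
print(head)    # ['word1', 'word2', 'word3']
print(tail)    # ['word9', 'word10']

# Splitting at any index and concatenating gives back the original list
print(sent[:3] + sent[3:] == sent)  # True

# index() is the inverse of indexing: it maps an element to its first position
print(sent.index('word4'))  # 3
```

Because the end index is exclusive, the length of sent[m:n] is simply n - m, which is why sent[5:8] holds three elements.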