Chapter 4 - Writing Structured Programs (Natural Language Processing with Python, 2nd edition)

Chapter 4 Writing Structured Programs

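1. How can we write well-structured, readable programs that are easy to reuse?

2. How do the fundamental building blocks, such as loops, functions, and assignment, work?

3. What are some of the pitfalls of Python programming, and how can we avoid them?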

4.1 Back to the basics

1) Assignment
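This heading has no body in these notes. A minimal sketch of the point the book makes here: assigning a list to a second variable copies a reference to the same list object, not the list itself. The variable names foo, bar and baz are illustrative.

```python
# Assignment binds a name to an object; for a list this copies a reference.
foo = ['Monty', 'Python']
bar = foo            # bar and foo now refer to the same list object
foo[1] = 'Bodkin'    # mutating the list through foo ...
print(bar)           # ... is visible through bar: ['Monty', 'Bodkin']

# To get an independent copy, copy the list explicitly.
baz = foo[:]         # slice copy (list(foo) or copy.copy(foo) also work)
foo[1] = 'Python'
print(baz)           # ['Monty', 'Bodkin'], unaffected by the change to foo
```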

2) Equality
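This heading is also empty in the notes. A short sketch of the usual distinction: == compares values, while is checks whether two names refer to the very same object. The list names are illustrative.

```python
size = 5
python = ['Python']
snake_nest = [python] * size           # five references to ONE list object
print(snake_nest[0] == snake_nest[1])  # True: equal values
print(snake_nest[0] is snake_nest[1])  # True: the very same object

copy_nest = [['Python'] for _ in range(size)]  # five DISTINCT list objects
print(copy_nest[0] == copy_nest[1])    # True: equal values
print(copy_nest[0] is copy_nest[1])    # False: different objects
```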

3) Conditionals
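A sketch for the empty conditionals heading, assuming the usual points: in a condition, empty strings and empty lists count as false, and all()/any() test a condition across a whole sequence. The example data is illustrative.

```python
# Empty strings and empty lists are false in a condition.
mixed = ['cat', '', ['dog'], []]
print([x for x in mixed if x])   # ['cat', ['dog']]

sent = ['No', 'good', 'fish', 'goes', 'anywhere', 'without', 'a', 'porpoise', '.']
# all() is True only if every item satisfies the condition; any() if at least one does.
print(all(len(w) > 4 for w in sent))  # False ('No' is too short)
print(any(len(w) > 4 for w in sent))  # True ('anywhere' is long enough)
```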

4.2 Sequences

1) Operating on sequence types

2) Combining different sequence types

words = 'I turned off the spectroroute'.split()
wordlens = [(len(word), word) for word in words]
wordlens.sort()
print(' '.join(w for (_, w) in wordlens))  # I off the turned spectroroute
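Two other common ways of combining sequences are zip(), which pairs items position by position, and enumerate(), which pairs each item with its index. A small sketch; the tag list is illustrative.

```python
words = ['I', 'turned', 'off', 'the', 'spectroroute']
tags = ['noun', 'verb', 'prep', 'det', 'noun']   # illustrative tags

# zip() builds (word, tag) pairs from two parallel sequences
print(list(zip(words, tags)))
# enumerate() builds (index, word) pairs
print(list(enumerate(words)))
```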

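Differences between tuples and lists

Tuples: ('grail', 'noun') and ('noun', 'grail') are not the same. Each position in a tuple has its own meaning (here, word and tag), so swapping the items changes what the tuple says.
A tuple containing only one element needs a comma after that element; otherwise the parentheses are treated as ordinary grouping parentheses rather than as a tuple.
The elements of a tuple cannot be modified (tuples are immutable).

Lists: ['venetian', 'blind'] and ['blind', 'venetian'] are the same kind of thing. Every element of a list plays the same role (here, a word), so reordering just gives another list of words; as Python values, however, they are not equal, because list comparison is order-sensitive.
List items can be modified or updated, and new items can be added with the append() method.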
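The points above can be checked directly. A minimal sketch; the variable names are illustrative.

```python
# A one-element tuple needs a trailing comma.
not_a_tuple = ('snark')    # just a parenthesized string
one_tuple = ('snark',)     # a tuple of length 1
print(type(not_a_tuple).__name__, type(one_tuple).__name__)  # str tuple

# Tuples are immutable: item assignment raises TypeError.
record = ('grail', 'noun')
try:
    record[0] = 'holy grail'
except TypeError as err:
    print('TypeError:', err)

# Lists are mutable: items can be replaced and appended.
blinds = ['venetian', 'blind']
blinds[0] = 'roller'
blinds.append('shade')
print(blinds)  # ['roller', 'blind', 'shade']

# Order matters when comparing lists.
print(['venetian', 'blind'] == ['blind', 'venetian'])  # False
```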

4.3 Questions of style
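This section has no body in the notes. One topic the book covers here is procedural versus declarative style. A sketch computing the average word length both ways, using a small illustrative token list in place of a corpus:

```python
# Illustrative tokens standing in for a corpus
tokens = ['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday']

# Procedural style: spell out each step with explicit counters
count = 0
total = 0
for token in tokens:
    count += 1
    total += len(token)
procedural_avg = total / count

# Declarative style: state what to compute using built-in functions
declarative_avg = sum(len(t) for t in tokens) / len(tokens)

print(procedural_avg, declarative_avg)  # the same value both ways
```

The declarative version is shorter and leaves fewer places for an off-by-one or a forgotten initialization.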

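4.4 Functions: the foundation of structured programming

Functions make our work reusable and readable, and they also make it more reliable. When we reuse code that has already been developed and tested, we can be confident that it handles a wide variety of cases correctly; we also eliminate the risk of forgetting an important step or introducing a bug. Projects that use these functions gain reliability as well.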

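1) Function inputs and outputs

We pass information to a function through its parameters: the variables and constants enclosed in parentheses after the function name.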

def repeat(msg, num):
    return ' '.join([msg] * num)
monty = 'Monty Python'
print(repeat(monty, 3))

Parameters are not mandatory; a function can also take none, as in the following example:

def monty():
    return "Monty Python"
print(monty())

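As we have just seen, a function usually returns its result to the calling program with a return statement. To the calling program, it looks as if the function call had been replaced by the function's result, so the following two lines are equivalent: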

print(repeat(monty(), 3))
print(repeat('Monty Python', 3))

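A function should either modify the contents of its parameter or return a value, but not both; a function that does both is easy to misuse. Below, my_sort1() sorts its argument in place, my_sort2() returns a new sorted list, and my_sort3() does both, which is dangerous because a caller may use its return value without realizing that the input list was modified as well.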

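def my_sort1(mylist):   # good: modifies its argument, returns None
    mylist.sort()
def my_sort2(mylist):   # good: leaves its argument alone, returns a value
    return sorted(mylist)
def my_sort3(mylist):   # risky: modifies its argument AND returns it
    mylist.sort()
    return mylist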
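A short sketch of why my_sort3() is risky: a caller that only looks at the return value may not notice that its own list was sorted too. The three functions are repeated so the block runs on its own; the word list is illustrative.

```python
def my_sort1(mylist):   # modifies its argument, returns None
    mylist.sort()

def my_sort2(mylist):   # leaves its argument alone, returns a new list
    return sorted(mylist)

def my_sort3(mylist):   # modifies its argument and also returns it
    mylist.sort()
    return mylist

words = ['the', 'sense', 'and', 'sounds']
print(my_sort1(list(words)))      # None: the result is only visible via the argument
new_list = my_sort2(words)
print(words, new_list)            # words unchanged, new_list sorted
same_list = my_sort3(words)
print(words, same_list is words)  # words itself has been sorted; True
```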

2) Parameter passing

def set_up(word, properties):
    word = 'lolcat'
    properties.append('noun')
    properties = 5
w = ''
p = []
set_up(w, p)
print(w)  # ''
print(p)  # ['noun']

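w still prints '' (the empty string): assigning 'lolcat' to word inside the function only rebinds the local name. This parameter passing is equivalent to the following sequence of assignments: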

w = ''
word = w
word = 'lolcat'
print(w)

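p, on the other hand, now prints ['noun']. A list is a structured (mutable) object: the parameter properties refers to the same list object as p, so properties.append('noun') modifies the list that p also refers to, while the later assignment properties = 5 merely rebinds the local name and does not affect p. Again, this is equivalent to the following assignments: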

p = []
properties = p
properties.append('noun')
properties = 5
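The same behaviour can be observed with the is operator. In this sketch the function trace_binding() is hypothetical; it receives the caller's list twice, once as the parameter under study and once only for comparison.

```python
def trace_binding(properties, caller_obj):
    before = properties is caller_obj   # True: the parameter is bound to the caller's object
    properties.append('noun')           # mutation is visible to the caller
    properties = 5                      # rebinding affects only the local name
    after = properties is caller_obj    # False: the name now points elsewhere
    return before, after

p = []
result = trace_binding(p, p)
print(result)  # (True, False)
print(p)       # ['noun']
```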

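3) Variable scope

LGB rule: local, then global, then built-in.
Python resolves a variable name by searching three scopes in order: first the local scope, then the global scope, and finally the built-in scope. (Nested functions add an enclosing scope between local and global, which is why the rule is also called LEGB.)
A function can create a global variable with a global declaration, but this should be avoided as far as possible: a global variable inside a function depends on its context and limits the function's portability (or reusability). In general, a function should take its inputs through parameters and deliver its outputs through return values.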
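A minimal sketch of the lookup order and of the global declaration; all names are illustrative. bump_with_global() works but depends on module state, whereas bump_pure() follows the parameters-in, return-value-out advice.

```python
import builtins

x = 'global x'

def read_local():
    x = 'local x'       # a local binding shadows the global one
    return x

def read_global():
    return x            # no local x, so the global x is found

def read_builtin():
    return len          # neither local nor global, so the built-in len is found

counter = 0

def bump_with_global():
    global counter      # rebinds a module-level name: couples the function to global state
    counter += 1

def bump_pure(n):
    # preferred: input via a parameter, output via the return value
    return n + 1

print(read_local(), read_global(), read_builtin() is builtins.len)
bump_with_global()
print(counter, bump_pure(counter))
```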

4) Checking parameter types

def tag(word):
    if word in ['a', 'the', 'all']:
        return 'det'
    else:
        return 'noun'
print(tag('the'))     # det
print(tag('knight'))  # noun
print(tag(["'Tis", 'but', 'a', 'scratch']))  # noun

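The last call silently returns 'noun' for a list, which is misleading. The book guards against this with an assert statement and Python 2's basestring type, which generalizes over both unicode and str. In Python 3, str already covers what was unicode in Python 2, so the check is written as isinstance(word, str):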

def tag(word):
    assert isinstance(word, str), "argument to tag() must be a string"
    if word in ['a', 'the', 'all']:
        return 'det'
    else:
        return 'noun'
print(tag('the'))     # det
print(tag('knight'))  # noun
tag(["'Tis", 'but', 'a', 'scratch'])  # AssertionError: argument to tag() must be a string
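Note that assert statements are skipped when Python runs with the -O option. A sketch of an alternative (not the book's version) that raises TypeError explicitly, so the check is always performed:

```python
def tag(word):
    # Explicit type check; unlike assert, it is not removed by `python -O`
    if not isinstance(word, str):
        raise TypeError("argument to tag() must be a string")
    return 'det' if word in ['a', 'the', 'all'] else 'noun'

try:
    tag(["'Tis", 'but', 'a', 'scratch'])
except TypeError as err:
    message = str(err)
    print('rejected:', message)
```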

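5) Functional decomposition

When a block of code grows beyond about 10-20 lines, it is better to split it into several functions to improve readability. The three versions of a word-frequency function below illustrate the same task written in different ways.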

import nltk
from bs4 import BeautifulSoup

# Version 1: updates the FreqDist passed in and prints the result (returns None)
def freq_words(html, freqdist, n):
    text = BeautifulSoup(html, "html5lib").get_text()
    for word in nltk.word_tokenize(text):
        freqdist[word.lower()] += 1
    print([w for (w, _) in freqdist.most_common(n)])
constitution = open("./html.html").read()
fd = nltk.FreqDist()
freq_words(constitution, fd, 20)

# Version 2: builds its own FreqDist and returns it; the caller decides what to print
def freq_words(html):
    freqdist = nltk.FreqDist()
    text = BeautifulSoup(html, "html5lib").get_text()
    for word in nltk.word_tokenize(text):
        freqdist[word.lower()] += 1
    return freqdist

constitution = open("./html.html").read()
fd = freq_words(constitution)
print([w for (w, _) in fd.most_common(20)])

# Version 3: let FreqDist do the counting directly (note: no lowercasing here)
constitution = open("./html.html").read()
text = nltk.word_tokenize(BeautifulSoup(constitution, "html5lib").get_text())
fd = nltk.FreqDist(text)
print([w for (w, _) in fd.most_common(20)])
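To make the decomposition itself more visible, here is a sketch that splits the task into one small function per step. It deliberately swaps in standard-library stand-ins: html.parser instead of BeautifulSoup, a crude regular-expression tokenizer instead of nltk.word_tokenize, and collections.Counter instead of nltk.FreqDist. The helper names html_to_text, tokenize and top_words are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(html):
    # Step 1: strip the markup, keep the text
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return ' '.join(parser.chunks)

def tokenize(text):
    # Step 2: lowercase and split into runs of word characters
    return re.findall(r"\w+", text.lower())

def top_words(tokens, n):
    # Step 3: the n most frequent tokens
    return [word for word, _ in Counter(tokens).most_common(n)]

def freq_words(html, n):
    # The top-level function just chains the steps together
    return top_words(tokenize(html_to_text(html)), n)

html = "<html><body><p>We the People of the United States</p><p>the people</p></body></html>"
print(freq_words(html, 2))  # ['the', 'people']
```

Each step can now be tested and reused on its own, for example tokenize() on plain text that never was HTML.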

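6) Documenting functions

A docstring is a string literal that appears as the first statement of a module, function, class, or method definition. Once defined, it can be accessed through the object's __doc__ attribute, and documentation tools can extract docstrings to generate usage documentation for functions and classes. Python's coding conventions (PEP 257) recommend including docstrings in module, class, method, and function definitions.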

print(nltk.word_tokenize.__doc__)
print(nltk.FreqDist.__doc__)
print(text.clear.__doc__)  # text is the token list from above, so this prints list.clear's docstring
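A sketch of writing a docstring for your own function. The accuracy() function is illustrative; its docstring contains an interactive example that the standard doctest module can run to check that the documentation is still correct.

```python
import doctest

def accuracy(reference, test):
    """Return the fraction of positions where test equals reference.

    Both arguments must be non-empty sequences of the same length.

    >>> accuracy(['ADJ', 'N', 'V', 'N'], ['N', 'N', 'V', 'ADJ'])
    0.5
    """
    if len(reference) != len(test):
        raise ValueError("Lists must have the same length.")
    num_correct = sum(1 for x, y in zip(reference, test) if x == y)
    return num_correct / len(test)

print(accuracy.__doc__.splitlines()[0])  # first line of the docstring
# Run the example embedded in the docstring; prints nothing if it passes
doctest.run_docstring_examples(accuracy, {'accuracy': accuracy})
```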

4.5 Doing more with functions

1) Functions as arguments

sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
        'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']
def extract_property(prop):
    return [prop(word) for word in sent]
print(extract_property(len))
# [4, 4, 2, 3, 5, 1, 3, 3, 6, 4, 4, 4, 2, 10, 1]
def last_letter(word):
    return word[-1]
print(extract_property(last_letter))
# ['e', 'e', 'f', 'e', 'e', ',', 'd', 'e', 's', 'l', 'e', 'e', 'f', 's', '.']

Note that parentheses follow a function name only when we call the function; when we simply treat the function as an object, they are omitted.
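Because functions are objects, they can also be created inline with lambda or stored in a variable and passed on. A sketch reusing the sentence above; the variable name f is illustrative.

```python
sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
        'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']

def extract_property(prop):
    # prop is a function object; it is called once per word
    return [prop(word) for word in sent]

# An anonymous function equivalent to last_letter
last_letters = extract_property(lambda w: w[-1])
print(last_letters)

# A function stored in a variable: no parentheses, since we are not calling it here
f = len
print(extract_property(f) == extract_property(len))  # True
```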

2) Accumulative functions
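This heading is empty in the notes. A sketch of the contrast the book draws here: an accumulator initializes an empty list, fills it while iterating, and returns it at the end, whereas a generator uses yield to produce results one at a time, on demand. The word list is illustrative.

```python
def search1(substring, words):
    # Accumulator: build the complete result list, then return it
    result = []
    for word in words:
        if substring in word:
            result.append(word)
    return result

def search2(substring, words):
    # Generator: yield each match as soon as it is found
    for word in words:
        if substring in word:
            yield word

words = ['Take', 'care', 'of', 'the', 'sense', 'and', 'the', 'sounds']
print(search1('s', words))        # ['sense', 'sounds']
print(list(search2('s', words)))  # ['sense', 'sounds']
```

The generator avoids building the whole list in memory, which matters when the input is a large corpus and the caller may stop early.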

3) Higher-order functions
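A higher-order function takes another function as an argument. A sketch using the built-ins filter() and map(); the stopword list inside is_content_word() is illustrative.

```python
sent = ['Take', 'care', 'of', 'the', 'sense', ',', 'and', 'the',
        'sounds', 'will', 'take', 'care', 'of', 'themselves', '.']

def is_content_word(word):
    return word.lower() not in ['a', 'of', 'the', 'and', 'will', ',', '.']

# filter() keeps the items for which the function returns True
content = list(filter(is_content_word, sent))
print(content)

# map() applies a function to every item
lengths = list(map(len, content))
print(sum(lengths) / len(lengths))  # average length of the content words

# The equivalent list comprehension
print([w for w in sent if is_content_word(w)])
```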
