Python Learning: Tuples & Regular Expressions

A Python Beginner's Learning Journey, Day 5

  • Tuples (Chapter 10)
  • Regular Expressions (Chapter 11)

Tuples (Chapter 10)

Tuples Are Like Lists

  • Tuples are another kind of sequence that functions much like a list - they have elements which are indexed starting at 0
  • But… Tuples are “immutable”
  • Unlike a list, once you create a tuple, you cannot alter its contents - similar to a string
>>> z = (5, 4, 3)
>>> z[2] = 0
Traceback: 'tuple' object does not support item assignment
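
As a small sketch of what immutability means in practice, the snippet below catches the error instead of letting it stop the program, and then builds a new tuple that carries the changed value. The names `z_new` and `message` are illustrative only.

```python
z = (5, 4, 3)

try:
    z[2] = 0  # item assignment is not allowed on a tuple
except TypeError as err:
    message = str(err)
    print("Cannot modify:", message)

# To "change" a tuple, build a new one from pieces of the old one.
z_new = z[:2] + (0,)
print(z_new)  # (5, 4, 0)
print(z)      # (5, 4, 3), the original is untouched
```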

A Tale of Two Sequences

>>> l = list()
>>> dir(l)
['append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

>>> t = tuple()
>>> dir(t)
['count', 'index']
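
On a real interpreter, dir() also prints many double-underscore names, and recent Python versions list a few more list methods than the trimmed output shown here. One way to reproduce a trimmed view is sketched below; `public_names` is a hypothetical helper, not part of Python.

```python
def public_names(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]

list_methods = public_names(list())
tuple_methods = public_names(tuple())

print(list_methods)
print(tuple_methods)  # ['count', 'index']

# Names that lists offer and tuples do not.
print(sorted(set(list_methods) - set(tuple_methods)))
```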

Tuples are More Efficient

  • Since Python does not have to build tuple structures to be modifiable, they are simpler and more efficient in terms of memory use and performance than lists
  • So in our program when we are making “temporary variables” we prefer tuples over lists
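
A rough way to see the memory difference is `sys.getsizeof()`. This sketch assumes CPython; the exact byte counts depend on the Python version and platform, so none are quoted here.

```python
import sys

as_list = [1, 2, 3, 4, 5]
as_tuple = (1, 2, 3, 4, 5)

# getsizeof() reports the size of the container itself, not of its elements.
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)

print("list :", list_size, "bytes")
print("tuple:", tuple_size, "bytes")
print("tuple is smaller:", tuple_size < list_size)
```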

Tuples and Assignment

  • We can also put a tuple on the left-hand side of an assignment statement
  • We can even omit the parentheses
>>> (x, y) = (4, 'fred')
>>> print(y)
fred
>>> (a, b) = (99, 98)
>>> print(a)
99
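
The snippet below sketches the form without parentheses, a common use of it (swapping two variables), and what happens when the number of names does not match the number of values. The names `p`, `q`, and `unpack_error` are illustrative.

```python
# The parentheses are optional on both sides of the assignment.
x, y = 4, 'fred'
print(y)  # fred

# Swapping two variables without a temporary variable.
a, b = 99, 98
a, b = b, a
print(a, b)  # 98 99

# The number of names must match the number of values.
try:
    p, q = (1, 2, 3)
except ValueError as err:
    unpack_error = str(err)
    print("Unpacking failed:", unpack_error)
```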

Tuples and Dictionaries

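  • The items() method in dictionaries returns (key, value) tuples; in Python 3 the result is a view object that can be looped over directly, or wrapped in list() to get an actual list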
>>> d = dict()
>>> d['csev'] = 2
>>> d['cwen'] = 4
>>> for (k,v) in d.items(): 
...     print(k, v)
...
csev 2
cwen 4
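
A minimal sketch of how the result of items() behaves in Python 3, assuming Python 3.7 or later so that dictionaries keep insertion order. The key 'zqian' is made up for the example.

```python
d = dict()
d['csev'] = 2
d['cwen'] = 4

items = d.items()
print(type(items).__name__)  # dict_items

# The view reflects later changes to the dictionary.
d['zqian'] = 1
print(len(items))  # 3

# Convert to a list when an indexable snapshot is needed.
pairs = list(items)
print(pairs[0])  # ('csev', 2)
```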

Tuples are Comparable

  • The comparison operators work with tuples and other sequences. Python compares the first items; if they are equal, it goes on to the next pair, and so on, until it finds elements that differ. Elements after that point are not considered, no matter how large they are
>>> (0, 1, 2) < (5, 1, 2)
True
>>> (0, 1, 2000000) < (0, 3, 4)
True
>>> ( 'Jones', 'Sally' ) < ('Jones', 'Sam')
True
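
Two edge cases are worth a quick sketch: tuples of different lengths, and elements whose types cannot be compared. The values are arbitrary.

```python
# A tuple that is a prefix of a longer one compares as smaller.
prefix_result = (1, 2) < (1, 2, 0)
print(prefix_result)  # True

# The comparison is decided at index 1; the 99 is never looked at.
early_result = (1, 2, 99) < (1, 3)
print(early_result)  # True

# When the deciding elements have incomparable types, Python 3 raises TypeError.
try:
    (1, 'a') < (1, 2)
except TypeError as err:
    compare_error = str(err)
    print("Cannot compare:", compare_error)
```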

Sorting Lists of Tuples

  • First we sort the dictionary by the key using the items() method and sorted() function
>>> d = {'a':10, 'b':1, 'c':22}
>>> t = sorted(d.items())
>>> t
[('a', 10), ('b', 1), ('c', 22)]

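Note: do not confuse sorted() with the list method .sort(). sorted() returns a new list, while .sort() sorts the list in place and returns None.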
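
To make the sorted() versus .sort() distinction concrete, here is a small sketch, assuming Python 3.7 or later so that the dictionary keeps insertion order.

```python
d = {'a': 10, 'b': 1, 'c': 22}

pairs = list(d.items())
pairs.reverse()              # start from an order that is not sorted

new_list = sorted(pairs)     # returns a new list; pairs is unchanged
print(new_list)              # [('a', 10), ('b', 1), ('c', 22)]
print(pairs)                 # [('c', 22), ('b', 1), ('a', 10)]

result = pairs.sort()        # sorts pairs in place and returns None
print(result)                # None
print(pairs)                 # [('a', 10), ('b', 1), ('c', 22)]
```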

Sort by Values Instead of Key

  • If we could construct a list of tuples of the form (value, key) we could sort by value
>>> c = {'a':10, 'b':1, 'c':22}
>>> tmp = list()
>>> for k, v in c.items() :
...     tmp.append( (v, k) )
... 
>>> print(tmp)
[(10, 'a'), (1, 'b'), (22, 'c')]
>>> tmp = sorted(tmp, reverse=True)
>>> print(tmp)
[(22, 'c'), (10, 'a'), (1, 'b')]
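
A different technique from the one in these notes is to keep the (key, value) order and pass a `key` function to sorted(). This is only a sketch of that alternative; the lambda picks the value out of each pair.

```python
c = {'a': 10, 'b': 1, 'c': 22}

# Sort the (key, value) pairs by the value, largest first.
by_value = sorted(c.items(), key=lambda pair: pair[1], reverse=True)
print(by_value)  # [('c', 22), ('a', 10), ('b', 1)]
```

The result keeps each tuple as (key, value), so no flipping back is needed when printing.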

The top 10 most common words

fhand = open('romeo.txt')
counts = {}
for line in fhand:
    words = line.split()
    for word in words:
        counts[word] = counts.get(word, 0 ) + 1

lst = []
for key, val in counts.items():
    newtup = (val, key)
    lst.append(newtup)

lst = sorted(lst, reverse=True)

for val, key in lst[:10] :
    print(key, val)
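
The same steps can be tried without the romeo.txt file. The sketch below wraps them in a hypothetical function, `top_words`, and feeds it two made-up lines of text.

```python
def top_words(lines, n=10):
    """Count words across lines and return the n most common as (word, count)."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1

    # Build (count, word) tuples so that sorting orders by count first.
    pairs = sorted(((val, key) for key, val in counts.items()), reverse=True)
    return [(key, val) for val, key in pairs[:n]]

sample = ["the sun and the moon", "the moon and the stars"]
result = top_words(sample, n=3)
print(result)  # [('the', 4), ('moon', 2), ('and', 2)]
```

Words with equal counts come out in reverse alphabetical order, because the reversed sort falls through to the second tuple element.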

Even Shorter Version

>>> c = {'a':10, 'b':1, 'c':22}

>>> print( sorted( [ (v,k) for k,v in c.items() ] ) )

[(1, 'b'), (10, 'a'), (22, 'c')]
  • List comprehension creates a dynamic list. In this case, we make a list of reversed tuples and then sort it.
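
For comparison, this sketch builds the list of reversed tuples twice, once with an explicit loop and once with a list comprehension, and checks that the two agree.

```python
c = {'a': 10, 'b': 1, 'c': 22}

# Loop form
flipped_loop = []
for k, v in c.items():
    flipped_loop.append((v, k))

# Comprehension form: same result in one expression
flipped_comp = [(v, k) for k, v in c.items()]

print(flipped_loop == flipped_comp)  # True
print(sorted(flipped_comp))          # [(1, 'b'), (10, 'a'), (22, 'c')]
```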

Telling list, dictionary, and tuple notation apart

lst = [1, 2, 3]                   # list: square brackets
dic = {"1": 2, "2": 2, "3": 3}    # dictionary: curly braces with key: value pairs
tup = (1, 2, 3)                   # tuple: parentheses
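
Two details of the notation often catch beginners out: a one-element tuple needs a trailing comma, and empty curly braces create a dictionary. A short sketch, with illustrative variable names:

```python
not_a_tuple = (1)     # just the integer 1 in parentheses
one_tuple = (1,)      # the trailing comma makes it a tuple
empty_tuple = ()
empty_dict = {}       # empty braces give a dict

print(type(not_a_tuple).__name__)  # int
print(type(one_tuple).__name__)    # tuple
print(type(empty_tuple).__name__)  # tuple
print(type(empty_dict).__name__)   # dict
```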

Regular Expressions (Chapter 11)

  • In computing, a regular expression, also referred to as “regex” or “regexp”, provides a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor.
  • Really clever “wild card” expressions for matching and parsing strings

Understanding Regular Expressions

  • Very powerful and quite cryptic
  • Fun once you understand them
  • Regular expressions are a language unto themselves
  • A language of “marker characters” - programming with characters
  • It is kind of an “old school” language - compact

Regular Expression Quick Guide

The Regular Expression Module

  • Before you can use regular expressions in your program, you must import the library using “import re”
  • You can use re.search() to see if a string matches a regular expression, similar to using the find() method for strings
  • You can use re.findall() to extract portions of a string that match your regular expression, similar to a combination of find() and slicing: var[5:10]
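
A minimal sketch of the two functions side by side with their string-method counterparts. The sample string is made up for the example.

```python
import re

text = 'Order 66 shipped on day 12'   # made-up sample string

# re.search() stops at the first match and returns a match object (or None).
match = re.search(r'[0-9]+', text)
if match:
    print(match.group())   # 66
    print(match.start())   # 6

no_match = re.search(r'[0-9]+', 'no digits here')
print(no_match)            # None

# re.findall() returns every matching substring as a list.
found = re.findall(r'[0-9]+', text)
print(found)               # ['66', '12']

# Rough equivalent of the first match using find() and slicing.
pos = text.find('66')
print(text[pos:pos + 2])   # 66
```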

Using re.search() Like startswith()

hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if line.startswith('From:') :
        print(line)

# The same filter written with re.search()
import re

hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^From:', line) :
        print(line)
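
To try both filters without the mbox-short.txt file, the sketch below runs them over a few made-up lines held in a list and checks that they select the same ones.

```python
import re

# Made-up sample lines standing in for the file contents.
lines = [
    'From: stephen.marquard@uct.ac.za',
    'Subject: From: appears mid-line here',
    'X-Sieve: CMU Sieve 2.3',
]

with_startswith = [line for line in lines if line.startswith('From:')]
with_regex = [line for line in lines if re.search('^From:', line)]

print(with_startswith)                # ['From: stephen.marquard@uct.ac.za']
print(with_startswith == with_regex)  # True
```

The `^` anchors the pattern to the start of the line, which is why the second sample line is not selected.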

Wild-Card Characters

  • The dot character matches any character
  • If you add the asterisk character, the preceding character is matched “any number of times” (zero or more)
^X.*:
X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.8475

Fine-Tuning Your Match

  • Depending on how “clean” your data is and the purpose of your application, you may want to narrow your match down a bit
^X-\S+:
X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
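X-: Very Short  # no match: \S+ needs at least one non-whitespace character before the colon
X-Plane is behind schedule: two weeks  # no match: whitespace appears before the colon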
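
The difference between the loose pattern `^X.*:` and the narrower `^X-\S+:` can be checked directly. This sketch runs both over the four sample lines; `loose` and `strict` are illustrative names.

```python
import re

lines = [
    'X-Sieve: CMU Sieve 2.3',
    'X-DSPAM-Result: Innocent',
    'X-: Very Short',
    'X-Plane is behind schedule: two weeks',
]

# ^X.*:   an X at the start, then anything, then a colon
loose = [line for line in lines if re.search(r'^X.*:', line)]

# ^X-\S+: "X-" at the start, then one or more non-whitespace characters, then a colon
strict = [line for line in lines if re.search(r'^X-\S+:', line)]

print(len(loose))  # 4
print(strict)      # ['X-Sieve: CMU Sieve 2.3', 'X-DSPAM-Result: Innocent']
```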

Matching and Extracting Data

  • re.search() returns a match object (treated as True) if the string matches the regular expression, or None (treated as False) if it does not
  • If we actually want the matching strings to be extracted, we use re.findall()
  • When we use re.findall(), it returns a list of zero or more sub-strings that match the regular expression
# "[0-9]+" means one or more digits
>>> import re
>>> x = 'My 2 favorite numbers are 19 and 42'
>>> y = re.findall('[0-9]+',x)
>>> print(y)
['2', '19', '42']
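
Note that re.findall() returns strings, not numbers. A short sketch of converting the results before doing arithmetic, and of the empty list returned when nothing matches:

```python
import re

x = 'My 2 favorite numbers are 19 and 42'

# Convert each matched string to an int before adding them up.
numbers = [int(s) for s in re.findall('[0-9]+', x)]
print(numbers)       # [2, 19, 42]
print(sum(numbers))  # 63

# With no matches, the result is an empty list rather than an error.
nothing = re.findall('[0-9]+', 'no digits here')
print(nothing)       # []
```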

Warning: Greedy Matching

  • The repeat characters (* and +) push outward in both directions (greedy) to match the largest possible string
  • Difference: “+” repeats the preceding character one or more times, while “*” repeats it zero or more times
>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+:', x)
>>> print(y)
['From: Using the :']
# Not 'From:'

Non-Greedy Matching

  • Not all regular expression repeat codes are greedy! If you add a ? character, the + and * chill out a bit…
# ".+?" one or more characters but not greedy
>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+?:', x)
>>> print(y)
['From:']
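
Putting the greedy and non-greedy forms next to each other makes the contrast easier to see. The second string, with its angle-bracket tags, is a made-up example.

```python
import re

x = 'From: Using the : character'
greedy = re.findall('^F.+:', x)    # stretches to the last colon
lazy = re.findall('^F.+?:', x)     # stops at the first colon
print(greedy)  # ['From: Using the :']
print(lazy)    # ['From:']

tags = '<b>bold</b>'
greedy_tags = re.findall('<.+>', tags)   # one match covering everything
lazy_tags = re.findall('<.+?>', tags)    # one match per tag
print(greedy_tags)  # ['<b>bold</b>']
print(lazy_tags)    # ['<b>', '</b>']
```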

Fine-Tuning String Extraction

# "\S+@\S+" means at least one non-whitespace character on each side of the @
>>> import re
>>> x = "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"
>>> y = re.findall(r'\S+@\S+',x)
>>> print(y)
['stephen.marquard@uct.ac.za']
  • Parentheses are not part of the match, but they mark where the string to extract starts and stops (this makes the extraction more precise)
>>> x = "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"
>>> y = re.findall(r'^From (\S+@\S+)',x)
>>> print(y)
['stephen.marquard@uct.ac.za']
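
When a pattern has more than one pair of parentheses, re.findall() returns a list of tuples, one element per group. A minimal sketch that splits the address into user and host; the names `user` and `host` are illustrative.

```python
import re

x = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# Two groups: the part before the @ and the part after it.
y = re.findall(r'^From (\S+)@(\S+)', x)
print(y)  # [('stephen.marquard', 'uct.ac.za')]

# Tuple assignment unpacks the first result.
user, host = y[0]
print(user)  # stephen.marquard
print(host)  # uct.ac.za
```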

Compare two kinds of method

  • The Double Split Pattern
line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
words = line.split()
email = words[1]
pieces = email.split('@')
print(pieces[1])

uct.ac.za
  • The Regex Version
# [^ ] matches a single non-blank character, so [^ ]* matches a run of them
import re 
lin = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
y = re.findall('@([^ ]*)',lin)
print(y)

['uct.ac.za']
  • Even Cooler Regex Version
y = re.findall('^From .*@([^ ]*)',lin)
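
The practical difference between the two regex versions shows up on lines that do not start with "From ". The sketch below runs both on the sample line and on a second, made-up line.

```python
import re

lin = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
other = 'Reply-To: someone@example.com'   # made-up line for contrast

simple = '@([^ ]*)'             # any @ followed by non-blank characters
anchored = '^From .*@([^ ]*)'   # same, but only on lines starting with "From "

simple_lin = re.findall(simple, lin)
anchored_lin = re.findall(anchored, lin)
print(simple_lin)      # ['uct.ac.za']
print(anchored_lin)    # ['uct.ac.za']

simple_other = re.findall(simple, other)
anchored_other = re.findall(anchored, other)
print(simple_other)    # ['example.com']
print(anchored_other)  # []
```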

Escape Character

  • If you want a special regular expression character to just behave normally (most of the time) you prefix it with a backslash, '\'
>>> import re
>>> x = 'We just received $10.00 for cookies.'
>>> y = re.findall(r'\$[0-9.]+',x)
>>> print(y)
['$10.00']
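
A short sketch of what happens without the backslash, plus `re.escape()`, which adds the backslashes to a literal string for you. Writing patterns as raw strings (`r'...'`) keeps Python from interpreting the backslash itself.

```python
import re

x = 'We just received $10.00 for cookies.'

# Unescaped, $ means "end of the string", so nothing can follow it.
unescaped = re.findall(r'$[0-9.]+', x)
print(unescaped)  # []

# Escaped, \$ matches a literal dollar sign.
escaped = re.findall(r'\$[0-9.]+', x)
print(escaped)    # ['$10.00']

# Drop the dollar sign and convert the rest to a number.
amount = float(escaped[0][1:])
print(amount)     # 10.0

# re.escape() builds a pattern that matches the given text literally.
literal = re.findall(re.escape('$10.00'), x)
print(literal)    # ['$10.00']
```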

Summary

  • Regular expressions are a cryptic but powerful language for matching strings and extracting elements from those strings
  • Regular expressions have special characters that indicate intent

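Once you are familiar with them, regular expressions are very efficient. My advice is to understand them first, and then practice by writing plenty of them yourself.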

Author: J.Sampson