Regular expressions in Python

1. Create a file named poem.txt and save the text to be processed in it

There was movement at the station, for the word had passed around
That the colt from old Regret had got away
And had joined the wild bush horses - he was worth a thousand pound
So all the cracks had gathered to the fray
All the tried and noted riders from the stations near and far
Had mustered at the homestead overnight
For the bushmen love hard riding where the wild bush horses are
And the stock-horse snuffs the battle with delight

There was Harrison, who made his pile when Pardon won the cup
The old man with his hair as white as snow
But few could ride beside him when his blood was fairly up
He would go wherever horse and man could go
And Clancy of the Overflow came down to lend a hand
No better horsemen ever held the reins
For never horse could throw him while the saddle-girths would stand
He learnt to ride while droving on the plains

And one was there, a stripling on a small and weedy beast
He was something like a racehorse undersized
With a touch of Timor pony - three parts thoroughbred at least
And such as are by mountain horsemen prized
He was hard and tough and wiry - just the sort that won't say die
There was courage in the quick impatient tread
And he bore the badge of gameness in his bright and fiery eye
And the proud and lofty carriage of his head

But still so slight and weedy, one would doubt his power to stay
And the old man said, "That horse will never do
For a long and tiring gallop - lad, you'd better stop away
Those hills are far too rough for such as you
So he waited, sad and wistful - only Clancy stood his friend
I think we ought to let him come, he said
I warrant he'll be with us when he's wanted at the end
For both his horse and he are mountain bred

He hails from Snowy River, up by Kosciusko's side
Where the hills are twice as steep and twice as rough
Where a horse's hoofs strike firelight from the flint stones every stride
The man that holds his own is good enough
And the Snowy River riders on the mountains make their home
Where the river runs those giant hills between
I have seen full many horsemen since I first commenced to roam
But nowhere yet such horsemen have I seen

So he went; they found the horses by the big mimosa clump
They raced away towards the mountain's brow
And the old man gave his orders, "Boys, go at them from the jump
No use to try for fancy riding now
And, Clancy, you must wheel them, try and wheel them to the right
Ride boldly, lad, and never fear the spills
For never yet was rider that could keep the mob in sight
If once they gain the shelter of those hills

So Clancy rode to wheel them - he was racing on the wing
Where the best and boldest riders take their place
And he raced his stock-horse past them, and he made the ranges ring
With the stockwhip, as he met them face to face
Then they halted for a moment, while he swung the dreaded lash
But they saw their well-loved mountain full in view
And they charged beneath the stockwhip with a sharp and sudden dash
And off into the mountain scrub they flew

Then fast the horsemen followed, where the gorges deep and black
Resounded to the thunder of their tread
And the stockwhips woke the echoes, and they fiercely answered back
From cliffs and crags that beetled overhead
And upward, ever upward, the wild horses held their way
Where mountain ash and kurrajong grew wide
And the old man muttered fiercely, "We may bid the mob good day
No man can hold them down the other side

When they reached the mountain's summit, even Clancy took a pull
It well might make the boldest hold their breath
The wild hop scrub grew thickly, and the hidden ground was full
Of wombat holes, and any slip was death
But the man from Snowy River let the pony have his head
And he swung his stockwhip round and gave a cheer
And he raced him down the mountain like a torrent down its bed
While the others stood and watched in very fear

He sent the flint-stones flying, but the pony kept his feet
He cleared the fallen timber in his stride
And the man from Snowy River never shifted in his seat
It was grand to see that mountain horseman ride
Through the stringy barks and saplings, on the rough and broken ground
Down the hillside at a racing pace he went
And he never drew the bridle till he landed safe and sound
At the bottom of that terrible descent

He was right among the horses as they climbed the farther hill
And the watchers on the mountain, standing mute
Saw him ply the stockwhip fiercely; he was right among them still
As he raced across the clearing in pursuit
Then they lost him for a moment, where two mountain gullies met
In the ranges - but a final glimpse reveals
On a dim and distant hillside the wild horses racing yet
With the man from Snowy River at their heels

And he ran them single-handed till their sides were white with foam
He followed like a bloodhound on their track
Till they halted, cowed and beaten; then he turned their heads for home
And alone and unassisted brought them back
But his hardy mountain pony he could scarcely raise a trot
He was blood from hip to shoulder from the spur
But his pluck was still undaunted, and his courage fiery hot
For never yet was mountain horse a cur

And down by Kosciusko, where the pine-clad ridges raise
Their torn and rugged battlements on high
Where the air is clear as crystal, and the white stars fairly blaze
At midnight in the cold and frosty sky
And where around the Overflow the reed-beds sweep and sway
To the breezes, and the rolling plains are wide
The Man from Snowy River is a household word today
And the stockmen tell the story of his ride

hello world 123

 

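2. Create a file named regex.py. Set up an empty string variable text, open the file with open() and store the file object in file, then loop over every line in file and append it to text. Close the file once it has been read.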

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()
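The loop above works, but the file stays open if an error occurs while reading. A minimal sketch of the same read with a `with` block follows. To stay self-contained it first writes a small stand-in file; the name `sample_poem.txt` and its two lines are assumptions for illustration, not the real poem.txt.

```python
import os
import tempfile

# Hypothetical stand-in for poem.txt so the sketch runs anywhere.
sample = "There was movement at the station\nhello world 123\n"
path = os.path.join(tempfile.mkdtemp(), "sample_poem.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# The with block closes the file automatically, even if reading fails.
with open(path, encoding="utf-8") as f:
    text = f.read()  # read the whole file into one string

print(len(text.splitlines()))  # number of lines that were read
```

The later steps only need `text`, so either way of reading the file should work with them.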

3. Regular expressions are needed, so import Python's re library

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

4. Count how many times the word to appears in text

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall(' to ', text)  # the space on each side makes sure "to" is a whole word
print(result)
print(len(result))  # how many matches were found
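Delimiting the word with spaces misses a "to" at the very start or end of a line. As a hedged alternative (not the approach used in these notes), the `\b` word-boundary anchor can be compared with the space-delimited pattern. The three-line `sample` string is made up for illustration.

```python
import re

# Made-up sample: "to" at a line start, "to" mid-line, and "into".
sample = "to the fray\nhe went to town\nand rode into the hills"

spaced = re.findall(' to ', sample)      # needs a literal space on both sides
bounded = re.findall(r'\bto\b', sample)  # \b matches at any word boundary

print(len(spaced))   # 1
print(len(bounded))  # 2
```

Neither pattern counts the "to" inside "into" or "town", but only `\b` picks up the one at the start of the first line.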

5. Find the three-letter words in text that start with a

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall('a..', text)  # fuzzy match: . stands for any character except a newline
print(result)  # results may contain spaces or be fragments of longer words
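To see exactly what `.` accepts, here is a small sketch on a made-up two-line string. It also shows the `re.DOTALL` flag, which these notes do not use, purely to illustrate that `.` skips newlines by default.

```python
import re

sample = "a cat ran\naway"  # made-up sample with a newline in the middle

default = re.findall('a..', sample)
dotall = re.findall('a..', sample, flags=re.DOTALL)  # . also matches "\n"

print(default)  # ['a c', 'at ', 'awa']
print(dotall)   # ['a c', 'at ', 'an\n', 'awa']
```

Spaces count as "any character", which is why the results include fragments such as 'a c'.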

6. Filter out spaces: a[a-z]c matches aac, abc, acc, adc, ..., azc; the middle character must be a letter from a to z

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall('a[a-z][a-z]', text)  # [] matches any one character from the range a-z
print(result)  # some results are still fragments of longer words

7. Exclude fragments of other words

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall(' a[a-z][a-z] ', text)  # a standalone word inside a line has a space on each side
print(result)  # words at the start of a line are missed; every element of the list includes its two surrounding spaces

8. Output each word without its leading and trailing spaces

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall(' (a[a-z][a-z]) ', text)  # only the part inside the parentheses is returned
print(result)  # the strings no longer include the surrounding spaces, but there are duplicates
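The effect of the parentheses is easiest to see side by side. A small sketch on a made-up sentence (the `sample` string is an assumption, not taken from the poem):

```python
import re

sample = "the ant saw the ape run"  # made-up sample sentence

whole = re.findall(' a[a-z][a-z] ', sample)      # no group: the whole match is returned
grouped = re.findall(' (a[a-z][a-z]) ', sample)  # one group: only the group is returned

print(whole)    # [' ant ', ' ape ']
print(grouped)  # ['ant', 'ape']
```

The spaces are still required for a match; the group only changes what `re.findall` hands back.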

9. Deduplicate the results: set() converts the list to a set, and a set never holds duplicates

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall(' (a[a-z][a-z]) ',text)
result = set(result)  # set() converts result to a set, which removes duplicates
print(result)  # three-letter words starting with a capital A are missed

10. Keep three-letter words that start with a capital A: the special syntax [] matches any one of the characters it contains

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall(' ([Aa][a-z][a-z]) ', text)  # [Aa] matches either A or a as the first letter
result = set(result)
print(result)  # the And at the very start of a line is still missing

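11. Include the And at the start of a line: use the asterisk "*", which matches zero or more repetitions. E.g. a* matches the empty string, a, aa, aaa, ... any number of a's. So a space followed by * matches no space at all, or any number of spaces.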

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

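result = re.findall(' *([Aa][a-z][a-z]) ', text)  # a space followed by * matches zero or more leading spaces
result = set(result)
print(result)  # odd strings such as 'ads' and 'afe' now appear; checking the text shows they are word fragments, e.g. the last 3 letters of safe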
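The fragments appear because zero leading spaces is a valid match, so the pattern can start in the middle of a word. The sketch below reproduces this on one made-up line and, as an assumption about one possible fix (not the one these notes go on to use), anchors the word to either a line start or a space with `(?:^| )` and `re.MULTILINE`.

```python
import re

sample = "And he landed safe and sound"  # made-up line based on the poem's wording

# Zero leading spaces are allowed, so "afe " inside "safe " also matches.
loose = re.findall(' *([Aa][a-z][a-z]) ', sample)

# Require a line start or a space before the word instead.
anchored = re.findall(r'(?:^| )([Aa][a-z][a-z]) ', sample, flags=re.MULTILINE)

print(loose)     # ['And', 'afe', 'and']
print(anchored)  # ['And', 'and']
```

`(?:...)` is a non-capturing group, so `re.findall` still returns only the three-letter word.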

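12. Match in two parts: first the lowercase words (a space on each side marks the word boundary), then the capitalized words at the start of a line (no space in front, a trailing space marks the boundary).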

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall(' (a[a-z][a-z]) |(A[a-z][a-z]) ', text)  # | means or
result = set(result)
print(result)  # the result is a set of tuples; each tuple holds two elements, the matches for the left and right side of |. We do not want the empty '' entries in the output
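When a pattern contains more than one group, `re.findall` returns one tuple per match with one slot per group, and the group that did not take part is an empty string. A small sketch on a made-up line:

```python
import re

sample = "And the ant ate"  # made-up sample line

pairs = re.findall(' (a[a-z][a-z]) |(A[a-z][a-z]) ', sample)

print(pairs)  # [('', 'And'), ('ant', '')]
```

The final "ate" is not matched because no space follows it, which is the same limitation the space-delimited patterns have at the end of a line.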

13. Output only the wanted results: pick what we need out of the pairs returned by |

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall(' (a[a-z][a-z]) |(A[a-z][a-z]) ', text)  # | means or, so only one side can match at a time; the side that did not match is returned as '', which is why the results come as pairs
final_result = set()  # start with an empty set, then add the matches one by one
for pair in result:
    if pair[0] not in final_result:  # the left-hand match is in pair[0], the right-hand match in pair[1]; add each one if it is not in the set yet
        final_result.add(pair[0])
    if pair[1] not in final_result:
        final_result.add(pair[1])

final_result.discard('')  # finally drop the empty string left over from the unmatched sides
print(final_result)  # the final output is a set
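For comparison, a single pattern with `\b` word boundaries can collect the same kind of words without the pair handling. This is a swapped-in technique, not the method of these notes, and it behaves differently: `\b` also treats punctuation, hyphens and line ends as boundaries. The sketch uses a made-up two-line `sample` and also shows that the space-delimited pattern skips a word when two matches share one space.

```python
import re

sample = "And any ant and all\nAre safe"  # made-up sample text

# Space-delimited: "ant" is skipped because " any " already consumed the space before it.
spaced = re.findall(' (a[a-z][a-z]) ', sample)

# Word boundaries consume no characters, so neighbouring words all match.
bounded = sorted(set(re.findall(r'\b[Aa][a-z]{2}\b', sample)))

print(spaced)   # ['any', 'and']
print(bounded)  # ['And', 'Are', 'all', 'and', 'ant', 'any']
```

`sorted()` is only there to make the printed order stable, since a set has no fixed order.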

14. Match the digits in the whole document:

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall(r'\d', text)  # \d matches a single digit 0-9
print(result)  # the result is 3 separate digits

15. Show consecutive digits as one match:

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()
result = re.findall(r'\d+', text)  # \d+ matches at least one digit, whereas a pattern like a* can also match nothing
print(result)  # the 3 digits are matched as one string

16. Specify exactly how many characters to match

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall(r'\d{2}', text)  # {2}: match exactly 2 digits
print(result)  # result: ['12']

17. Match a number of characters within a given range

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall(r'\d{2,3}', text)  # {2,3}: match 2 to 3 digits (both inclusive); the match is greedy and takes the longest
print(result)  # result: ['123']
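Steps 14 to 17 can be compared in one place. The sketch below runs the four quantifier variants on the last line of poem.txt, copied into a string so that it runs without the file.

```python
import re

sample = "hello world 123"  # the last line of poem.txt

single = re.findall(r'\d', sample)       # one digit per match
run = re.findall(r'\d+', sample)         # one or more digits per match
exact = re.findall(r'\d{2}', sample)     # exactly two digits per match
ranged = re.findall(r'\d{2,3}', sample)  # two to three digits, as many as possible

print(single)  # ['1', '2', '3']
print(run)     # ['123']
print(exact)   # ['12']
print(ranged)  # ['123']
```

With `\d{2}` the trailing 3 is left over: one digit is not enough for another match.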

18. Match 2 to 3 word characters

import re

text = ''
file = open('poem.txt')
for line in file:
    text = text + line

file.close()

result = re.findall(r'\w{2,3}', text)  # \w matches a word character: a letter, a digit, or an underscore
print(result)
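Note that `\w{2,3}` also cuts longer words into chunks and accepts digits and underscores. A hedged sketch of the difference, using made-up sample strings, with `[A-Za-z]` for letters only and `\b` added when whole two- or three-letter words are wanted:

```python
import re

sample = "hello world 123 a_b"  # made-up sample with digits and an underscore

word_chars = re.findall(r'\w{2,3}', sample)         # letters, digits and underscore
letters_only = re.findall(r'[A-Za-z]{2,3}', sample)  # ASCII letters only

# Whole words of 2 to 3 letters, on a second made-up sample.
whole_words = re.findall(r'\b[A-Za-z]{2,3}\b', "he was at the station")

print(word_chars)    # ['hel', 'lo', 'wor', 'ld', '123', 'a_b']
print(letters_only)  # ['hel', 'lo', 'wor', 'ld']
print(whole_words)   # ['he', 'was', 'at', 'the']
```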

Video notes: https://www.bilibili.com/video/av7036891?from=search&seid=24093045172170287
