Python: isolating the re.search result

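This article describes a way to read HTML content from a CSV file and extract URLs from it with Python and regular expressions. The original implementation returned match objects instead of strings; the fix is to call group(0) on each match to get the matched URL string directly.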

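So I have this code (probably super inefficient, but that's another story) that extracts URLs from a blog's HTML code. I have the HTML in a .csv file, I load it into Python, and then I run a regular expression over it to get the URLs. Here is the code: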

import csv, re  # required imports

strings = []  # initialize a list to read the rows into
with open('Book1.csv', 'rt', newline='') as infile:  # open the csv file
    reader = csv.reader(infile)  # read the csv file
    for row in reader:  # loop over all the rows in the csv file
        strings += row  # put their cells into the list

link_list = []  # initialize list that all the links will be put in
for i in strings:  # loop over the list to access each string for regex (can't regex on lists)
    links = re.search(r'((https?|ftp)://|www\.)[^\s/$.?#].[^\s]*', i)  # regex to find the links
    if links is not None:  # if it finds a link..
        link_list.append(links)  # put it into the list!

for link in link_list:  # iterate the links over a loop so we can have them in a nice column format
    print(link)

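However, when I print the results, each entry comes out as a match object rather than as a plain URL string.

Is there a way to pull just the link out of the other junk that comes with it? And is that junk simply part of how a regex search works? Thanks!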

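Best answer: The problem here is that re.search returns a match object rather than the matched string, so you need to call the match object's group method to get the result you want.

If you want all of the captured groups, you can use the groups method; for one particular group, pass the number of that group to group.

In this case it seems you want the whole match, so you can use group(0):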

for i in strings:  # loop over the list to access each string for regex (can't regex on lists)
    links = re.search(r'((https?|ftp)://|www\.)[^\s/$.?#].[^\s]*', i)  # regex to find the links
    if links is not None:  # if it finds a link..
        link_list.append(links.group(0))  # store the matched text, not the match object
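For anyone who wants to try the fix without a Book1.csv on disk, here is a minimal self-contained sketch. The in-memory sample_csv data and the URL_PATTERN name are assumptions made up for illustration; only the regex itself and the group(0) call come from the code above.

```python
import csv
import io
import re

# Hypothetical stand-in for Book1.csv so the sketch runs without a file on disk.
sample_csv = io.StringIO(
    "Visit http://example.com/post-1 today,no link here\n"
    "see www.example.org/page for details\n"
)

# Same pattern as in the question, compiled once and reused for every cell.
URL_PATTERN = re.compile(r'((https?|ftp)://|www\.)[^\s/$.?#].[^\s]*')

link_list = []
for row in csv.reader(sample_csv):
    for cell in row:
        match = URL_PATTERN.search(cell)
        if match is not None:
            # group(0) is the full matched text, i.e. the URL as a str
            link_list.append(match.group(0))

for link in link_list:
    print(link)
```

Keep in mind that this pattern keeps consuming characters until the next whitespace, so on raw HTML the match may also include markup that directly follows the URL, such as a closing quote.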

group([group1, …])

Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding parenthesized group. If a group number is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised. If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.
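The behaviour described in that passage can be checked with a short sketch against the same pattern. The sample strings and variable names below are assumptions chosen for illustration, not part of the original question.

```python
import re

pattern = r'((https?|ftp)://|www\.)[^\s/$.?#].[^\s]*'

m = re.search(pattern, 'Visit http://example.com/post-1 today')
whole = m.group(0)        # the entire match
prefix = m.group(1)       # outer parenthesized group: scheme plus '://'
scheme = m.group(2)       # inner parenthesized group: the scheme alone
pair = m.group(1, 2)      # several arguments give a tuple
all_groups = m.groups()   # every captured group at once

# With a 'www.' link the inner (https?|ftp) group takes no part in the match,
# so its value is None.
m_www = re.search(pattern, 'see www.example.org/page')
unmatched = m_www.group(2)

# Asking for a group number the pattern does not define raises IndexError.
try:
    m.group(3)
    raised = False
except IndexError:
    raised = True
```

Here group(0) is the only call that returns the full URL; groups 1 and 2 only hold the pieces captured by the parentheses at the start of the pattern.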
