1. Tags that contain an id attribute (quick and dirty)
<[^>]+\sid\b[^>]*>
e.g.
<a id>
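A minimal sketch of how the quick pattern behaves, assuming Python's `re` module; the sample strings are made up for illustration. It also shows why this pattern is only "quick and dirty": it does not understand quoting, so the word id inside another attribute's value produces a false match.

```python
import re

# Pattern 1 from above: any tag containing the word "id" after whitespace.
QUICK_ID = re.compile(r'<[^>]+\sid\b[^>]*>')

# Hypothetical sample inputs, chosen only for illustration.
samples = ['<a id>', '<a id="ddd">', '<a href="x">', '<a title="the id here">']

# The last sample has no id attribute, yet it still matches,
# because " id" appears inside the quoted title value.
quick_hits = [s for s in samples if QUICK_ID.search(s)]
print(quick_hits)
```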
2. Tags that contain an id attribute (more reliable)
<(?:[^>"']|"[^"]*"|'[^']*')+?\sid\s*=\s*("[^"]*"|'[^']*')(?:[^>"']|"[^"]*"|'[^']*')*>
e.g.
<a id="ddd">
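As a rough sketch, the more reliable pattern can be used to pull out the id values themselves, since Capture 1 holds the quoted value. The names `ATTRS`, `TAG_WITH_ID`, and the sample `subject` are assumptions made for this example only.

```python
import re

# One unit of tag content: a non-quote character or a complete quoted string.
ATTRS = r'''(?:[^>"']|"[^"]*"|'[^']*')'''

# Pattern 2 from above, assembled from the shared ATTRS fragment.
TAG_WITH_ID = re.compile(
    r'<' + ATTRS + r'''+?\sid\s*=\s*("[^"]*"|'[^']*')''' + ATTRS + r'*>'
)

# Hypothetical input: the second tag only mentions "id" inside a quoted value.
subject = '''<a id="ddd"> <b title="the id here"> <c class='x' id = 'eee'>'''

# Capture 1 includes the surrounding quotes, so slice them off.
id_values = [m.group(1)[1:-1] for m in TAG_WITH_ID.finditer(subject)]
print(id_values)
```

Because quoted strings are consumed as whole units, the text inside the title value is never mistaken for an attribute.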
3. <div> tags that contain an id attribute
<div(?=\s)(?:[^>"']|"[^"]*"|'[^']*')*?\sid\s*=\s*("[^"]*"|'[^']*')(?:[^>"']|"[^"]*"|'[^']*')*>
e.g.
<div id="ddd">
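The same idea can be generalized to any tag name. The sketch below is one possible way to do it, not a definitive implementation: `tag_with_id_pattern` is a hypothetical helper, and it uses a lookahead `(?=\s)` after the tag name plus a lazy `*?`, so that id may be the very first attribute and names such as `divider` are not matched.

```python
import re

# One unit of tag content: a non-quote character or a complete quoted string.
ATTRS = r'''(?:[^>"']|"[^"]*"|'[^']*')'''

def tag_with_id_pattern(tag_name):
    """Hypothetical helper: compile a pattern for <tag_name ...> tags with an id."""
    return re.compile(
        r'<' + re.escape(tag_name) + r'(?=\s)'      # exact tag name, then whitespace
        + ATTRS + r'''*?\sid\s*=\s*("[^"]*"|'[^']*')'''  # id attribute, value in group 1
        + ATTRS + r'*>'                              # rest of the tag
    )

# Hypothetical input for illustration.
subject = '''<div id="ddd"><span id="s1"><div class="box" id='eee'><div class="no-id">'''

div_ids = [m.group(1)[1:-1] for m in tag_with_id_pattern('div').finditer(subject)]
print(div_ids)
```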
4. Tags that contain an id attribute with the value "my-id"
<(?:[^>"']|"[^"]*"|'[^']*')+?\sid\s*=\s*(?:"my-id"|'my-id')(?:[^>"']|"[^"]*"|'[^']*')*>
e.g.
<div id="my-id">
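To search for an arbitrary id value rather than the hard-coded "my-id", the value can be escaped and spliced into the pattern. This is only a sketch under that assumption: `find_tags_with_id` is a hypothetical helper, and it assumes the id value itself contains no quote characters.

```python
import re

# One unit of tag content: a non-quote character or a complete quoted string.
ATTRS = r'''(?:[^>"']|"[^"]*"|'[^']*')'''

def find_tags_with_id(subject, id_value):
    """Hypothetical helper: return every tag whose id attribute equals id_value."""
    value = re.escape(id_value)  # treat the id as literal text, not regex syntax
    pattern = rf'''<{ATTRS}+?\sid\s*=\s*(?:"{value}"|'{value}'){ATTRS}*>'''
    # The pattern has no capturing groups, so findall returns whole tags.
    return re.findall(pattern, subject)

# Hypothetical input for illustration.
subject = '''<div id="my-id"><p id='other'><span class="a" id='my-id'><i id="my-id-2">'''

matching_tags = find_tags_with_id(subject, 'my-id')
print(matching_tags)
```

Requiring the closing quote right after the value is what keeps `my-id-2` from matching.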
5. Tags that contain "my-class" within their class attribute value
Step 1:
Find all tags (each match becomes the subject for Step 2):
<(?:[^>"']|"[^"]*"|'[^']*')+>
Step 2:
Search within each match for a class attribute:
<(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*("[^"]*"|'[^']*')
Step 3: Search within Capture 1 of Step 2 for the class name:
["'\s]my-class["'\s]
Code:
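import re
subject = '''<a class="my-class">'''
class_values = []  # quoted class attribute values (Capture 1 of Step 2)
# Step 2: locate the class attribute inside a single tag.
innerre = re.compile(r'''(?:[^>"']|"[^"]*"|'[^']*')+?\sclass\s*=\s*("[^"]*"|'[^']*')''')
# Step 1: iterate over every tag in the subject.
for outermatch in re.finditer(r'''<(?:[^>"']|"[^"]*"|'[^']*')+>''', subject):
    class_values.extend(innerre.findall(outermatch.group(0)))
# Step 3: keep only the values that contain "my-class" as a whole class name.
result = [v for v in class_values if re.search(r'''["'\s]my-class["'\s]''', v)]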
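The three steps can also be wrapped into a single function that returns the matching tags rather than the class values. The following is a sketch, not a definitive implementation: `tags_with_class`, `TAG`, `CLASS_ATTR`, and the sample input are names assumed for this example, and the Step 2 pattern is anchored with a leading `<` and applied with `match` so that it always starts at the beginning of the tag.

```python
import re

# One unit of tag content: a non-quote character or a complete quoted string.
ATTRS = r'''(?:[^>"']|"[^"]*"|'[^']*')'''
TAG = re.compile(r'<' + ATTRS + r'+>')  # Step 1: any tag
CLASS_ATTR = re.compile(                # Step 2: class attribute, value in group 1
    r'<' + ATTRS + r'''+?\sclass\s*=\s*("[^"]*"|'[^']*')'''
)

def tags_with_class(subject, class_name):
    """Hypothetical helper: return tags whose class attribute lists class_name."""
    # Step 3: the name must be delimited by a quote or whitespace on both sides.
    name = re.compile(r'''["'\s]''' + re.escape(class_name) + r'''["'\s]''')
    found = []
    for tag in TAG.finditer(subject):
        attr = CLASS_ATTR.match(tag.group(0))
        if attr and name.search(attr.group(1)):
            found.append(tag.group(0))
    return found

# Hypothetical input for illustration.
subject = '''<a class="my-class"><b class='x my-class y'><c class="my-class-2"><d id="my-class">'''

class_tags = tags_with_class(subject, 'my-class')
print(class_tags)
```

Splitting the work into steps keeps each pattern simple: a single regex would have to handle the class name appearing anywhere in a space-separated list.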
Python: Using Regular Expressions to Find Attributes in Specific XML Tags