Scraping phonetic spellings of words from the Oxford dictionary with Python requests

一粒马豆

Tags: python, web scraping, requests, html, regular expressions

Article link: https://blog.csdn.net/MAILLIBIN/article/details/83152531

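This post shows how to use Python's requests library to fetch a word's page from the Oxford dictionary website and then extract the word's phonetic spelling from the returned HTML with a regular expression.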
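The two steps described above, building the page URL and pulling the phonetic spelling out of the HTML, can be tried without any network access. The following is only a minimal sketch: the helper names `definition_url` and `extract_phonetic` and the fragment `SAMPLE_HTML` are hypothetical and introduced for illustration, and the markup of the real page may differ from this fragment or may have changed since the post was written.

```python
import re

# Base URL taken from the post; the word is appended to it.
BASE_URL = "https://en.oxforddictionaries.com/definition/"

# Same pattern as in the post: capture the text between the two slashes
# inside <span class="phoneticspelling">...</span>.
PHONETIC_PATTERN = re.compile(
    r'<span\s+class="phoneticspelling">/([^/]*)/</span>', re.I
)

# Hypothetical HTML fragment standing in for a downloaded page.
SAMPLE_HTML = '<div><span class="phoneticspelling">/ˈpʌɪθ(ə)n/</span></div>'


def definition_url(word):
    """Build the page URL; spaces in the word become underscores."""
    return BASE_URL + word.replace(" ", "_")


def extract_phonetic(html):
    """Return the phonetic spelling found in html, or None if absent."""
    match = PHONETIC_PATTERN.search(html)
    return match.group(1) if match else None
```

Calling `extract_phonetic(SAMPLE_HTML)` returns the text between the slashes, and a page without the span yields `None`, so the caller can tell a missing entry apart from an empty one. Keeping the parsing separate from the HTTP request also makes the regular expression easy to check against saved pages.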

import requests
import re


def phonetic_spelling(word):
    
    word=word.replace(" ","_")
    
    phoneticSpelling=""
    
    # The URL follows a regular pattern: a fixed base URL followed by the word
    request=requests.get("https://en.oxforddictionaries.com/definition/"+word)
    
    html=request.text
    
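    # Inspecting the page shows that the line holding the phonetic spelling has a
    # regular HTML format, so it can be described with a regular expression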
    regularExpression=r'<span\s+class="phoneticspelling">/([^\/]*)/</span>'
    
    matchObject=re.search(regularExpression,html,re.I)
    
    
    if matchObject:
        if matchObject.group(1):
            phoneticSpelling=matchObject.group(1)
            print("\nphoneticSpelling: ",word,"--->",phoneticSpelling)
        else:
            print("\nword \""+word+&