Using XPath to locate a p tag: the node is found, but its text cannot be extracted (crawler: Bangumi 番组计划)

Latest recommended article published on 2024-09-25 16:51:14

临安初雨一夜落红

Category: Crawlers, Page Parsing
Tags: python, xpath, spider, bangumi, 番组计划

Article link: https://blog.csdn.net/liumangjuntuan/article/details/80931657

This is the target site I am crawling:

start_url: http://bangumi.tv/person/1/works/voice

Scraping the Japanese name and the href attribute of the name from the role page both worked. The XPath used to locate the Japanese name is:

role_item["role_japanese_name"] = role.xpath('./div[@class="ll innerLeftItem"]//h3/a/text()')[0] if len(role.xpath('./div[@class="ll innerLeftItem"]//h3/a/text()')) > 0 else ''
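
The conditional expression above evaluates the same XPath twice, once for the length check and once for the value. A minimal sketch of a helper that avoids this is shown below. The name `first_or_default` is hypothetical and not part of the original crawler, and it assumes the XPath call returns a list, as it does for node and `text()` queries.

```python
def first_or_default(results, default=''):
    """Return the first item of an XPath result list, or `default` if it is empty."""
    # An empty list is falsy, so this covers the "nothing matched" case.
    return results[0] if results else default


# Plain lists stand in here for what role.xpath(...) would return.
print(first_or_default(['ルルーシュ']))  # first match is returned
print(first_or_default([]))              # falls back to ''
```

In the crawler this would read something like `first_or_default(role.xpath('...'))`, so each expression is written and evaluated only once.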

The Chinese name sits in a p tag at the same level, so in theory the same locating syntax should retrieve it:

role_item["role_chinese_name"] = role.xpath('./div[@class="ll innerLeftItem"]//h3/p/text()')[0] if len(role.xpath('./div[@class="ll innerLeftItem"]//h3/p/text()')) > 0 else ''

But this returned no Chinese name. After a lot of trial and error I ended up with the version below (I never found the actual cause):

role_item["role_chinese_name"] = role.xpath('.//p/text()')[0] if len(role.xpath('.//p/text()')) > 0 else ''
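
One possible explanation, offered as an assumption and not verified against the live page: if the p element is a sibling of h3 and not its child, the step `h3/p` can never match, because `/` selects children only. The sketch below illustrates this with hypothetical markup (the snippet and the names in it are invented for the example). It uses the standard library's `xml.etree.ElementTree` as a stand-in so that it runs without extra packages; ElementTree supports only a subset of XPath, so the sibling is reached through the parent step `..`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <p> is a sibling of <h3>, not a child of it.
SNIPPET = """<li>
  <div class="ll innerLeftItem">
    <div class="ll">
      <h3><a href="/character/1">日本語名</a></h3>
      <p class="tip">中文名</p>
    </div>
  </div>
</li>"""

role = ET.fromstring(SNIPPET)

# Child step: looks for a <p> directly inside <h3>. There is none.
as_child = role.findall('./div[@class="ll innerLeftItem"]//h3/p')

# Go up to the parent of <h3>, then down to its <p> child (the sibling of <h3>).
as_sibling = role.findall('./div[@class="ll innerLeftItem"]//h3/../p')

# Descendant search, as in the workaround: any <p> below the <li>.
anywhere = role.findall('.//p')

print(as_child)                        # → []
print([p.text for p in as_sibling])    # → ['中文名']
print([p.text for p in anywhere])      # → ['中文名']
```

If this assumption holds, a full XPath 1.0 engine such as lxml should accept the more targeted `.//h3/following-sibling::p/text()`. The `.//p/text()` workaround also succeeds, but it takes the first p anywhere under the list item, so it could pick up an unrelated paragraph if an item contains more than one.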

The full code:

                # 9. Handle roles
                # Get the role link, e.g. 'http://bangumi.tv/person/5076/works/voice'
                role_lists = list()
                role_link = html.xpath('//*[@id="headerSubject"]//a[contains(text(),"角色")]/@href')
                # print(role_link)
                # print(url)
                if len(role_link) > 0:
                    # Handle the role detail page
                    role_link = 'http://bangumi.tv' + role_link[0]
                    # print(role_link)
                    role_html = self.handle_speical_url(role_link)
                    # Compare by value, not identity ("is not" on a str literal is unreliable);
                    # this assumes handle_speical_url returns '' on failure
                    if role_html != '':
                        role_position = role_html.xpath('//*[@id="columnCrtB"]//ul[@class="browserList"]/li')
                        if len(role_position) > 0:
                            for role in role_position:
                                role_item = dict()

                                # 1. Japanese name of the role
                                role_item["role_japanese_name"] = role.xpath('./div[@class="ll innerLeftItem"]//h3/a/text()')[0] if len(role.xpath('./div[@class="ll innerLeftItem"]//h3/a/text()')) > 0 else ''
                                # print(role_item["role_japanese_name"])
                                # print(role_link)

                                # 2. Chinese name of the role
                                # role_item["role_chinese_name"] = role.xpath('./div[@class="ll innerLeftItem"]//h3/p/text()')
                                role_item["role_chinese_name"] = role.xpath('.//p/text()')[0] if len(role.xpath('.//p/text()')) > 0 else ''
                                # print(role_item["role_chinese_name"])
                                # print(role_link)
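
The role link in the code above is built by concatenating 'http://bangumi.tv' with the extracted href. A hedged alternative is `urllib.parse.urljoin`, which also copes with an href that is already absolute. The helper name `absolute_url` and the constant `BASE_URL` are hypothetical and only serve this sketch.

```python
from urllib.parse import urljoin

# Site root used by the crawler in this post.
BASE_URL = 'http://bangumi.tv'


def absolute_url(href, base=BASE_URL):
    """Resolve a possibly relative href against the site root."""
    return urljoin(base, href)


print(absolute_url('/person/5076/works/voice'))
# → http://bangumi.tv/person/5076/works/voice
```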